Backup disk (SMR) in standby detaches randomly (READ_FPDMA_QUEUED)

systemofapwne

Dabbler
Hello,

On my TrueNAS box, I use a single consumer SMR* drive as an on-site backup/archive (Seagate ST8000DM004). The disk is brand-new and shows no signs of degradation so far.
Every night, data is replicated from the main pool to the backup pool, which lives on that disk. Most of the time the disk is in standby (it spins down after 5 min of idle). This has worked flawlessly so far.
The problem is that the disk randomly detaches from the HBA due to SATA timeouts.

Code:
Dec 31 04:00:16 truenas ahcich3: Timeout on slot 19 port 0
Dec 31 04:00:16 truenas ahcich3: is 00000000 cs 00000000 ss 00080000 rs 00080000 tfd 40 serr 00000000 cmd 0060d317
Dec 31 04:00:16 truenas (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 d8 85 31 40 7a 00 00 00 00 00
Dec 31 04:00:16 truenas (ada3:ahcich3:0:0:0): CAM status: Command timeout
Dec 31 04:00:16 truenas (ada3:ahcich3:0:0:0): Retrying command, 3 more tries remain
Dec 31 04:10:16 truenas ahcich3: Timeout on slot 16 port 0
Dec 31 04:10:16 truenas ahcich3: is 00000000 cs 00000000 ss 00010000 rs 00010000 tfd 40 serr 00000000 cmd 0060d017
Dec 31 04:10:16 truenas (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 f8 b9 50 40 52 00 00 00 00 00
Dec 31 04:10:16 truenas (ada3:ahcich3:0:0:0): CAM status: Command timeout
Dec 31 04:10:16 truenas (ada3:ahcich3:0:0:0): Retrying command, 3 more tries remain
Dec 31 04:20:17 truenas ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Dec 31 04:20:17 truenas ahcich3: Timeout on slot 25 port 0
Dec 31 04:20:17 truenas ahcich3: is 00000000 cs 00000000 ss 02000000 rs 02000000 tfd 40 serr 00000000 cmd 0060d917
Dec 31 04:20:17 truenas (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 d8 85 31 40 7a 00 00 00 00 00
Dec 31 04:20:17 truenas (ada3:ahcich3:0:0:0): CAM status: Command timeout
Dec 31 04:20:17 truenas (ada3:ahcich3:0:0:0): Retrying command, 3 more tries remain
Dec 31 04:21:18 truenas ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Dec 31 04:21:18 truenas ahcich3: Timeout on slot 26 port 0
Dec 31 04:21:18 truenas ahcich3: is 00000000 cs 04000000 ss 00000000 rs 04000000 tfd 00 serr 00000000 cmd 0060da17
Dec 31 04:21:18 truenas (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Dec 31 04:21:18 truenas (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Dec 31 04:21:18 truenas (aprobe0:ahcich3:0:0:0): Retrying command, 0 more tries remain
Dec 31 04:30:17 truenas ahcich3: Timeout on slot 3 port 0
Dec 31 04:30:17 truenas ahcich3: is 00000000 cs 00000000 ss 00000008 rs 00000008 tfd 40 serr 00000000 cmd 0060c317
Dec 31 04:30:17 truenas (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 d8 85 31 40 7a 00 00 00 00 00
Dec 31 04:30:17 truenas (ada3:ahcich3:0:0:0): CAM status: Command timeout
Dec 31 04:30:17 truenas (ada3:ahcich3:0:0:0): Retrying command, 3 more tries remain
Dec 31 04:31:18 truenas ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Dec 31 04:31:18 truenas ahcich3: Timeout on slot 4 port 0
Dec 31 04:31:18 truenas ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 00 serr 00000000 cmd 0060c417
Dec 31 04:31:18 truenas (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Dec 31 04:31:18 truenas (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Dec 31 04:31:18 truenas (aprobe0:ahcich3:0:0:0): Retrying command, 0 more tries remain
Dec 31 04:40:17 truenas ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Dec 31 04:40:17 truenas ahcich3: Timeout on slot 13 port 0
Dec 31 04:40:17 truenas ahcich3: is 00000000 cs 00000000 ss 00002000 rs 00002000 tfd 40 serr 00000000 cmd 0060cd17
Dec 31 04:40:17 truenas (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 d8 85 31 40 7a 00 00 00 00 00
Dec 31 04:40:17 truenas (ada3:ahcich3:0:0:0): CAM status: Command timeout
Dec 31 04:40:17 truenas (ada3:ahcich3:0:0:0): Retrying command, 3 more tries remain
Dec 31 04:41:18 truenas ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Dec 31 04:41:18 truenas ahcich3: Timeout on slot 14 port 0
Dec 31 04:41:18 truenas ahcich3: is 00000000 cs 00004000 ss 00000000 rs 00004000 tfd 00 serr 00000000 cmd 0060ce17
Dec 31 04:41:18 truenas (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Dec 31 04:41:18 truenas (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Dec 31 04:41:18 truenas (aprobe0:ahcich3:0:0:0): Retrying command, 0 more tries remain
Dec 31 04:42:19 truenas ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Dec 31 04:42:19 truenas ahcich3: Timeout on slot 15 port 0
Dec 31 04:42:19 truenas ahcich3: is 00000000 cs 00008000 ss 00000000 rs 00008000 tfd 00 serr 00000000 cmd 0060cf17
Dec 31 04:42:19 truenas (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Dec 31 04:42:19 truenas (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Dec 31 04:42:19 truenas (aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
Dec 31 04:43:20 truenas ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Dec 31 04:43:20 truenas ahcich3: Timeout on slot 16 port 0
Dec 31 04:43:20 truenas ahcich3: is 00000000 cs 00010000 ss 00000000 rs 00010000 tfd 00 serr 00000000 cmd 0060d017
Dec 31 04:43:20 truenas (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Dec 31 04:43:20 truenas (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Dec 31 04:43:20 truenas (aprobe0:ahcich3:0:0:0): Error 5, Retry was blocked
Dec 31 04:43:20 truenas ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
Dec 31 04:43:20 truenas ada3: <ST8000DM004-2U9188 0001> s/n WSC0LZHJ detached
Dec 31 04:44:00 truenas ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Dec 31 04:44:00 truenas (ada3:ahcich3:0:0:0): Periph destroyed
Dec 31 04:44:00 truenas ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
Dec 31 04:44:00 truenas ada3: <ST8000DM004-2U9188 0001> ACS-3 ATA SATA 3.x device
Dec 31 04:44:00 truenas ada3: Serial Number WSC0LZHJ
Dec 31 04:44:00 truenas ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
Dec 31 04:44:00 truenas ada3: Command Queueing enabled
Dec 31 04:44:00 truenas ada3: 7630885MB (15628053168 512 byte sectors)
Dec 31 04:44:00 truenas ada3: quirks=0x1<4K>
Dec 31 04:44:00 truenas Solaris: WARNING: Pool 'Backup' has encountered an uncorrectable I/O failure and has been suspended.


Even though this log starts almost exactly at 4:00 at night, well before the disk was actually detached, there is no task** (that I know of) scheduled at that time which could have triggered it. The previous time this happened it was at 2:07, also at night.

I was able to re-attach the disk and get the pool running again by unplugging and re-plugging the disk and then running zpool clear Backup.
The pool then shows errors, which only vanish after running a scrub twice, always with the same result: scrub repaired 0B in 05:39:57 with 0 errors
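
For reference, the recovery sequence boils down to this (pool and device names exactly as above, nothing TrueNAS-specific):
Code:
# after re-seating the disk so it shows up as ada3 again
zpool clear Backup       # clears the suspended state / error counters

zpool scrub Backup       # first scrub: repairs 0B, but the old
                         # "permanent errors" list is still shown

zpool scrub Backup       # second scrub: the error list finally clears
zpool status -v Backup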

Here is the current state after today's incident:
Code:
#zpool status -v Backup
  pool: Backup
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 05:39:57 with 0 errors on Wed Dec 28 02:35:22 2022
config:

        NAME                                          STATE     READ WRITE CKSUM
        Backup                                        ONLINE       0     0     0
          gptid/fe817121-7be9-11ed-a0fd-977792743385  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <metadata>:<0x1>
        <metadata>:<0x3e>
        <metadata>:<0x51>
        Backup/Server/vm/Debian:<0x0>


What could be the reason for this behavior? How can I fix it? And how can I clear the "permanent errors" (which aren't real) without scrubbing the disk twice?

Here is a SMART log right after performing a short self-test.
Code:
#smartctl -s on -a /dev/ada3      
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5 (SMR)
Device Model:     ST8000DM004-2U9188
Serial Number:    WSC0LZHJ
LU WWN Device Id: 5 000c50 0dfe51df6
Add. Product Id:  DELL(tm)
Firmware Version: 0001
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Dec 31 17:00:39 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 974) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x30a5) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   073   070   006    Pre-fail  Always       -       20939536
  3 Spin_Up_Time            0x0003   081   081   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   045    Pre-fail  Always       -       32657
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       353h+08m+52.522s
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       18
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       2 2 2
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   048   044   040    Old_age   Always       -       52 (Min/Max 45/52)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       8
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       115
194 Temperature_Celsius     0x0022   052   056   000    Old_age   Always       -       52 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a   073   070   000    Old_age   Always       -       20939536
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       26h+01m+33.183s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       5786563929
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       16949118876

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       353         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


*: I know that SMR is bad for ZFS, especially when resilvering. This drive is not part of an actively used pool; it is just for archiving and has worked absolutely fine.
**: I disabled SMART on that disk via the TrueNAS UI, since I noticed on a different spun-down disk that it would wake up whenever it was queried for its temperature. A SMART self-test is still scheduled to run once a week anyway.
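
(For what it's worth: instead of disabling SMART on the disk entirely, smartctl itself can be told to skip a drive that is spun down. I haven't reconfigured my box around this yet, so treat it as an untested idea:)
Code:
# -n standby makes smartctl exit without touching the disk while it is
# in STANDBY, so a manual temperature/health query won't spin it up
smartctl -n standby -A /dev/ada3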
 

joeschmuck

Old Man
Moderator
This has worked flawlessly so far.
I am curious how long this worked flawlessly. It's a serious question and here is where I'm going with it.

You have an SMR drive with a ZFS pool created on that 8TB drive (I will assume you have just short of 8TB of storage for argument's sake). You write data to the SMR drive, possibly small chunks every night. The SMR drive does not rewrite any data in place: it writes new data to a clean available spot, and if that data replaces older data, the old copy is simply marked as unused space. This goes on until the drive has actually written about 8TB in total. At that point the drive is forced to rewrite the shingled layers of data throughout the disk as new data arrives. I suspect you have hit this point, and as far as I know (I could be very wrong) this is what ZFS doesn't like. It takes a lot of extra time to rewrite all this data. Could it be taking 31 seconds? I think the answer is yes, it could be, and that is why the drive isn't ready: it's very busy.

The only way to prove this is to reformat the drive entirely and zero it out so it is like a new drive, then see if the problem goes away. You could also run the backup when you know you have a lot of data to push and watch whether the drive is still active right before it falls off.
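
If you want to catch it in the act, something along these lines in a second shell while the replication runs should show whether the drive is busy on its own right before it drops off (ada3 taken from your log; I have not tried this against an SMR drive specifically):
Code:
# live per-disk I/O statistics, filtered to the backup disk
gstat -f 'ada3'

# or coarser: extended statistics every 10 seconds (busy %, queue length)
iostat -x -w 10 ada3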

This is just a theory and I could be very wrong. I don't use SMR drives in general, except for Windoze file copies, and definitely not for ZFS.
 

systemofapwne

Dabbler
I totally agree with your point against SMR drives (I'm also against using them in active/production mirrored pools, for obvious reasons).

Regarding your argument about how the SMR drive works: yes, data generally is first written to buffer tracks, which are not shingled, and is then rewritten to SMR tracks at some point once the drive is "idle" - all fully opaque to the OS. But, contrary to your statement, this happens way before the full capacity (8 TB here) has been written.
Once that point is reached, new data might have to wait a bit but is still accepted into the drive's cache, and old & new data is then written to the SMR tracks directly (which makes the drive significantly slower). However, this generally should not (and I emphasize should not) leave the drive unresponsive for 30 s or more without accepting any other request (here a read). I might be wrong here though.
Anyhow: this peculiarity of SMR drives should not be a concern here, since the drive is spun down when the problem occurs.*

On the format argument: the drive was basically "all zero" before I added it a few weeks ago, and I intentionally bought an 8 TB drive to back up my 4 TB main pool. So at least 4 TB (if not more) is "still zero"/empty - enough buffer tracks, IMHO.

*Sidenote: the drive spins down after 5 minutes of no I/O. I wonder whether the drive's internal SMR housekeeping counts as "activity" that keeps it from spinning down. If it does not, increasing the spin-down timer to, say, 30 min - 1 h might do the trick (depending on how much data is replicated to it each day). That should give the drive enough time to rewrite buffer tracks to SMR tracks before it spins down. I will try this and report back whether the problem happens again.
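
If changing the spin-down setting through the TrueNAS UI doesn't behave as expected, my understanding is that the timer can also be pushed to the drive directly with camcontrol (the seconds value gets translated to the ATA standby timer; not verified on this particular disk, and TrueNAS may well re-apply its own setting later):
Code:
# set the automatic standby (spin-down) timer to 30 minutes
# without spinning the drive down right now
camcontrol idle ada3 -t 1800

# or spin it down immediately and set the same timer
camcontrol standby ada3 -t 1800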
 

Arwen

MVP
I have a Seagate 8TB Archive drive that I use with ZFS for backups. I connect it, run a scrub and then perform a backup. It works fine, though slowly because of the SMR thing.

However, I HAVE seen slowdowns lasting more than a minute. I am not sure what is going on (other than SMR re-writes), but the writing program is paused. In my case nothing really bad happens; once the disk becomes usable again, the backup continues as normal. Because the scrub & backup succeeded, I did not investigate the slowdown further, so I am not absolutely certain it was a full minute.
 

joeschmuck

Old Man
Moderator
I knew @Arwen would chime in. I recall you stating a few months ago that you used an SMR drive without any problems. Could have been 6 months ago, time flies.
 

ChrisRJ

Wizard
On my TrueNAS box, I use a single consumer SMR* drive as an on-site backup/archive (Seagate ST8000DM004).
Without more details I do not see how this setup actually provides safety to your data. Can you provide some more information here?

The disk is brand-new and shows no signs of degradation so far.
The temperature of 52 °C is too high, especially since the recorded minimum is already 45 °C. It may be within the specification, but statistically you will face a reduced lifetime. My disks run at 31/32 °C when idle and do not exceed 35 °C under full load. Whether you want to go that far can certainly be debated ;-). But 52 °C is a risk in my view.

The problem is that the disk randomly detaches from the HBA due to SATA timeouts.
Can you connect to a normal SATA port on the motherboard to see whether this changes anything?
 

systemofapwne

Dabbler
Without more details I do not see how this setup actually provides safety to your data. Can you provide some more information here?


The temperature of 52 °C is too high, especially since the recorded minimum is already 45 °C. It may be within the specification, but statistically you will face a reduced lifetime. My disks run at 31/32 °C when idle and do not exceed 35 °C under full load. Whether you want to go that far can certainly be debated ;-). But 52 °C is a risk in my view.


Can you connect to a normal SATA port on the motherboard to see whether this changes anything?
To question 1)
A more detailed description of my system can be found in my signature. Basically, I have one main pool with two mirrored CMR NAS drives. A few GB are written to that pool every day (mainly VMs). The snapshots are then replicated to an offsite NAS (with similar redundancy) and also to an onsite backup pool: the single SMR drive in question. That drive is idle 23+ hours a day and only spins up at night for the replication. Consider it a disaster-recovery drive with quick access.
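
(The nightly job is a plain replication task; conceptually it boils down to an incremental send/receive along these lines - the source pool name "tank" and the snapshot names are made up here, only the target dataset is real:)
Code:
# hypothetical sketch of the nightly incremental replication
zfs send -i tank/Server/vm/Debian@prev tank/Server/vm/Debian@today | \
    zfs recv -F Backup/Server/vm/Debian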

To the temperature: yes, it is indeed a bit high, possibly because the drive bay does not get enough air circulation. Thanks for pointing this out; I will add an additional fan. All the other drives sit around 40-45 °C.

To question 2)
The drive is already connected directly to a SATA port on the motherboard.
 