Command timeouts on Seagate IronWolf 110 SSD

ndemarco

Dabbler
Joined
May 10, 2012
Messages
11
Hello all,

I've built up a new FreeNAS box from an ASRock Rack C3758D4I-4L board. This board has quite a few onboard SATA ports.
  • Motherboard: ASRock Rack C3758D4I-4L
  • CPU: Intel Atom C3758
  • RAM: 2x 32 GB
  • Hard drives: 2x Seagate IronWolf 110 SSDs, 1.92 TB each
  • Hard disk controllers: Onboard first (causing trouble), then LSI 9211 (works)
  • Network cards: 4x onboard Marvell, appearing as Intel
  • Power supply is a PC Power & Cooling 500W (overkill, but taken from another box).
No matter which onboard SATA port I use, it won't communicate properly with the Seagate IronWolf 110 SSDs. The drives work perfectly via an LSI HBA, and Samsung 840 EVOs work perfectly via the onboard SATA ports. The motherboard has only one PCIe slot, which I plan to use for NVMe drives, so I'd rather not keep the HBA in it.

The drive firmware is the latest per Seagate.

I've moved the drives a few times while troubleshooting, so the device names in the 'camcontrol' output won't line up with the ones in the timeout messages logged by FreeNAS.

Code:
freenas# camcontrol devlist
<ATA ZA1920NM10001 011J>           at scbus0 target 25 lun 0 (pass0,da0)
<ATA ZA1920NM10001 011J>           at scbus0 target 26 lun 0 (pass1,da1)
<ADATA ISMS331-016GMV P0831A>      at scbus1 target 0 lun 0 (ada0,pass2)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus5 target 0 lun 0 (ses0,pass3)
<Samsung SSD 840 EVO 250GB EXT0DB6Q>  at scbus6 target 0 lun 0 (pass5,ada1)
<Samsung SSD 840 EVO 250GB EXT0BB6Q>  at scbus7 target 0 lun 0 (pass6,ada2)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus11 target 0 lun 0 (ses1,pass4)
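Since the device names shift whenever the drives move between controllers, it can help to reduce the `camcontrol devlist` output to one line per device before comparing it with the kernel messages. A minimal sketch, assuming a POSIX shell; `devlist_summary` is a name made up here, not a FreeNAS or FreeBSD tool, and the sample lines are copied from the listing above:

```sh
#!/bin/sh
# Hypothetical helper: turn `camcontrol devlist` lines into
# "<peripheral names> <bus> <model string>" for easier comparison.
devlist_summary() {
    sed -n 's/^<\(.*\)> *at \(scbus[0-9]*\) .*(\(.*\))$/\3 \2 \1/p'
}

# Demo on two lines taken from the listing in this post.
devlist_summary <<'EOF'
<ATA ZA1920NM10001 011J>           at scbus0 target 25 lun 0 (pass0,da0)
<Samsung SSD 840 EVO 250GB EXT0DB6Q>  at scbus6 target 0 lun 0 (pass5,ada1)
EOF
```

On the live system the same function would be fed with `camcontrol devlist | devlist_summary`.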


Code:
freenas# smartctl -x /dev/da0
smartctl 7.0 2018-12-30 r4883 [FreeBSD 12.1-STABLE amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf 110 SATA SSD
Device Model:     ZA1920NM10001
Serial Number:    HKS01KQ0
LU WWN Device Id: 5 000c50 03ea14015
Firmware Version: SF44011J
User Capacity:    1,920,383,410,176 bytes [1.92 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Mar 22 22:26:11 2020 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     1 (minimum power consumption with standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Disabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x59) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0002)    Does not save SMART data before
                    entering power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      (  30) minutes.
SCT capabilities:            (0x103d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   100   100   090    -    0
  5 Reallocated_Sector_Ct   -O--CK   100   100   000    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    88
 12 Power_Cycle_Count       -O--CK   100   100   000    -    23
100 Flash_GB_Erased         -O--CK   100   100   000    -    11
102 Lifetime_PS4_Entry_Ct   -O--CK   100   100   000    -    13
103 Lifetime_PS3_Exit_Ct    -O--CK   100   100   000    -    9
170 Grown_Bad_Block_Ct      -O--CK   100   100   000    -    0
171 Program_Fail_Count      -O--CK   100   100   000    -    0
172 Erase_Fail_Count        -O--CK   100   100   000    -    0
173 Avg_Program/Erase_Ct    -O--CK   100   100   000    -    1
174 Unexpected_Pwr_Loss_Ct  -O--CK   100   100   000    -    19
177 Wear_Range_Delta        PO---K   100   100   089    -    0 0 0
183 SATA_Downshift_Count    -O--CK   100   100   000    -    0x00000000000000
187 Uncorrectable_ECC_Ct    -O--CK   100   100   000    -    0
194 Temperature_Celsius     -O---K   030   049   000    -    30 (Min/Max 23/49)
195 RAISE_ECC_Cor_Ct        -O--CK   100   100   000    -    0
198 Uncor_Read_Error_Ct     -O--CK   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   100   100   000    -    0
230 Drv_Life_Protect_Status PO---K   100   100   091    -    100
231 SSD_Life_Left           PO--CK   100   100   010    -    0x00000000646400
232 Available_Reservd_Space POS--K   100   100   003    -    0
233 Lifetime_Wts_To_Flsh_GB -O--CK   100   100   000    -    14
241 Lifetime_Wts_Frm_Hst_GB -O--CK   100   100   000    -    39
242 Lifetime_Rds_Frm_Hst_GB -O--CK   100   100   000    -    0
243 Free_Space              -OS--K   100   100   003    -    0x07270200218b89
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x02           SL  R/O     16  Comprehensive SMART error log
0x03       GPL     R/O     20  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x0a       GPL     R/W     16  Device Statistics Notification
0x0c       GPL     R/O      1  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x12       GPL     R/O      1  SATA NCQ Non-Data log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x24       GPL     R/O  65535  Current Device Internal Status Data log
0x25       GPL     R/O  65535  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa8           SL  VS     255  Device vendor specific log
0xb7       GPL     VS    1024  Device vendor specific log
0xd4       GPL,SL  VS       6  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer
0xf7           SL  -        2  Reserved
0xf8           SL  -        1  Reserved
0xf9           SL  -        4  Reserved
0xfa           SL  -        7  Reserved
0xfb       GPL     -    65535  Reserved

SMART Extended Comprehensive Error Log Version: 1 (20 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       0 (0x0000)
Device State:                        Active (0)
Current Temperature:                    33 Celsius
Power Cycle Min/Max Temperature:     29/35 Celsius
Lifetime    Min/Max Temperature:     23/49 Celsius
Specified Max Operating Temperature:   116 Celsius
Under/Over Temperature Limit Count:   0/0
SMART Status:                        0xc24f (PASSED)

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:     -10/116 Celsius
Min/Max Temperature Limit:           -10/120 Celsius
Temperature History Size (Index):    478 (67)

Index    Estimated Time   Temperature Celsius
  68    2020-03-22 14:29    28  *********
...    ..(387 skipped).    ..  *********
456    2020-03-22 20:57    28  *********
457    2020-03-22 20:58    29  **********
...    ..( 10 skipped).    ..  **********
468    2020-03-22 21:09    29  **********
469    2020-03-22 21:10    30  ***********
...    ..(  7 skipped).    ..  ***********
477    2020-03-22 21:18    30  ***********
   0    2020-03-22 21:19     ?  -
   1    2020-03-22 21:20    30  ***********
...    ..( 65 skipped).    ..  ***********
  67    2020-03-22 22:26    30  ***********

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4              23  ---  Lifetime Power-On Resets
0x01  0x010  4              88  ---  Power-on Hours
0x01  0x018  6        82842176  ---  Logical Sectors Written
0x01  0x020  6         3160294  ---  Number of Write Commands
0x01  0x028  6          103285  ---  Logical Sectors Read
0x01  0x030  6            4516  ---  Number of Read Commands
0x01  0x038  6       317265966  ---  Date and Time TimeStamp
0x01  0x058  2           65447  ---  Resource Availability
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              30  ---  Current Temperature
0x05  0x010  1              28  ---  Average Short Term Temperature
0x05  0x018  1               -  ---  Average Long Term Temperature
0x05  0x020  1              49  ---  Highest Temperature
0x05  0x028  1              23  ---  Lowest Temperature
0x05  0x030  1              32  ---  Highest Average Short Term Temperature
0x05  0x038  1              26  ---  Lowest Average Short Term Temperature
0x05  0x040  1               -  ---  Highest Average Long Term Temperature
0x05  0x048  1               -  ---  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1             116  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1             -10  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4             106  ---  Number of Hardware Resets
0x06  0x010  4              97  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               0  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            2  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC
0x0002  2            0  R_ERR response for data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS


Code:
Mar 18 20:07:16 freenas ZFS: vdev state changed, pool_guid=2550807933382894929 vdev_guid=14584055837392843541
Mar 18 20:07:16 freenas ZFS: vdev state changed, pool_guid=2550807933382894929 vdev_guid=348865377422514070
Mar 18 20:07:46 freenas ahcich12: Timeout on slot 8 port 0
Mar 18 20:07:46 freenas ahcich12: is 00000000 cs 00000100 ss 00000100 rs 00000100 tfd 441 serr 00000000 cmd 00044717
Mar 18 20:07:46 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:07:46 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:07:46 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:08:16 freenas ahcich12: Timeout on slot 18 port 0
Mar 18 20:08:16 freenas ahcich12: is 00000000 cs 00040000 ss 00040000 rs 00040000 tfd 441 serr 00000000 cmd 00045117
Mar 18 20:08:16 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:08:16 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:08:16 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:08:46 freenas collectd[1666]: Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 66, in read
    temperatures = c.call('disk.temperatures', self.disks, self.powermode, self.smartctl_args)
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 66, in read
    temperatures = c.call('disk.temperatures', self.disks, self.powermode, self.smartctl_args)
  File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 500, in call
    raise CallTimeout("Call timeout")
middlewared.client.client.CallTimeout: Call timeout
Mar 18 20:09:17 freenas ahcich12: Timeout on slot 29 port 0
Mar 18 20:09:17 freenas ahcich12: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd 441 serr 00000000 cmd 00045c17
Mar 18 20:09:47 freenas ahcich12: Timeout on slot 8 port 0
Mar 18 20:09:47 freenas ahcich12: is 00000000 cs 00000100 ss 00000100 rs 00000100 tfd 441 serr 00000000 cmd 00044717
Mar 18 20:09:47 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:09:47 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:09:47 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:10:17 freenas ahcich12: Timeout on slot 16 port 0
Mar 18 20:10:17 freenas ahcich12: is 00000000 cs 00010000 ss 00010000 rs 00010000 tfd 441 serr 00000000 cmd 00044f17
Mar 18 20:10:17 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:10:17 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:10:17 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:10:47 freenas ahcich12: Timeout on slot 24 port 0
Mar 18 20:10:47 freenas ahcich12: is 00000000 cs 01000000 ss 01000000 rs 01000000 tfd 441 serr 00000000 cmd 00045717
Mar 18 20:10:47 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:10:47 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:10:47 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:11:18 freenas ahcich12: Timeout on slot 0 port 0
Mar 18 20:11:18 freenas ahcich12: is 00000000 cs 00000001 ss 00000001 rs 00000001 tfd 441 serr 00000000 cmd 00045f17
Mar 18 20:11:18 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:11:18 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
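In each register dump above, the `cs` value appears to be a bitmask of command slots, and its set bit matches the slot number in the preceding "Timeout on slot N" line (for example, `cs 00000100` has bit 8 set). A small sketch for checking that by hand, assuming a POSIX shell; the function name `slots_from_mask` is made up for illustration:

```sh
#!/bin/sh
# Hypothetical helper: print the index of every set bit in a 32-bit hex mask.
slots_from_mask() {
    mask=$((0x$1))
    bit=0
    out=""
    while [ "$bit" -lt 32 ]; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$bit"
        fi
        bit=$((bit + 1))
    done
    echo "$out"
}

# The cs values from the log above, in order of appearance.
for m in 00000100 00040000 20000000 00010000 01000000 00000001; do
    echo "cs $m -> slot(s) $(slots_from_mask "$m")"
done
```

For these six values it prints slots 8, 18, 29, 16, 24 and 0, the same slots the driver names in the timeout lines.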
 

ndemarco

This is an issue with FreeNAS (or my configuration of devices within FreeNAS).


My test steps:
  1. Reconnected the two IronWolf SSDs to the onboard SATA ports (removed them from the LSI HBA).
  2. Booted the same machine into an Ubuntu live session.
  3. Imported the IronWolf SSD zpool.
  4. Wrote 100 MB of random data to the zpool.
This test passed perfectly, without any complaints.
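For anyone who wants to repeat the write test and also compare the file against a reference copy, here is a rough sketch, assuming a POSIX shell with `dd`, `cp`, `cmp` and `mktemp` available. `write_and_verify` is a hypothetical helper, not part of FreeNAS or ZFS, and it is not what I actually ran:

```sh
#!/bin/sh
# Hypothetical helper: write N MiB of random data into a directory and
# compare the result with a reference copy kept in the temp directory.
write_and_verify() {
    target_dir=$1
    size_mib=$2
    ref=$(mktemp) || return 1
    dd if=/dev/urandom of="$ref" bs=1048576 count="$size_mib" 2>/dev/null || return 1
    cp "$ref" "$target_dir/newfile" && sync
    if cmp -s "$ref" "$target_dir/newfile"; then
        echo "OK"
        rc=0
    else
        echo "MISMATCH"
        rc=1
    fi
    rm -f "$ref"
    return "$rc"
}

# Demo against a scratch directory so nothing on a pool is touched.
demo_dir=$(mktemp -d)
write_and_verify "$demo_dir" 1
rm -rf "$demo_dir"
```

On the pool itself the call would be `write_and_verify /mnt/IronWolf-110-1 100`. The read-back may be served from cache, so a passing compare mainly shows that the writes were accepted without errors.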

Code:
ubuntu@ubuntu:~$ cd /mnt
ubuntu@ubuntu:/mnt$ sudo mkdir IronWolf-110-1


Code:
root@ubuntu:/mnt# zpool import Practichem-v4 -f

root@ubuntu:/mnt# zpool status
  pool: Practichem-v4
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(5) for details.

  scan: none requested
config:
    NAME           STATE     READ WRITE CKSUM
    Practichem-v4  ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        sda2       ONLINE       0     0     0
        sdb2       ONLINE       0     0     0
errors: No known data errors

root@ubuntu:/mnt# ls
IronWolf-110-1

root@ubuntu:/mnt# ll
total 0
drwxr-xr-x 1 root root  60 Mar 23 14:25 ./
drwxr-xr-x 1 root root 280 Mar 23 14:27 ../
drwxr-xr-x 2 root root  40 Mar 23 14:25 IronWolf-110-1/

root@ubuntu:/mnt# cd IronWolf-110-1/
root@ubuntu:/mnt/IronWolf-110-1# ll
total 0
drwxr-xr-x 2 root root 40 Mar 23 14:25 ./
drwxr-xr-x 1 root root 60 Mar 23 14:25 ../

root@ubuntu:/mnt/IronWolf-110-1# mkdir test

root@ubuntu:/mnt/IronWolf-110-1# ls
test

root@ubuntu:/mnt/IronWolf-110-1# dd if=/dev/urandom of=newfile bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 2.07261 s, 50.6 MB/s

root@ubuntu:/mnt/IronWolf-110-1# ll
total 102400
drwxr-xr-x 3 root root        80 Mar 23 14:30 ./
drwxr-xr-x 1 root root        60 Mar 23 14:25 ../
-rw-r--r-- 1 root root 104857600 Mar 23 14:30 newfile
drwxr-xr-x 2 root root        40 Mar 23 14:29 test/

root@ubuntu:/mnt/IronWolf-110-1# udevadm info --query=all --name=/dev/sda | grep ID_SERIAL
E: ID_SERIAL=ZA1920NM10001_HKS01LDV
E: ID_SERIAL_SHORT=HKS01LDV
 

ndemarco

Because the problem occurs in FreeNAS but not in Ubuntu with the exact same configuration, I created an issue here.
 