2 (of 7) SSDs, 1 error, '/dev/da0(&da2) [SAT], 1 Currently unreadable (pending) sectors'

SamM

Dabbler
Joined
May 29, 2017
Messages
39
The short version:
I did a long test on these drives, but got no obvious (to novice me...) errors. Perhaps I'm missing something and/or don't know how to read the results.

The longer version:
I recently built two identical FreeNAS servers.
-HP DL380e G8, LFF x12 on SAS2 backplane w/ dual connectors
-96GB RAM
-Currently six 4TB HGST SATA drives (in the server itself). Soon to be replaced by all Seagate Exos 10TB SATA drives.
-Currently a pair of HP/LSI9217-4i4e HBA's
- - Have tried previous cards, like the nearly identical 9207-4i4e, along with others including HP P822 in HBA mode.
- - Current setup on 9217 HBA's is that the internal port of both cards is connected to the server's SAS2 backplane. The external port of both cards is plugged into a HP D2700 SFF x25 SAS enclosure.
- - - This enclosure currently has 6 Crucial MX500 1000GB SSD's, two of which have the "1 Currently unreadable (pending) sectors" message.
- - - When I tried using a single card with two internal ports, both connected to the backplane, and a single card with two external ports, both connected to a different controller on the same enclosure, FreeNAS (and Windows when I had it installed for testing, and even HP's RAID/HBA bootable utility) didn't seem to notice or care about the second port connections (let alone use them for extra bandwidth) unless the primary one failed. Since I don't seem to be getting extra bandwidth/speed, I figure I'll just use dual HBA's over the same number of internal & external ports, and at least gain controller redundancy (though not full multipathing, since these are SATA drives rather than SAS drives).

Some caveats:
-The HGST drives are mostly new or low usage drives.
-The Crucial SSD's are brand new, which is why I find so many questionable drive errors suspicious.
-Neither of these servers is in production or has data on it yet. I can wipe and/or rebuild the pools or even systems as needed, though time is becoming 'of the essence'.
-When I was trying the previous 9207 HBA's, I got many "Currently unreadable (pending) sectors" errors across random drives. When I changed the cards (to 9217) on both servers and wiped the drives, most of these went away and have yet to return. I suspected something funny about those specific cards.
-The 9207's & 9217's are using version 19 (as opposed to 20) of the IT-mode firmware. FreeNAS doesn't seem to mind. The reason I'm not using the latest version 20 (from Broadcom's website) is that during the POST sequence of both servers, the HBA would say something to the effect of 'Press Ctrl C to enter the SAS BIOS config'. Doing so with v20 would crash the server with an NMI fault. V19 does not have this issue.
-My experience with hardware and FreeNAS in general (largely limited to the web GUI) is decent, but my understanding of SSH and the underlying OS is quite novice. That said, I figured out how to get these outputs, and I've seen other posts about this error, but none mention only one error per drive. I'm also not sure what to make of the lack of obvious error counts.
-I read one thread (can't find it now) that suggested this could be a false positive caused by having SMART tests and pool scrubs scheduled too closely together. Previously, I had a short test scheduled around midnight for ALL drives (mechanical, SSD, and all SSD SLOG & cache drives), plus once-a-week pool scrubs (one pool for mechanical data vDevs, the other pool for SSD data vDevs) starting around 1am. Perhaps these need to be separated much more. I saw cyberjock's "Scrub and SMART testing schedules" article after this point.

Long test output of da0:
Code:
root@FreeNAS02[~]# smartctl -a /dev/da0
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     CT1000MX500SSD1
Serial Number:    1852E1E0B169
LU WWN Device Id: 5 00a075 1e1e0b169
Firmware Version: M3CR023
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Jun  4 15:43:33 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       820
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       53
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       2
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       17
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       45
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   071   055   000    Old_age   Always       -       29 (Min/Max 0/45)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Unknown_SSD_Attribute   0x0030   100   100   001    Old_age   Offline      -       0
206 Unknown_SSD_Attribute   0x000e   100   100   000    Old_age   Always       -       0
210 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
246 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       4312042892
247 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       68557416
248 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       43355933

SMART Error Log Version: 1
Invalid Error Log index = 0x0d (T13/1321D rev 1c Section 8.41.6.8.2.2 gives valid range from 1 to 5)

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       819         -
# 2  Short offline       Completed without error       00%       780         -
# 3  Short offline       Completed without error       00%       756         -
# 4  Short offline       Completed without error       00%       732         -
# 5  Short offline       Completed without error       00%       708         -
# 6  Short offline       Completed without error       00%       683         -
# 7  Short offline       Completed without error       00%       658         -
# 8  Short offline       Completed without error       00%       633         -
# 9  Short offline       Completed without error       00%       607         -
#10  Short offline       Completed without error       00%       582         -
#11  Short offline       Completed without error       00%       557         -
#12  Extended offline    Completed without error       00%       538         -
#13  Extended offline    Completed without error       00%       534         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@FreeNAS02[~]#


Long test output of da2:
Code:
root@FreeNAS02[~]# smartctl -a /dev/da2
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     CT1000MX500SSD1
Serial Number:    1852E1E040A1
LU WWN Device Id: 5 00a075 1e1e040a1
Firmware Version: M3CR023
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Jun  4 15:46:04 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       836
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       2
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       17
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       42
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   065   056   000    Old_age   Always       -       35 (Min/Max 0/44)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Unknown_SSD_Attribute   0x0030   100   100   001    Old_age   Offline      -       0
206 Unknown_SSD_Attribute   0x000e   100   100   000    Old_age   Always       -       0
210 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
246 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       4312202524
247 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       68559898
248 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       33340769

SMART Error Log Version: 1
Invalid Error Log index = 0x0d (T13/1321D rev 1c Section 8.41.6.8.2.2 gives valid range from 1 to 5)

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       835         -
# 2  Short offline       Completed without error       00%       796         -
# 3  Short offline       Completed without error       00%       772         -
# 4  Short offline       Completed without error       00%       748         -
# 5  Short offline       Completed without error       00%       723         -
# 6  Short offline       Completed without error       00%       698         -
# 7  Short offline       Completed without error       00%       672         -
# 8  Short offline       Completed without error       00%       646         -
# 9  Short offline       Completed without error       00%       620         -
#10  Short offline       Completed without error       00%       594         -
#11  Short offline       Completed without error       00%       568         -
#12  Extended offline    Completed without error       00%       549         -
#13  Extended offline    Completed without error       00%       544         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@FreeNAS02[~]#
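
For anyone comparing the two dumps above: attribute 197 (Current_Pending_Sector) shows a raw value of 0 on da0 and 1 on da2. To pull the raw values out of saved smartctl output instead of eyeballing the table, something like the Python sketch below might work. It is only a sketch under assumptions: the function name `raw_attributes` and the regex are hypothetical, not part of FreeNAS or smartmontools, and it assumes the attribute table layout shown above.

```python
import re

# Matches one row of the "Vendor Specific SMART Attributes" table as printed
# above: ID, name, flag, value, worst, thresh, type, updated, when_failed,
# then the leading integer of the raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def raw_attributes(smartctl_text):
    """Return {attribute_id: (name, raw_value)} parsed from smartctl text."""
    attrs = {}
    for line in smartctl_text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), int(m.group(3)))
    return attrs

# Three rows copied from the da2 output above.
sample = """\
  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
"""

attrs = raw_attributes(sample)
for attr_id in (5, 197, 198):
    name, raw = attrs[attr_id]
    print(f"{attr_id:>3} {name}: {raw}")
```

The same function could be fed the full text of each drive's output; lines that are not attribute rows (headers, self-test log entries) simply do not match the pattern.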


Thanks in advance.
-Sam
 

SamM

Dabbler
Joined
May 29, 2017
Messages
39

Thanks Johnnie Black. That possibility escaped me.

Here I was ready to blame the D2700 enclosure since it's already doing weird things like kicking the negotiated speed to 3g when the card is reporting 6g to the enclosure and all the front panel mechanical drives (in the server itself, and connected to the same HBA) negotiate at 6g.
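
The speed mismatch is visible in the smartctl output from my first post, where both SSDs report "6.0 Gb/s (current: 3.0 Gb/s)". As a rough way to spot it across many drives, here is a minimal Python sketch. The function name `link_speeds` and the regex are hypothetical, and it assumes the "SATA Version is:" line has the same format as in those dumps.

```python
import re

# Matches the "SATA Version is:" line from smartctl's information section,
# capturing the drive's maximum speed and the currently negotiated speed.
SATA_LINE = re.compile(
    r"SATA Version is:.*?(\d+(?:\.\d+)?) Gb/s \(current: (\d+(?:\.\d+)?) Gb/s\)"
)

def link_speeds(smartctl_text):
    """Return (max_gbps, current_gbps), or None if no such line matches."""
    m = SATA_LINE.search(smartctl_text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))

# Line copied from the da0/da2 output.
line = "SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)"
max_gbps, current_gbps = link_speeds(line)
if current_gbps < max_gbps:
    print(f"link negotiated at {current_gbps} Gb/s, drive supports {max_gbps} Gb/s")
```

This only reports what the drive says it negotiated; it does not tell you whether the enclosure, the cable, or the HBA is the limiting side.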

Last night, as I was replacing the six 4TB drives with twelve 10TB drives, several other Crucial MX500's started showing these "1 Currently unreadable (pending) sectors" warnings, but only in the first server, not the second. For the one SSD FreeNAS complained about the most, I did a "wipe" from the web GUI, hoping that the sector just needed to be overwritten. I suppose the real solution is to wait for a firmware update.
 