"boot-pool uncorrectable I/O failure" error & unknown SMART attributes

FlyingPersian

Patron
Joined
Jan 27, 2014
Messages
237
Hello,

I'm on a relatively fresh (2-3 weeks old) TrueNAS install on a 2.5" SSD, which is connected to the motherboard's internal USB connector via a SATA-to-USB adapter. Today, my NAS had been running for a few hours when I noticed I couldn't access it via the GUI. IPMI showed me that there was an error with the boot-pool. Shutting the server down via the console was not possible (see screenshot below). Connecting via SSH did not work either (no error, it just kept trying to connect):

Unbenannt.JPG

After a reset, I ran a few tests and some things seemed weird:

1. One of my drives is shown as ada3p2, the rest as gptid/xxx:

Code:
        NAME                                            STATE     READ WRITE CKSUM
        Data                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/111dd78e-103b-11e9-b74e-0cc47a406253  ONLINE       0     0     0
            gptid/f862c9ea-00b4-11ea-9daf-0cc47a406253  ONLINE       0     0     0
            gptid/e79d9e0e-f731-11e9-9ef4-0cc47a406253  ONLINE       0     0     0
            gptid/ed4cf2d0-e4eb-11e8-bded-0cc47a406253  ONLINE       0     0     0
            ada3p2                                      ONLINE       0     0     0
            gptid/b9c40f77-68f3-11e8-a08d-0cc47a406253  ONLINE       0     0     0

2. A quick SMART check on my boot-pool drive revealed one error on an unknown attribute:

Code:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       25
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       8
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       100
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       32
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       21
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       1
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       7000
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       100
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       8
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       43
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       100
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       43
242 Total_LBAs_Read         0x0030   100   100   050    Old_age   Offline      -       53
245 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
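
One way to read a table like the one above is to compare each attribute's normalized VALUE with its THRESH and to list the IDs that smartctl could not name. The following is only a minimal sketch, assuming the text comes from `smartctl -A` in the column layout shown; the function names `parse_smart_attributes` and `summarize` are hypothetical, and the sample rows are copied from the table above.

```python
import re

# One attribute row of `smartctl -A` output, in the column layout shown above.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.+?)\s*$"
)

# Three rows taken from the table above (hypothetical helper input).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       100
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       43
"""


def parse_smart_attributes(text):
    """Return one dict per attribute row found in `text`; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header lines and anything else that is not a row
        row = match.groupdict()
        for key in ("id", "value", "worst", "thresh"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


def summarize(rows):
    """Group attribute IDs into flagged, unnamed, and Pre-fail typed."""
    return {
        # Flagged: WHEN_FAILED is set, or the normalized value is at or below the threshold.
        "failing": [r["id"] for r in rows
                    if r["when_failed"] != "-" or r["value"] <= r["thresh"]],
        # Unnamed: smartctl printed a generic name for this ID.
        "unknown": [r["id"] for r in rows if r["name"] == "Unknown_Attribute"],
        # Pre-fail is the attribute's type, not a statement that it has failed.
        "prefail": [r["id"] for r in rows if r["type"] == "Pre-fail"],
    }
```

Calling `summarize(parse_smart_attributes(SAMPLE))` on these rows flags nothing as failing, since every VALUE is above its THRESH and WHEN_FAILED is `-`; attribute 161 appears only in the unnamed and Pre-fail lists.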

My guess is that it has something to do with the adapter. The SSD is quite new (25 hours of run time). Is there any way I can verify this without buying a new one? My motherboard (Supermicro X10SLM-F) has 6 SATA ports, which are all in use.
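
Regarding the first observation (one member listed as ada3p2 rather than by gptid), a pasted `zpool status` listing can be scanned for members that are not referenced by a gptid label. This is a rough sketch only; the helper name `non_gptid_members` is hypothetical and the sample is abbreviated from the output above.

```python
# Abbreviated copy of the pool listing above.
SAMPLE = """\
        NAME                                            STATE     READ WRITE CKSUM
        Data                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/111dd78e-103b-11e9-b74e-0cc47a406253  ONLINE       0     0     0
            ada3p2                                      ONLINE       0     0     0
            gptid/b9c40f77-68f3-11e8-a08d-0cc47a406253  ONLINE       0     0     0
"""


def non_gptid_members(status_text):
    """Return leaf device names in a `zpool status` table that are not gptid/ labels."""
    rows = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            in_table = False  # a blank line ends the config table
            continue
        if fields[0] == "NAME":
            in_table = True  # the table starts after the header row
            continue
        if in_table:
            indent = len(line) - len(line.lstrip())
            rows.append((indent, fields[0]))

    # A row is a leaf device when the next row is not indented deeper than it.
    leaves = []
    for i, (indent, name) in enumerate(rows):
        is_last = i + 1 == len(rows)
        if is_last or rows[i + 1][0] <= indent:
            leaves.append(name)
    return [name for name in leaves if not name.startswith("gptid/")]
```

For the sample, `non_gptid_members(SAMPLE)` returns only `ada3p2`. In the listing above that member is ONLINE with zero read, write, and checksum errors, so the different name is a labelling difference rather than a reported fault.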
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
What model of SSD? Certain ones (I believe those using a Silicon Motion controller, e.g. WD Green) have an issue with corrupting their own data when issued TRIM commands. The only solution is to disable TRIM, which is unfortunately a global setting.
 

FlyingPersian

Patron
Joined
Jan 27, 2014
Messages
237
Intenso High Performance internal SSD 120GB. The spec sheet says:

Code:
Properties: Advanced 3D-Nand technology with SLC data boost-cache*
            Low power consumption
            Shock-resistant (1500 G / 0.5 ms)
            Silent operation (0 dB)
Features: SMART Command Support
          TRIM Command Support


Edit: I can't find any information on the controller. For a different Intenso SSD, the listed controller is SandForce (SF 2281).

Edit2: It happened again, this time with a slightly different message beforehand.

Edit3: I swapped the cable for a spare one, and so far (uptime 45 minutes) I have had no errors.
 

Attachments

  • Unbenannt.JPG

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I saved the config yesterday. If I do a fresh install and upload the config, will it be fine, or will it most likely be corrupted as well?
Hard to say, but I assume you will be notified on config import if anything can't be parsed.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Hard to say, but I assume you will be notified on config import if anything can't be parsed.
It should be clear in zpool status -v following a scrub. If the config database is listed in the corrupt files, you're going to have problems. If it's not in the list, all good.
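
The check described above can be sketched roughly as follows. This is a minimal example that works on the text of `zpool status -v`; the function names are hypothetical, the sample output is made up for illustration, and `/data/freenas-v1.db` is assumed here to be the path of the config database.

```python
# Made-up `zpool status -v` excerpt, for illustration only.
SAMPLE = """\
  pool: boot-pool
 state: ONLINE
errors: Permanent errors have been detected in the following files:

        /data/freenas-v1.db
        /usr/local/lib/example.so
"""


def permanent_error_paths(status_text):
    """Return the entries listed under 'Permanent errors' in `zpool status -v` text."""
    paths = []
    in_errors = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            in_errors = False  # a new pool section ends the previous error list
            continue
        if stripped.startswith("errors:"):
            # The file list follows only when this line announces permanent errors.
            in_errors = "Permanent errors" in stripped
            continue
        if in_errors and stripped:
            paths.append(stripped)
    return paths


def config_db_affected(status_text, config_db="/data/freenas-v1.db"):
    """True if any listed entry ends with the assumed config database path."""
    return any(path.endswith(config_db)
               for path in permanent_error_paths(status_text))
```

With the sample text, `config_db_affected(SAMPLE)` is true; for output that ends in `errors: No known data errors`, the list is empty and the check is false.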

A fresh install will be problematic, as a lot of writes will happen before you can apply the TRIM setting. I saw a thread somewhere describing a workaround for that too: going into command mode for the install and setting it at the CLI first... Personally, I would just buy a $20 Kingston SSD and have an easier life.
 

robvandal999

Dabbler
Joined
Jan 14, 2018
Messages
23
I just installed TrueNAS 12.0 onto a 128 GB thumb drive and I am also running into this issue. It is a brand-new thumb drive, too.
 

Attachments

  • e4gv9kv82wd61.jpg

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
Possibly related: some other users have let us know that using USB 2.0 or USB 1.1 ports for the boot-pool works better than using USB 3.0.
 