New TrueNAS, HDD burn-in, crash 'boot-pool' encountered I/O failure

DanP.

Dabbler
Joined
Apr 2, 2019
Messages
17
Hello,

I'm building my first FreeNAS / TrueNAS (12.0-U2.1) with some used and new components:
  • Supermicro X9SCM-F - used
  • Intel Xeon E3 1230v2 - used
  • 32 GB ECC 1600 RAM - used
  • 6 x 6TB WD Red Plus - used
  • SAS HBA Dell H310 - new
  • Seasonic 750W - new
  • Crucial BX500 - new
Thanks to the great resources in this forum, I have already completed the install (on the BX500 SSD), memtest, a CPU stress test, and long SMART tests. Everything looked fine so far.
While running the burn-in on all 6 HDDs with badblocks (over SSH, in tmux), I started seeing badblocks errors on all disks after about 18 hours (temps around 30°C): between 200 and 1000 per disk, e.g. (200/0/0 errors). Then I realized that the terminal was no longer responding. IPMI showed this error:
"WARNING: Pool 'boot-pool' has encountered an uncorrectable I/O failure and has been suspended."

[Attached screenshot: IPMI iKVM viewer showing the 'boot-pool' error]

After a restart the system booted normally.
SMART (long run) results from my boot device (BX500 SSD) show:
Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       76
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       17
171 Program_Fail_Count      0x0032   100   100   050    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   050    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   100   100   050    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0032   100   100   050    Old_age   Always       -       8
180 Unused_Reserve_NAND_Blk 0x0032   100   100   050    Old_age   Always       -       100
183 SATA_Interfac_Downshift 0x0032   100   100   050    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   050    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   050    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   067   058   050    Old_age   Always       -       33 (Min/Max 30/42)
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   050    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   100   100   001    Old_age   Offline      -       100
206 Write_Error_Rate        0x002e   100   100   050    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   050    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   050    Old_age   Always       -       4125690
247 Host_Program_Page_Count 0x0032   100   100   050    Old_age   Always       -       128927
248 FTL_Program_Page_Count  0x0032   100   100   050    Old_age   Always       -       0


It shows an unexpected power loss count (174) with a RAW value of 8.
  1. Does that mean my boot SSD has already had 8 power losses, with only 76 hours of total uptime?
  2. Could this also be the reason for the badblocks errors? Should I just start badblocks again?
  3. Where can I find the relevant logs? I checked /var/log/messages, but can't find the 'boot-pool' error.
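
To put the 174 value in context, here is a minimal Python sketch that reads two rows from the SMART table above and compares them. It assumes that attribute 174 counts power losses the drive saw without a clean shutdown beforehand, and that each such event also increments Power_Cycle_Count; neither assumption is confirmed for the BX500 here. The helper name `parse_raw_values` is made up for illustration.

```python
# Two rows copied from the SMART table above.
SMART_ROWS = """\
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       17
174 Unexpect_Power_Loss_Ct  0x0032   100   100   050    Old_age   Always       -       8
"""


def parse_raw_values(table):
    """Map ATTRIBUTE_NAME to the leading integer of its RAW_VALUE column."""
    values = {}
    for line in table.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            values[fields[1]] = int(fields[9])
    return values


raw = parse_raw_values(SMART_ROWS)
cycles = raw["Power_Cycle_Count"]
unclean = raw["Unexpect_Power_Loss_Ct"]

# Assumption: every unexpected power loss is also counted as a power cycle.
print(f"{unclean} of {cycles} power cycles were unexpected ({unclean / cycles:.0%})")
```

Under those assumptions, roughly half of the 17 power cycles so far would have been unclean. Hard power-offs during the build and testing could account for some of them.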
Thanks for any tips.
 
DanP.

Short update:
Since the server booted normally, I just started badblocks again on all 6 disks in parallel. It ran through without problems on all disks (0/0/0 errors).
A subsequent smartctl -t long also showed no errors.
All seems fine!
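
For reference, a parallel run like this can be sketched as a small Python dry run that only prints the tmux commands and executes nothing. The device names `/dev/da0` to `/dev/da5`, the session name `burnin`, and the badblocks options (`-w` destructive write test, `-s` progress, `-v` verbose, `-b 4096` block size) are assumptions for illustration, not necessarily what was used here. The `-w` mode wipes the disk, so check the device names before running any printed command.

```python
import shlex

SESSION = "burnin"                          # hypothetical tmux session name
DISKS = [f"/dev/da{i}" for i in range(6)]   # assumed device names for the 6 HDDs


def badblocks_cmd(disk):
    """Destructive write-mode test; the flags are an assumed, typical choice."""
    return ["badblocks", "-b", "4096", "-w", "-s", "-v", disk]


def tmux_cmds(disks, session):
    """One detached tmux session with one window per disk."""
    cmds = []
    for i, disk in enumerate(disks):
        name = disk.rsplit("/", 1)[-1]            # window name, e.g. "da0"
        test = shlex.join(badblocks_cmd(disk))    # command run inside the window
        if i == 0:
            cmds.append(["tmux", "new-session", "-d", "-s", session, "-n", name, test])
        else:
            cmds.append(["tmux", "new-window", "-t", f"{session}:", "-n", name, test])
    return cmds


cmds = tmux_cmds(DISKS, SESSION)
for cmd in cmds:
    # Dry run: print the commands for review instead of executing them.
    print(shlex.join(cmd))
```

Running each test in its own tmux window means a dropped SSH connection does not interrupt the tests; `tmux attach -t burnin` reconnects to the session.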

My guess / hope: my boot disk lost power (for whatever reason), and that led to all the badblocks errors.
I am going to install another BX500 and mirror my boot disk.
 