Hard Drive Burn-In Testing - Discussion Thread

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,906
My burn-in was letting my new NAS run as a backup for the old one, using replication tasks. The latter also took care of data migration, BTW. This phase lasted about 3 months. I still got bitten after 8 months, but that's life.
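For reference, a TrueNAS replication task boils down to recursive ZFS snapshots streamed with send/receive over SSH; a minimal sketch with hypothetical pool and host names ("tank", "backup", "new-nas") - the GUI replication tasks are the proper way to set this up:
Code:
# Snapshot the source pool recursively, then stream it to the new NAS
zfs snapshot -r tank@migration
zfs send -R tank@migration | ssh new-nas zfs receive -F backup/tank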
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
I might be on the wrong route now.

I have several EXOS drives in a new box and configured the pool before I realized I had only run the SMART short, conveyance, and long tests but not badblocks. All SMART tests showed no errors and the SMART attributes seem fine. I'm running badblocks now, but basically all drives are showing something like (0/0/{some numbers} errors), which I understand as no errors during writing and reading but errors during comparison? It is not finished yet and there are no bad-block addresses in the output. It's a pool containing both a data vdev and a metadata vdev.

What I used is
Code:
badblocks -wsv -b 4096 /dev/sde
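For reference, here is what those flags do (per the badblocks man page):
Code:
# -w       destructive write-mode test: write a pattern, then read it back
# -s       show progress
# -v       verbose: report counts of write/read/comparison errors
# -b 4096  block size in bytes, matching the drive's 4K physical sectors
badblocks -wsv -b 4096 /dev/sde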

Since I have no data on the pool yet, maybe I should destroy the pool and then rerun badblocks?
 
Joined
Oct 22, 2019
Messages
3,589
Since I have no data on the pool yet, maybe I should destroy the pool and then rerun badblocks?
Running badblocks with the "-w" flag effectively destroys the data on the drive.

all drives are showing something like this (0/0/{some numbers} errors)
Is the third number very large? Did you check the connections/cables of all drives involved, including any HBA (if applicable)?
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
Running badblocks with the "-w" flag effectively destroys the data on the drive.


Is the third number very large? Did you check the connections/cables of all drives involved, including any HBA (if applicable)?

Thanks, I know it's a destructive test, but I wasn't sure whether running it on drives in a configured pool would make any difference.

There are 4 drives running now, though only about 10% of the reading and comparing has finished on each. 3 of them have fewer than 100 comparison errors; the last one has about 40K, but it hasn't changed for hours. All cables should be fine, but I will double-check.
 
Joined
Oct 22, 2019
Messages
3,589
Thanks, I know it's a destructive test, but I wasn't sure whether running it on drives in a configured pool would make any difference.
If you're using badblocks in a destructive mode, the drive(s) should not be members of any vdev/pool in the first place. Are you implying that your pool was "imported" and then you started to run a destructive pass of badblocks on the drives? :oops:


There are 4 drives running now, though only about 10% of the reading and comparing has finished on each. 3 of them have fewer than 100 comparison errors; the last one has about 40K, but it hasn't changed for hours. All cables should be fine, but I will double-check.
That definitely smells like a connection/cable issue. (Hopefully not a RAM and/or CPU issue.)

Check / re-do all points where something connects into something else, including an HBA seated in the motherboard (if applicable):
✔ All data cables involved
✔ Connections to the back of each drive
✔ Connections to the motherboard (or HBA)
✔ HBA to the motherboard (if applicable)
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
If you're using badblocks in a destructive mode, the drive(s) should not be members of any vdev/pool in the first place. Are you implying that your pool was "imported" and then you started to run a destructive pass of badblocks on the drives? :oops:



That definitely smells like a connection/cable issue. (Hopefully not a RAM and/or CPU issue.)

Check / re-do all points where something connects into something else, including an HBA seated in the motherboard (if applicable):
✔ All data cables involved
✔ Connections to the back of each drive
✔ Connections to the motherboard (or HBA)
✔ HBA to the motherboard (if applicable)
I think I'm still too new to TrueNAS and ZFS, so I didn't do things in the right order. Those are all new drives, and I configured them into a new pool.

Since I have no data on the drives at all, it should be safe to just Ctrl-C all the running sessions, remove the drives from the pool, shut down the machine and check the connections, then re-do badblocks?
 
Joined
Oct 22, 2019
Messages
3,589
Since I have no data on the drives at all, it should be safe to just Ctrl-C all the running sessions, remove the drives from the pool, shut down the machine and check the connections, then re-do badblocks?
Sounds like a plan. :smile:

Since I have no data on the drives at all
That wasn't my main concern. I was afraid you'd cause a system panic or random behavior on the TrueNAS server if you already created/imported a pool, and then started to use lower-level software to wipe the drives without using or informing the GUI and "middleware" (while the pool is still active/imported.)
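A minimal sketch of the safer order of operations, with a hypothetical pool name "tank"; exporting through the GUI (Storage -> Export/Disconnect) is preferred precisely because it informs the middleware, but the shell equivalent is roughly:
Code:
# Make ZFS release the disks before any destructive testing
zpool export tank
# Only then run the destructive pass on each member drive
badblocks -wsv -b 4096 /dev/sde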
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
Sounds like a plan. :smile:


That wasn't my main concern. I was afraid you'd cause a system panic or random behavior on the TrueNAS server if you already created/imported a pool, and then started to use lower-level software to wipe the drives without using or informing the GUI and "middleware" (while the pool is still active/imported.)
I see your point; yes, that would be a bad thing for TrueNAS itself if I'm not going through the GUI. Thanks for your help! I will go check the cables now XD.
 

billbillw

Dabbler
Joined
Jan 6, 2023
Messages
33
On badblocks, do you all typically let it run through multiple patterns? I thought mine was about done (going by the percentage), then it started another pattern. Running it as -b 4096 -ws /dev/sdX.

This is my 1st time using badblocks. No errors after "Testing with pattern 0xaa, reading and comparing". Now it's doing pattern 0x55.
 
Joined
Jan 27, 2020
Messages
577
On badblocks, do you all typically let it run through multiple patterns? I thought mine was about done (going by the percentage), then it started another pattern. Running it as -b 4096 -ws /dev/sdX.

This is my 1st time using badblocks. No errors after "Testing with pattern 0xaa, reading and comparing". Now it's doing pattern 0x55.
and it'll do 2 more patterns, for a total of writing 4 patterns to each block of each drive and comparing them after each pattern.
Depending on your drive size this can take days - plural :wink:. But that's the point of burning something in, right? Sustained full load over a span of time. If the disks can handle that, they will handle TrueNAS' zpools over the next few years easily.
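If the full four-pattern run takes too long, badblocks can be limited to a single pattern with -t, trading coverage for time; a sketch:
Code:
# The default write-mode patterns are 0xaa, 0x55, 0xff, 0x00; run just one:
badblocks -wsv -b 4096 -t 0xaa /dev/sdX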
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,945
18TB drives took well over a week here
 

billbillw

Dabbler
Joined
Jan 6, 2023
Messages
33
and it'll do 2 more patterns, for a total of writing 4 patterns to each block of each drive and comparing them after each pattern.
Depending on your drive size this can take days - plural :wink:. But that's the point of burning something in, right? Sustained full load over a span of time. If the disks can handle that, they will handle TrueNAS' zpools over the next few years easily.
Thanks for the info. Looks like it will be 3-4 days to complete, then. At 44 hrs now, and it is about 60% done with reading and comparing the 2nd test pattern. These are 8TB drives.
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
Thanks for the info. Looks like it will be 3-4 days to complete, then. At 44 hrs now, and it is about 60% done with reading and comparing the 2nd test pattern. These are 8TB drives.

My 16TB EXOS took about 7 days to complete all 4 patterns. As long as you get 0/0/0 output, everything is good to go.
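A common follow-up after a clean run is to re-read the SMART counters before trusting the drive in a pool; a sketch, with /dev/sdX standing in for each drive:
Code:
# Any growth in reallocated/pending/uncorrectable sectors is a bad sign
smartctl -A /dev/sdX | grep -iE 'realloc|pending|uncorrect'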
 

nasnice

Explorer
Joined
Jul 14, 2014
Messages
82
Trying to upgrade to TrueNAS Core 13.0-U5.3 and found that tmux does not want to work anymore... Has anything changed? Also cannot get access to the box through PuTTY...
What am I missing?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
PuTTY: possibly this, depending on which version you are coming from.


By default, TrueNAS 12 cannot initiate a replication to or from TrueNAS 13 due to an outdated SSH client library. Allowing replication to or from TrueNAS 13 to TrueNAS 12 requires allowing ssh-rsa algorithms. See OpenSSH 8.2 Release for security considerations. Log into the TrueNAS 13 system and go to Services->SSH. Add the SSH Auxiliary Parameter: PubkeyAcceptedAlgorithms +ssh-rsa.
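The client-side equivalent, if an ssh client needs to offer the legacy algorithms to a server (hostname and user are placeholders):
Code:
ssh -o PubkeyAcceptedAlgorithms=+ssh-rsa -o HostKeyAlgorithms=+ssh-rsa admin@truenas.local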
 

nasnice

Explorer
Joined
Jul 14, 2014
Messages
82
Just verified the system settings... SSH was turned off :oops: Now happily testing eleven 20TB drives... ETA? One week?
Stupid me...
 

EvanVanVan

Patron
Joined
Feb 1, 2014
Messages
211
I'm trying to run badblocks and I'm getting the following error. I've tried searching but have not been successful. It's an 18TB shucked drive, brand new (should be empty). A new temp install on TrueNAS SCALE; then, after the same error there, now trying on Core.

Code:
root@truenas[~]# badblocks –b 4096 –vws /dev/da0
badblocks: invalid first block - –vws


Any ideas?

Thank you
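Worth noting: the dashes in the quoted command are en-dashes ("–"), not ASCII hyphens, presumably from a copy-paste out of a formatted document. badblocks therefore parses "–b" and "–vws" as positional arguments rather than options, which is exactly what the "invalid first block" error indicates. Retyping the command with plain hyphens may be all that is needed:
Code:
badblocks -b 4096 -vws /dev/da0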
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,112
Beyond 16 TB, I'd use -b 8192. But I'm not sure if MAXBLOCKS is the issue here.
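The reasoning, assuming the stock e2fsprogs badblocks: block numbers are kept in 32-bit counters, so at most 2^32 blocks per device can be addressed. The arithmetic:
Code:
# 2^32 blocks * 4096 bytes = 16 TiB  (an 18 TB drive is ~16.4 TiB, too big)
# 2^32 blocks * 8192 bytes = 32 TiB  (plenty of headroom)
badblocks -b 8192 -wsv /dev/da0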
 

EvanVanVan

Patron
Joined
Feb 1, 2014
Messages
211
In the end I seem to have fixed it by reinstalling TrueNAS Core... it's running now... I am doing half and half like this, though: https://www.reddit.com/r/DataHoarder/comments/kejp08/wd_18tb_badblocks_error_value_too_large/


Edit: After spending 4 days on the first half, badblocks still failed on the next segment due to the large last-block number. Ended up installing the trial version of Unraid, and I'm doing the "preclear" process, which is supposedly a similar sort of burn-in procedure...
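For reference, the "half and half" approach relies on badblocks' optional positional arguments, last_block and first_block; a sketch for the first half of a drive (the block number is illustrative):
Code:
# Usage: badblocks [options] device [last_block [first_block]]
badblocks -b 4096 -wsv /dev/sdX 2147483647 0
# The second half of an 18 TB drive at 4 KiB blocks needs a last_block
# beyond the 32-bit limit, which is where it still fails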
 
Last edited:

thomas-hn

Explorer
Joined
Aug 2, 2020
Messages
82
Maybe a stupid question, but if
Code:
sysctl kern.geom.debugflags=0x10
is set before the badblocks test, do I have to revert the setting after the test? If so, what is the correct default value?
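For what it's worth, the FreeBSD default for that sysctl is 0, and the setting does not survive a reboot unless persisted in a config file; reverting by hand would be:
Code:
sysctl kern.geom.debugflags=0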
 