Hard Drive Burn-In Testing - Discussion Thread

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,906
My burn-in was letting my new NAS run as a backup for the old one, using replication tasks. The latter also took care of data migration, BTW. This phase lasted about 3 months. I still got bitten after 8 months, but that's life.
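For reference, a TrueNAS replication task boils down to recursive ZFS snapshots streamed with send/receive over SSH; a minimal sketch with hypothetical pool and host names ("tank", "backup", "new-nas") - the GUI replication tasks are the proper way to set this up:
Code:
# Snapshot the source pool recursively, then stream it to the new NAS
zfs snapshot -r tank@migration
zfs send -R tank@migration | ssh new-nas zfs receive -F backup/tank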
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
I might be on the wrong route now.

I have several EXOS drives in a new box and configured the pool before I realized I had only run the SMART short, conveyance, and long tests but not badblocks. All SMART tests showed no errors and the SMART attributes seem fine. I'm running badblocks now, but basically all drives are showing something like (0/0/{some numbers} errors), which I understand as no errors during writing and reading but errors during comparison? It is not finished yet and there are no bad-block addresses in the output. It's a pool containing both a data vdev and a metadata vdev.

What I used is
Code:
badblocks -wsv -b 4096 /dev/sde
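For reference, here is what those flags do (per the badblocks man page):
Code:
# -w       destructive write-mode test: write a pattern, then read it back
# -s       show progress
# -v       verbose: report counts of write/read/comparison errors
# -b 4096  block size in bytes, matching the drive's 4K physical sectors
badblocks -wsv -b 4096 /dev/sde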

Since I have no data on the pool yet, maybe I should destroy the pool and then rerun badblocks?
 
Joined
Oct 22, 2019
Messages
3,589
Since I have no data on the pool yet, maybe I should destroy the pool and then rerun badblocks?
Running badblocks with the "-w" flag effectively destroys the data on the drive.

all drives are showing something like this (0/0/{some numbers} errors)
Is the third number very large? Did you check the connections/cables of all drives involved, including any HBA (if applicable)?
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
Running badblocks with the "-w" flag effectively destroys the data on the drive.


Is the third number very large? Did you check the connections/cables of all drives involved, including any HBA (if applicable)?

Thanks, I know it's a destructive test, but I wasn't sure whether running it on drives in a configured pool would make any difference.

There are 4 drives running now, though only about 10% of the reading and comparing has finished on each. 3 of them have fewer than 100 comparison errors; the last one has about 40K, but it hasn't changed for hours. All cables should be fine, but I will double-check.
 
Joined
Oct 22, 2019
Messages
3,589
Thanks, I know it's a destructive test, but I wasn't sure whether running it on drives in a configured pool would make any difference.
If you're using badblocks in a destructive mode, the drive(s) should not be members of any vdev/pool in the first place. Are you implying that your pool was "imported" and then you started to run a destructive pass of badblocks on the drives? :oops:


There are 4 drives running now, though only about 10% of the reading and comparing has finished on each. 3 of them have fewer than 100 comparison errors; the last one has about 40K, but it hasn't changed for hours. All cables should be fine, but I will double-check.
That definitely smells like a connection/cable issue. (Hopefully not a RAM and/or CPU issue.)

Check / re-do all points where something connects into something else, including an HBA seated in the motherboard (if applicable):
✔ All data cables involved
✔ Connections to the back of each drive
✔ Connections to the motherboard (or HBA)
✔ HBA to the motherboard (if applicable)
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
If you're using badblocks in a destructive mode, the drive(s) should not be members of any vdev/pool in the first place. Are you implying that your pool was "imported" and then you started to run a destructive pass of badblocks on the drives? :oops:



That definitely smells like a connection/cable issue. (Hopefully not a RAM and/or CPU issue.)

Check / re-do all points where something connects into something else, including an HBA seated in the motherboard (if applicable):
✔ All data cables involved
✔ Connections to the back of each drive
✔ Connections to the motherboard (or HBA)
✔ HBA to the motherboard (if applicable)
I think I'm still too new to TrueNAS and ZFS, so I didn't do things in the right order. Those are all new drives, and I configured them into a new pool.

Since I have no data on the drives at all, it should be safe to just Ctrl-C all the running sessions, remove the drives from the pool, shut down the machine and check the connections, then re-do badblocks?
 
Joined
Oct 22, 2019
Messages
3,589
Since I have no data on the drives at all, it should be safe to just Ctrl-C all the running sessions, remove the drives from the pool, shut down the machine and check the connections, then re-do badblocks?
Sounds like a plan. :smile:

Since I have no data on the drives at all
That wasn't my main concern. I was afraid you'd cause a system panic or random behavior on the TrueNAS server if you already created/imported a pool, and then started to use lower-level software to wipe the drives without using or informing the GUI and "middleware" (while the pool is still active/imported.)
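A minimal sketch of the safer order of operations, with a hypothetical pool name "tank"; exporting through the GUI (Storage -> Export/Disconnect) is preferred precisely because it informs the middleware, but the shell equivalent is roughly:
Code:
# Make ZFS release the disks before any destructive testing
zpool export tank
# Only then run the destructive pass on each member drive
badblocks -wsv -b 4096 /dev/sde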
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
Sounds like a plan. :smile:


That wasn't my main concern. I was afraid you'd cause a system panic or random behavior on the TrueNAS server if you already created/imported a pool, and then started to use lower-level software to wipe the drives without using or informing the GUI and "middleware" (while the pool is still active/imported.)
I see your point; yes, that would be a bad thing for TrueNAS itself if I'm not going through the GUI. Thanks for your help! I will go check the cables now XD.
 

billbillw

Dabbler
Joined
Jan 6, 2023
Messages
33
On badblocks, do you all typically let it run through multiple patterns? I thought mine was about done (going by the percentage), then it started another pattern. Running it as -b 4096 -ws /dev/sdX.

This is my 1st time using badblocks. No errors after "Testing with pattern 0xaa, reading and comparing". Now it's doing pattern 0x55.
 
Joined
Jan 27, 2020
Messages
577
On badblocks, do you all typically let it run through multiple patterns? I thought mine was about done (going by the percentage), then it started another pattern. Running it as -b 4096 -ws /dev/sdX.

This is my 1st time using badblocks. No errors after "Testing with pattern 0xaa, reading and comparing". Now it's doing pattern 0x55.
and it'll do 2 more patterns, for a total of writing 4 patterns to each block of each drive and comparing them after each pattern.
Depending on your drive size this can take days - plural :wink:. But that's the point of burning something in, right? Sustained full load over a span of time. If the disks can handle that, they will handle TrueNAS' zpools over the next few years easily.
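If the full four-pattern run takes too long, badblocks can be limited to a single pattern with -t, trading coverage for time; a sketch:
Code:
# The default write-mode patterns are 0xaa, 0x55, 0xff, 0x00; run just one:
badblocks -wsv -b 4096 -t 0xaa /dev/sdX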
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,945
18TB drives took well over a week here
 

billbillw

Dabbler
Joined
Jan 6, 2023
Messages
33
and it'll do 2 more patterns, for a total of writing 4 patterns to each block of each drive and comparing them after each pattern.
Depending on your drive size this can take days - plural :wink:. But that's the point of burning something in, right? Sustained full load over a span of time. If the disks can handle that, they will handle TrueNAS' zpools over the next few years easily.
Thanks for the info. Looks like it will be 3-4 days to complete, then. At 44 hrs now, and it is about 60% done with reading and comparing the 2nd test pattern. These are 8TB drives.
 

Shigure

Dabbler
Joined
Sep 1, 2022
Messages
39
Thanks for the info. Looks like it will be 3-4 days to complete, then. At 44 hrs now, and it is about 60% done with reading and comparing the 2nd test pattern. These are 8TB drives.

My 16TB EXOS took about 7 days to complete all 4 patterns. As long as you get 0/0/0 output, everything is good to go.
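A common follow-up after a clean run is to re-read the SMART counters before trusting the drive in a pool; a sketch, with /dev/sdX standing in for each drive:
Code:
# Any growth in reallocated/pending/uncorrectable sectors is a bad sign
smartctl -A /dev/sdX | grep -iE 'realloc|pending|uncorrect'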
 

nasnice

Explorer
Joined
Jul 14, 2014
Messages
82
Trying to upgrade to TrueNAS Core 13.0-U5.3 and found that tmux does not want to work anymore... Has anything changed? Also cannot get access to the box through PuTTY...
What am I missing?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
PuTTY: possibly this, depending on which version you are coming from.


By default, TrueNAS 12 cannot initiate a replication to or from TrueNAS 13 due to an outdated SSH client library. Allowing replication to or from TrueNAS 13 to TrueNAS 12 requires allowing ssh-rsa algorithms. See OpenSSH 8.2 Release for security considerations. Log into the TrueNAS 13 system and go to Services->SSH. Add the SSH Auxiliary Parameter: PubkeyAcceptedAlgorithms +ssh-rsa.
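The client-side equivalent, if an ssh client needs to offer the legacy algorithms to a server (hostname and user are placeholders):
Code:
ssh -o PubkeyAcceptedAlgorithms=+ssh-rsa -o HostKeyAlgorithms=+ssh-rsa admin@truenas.local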
 

nasnice

Explorer
Joined
Jul 14, 2014
Messages
82
Just verified the system settings... SSH was turned off :oops: Now happily testing eleven 20TB drives... ETA? One week?
Stupid me...
 

EvanVanVan

Patron
Joined
Feb 1, 2014
Messages
211
I'm trying to run badblocks and I'm getting the following error. I've tried searching but have not been successful. It's an 18TB shucked drive, brand new (should be empty). A new temp install on TrueNAS SCALE; then, after the same error there, now trying on Core.

Code:
root@truenas[~]# badblocks –b 4096 –vws /dev/da0
badblocks: invalid first block - –vws


Any ideas?

Thank you
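Worth noting: the dashes in the quoted command are en-dashes ("–"), not ASCII hyphens, presumably from a copy-paste out of a formatted document. badblocks therefore parses "–b" and "–vws" as positional arguments rather than options, which is exactly what the "invalid first block" error indicates. Retyping the command with plain hyphens may be all that is needed:
Code:
badblocks -b 4096 -vws /dev/da0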
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,112
Beyond 16 TB, I'd use -b 8192. But I'm not sure if MAXBLOCKS is the issue here.
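The reasoning, assuming the stock e2fsprogs badblocks: block numbers are kept in 32-bit counters, so at most 2^32 blocks per device can be addressed. The arithmetic:
Code:
# 2^32 blocks * 4096 bytes = 16 TiB  (an 18 TB drive is ~16.4 TiB, too big)
# 2^32 blocks * 8192 bytes = 32 TiB  (plenty of headroom)
badblocks -b 8192 -wsv /dev/da0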
 

EvanVanVan

Patron
Joined
Feb 1, 2014
Messages
211
In the end I seem to have fixed it by reinstalling TrueNAS Core... it's running now... I am doing half and half like this, though: https://www.reddit.com/r/DataHoarder/comments/kejp08/wd_18tb_badblocks_error_value_too_large/


Edit: After spending 4 days on the first half, badblocks still failed on the next segment due to the large last-block number. Ended up installing the trial version of Unraid, and I'm doing the "preclear" process, which is supposedly a similar sort of burn-in procedure...
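For reference, the "half and half" approach relies on badblocks' optional positional arguments, last_block and first_block; a sketch for the first half of a drive (the block number is illustrative):
Code:
# Usage: badblocks [options] device [last_block [first_block]]
badblocks -b 4096 -wsv /dev/sdX 2147483647 0
# The second half of an 18 TB drive at 4 KiB blocks needs a last_block
# beyond the 32-bit limit, which is where it still fails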
 
Last edited:

thomas-hn

Explorer
Joined
Aug 2, 2020
Messages
82
Maybe a stupid question, but if
Code:
sysctl kern.geom.debugflags=0x10
is set before the badblocks test, do I have to revert the setting after the test? If so, what is the correct default value?
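For what it's worth, the FreeBSD default for that sysctl is 0, and the setting does not survive a reboot unless persisted in a config file; reverting by hand would be:
Code:
sysctl kern.geom.debugflags=0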
 