Hard Drive Burn-In Testing - Discussion Thread

PhilZJ81 · Apr 29, 2016

So, I ran the command badblocks -b 4096 -ws /dev/ada[X]. It's been running 2 hours and they are 25% complete. That seems quite fast considering how long some of you guys say it took. Did I do something wrong? Another thing I find odd is that the drive temperatures have not changed much. For example, it was 48C this morning (idling); now it's 50C.

Ericloewe · Apr 29, 2016

PhilZJ81 said:
been running 2 hours and they are 25% complete.

That's the first write. Then it has to read. Then it has to do it all again for three more patterns.

Large blocks do speed it up, though.
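
As a rough back-of-the-envelope sketch of what that means for total run time: the function below extrapolates from progress in the first write pass. It assumes every write and read pass takes about as long as the first write, which is a simplification and not something measured in this thread, and the function name is made up for illustration.

```shell
#!/bin/sh
# Rough badblocks -w run-time estimate from progress in the first write pass.
# Assumption: every write and read pass takes about as long as the first write.
estimate_total_hours() {
    elapsed_hours=$1            # hours elapsed so far in the first write pass
    percent_done=$2             # percent of that pass completed
    patterns=4                  # badblocks -w cycles through four test patterns
    passes=$((patterns * 2))    # each pattern is written, then read back
    echo $(( elapsed_hours * 100 * passes / percent_done ))
}

estimate_total_hours 2 25   # prints 64
```

So 25% after 2 hours works out to roughly 64 hours overall under those assumptions, not 8.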

yourmate · Apr 29, 2016

It is a good read indeed!
I am currently 20 hours into my 32GB memory's memtest, so I am nowhere near doing this yet, but let me ask: has the OP deserted this thread, or is there some other reason that all the suggestions/knowledge haven't been put into post #1?

I am thinking of combining all this later on, so I might just as well copy it in here when it's done...

PhilZJ81 · Apr 30, 2016

yourmate said:
It is a good read indeed!
I am currently 20 hours into my 32GB memory's memtest, so I am nowhere near doing this yet, but let me ask: has the OP deserted this thread, or is there some other reason that all the suggestions/knowledge haven't been put into post #1?

I am thinking of combining all this later on, so I might just as well copy it in here when it's done...

Ya, I agree that it would be good to have the large-drive badblocks command added to the first post.
The post looks really complete and current, and it would prevent a lot of helpless idiots like myself from posting, 16 pages in, about "why my command line failed".

yourmate · Apr 30, 2016

PhilZJ81 said:
Ya, I agree that it would be good to have the large-drive badblocks command added to the first post.
The post looks really complete and current, and it would prevent a lot of helpless idiots like myself from posting, 16 pages in, about "why my command line failed".

Will see what I can do ;)

leoj3n · May 8, 2016

With 6TB (Seagate) drives I had to specify a block size for it to run:

Code:

badblocks -b 4096 -ws /dev/da0
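
For anyone wondering why the flag matters: as far as I know, badblocks tracks block numbers as 32-bit values, so with its default 1024-byte blocks a drive larger than about 4 TiB overflows the count. The sketch below only does that arithmetic. The 2^32 limit is my assumption about badblocks internals, and the function name is made up for illustration.

```shell
#!/bin/sh
# Does a drive of the given size fit within a 32-bit block count?
# Assumption: badblocks keeps block numbers in 32 bits (about 2^32 blocks).
fits_32bit() {
    size_bytes=$1
    block_size=$2
    blocks=$(( size_bytes / block_size ))
    if [ $(( blocks < 4294967296 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

fits_32bit 6000000000000 1024   # prints no
fits_32bit 6000000000000 4096   # prints yes
```

With 4096-byte blocks the same reasoning gives a ceiling of roughly 16 TiB.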

qwertymodo · May 10, 2016

The original post has been updated with the blocksize flag for badblocks.

Tekz · Jun 10, 2016

Heads up on WD Re SAS drives: on the disks I just received (model WD2001FYYG-01SL3 VR08), the writeback cache is turned off by default, which makes badblocks and other writes run extremely slowly. Talking under 8Mbps. If you're getting very poor speeds with WD Re SAS drives, run the following command and check whether the writeback cache is turned on:

Code:

smartctl -x /dev/[disk id]

You can turn the writeback cache on by running the following command:

Code:

smartctl -s wcache,on /dev/[disk id]

After enabling the writeback cache on these drives, I saw speeds increase up to over 150Mbps when running badblocks and a similar increase when running dd.

Big thanks to the folks in IRC, especially ittti, khanman, and DrKK' for spending a few hours troubleshooting with me tonight to figure this out.
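
If you have several drives to check, a small filter can pull the cache state out of the smartctl output. This is only a sketch: it assumes the relevant line reads "Write cache is:" (ATA) or "Writeback Cache is:" (SCSI/SAS) followed by Enabled or Disabled, and the function name and the da0 device are placeholders.

```shell
#!/bin/sh
# Read smartctl output on stdin and report the write cache state.
# Assumption: the relevant line contains "Write cache is:" (ATA) or
# "Writeback Cache is:" (SCSI/SAS) followed by Enabled or Disabled.
wcache_state() {
    grep -i -E 'write(back)? cache is:' | awk '{ print tolower($NF) }'
}

# Hypothetical usage on a live system (da0 is a placeholder device):
#   smartctl -g wcache /dev/da0 | wcache_state
printf 'Writeback Cache is:   Disabled\n' | wcache_state   # prints disabled
```

Here smartctl -g wcache is a narrower alternative to -x that prints only the cache setting.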

suhlhorn · Jul 6, 2016

Ericloewe said:
tmux is very unintuitive at first. My recommendation to get 6 nicely distributed screens is to first carelessly open 6 of them. Then toggle between display options until you reach tiled (you absolutely need the man page for tmux).

It took me a while to figure it out from the man page, but M-5 (meta-5) is the magic key to automatically tile multiple panes in a single window.

Also: if you're using Terminal on a Mac to connect through ssh, there is an option in the Terminal settings to use the 'option' key for Meta.

HTH-
-stephen

S1RC · Aug 6, 2016

Is there any issue running the 4 patterns individually instead of consecutively?

I mistakenly cancelled during the second pattern, so I restarted using only that pattern and got no errors. If I manually do patterns three and four, is that sufficient?

u6f6o · Aug 11, 2016

I am just installing my first freenas box. I am wondering, when you guys execute the hard disk burn-in, do you have freenas already installed to execute the burn in or do you use some liveusb whatsoever to execute them?

MrToddsFriends · Aug 11, 2016

u6f6o said:
I am just installing my first freenas box. I am wondering, when you guys execute the hard disk burn-in, do you have freenas already installed to execute the burn in or do you use some liveusb whatsoever to execute them?

You are free to use a first FreeNAS installation like any other "live medium" that you are using during initial burn-in testing. After completing all burn-in tests it's your decision to use that first installation further on or to start from scratch.

Deleted47050 · Aug 11, 2016

u6f6o said:
I am just installing my first freenas box. I am wondering, when you guys execute the hard disk burn-in, do you have freenas already installed to execute the burn in or do you use some liveusb whatsoever to execute them?

I do it from a freenas box I use specifically for testing stuff, so I just use that since I have it readily available.

Wallybanger · Aug 20, 2016

It might be easier to tell people to use Ctrl+B C to create new windows in tmux. Hitting Ctrl+B " over and over again will just keep cutting the panes in half and eventually they will run out of panes. Or hit ctrl+b " 4 times and then hit ctrl+b [space] to reorganize those panes and then do the same thing in a new window. I have 8 drives so I have 2 windows, each with 4 panes.

For reference, I'm running badblocks on my 4tb drives and it's currently sitting at 25hrs and still going. I'm guessing it's going to be 48hrs before it finishes.
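
The pane juggling described above can also be scripted. The sketch below only prints the tmux commands for one tiled window with a pane per drive, so nothing runs until you decide to. The session name "burnin" and the device names are placeholders, and the badblocks line is typed into each pane without pressing Enter, since the test is destructive.

```shell
#!/bin/sh
# Print the tmux commands for one tiled window with a pane per drive.
# The session name "burnin" and the device names are placeholders.
tmux_burnin_commands() {
    session=burnin
    first=1
    for dev in "$@"; do
        if [ "$first" -eq 1 ]; then
            echo "tmux new-session -d -s $session"
            first=0
        else
            echo "tmux split-window -t $session"
            # Re-tile after each split so new panes never run out of room.
            echo "tmux select-layout -t $session tiled"
        fi
        # Types the command into the new pane but leaves Enter to you.
        echo "tmux send-keys -t $session 'badblocks -b 4096 -ws /dev/$dev'"
    done
}

tmux_burnin_commands da0 da1 da2 da3
```

Review the printed commands, run them, then attach with tmux attach -t burnin and confirm each pane by hand.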

Stux · Aug 21, 2016

Took 2 or 3 days on my 8 x 4tb. Can't remember which ;)

It does 4 passes

Wallybanger · Aug 21, 2016

I'm at 49hrs, still going.

Wallybanger · Aug 22, 2016

So badblocks had finished doing its thing when I woke up this morning. When I went to exit tmux, my system froze up. I had to restart it to get it to cooperate. Now that I've restarted, I get this error in the console:

CRITICAL: Aug. 22, 2016, 2:26 p.m. - The volume DirtyData (ZFS) state is UNKNOWN:

In the volume manager it's saying Error Getting Usable Space, Status UNKNOWN

I'm guessing that the destructive badblocks test wrote over some zpool info and crashed the pool but that's just a guess.

Anyway, I'm going to run the SMART tests again and see what that does for me.

When running badblocks shouldn't there be a flag or something to output the bad blocks to a file so that the system knows not to use those blocks on the drives?

nightshade00013 · Aug 22, 2016

Badblocks is meant to be run before a pool is created. Sounds like you created a pool with the drives and then ran badblocks.

You will have to destroy the pool in the console and recreate it. You should be ok with the test being completed but it may not have written to the entire drive and it could have taken longer than needed due to ZFS doing things to the drives at the same time. Hopefully you did not try and put any data on the pool you wanted to keep as it is no longer in existence.

Wallybanger · Aug 22, 2016

nightshade00013 said:
Badblocks is meant to be run before a pool is created. Sounds like you created a pool with the drives and then ran badblocks.

You will have to destroy the pool in the console and recreate it. You should be ok with the test being completed but it may not have written to the entire drive and it could have taken longer than needed due to ZFS doing things to the drives at the same time. Hopefully you did not try and put any data on the pool you wanted to keep as it is no longer in existence.

Yep, you are correct. No, I didn't put any data on the pool. The only reason a pool was there was because I followed the installation guide in the documentation. Had I known to wait, I would have. In any case, I unmounted the pool (and based on the prompts I got, the console deleted it...?). I'm going to rerun badblocks now, but having the pool mounted has generated a shit tonne of read errors in the SMART output.

Code:

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   100   006    Pre-fail  Always       -       51493616
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       34074229
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       294
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       19
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   067   045    Old_age   Always       -       30 (Min/Max 19/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       14
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       18
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always       -       30 (0 19 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

I'm hoping that once I rerun badblocks those numbers will change to more realistic values. They are seagate drives though and I know they index the errors differently....
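
On the Seagate point, a commonly cited interpretation is that these raw values are 48-bit numbers whose upper 16 bits count errors and whose lower 32 bits count total operations. That layout is community-documented rather than vendor-confirmed, so treat the sketch below as an assumption; the function name is made up for illustration.

```shell
#!/bin/sh
# Split a Seagate-style 48-bit raw value into its two commonly cited parts.
# Assumption (community-documented, not vendor-confirmed): the upper 16 bits
# count errors and the lower 32 bits count total operations.
decode_seagate_raw() {
    raw=$1
    echo "errors=$(( raw >> 32 )) operations=$(( raw & 4294967295 ))"
}

decode_seagate_raw 51493616   # Raw_Read_Error_Rate from the table above
decode_seagate_raw 34074229   # Seek_Error_Rate from the table above
```

Under that reading, both values in the table decode to zero errors, with the large number being an operation counter.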

VladTepes · Sep 6, 2016

I've not read the whole thread, but following the OP I have run the smartctl short and conveyance tests.
It says they will take x minutes, but they don't report back anything to say they have completed or otherwise. Is that normal?
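
For reference, SMART self-tests run in the background on the drive itself, and the results land in the drive's self-test log, which smartctl -l selftest prints. The filter below is a minimal sketch for picking out the newest entry. It assumes log entries start with "# 1", "# 2" and so on with "# 1" being the most recent; the sample line, function name, and ada0 device are placeholders.

```shell
#!/bin/sh
# Print the most recent entry from a smartctl self-test log read on stdin.
# Assumption: entries start with "# 1", "# 2", ... with "# 1" the newest.
latest_selftest() {
    grep -E '^# *1 ' | head -n 1
}

# Hypothetical usage on a live system (ada0 is a placeholder device):
#   smartctl -l selftest /dev/ada0 | latest_selftest
printf '# 1  Short offline  Completed without error  00%%  12  -\n' | latest_selftest
```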
