Hard Drive Burn-In Testing - Discussion Thread

PhilZJ81

Explorer
Joined
Mar 29, 2016
Messages
99
So, I ran the command badblocks -b 4096 -ws /dev/ada[X]. It's been running 2 hours and they are 25% complete. That seems quite fast considering how long some of you guys say it took. Did I do something wrong? Another thing I find odd is that the drive temperatures have not changed much. For example, it was 48C this morning (idling), now it's 50C.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
been running 2 hours and they are 25% complete.
That's the first write. Then it has to read. Then it has to do it all again for three more patterns.

Large blocks do speed it up, though.
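
To put rough numbers on that: a default badblocks -w run writes and then reads back each of four patterns, so eight passes in total. Below is a minimal sketch of the arithmetic, assuming every pass runs at about the speed of the first write pass (real drives slow down toward the inner tracks, so treat the result as a ballpark only). The function name estimate_total_hours is made up for illustration.

```shell
#!/bin/sh
# Rough badblocks -w runtime estimate from early progress in the first pass.
# Assumption: all passes take about as long as the first write pass.
estimate_total_hours() {
    elapsed_hours=$1           # hours elapsed so far in the first write pass
    percent_done=$2            # progress percentage reported for that pass
    patterns=4                 # badblocks -w default patterns: 0xaa, 0x55, 0xff, 0x00
    passes=$((patterns * 2))   # each pattern is written, then read back
    echo $((elapsed_hours * 100 * passes / percent_done))
}

# 2 hours in, 25% of the first write pass done
estimate_total_hours 2 25
```

With the figures from the post above this prints 64, so "25% after 2 hours" is closer to two and a half days than to 8 hours.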
 

yourmate

Contributor
Joined
Apr 4, 2016
Messages
105
It is a good read indeed!
I am currently 20 hours into memtest on my 32GB of memory, so I am nowhere near doing this yet, but let me ask: has the OP deserted this thread, or is there some other reason the suggestions and knowledge here haven't been added to post #1?

I am thinking of combining all this later on, so I might just as well copy it in here when it's done...
 

PhilZJ81

Explorer
Joined
Mar 29, 2016
Messages
99
It is a good read indeed!
I am currently 20 hours into memtest on my 32GB of memory, so I am nowhere near doing this yet, but let me ask: has the OP deserted this thread, or is there some other reason the suggestions and knowledge here haven't been added to post #1?

I am thinking of combining all this later on, so I might just as well copy it in here when it's done...

Ya, I agree that it would be good to have the large drive badblocks command added to the first post.
The post looks really complete and current, and it would prevent a lot of helpless idiots like myself from posting, 16 pages in, about "why my command line failed".
 

yourmate

Contributor
Joined
Apr 4, 2016
Messages
105
Ya, I agree that it would be good to have the large drive badblocks command added to the first post.
The post looks really complete and current, and it would prevent a lot of helpless idiots like myself from posting, 16 pages in, about "why my command line failed".
Will see what I can do ;)
 

leoj3n

Dabbler
Joined
Jan 10, 2014
Messages
18
With 6TB (Seagate) drives I had to specify a block size for it to run:
Code:
badblocks -b 4096 -ws /dev/da0
 

Tekz

Dabbler
Joined
May 28, 2016
Messages
12
Heads up on WD Re SAS drives: on the disks I just received (model WD2001FYYG-01SL3 VR08), the writeback cache is turned off by default, which makes badblocks and other writes run extremely slowly. Talking under 8Mbps. If you're getting very poor speeds with WD Re SAS drives, run the following command and check to see if the writeback cache is turned on:
Code:
smartctl -x /dev/[disk id]


You can turn the writeback cache on by running the following command:
Code:
smartctl -s wcache,on /dev/[disk id]


After enabling the writeback cache on these drives, I saw speeds increase up to over 150Mbps when running badblocks and a similar increase when running dd.
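
The check, enable, and re-check steps above can be wrapped up as a small helper. This is only a sketch: it assumes your smartctl build supports the -g wcache query for the drive, and the function name enable_wcache, the SMARTCTL override variable, and the /dev/da0 device name are all made up for illustration.

```shell
#!/bin/sh
# Show the write cache state, enable the cache, then show the state again.
# SMARTCTL can be overridden (for example SMARTCTL=echo) for a dry run
# that only prints the arguments instead of touching a drive.
SMARTCTL=${SMARTCTL:-smartctl}

enable_wcache() {
    disk=$1
    "$SMARTCTL" -g wcache "$disk"      # report the current write cache setting
    "$SMARTCTL" -s wcache,on "$disk"   # turn the write cache on
    "$SMARTCTL" -g wcache "$disk"      # confirm the new setting
}

# Example (hypothetical device name, needs root):
# enable_wcache /dev/da0
```

Running it first with SMARTCTL=echo lets you see exactly which commands would be issued before pointing it at a real disk.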

Big thanks to the folks in IRC, especially ittti, khanman, and DrKK' for spending a few hours troubleshooting with me tonight to figure this out.
 

suhlhorn

Dabbler
Joined
Apr 20, 2016
Messages
23
tmux is very unintuitive at first. My recommendation to get 6 nicely distributed screens is to first carelessly open 6 of them. Then toggle between display options until you reach tiled (you absolutely need the man page for tmux).

It took me a while to figure it out from the man page, but M-5 (meta-5) is the magic key to automatically tile multiple panes in a single window.

Also: if you're using Terminal on a Mac to connect through ssh, there is an option in the Terminal settings to use the 'option' key for Meta.

HTH-
-stephen
 

S1RC

Dabbler
Joined
Jul 28, 2016
Messages
28
Is there any issue running the 4 patterns individually instead of consecutively?

I mistakenly cancelled during the second pattern, so I restarted using only that pattern and got no errors. If I manually do patterns three and four, is that sufficient?
 

u6f6o

Explorer
Joined
Jul 27, 2016
Messages
59
I am just installing my first FreeNAS box. I am wondering, when you guys run the hard disk burn-in, do you already have FreeNAS installed, or do you run it from some kind of live USB?
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
I am just installing my first FreeNAS box. I am wondering, when you guys run the hard disk burn-in, do you already have FreeNAS installed, or do you run it from some kind of live USB?

You are free to use a first FreeNAS installation like any other "live medium" that you are using during initial burn-in testing. After completing all burn-in tests it's your decision to use that first installation further on or to start from scratch.
 
Deleted47050

Guest
I am just installing my first FreeNAS box. I am wondering, when you guys run the hard disk burn-in, do you already have FreeNAS installed, or do you run it from some kind of live USB?

I do it from a FreeNAS box I use specifically for testing stuff, so I just use that since I have it readily available.
 

Wallybanger

Contributor
Joined
Apr 17, 2016
Messages
150
It might be easier to tell people to use Ctrl+B C to create new windows in tmux. Hitting Ctrl+B " over and over again will just keep cutting the panes in half and eventually they will run out of panes. Or hit ctrl+b " 4 times and then hit ctrl+b [space] to reorganize those panes and then do the same thing in a new window. I have 8 drives so I have 2 windows, each with 4 panes.
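
The same layout can also be built from the command line instead of with key presses. Here is a minimal sketch that only prints the tmux commands for one window with N tiled panes, so you can review them before running anything. The function name tmux_tiled_plan and the session name burnin are made up for illustration. Re-tiling after every split is what avoids running out of room for new panes.

```shell
#!/bin/sh
# Print the tmux commands for one window holding N tiled panes.
# Dry run only: nothing is executed by this function.
tmux_tiled_plan() {
    session=$1
    panes=$2
    echo "tmux new-session -d -s $session"            # first pane
    i=1
    while [ "$i" -lt "$panes" ]; do
        echo "tmux split-window -t $session"          # same as Ctrl+B "
        echo "tmux select-layout -t $session tiled"   # same as Ctrl+B M-5
        i=$((i + 1))
    done
    echo "tmux attach-session -t $session"
}

tmux_tiled_plan burnin 4
```

Once the printed commands look right, pipe the output to sh to create the session, then start badblocks by hand in each pane.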

For reference, I'm running badblocks on my 4tb drives and it's currently sitting at 25hrs and still going. I'm guessing it's going to be 48hrs before it finishes.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Took 2 or 3 days on my 8 x 4tb. Can't remember which ;)

It does 4 passes
 

Wallybanger

Contributor
Joined
Apr 17, 2016
Messages
150
I'm at 49hrs, still going.
 

Wallybanger

Contributor
Joined
Apr 17, 2016
Messages
150
So badblocks had finished doing its thing when I woke up this morning. When I went to exit tmux my system froze up. I had to restart it to get it to cooperate. Now that I've restarted I get this error in the console:

CRITICAL: Aug. 22, 2016, 2:26 p.m. - The volume DirtyData (ZFS) state is UNKNOWN:

In the volume manager it's saying Error Getting Usable Space, Status UNKNOWN

I'm guessing that the destructive badblocks test wrote over some zpool info and crashed the pool but that's just a guess.

Anyway, I'm going to run the SMART tests again and see what that does for me.

When running badblocks shouldn't there be a flag or something to output the bad blocks to a file so that the system knows not to use those blocks on the drives?
 
Joined
Apr 9, 2015
Messages
1,258
Badblocks is meant to be run before a pool is created. Sounds like you created a pool with the drives and then ran badblocks.

You will have to destroy the pool in the console and recreate it. You should be ok with the test being completed but it may not have written to the entire drive and it could have taken longer than needed due to ZFS doing things to the drives at the same time. Hopefully you did not try and put any data on the pool you wanted to keep as it is no longer in existence.
 

Wallybanger

Contributor
Joined
Apr 17, 2016
Messages
150
Badblocks is meant to be run before a pool is created. Sounds like you created a pool with the drives and then ran badblocks.

You will have to destroy the pool in the console and recreate it. You should be ok with the test being completed but it may not have written to the entire drive and it could have taken longer than needed due to ZFS doing things to the drives at the same time. Hopefully you did not try and put any data on the pool you wanted to keep as it is no longer in existence.
Yep, you are correct. No, I didn't put any data on the pool. The only reason a pool was there was because I followed the installation guide in the documentation. Had I known to wait, I would have. In any case, I unmounted the pool (and based on the prompts I got, the console deleted it...?). I'm going to rerun badblocks now, but having the pool mounted has generated a shit tonne of read errors in the SMART output.

Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   100   006    Pre-fail  Always       -       51493616
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       34074229
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       294
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       19
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   067   045    Old_age   Always       -       30 (Min/Max 19/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       14
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       18
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always       -       30 (0 19 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0


I'm hoping that once I rerun badblocks those numbers will change to more realistic values. They are Seagate drives though, and I know they index the errors differently....
 

VladTepes

Patron
Joined
May 18, 2016
Messages
287
I've not read the whole thread, but following the OP I have run the smartctl short and conveyance tests.
It says they will take x minutes, but they don't report back anything to say they have completed or otherwise. Is that normal?
 