
Hard Drive Burn-In Testing - Discussion Thread

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
10% drop per drive when scaling out from 1 drive to 8 seems okay to me.
Why? @jgreco's solnet script yells if there's more than an 8% drop.

And it’d also depend on what part of the drive is being read.
This is reading the beginning of the drive, the same as the serial speed test does, so it's an apples-to-apples comparison.

Maybe your system just can’t process 1.6GB/s?
Why can't my system process 1.6GB/s when each drive has a 600MB/s link to the card, which in turn has a 4GB/s connection to the CPU, which in turn is only 10% loaded?

Optimized or not, I don't see where the bottleneck is. I care less about the actual throughput than I do about understanding the phenomenon.

(If we haven't sorted it out by then, I may indeed try 6 drives at a time once I'm done with my burn-in.)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Why? @jgreco's solnet script yells if there's more than an 8% drop.

This is reading the beginning of the drive, the same as the serial speed test does, so it's an apples-to-apples comparison.


Why can't my system process 1.6GB/s when each drive has a 600MB/s link to the card, which in turn has a 4GB/s connection to the CPU, which in turn is only 10% loaded?

Optimized or not, I don't see where the bottleneck is. I care less about the actual throughput than I do about understanding the phenomenon.

(If we haven't sorted it out by then, I may indeed try 6 drives at a time once I'm done with my burn-in.)

The bottleneck is likely to be the HBA itself. What you have in the HBA is a little PowerPC CPU. The design intent with the LSI2008 is that it is a low-end RAID controller, and that it handles RAID0, RAID1, and RAID5 operations with a little silicon help, but the CPU is doing the disk and management operations. When you crossflash to IT mode, you are bludgeoning out all the RAID-handling features, which increases the speed somewhat. However, the PPC CPU is still handling the host interface and then turning around and handling the SAS interface and SATA encapsulation too (IIRC, maybe not). Either way, it is important to remember that the HBA may not have infinite capacity. This is more of an issue because at the time the LSI2008 was released, the average disk speed was ~80-100MBytes/sec, but of course we're seeing much greater speeds. People have known for a long time that the LSI2008 cannot saturate a bunch of SSD's. You can probably find some discussions of that.

The detection is there in the script because it's just something that could indicate an issue. This was originally designed back around ~2000 for a massive 72-drive array that was being powered by a Dell PERC RAID controller that they had promised would be able to deliver a certain amount of performance ... that it didn't. Of course. The specific threshold back then was a LOT lower because the shared SCSI bus was definitely a limit, and eight of them running through a single controller was also a big limiting factor. We do a lot better these days with all the dedicated and point-to-point SAS and PCIe lanes.
 
Joined
May 10, 2017
Messages
838
I tested a SAS2008 with 8 SSDs some time ago and it topped out at around 2600MB/s combined bandwidth (8 x 320MB/s), so that shouldn't be the bottleneck.
 

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
The bottleneck is likely to be the HBA itself. What you have in the HBA is a little PowerPC CPU. The design intent with the LSI2008 is that it is a low-end RAID controller...the PPC CPU is still handling the host interface and then turning around and handling the SAS interface and SATA encapsulation too...it is important to remember that the HBA may not have infinite capacity. This is more of an issue because at the time the LSI2008 was released, the average disk speed was ~80-100MBytes/sec, but of course we're seeing much greater speeds. People have known for a long time that the LSI2008 cannot saturate a bunch of SSD's. You can probably find some discussions of that.
That explanation makes a lot of sense, thank you, though note that the 9201-16i is a non-RAID (IT) HBA and uses the 2116 controller rather than the 2008. Either way, I'm relieved that you don't think this is an alarming situation.

I'm curious about your thoughts on what @Johnnie Black observed, since he said he was able to push 2600MB/s over 8 SSDs on the SAS2008, and I'm only seeing 1440MB/s on the SAS2116, which is clocked higher (800 MHz vs. 533).

I'm also a little surprised that there would be this significant a performance limitation when its documentation says it supports 40Gb/s over PCIe 2.0 x8 and a full 600MB/s per SAS lane. The only limitation I see listed is "DDR-2 speeds up to 800 MT/s," but I'm not sure if or how that would apply here.

Maybe the limitation is due to SATA encapsulation overhead, since these are SATA drives rather than bare SAS?

The detection is there in the script because it's just something that could indicate an issue. This was originally designed back around ~2000 for a massive 72-drive array that was being powered by a Dell PERC RAID controller that they had promised would be able to deliver a certain amount of performance ... We do a lot better these days with all the dedicated and point-to-point SAS and PCIe lanes.
Thank you for the historical context. The detection is still useful, since it seems to have revealed an unexpected limitation in my system, benign though it may be. Until you mentioned your goal of testing the entire system end-to-end, I always wondered why you bothered with the serial vs. parallel analysis, since the drives themselves wouldn't care. Now I know firsthand.
 
Joined
May 10, 2017
Messages
838
I'm also a little surprised that there would be this significant a performance limitation when its documentation says it supports 40Gb/s over PCIe 2.0 x8 and a full 600MB/s per SAS lane. The only limitation I see listed is "DDR-2 speeds up to 800 MT/s," but I'm not sure if or how that would apply here.

The main bottleneck is the PCIe 2.0 bus: the theoretical max is 4000MB/s, but there's always considerable overhead on the PCIe bus, so I'd say between 2500 and 3000MB/s is the expected usable maximum for an x8 PCIe 2.0 link.

I also tested an LSI2308-based controller (the main difference is that it's PCIe 3.0, with a slightly higher clock) and could get 8 x 440MB/s, which was the maximum read speed of the SSDs used, so it didn't hit any controller bottleneck.

I also got 16 x 275MB/s with the 2308 and a SAS2 expander; the expander is the bottleneck there.

There is also SATA/SAS overhead, but it's a smaller percentage than, for example, PCIe: of the 600MB/s theoretical max on a SATA3/SAS2 link you'll get around 550MB/s usable, and around 275MB/s on a SATA2/SAS1 link.
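If you want to sanity-check those numbers, the rough back-of-the-envelope arithmetic (just raw line rates and a typical overhead guess, nothing measured here) looks something like this:

Code:
# PCIe 2.0: 5 GT/s per lane with 8b/10b encoding -> ~500 MB/s per lane raw
# an x8 link: 8 * 500 = 4000 MB/s raw; figure ~65-75% usable after protocol overhead
echo "PCIe 2.0 x8 usable estimate: $((8 * 500 * 70 / 100)) MB/s"
# SATA3/SAS2: 6 Gb/s with 8b/10b -> 600 MB/s raw, ~550 MB/s usable per link
echo "8 drives at 550 MB/s each would want: $((8 * 550)) MB/s"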
 
Last edited:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
That explanation makes a lot of sense, thank you, though note that the 9201-16i is a non-RAID (IT) HBA and uses the 2116 controller rather than the 2008. Either way, I'm relieved that you don't think this is an alarming situation.

I didn't say that. I'm just saying there's a potentially plausible area of investigation. As for the 2116 vs the 2008, I believe the overall designs are similar, though the IR-capable chipsets appear to have some silicon offload for RAID5 operations. It is still a situation where the driver on the host system is coordinating with the HBA CPU. As these things move through their lifecycles into "older hardware" territory, I would not be shocked to find that fixes to support modern issues are actually also slowing things down, since the time LSI would have been doing performance tweaking is during development and benchmarking, not aftermarket support. The LSI team that developed the thing might be at least partially gone, as LSI was acquired by Avago, which was acquired by Broadcom, so, well, who knows. Current patching and updates may render the device slower than it used to be.

I'm curious about your thoughts on what @Johnnie Black observed, since he said he was able to push 2600MB/s over 8 SSDs on the SAS2008, and I'm only seeing 1440MB/s on the SAS2116, which is clocked higher (800 MHz vs. 533).

Investigate and experiment.

I'm also a little surprised that there would be this significant a performance limitation when its documentation says it supports 40Gb/s over PCIe 2.0 x8 and a full 600MB/s per SAS lane. The only limitation I see listed is "DDR-2 speeds up to 800 MT/s," but I'm not sure if or how that would apply here.

I'll bet you $20 that you can get full speeds on *a* single SAS lane. Put one drive on the HBA. It works! The mistake you (and a Dell sales engineer 18 years ago) are probably making is to extrapolate this out to multiple lanes. Pushing traffic to a single device without other contention is pretty easy. But these things are fundamentally miniature embedded computer systems, and at the low end (and these ARE low-end RAID controllers) there may not only NOT be an effort to optimize performance, there may in fact be a disincentive to make them perform as well as the high-end controllers.

Maybe the limitation is due to SATA encapsulation overhead, since these are SATA drives rather than bare SAS?

Testing variable numbers of devices may give you clues.
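Something along these lines (a rough sketch; it assumes the disks are da0 through da7 and just reads the first 4 GiB of each) would show where the aggregate stops scaling:

Code:
#!/bin/sh
# time parallel reads of the first 4 GiB from N drives and see where scaling flattens out
for n in 1 2 4 8; do
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        dd if=/dev/da$i of=/dev/null bs=1m count=4096 2>/dev/null &
        i=$((i + 1))
    done
    wait
    end=$(date +%s)
    echo "$n drive(s): about $((n * 4096 / (end - start))) MB/s aggregate"
done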

Thank you for the historical context. The detection is still useful, since it seems to have revealed an unexpected limitation in my system, benign though it may be. Until you mentioned your goal of testing the entire system end-to-end, I always wondered why you bothered with the serial vs. parallel analysis, since the drives themselves wouldn't care. Now I know firsthand.

Yeah. In general, things have limits, and it doesn't freak me out when things hit limits. I have hypervisors with 4x10GbE, and they will never be able to hit anywhere near that on a production workload. This doesn't freak me out because the intent was to avoid link saturation, which was something that I do see happen with 4x1GbE. The trick is to understand if things are operating as well as they reasonably can operate. I like for tools to point out where the sharp edges are if it is reasonable to do so. I need to make sure that my expectation of a system doesn't exceed what it is capable of.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,358
Repeating:
Try with six drives?

Also, what motherboard and which slot? It’s possible to have severely constrained PCIe bandwidth via DMI contention on modern consumer Intel boards.
 
Joined
May 10, 2017
Messages
838
Also, what motherboard and which slot? It’s possible to have severely constrained PCIe bandwidth via DMI contention on modern consumer Intel boards.

That's a good point; besides the DMI issue, many boards have, for example, x8 slots that are only x4 electrically.

Other bottlenecks are less obvious. For example, the Intel onboard SATA controller on pre-Skylake boards was limited by the DMI 2.0 link: of the theoretical 2GB/s max you could get around 1.6GB/s. From Skylake on, with DMI 3.0, they should in theory have a lot more bandwidth, at least about 3GB/s of the theoretical 4GB/s max, but in practice they have some other limit that won't let them go above 2GB/s. So say you have an X11SSM or similar board with 8 onboard ports: if they are all in use, you won't get above 250MB/s per port, which is less than SATA2 speeds.
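(The per-port number is just the shared link divided across the busy ports, e.g.:)

Code:
# ~2000 MB/s of practical DMI bandwidth shared by 8 busy onboard ports
echo "$((2000 / 8)) MB/s per port"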
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
I've been using badblocks -b 4096 -ws /dev/adaX to test 12 SAS 4TB disks. After 85 hours it just finished the first pass, which means the whole test would take ~14 days. This seems excessive to me; the calculated throughput per disk is ~15MB/s. Anyone got any idea on what might be happening?
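(That per-disk figure is roughly the disk capacity divided by the time the first pattern took:)

Code:
# badblocks -ws runs 4 patterns, each a full write followed by a read-back compare
# 4 TB covered in ~85 hours works out to roughly:
echo "$((4 * 1000 * 1000 / (85 * 3600))) MB/s per disk"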

My hardware:
Dell R620
Dual E5 2690 v2
128GB RAM
LSI 9207-8e
NetAPP DS4243 24Bay disk shelf, swapped in a HB-SBB2-E601-COMP IO module
12 4TB HGST NL SAS drives

BTW, before badblocks I made a test pool consisting of 2 x 6-disk raidz2 vdevs. Server-side copy on a Samba share was ~600MB/s, if that means anything.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I've been using badblocks -b 4096 -ws /dev/adaX to test 12 SAS 4TB disks. After 85 hours it just finished the first pass, which means the whole test would take ~14 days. This seems excessive to me; the calculated throughput per disk is ~15MB/s. Anyone got any idea on what might be happening?
Something isn't working the way it should. It should be about six to ten times faster than that. I usually get around 100MB/s on my disks during testing although it does fluctuate depending on the part of the disk being accessed.
NetAPP DS4243 24Bay disk shelf, swapped in a HB-SBB2-E601-COMP IO module
Are you sure this is fully supported?
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
Something isn't working the way it should. It should be about six to ten times faster than that. I usually get around 100MB/s on my disks during testing although it does fluctuate depending on the part of the disk being accessed.

Are you sure this is fully supported?
No, it's not; NetApp only supports it for use with their own systems and software, not surprisingly. Another hack I know of is to use an SFF-8088 to QSFP cable, which is more expensive.

That being said, this setup reportedly works for at least some people, and I haven't observed anything abnormal other than this. Some rough read/write tests on a test pool also gave reasonable results. I do get
Code:
set_o_direct: Inappropriate ioctl for device

when running badblocks, but it seems to be a FreeBSD problem.

I guess I will just wait for it to complete; 14 days is still tolerable. Unless this would affect the accuracy of badblocks? Knocking on wood that it won't be cut off by a power outage or something. After this I might need to do a dd test on every disk to make sure things went well.
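(For that dd spot check, something simple like this per disk, adjusting the device name to match, is probably enough:)

Code:
# sequential read of the first ~10 GiB; dd prints the transfer rate when it finishes
dd if=/dev/da0 of=/dev/null bs=1m count=10240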
 

SuF1X

Dabbler
Joined
Sep 19, 2018
Messages
35
Hey all! Hope you can help me. I am doing a burn-in and followed the guide to the letter. One thing I did do is:

sysctl kern.geom.debugflags=0x10 twice

I am getting this:


Code:


root@freenas:~ # badblocks -b 4096 -ns /dev/ada0
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: set_o_direct: Inappropriate ioctl for device
^B^B^B^R0.01% done, 0:12 elapsed. (0/0/0 errors)
 74.59% done, 23:32:25 elapsed. (0/0/0 errors)
────────────────────────────────────────────────────────────────────────────────
Testing with random pattern: set_o_direct: Inappropriate ioctl for device
 76.86% done, 23:32:02 elapsed. (0/0/0 errors)
────────────────────────────────────────────────────────────────────────────────
Testing with random pattern: set_o_direct: Inappropriate ioctl for device
 79.90% done, 23:31:52 elapsed. (0/0/0 errors)
────────────────────────────────────────────────────────────────────────────────
  0.01% ^R0.01% done, 0:11 elapsed. (0/0/0 errors)
 81.60% done, 23:31:42 elapsed. (0/0/0 errors)
────────────────────────────────────────────────────────────────────────────────
  6.26%   6.27% done, 1:33:35 elapsed. (0/0/0 errors)
tmux ls.27% done, 1:33:37 elapsed. (0/0/0 errors)
 79.08% done, 23:30:32 elapsed. (0/0/0 errors)
[0] 0:badblocks*						



Does it seem OK?
 

2nd-in-charge

Explorer
Joined
Jan 10, 2017
Messages
94
I've been using badblocks -b 4096 -ws /dev/adaX to test 12 SAS 4TB disks. After 85 hours it just finished the first pass, which means the whole test would take ~14 days. This seems excessive to me; the calculated throughput per disk is ~15MB/s. Anyone got any idea on what might be happening?

My hardware:
Dell R620
Dual E5 2690 v2
128GB RAM
LSI 9207-8e
NetAPP DS4243 24Bay disk shelf, swapped in a HB-SBB2-E601-COMP IO module
12 4TB HGST NL SAS drives

I'm getting similar speeds with a very similar setup. My h/w is:
IBM x3550M4
2x E5-2650
192Gb RAM
LSI 9201-16e
Xyraltex HB-1235
12x 3Tb Seagate ES.2 SAS drives

After 44 hours it has completed only 75% of the 0xaa test pattern.
Also getting the "set_o_direct: Inappropriate ioctl for device" error when starting badblocks.

Perhaps badblocks doesn't work properly via SAS expanders?
 

twf85

Dabbler
Joined
Apr 18, 2019
Messages
14
First of all, hello and thank you! Before finding this guide, my "readiness" testing consisted of the options available in SeaTools. I've never felt fully confident in a drive after those tests, but this concept should eliminate any doubt I might have when deploying a new drive.

If you don't mind, I noticed a few things I wanted to share:
  1. There's no mention of opening an SSH connection prior to the start of commands. For the guide to be "noob-friendly", you may want to include some direction for people who are unfamiliar with SSH (even something as simple as a few words and a link to a guide that explains how to connect to a FreeNAS box via SSH would suffice).
  2. As a noob, I wasn't aware that it would be difficult to add su permissions to new user accounts. You may want to include a tidbit about using the "root" account to issue the commands, or that you'll at least need to issue a "su" command to type in the password for the "root" account to gain proper privileges (if a person cannot or will not use the "root" account).
  3. The instructions to split the session are somewhat confusing. It might be easier to lead with something like, "Press CTRL+B and then SHIFT+', typing each key combination separately". For some reason, the first portion of the command put me into a "follow-along exactly" mindset. I gave up trying after a while.
  4. I could not get "tmux attach" to work as expected (the response was something like, "Careful nesting. Use $TMUX to force"). After failing the split-session command too many times, my shell was disconnected. When "tmux attach" didn't display the test progress like I expected, I issued a new "badblocks" test on the first drive. I did this because I thought the first test wasn't currently running. After 10 minutes or so, the first drive was lagging behind the others substantially, so I checked the Reports. The first drive was running around 2000k Bytes/s, effectively double that of the other drives, leading me to believe there were two "badblocks" tests running concurrently. I know the guide said I couldn't do that, so perhaps I interpreted the Bytes/s wrong. Either way, I rebooted the system to prevent unnecessary stress to the first drive.
I'm currently testing 4x4TB WD Reds, ~12h 4m into the first pass, with about 68.7% completion in the "read and compare" phase. I, too, received the "set_o_direct: Inappropriate ioctl for device" error. This system is contained within a Norco RPC-4224 case, so it has 6 SAS expanders for the hot-swap bays, but only three of those are currently connected via reverse-breakout cables to SATA ports (on a tight budget, building as I go... First LSI card arrives tomorrow).

The drives were never added to a pool or formatted before conducting these tests, so I wonder if that could be why I'm seeing the error? If not, should I be worried at this point? I also skipped the portion of the instructions that modified the system's settings, as it didn't seem like it was a necessary step.

I can't be certain, but I do feel good about the progress so far. My system seems like it is chugging through at the expected rate, but if I won't be able to trust the results, it'll all be for naught... I'm hoping the error is a red herring.

By my calculation, the full suite of tests should take roughly 81 hours to complete. Does that sound about right?
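(The math behind that, assuming an average of roughly 110MB/s across the whole platter: four patterns, each a full write plus a full read-compare, is 8 x 4TB of I/O per disk.)

Code:
# 4 patterns x (write + read) x 4 TB = 32 TB of I/O per disk
# at an assumed ~110 MB/s average:
echo "about $((32 * 1000 * 1000 / 110 / 3600)) hours"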

System info:

Norco RPC-4224
ASRock EPC612D8 SSI
Intel E5-1650V3
Samsung 2133Mhz ECC 32GB x 4 (128GB)
Samsung 970 Evo Plus 250GB x 2 (FreeNAS OS)


Pool 1: 6 x 10TB (WD EasyStore = "White Label")
Pool 2*/3*: 6 x 4TB (WD Red)


Pools 2 and 3 will each get two of the new drives I am testing, and will be composed of disks I am transitioning into this build from a JBOD setup.

Again, thank you for the write-up!
 
Last edited:

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
As a noob
Not exactly a noob if you have a problem with using the root account?

Take a look at this guide. It has pictures. It was written for an older version of FreeNAS but it might help you and you are certainly welcome to share it with others:

Uncle Fester's Basic FreeNAS Configuration Guide
https://www.familybrown.org/dokuwiki/doku.php?id=fester:intro

Also, have a look at these scripts:

Github repository for FreeNAS scripts, including disk burnin
https://forums.freenas.org/index.ph...for-freenas-scripts-including-disk-burnin.28/
 

twf85

Dabbler
Joined
Apr 18, 2019
Messages
14
Not exactly a noob if you have a problem with using the root account?

I'm new to FreeNAS, but I do have some experience using SSH; however, I wouldn't give myself too much credit where Linux/Unix is concerned ;)

I'll have to read through the Github, thank you for the links :)
 

droeders

Contributor
Joined
Mar 21, 2016
Messages
179
  1. I could not get "tmux attach" to work as expected (the response was something like, "Careful nesting. Use $TMUX to force"). After failing the split-session command too many times, my shell was disconnected.

I haven't read the guide, but the "Careful nesting..." error is because you were trying to start or attach a tmux session from within another tmux session.
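Detach from the inner one first (CTRL+B, then D) or use a plain shell outside tmux; from there the usual pattern is:

Code:
tmux ls              # list running sessions
tmux attach -t 0     # attach to session 0 from outside tmux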
 

kikotte

Explorer
Joined
Oct 1, 2017
Messages
75
Last edited:

asmodeus

Explorer
Joined
Jul 26, 2016
Messages
70
Wanted to share a quick snippet to run badblocks on disks in parallel. It uses nohup instead of tmux to keep the interactive setup minimal. For progress, just watch the output files hdX-badblocks.err.

Code:
seq 0 7 | xargs -n1 -P8 -IX sh -c "nohup badblocks -b 4096 -ws /dev/daX > hdX-badblocks.out 2> hdX-badblocks.err < /dev/null &"
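To keep an eye on progress afterwards, tailing those files works:

Code:
tail -f hd?-badblocks.err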
 

NasKar

Guru
Joined
Jan 8, 2016
Messages
739
I'm planning on upgrading my pool and have 8 free bays in my chassis. I purchased 9 x 4TB WD Red HDDs: 8 for the pool and one as a spare. My plan was to run your script on the spare and then swap in the other 8 drives and run them in a tmux session:
Code:
disk-burnin.sh da8
disk-burnin.sh da9 ...
My system is a SuperMicro x9srl-f with a Xeon E5-2650 V2 with 64 GB RAM.
HBA: HP H220 (LSI 9207-8i) 6Gbps SAS PCIe 3.0, P20 IT mode
8 x 4 TB WD Reds ZFS Z2

I started the burn-in on Sept 4th at 10PM and it's still churning away. On Sunday there was a short power interruption, but my UPS should have prevented any issues. Is this normal for my system and a 4TB drive? Can I run 8 tmux sessions at the same time to burn in the remaining drives without causing a problem for my current FreeNAS system running at the same time?
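(Roughly what I have in mind, assuming disk-burnin.sh is on the PATH and the remaining drives come up as da0-da7; adjust the device names:)

Code:
# one tmux session, one window per drive
tmux new-session -d -s burnin
for d in da0 da1 da2 da3 da4 da5 da6 da7; do
    tmux new-window -t burnin -n "$d" "disk-burnin.sh $d"
done
tmux attach -t burnin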
 