Very slow reads with RAIDZ2 and 4x 3TB Drives

Status: Not open for further replies.

DanE
Cadet | Joined Dec 11, 2013 | Messages: 3
Hello,

I'm running FreeNAS 9.1.1 x64 with the following hardware:
Intel i3 4340
ASRock Z87 PRO4
16GB RAM
4x 3TB WD Red Drives

After I'd gotten everything installed and running, I thought I'd run a few benchmarks to see where things were at.

I found that I get fantastic write speeds:
Code:
# dd if=/dev/zero of=testfile bs=2048k count=25k
25600+0 records in
25600+0 records out
53687091200 bytes transferred in 260.237136 secs (206300654 bytes/sec)


(197 MB/s)
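
For anyone checking the math (MB/s here means MiB/s):
Code:
2048 KiB x 25600 records = 53,687,091,200 bytes (50 GiB)
53,687,091,200 bytes / 260.24 s ≈ 206,300,654 bytes/s ≈ 196.7 MiB/s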

But read speeds, oddly, are abysmal (I killed the test after about 13 minutes):
Code:
# dd of=/dev/null if=testfile bs=2048k
^C12910+0 records in
12910+0 records out
27074232320 bytes transferred in 783.682117 secs (34547467 bytes/sec)


(33 MB/s)

I'm at a loss on how to troubleshoot this much further. I've tried running dd against the raw drives (/dev/ada1, etc.) both singly and simultaneously (as per this thread: http://forums.freenas.org/threads/write-performance-issues-mid-level-zfs-setup.13372/), and any time I read data directly from the drives I see throughput in the 140-150 MB/s range per drive.
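
The raw-drive reads were along these lines (device names and counts are illustrative, not the exact commands I ran):
Code:
# one drive at a time, read-only
dd if=/dev/ada1 of=/dev/null bs=2048k count=5k
# all four at once, to see if the controller caps total throughput
dd if=/dev/ada1 of=/dev/null bs=2048k count=5k &
dd if=/dev/ada2 of=/dev/null bs=2048k count=5k &
dd if=/dev/ada3 of=/dev/null bs=2048k count=5k &
dd if=/dev/ada4 of=/dev/null bs=2048k count=5k &
wait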

When I run gstat during the ZFS write tests, all 4 drives are almost constantly at 100% utilization. During read tests, generally 1 or 2 drives will be pegged at 100% and the other 2 or 3 will sit below 20%. I've noticed that ada2 and ada3 tend to be the ones at 100%, but it varies; they're just at 100% most often.
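
For reference, I was watching the disks with something like this (the filter is optional and just narrows the output to the ada devices):
Code:
# run the dd test in one shell, and watch per-disk load in another
gstat
# or, to show just the ada devices
gstat -f ada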

Any suggestions on what to check next? Am I misreading these numbers somehow?

Any help would be greatly appreciated.

Thanks,
 

cyberjock
Inactive Account | Joined Mar 25, 2012 | Messages: 19,526
That is... indeed weird.

Do you have a failing disk? A failing disk will normally kill performance badly, much like what you're seeing.

If you want to do a TeamViewer session with me, hop on IRC and I can take a quick look.
 

JohnK
Patron | Joined Nov 7, 2013 | Messages: 256
Hello,

DanE said:
Code:
# dd of=/dev/null if=testfile bs=2048k
^C12910+0 records in
12910+0 records out
27074232320 bytes transferred in 783.682117 secs (34547467 bytes/sec)

(33 MB/s)
I hate asking this, but are you sure about your "dd" command? Shouldn't it read
dd if=testfile of=/dev/null bs=2048
or something?
 

titan_rw
Guru | Joined Sep 1, 2012 | Messages: 586
Without a suffix, bs=2048 would mean 2048 bytes, so yes, 2048k is correct here. bs=2m would also work.

Also, this sounds a lot like a problem I helped troubleshoot on another box a while back: writes were fine but RAIDZ reads were terrible, and setting the vfs.zfs.vdev.min_pending and vfs.zfs.vdev.max_pending tunables to 1 turned out to be the fix there. Might be worth a try.
 

DanE
Cadet | Joined Dec 11, 2013 | Messages: 3
cyberjock: I suspected bad hardware as well, so when I got home yesterday I ran a long SMART test on all 4 drives, and they all passed. I also removed my RAIDZ2 volume and created 4 single-disk volumes; read and write performance were fine on all of them. I can post the exact commands and output I used if you want to see them.
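
Roughly, the SMART runs were along these lines (from memory; device names are examples and output omitted):
Code:
# start a long self-test on each drive (several hours on 3TB disks)
smartctl -t long /dev/ada1
# repeat for the other three drives, then check the results once they finish
smartctl -a /dev/ada1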

titan_rw: That sounds pretty much identical to my problem. I'll play around with the min_pending and max_pending tunables and see what happens. I don't suppose you've gained any insight into why that works since your post?

JohnK: That's exactly why I posted; to sanity check what I was doing. :) When I started testing I was using a block size of 1048576, but switched to 2048k at some point. The throughput didn't really change, so I think 2048k works.

Thanks for the input!
 

JohnK
Patron | Joined Nov 7, 2013 | Messages: 256
Maybe I should have been more clear.
My understanding is that the basic dd syntax is as follows:
# dd if= of= bs=
with if being the input file, of the output file, and bs the block size.

In your read test you reversed the order:
# dd of= if= bs=
I had never seen anyone use that ordering and wondered if it was causing a problem.
 

DanE
Cadet | Joined Dec 11, 2013 | Messages: 3
Here are some new results. I'm back to a RAIDZ2 volume, this time with vfs.zfs.vdev.max_pending and vfs.zfs.vdev.min_pending tunables set to 1.
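
In case anyone wants to try the same thing, the values can be changed on the fly for testing (roughly like this; for a persistent setting you'd add them under the FreeNAS Sysctls/Tunables screens):
Code:
# check the current values
sysctl vfs.zfs.vdev.min_pending vfs.zfs.vdev.max_pending
# drop both to 1 at runtime
sysctl vfs.zfs.vdev.min_pending=1
sysctl vfs.zfs.vdev.max_pending=1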

Code:
[dan@rlyeh] /mnt/store# dd if=/dev/zero of=testfile bs=2048k count=25k
25600+0 records in
25600+0 records out
53687091200 bytes transferred in 266.195212 secs (201683159 bytes/sec)

(192 MB/s Write)

Code:
[dan@rlyeh] /mnt/store# dd of=/dev/null if=testfile bs=2048k
25600+0 records in
25600+0 records out
53687091200 bytes transferred in 218.128102 secs (246126431 bytes/sec)

(235 MB/s Read)

Code:
[dan@rlyeh] /mnt/store# dd if=testfile of=/dev/null bs=2048k
25600+0 records in
25600+0 records out
53687091200 bytes transferred in 218.510798 secs (245695369 bytes/sec)

(234 MB/s Read for JohnK :) )

Those tunables seem to have fixed my test cases, but what other ramifications are there? Any ideas on why that works?

I'll do some research myself and report back if I discover anything.
 

titan_rw
Guru | Joined Sep 1, 2012 | Messages: 586
For dd, the order of the options doesn't matter.

I'm not positive why I needed min / max pending set to 1, but I know that was definitely the fix.

In my case it wasn't my FreeNAS box; it was for a friend in another country, so I no longer have access to it. I've never used WD Reds myself, so I can't dig into why they sometimes seem to need the min/max pending tweak.
 

cyberjock
Inactive Account | Joined Mar 25, 2012 | Messages: 19,526
titan_rw said:
In my case it wasn't my FreeNAS box; it was for a friend in another country, so I no longer have access to it. I've never used WD Reds myself, so I can't dig into why they sometimes seem to need the min/max pending tweak.

I've tried to look into that situation. There's very little documentation on the subject. About the only thing I'm sure of is that with some RAID controllers a max pending of 1 would be bad. That isn't the RAID controller's fault so much as the wrong controller being used for the task.

I don't know why, but I have a hunch that setting max pending manually is "not a smart long term choice"(TM). I have no evidence to back that up, and I'm not sure I'll ever get enough information to actually prove or disprove it.

The only good info I can find says this is the number of transactions that can be pending on a per-disk basis. Normally an HBA just passes your disks through, so if you have 8 disks you can theoretically have up to 8 pending transactions for your entire pool at any given time. If you put the disks behind a hardware RAID, the pool sees a single device, so you get an amazing 1 transaction at a time for the whole pool; that's the price of choosing hardware RAID. In some cases, RAID controllers in JBOD mode still limit all of the disks on the controller to 1 transaction at a time, so performance ends up somewhere between the hardware-RAID and HBA cases.
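
To put rough numbers on it for the pool in this thread (this is just my reading of how the tunable works):
Code:
# with the tunable at 1, ZFS keeps at most 1 outstanding I/O per disk it can see
4 disks behind an HBA:     4 x 1 = 4 outstanding I/Os for the whole pool
1 hardware-RAID "disk":    1 x 1 = 1 outstanding I/O for the whole pool
# with the stock default (10, if I remember right) the HBA case becomes 4 x 10 = 40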
 