Very slow reads with RAIDZ2 and 4x 3TB Drives

Status: Not open for further replies.

DanE
Cadet | Joined Dec 11, 2013 | Messages: 3
Hello,

I'm running FreeNAS 9.1.1 x64 with the following hardware:
Intel i3 4340
ASRock Z87 PRO4
16GB RAM
4x 3TB WD Red Drives

After I'd gotten everything installed and running, I thought I'd run a few benchmarks to see where things were at.

I found that I get fantastic write speeds:
Code:
# dd if=/dev/zero of=testfile bs=2048k count=25k
25600+0 records in
25600+0 records out
53687091200 bytes transferred in 260.237136 secs (206300654 bytes/sec)


(197 MB/s)
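
For anyone checking the math (MB/s here means MiB/s):
Code:
2048 KiB x 25600 records = 53,687,091,200 bytes (50 GiB)
53,687,091,200 bytes / 260.24 s ≈ 206,300,654 bytes/s ≈ 196.7 MiB/s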

But read speeds, oddly, are abysmal (I killed the test after about 13 minutes):
Code:
# dd of=/dev/null if=testfile bs=2048k
^C12910+0 records in
12910+0 records out
27074232320 bytes transferred in 783.682117 secs (34547467 bytes/sec)


(33 MB/s)

I'm at a loss on how to troubleshoot this much further. I've tried running dd against the raw drives (/dev/ada1, etc.) both singly and simultaneously (as per this thread: http://forums.freenas.org/threads/write-performance-issues-mid-level-zfs-setup.13372/), and any time I read data directly from the drives I see throughput in the 140-150 MB/s range per drive.
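
The raw-drive reads were along these lines (device names and counts are illustrative, not the exact commands I ran):
Code:
# one drive at a time, read-only
dd if=/dev/ada1 of=/dev/null bs=2048k count=5k
# all four at once, to see if the controller caps total throughput
dd if=/dev/ada1 of=/dev/null bs=2048k count=5k &
dd if=/dev/ada2 of=/dev/null bs=2048k count=5k &
dd if=/dev/ada3 of=/dev/null bs=2048k count=5k &
dd if=/dev/ada4 of=/dev/null bs=2048k count=5k &
wait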

When I run gstat during the ZFS write tests, all 4 drives are almost constantly at 100% utilization. During read tests, generally 1 or 2 drives will be pegged at 100% and the other 2 or 3 will sit below 20%. I've noticed that ada2 and ada3 tend to be the ones at 100%, but it varies; they're just at 100% most often.
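
For reference, I was watching the disks with something like this (the filter is optional and just narrows the output to the ada devices):
Code:
# run the dd test in one shell, and watch per-disk load in another
gstat
# or, to show just the ada devices
gstat -f ada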

Any suggestions on what to check next? Am I misreading these numbers somehow?

Any help would be greatly appreciated.

Thanks,
 

cyberjock
Inactive Account | Joined Mar 25, 2012 | Messages: 19,526
That is... indeed weird.

Do you have a failing disk? A failing disk will normally kill performance badly, much like what you're seeing.

If you want to do a TeamViewer session with me, hop on IRC and I can take a quick look.
 

JohnK
Patron | Joined Nov 7, 2013 | Messages: 256
Hello,

DanE said:
Code:
# dd of=/dev/null if=testfile bs=2048k
^C12910+0 records in
12910+0 records out
27074232320 bytes transferred in 783.682117 secs (34547467 bytes/sec)

(33 MB/s)
I hate asking this, but are you sure about your "dd" command? Shouldn't it read
dd if=testfile of=/dev/null bs=2048
or something?
 

titan_rw
Guru | Joined Sep 1, 2012 | Messages: 586
Without a suffix, bs=2048 would mean 2048 bytes, so yes, 2048k is correct here. bs=2m would also work.

Also, this sounds a lot like a problem I helped troubleshoot on another box a while back: writes were fine but RAIDZ reads were terrible, and setting the vfs.zfs.vdev.min_pending and vfs.zfs.vdev.max_pending tunables to 1 turned out to be the fix there. Might be worth a try.
 

DanE
Cadet | Joined Dec 11, 2013 | Messages: 3
cyberjock: I suspected bad hardware as well, so when I got home yesterday I ran a long SMART test on all 4 drives, and they all passed. I also removed my RAIDZ2 volume and created 4 single-disk volumes; read and write performance were fine on all of them. I can post the exact commands and output I used if you want to see them.
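
Roughly, the SMART runs were along these lines (from memory; device names are examples and output omitted):
Code:
# start a long self-test on each drive (several hours on 3TB disks)
smartctl -t long /dev/ada1
# repeat for the other three drives, then check the results once they finish
smartctl -a /dev/ada1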

titan_rw: That sounds pretty much identical to my problem. I'll play around with the min_pending and max_pending tunables and see what happens. I don't suppose you've gained any insight into why that works since your post?

JohnK: That's exactly why I posted; to sanity check what I was doing. :) When I started testing I was using a block size of 1048576, but switched to 2048k at some point. The throughput didn't really change, so I think 2048k works.

Thanks for the input!
 

JohnK
Patron | Joined Nov 7, 2013 | Messages: 256
Maybe I should have been more clear.
My understanding is that the basic dd syntax is as follows:
# dd if= of= bs=
with if being the input file, of the output file, and bs the block size.

In your read test you reversed the order:
# dd of= if= bs=
I had never seen anyone use that ordering and wondered if it was causing a problem.
 

DanE
Cadet | Joined Dec 11, 2013 | Messages: 3
Here are some new results. I'm back to a RAIDZ2 volume, this time with vfs.zfs.vdev.max_pending and vfs.zfs.vdev.min_pending tunables set to 1.
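
In case anyone wants to try the same thing, the values can be changed on the fly for testing (roughly like this; for a persistent setting you'd add them under the FreeNAS Sysctls/Tunables screens):
Code:
# check the current values
sysctl vfs.zfs.vdev.min_pending vfs.zfs.vdev.max_pending
# drop both to 1 at runtime
sysctl vfs.zfs.vdev.min_pending=1
sysctl vfs.zfs.vdev.max_pending=1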

Code:
[dan@rlyeh] /mnt/store# dd if=/dev/zero of=testfile bs=2048k count=25k
25600+0 records in
25600+0 records out
53687091200 bytes transferred in 266.195212 secs (201683159 bytes/sec)

(192 MB/s Write)

Code:
[dan@rlyeh] /mnt/store# dd of=/dev/null if=testfile bs=2048k
25600+0 records in
25600+0 records out
53687091200 bytes transferred in 218.128102 secs (246126431 bytes/sec)

(235 MB/s Read)

Code:
[dan@rlyeh] /mnt/store# dd if=testfile of=/dev/null bs=2048k
25600+0 records in
25600+0 records out
53687091200 bytes transferred in 218.510798 secs (245695369 bytes/sec)

(234 MB/s Read for JohnK :) )

Those tunables seem to have fixed my test cases, but what other ramifications are there? Any ideas on why that works?

I'll do some research myself and report back if I discover anything.
 

titan_rw
Guru | Joined Sep 1, 2012 | Messages: 586
For dd, the order of the options doesn't matter.

I'm not positive why I needed min / max pending set to 1, but I know that was definitely the fix.

In my case it wasn't my FreeNAS box; it was for a friend in another country, so I no longer have access to it. I've never used WD Reds myself, so I can't dig into why they sometimes seem to need the min/max pending tweak.
 

cyberjock
Inactive Account | Joined Mar 25, 2012 | Messages: 19,526
titan_rw said:
In my case it wasn't my FreeNAS box; it was for a friend in another country, so I no longer have access to it. I've never used WD Reds myself, so I can't dig into why they sometimes seem to need the min/max pending tweak.

I've tried to look into that situation. There's very little documentation on the subject. About the only thing I'm sure of is that with some RAID controllers a max pending of 1 would be bad. That isn't the RAID controller's fault so much as the wrong controller being used for the task.

I don't know why, but I have a hunch that setting max pending manually is "not a smart long term choice"(TM). I have no evidence to back that up, and I'm not sure I'll ever get enough information to actually prove or disprove it.

The only good info I can find says this is the number of transactions that can be pending on a per-disk basis. Normally an HBA just passes your disks through, so if you have 8 disks you can theoretically have up to 8 pending transactions for your entire pool at any given time. If you put the disks behind a hardware RAID, the pool sees a single device, so you get an amazing 1 transaction at a time for the whole pool; that's the price of choosing hardware RAID. In some cases, RAID controllers in JBOD mode still limit all of the disks on the controller to 1 transaction at a time, so performance ends up somewhere between the hardware-RAID and HBA cases.
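
To put rough numbers on it for the pool in this thread (this is just my reading of how the tunable works):
Code:
# with the tunable at 1, ZFS keeps at most 1 outstanding I/O per disk it can see
4 disks behind an HBA:     4 x 1 = 4 outstanding I/Os for the whole pool
1 hardware-RAID "disk":    1 x 1 = 1 outstanding I/O for the whole pool
# with the stock default (10, if I remember right) the HBA case becomes 4 x 10 = 40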
 