iSCSI Target Issues?


mattlach

Patron
Joined
Oct 14, 2012
Messages
280
Hey all,


I am attempting to use FreeNAS as an iSCSI target, but I'm having a few issues, and I was hoping someone might be able to help me.

1.) Significant reduction in write speed over iSCSI.

If I use dd to test speed locally from the FreeNAS console, I wind up with about 220MB/s reads and 170MB/s writes (I have four 3TB WD Green drives in RAIDZ2).

If I mount the iSCSI target on my Linux box using the initiator, I still wind up getting 220MB/s reads, but my writes are only 79MB/s.

What could be slowing down my writes? I am using 10gig Ethernet on both sides, on a network dedicated to this iSCSI traffic. 10gig Ethernet should be good for up to about 1250MB/s (less overhead).

My theory is that iSCSI is rather CPU dependent on the FreeNAS side, and that my RAIDZ2 writes are already pretty CPU intensive; maybe I need more cores to be able to do both at the same time? This would make sense, as it only appears to affect writes, not reads, and reads use a lot less CPU.
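I suppose one quick way to test this theory would be to watch per-core load on the FreeNAS box while an iSCSI write is running; roughly something like this from the console (nothing here is specific to my setup):

top -SH (FreeBSD top showing system processes and individual threads, so the ZFS worker threads are visible)

If one core is pegged during iSCSI writes but not during local writes, that would point at CPU.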

Thoughts?

2.) I'm getting some sort of odd iSCSI errors in the FreeNAS console:

[Screenshot of the iSCSI error messages in the FreeNAS console]


Does anyone know what these might be?

Are they cause for concern?

Could they be contributing to my slow speeds in 1.) above?

Much obliged,
Matt
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You're probably constrained by IOPS on those drives more than anything else. Check with iostat or gstat to see how busy things are on the drives when you're writing. Writes will be more intensive because of the need to maintain parity information; reads for a RAID are inherently easier and generally faster if you can read one copy successfully (but if you're degraded, then that'll suck too).
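For example (a rough sketch; run these on the FreeNAS console while the iSCSI write test is going):

gstat -a (watch the %busy column for the pool member disks; -a hides idle providers)
iostat -x -w 1 (extended per-device stats, refreshed every second)

If the member disks sit near 100% busy while throughput is low, you're out of IOPS, not bandwidth.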
 

mattlach

Patron
Joined
Oct 14, 2012
Messages
280
That can't be right.

Maybe I didn't explain the issue properly.

Locally from FreeNAS Console:

dd if=/dev/zero of=testfile bs=1024k count=250k (write test)
Results in 295MB/s ~85% CPU load

dd if=testfile of=/dev/null bs=1024k count=250k (read test)
Results in 436MB/s ~80% CPU load

Remotely from iSCSI initiator connected via 10gig Ethernet:

dd if=/dev/zero of=testfile bs=1024k count=250k (write test)
Results in 99MB/s ~67% CPU load on FreeNAS Box

dd if=testfile of=/dev/null bs=1024k count=250k (read test)
Results in 223MB/s ~72% CPU load on FreeNAS box.


So: same array, same disks, same FreeNAS install; one test run locally and one remotely via iSCSI. Despite going over 10gig Ethernet, it is surprisingly slow remotely.

Any thoughts?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
There's a potentially significant difference between your two scenarios there.

With the "locally" you are writing to a ZFS file, meaning that ZFS is able to intelligently allocate space and just write out the data with no significant overhead, few (if any) additional seeks, etc. This is an ideal situation.

With your "Remotely" scenario, you are not writing to a raw disk device on the client, which means that your remote client is running a filesystem abstraction layer inside of the ZFS file that your FreeNAS is maintaining. This means there are numerous performance-affecting issues to consider. Is ZFS as efficient at rewriting blocks within the extent file as it is at just writing a fresh file? (I don't think it is) On the client side, do you know for a fact that the client isn't introducing additional seeks, trying to search its filesystem free block list for available space? Remember, your client is maintaining its own filesystem within that extent file, so you have two separate filesystem layers at work.

You cannot just make up random tests and expect that their results will be meaningful. Any time you are introducing seeks into linear speed tests, you are corrupting the results in unpredictable ways. Every time a drive seeks, it has to stop what it was doing (i.e. writing), go off to find another block somewhere else, deal with that, and seek back. It only takes a few of these per second to substantially reduce your I/O throughput. Some filesystems are worse than others at this sort of thing as well, and you've not given us any idea what filesystem you used on your "initiator."

So, do some useful tests. Oh, and give us a bit of an idea as to your hardware as well, because my alarm bells go off when I see high CPU load, something that should be uncommon on contemporary fileservers.

I suggest:

1) Run iperf in server mode on the FreeNAS. Run iperf in client mode on your iSCSI initiator. Post the speed AND cpu percentage used. That'll help identify some external influences that we normally assume aren't an issue, but could be.

2) Run your dd from the initiator on the raw disk device, not on a file on some random filesystem built on top of the raw disk device. Maybe even tell us a little about the initiator, because some initiators have funky options that can affect performance.

3) In order to more fairly compare ZFS performance, first dd a large file out to your FreeNAS system from the console like you did. Note the numbers. Then repeat the test, but use the option "conv=notrunc". If your FreeNAS pool is pretty empty, these numbers should be similar, but if it's pretty full, the second one may be slower. It's also more representative of what you're doing with iSCSI.

But really, you should do

4) what I first suggested, because it's by far the most likely issue, which is to check to see if you're constrained by IOPS. Does gstat report the disks >80% busy when you're writing via iSCSI? If so, you're constrained by IOPS. We've seen this again and again on 4-drive RAIDZ2's. They're great for resiliency but not so great for high performance.
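To put rough commands to the four suggestions above (everything here is a sketch: the IP address, the /dev/sdX device name, and the counts are placeholders, not your actual setup):

1) iperf -s (on the FreeNAS box)
   iperf -c 10.0.0.10 -t 30 (on the initiator, pointed at the FreeNAS address; watch CPU on both ends while it runs)

2) dd if=/dev/zero of=/dev/sdX bs=1024k count=8k oflag=direct (on the initiator, against the raw iSCSI device; this destroys whatever filesystem is on it, so only do it on a scratch extent)
   dd if=/dev/sdX of=/dev/null bs=1024k count=8k iflag=direct

3) dd if=/dev/zero of=testfile bs=1024k count=8k (fresh write, on the FreeNAS console)
   dd if=/dev/zero of=testfile bs=1024k count=8k conv=notrunc (rewrite of the same blocks)

4) gstat -a (while the iSCSI write is running; are the pool members pegged near 100% busy?)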
 

mattlach

Patron
Joined
Oct 14, 2012
Messages
280
There's a potentially significant difference between your two scenarios there.

With the "locally" you are writing to a ZFS file, meaning that ZFS is able to intelligently allocate space and just write out the data with no significant overhead, few (if any) additional seeks, etc. This is an ideal situation.

With your "Remotely" scenario, you are not writing to a raw disk device on the client, which means that your remote client is running a filesystem abstraction layer inside of the ZFS file that your FreeNAS is maintaining. This means there are numerous performance-affecting issues to consider. Is ZFS as efficient at rewriting blocks within the extent file as it is at just writing a fresh file? (I don't think it is) On the client side, do you know for a fact that the client isn't introducing additional seeks, trying to search its filesystem free block list for available space? Remember, your client is maintaining its own filesystem within that extent file, so you have two separate filesystem layers at work.

Thank you for taking the time to explain this. I figured there would be a small amount of overhead for these very reasons, but I was not expecting something this large.

You cannot just make up random tests and expect that their results will be meaningful. Any time you are introducing seeks into linear speed tests, you are corrupting the results in unpredictable ways. Every time a drive seeks, it has to stop what it was doing (i.e. writing), go off to find another block somewhere else, deal with that, and seek back. It only takes a few of these per second to substantially reduce your I/O throughput. Some filesystems are worse than others at this sort of thing as well, and you've not given us any idea what filesystem you used on your "initiator."

And that's why I'm asking the question :) I made my best guess at what would be an appropriate test to diagnose the issue, and then posted here with it, hoping that someone with more experience might have a better idea of what to try.

So, do some useful tests. Oh, and give us a bit of an idea as to your hardware as well, because my alarm bells go off when I see high CPU load, something that should be uncommon on contemporary fileservers.

But ZFS RAIDz arrays are notorious for their CPU usage, particularly when writing, no? Particularly since I am using a RAIDz2 configuration with two parity drives.


I suggest:

1) Run iperf in server mode on the FreeNAS. Run iperf in client mode on your iSCSI initiator. Post the speed AND cpu percentage used. That'll help identify some external influences that we normally assume aren't an issue, but could be.

I did this yesterday, but I did not monitor CPU use. I'll try again and look at CPU use, but I don't think network CPU use is the issue, as I'm actually seeing MORE CPU use when I do my speed test locally than when I do it remotely.

The iperf test was a little disappointing, though. I only got just over 2-gigabit speeds on 10gig Ethernet, but even so, at 2-gigabit speeds (which should translate to roughly 250MB/s, less overhead) that can explain why my remote reads are much lower than the 400+MB/s I see locally, but it cannot explain why my writes are only at 99MB/s.

Maybe this is a silly thing to assume, but my assumption has been that if it were a network limitation, it would be a network limitation in both directions, not just in one.

2) Run your dd from the initiator on the raw disk device, not on a file on some random filesystem built on top of the raw disk device. Maybe even tell us a little about the initiator, because some initiators have funky options that can affect performance.

I didn't realize I could point a dd to a raw disk device. I thought it needed to have a file. I will try this again. Thank you for that suggestion.

The initiator I have been using is open-iscsi under Ubuntu Server 12.04 LTS. I just used the default settings to get it up and running and have not played with them much (at all). I formatted the resulting device with ext4.
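For reference, the Ubuntu side is nothing more exotic than the stock discovery/login sequence, roughly like this (the portal IP and IQN here are placeholders, not my actual values):

sudo iscsiadm -m discovery -t sendtargets -p 10.0.0.10
sudo iscsiadm -m node -T iqn.2012-10.local.freenas:target0 -p 10.0.0.10 --login

followed by a plain mkfs.ext4 on the resulting /dev/sdX and a mount with default options. I haven't touched /etc/iscsi/iscsid.conf at all.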

3) In order to more fairly compare ZFS performance, first dd a large file out to your FreeNAS system from the console like you did. Note the numbers. Then repeat the test, but use the option "conv=notrunc". If your FreeNAS pool is pretty empty, these numbers should be similar, but if it's pretty full, the second one may be slower. It's also more representative of what you're doing with iSCSI.

Thank you, I appreciate the suggestion. I'll give it a shot. What does the conv=notrunc option do?

But really, you should do

4) what I first suggested, because it's by far the most likely issue, which is to check to see if you're constrained by IOPS. Does gstat report the disks >80% busy when you're writing via iSCSI? If so, you're constrained by IOPS. We've seen this again and again on 4-drive RAIDZ2's. They're great for resiliency but not so great for high performance.

Thank you for that suggestion. Now that you've explained the differences, this makes more sense to me. When you first suggested this, my thought was "why would my IOPS be any different locally vs. remotely?", but I hadn't factored in any extra seeks that iSCSI may have introduced.

I wasn't familiar with gstat. I will give it a try and see.

My drives are by no means IOPS monsters. They are consumer WD Green drives, but there are six of them. (Please ignore the figures from my first post at the top; I added two more drives and rebuilt the volume since then, so the figures in my second post are the currently accurate ones.)


I really appreciate all your help.

I'm going to keep diagnosing and see what I find from a learning perspective, but at this point I am considering dropping iSCSI altogether. I originally opted for iSCSI because I found it interesting and had never played with it before, so it was more or less an opportunity to learn something. But I really don't need what iSCSI has to offer for this system, and I will likely just fall back to NFS or SMB instead, since iSCSI seems to imply rather large efficiency hits and I'd rather have an efficiently running system.

My setup currently looks like this:

FreeNAS Guest on VMWare ESXi --(vmxnet3 10gig virtual ethernet)-->
VMWare 10gig Vswitch (Dedicated to iSCSI traffic only) --(vmxnet3 10gig virtual ethernet)-->
Ubuntu 12.04 LTS Server guest on VMWare ESXi --(SAMBA via direct I/O Mapped Intel copper gigabit Server NIC)-->
HP ProCurve 1810G-24 managed gigabit switch --> Clients.


The server has the following specs:
- 8-core AMD FX-8120 (3.1GHz base clock, 3.4GHz turbo with all cores loaded, 4.0GHz max turbo) with 4 cores assigned to FreeNAS.
- 32GB RAM (22GB assigned to FreeNAS, as I understand FreeNAS loves, and will use, all the RAM you can give it.)
- The 6 drives in the volume are all hooked up via a direct I/O mapped IBM M1015 SAS controller flashed with LSI's JBOD IT firmware.

I know AMD systems are not the fastest these days, but I figured for these purposes it ought to be fast enough.


I chose this method because I already had all the permissions and shares set up on my Ubuntu install and liked them the way they are; I use my Ubuntu install for lots of stuff, and I find working with BSD atrocious compared to modern Linux distributions. I figured the performance hit from going this route would be minimal, but I guess I was wrong.

Instead I'll probably just do a simple NFS share to Ubuntu via the internal virtual network, and share with the clients via SMB directly from FreeNAS.
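Roughly what I have in mind for that fallback (the paths and internal address are placeholders): export the dataset from FreeNAS over NFS to the Ubuntu guest's internal interface, then on the Ubuntu side something like

sudo mount -t nfs 10.0.1.1:/mnt/tank/shares /srv/shares

with a matching entry in /etc/fstab so it comes back after a reboot.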

Thanks for all the suggestions! Getting someone knowledgeable and experienced with these things is exactly what I needed.

--Matt
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Thank you for taking the time to explain this. I figured there would be a small amount of overhead for these very reasons, but I was not expecting something this large.

Overhead in any NAS or SAN system can waste a significant amount of your hardware's potential capacity. It's been a battle for many years, and software and hardware workarounds have been around roughly forever. Consider, for example, Sun/Legato's PrestoServe for NFS acceleration... NFSv2 in the late '80s rapidly inspired PrestoServe (~1989).

And that's why I'm asking the question :) I made my best guess at what would be an appropriate test to diagnose the issue, and then posted here with it, hoping that someone with more experience might have a better idea of what to try.

Yeah, but then when I gave you a likely answer, because I have experience with a very similar system (WD Green 2TB drives in a 4-disk RAIDZ2) where we've seen unexpectedly high IOPS to the pool as the cause of many problems, you opted to skip the easy test to see if that might be the issue. And I started with that because it WAS easy, and if it was the problem then this question would basically be at its end.

But ZFS RAIDz arrays are notorious for their CPU usage, particularly when writing, no? Particularly since I am using a RAIDz2 configuration with two parity drives.

I haven't noticed that as much as I've noticed that 4-drive RAIDZ2's seem to suck up IOPS. Really, with today's multi-core CPUs, I wouldn't expect to see it as a significant issue. Now I will also comment that our old storage servers, single-core Opteron 240's, suffered quite a bit under ZFS, and turning features like compression on would result in horrifying catatonic states, basically because between the RAID computations and the compression and the file transfer protocol, yes, the CPU was full-out busy. But there were also some worst-case mitigating factors in all that.

The iperf test was a little disappointing, though. I only got just over 2-gigabit speeds on 10gig Ethernet, but even so, at 2-gigabit speeds (which should translate to roughly 250MB/s, less overhead) that can explain why my remote reads are much lower than the 400+MB/s I see locally, but it cannot explain why my writes are only at 99MB/s.

You will absolutely be capped at whatever iperf is able to wrangle out of your network. However, how much iperf can wrangle out of your network depends on how things are tuned, and there may not be one best answer. Are you using jumbo frames? TOE? Interrupt coalescing or polling? All these things change the dynamics.
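If you want to poke at that, a couple of purely illustrative experiments (the interface name is a placeholder, and an MTU change has to match on every hop, including the vSwitch):

ifconfig vmx3f0 mtu 9000 (jumbo frames on the FreeNAS side; the Linux side and the vSwitch need the same setting)
iperf -c 10.0.0.10 -t 30 -P 4 (several parallel streams, to see whether a single TCP stream is the limit)
iperf -c 10.0.0.10 -t 30 -w 256k (a larger TCP window)

None of those are guaranteed wins; they just change the dynamics, which is the point.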

Maybe this is a silly thing to assume, but my assumption has been that if it were a network limitation, it would be a network limitation in both directions, not just in one.

Only given two systems that are identical in all their particulars on both sides.

I didn't realize I could point a dd to a raw disk device. I thought it needed to have a file. I will try this again. Thank you for that suggestion.

You need to do that to minimize the effects of any other layers if at all possible. Sticking a filesystem layer in there has the potential to be very messy. Or it could make not-much-of-a-difference. Point is, it's hard to know.

Thank you, I appreciate the suggestion. I'll give it a shot. What does the conv=notrunc option do?

notrunc makes it not truncate the file. In an iSCSI extent, you are rewriting existing data blocks. In your original dd test, you were just allocating new blocks and writing to them. The latter is likely to be a faster codepath for ZFS, and the reallocation - especially on a fullish filesystem - has the potential to be slower. Again, this is about trying to bring the comparison closer together.


Thank you for that suggestion. Now that you've explained the differences, this makes more sense to me. When you first suggested this, my thought was "why would my IOPS be any different locally vs. remotely?", but I hadn't factored in any extra seeks that iSCSI may have introduced.

Not iSCSI, but rather your ext4 filesystem on top of the extent. Now quite frankly I wouldn't expect it to make a huge difference. But again, why chance it? Whittle away the irrelevancies until you get down as far as you can.

But the fundamental truth here is that iSCSI is going to add a certain amount of overhead, because instead of just writing to a disk via ZFS locally, you're needing to push that data over the network and then get an acknowledgment, and there's a block-level protocol involved, and buffers and queues and caches, so this will never be as fast as some other things you might be able to do. Of course, many of those things are tunable and there are no promises that they're already optimal for your setup, so your setup can probably be made faster.

I have to cut this short for right now.
 