iSCSI vs. SMB Performance: What Am I Doing Wrong?

jakehansen

Cadet
First and foremost, I want to say the standard "long time lurker, first time poster." From my time reading the forum, I understand how important it is to do your homework and research, research, research before posting anything. I feel I have done my due diligence: I have researched my question extensively and learned a lot, but I have not come to a conclusive answer or solution. I appreciate any help, and I am well aware there are quite a few things I probably still need to learn, but I am open to learning.

With that in mind, here goes.

I am currently running FreeNAS on a Dell PowerEdge R420 (specs below) and access it via iSCSI from an ESXi machine (the ESXi machine is the only initiator). For disks, I have four 2TB 7200 RPM SAS drives configured as two mirrored vdevs in one pool. Currently, I have no L2ARC or SLOG (this is part of my question, which I'll get to). My iSCSI zvol is sized at 50% of the pool's capacity to keep fragmentation from hurting performance. Right now my fragmentation is at 0%, since I erased the pool and started over to try to solve this problem.
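For reference, the pool and zvol are laid out roughly like this (pool, zvol, and device names here are examples rather than my exact ones, and the volblocksize is just the value I used):
Code:
# Two mirrored vdevs in one pool
zpool create tank mirror da0 da1 mirror da2 da3

# Sparse zvol sized at roughly 50% of usable capacity for the iSCSI extent
zfs create -s -V 1800G -o volblocksize=16K tank/iscsi-extent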

This setup has worked great for me for about a year. On the ESXi box, I have one datastore residing on the presented iSCSI drive, and my VMs use that datastore solely for file storage. The boot volumes for the VMs are on a separate, SSD-backed RAID. I was consistently maxing out a gigabit connection on both sequential reads and writes from my VMs. I recently upgraded my ESXi host and FreeNAS box to 10Gb; the two machines are connected directly to each other via the 10Gb NICs. The problem I see now is that my writes are stuck around 110 MB/s, while reads are fine and max out the 10Gb connection. Below is all the troubleshooting I have done.

First, it's helpful to know the maximum write speed of my pool. I set up a test dataset with compression disabled in order to run a quick dd test on the FreeNAS box. My results show around 220-250 MB/s when writing a 20 GB file with a 1024k block size. This is the performance I expect from my mirrored vdevs.
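For anyone who wants to reproduce it, the test was roughly this (dataset name is an example; compression is off so dd's zeros don't get compressed away):
Code:
# Dataset with compression disabled so the dd numbers are honest
zfs create -o compression=off tank/ddtest

# Write a 20 GB file of zeros in 1 MB blocks
dd if=/dev/zero of=/mnt/tank/ddtest/testfile bs=1024k count=20000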

Next, I tested the 10Gb connection between the FreeNAS box and the ESXi box using iperf3. This test showed a sustained rate of around 9.41 Gbit/s, exactly the performance I expect.
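The iperf3 test was basically this (IP is an example; the client can run from the ESXi side or from a VM, whichever has iperf3 handy):
Code:
# On the FreeNAS box: start the server
iperf3 -s

# On the ESXi side (or a VM on it): run against the FreeNAS 10Gb IP
iperf3 -c 10.0.0.10 -t 30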

Next, I tested the datastore I created on the presented iSCSI volume by running dd against it directly from the ESXi box. Here, I get around 110 MB/s. So at least I know the issue isn't between the VMs and the datastore, since the problem shows up even when writing directly to the datastore from ESXi.
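From the ESXi shell, that test looked something like this (datastore name is an example):
Code:
# Write 20 GB of zeros straight to the iSCSI-backed datastore
time dd if=/dev/zero of=/vmfs/volumes/freenas-iscsi/testfile bs=1024k count=20000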

Since I have researched this problem a ton, I adjusted a few different settings, but none of them helped. First, I followed the iSCSI best practices guide for FreeNAS (I can't remember the link, but you can find it by searching). I tried jumbo frames, making sure the vSwitch and the NICs were all set to an MTU of 9000. I tried disabling delayed ACK. I tried a few different tunables. Nothing helped.
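For what it's worth, the check I used to confirm jumbo frames were working end to end looked like this (IPs are examples; 8972 = 9000 minus the IP and ICMP headers):
Code:
# From FreeNAS: ping the ESXi 10Gb interface with the don't-fragment bit set
ping -D -s 8972 10.0.0.20

# From the ESXi shell: same test back toward FreeNAS
vmkping -d -s 8972 10.0.0.10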

Just to keep my sanity, I created an SMB share on the same pool as the iSCSI zvol and connected to it from one of my VMs. There, I was getting around 220-250 MB/s sequential writes. So I know my network and storage array are capable of these speeds.

Just for fun, I tried reconfiguring my pool as a stripe of just two disks and presenting that over iSCSI. When doing this, I got 220-250 MB/s! Now I'm really confused.

As a side note, no matter what protocol I use (iSCSI or SMB) and no matter how my vdevs are configured (two mirrored vdevs or two striped disks), my sequential writes always start out around 900 MB/s, saturating the 10Gb connection. This speed only lasts for a couple of seconds, obviously, since it comes from writes being buffered in RAM, but it further proves that my network and the protocols I am choosing are capable of the speeds I am expecting.
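From what I understand, that initial burst is just writes landing in ZFS's in-RAM dirty data buffer before they are flushed to disk. If you want to see how large that buffer is allowed to get, something like this works (tunable name from stock FreeBSD/FreeNAS, shown as an example):
Code:
# Maximum amount of dirty (not-yet-flushed) write data ZFS will hold in RAM
sysctl vfs.zfs.dirty_data_max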

I've read all the posts about iSCSI, with particular attention to jgreco's responses, and those posts were helpful. I have even tried resetting my expectations about what performance I can get out of my current hardware. The thing that is bothering me is that I know my array and network are capable of the speeds I am expecting, but iSCSI seems to be posing some sort of problem. What really bothers me is that I do get the speeds I expect over iSCSI when two disks are striped, just not with a pool of mirrored vdevs across all four disks.

To me, it seems like my mirrored vdevs are causing some sort of problem over iSCSI, since I get the expected performance out of two striped disks over iSCSI. Unfortunately, I just don't know what that problem is.

I said earlier that I know I might need a SLOG and potentially an L2ARC. From reading about what a SLOG improves for iSCSI, I'm struggling to understand how it would help in my situation. From what I've read, a SLOG only helps an iSCSI setup if sync writes are being used. In my case, I'm not using sync writes, so I don't think a SLOG would help me. I'm currently satisfied with my read speeds, so I'm not sure I'm ready to add an L2ARC.
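For what it's worth, this is how I'm confirming that sync writes aren't in play on the zvol (names are examples):
Code:
# Check the sync property on the zvol backing the iSCSI extent
zfs get sync tank/iscsi-extent

# sync=standard only honors writes the initiator explicitly requests as synchronous;
# sync=always would force every write through the ZIL (which is where a SLOG helps)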

In the coming weeks, however, I plan on getting more RAM, as I'd like to bump it up to at least 64GB. I don't think RAM is my limitation right now, since I can get 200-250 MB/s sustained sequential writes in some configurations, as stated above.

Any direction would be appreciated!

Specs for FreeNAS box:
  • Motherboard: Whatever motherboard comes with the R420
  • Processors: 2 x 6-core 2.20 GHz Intel Xeon E5-2430
  • RAM: 24 GB DDR3 ECC
  • Disks: 4 x Seagate Constellation ES.3 2TB SAS
  • HBA: Dell PERC H310 Mini flashed to IT mode (LSI 9211-8i)
  • NIC: Broadcom 57810S
 

jgreco

Resident Grinch
I realize you said

I've read all the posts about iSCSI, with particular attention to jgreco's responses, and those posts were helpful.

but just to make sure, be sure to have seen

https://www.ixsystems.com/community...res-more-resources-for-the-same-result.28178/

https://www.ixsystems.com/community/threads/the-path-to-success-for-block-storage.81165/

as these may have particular relevance. Offhand I don't know what to make of the performance difference you are seeing between two drives in a stripe and two mirror vdevs. I would expect the mirrors to be slightly slower just due to the added overhead. You seem to have a handle on the sorts of things you should be looking at.

Possibly useless suggestions:

1) See if you get good performance when you use the OTHER two drives in a stripe, just to rule out "bad drive" problems.

2) Run gstat while writing to the mirrors to see if there are obvious single-device bottlenecks.

3) Run solnet-array-test on the disks to see if you can identify a slow disk (less likely IMHO)

Be aware that SAS drives can persist settings such as RCD and WCE (see mode page 8) so it's entirely possible that if you sourced used drives from different places that they can perform differently.
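A quick way to eyeball that, roughly (device name is an example):
Code:
# Dump the SCSI caching mode page (page 8); look at WCE (write cache enable)
# and RCD (read cache disable) for each drive
camcontrol modepage da0 -m 8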

Other than that, bearing in mind that I'm half asleep right now, I'm not coming up with anything too helpful. You seem to be doing lots of the things I'd be doing, so I kinda think you'll figure it out sooner or later.
 

jakehansen

Cadet
Got it. Took a look at both of those. The first link I've read multiple times. The second link, however, I have not seen! There was a lot of useful information in that thread. Particularly, this stuck out to me:
While there are optimizations you can do to make it suck less, the fact is that a RAIDZ vdev tends to adopt the IOPS characteristics of the slowest component member. This is partly because of what Avi calls "seek binding", because multiple disks have to participate in a single operation because the data is spread across the disks. Your ten drive RAIDZ2 vdev may end up about as fast as a single drive, which is fine for archival storage, but not good for storing lots of active VM's on.
I know I am not using RAIDZ, but the part specifically about the slowest component member made me question the performance of my individual disks.

1) See if you get good performance when you use the OTHER two drives in a stripe, just to rule out "bad drive" problems.
I did source the disks used, on eBay, and judging from the SMART stats they all have a little over 70,000 power-on hours, which I know is ripe. Based off of your recommendation, I tried different combinations of striped disks. Whatever combination of two striped disks I use, they all perform at 200-250 MB/s. I did order brand-new 4TB nearline disks today, as I need higher-capacity drives anyway. I'll have some time to test them later this week once they arrive. I'm hoping my used drives are the source of my issues. In any case, I'm going to continue troubleshooting until the new disks arrive.

2) Run gstat while writing to the mirrors to see if there are obvious single-device bottlenecks.
I ran gstat while running CrystalDiskMark against my pool of mirrored vdevs. All four of my drives were hovering around 90-100% busy.
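For reference, I was watching it with something like:
Code:
# Show only physical disks, refreshing once per second
gstat -p -I 1s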

3) Run solnet-array-test on the disks to see if you can identify a slow disk (less likely IMHO)
Going to run solnet over the next day or so and see what info I can find!

Be aware that SAS drives can persist settings such as RCD and WCE (see mode page 8) so it's entirely possible that if you sourced used drives from different places that they can perform differently.
Checked all the drives, both write and read caches are enabled.

UPDATE:
So I've been editing this post and keeping it basically as a log of my troubleshooting journey today. This evening, I wiped all of the disks and ran a quick dd test on each of them with bs=1024k count=10000. All of my disks performed fairly well at around 140 MB/s EXCEPT for my last disk, da3. During the dd test, I watched it start out at around 140 MB/s and creep all the way down to about 60 MB/s. Eventually it worked its way back up to around 90 MB/s. gstat showed this drive at 98% busy the whole time. Lo and behold, after running smartctl -a on this disk:
Code:
Elements in grown defect list: 1600

I believe I *might* have found the source of my issue: a dying drive.
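For anyone following along, the per-disk test and the SMART check were roughly this (device name is an example, and the dd is destructive to whatever is on the disk):
Code:
# Raw sequential write to a single, already-wiped disk (destroys data on da3!)
dd if=/dev/zero of=/dev/da3 bs=1024k count=10000

# Pull the SMART/SCSI health data, including the grown defect list
smartctl -a /dev/da3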

I'm a little embarrassed at this point, because after doing some research, having 1600 entries in the grown defect list is terrible and a strong indicator that it's time to replace the drive. It does bring up a few questions, though.

1) Why is the SMART health status still showing as "OK" with that many elements in the defect list?
2) For the future, at what number of elements in the defect list should I start to consider replacing the drive? 1? 10? 50? There seems to be no solid answer for this that I've been able to find other than "any number of elements in the defect list is a sign of failure".
3) For the drive that is failing, why is it still performing okay in a stripe with any other disk?

And jgreco, thanks for your reply and insight, I truly appreciate it. I now realize I should have tested my disks individually and looked at the SMART data myself before doing any additional troubleshooting. This would have saved me a lot of time and you from replying to another iSCSI performance post! I guess I was relying on FreeNAS to tell me it found a problem with a drive so I figured I could skip looking at the SMART stats.

I'm hoping my new disks bring me the performance I'm looking for! Also ordered 64GB of RAM today just to be safe.
 

Attachments

  • failing_disk.png

jgreco

Resident Grinch
1) Why is the SMART health status still showing as "OK" with that many elements in the defect list?

Does it have SMART? SMART is actually a SATA thing, though apparently some SAS drives do support a variant. Be aware that smartmontools has the ability to process and present some data from SCSI mode pages, so the fact that smartctl runs isn't necessarily proof the drive supports SMART. I haven't played with this enough to be able to give you a better answer.

2) For the future, at what number of elements in the defect list should I start to consider replacing the drive? 1? 10? 50? There seems to be no solid answer for this that I've been able to find other than "any number of elements in the defect list is a sign of failure".

That second sentence is wrong, conditionally. There are two defect lists, one is the manufacturer's defect list, which documents blocks known to be bad as shipped from the factory. This one is just peachy. Drives typically have some bad sectors as shipped. It is the grown defect list that is concerning if it seems to be growing.

Now the thing that's worth understanding is that drives are expected to occasionally develop new defects, because sometimes a surface just isn't entirely perfect. So typically you set ARRE/AWRE as desired and/or force manual reallocation, and if a disk suddenly developed a bad sector, or a short run of bad sectors, or even maybe a bad "spot" on the disk where you had what appeared to be short runs on adjacent tracks, it would be okay to remediate that and get on with life without freaking out. It's when this is happening at large scale that this is a problem.

So if you check the size of the grown defect list, and then a month later it seems to have increased, and then another month later it's gone up again, that's kinda bad, and at some point you might consider deeming the disk broken.
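Something as simple as this, run periodically and kept in a log, is enough to spot that trend (device names and log path are just examples):
Code:
# Append a dated grown-defect count for each SAS disk to a log file
for d in da0 da1 da2 da3; do
    echo "$(date '+%Y-%m-%d') $d: $(smartctl -a /dev/$d | grep 'grown defect')"
done >> /root/defect_history.log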

It's also worth noting that this is the sort of failure environment that ZFS was developed in, and designed to cope with. It's why scrubs are a thing, for example.

3) For the drive that is failing, why is it still performing okay in a stripe with any other disk?

Don't know. It could be that you haven't actually found your problem.
 

jakehansen

Cadet

This is all very helpful info. I definitely learned some new things about what to look for going forward in terms of disk failure. I do have a quick update about my issue. TL;DR - The new disks solved my problem. The pesky disk that was performing badly seems to have been the issue.

Right after I installed the new drives, I configured them as two mirrored vdevs, just as before. Now I'm getting right around the 250 MB/s I'm expecting during my workloads. SMB is even a little faster than that, which is a nice perk.

I've attached a few screenshots as proof. The first test is CrystalDiskMark over iSCSI on a VM; notice the 32 GB test size, which far exceeds the amount of RAM I have available. The sequential reads and writes look really good. Random 4K is obviously a lot slower; I'm currently learning why that is and how to improve it, but that is a topic for another time.

The second screenshot is a real-world test of copying a ~45GB movie over iSCSI. The final screenshot is the same file transferred over SMB.

With all of that being said, I've decided to move to SMB for my workload. iSCSI only seems useful in FreeNAS if you throw a bunch of resources at it. I have local storage available for all of my VMs anyway, so it's not a huge loss.
 

Attachments

  • iSCSI_performance.png
  • iSCSI_performance_file_transfer.png
  • SMB_performance_file_transfer.png

jgreco

Resident Grinch
iSCSI only seems useful in FreeNAS if you throw a bunch of resources at it.

Yup. I got a little blowback over that "iSCSI often requires more resources" post, but I've had enough people say what you just did that I'm pretty sure it isn't my imagination.
 