ESX / iSCSI - Improve 4k reads/writes & latency, optimize for large reads


msignor

Dabbler
Joined
Jan 21, 2015
Messages
28
Hi Guys,

Let me preface with the fact that I am not using any L2ARC or ZIL, have a pretty well spec'd system (IMHO) that I am not looking to replace but completely open to adding hardware to what I have.

- Supermicro X10 w/ i3-4130T, 32GB ECC
- Lenovo SA120 DAS (6.0Gb/s), w/ IT-flashed LSI 12.0Gb/s HBA
- 12x 2TB 7200RPM drives arranged as a single RAID-Z2 vdev

This is mostly a storage backend for media, which is why I didn't go for mirrors; I needed capacity more than performance. I only have two gigabit links dedicated to iSCSI because I know my limiting factor will be my RAID configuration: as a Z2, I will only get the performance of a single disk. I attached a screenshot of the disk benchmark, taken after running a workload overnight, because I wanted to present a picture of the system when it is not completely fresh.

However, I have gotten into BURSTcoin mining to take advantage of some 12TB of free space, just for fun. The workload is that you generate large samples of data, and then the "mining" is reading that data back, looking for matching blocks. So you are reading multiple 2TB files as fast as possible before someone else finds a matching block and lets the network know.

I seem to have great performance, but as soon as whatever buffer fills up it turns to crud. What do I mean? Well, my issue is that when I have "burstable" types of data, such as writing this random workload, I get what I would expect performance-wise (roughly 80MB/sec writes). However, when doing the reads I get maybe 10-15MB/sec tops. When doing disk benchmarks, I also max out my dual gigabit iSCSI bandwidth, so from a configuration perspective I am at a bit of a loss, since ESX looks good and performance looks OK for a low-power workload.

When I am doing 4k benchmarks, I end up around 10MB/sec, which is also not what I would expect, but it seems in line with the performance I am getting in the application. Oh yeah, the iSCSI is terminated in ESX and presented to a single VM as an attached disk.

When I reboot everything and clear the caches in FreeNAS, I get "ideal" benchmarks of approx 200MB/sec reads and writes, but the same 10-15MB/sec 4k performance.

When looking at some statistics, my pool is still under 80% utilized, my ARC is full at 27.5GB, and my hit ratio is at 93% under the workload. There is only one VM attached to these disks.

So my questions are:
- Is this normal based on a 12-drive RAID-Z2?
- From what I am reading, adding a ZIL or L2ARC wouldn't really help here, true?
- My RAM seems to be spot on for a total of 24TB of unformatted disk, 16TB usable.
- What can I do to improve my 4k reads and writes? Again, am I stuck due to the inherent speed of the pool?
- I understand that latency is a killer and Ethernet has a huge chunk of overhead. Aside from going to 10G (which I suspect would have similar latency, since it too is Ethernet), can I do anything to improve my latency? In Windows, I see at most 500-600ms latency when under load.
- What hardware can I throw at this to make it better? Is there anything within reason or should I consider a new build dedicated for this?

Ultimately, my goal is to get another shelf and dedicate all of its disks as single-drive iSCSI LUNs presented back to ESX from FreeNAS.

Any advice is welcome!

Thanks!!
 

Attachment: Untitled-1.fw.png (disk benchmark screenshot, 122.3 KB)

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
Is this normal based on a 12-drive RAID-Z2?
You're better off with mirrored vdevs if you're using FreeNAS as an ESXi datastore. The more pairs you have, the more IOPS you get. Adding a SLOG device may also help with writes; however, you can usually get away with sync=disabled in a home environment with a good UPS.
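For illustration, a pool made of mirrored vdevs is created roughly like this (a sketch only; the pool name "tank" and the da* device names are placeholders, not taken from this thread):

zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5
zpool add tank mirror da6 da7    # grow the pool later by adding another pair

Each mirror pair added this way is another vdev, so random IOPS scale roughly with the number of pairs.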
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The more pairs you have,
It is really about the number of vdevs; it is just easier (less costly from a drive-count perspective) to get a high vdev count by using pairs of drives.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
- 12x 2TB 7200RPM drives arranged as a single RAID-Z2 vdev
You must not have done much investigation before setting your system up. I recall reading documentation, when I set my system up, that recommended not going over a certain number of drives in a RAID-Z2 vdev, and that the optimal number was 6. That is probably all different now; it has been years.
What is not different is that you need to break that into multiple vdevs (instead of just one big vdev, like it is now) if you want better performance. This means you will have to move all your data somewhere else, blow away the configuration, and begin again.
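As a sketch of what "multiple vdevs" could look like with the same 12 disks (placeholder pool and device names again; this is an illustration, not a recommendation from the thread):

zpool create tank raidz2 da0 da1 da2 da3 da4 da5 raidz2 da6 da7 da8 da9 da10 da11

Two 6-disk RAID-Z2 vdevs like this give roughly twice the random IOPS of one 12-wide vdev, at the cost of 4 parity disks instead of 2.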
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
It is really about the number of vdevs; it is just easier (less costly from a drive-count perspective) to get a high vdev count by using pairs of drives.
Having read Allan Jude's book, FreeBSD Mastery: ZFS, I'd say you really want to be using mirrored vdevs for the most IOPS. A RAIDZn vdev will only be as fast as its slowest disk.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Having read Allan Jude's book, FreeBSD Mastery: ZFS, I'd say you really want to be using mirrored vdevs for the most IOPS. A RAIDZn vdev will only be as fast as its slowest disk.
For the workload the OP has, where he needs high-rate random reads, that is absolutely true. You can't beat mirrors for high-speed random IO.
When you have a large volume of storage where the reads are more sequential than random, that is when RAID-Z (some level greater than 1) is more appropriate.

Ultimately, my goal is to get another shelf and dedicate all of its disks as single-drive iSCSI LUNs presented back to ESX from FreeNAS.
What you probably need to do is get that other shelf, make it all mirrors, and just use your primary storage the way you originally intended: movies, etc. The 12 disks in a single vdev are never going to give you the speed you need for your usage. Don't forget to bump your RAM up too.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
my goal is to get another shelf and dedicate all of its disks as single-drive

Depending on how much data you are looking at putting in there, you could go with something like this:
http://www.ebay.com/itm/NetApp-DS42...OM3-Controllers-4x-Power-Supply-/182762049338
It comes with 24 drives of 450 GB each (small, I know), but if you set them up as a pool of mirrors, it should give you about 5TB of usable space and the IO should be pretty nice; I estimated 1900 MB/s. My math could be all wrong on that; I don't claim to be perfect.
They are enterprise-grade 15k SAS drives that are being retired because they are low capacity, but how much capacity do you need for this workload?
If I recall, the SAS drives from NetApp are formatted with a different block size, but they can be reformatted to change the block allocation; instructions can be found online.
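A rough back-of-envelope for that estimate (my own numbers, assuming something like 80 MB/s sustained per 15k drive, which is an assumption rather than a measured figure):

24 drives / 2 per mirror = 12 mirror vdevs
12 vdevs x 450 GB       ≈ 5.4 TB of raw usable space, before ZFS overhead
24 spindles x ~80 MB/s  ≈ 1900 MB/s best-case streaming reads across the pool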
 

msignor

Dabbler
Joined
Jan 21, 2015
Messages
28
For the workload the OP has, where he needs high-rate random reads, that is absolutely true. You can't beat mirrors for high-speed random IO.
When you have a large volume of storage where the reads are more sequential than random, that is when RAID-Z (some level greater than 1) is more appropriate.


What you probably need to do is get that other shelf, make it all mirrors, and just use your primary storage the way you originally intended: movies, etc. The 12 disks in a single vdev are never going to give you the speed you need for your usage. Don't forget to bump your RAM up too.

Thanks. I did research about the max drives per ZVOL and most people said 12 is the absolute max, with fewer being preferred. Again, the initial purpose was storage capacity and media.

However, the part that really perplexes me, and that I have yet to completely understand, is that benchmarking the array seems completely normal and accurate. When I run a heavy workload it turns to hell, but even then my reads are still way above 100MB/sec. How would I identify which cache gets filled, and how do I augment it? I know that a 2TB file is not fitting into RAM completely; there is definitely disk I/O going on. From what I have read, I am actually par for the course for 32GB of RAM (the maximum) with ARC hits at 90-93%. Is the issue that I need more ARC?
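To identify which cache is doing the work under load, the ARC counters can be watched from the FreeNAS shell; a minimal sketch, assuming the standard FreeBSD ZFS sysctls (check the exact names on your build):

sysctl kstat.zfs.misc.arcstats.size
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
arc_summary.py    # bundled summary script, if present on your FreeNAS version

Comparing hits vs. misses while the miner is running shows how much of the read load is actually being absorbed by RAM.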

Would more ZVOLs increase my 4k reads and writes? I am curious why those are so low, too.
 

msignor

Dabbler
Joined
Jan 21, 2015
Messages
28
Having read Allan Jude's book, FreeBSD Mastery: ZFS, I'd say you really want to be using mirrored vdevs for the most IOPS. A RAIDZn vdev will only be as fast as its slowest disk.

My problem is that I am not getting consistent results even for a single disk. A single disk capable of 100+MB/sec reads and writes would be fine by me for this, but that just doesn't seem to be the case.
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
Have you tried turning off sync to see if it improves anything?

zfs set sync=disabled tank/zvol

Replace tank/zvol with your pool and ZVOL name.
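To check what a dataset is currently set to, and to undo the change later, the matching read-back commands are (same placeholder names, standard ZFS syntax):

zfs get sync tank/zvol
zfs inherit sync tank/zvol    # revert to inheriting the default (standard)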

Edit: Fixed command ;) Thanks @Stux
 
Last edited:

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Thanks. I did research about the max drives per ZVOL and most people said 12 is the absolute max, with fewer being preferred. Again, the initial purpose was storage capacity and media.

However, the part that really perplexes me, and that I have yet to completely understand, is that benchmarking the array seems completely normal and accurate. When I run a heavy workload it turns to hell, but even then my reads are still way above 100MB/sec. How would I identify which cache gets filled, and how do I augment it? I know that a 2TB file is not fitting into RAM completely; there is definitely disk I/O going on. From what I have read, I am actually par for the course for 32GB of RAM (the maximum) with ARC hits at 90-93%. Is the issue that I need more ARC?

Would more ZVOLs increase my 4k reads and writes? I am curious why those are so low, too.
You have your terminology confused. More vdevs (virtual devices) in the pool make the pool faster.
Here is an article that might help explain. http://constantin.glez.de/blog/2010/06/closer-look-zfs-vdevs-and-performance

I thought I already told you that having one big RAIDz2 vdev with all 12 drives clumped together will make it slow. There is another discussion about the same topic in another thread right now:
https://forums.freenas.org/index.ph...file-vs-folder-s-transfers.57738/#post-407639
 

msignor

Dabbler
Joined
Jan 21, 2015
Messages
28
You have your terminology confused. More vdevs (virtual devices) in the pool make the pool faster.
Here is an article that might help explain. http://constantin.glez.de/blog/2010/06/closer-look-zfs-vdevs-and-performance

I thought I already told you that having one big RAIDz2 vdev with all 12 drives clumped together will make it slow. There is another discussion about the same topic in another thread right now:
https://forums.freenas.org/index.ph...file-vs-folder-s-transfers.57738/#post-407639

Thanks. I am not trying to be dense about the vdev and the 12 disks. I just thought that "slow" was relative, in the sense that you don't get the combined IOPS of multiple drives but you would at least get the IOPS of one single drive. Am I misinterpreting that? Is there more to it that I am perhaps missing? I will check out those links now.

Would you be able to help me identify why clearing the caches on FreeNAS (starting and stopping the iSCSI service) yields different results? I know what a cache is and why it's faster, etc., but is that where something like an S3700 SSD would come in? Is this also why 4k reads and writes are garbage?

EDIT: OK. After reading the linked materials, I guess I just have to accept this for what it is: a single disk's worth of performance. I suppose I had an expectation that if a disk can do 100MB/sec writes and 200MB/sec reads then I should be able to achieve that throughput regardless of the process, and that ultimately everything I am seeing is due to RAM and buffers. I just can't wrap my head around how reading back could be so horrible in this case. I can copy GBs worth of files over the LAN at gigabit speed with SMB, no problem (say a 64GB file, for example). With iSCSI I get literally half that speed in writes.

I also don't get why, if I start with a clean slate and a freshly restarted iSCSI service, I get good performance. Is this again down to the disks? I know this is pointless, but other people are mining this coin with 14 drives over USB3 and getting 50-60MB/sec reads per drive. That is why I initially said I would pass the disks through 1:1: no loss of data and the same gain in IOPS. I don't really care about the data; it's garbage anyway, so losing a disk is no big deal.
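For a sense of scale, a rough back-of-envelope (my own figures; assuming a 7200RPM disk manages on the order of 100-150 random IOPS, a typical ballpark rather than a measurement of these drives):

1 RAID-Z2 vdev      ≈ the random IOPS of roughly one member disk ≈ 100-150 IOPS
150 IOPS x 4 KiB    ≈ 0.6 MB/s of truly random 4k reads served from the disks
anything above that is the ARC, or read-ahead on sequential stretches, helping out

Which is why the 4k numbers collapse once the working set stops fitting in cache, even though big sequential copies still look fine.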
 
Last edited:

msignor

Dabbler
Joined
Jan 21, 2015
Messages
28
Have you tried turning off sync to see if it improves anything?

zfs set sync=off tank/zvol

Replace tank/zvol with your pool and ZVOL name.
Just tried - no improvement.

EDIT: BTW, my options were standard, always, and disabled.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
More vdevs improve random IO; RAID-Z2 vdevs have good sequential IO.

Another alternative is to just get a bunch of SSDs for your 2TB data file.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419

msignor

Dabbler
Joined
Jan 21, 2015
Messages
28
More vdevs improve random IO; RAID-Z2 vdevs have good sequential IO.

Another alternative is to just get a bunch of SSDs for your 2TB data file.
Got it. I unfortunately have about 6 or 7 2TB files, so going to have to go for a sli
Goes to show how often I don't set that parameter :D

I figured as much :smile: but I did actually try it, and realized I didn't provide any information past that...

When going to disabled, I actually seem to get more "real world" performance, around the 60MB/sec mark on writes. When I put it back to standard, I get the same.

When I restart the iSCSI service on FreeNAS I get over 150MB/sec writes (see pic) - but I know it's just caching somewhere.

Is that caching in RAM, or somewhere else? How can I expand it? Also, any comment on the 4k performance?
 

Attachment: Capture.PNG (benchmark screenshot, 27.7 KB)

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
OK, so now that we have established that sync=disabled works for you, you might be a perfect candidate for a SLOG device. While the pool you have set up is sub-optimal (mirrored vdevs will give you better random IO), you can at least improve it with a SLOG.

A SLOG device, however, cannot be just any old SSD. Ideally it must have power-loss protection in order to safeguard your data from (you've probably guessed already) power loss. I suggest you take a look at Intel's SSD offerings, as some of them have this feature.
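Once a suitable SSD is in the box, attaching it as a SLOG is a one-liner; a sketch with placeholder names (the pool, device, and partition names are assumptions, not from this thread):

zpool add tank log ada4p1
zpool add tank log mirror ada4p1 ada5p1    # or mirror the log device for extra safety
zfs set sync=standard tank/zvol            # re-enable sync writes so the SLOG is actually used

Note that a SLOG only accelerates synchronous writes; it does nothing for the random-read side of this workload.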
 

msignor

Dabbler
Joined
Jan 21, 2015
Messages
28
OK, so now that we have established that sync=disabled works for you, you might be a perfect candidate for a SLOG device. While the pool you have set up is sub-optimal (mirrored vdevs will give you better random IO), you can at least improve it with a SLOG.

A SLOG device, however, cannot be just any old SSD. Ideally it must have power-loss protection in order to safeguard your data from (you've probably guessed already) power loss. I suggest you take a look at Intel's SSD offerings, as some of them have this feature.

So after setting this, whatever write speeds I had went to total hell. At least before I had bursts of 80MB/sec, and now I can't seem to break 5MB/sec. I set it back to standard, and it is still the same. Very odd.

The part that I don't understand now is that I can write to the iSCSI disks from a local SSD, inside the same Windows VM, at 80-90MB/sec, but when writing and reading this plot data it is horrible. Is this still related to sequential vs. non-sequential access and the underlying RAID pool type?

Also, how can I confirm what the current sync setting is?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The speed you are talking about, "80/90 MB/sec", is much slower than it could be if you didn't have those 12 disks laid out as a single large RAID-Z2 vdev. I have 12 disks in both of my systems and laid them out as two vdevs for the sake of greater speed. On my 10Gb network they will transfer large files at around 550MB/s, and they completely max the wire speed on the 1Gb network. Many small files slow everything down; even when you have high capacity for sequential transfer, small files kill the performance. So the thing that may be causing you more trouble than you realize is the kind of file access you are dealing with. I know you said it is a big file, but you also said the goal is to search for data inside the file, and that changes things: it isn't like just moving one big file, which would be fast, it is more like the work of accessing many small files.
Like @Stux said above, you might need to put the data for this on SSDs, or on a bunch of mirrors like I suggested, to get reasonable performance.
Also, a SLOG device might help. Your current hardware is just not going to do it, not without making some changes.
 