iSCSI irregular performance

Status: Not open for further replies.

Reinier

Cadet
Joined: Dec 12, 2015 · Messages: 7
Hi, I recently started with ESXi, FreeNAS, pfSense and some Windows VMs.
I develop software and offer some services (online backup), and was looking for an energy-efficient way to separate my data from the customers' data.
So I got a Xeon D-1540, 32GB DDR4 ECC, an LSI SAS 3008 and 3x 4TB WD Red Pro drives, plus an SSD to boot from and store the VMs on.
So far so good :)
FreeNAS is installed, the SAS card is flashed to IT mode (version 9 firmware; green light in FreeNAS), and I created a CIFS share and a zvol (iSCSI). The iSCSI target is passed on to ESXi and can be given to one VM as a drive (once I get the problem solved, I will pass this to the DMZ VM).
The CIFS share is accessible from the LAN (where the FreeNAS server is located as well).
This way my data and the customers' data are nicely separated.
The VM has 8GB RAM, and FreeNAS also has 8GB RAM. I tested with 16x 1GB zip files. The network controller is an Intel I350 (igb driver, 1500 MTU).

The problem:
iSCSI seems to give irregular performance.

I did some tests (please tell me if the performance is as expected or too low, and where I should look to find the cause!):

SSD -> SSD, same PC: 145 MB/s
SSD PC1 -> SSD VM: 112 MB/s (network speed)
SSD VM -> CIFS share: 250-260 MB/s
CIFS share -> SSD VM: 150-200 MB/s
SSD VM -> iSCSI: 20-250 MB/s *irregular
iSCSI -> VM: 20-250 MB/s *irregular
CIFS share -> SSD PC1: 80-90 MB/s (over the network)

The most concerning thing is the irregular iSCSI speed.
upload_2015-12-13_11-15-8.png


If you look at the disk transfer rate, you can see that it drops to 0 several times.

How do I go about solving this?

Thanks for the help,

Reinier
 

jgreco

Resident Grinch
Joined: May 29, 2011 · Messages: 18,680
So if you've got 3 x 4TB drives, you've got, what, an 8TB RAIDZ1 pool? We don't really recommend RAIDZ1 because there's a lot of opportunity for loss of redundancy. Also, it creates a single vdev, so performance tends to be reduced.

The big burst of speed suggested by the first part of that copy graph is saying that the NAS is accepting data at a pretty good clip and then hits a point where the pool isn't keeping up. You have a transaction group being committed, another fills, things stall until the first one's committed, repeat. Classic sawtooth graph results.

How full is this pool? For block storage purposes, you REALLY want your pool to remain at less than (possibly far less than) 50% capacity for optimal performance. Also, you really want your RAM to be sized at a MINIMUM of 2GB-per-TB of the pool size, preferably more like 5GB-per-TB if you have a smallish amount of RAM. My guess is that you're badly breaking these rules.
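(For scale, applying those rules to the roughly 8TB pool here: the 2GB-per-TB minimum works out to about 16GB of RAM, and the 5GB-per-TB figure to about 40GB, versus the 8GB currently assigned to FreeNAS.)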

You can make things somewhat better by reducing the txg parameters; this results in smaller txg's and more even write performance, but also usually somewhat lower overall write throughput. Set vfs.zfs.txg.synctime_ms to 200 and vfs.zfs.txg.timeout to 1 (I don't suggest playing with these numbers beyond that).
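As a rough sketch of what that looks like from the FreeNAS CLI (vfs.zfs.txg.timeout can be changed on the fly; vfs.zfs.txg.synctime_ms may be read-only at runtime on some versions, in which case set it as a tunable and reboot):

# check the current values
sysctl vfs.zfs.txg.timeout vfs.zfs.txg.synctime_ms

# txg.timeout can be changed immediately
sysctl -w vfs.zfs.txg.timeout=1

# try the same for synctime_ms; if the sysctl is read-only, add it as a tunable instead
sysctl -w vfs.zfs.txg.synctime_ms=200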
 

Reinier

Cadet
Joined: Dec 12, 2015 · Messages: 7
Hi Jgreco,
Thanks for your input.

I know RAIDZ1 is not ideal, and I am considering getting one more drive to switch to RAIDZ2. That would still be a single vdev, right?
The current vdev contains only 500GB of data (3.47TB free).
The CIFS share contains 1.1TB of data (1.71TB free).

I will try your solution.

I take it this only affects the iSCSI and not the CIFS performance?

Looking at CIFS performance I get these graphs:

SSD VM-> CIFS: 250 MB/s
upload_2015-12-13_11-59-22.png


and CIFS -> SSD
upload_2015-12-13_12-0-40.png


So there are sawtooth events here, but also a constant transfer rate
(the sawtooth in the pic above is the SSD, so that's OK).

Therefore I would think that the hard disks can handle the speeds just fine, and it is the iSCSI that goes wrong?

I agree that more RAM would be nice for FreeNAS to have, but I think 8GB should be enough for the 7.5TB storage pool.
For a test I could increase it to 16GB, but I would prefer to use that RAM for the other VMs.

The CIFS-to-CIFS copy speed is really bad as well: 38 MB/s
upload_2015-12-13_12-14-2.png


Is the boost at the start due to the 8GB of RAM that FreeNAS uses as cache?

Greetings and thanks for any input/comment/advice!
 

jgreco

Resident Grinch
Joined: May 29, 2011 · Messages: 18,680
> Hi jgreco,
> Thanks for your input.
>
> I know RAIDZ1 is not ideal, and I am considering getting one more drive to switch to RAIDZ2. That would still be a single vdev, right?

Yes.

> The current vdev contains only 500GB of data (3.47TB free).

You may mean "zvol" here. Your pool is built out of vdevs. You have just the one vdev. If you've created a 4TB zvol, that's about as much space as you should use on the pool, at least if you need performance to remain vaguely acceptable. It will get slower over time as ZFS has to work harder to find free ranges of disk space to allocate.
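As a quick way to see how this adds up (pool and zvol names below are placeholders; substitute your own):

zpool list tank                                     # ALLOC / FREE / CAP show how full the pool is overall
zfs get volsize,used,referenced tank/iscsi-zvol     # zvol size vs. how much it actually occupies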

With my previous statement in mind, I'd note that you don't have any more space for:

> The CIFS share contains 1.1TB of data (1.71TB free).

How this will impact performance depends on what's been done to the pool so far.

> I will try your solution.
>
> I take it this only affects the iSCSI and not the CIFS performance?

No, your pool's performance affects both, but ZFS will find it easier to optimize the CIFS.

> Looking at CIFS performance I get these graphs:
>
> SSD VM -> CIFS: 250 MB/s
> View attachment 9546
>
> and CIFS -> SSD
> View attachment 9547
>
> So there are sawtooth events here, but also a constant transfer rate
> (the sawtooth in the pic above is the SSD, so that's OK).
>
> Therefore I would think that the hard disks can handle the speeds just fine, and it is the iSCSI that goes wrong?

Again, https://forums.freenas.org/index.ph...res-more-resources-for-the-same-result.28178/

> I agree that more RAM would be nice for FreeNAS to have, but I think 8GB should be enough for the 7.5TB storage pool.

You're allowed to think what you wish. But let me put it this way: My VM filer here has a 14TB pool that delivers 7TB of usable space. I've chosen to give it 128GB of RAM.

> For a test I could increase it to 16GB, but I would prefer to use that RAM for the other VMs.

Of course you would, as would we all. The upside to ZFS is that you can get some amazing performance out of it. The downside is that you have to throw mad amounts of resources at it.
 

Reinier

Cadet
Joined: Dec 12, 2015 · Messages: 7
I used tunables to set these two parameters (vfs.zfs.txg.synctime_ms to 200 and vfs.zfs.txg.timeout to 1) via sysctl and rebooted.

iSCSI -> SSD VM:
upload_2015-12-13_12-59-43.png



SSD VM -> iSCSI: 80 MB/s
upload_2015-12-13_13-1-19.png


Just to test the memory requirements, I gave FreeNAS 20GB instead of 8GB:

iSCSI -> SSD VM: 110 MB/s. Nice and constant disk transfer rate.

upload_2015-12-13_13-21-22.png



Terrible performance when I copy from the SSD VM to the iSCSI target:
Note the transfer rate. PS: the total amount of data being copied is less than the RAM on the FreeNAS box...
upload_2015-12-13_13-17-29.png


How can I find out what causes these delays? The device is busy, but the transfer rate is 0.
 


jgreco

Resident Grinch
Joined: May 29, 2011 · Messages: 18,680
What's the fragmentation like on the pool? ("zpool list")
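(Something along these lines, with the pool name as a placeholder; on reasonably recent FreeNAS versions the output includes a FRAG column:)

zpool list tank    # look at the FRAG and CAP columns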
 

depasseg

FreeNAS Replicant
Joined: Sep 16, 2014 · Messages: 2,874
I don't see a SLOG device in your configuration.

You can watch the disk I/O performance on FreeNAS at the CLI with "zpool iostat -v Datastore". This will give you per-drive performance stats. Keep an eye on it while making a transfer. You can also try running "zilstat" to see if there is a hang-up there.
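A minimal sketch, assuming the pool is called Datastore as above (adjust the name to your pool):

# per-vdev / per-disk throughput, refreshed every second while the copy runs
zpool iostat -v Datastore 1

# ZIL (sync write) activity, sampled every second
zilstat 1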
 

Reinier

Cadet
Joined: Dec 12, 2015 · Messages: 7
So I thought, let's see how a mirrored configuration would do. (don't laugh!)
So I updated the BIOS and firmware on the LSI card to version 10 and created a mirrored pool.
That performance was totally bad (32 MB/s).
I then checked each drive individually, and they do 154 MB/s.

This confirms that the time spent on FreeNAS is not wasted :)

So I reinstalled the version 9 firmware and reinstalled FreeNAS.

About the SLOG and ZIL, what devices should I get to implement these?
Can you give me a brand and size/type? Will any SSD do?

Thanks.
 


Reinier

Cadet
Joined: Dec 12, 2015 · Messages: 7
I got an additional hard disk, so I changed the config to RAIDZ2.
I am happy with the CIFS performance, but still not quite happy when I use iSCSI.
I don't like the drops in transfer rate.



upload_2015-12-18_10-56-46.png

ZIL:
upload_2015-12-18_10-56-28.png


It looks as though the writing of the ZIL causes the drop in performance. That does make sense.

Looking at the latencies:
upload_2015-12-18_11-6-52.png


Would a SLOG take care of these latencies/drops in transfer rate?

And what size should the SLOG be? Is a 60GB SSD sufficient, or should I go for a 250GB SSD?
SLC SSDs are out of my price range though...

Or can I influence this by some tunable?
You mentioned: vfs.zfs.txg.synctime_ms to 200 and vfs.zfs.txg.timeout to 1
<warning: noob question ahead> What type do I set these as? Loader, rc.conf or Sysctl?

Thanks for the feedback,

Reinier
 


jgreco

Resident Grinch
Joined: May 29, 2011 · Messages: 18,680
The txg.timeout in more recent versions of FreeNAS may already have been lowered to 1. AFAICT it was always stupid to have it much larger; larger is good for some specific uses but smaller gives more consistency.

Open up a CLI and do "sysctl vfs.zfs.txg.timeout" to check. It's possible to modify this on the fly with "sysctl -w vfs.zfs.txg.timeout=1" from the CLI and as such it'd count as a "sysctl" style tunable.

For iSCSI, the thing to check is whether or not you're writing sync. You can do "zfs get sync ${poolname}/${zvolname}" to check; "zfs set sync=always ${poolname}/${zvolname}" to force sync on, or set sync=disabled to force it off. A SLOG device only helps with sync writes, so if your pool is slow with sync=disabled, a SLOG can only hurt you further. A SLOG only helps where you are pushing out lots of sync writes to a pool that honors them, and where there isn't already one.
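A concrete sketch (pool/zvol names are placeholders for your own):

zfs get sync tank/iscsi-zvol             # shows standard, always, or disabled
zfs set sync=disabled tank/iscsi-zvol    # test only: ignores sync requests, unsafe for data you care about
zfs set sync=standard tank/iscsi-zvol    # back to the default behaviour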

The Intel 750 400GB is probably the most budget-friendly SLOG device.

But I'm guessing the real issue is RAIDZ. iSCSI with RAIDZ is almost always a bad performer because the thing that ZFS shines at is storing files in long runs of blocks. Block storage is always more dicey, as you're updating little teeny blocks here and there, and this is forcing ZFS to micromanage small allocations, which translates into lots of I/O on lots of drives for even the smallest write.
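If you ever rebuild the pool with block storage in mind, the usual layout is striped mirrors rather than RAIDZ; purely as an illustration (hypothetical device names, and this would destroy the existing pool):

zpool create tank mirror da0 da1 mirror da2 da3   # two mirror vdevs; small-block IOPS scale with the number of vdevs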
 

Reinier

Cadet
Joined: Dec 12, 2015 · Messages: 7
Thanks for the tips.

The timeout (default) is set to 5. I changed it to 1.

Sync is set to Default
upload_2015-12-18_15-2-8.png



When I disable the sync (just for the test), I see this:
upload_2015-12-18_15-25-27.png

The latency in the ESXi monitor maxes out at 100 ms.

I should state that the copy action shown here is from the CIFS share (FreeNAS) to iSCSI (through ESXi back to FreeNAS).
The speed still seems to fluctuate:
upload_2015-12-18_15-30-43.png


Setting the vfs.zfs.txg.synctime_ms to 200 results in this:
upload_2015-12-18_15-38-18.png


Let's reset sync to default:
upload_2015-12-18_15-44-15.png


What is clear is that when sync=default...

When there are lots of small files, performance drops drastically:
upload_2015-12-18_15-50-55.png


Any suggestions or things I can try?
 
