SMB/iSCSI NAS, mostly for VM images - possible optimizations?

Lately I've been looking for possible optimizations for a small-company NAS, since my users complain about iSCSI performance. I'd like to hear some expert opinions on this.

Hardware: Haswell Xeon E3-1230 on a Supermicro X10 board, 16 GB RAM (the maximum the board supports), 6 × 4 TB WD Red in RAIDZ2, no dedicated ZIL or L2ARC devices, external USB drive (see below)

Software: FreeNAS 11 U1, no VMs, one jail (Bacula client for LTO backups)

Configuration (the interesting part):
- pool utilization is about 40% (checked as shown below)
- SMB/CIFS and NFS services active
- one NFS export (a plain directory, not a dataset; no longer in use)
- 6 SMB/CIFS shares (datasets)
- iSCSI (see below), on a dedicated Gbit VLAN
- about 1500 snapshots across the SMB datasets, no snapshots on the iSCSI extent datasets
- ZFS replication to a USB 3.0-attached 8 TB disk (purely for backup/safety reasons); I have to use USB since all my SATA ports are taken
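For reference, the utilization number above comes straight from zpool; something like this should reproduce it (the pool name "tank" is a placeholder for our actual pool):

    # overall pool usage, capacity and fragmentation
    zpool list -o name,size,alloc,free,cap,frag,health tank

    # per-vdev I/O while the VMs are busy, refreshed every 5 seconds
    zpool iostat -v tank 5

The FRAG column is interesting here, given my fragmentation suspicion further down.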

Originally we used the NAS as a CIFS file server only, but at some point we needed iSCSI ("what do you do with all the storage anyway?").

Now our VirtualBox server stores all its VM images on the NAS. It mounts one iSCSI target with the VM images on it; about 10 VMs use that target and about 5 use another one.

All in all, four machines use iSCSI: two for VM images, one for the mail server (mailboxes, queue, etc.) and one just for data. All of the VM iSCSI targets live in a common dataset. Each target is split into several extents because we wanted to test link aggregation, but we never ended up using it; the targets are still split into multiple LUNs. All the extents are files.
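One thing I only realized while writing this: since the extents are plain files, they inherit the recordsize of the dataset they live in (128K by default), which from what I've read can be a poor match for small random VM I/O. Checking is easy; the dataset name here is made up as an example:

    # our file extents live in one dataset; check its block size and caching
    zfs get recordsize,compression,primarycache tank/iscsi

    # and the extent files themselves
    ls -lh /mnt/tank/iscsi/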

I just discovered that my LUN/RPM settings for the extents are incorrect. Is this an issue?

We don't have any issues with sequential reads or writes. But with several active VMs doing heavy I/O, performance decreases noticeably, e.g. when compiling on several machines in parallel. ARC size is about 12 GB; the ARC hit ratio is fairly constant at about 27%.

Once per minute I see a demand_metadata ARC request spike of about 2,000 requests. It used to be about 4,000 until I deleted roughly half of the snapshots and adjusted the auto-snapshot tasks accordingly.
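For the record, the ARC numbers above come straight from the FreeBSD kstat sysctls; something like this is what I've been watching (the counters are cumulative, so I diff them over time):

    # current ARC size and target size
    sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c

    # the demand_metadata counters behind the once-a-minute spikes
    sysctl kstat.zfs.misc.arcstats.demand_metadata_hits
    sysctl kstat.zfs.misc.arcstats.demand_metadata_misses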

EDIT: When I deleted half of the snapshots (ca. 1500), we noticed that iSCSI performance got noticeably better!

Before I start flipping switches at random, I'd like to hear from the community. Perhaps there's a best practice for this scenario. My thoughts so far:
- SMB/CIFS is a given; I can't change anything about that.
- The NAS doesn't have enough RAM for a separate ZIL/L2ARC SSD to pay off, and I can't add more.
- Should I use one extent per target instead?
- Should I use one iSCSI target per VM instead?
- Should I use one dataset per target? From what I've read, none of these should make much of a difference.
- Should I use device extents instead of file extents (see the sketch after this list)? From several threads in this forum I gathered that it shouldn't make much difference in speed either.
- Maybe it's not an iSCSI issue but a fragmentation issue. Should I copy/replace the images regularly?
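If I do try device extents, my understanding is that each extent becomes a zvol instead of a file, roughly like this (pool/dataset names and the size are placeholders, and the volblocksize is just my guess at a sensible value for VM images):

    # sparse 200 GB zvol to back one VM's device extent
    # (volblocksize must be set at creation time; it can't be changed later)
    zfs create -s -V 200G -o volblocksize=16K tank/iscsi/vm1

    # the iSCSI extent would then point at the device node:
    # /dev/zvol/tank/iscsi/vm1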

Are there test tools I can run to locate the bottleneck? I know about iostat, zilstat, arcstat, etc., but I have no idea about iSCSI debugging.
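The closest I've come to iSCSI-side numbers so far is watching the disks and the CTL layer directly; this is roughly what I've been running (ctlstat ships with FreeBSD and reports CAM Target Layer, i.e. iSCSI, statistics; flags are from memory, so double-check the man pages):

    # per-disk busy %, queue depth and latency, physical providers only
    gstat -p -I 1s

    # CAM Target Layer (iSCSI) statistics, refreshed every second
    ctlstat -w 1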

As you can see, I'm a newbie when it comes to iSCSI, so sorry for the dumb questions.

-Walter
 