Something isn't adding up.

Hi all,

I've been using ZFS for a few years on NexentaStor CE and finally got fed up with being ignored, so I've moved to FreeNAS. I've also done a lot more reading on ZFS in the meantime, specifically on how it handles performance when getting close to its size limit (I'm at 88% of 10.9TB). So I started looking at my setup, but I can't get things to add up.

All my data is on 5 separate ZVOLs, all of which were created sparse (4x2TB and 1x1TB) and connected to hosts via iSCSI.
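For reference, the command-line equivalent of creating one of these sparse zvols would be something like the following (names and sizes match my listings below; this is just for context, so treat it as illustrative):

Code:
# -s makes the zvol sparse (thin-provisioned), so no refreservation is set aside
zfs create -s -V 2T store/homeserver_main1
# the smaller 1TB one
zfs create -s -V 1T store/icrashplan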

Code:
[root@freenas] ~# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
store                   6.40T   761G   180K  /mnt/store
store/homeserver_main1  1.46T   761G  1.46T  -
store/homeserver_main2  1.46T   761G  1.46T  -
store/homeserver_main3  1.46T   761G  1.46T  -
store/homeserver_main4  1.46T   761G  1.46T  -
store/icrashplan         566G   761G   566G  -


So that's a total of 6.4TB used, and in theory 9TB allocated across the zvols.

But...

Code:
[root@freenas] ~# zpool list
NAME    SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
store  10.9T  9.60T  1.29T    88%  1.00x  ONLINE  /mnt


Where has the 9.6TB come from? Have I suddenly lost 3TB? Or does that figure include the total allocated size of the ZVOLs? If so, why 9.6TB and not 9TB? And why am I seeing 761GB free when I actually have 1.29TB free in the pool?

I've clearly done something wrong/weird... But what? :)

The bottom line to all this is: do I need to delete some stuff from the host that has these drives mounted via iSCSI, since I'm getting close to 90% utilization?
 

jgreco
Sparse is dangerous for iSCSI. You can get into a situation where ZFS fills and is unable to allocate blocks, which basically appears as write errors to the iSCSI initiator, which can turn into an unrecoverable mess.

The 9.6TB is the amount of used space. You have a pool consisting of 10.9TB of actual storage. Would that be six 2TB drives in RAIDZ2? Your "zfs" command lists 6.4TB used, while the pool shows that consuming 9.6TB, which is a 50% increase in space. 4TB drives haven't been available for all that long, so it seems to me like RAIDZ2 on six 2TB drives; otherwise it must be RAIDZ1 on three 4TB drives.
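You can check both of those guesses yourself. Something along these lines (zvol name taken from your listing) will show the vdev layout and whether a given zvol is sparse, since a sparse zvol has no refreservation:

Code:
# show how the pool's vdevs are laid out (raidz1/raidz2 and disk count)
zpool status store
# a sparse zvol reports a refreservation of "none"; compare volsize against used
zfs get volsize,refreservation,used store/homeserver_main1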

The free space reporting would be consistent with a one-third parity arrangement as well.

ZFS with iSCSI will not work well at 88% utilization. Especially as you write things to the pool, vdev fragmentation will increase rapidly, and such a high utilization percentage dramatically increases the stress and rate at which fragmentation will occur. The normal ZFS rule of thumb to keep utilization less than 80% is probably insufficient for iSCSI. I suspect a better number would be around 60%, to reduce the pace at which fragmentation occurs.

Of course, if you have an application that only writes stuff out to the iSCSI vdev disk once, and is then almost exclusively reads, that would weigh heavily in favor of a higher safe percentage utilization. As well, the presence of sufficient ARC and possibly L2ARC can substantially offset the issue of fragmentation, by reducing the number of IOPS that need to be fulfilled by the pool.
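If you want a rough idea of how well the ARC is already absorbing your read load, the FreeBSD sysctl counters are one place to look; for example:

Code:
# current ARC size plus cumulative hit/miss counters (FreeBSD/FreeNAS sysctl names)
sysctl kstat.zfs.misc.arcstats.size
sysctl kstat.zfs.misc.arcstats.hits
sysctl kstat.zfs.misc.arcstats.misses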
 
Thanks for the reply!

A bit of history (and more detail) might be useful here to explain how I got to where I am now, as it's been a bit of a trek.
  • Birth - N36L with 8GB RAM and 4x2TB disks, running NexentaStor CE. RAIDZ1 with 1 hot spare.
  • Teenage years - Added an N40L running OpenFiler and 3x1TB disks via iSCSI, again in RAIDZ1, so now we have two vdevs in a single pool.
  • Twenties - 3x1TB disks replaced with 3x2TB disks, pool is now 7x2TB disks, split across 2 vdevs in RAIDZ1 and 1 hot spare. At this point, all data is mounted under two CIFS shares.
  • Thirties - Got fed up with rubbish CIFS performance and ACL fights between Windows and OSX on the same share. Created thin ZVOLS (as I didn't have sufficient space for thick ones), mounted them via iSCSI to a W2K12 server and copied everything across. Set pool and ZVOLS to sync writes always. Everything runs fine (but not hugely fast). Remote OpenFiler drives are the bottleneck.
  • Mid-life crisis (where I am now) - Upgrade time! N36L gets 16GB, a dedicated (Intel) NIC for iSCSI to the W2K12 server (also with a dedicated Intel NIC), added an eSATA card and a 4-bay enclosure, migrated the drives off OpenFiler into the enclosure and moved to FreeNAS. Also got a 64GB SSD as it was cheap and I was thinking of using it as SLOG or L2ARC.

I now have my N36L with 16GB RAM and all disks local. My usage is mainly sync reads or writes - the pool is just photos, backups, my iTunes library, etc., so it's write once in one go, then lots of reads.

So, to get back on topic... I didn't realise the zpool list output was raw rather than usable. That makes sense to me now: I have 7.2TB usable (3.6TB per vdev x 2), which is 10.8TB raw. I clearly need to clear some space and reduce what's in use, but (getting back to the original question) if I remove files from the NTFS filesystem on the W2K12 server, will FreeNAS know that space is now free and reduce the "in use" portion of the pool? Or is it the case that it's now been allocated to the ZVOLs and can't be reclaimed? With this in mind, could I, in theory, wipe the filesystem on the W2K12 server clean but still have a 90% full pool on my FreeNAS box?

How do I get myself out of this mess? :)
 

jgreco
You've just described the problem every SSD has, and why TRIM was invented. By using a SAN protocol, you've removed the ability of ZFS to understand the filesystem organization of the data you're storing.

SCSI was designed to handle communications between a host and a storage device, and is basically a fairly simple "store block X"/"read block X" protocol.

iSCSI does almost the same thing, just over IP.

An operating system sees an iSCSI device as a block level storage device. So it just pushes blocks at it. To remove a file, typically an operating system merely deletes the directory entry and updates the free list metadata blocks to indicate that the file's blocks are no longer allocated.

Worse, even if you were to write all zeroes to the block, ZFS will not re-"sparse" it, even though this could be a useful operation under some circumstances. Sadly, ZFS uses a different block allocation strategy than just all-512-byte blocks anyway.

So. Couple of thoughts.

1) You could enable compression on one of the zvols, then go to the W2K12 server and fill every last free byte of the corresponding filesystem with zeroes. This forces a fresh write of the blocks that NTFS believes are free, and compression should handle them efficiently, especially if you use zle. Then turn compression off again (optionally). As long as there was more than a trite amount of space to recover, that ought to help. If it does, do it on the remainder of them. (There's a rough sketch of this after the list below.)

2) You can create a new zvol and copy over all the filesystem level data. Since you don't have enough space for a non-sparse zvol this is kind of a lot of trouble to go to, especially several times.

3) You could break down and do what you really ought to do if you want to keep going about things this way, which is to buy some larger drives and get rid of the sparse zvols. Be sure to get enough disk to comply with the 80% rule. You might be able to accomplish something similar by reducing the number of volumes you have, I don't know.
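For option 1, the rough sequence would be something like the following. The zvol name comes from your listing; the Windows step assumes some tool that zeroes out free space (SysInternals SDelete is one option), so treat this as a sketch rather than a recipe:

Code:
# on FreeNAS: enable cheap zero-detecting compression on the zvol
zfs set compression=zle store/homeserver_main1

# on the W2K12 server: zero all free space on the corresponding NTFS volume,
# e.g. with SysInternals SDelete (drive letter is just an example):
#   sdelete -z D:

# back on FreeNAS: see how much space was recovered, then optionally turn compression off
zfs get used,referenced,compressratio store/homeserver_main1
zfs set compression=off store/homeserver_main1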

But really, if you want to make the best use of ZFS, the problem here is that you are using it as a SAN in an application where the tradeoffs do not appear to add up. People who use ZFS as an iSCSI SAN for an ESXi cluster... that makes sense: VMFS is a cluster-aware filesystem and multiple initiators necessitate the use of iSCSI. But here, what's the value of what you are doing over simply attaching the disks directly to your W2K12 server with a good RAID controller?

I know you already tried CIFS and were disappointed. Are you sure you wouldn't be better off with using something like NFS?
 
I think you've basically crystallized what has been going round in my head in various forms - that iSCSI wasn't the way to go and I need bigger disks :) The problem was that I was fed up with a couple of issues and, at the time, iSCSI represented a solution. However, now I'm 12 months down the road... Hindsight is a wonderful thing!

NFS looks like a good trade-off: I can mount a dataset on the Windows box and gradually copy files across until I can remove a ZVOL, then the next one, and so on.
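Roughly what I have in mind, sketched out (the dataset name is just an example, and the NFS share itself would be set up through the FreeNAS GUI):

Code:
# create a plain dataset to share over NFS (name is just an example)
zfs create store/media

# ...copy the data across from the Windows box over NFS...

# once one of the zvols has been emptied, destroy it to hand the space back to the pool
zfs destroy store/icrashplan
zfs list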

Thanks very much for your help!
 

jgreco
You will find that ZFS with NFS provides a very pleasant experience, unless perhaps you have a client that isn't fully compatible. For example, on OS X, I had to bludgeon file locking support to off due to occasional problems. Be sure to use multiple datasets if you want to be able to set quotas or keep track of space on your filer. Best to experiment for a bit. Also be aware that FreeNAS 9 will have different NFS support.
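If you do split things into multiple datasets, per-dataset quotas are just a property; something like this (dataset name is hypothetical):

Code:
# hypothetical dataset; the quota caps how much space it can consume
zfs create store/photos
zfs set quota=2T store/photos
# check usage against the quota
zfs list -o name,used,avail,quota store/photos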
 
I've just discovered you need "mount -o nolock" to make Windows NFS work. I was about to make another post until I saw a footnote in the manual :)
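For anyone else who ends up here, the Windows Services for NFS client command ended up looking something like this (the hostname, export path and drive letter are just examples from my setup):

Code:
rem Windows "Client for NFS" mount; nolock disables NFS file locking
mount -o nolock \\freenas\mnt\store\media Z: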

I never managed to figure out the magic set of mount commands on OSX to make NFS work consistently with NexentaStor, which was again part of the reason for going iSCSI to the Windows server and then SMB from there. I'll happily trade performance for "it just working".

Looks like I'll be reconfiguring my XBMC boxes to point to another share yet again..!
 

jgreco
Seriously, then, try seeing if FreeNAS 9 will work with Win NFS locking. Any NFS admin will tell you that locking interoperability is the bane of NFS.
 
Not sure I'm ready for a beta yet! The limited bit of info I could google about NFS in FreeNAS 9 didn't mention anything that looked like it might be useful?

And I can easily imagine it's a horrible thing to try and make work across many systems and implementations. One of the reasons for not doing NFS originally was I just didn't understand most of it!
 