96% used, should have been far less? (How is it all calculated?)

Joined: Jan 28, 2016 · Messages: 4
Hello,

I am a bit confused. I have been migrating data from a physical server with 4 physical data disks, one 2TB volume per disk (a total of 8TB of data, formatted as NTFS).

My target is a ProLiant MicroServer with 4 physical 4TB disks, no hardware RAID configured (that is best handled by ZFS), and 10 GB of RAM.
On this machine I have a running installation of "FreeNAS-9.3-CURRENT-201503161938".

The 4 disks are visible in the FreeNAS GUI and are set up as a RAIDZ1.
After that, 4 zvols were created, 2TB each.

These zvols are shared over iSCSI, four volumes in all.

There was no problem mounting them on the source server, partitioning them and formatting them at 2TB each.

Writing to them was no problem either; performance was good and everything was going smoothly, or so I thought.

But when I had copied around 7.8TB from the source server, suddenly all the iSCSI disks dropped.
When I look in the FreeNAS GUI I see "CRITICAL: The capacity for the volume 'Datavolym' is currently at 96%, while the recommended value is below 80%."

But... why?
And how is the "Used" figure calculated in the volume view? (see below)

I had thought that four 4TB disks in RAIDZ1 would be sufficient to store 8TB of data, but apparently (with my configuration) it was not.

With best regards

Johan



[Attachment: volumes.png (FreeNAS volume view showing the Used column for 'Datavolym')]
 

jgreco · Resident Grinch · Joined: May 29, 2011 · Messages: 18,680
RAIDZ1 is not a good choice for storing block data, because you end up losing a lot of space to parity. Parity in ZFS does not work the way it does in RAID5: it is allocated per block written, not once per stripe across the whole disk. With some RAIDZ configurations you can cause ZFS to start consuming extra disk space at a frantic rate just by writing small blocks of data.

I expect that's why your datasets appear to be using more than 2TB of space, but it's been a while since I've done that on RAIDZ, so it's a bit of a guess.

Also, you really shouldn't try to store that much data on a pool. Filling your pool causes ZFS severe problems allocating new blocks, and the pool will fragment fairly quickly as additional read/write cycles occur. Try to keep utilization below 50% if you can.
 
Joined: Jan 28, 2016 · Messages: 4
Thank you for your answers!

To pirateghost: no, I have not enabled any snapshots; that was my first suspicion too.

To jgreco: but what are you using, if not RAIDZ?

I suspected that block allocation had something to do with it.
I have used ZFS before, but only on Solaris, and always with RAIDZ2 and up to 36 spindles.
Both NFS and iSCSI, and I have never had this type of problem before.

So I suspect this combination (4 spindles, RAIDZ1 and iSCSI) is a dead end then?
 

jgreco · Resident Grinch · Joined: May 29, 2011 · Messages: 18,680
Thank you for your answers!

To pirateghost: no, I have not enabled any snapshots; that was my first suspicion too.

To jgreco: but what are you using, if not RAIDZ?

Usually you want to use mirrors for block storage protocols.

I suspected that block allocation had something to do with it.
I have used ZFS before, but only on Solaris, and always with RAIDZ2 and up to 36 spindles.
Both NFS and iSCSI, and I have never had this type of problem before.

Oh sure they have. It's inherent in how RAIDZ works.

[Attachment: RAIDZ-small.png (diagram of RAIDZ1 allocations, showing data, parity and padding sectors across five spindles)]


Look here. This is a basic RAIDZ1 setup with five spindles, showing what actually goes onto disk, after compression.

In the first (orange) example, 40K gets written out: 8 4K sectors of data plus 2 4K sectors of parity. This exactly matches what many people think of as "RAIDZ1", because if you filled an entire disk this way you would lose exactly one disk's worth of space to parity.

But next, the system wants to write a 12K record (yellow), which is three sectors' worth of data, so it generates a parity block. This uses slightly more parity than you might expect: instead of one parity sector per four data sectors, you now get one per three.

We repeat with green.

Worse, we now hit the reddish one. ZFS wants to write a 4K record, and it has to emit an entire parity block to protect that single 4K record; that is one parity sector per one data sector.

Even worse, there are cases where ZFS will not allocate without also padding: for example, it won't allocate five sectors (purple, with the X), because it wants to discourage fragmentation by never leaving a hole too small to be reused.

Code:
Sectors of data    Sectors used on disk
1                  2
2                  4
3                  4
4                  6
5                  6
6                  8


And this gets worse with RAIDZ2, because the parity requirement per allocation is higher. RAIDZn is not good at storing small amounts of data; it is very efficient at storing long runs of data, though.

This gets even more complicated thanks to compression, which can offset some of the parity losses when you're storing highly compressible data.
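
To make the table above concrete, here is a small sketch of that allocation arithmetic, modeled loosely on the size calculation ZFS performs for RAIDZ vdevs: one parity sector per row of data sectors, with the total then padded up to a multiple of (nparity + 1). The 6-disk width is an illustrative assumption (wide enough that these small allocations fit in a single row), and compression is ignored.

Code:
import math

def raidz_sectors(data_sectors, ndisks, nparity):
    """Rough model of a RAIDZ allocation: data plus one parity sector per
    row of (ndisks - nparity) data sectors, then padded up to a multiple
    of (nparity + 1) so no unusably small hole is left behind."""
    rows = math.ceil(data_sectors / (ndisks - nparity))
    total = data_sectors + rows * nparity
    return total + (-total) % (nparity + 1)

# Reproduce the RAIDZ1 table above, and show RAIDZ2 for comparison,
# assuming an illustrative 6-disk-wide vdev with 4K sectors:
for data in range(1, 7):
    z1 = raidz_sectors(data, ndisks=6, nparity=1)
    z2 = raidz_sectors(data, ndisks=6, nparity=2)
    print(f"{data} data sector(s): RAIDZ1 uses {z1}, RAIDZ2 uses {z2}")

Run with ndisks=4 (the original poster's pool, where only three disks' worth of each row is data), the same arithmetic puts an 8K zvol block in 4 sectors (16K) and a 16K block in 6 sectors (24K) on disk, before compression, which is roughly where the extra space in the "Used" column goes.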

So I suspect this combination (4 spindles, RAIDZ1 and iSCSI) is a dead end then?

No, it's just not going to work as well as you hoped.
 

Starpulkka · Contributor · Joined: Apr 9, 2013 · Messages: 179
This is something I have also noticed. If you copy data from a Windows NTFS HDD to an HDD of exactly the same size running, for example, an ext3 filesystem with the same block size (I tried with journaling both on and off), you can still end up with a full disk. I remember 10 years ago, when I used Linux-based NAS devices, Windows would hold more data regardless of how the different filesystem types reported their free space.
 

jgreco · Resident Grinch · Joined: May 29, 2011 · Messages: 18,680
Yeah, and a modern 4K Advanced Format drive can hold fewer 512-byte files than an older 512-byte-sector drive of the same overall size. There are lots of choices made along the way trading efficiency against performance. With ZFS, we're blessed with the ability to go two different ways, optimized for two different workload styles.
 
Joined: Jan 28, 2016 · Messages: 4
Oh sure they have. It's inherent in how RAIDZ works.

Hello,

Thank you for your answers, most informative!

Yes, you are correct; I have now copied the same dataset to a Solaris server,
and it shows quite a bit of overhead as well (though a little less, see below).

Solaris:

Code:
NAME             USED   AVAIL  REFER
arkiv/filarkiv1  2.41T  6.21T  2.41T
arkiv/filarkiv2  2.41T  6.21T  2.41T

FreeNAS:

Code:
NAME              USED   AVAIL  REFER
Datavolym/Arkiv1  2.67T  0      2.67T
Datavolym/Arkiv2  2.77T  0      2.77T

But this Solaris pool "arkiv" consists of two RAIDZ2 vdevs (2 x 9 disks),
and it is running an old version of ZFS.

No, it's just not going to work as well as you hoped.

Sad.
But if I go for mirroring instead, I need some 16TB of raw capacity to achieve 8TB of usable capacity?
And I should not fill the iSCSI volumes to more than 50%?
That means I need 32TB of raw capacity (in this chassis, 4 x 8TB disks).

Is this correct?

But the resulting 8TB of usable space will probably not be sufficient to store 8TB of NTFS data,
given that some further overhead will probably occur?

So is this achievable at all with this 4-disk chassis?
(unless I fill the volumes to more than 50%?)

With best regards

Johan
 

jgreco · Resident Grinch · Joined: May 29, 2011 · Messages: 18,680
If you wish to sustain good performance levels on an iSCSI volume, you need to leave lots of free space. If you don't care about performance, you can probably fill to 70-80% before things get really bad.

Four 8TB disks as two 2-way mirror vdevs give you 16TB of pool space. There is no loss of space to parity or padding with mirrors, so the amount of pool space is a fixed 16TB, not the "sorta approximately about" numbers that one gets with RAIDZ. Of that, I commonly suggest using no more than half. However, leaving compression on means you'll probably compress your data somewhat, and you are likely to find yourself in a good place.
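
As a back-of-the-envelope check on the capacity side, here is a small sketch of that arithmetic. It assumes 8TB drives sold in decimal terabytes and uses the 50% and 70-80% figures from this thread as the fill targets; nothing here accounts for compression.

Code:
# Rough capacity arithmetic for 4 x 8TB disks arranged as two 2-way mirrors.
TB = 1000**4            # drives are sold in decimal terabytes
TiB = 1024**4           # ZFS tools report binary units

disks = 4
disk_size = 8 * TB
mirror_vdevs = disks // 2               # two 2-way mirror vdevs
pool_bytes = mirror_vdevs * disk_size   # mirrors: half the raw space is usable

print(f"Raw capacity:     {disks * disk_size / TiB:5.1f} TiB")
print(f"Pool capacity:    {pool_bytes / TiB:5.1f} TiB")
print(f"50% fill target:  {0.5 * pool_bytes / TiB:5.1f} TiB  (keeps iSCSI fast)")
print(f"80% fill:         {0.8 * pool_bytes / TiB:5.1f} TiB  (expect trouble past here)")

That 50% target works out to roughly 7.3 TiB, which is almost exactly what 8TB of source data occupies, so compression is what provides the breathing room here.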

This is what happens to the speed of ZFS as a pool fragments. As you can see, things remain much faster on pools that are lightly filled.

[Attachment: delphix-small.png (graph of ZFS write speed versus pool fill level as the pool fragments)]
 
Joined: Jan 28, 2016 · Messages: 4
If you wish to sustain good performance levels on an iSCSI volume, you need to leave lots of free space. If you don't care about performance, you can probably fill to 70-80% before things get really bad.
Four 8TB disks as two 2-way mirror vdevs give you 16TB of pool space. There is no loss of space to parity or padding with mirrors, so the amount of pool space is a fixed 16TB, not the "sorta approximately about" numbers that one gets with RAIDZ. Of that, I commonly suggest using no more than half. However, leaving compression on means you'll probably compress your data somewhat, and you are likely to find yourself in a good place.

Many thanks for your answers, jgreco!
Very pedagogical and informative.

I will give mirrors a try !

With best regards

Johan Palsson
 