Writing halted on iSCSI zpool


Bitwise

Cadet
Joined
Nov 18, 2014
Messages
5
I am running 9.2.1.8-RELEASE x64 with 6x3TB drives in a RAIDZ2 pool. I set up an iSCSI target that allocates all but 800k of the ZFS pool. This config has been running fine for about a year. Now I am getting alerts that the capacity of my pool is at 98%. The pool is 10.3TB and ~5TB are in use by a Windows box through iSCSI. At this point I cannot write anything to the drive from Windows due to an I/O error, and the FreeNAS box is incredibly slow. It is also running a scrub at the moment; when I try to stop the scrub, it says: "cannot cancel scrubbing Poolz: out of space"

My questions are:
1. I have 10.3 TB allocated to that iSCSI target; can I use all of it, or am I out of space at the halfway point?
2. Was I supposed to leave some space in the pool for FreeNAS?
3. (This is a broad one) What path do I need to take to get this thing running again?

Capture.PNG
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
A pool should never be allocated above 80%, and absolutely unequivocally NEVER EVER above 95%. So no, you should have set your iSCSI extent to not be more than 80%. For performance reasons you really shouldn't go above 50%.

The answer is to destroy the pool and start over. Not sure if you have backups of your data or not, and you may or may not be able to make backups because the pool is 100% full. Things go very wonky on a CoW filesystem when filled to 100%. Feel free to search the forums to see how it's fubared quite a few people that didn't know this very simple but very important piece of information.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yeah, it's pretty much the ZFS nightmare scenario where it is having trouble allocating space because it's all fragmented. The blocks reporting as free might all be too small to be useful or usable.

Block storage seems to encounter eventual problems (e.g. "it worked for a year!") when you pass around 60%, but fragmentation and performance effects might start to affect the pool at much lower percentages. It has been slowing down on you for a while, but you might not have noticed.

I would suggest doubling the amount of HDD space. Depending on how much damage has been done, you might or might not be able to recover what is on the Windows volume. Most filesystems do not expect to get random errors when trying to write random blocks, since a physical disk probably won't exhibit that type of behaviour unless it is (almost?) totally failed, so most filesystems cope poorly or not at all with that. If you were to replace all the drives with 6TB drives, or add another vdev to the pool, it might be recoverable.

I'm contemplating the deployment of an iSCSI FreeNAS appliance here and basically I'm intending to overprovision it with space by a massive amount, because the disks are cheap and the pain of failing to do so is really awful.
 

Bitwise

Cadet
Joined
Nov 18, 2014
Messages
5
OK, to be clear: I have 18TB of storage (6 drives @ 3TB each) that gets reduced down to 8.24TB (80% of 10.3TB) of usable space?? What if I were to delete a bunch of stuff on the drive; would that make it usable again so I could back up and fix things?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
*if* you can delete it (and delete enough of it to make it matter), yes.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The real problem is that with iSCSI, "deleting" files on the disk just causes NTFS (or whatever filesystem) to mark blocks as free in its free-block list; it does not actually free up space on the array. Counter-intuitively, this involves more writing and possibly no actual freeing.

Since you're on 9.2.1.8, it is possible that your configuration might allow one of several things that could help fix this. Basically, if you have compression enabled, or whatever the option is that causes zero-filled disk blocks to be registered as free in the zvol, then using a Windows tool that zero-overwrites a file prior to deleting it might be helpful.
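For example, Sysinternals' sdelete can zero out the free space on the initiator side. This is just a sketch of the idea (the drive letter is a placeholder, and I haven't tested it against your exact setup); it only helps if the zvol is actually compressing or discarding zeroed blocks:

Code:
:: run from an elevated command prompt on the Windows box;
:: E: stands in for whatever drive letter the iSCSI disk is mounted as
sdelete.exe -z E: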
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
The new iSCSI target in 9.3 (and the experimental one in 9.2.1.9) supports the SCSI UNMAP command. If used by the initiator, it should allow pool space to be reclaimed when files are deleted from NTFS on the iSCSI LUN. That is a "must have" feature if you are going to do over-provisioning. Though for some reason Windows seems to record UNMAP support at the time of filesystem creation, so it may require recreating the volume to start using it.

Also, 9.3 supports the SCSI thin-provisioning threshold feature, which should notify the initiator if backing-pool usage reaches a critical level.

Also make sure that your zvol has a block size of no less than 16K, otherwise storing it on a 6-disk RAIDZ will be quite space-inefficient. If the data are compressible, you may want an even bigger block size.
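If you want to check whether the Windows side will actually issue UNMAP at all, I believe the usual way is fsutil; a reported value of 0 means delete notifications (and therefore UNMAP/TRIM) are enabled:

Code:
fsutil behavior query DisableDeleteNotify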
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Ah yes, that's another thing, that's right. All these newfangled things.
 

Bitwise

Cadet
Joined
Nov 18, 2014
Messages
5
OK, I was able to recover everything off the iSCSI extent. I upgraded to 9.2.1.9-RELEASE x64 and destroyed the zvol. One thing has me confused: in post #2 you mentioned that a pool should never be over 80%. In rebuilding, should I create the pool at 100% and the iSCSI extent at 80%, or create the pool at 80% and the iSCSI target at 100%, or are my numbers totally off?

I have 6 x 3TB drives, which gives me around 10.9 TB of space. Would I allocate all of that to a zvol and create my iSCSI extent at 8TB?
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
The zvol size itself is not what is really critical here. What is critical is the space it occupies on the pool, which depends on several factors. If you are using snapshots, or if your zvol block size is too small (less than 16-32KB) to be stored efficiently on your 6-disk RAIDZ2, the zvol may still consume more pool space than you specified and still overflow the pool. To be on the safe side you should set the zvol block size to at least 16KB to use RAIDZ2 space efficiently (set it higher if you want better compression), and set the zvol size so that, without compression, with all planned snapshots and with some metadata overhead, it always fits within 80% of pool capacity.

On the other side, if you are storing well-compressible data and/or your iSCSI initiators support UNMAP to return unused space, you may create a so-called "thin-provisioned" zvol with a size even bigger than your pool capacity. But in that case you should track pool space usage much more carefully, and add more disks to your storage if at some point you get close to the threshold. Thin provisioning is not so interesting if a single zvol occupies all your storage space. But if you have several zvols, or you are using the same storage for some other things too, it can be a very flexible and efficient solution.
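As a rough sketch of what that looks like from the command line (in FreeNAS you would normally create the zvol from the GUI so it is registered in the config database; the pool/zvol names and sizes below are just placeholders):

Code:
# fixed-size zvol: ~8 TiB with 16K blocks, staying under ~80% of a 10.9 TiB pool
zfs create -V 8T -o volblocksize=16K -o compression=lz4 ZPool/iscsi0

# or a thin-provisioned ("sparse") zvol that only consumes pool space as data is written
zfs create -s -V 12T -o volblocksize=16K -o compression=lz4 ZPool/iscsi0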
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
And note that 80% is way too full for a busy volume with lots of writing.
 

Bitwise

Cadet
Joined
Nov 18, 2014
Messages
5
I am a bit of a noob here, so please help me to understand. I bought 6 x 3TB drives to run in a RAIDZ2 configuration. I figured this would give me around 12TB of space to write to. When FreeNAS configures the volume, it gives me 10.9TB. From what you are telling me, I only get to use 5.5TB (50%) of my (total) 18TB?? Why would anyone use this over RAID6? That is 13TB of overhead as opposed to 6TB. Am I not seeing the big picture?

My main goal is to attach every bit of usable space to a Windows box to serve document data. I chose iSCSI so I could use NTFS permissions. I don't need to make snapshots as the data is all backed up in a few places, one of them being offline storage. I chose FreeNAS for the ZFS infrastructure. The FreeNAS box is directly connected to the Windows box because I do not need to further share anything directly from the FreeNAS box or use any of the other plugins. If I wanted to use closer to that 10TB size do I need to be doing something different?

Again, I am not trying to talk down to anyone here. I am just trying to understand what I have got myself into and I appreciate any support you can offer me.

Below is a screenshot of the current config. I have the "ZPool" spanning those 6 drives with compression turned on. The iSCSI zvol is using 32k blocks with compression turned on. I have an iSCSI extent mapped to this zvol and I turned on the experimental target to use the UNMAP feature.

Capture.PNG
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
6-disk RAIDZ2 by definition gives the capacity of 4 disks. 3TB disks have about 2.7TiB of capacity each. Please don't be confused by the difference between the marketing 18TB and ZFS's 16.3TiB: TB is not equal to TiB! Four such disks should give about 10.9TiB of reliable space. The 10.9TiB of usable capacity reported by ZFS is in fact the maximum you can get from those disks in any possible RAID with two-disk redundancy.

Theoretically you can use most of those 10.9TiB for data. But you should understand that you are running the most trivial block storage on top of a very complicated copy-on-write filesystem. You may not use many of the benefits of CoW, but you still have to pay the price. And the price is that the filesystem has to write data each time to a new place, which creates huge potential for data fragmentation. To help ZFS mitigate that fragmentation you should always keep some reserve of free space on your pool. According to the latest presentations, ZFS performance drops heavily once you reach 90% pool usage, and the more space you leave, the better. How much more depends on your workload: if you usually write and delete large files in a sequential fashion, you can possibly go higher (~80% usage); if you are doing many small random writes, you should reserve more space.

What I am telling you is that you should not use more than 80% of pool capacity (about 8TiB) for continuous periods of time. If your data are compressible, that may mean more user data by the factor of compression. Your comparison of 5.5TiB on the Windows side to 18TB of raw disks is like comparing "green" and "soft".

In my understanding, your original problem could be a combination of two mistakes: too small a zvol block size (IIRC FreeNAS before 9.3 uses 8K by default, which for a 6-disk RAIDZ2 gives only 50% storage efficiency instead of the expected 66%), and lack of UNMAP support (which was not allowing ZFS to free pool space when you deleted files on the Windows side).
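Roughly, the arithmetic looks like this (ignoring metadata overhead):

Code:
6 x 3 TB raw           = 18 x 10^12 bytes ~= 16.4 TiB raw
RAIDZ2 usable          = 4 x 2.73 TiB     ~= 10.9 TiB
80% usage ceiling                         ~=  8.7 TiB
50-60% for busy iSCSI                     ~=  5.5 - 6.5 TiB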
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Bitwise said:
From what you are telling me, I only get to use 5.5tb (50%) of my (total) 18tb?? Why would anyone use this over RAID6? ... If I wanted to use closer to that 10TB size do I need to be doing something different?

You get 12 "TB", as defined by drive manufacturers (12 * 10^12). Every OS on the planet reports x * 2^y. Guess which number is larger (hint: someone has an incentive to fudge with how much storage you think you get).

For iSCSI, count on keeping 50% free on any CoW filesystem (including ZFS). Of course, there's no reason to use iSCSI if you don't need to serve a virtual disk. Permissions work fine via CIFS and the whole setup is easy enough for home/small business scenario.

Some advantages of ZFS:

- An open source implementation that is properly developed and supported, instead of some shady hardware with even shadier firmware doing even shadier calculations.
- CoW and its associated advantages (snapshots, for one)
- Integration of disk and file management
- Focus on data integrity at all costs

Some disadvantages:

- Said costs are hardware (RAM!) and disk space.
- Not as flexible as crummy NAS units (Synology and similar).
 

Bitwise

Cadet
Joined
Nov 18, 2014
Messages
5
Thanks for that. I was aware of the TB vs TiB difference and used 18 for generalization (I knew it was wrong). This was originally an 8.x box, so it most likely did have an 8K block size and was not using UNMAP. It was upgraded as the new versions came out.

Based on the answers above, by allocating 8TiB out of my 10.9TiB pool, I would be using roughly 73% of pool space. That should suffice, correct?
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Yes, that should suffice. But I would still recommend tracking pool space usage when you start uploading the data, just in case we missed some other factors.

If at some point later you see that, because of efficient compression, you have too much free space available on the pool while Windows says the disk is full, you can always increase the zvol size. Growing is easier than shrinking.
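Growing it later is a one-line change on the FreeNAS side (the zvol name below is a placeholder); after that you extend the NTFS partition from Windows Disk Management:

Code:
zfs set volsize=10T ZPool/iscsi0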
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
mav said:
In my understanding, your original problem could be a combination of two mistakes: too small a zvol block size (IIRC FreeNAS before 9.3 uses 8K by default, which for a 6-disk RAIDZ2 gives only 50% storage efficiency instead of the expected 66%), and lack of UNMAP support (which was not allowing ZFS to free pool space when you deleted files on the Windows side).

Actually, it could be worse than 50%. 8K divided across the four data drives leaves 2K for each drive. But if it is a 4K/ashift=12 pool, then another 2K on each drive is wasted. So it would be more like 33%, wouldn't it?

Although, if a block is small enough, doesn't RAIDZ just mirror it? Maybe a 4K block would just be triple-mirrored.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
AFAIK it works differently. With an 8K zvol block size and 4K disk blocks (ashift=12), RAIDZ2 will write 2x4K of payload data on the first two disks and 2x4K of parity data on another two disks; the space on the two remaining disks stays empty and AFAIK can be reused by other writes. So 8K of parity per 8K of payload gives me 50%.
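Spelled out for a single 8K block:

Code:
8K logical block, ashift=12 (4K sectors):
  payload = 2 x 4K = 8K   (two disks)
  parity  = 2 x 4K = 8K   (two more disks)
  on-disk total   = 16K for 8K of payload  ->  50% efficiency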
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Would that scenario still be able to recreate the data if the first two disks (the ones with the actual data blocks) were removed? If not, then it wouldn't qualify as RAIDZ2.
 