zfs receive terminated "out of space" with 400GB to go but zpool list says it has 4.25TB spare?

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
From zfs receive (last few lines):

Code:
receiving incremental stream of mypool/archive0@snapshot_20180611-090000 into mypool/archive0@snapshot_20180611-090000
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of mypool/archive0@snapshot_20180611-091500 into mypool/archive0@snapshot_20180611-091500
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of mypool/archive0@snapshot_20180611-093000 into mypool/archive0@snapshot_20180611-093000
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of mypool/archive0@snapshot_20180611-094500 into mypool/archive0@snapshot_20180611-094500
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of mypool/archive0@snapshot_20180611-100000 into mypool/archive0@snapshot_20180611-100000
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of mypool/archive0@snapshot_20180611-101500 into mypool/archive0@snapshot_20180611-101500
received 312B stream in 1 seconds (312B/sec)
receiving full stream of mypool/archive1@snapshot_20180424-000000 into mypool/archive1@snapshot_20180424-000000
received 459KB stream in 3 seconds (153KB/sec)
receiving incremental stream of mypool/archive1@snapshot_20180425-000000 into mypool/archive1@snapshot_20180425-000000
received 875KB stream in 1 seconds (875KB/sec)
receiving incremental stream of mypool/archive1@snapshot_20180426-000000 into mypool/archive1@snapshot_20180426-000000
cannot receive incremental stream: out of space

Code:
# zpool list
NAME          SIZE   ALLOC  FREE   EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
mypool        18.1T  13.8T  4.25T  -         37%   76%  2.66x  ONLINE  /mnt
freenas-boot  37G    3.80G  33.2G  -         15%   10%  1.00x  ONLINE  -


Out of space?!
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
What are you doing exactly?
Simple full backup via CLI of the data pool on my main server to a newly set-up backup server (both 11.1-U5). Hardware is good, see spec in signature (similar for both).

The main server has 28TB raw capacity with 14.1TB used in the main pool; at 2.6x dedup that's equivalent to about 35TB of non-dedup (referenced) data. The backup server is a clean install of 11.1-U5 with a slightly smaller pool (20TB raw capacity, 18TB formatted). The backup server was fresh, in the sense of no existing pool or data, clean bare disks. I used the ZFS CLI to replicate, because I'm not running periodic snaps in the GUI and so can't use the FreeNAS GUI replication tasks.

Once setup was done, I ran standard commands to fully back up the pool:

On main server: zfs send -vvDRLe mypool@latest_snapshot_name
On new backup server: zfs receive -vvFsd mypool
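
A minimal sketch of the kind of pipeline involved, assuming SSH as the transport between the two boxes (the actual transport isn't stated above, and "backup-host" is a placeholder):

Code:
# sketch only: SSH assumed as the transport, "backup-host" is a placeholder
zfs send -vvDRLe mypool@latest_snapshot_name | ssh root@backup-host "zfs receive -vvFsd mypool"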

It worked fine almost to the end. It spent several days copying 24/7 and got through all but the last dataset (35.1 of 35.5TB), then aborted today partway through that final dataset: the dataset was small, about 100GB of its 500GB had been copied, so only around 400GB remained. But the backup pool isn't full. It has only 13.8 of 18.1TB allocated (76% capacity) and 4.25TB unused, plenty of space free (see 1st post); there's no other activity or stored data, no deleted data hiding away and spoiling the accounting, and the disks were empty when this started with nothing else done on the box since. It may run slowly at that fill level, by switching to space-saving allocation, but it's certainly not short of space, and it's not short of RAM for metadata tables etc.
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
The backup server was fresh, in the sense of no existing pool or data, clean bare disks. I used the ZFS CLI to replicate, because I'm not running periodic snaps in the GUI and so can't use the FreeNAS GUI replication tasks.

I'm asking nevertheless: Are there any reservations set in datasets contained in the receiving pool? AFAICT the value of the ALLOC property shown by zpool list does not account for the reservation and refreservation zfs properties of datasets.

Does the output of
zfs list -r -o space mypool
or alternatively
zfs list -r -o name,avail,used,usedsnap,usedds,usedrefreserv,reserv,usedchild mypool
give some insight?
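
Reservations can also be listed directly; a sketch along these lines (using the standard reservation/refreservation properties) should show anything set anywhere in the receiving pool:

Code:
# list reservation/refreservation for every filesystem and zvol in the pool
zfs get -r -t filesystem,volume -o name,property,value reservation,refreservation mypool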
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
I'm asking nevertheless: Are there any reservations set in datasets contained in the receiving pool? AFAICT the value of the ALLOC property shown by zpool list does not account for the reservation and refreservation zfs properties of datasets.

Does the output of
zfs list -r -o space mypool
or alternatively
zfs list -r -o name,avail,used,usedsnap,usedds,usedrefreserv,reserv,usedchild mypool
give some insight?
I haven't used or set any reservations myself. It looks like one iSCSI zvol uses them, but it's the same size in both the original and the backup pool. The output columns of the 2nd command include everything from the 1st command, so I haven't posted the 1st command's output. I've used Excel to make it easier to read and edited out the actual names, but the data (including sub-dataset order/nesting) is intact. The two outputs are almost identical and nothing stands out; the few slight discrepancies are just files added or deleted on the original pool since the backup started, and those later changes are small.

The entry for /ds1/ds10 looks odd, but it's the same in the source, so maybe it's correct. I've rebooted the backup and want to check which datasets, snaps and sizes are reported once it comes back and everything is fully mounted, but there are no obvious or unexpected values in that list, apart from ds10 maybe? And it's hard to tell whether that one is right or not, beyond the fact that it matches the original.

[attached image: reservations.png]

I also took a look at zdb's view of the blocks and DDT usage, to see if that gives a clue. Original on the right (black), backup on the left (red). As you can see, they're almost identical: no sign of pool exhaustion, and no sign of the copied files being stored larger than they were in the original.

[attached image: ddt2.png]
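
The exact zdb invocation isn't shown above; roughly, commands like these produce this sort of block/DDT summary (a sketch, and zdb can be slow on a pool this size):

Code:
# dedup table summary and histogram
zdb -DD mypool
# block statistics for the whole pool (walks all metadata, can take a long time)
zdb -bb mypool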


UPDATE:

I've left the above untouched in case I'm wrong, but I think I know what's up. The copy is probably faithful and the backup pool genuinely uses about the same space and block distribution as the original, but the 4.25TB of free space reported by zpool list is probably about 3.19TB too optimistic, if the refreservation on the iSCSI share isn't being counted and is the culprit.

So that would give the pool a "real" committed space of roughly 13.8T (zpool list ALLOC) + 3.19T (refreservation) = ~17.0T, or about 93% full with only ~1.1T actually available. Even allowing for rounding and slop space that's fine on its own, but adding another 400GB would bring it to 17.4/18.1 = ~96% and run into the slop limit.
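
As a rough check, assuming the usual slop reservation of 1/32 of the pool (spa_slop_shift=5), the numbers stack up something like this:

Code:
# slop        ~ 18.1T / 32                         ~ 0.57T
# usable      ~ 18.1T - 0.57T                      ~ 17.5T
# committed   ~ 13.8T (ALLOC) + 3.19T (refreserv)  ~ 17.0T
# headroom    ~ 17.5T - 17.0T                      ~ 0.5T   -> another ~400GB won't fit

If that's right, zfs list mypool should show well under 1T in its AVAIL column, since unlike zpool list's FREE it does subtract reservations and slop.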

It fits, that sounds like the problem. I can test by adding another disk, and trying the resume token magic, I reckon...?
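
Since the receive was run with -s, the interrupted dataset should be holding a receive_resume_token; resuming would look roughly like this (dataset name taken from the log in the first post, and SSH again assumed as the transport):

Code:
# on the backup server: read the token left by the interrupted receive
zfs get -H -o value receive_resume_token mypool/archive1
# on the main server: restart the stream from that token
zfs send -v -t <token-value-from-above> | ssh root@backup-host "zfs receive -vs mypool/archive1"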

What doesn't make sense then is the original pool. I added another 4TB of raw capacity (about 3.8TB formatted) just before the backup. The original pool and the backup both differ in free space by about 4.25TB or so, which doesn't match the 3.19T of that reservation and is also more than the capacity I just added (and it worked before). Strange. I'll look into it now that I have some idea where to start, but it may take a bit of time to figure out more.

Actually, it won't. *Initiates a 2 x 3.19TB zfs destroy -r of the iSCSI NTFS zvol which had the reservation, waits for the apocalypse...* and we'll see what the score is on token resume after that.
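
For what it's worth, a rough sketch of confirming (or releasing) that reservation without a destroy, with the zvol path as a placeholder:

Code:
# confirm how much space the refreservation is holding back
zfs get refreservation,usedbyrefreservation mypool/path/to/zvol
# or keep the zvol but release the reserved space
zfs set refreservation=none mypool/path/to/zvol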
 