Sudden Loss of ZFS Capacity [Bug?]

GregP

I have two FreeNAS 8.3.0 x64 systems, each with 6 disks. Each system's 6 drives are organized as two ZFS volumes: 3x 2TB drives and 3x 3TB drives on each. Each volume on freenas1 replicates to freenas2. This is done by taking a Periodic Snapshot of each volume every other day and then replicating it; the snapshot expiration is set to 1 week. Each volume's child datasets snapshot and replicate recursively. These two servers have been stable since the day after 8.3.0 was released.

Suddenly, the ZFS volume on freenas1 with the 3x 3TB disks is reporting the wrong size: it thinks it has no free space, and every attempt to write to that volume over CIFS fails for lack of space. The disks themselves are fine, the daily run outputs are interesting, and a reboot doesn't solve the issue.
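For anyone cross-checking, these are roughly the commands I used from an SSH shell to reproduce the numbers in the daily reports below (pool name is mine; I'm assuming this ZFS version supports the "zfs list -o space" shorthand):

df -h                           # same numbers as the "Disk status" section
zpool list                      # pool-level SIZE/ALLOC/FREE, as in "status of zfs pools"
zfs list -o space -r VRaidZwd   # per-dataset breakdown of USED vs USEDSNAP vs USEDDS on the bad pool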
Today's daily report from the freenas1 box:
Disk status:
Filesystem Size Used Avail Capacity Mounted on
/dev/ufs/FreeNASs2a 926M 378M 473M 44% /
devfs 1.0k 1.0k 0B 100% /dev
/dev/md0 4.6M 3.3M 965k 77% /etc
/dev/md1 823k 2.5k 755k 0% /mnt
/dev/md2 149M 18M 118M 14% /var
/dev/ufs/FreeNASs4 19M 1.7M 16M 9% /data
VRaidZsamsung 2.5T 942M 2.5T 0% /mnt/VRaidZsamsung
VRaidZsamsung/dsCIFS 3.6T 1.1T 2.5T 31% /mnt/VRaidZsamsung/dsCIFS
VRaidZwd 218k 218k 0B 100% /mnt/VRaidZwd <--------------------------
VRaidZwd/RemoteBrian 218k 218k 0B 100% /mnt/VRaidZwd/RemoteBrian
VRaidZwd/RemoteGreg 175k 175k 0B 100% /mnt/VRaidZwd/RemoteGreg
VRaidZwd/RemoteStan 223k 223k 0B 100% /mnt/VRaidZwd/RemoteStan
VRaidZwd/dsAFP 10M 10M 0B 100% /mnt/VRaidZwd/dsAFP
VRaidZwd/dsAFP/TM2 191k 191k 0B 100% /mnt/VRaidZwd/dsAFP/TM2
VRaidZwd/dsCIFS 887G 887G 0B 100% /mnt/VRaidZwd/dsCIFS

Last dump(s) done (Dump '>' file systems):

Checking status of zfs pools:
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
VRaidZsamsung 5.44T 1.67T 3.77T 30% 1.00x ONLINE /mnt
VRaidZwd 8.12T 8.00T 130G 98% 1.00x ONLINE /mnt <----------------------

all pools are healthy
___________________________________________________________________
Two days ago I had:
Disk status:
Filesystem Size Used Avail Capacity Mounted on
/dev/ufs/FreeNASs2a 926M 378M 473M 44% /
devfs 1.0k 1.0k 0B 100% /dev
/dev/md0 4.6M 3.3M 965k 77% /etc
/dev/md1 823k 2.5k 755k 0% /mnt
/dev/md2 149M 18M 118M 14% /var
/dev/ufs/FreeNASs4 19M 1.6M 16M 9% /data
VRaidZsamsung 2.5T 942M 2.5T 0% /mnt/VRaidZsamsung
VRaidZsamsung/dsCIFS 3.6T 1.1T 2.5T 31% /mnt/VRaidZsamsung/dsCIFS
VRaidZwd 196G 218k 196G 0% /mnt/VRaidZwd
VRaidZwd/RemoteBrian 196G 218k 196G 0% /mnt/VRaidZwd/RemoteBrian
VRaidZwd/RemoteGreg 196G 176k 196G 0% /mnt/VRaidZwd/RemoteGreg
VRaidZwd/RemoteStan 196G 224k 196G 0% /mnt/VRaidZwd/RemoteStan
VRaidZwd/dsAFP 196G 10M 196G 0% /mnt/VRaidZwd/dsAFP
VRaidZwd/dsAFP/TM2 196G 192k 196G 0% /mnt/VRaidZwd/dsAFP/TM2
VRaidZwd/dsCIFS 1.1T 968G 196G 83% /mnt/VRaidZwd/dsCIFS

Last dump(s) done (Dump '>' file systems):

Checking status of zfs pools:
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
VRaidZsamsung 5.44T 1.67T 3.77T 30% 1.00x ONLINE /mnt
VRaidZwd 8.12T 7.71T 424G 94% 1.00x ONLINE /mnt

all pools are healthy
_________________________________________________________________
Four days ago:

Backing up package db directory:

Disk status:
Filesystem Size Used Avail Capacity Mounted on
/dev/ufs/FreeNASs2a 926M 378M 473M 44% /
devfs 1.0k 1.0k 0B 100% /dev
/dev/md0 4.6M 3.3M 965k 77% /etc
/dev/md1 823k 2.5k 755k 0% /mnt
/dev/md2 149M 18M 118M 14% /var
/dev/ufs/FreeNASs4 19M 1.6M 16M 9% /data
VRaidZsamsung 2.5T 942M 2.5T 0% /mnt/VRaidZsamsung
VRaidZsamsung/dsCIFS 3.6T 1.1T 2.5T 31% /mnt/VRaidZsamsung/dsCIFS
VRaidZwd 472G 218k 472G 0% /mnt/VRaidZwd
VRaidZwd/RemoteBrian 472G 218k 472G 0% /mnt/VRaidZwd/RemoteBrian
VRaidZwd/RemoteGreg 472G 176k 472G 0% /mnt/VRaidZwd/RemoteGreg
VRaidZwd/RemoteStan 472G 223k 472G 0% /mnt/VRaidZwd/RemoteStan
VRaidZwd/dsAFP 472G 10M 472G 0% /mnt/VRaidZwd/dsAFP
VRaidZwd/dsAFP/TM2 472G 191k 472G 0% /mnt/VRaidZwd/dsAFP/TM2
VRaidZwd/dsCIFS 1.4T 981G 472G 67% /mnt/VRaidZwd/dsCIFS

Last dump(s) done (Dump '>' file systems):

Checking status of zfs pools:
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
VRaidZsamsung 5.44T 1.67T 3.77T 30% 1.00x ONLINE /mnt
VRaidZwd 8.12T 7.30T 840G 89% 1.00x ONLINE /mnt

all pools are healthy
_______________________________________________________________________


When I looked into this, for some reason there were several months' worth of ZFS snapshots listed in the GUI. Daily churn on the dsCIFS dataset is less than 2 GB, so I didn't expect even a failure to automatically delete the expired snapshots to fill this server. Manually deleting those old snapshots has gotten back most of the missing space. The replication tasks have not broken, and the files are identical between the two machines.
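In case it helps someone else, this is roughly how I listed and removed the old snapshots from the shell instead of clicking through the GUI one at a time (the snapshot name in the destroy line is only an example of the auto-... names; substitute the real ones):

# List every snapshot on the affected pool, oldest first, with the space each one holds.
zfs list -t snapshot -o name,used,creation -s creation -r VRaidZwd

# Destroy one expired snapshot; -r also removes the same-named snapshots of the child
# datasets, matching the way the recursive periodic snapshot task created them.
zfs destroy -r VRaidZwd@auto-20130101.0000-1w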

I have no explanation for why the snapshots are not being deleted once they pass their expiration date. Could this be a bug when the snapshot interval is specified in days but the expiration in weeks?
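If the auto snapshots really are named auto-YYYYMMDD.HHMM-<lifetime> (with the retention tacked on as a suffix such as -1w), then something like this should show what expiry each snapshot actually got tagged with, which might narrow down whether the schedule settings or the cleanup is at fault:

# Count snapshots by their retention suffix (-1w, -2d, etc.).
zfs list -H -t snapshot -o name -r VRaidZwd | grep 'auto-' | awk -F- '{print $NF}' | sort | uniq -c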
 