All of a sudden, ZFS reported capacity wrong?

TunaMaxx · Jan 25, 2012

After slightly over a month and a half of flawless service, my FreeNAS-8.0.2-RELEASE-amd64 (8288) server has started reporting false drive capacity. It has a raidz1 pool containing 3 (plus a so-called "hot spare") 500GB Western Digital Raptor drives. The setup is used almost exclusively as a Time Machine server for 5 Mac workstations.

Since the beginning, the reported size has been slightly less than 1TB, as expected. However, as of about 24 hours ago, the reported size (not free space) has dropped to about 4.8GB.

The various Mac workstations started throwing errors that "Time Machine could not complete the backup" because it was too large for the available 4.8GB of space (see attachment). I don't know if the Time Machine backups running out of space (the pool was getting legitimately full) caused the problem, or if the problem happened and then Time Machine started throwing errors.

I've also included a screenshot of the weekly disk space graph. You can see that right around 24 hours ago, the reported capacity plummets.

Any suggestions on what might be happening here? Are the last month of backups gone? As far as I can see now, all of the historic Time Machine backup data is no longer available to the Macs. :(

NOTE: I've seen the following info requested in similar threads. If you need anything else, just ask:

Code:

[root@can-nas01] /mnt/zpool# df -h
Filesystem             Size    Used   Avail Capacity  Mounted on
/dev/ufs/FreeNASs1a    927M    429M    424M    50%    /
devfs                  1.0K    1.0K      0B   100%    /dev
/dev/md0               4.3M    3.6M    380K    91%    /etc
/dev/md1               732K     16K    660K     2%    /mnt
/dev/md2                75M     18M     51M    26%    /var
/dev/ufs/FreeNASs4      20M    1.1M     17M     6%    /data
zpool                  3.2G     46K    3.2G     0%    /mnt/zpool
zpool/Storage          3.2G     11M    3.2G     0%    /mnt/zpool/Storage
zpool/TimeMachine      306G    303G    3.2G    99%    /mnt/zpool/TimeMachine

Code:

[root@can-nas01] /mnt/zpool# zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
zpool               903G  4.71G  46.0K  /mnt/zpool
zpool/Storage      1.79G  4.71G  11.3M  /mnt/zpool/Storage
zpool/TimeMachine   901G  4.71G   302G  /mnt/zpool/TimeMachine

Code:

[root@can-nas01] /mnt/zpool# zpool status -v
  pool: zpool
 state: ONLINE
 scrub: scrub completed after 2h33m with 0 errors on Sun Jan 22 03:33:28 2012
config:

	NAME        STATE     READ WRITE CKSUM
	zpool       ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    ada0p2  ONLINE       0     0     0
	    ada1p2  ONLINE       0     0     0
	    ada2p2  ONLINE       0     0     0
	spares
	  ada3p2    AVAIL   

errors: No known data errors

Code:

[root@can-nas01] /mnt/zpool# zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
zpool  1.35T  1.33T  26.4G    98%  ONLINE  /mnt
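
A note on reading the numbers above, offered as a sketch rather than a diagnosis. For a ZFS filesystem, `df` reports Size as that filesystem's own Used plus the pool's Avail, so Size shrinks as the pool fills. `zpool list`, by contrast, counts raw space on all three raidz1 disks including parity, which is why it shows 1.35T. The figures below are copied from the output above; the two listings disagree on free space (3.2G vs 4.71G), presumably because they were captured at different moments.

```shell
#!/bin/sh
# Figures copied from the `df -h` and `zfs list` output above, in GB.
tm_used=303      # df: Used for zpool/TimeMachine
tm_avail=3.2     # df: Avail for zpool/TimeMachine
pool_used=903    # zfs list: USED for zpool (includes children and snapshots)
pool_avail=4.71  # zfs list: AVAIL for zpool

# df derives Size per filesystem as Used + Avail.
df_size=$(awk -v u="$tm_used" -v a="$tm_avail" 'BEGIN { printf "%.1f", u + a }')

# The usable total for the whole pool is USED + AVAIL at the top-level dataset.
pool_total=$(awk -v u="$pool_used" -v a="$pool_avail" 'BEGIN { printf "%.2f", u + a }')

echo "df Size for zpool/TimeMachine: ${df_size}G"
echo "usable pool total:             ${pool_total}G"
```

The first result lines up with the 306G that `df` prints for zpool/TimeMachine, and the second with the "slightly less than 1TB" seen originally. The top-level dataset references only 46K itself, so its `df` Size is roughly equal to whatever free space remains in the pool.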

Daisuke · Jan 25, 2012

Indeed, your size is reported wrong. Here is mine:

Code:

# zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
nas         811G  6.34T   256K  /mnt/nas
nas/media   784G  6.34T   784G  /mnt/nas/media
nas/unix   26.9G  6.34T  26.9G  /mnt/nas/unix

The 'nas' dataset reports the total of my current data present in both sub-datasets. The REFER is the actual dataset size. In your case, USED is 3 times larger than REFER. Even the Storage dataset is wrong, so you should start fresh... Is there a way you can copy your 300GB of data, then destroy your TimeMachine dataset? I'm not familiar with TimeMachine, but isn't it easier to simply use AFP shares and copy your data directly onto a network disk? You can create automatic snapshots in FreeNAS, so I don't see the use of TimeMachine for that.

Try this: create an AFP share and copy the data manually. I bet it will report the proper size, always.
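
For what it's worth, a USED figure well above REFER is not necessarily a reporting error. In ZFS, USED includes space held by snapshots and child datasets, while REFER counts only what the live filesystem references. Whether snapshots existed on this pool is not stated in the thread, so the following is only a sketch of how one might check. The helper name `show_space_breakdown` is made up for illustration, and the `space` column shorthand is assumed to be available in the installed ZFS version.

```shell
#!/bin/sh
# USED and REFER for zpool/TimeMachine, copied from the `zfs list` output above (GB).
used_gb=901
refer_gb=302

# Space charged to the dataset beyond what its live filesystem references.
gap_gb=$((used_gb - refer_gb))
echo "zpool/TimeMachine holds ${gap_gb}G beyond its live data"

# Hypothetical helper, meant to be run on the NAS itself.
show_space_breakdown() {
    zfs list -o space "$1"          # splits USED into snapshot/dataset/child shares
    zfs list -t snapshot -r "$1"    # lists any snapshots under the dataset
}
```

On the server, `show_space_breakdown zpool/TimeMachine` would show whether the gap is attributed to snapshots, to child datasets, or to something else.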

Brand · Jan 25, 2012

TunaMaxx said:
It has a raidz1 pool containing 3 (plus a so called "hot spare") 500GB Western Digital Raptor drives.

Why would you set up a RAIDZ-1 with a hot spare instead of a RAIDZ-2?

TunaMaxx · Jan 25, 2012

TECK said:
Even the Storage dataset is wrong, so you should start fresh... Is there a way you can copy your 300GB of data, then destroy your TimeMachine dataset?

That 300GB of data is some weird number based on I have no idea what. Three of the four machines that have been backing up to it are buggered; their history goes back only to this morning. The one machine that still has history is a 'spare' machine that is used sparingly. It appears that any of the Macs that tried to create a backup larger than the (inaccurate) ~4.8GB of space in the pool have no more historic backup data.

TECK said:
I'm not familiar with TimeMachine, but isn't it easier to simply use AFP shares and copy your data directly onto a network disk? You can create automatic snapshots in FreeNAS, so I don't see the use of TimeMachine for that.

Try this: create an AFP share and copy the data manually. I bet it will report the proper size, always.

Until yesterday, the reported size had been spot on, slowly growing each day as new data was added. Until whatever happened around noon yesterday, I had nothing but the strongest confidence in this FreeNAS setup with Time Machine support. Not so much anymore, unfortunately.

Time Machine backups are like crack to Mac users. It may be old hat to others, but the built-in "set and forget" versioned backup of everything, plus the ease of restoration, is going to be hard to replace.

I'm thinking that I may have to just cut my losses and start fresh... but I'd feel better with some sort of explanation of what happened.

TunaMaxx · Jan 25, 2012

Brand said:
Why would you set up a RAIDZ-1 with a hot spare instead of a RAIDZ-2?

ZFS N00b, pure and simple. :)

I had read something about performance benefits of even-numbered data drives, and performance hits with RAIDZ-2. By the time I knew more, this box was already in use with a boatload of data on it.

I've since found out that the hot spare does not work in this version of FreeNAS, or at least not how I assumed it would. So I double shot myself in the foot there.

Unless there is some life-saving advice on how to rescue the data and / or report the correct capacity, I think I will start fresh with all four drives in a RAIDZ-1 or RAIDZ-2 configuration.
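
To put rough numbers on those layout options, here is a back-of-the-envelope sketch. It assumes a single RAIDZ vdev of equal 500GB disks and ignores metadata, padding, and GB-vs-GiB differences; `raidz_usable_gb` is a made-up helper name.

```shell
#!/bin/sh
# Approximate usable capacity of one RAIDZ vdev: (disks - parity) * disk size.
raidz_usable_gb() {
    disks=$1
    parity=$2
    size_gb=$3
    echo $(( (disks - parity) * size_gb ))
}

current=$(raidz_usable_gb 3 1 500)   # today: 3-disk RAIDZ-1 plus an idle spare
raidz1=$(raidz_usable_gb 4 1 500)    # all four disks in RAIDZ-1
raidz2=$(raidz_usable_gb 4 2 500)    # all four disks in RAIDZ-2

echo "3-disk RAIDZ-1 + spare: ${current} GB usable, tolerates 1 failed disk"
echo "4-disk RAIDZ-1:         ${raidz1} GB usable, tolerates 1 failed disk"
echo "4-disk RAIDZ-2:         ${raidz2} GB usable, tolerates 2 failed disks"
```

By this estimate, a four-disk RAIDZ-2 offers about the same usable space as the current layout while tolerating a second disk failure, and a four-disk RAIDZ-1 trades that extra redundancy for roughly 500GB more room.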

Daisuke · Jan 26, 2012

I would definitely start fresh with 8.0.3 p1. Personally, my datasets have been spot on for months.
Don't use RAIDZ-2 with fewer than 6 disks; it's not worth it, IMO. Info

TunaMaxx · Jan 26, 2012

It looks as though I'm going to start from scratch, but is there any explanation for why the dataset's reported capacity went south?

This is being used as a backup device, not just a NAS. I'd like to be able to trust it a little more than I do right now... :)

peterh · Jan 29, 2012

What is reported in /var/log/messages during that period?
Something has gone wrong; if it is a hardware failure, then starting over might repeat the problem.
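
A minimal sketch of pulling one day's entries out of the log, assuming the traditional BSD syslog timestamp prefix ("Jan 24 12:00:01 host ..."). The helper name `messages_for_day` is made up for illustration, and Jan 24 is only a guess at the relevant day based on the first post.

```shell
#!/bin/sh
# Print syslog lines for one calendar day.
# Usage: messages_for_day MONTH DAY [LOGFILE]
messages_for_day() {
    month=$1
    day=$2
    logfile=${3:-/var/log/messages}
    # Single-digit days are space-padded in this timestamp format ("Feb  2").
    prefix=$(printf '%s %2d ' "$month" "$day")
    grep "^$prefix" "$logfile"
}
```

For example, `messages_for_day Jan 24` would list everything logged that day, which could then be scanned for disk or ZFS messages.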

TunaMaxx · Feb 2, 2012

Unfortunately, I have no idea what was in /var/log/messages because I've already nuked the box and started over.

Hardware-wise, I have scrubbed the disks, etc. No reports of any disk errors in the nightly reports either. Fingers crossed!
