investigating old and new reports that ZFS scrubs may cause problems to users of Apple Time Machine

Status
Not open for further replies.

grahamperrin

Dabbler
Joined
Feb 18, 2013
Messages
27
Moderator Note: Pruned from the tail end of a nearly 3 year old thread, please don't necropost - JG

… perhaps the hardware is on edge or woefully underspecced. …

+1

With attention to the opening post, I'm actively testing for scrub- and/or reboot-related misbehaviours with FreeBSD 9.2-RELEASE-p9 on a relatively old, low spec machine (RM Expert 3030, circa 2009, Intel Pentium III Xeon) into which I have put 4 GB memory and four hard disk drives. From the RMEquip File for the product as originally supplied:

Part Number: 1GY550
Description: F/A-2.33C2D 2048DDR43030 MIO MX NFD IXF 160/ XPC**
Range: RM EXPERT 3030

Hardware Information: _
CPU: Intel Pentium III Xeon processor
Memory Slot 0: Samsung M3 78T2863QZS-CE6 1024MB
Memory Slot 2: Samsung M3 78T2863QZS-CE6 1024MB
MainBoard: Intel Corporation DQ35JO[DQ35JOE]
Video Controller: NVIDIA GeForce 8400(512MB)[NV8400GS]
Audio Controller: Realtek High Definition Audio[0ALC268]
Network Controller 1: Intel(R) 82566DM-2 Gigabit Network Connection[082566]
Network Controller 2: 1394 Net Adapter
Hard drive Position(0): (SATA) SAMSUNG_HD161HJ (Size 149GB. Serial No: …)[HD161HJ]
Optical Drive (Master): TSSTcorp CDW/DVD SH-M522C
Case: MX3
Advanced Hardware Information: _
CPU L2 Cache: 6144
CPU Family Model Stepping: x86 Family 6 Model 23 Stepping 6

MainBoard BIOS: JOQ3510J.86A.0865.2008.0404.0146

So far, no breakage of the type described in the opening post.

Never used a Mac, but it sounds like time machine keeps certain files open all the time when your Mac is running. If you reboot the server with those files open then Time Machine ends up being unable to figure out what is going on because those files weren't properly closed before the server rebooted. …

Time Machine doesn't act like that, CJ - no files are held open, and the TM target is completely dismounted by the client between the hourly backup sessions.

… what I think is going on is that TM starts backing up, but is interrupted and uncleanly dismounts the backup target. …

@fracai please: what is the model identifier of the Apple notebook that was used at the time?


An ideal

… if you shut the lid on a Mac laptop while it's writing to its TM destination, it'll tidy up, dismount the TM archive and then go to sleep. On the next opportunity, it'll pick up from where it left off. …


A test result without an ideal unmount

An early 2009 MacBookPro5,2 with 8 GB memory and OS X 10.9.5. I performed a few actions (nothing insane) to get reported failure to eject a volume. From one of the attached files:

2016-01-29 07:36:26.265 com.apple.backupd[1954]: Copied 64.98 GB of 100.37 GB, 1473380 of 2431122 items
2016-01-29 07:36:30.405 com.apple.backupd[1954]: Copied 1473389 items (64.98 GB) from volume OS. Linked 0.
2016-01-29 07:36:30.606 com.apple.backupd[1954]: Error writing to backup log. NSFileHandleOperationException:*** -[NSConcreteFileHandle writeData:]: Not a directory
2016-01-29 07:36:30.998 com.apple.backupd[1954]: Error writing to backup log. NSFileHandleOperationException:*** -[NSConcreteFileHandle writeData:]: Not a directory
2016-01-29 07:36:30.999 com.apple.backupd[1954]: Error writing to backup log. NSFileHandleOperationException:*** -[NSConcreteFileHandle writeData:]: Not a directory
2016-01-29 07:36:31.002 com.apple.backupd[1954]: Error writing to backup log. NSFileHandleOperationException:*** -[NSConcreteFileHandle writeData:]: Not a directory
2016-01-29 07:36:31.004 com.apple.backupd[1954]: Error writing to backup log. NSFileHandleOperationException:*** -[NSConcreteFileHandle writeData:]: Not a directory
2016-01-29 07:36:31.006 com.apple.backupd[1954]: Error writing to backup log. NSFileHandleOperationException:*** -[NSConcreteFileHandle writeData:]: Not a directory
2016-01-29 07:36:31.007 com.apple.backupd[1954]: Error writing to backup log. NSFileHandleOperationException:*** -[NSConcreteFileHandle writeData:]: Not a directory
2016-01-29 07:38:37.144 com.apple.backupd[1954]: Backup canceled.
2016-01-29 07:38:40.304 com.apple.backupd[1954]: [SnapshotUtilities mountPointForVolumeRef] FSGetVolumeInfo returned: -35
2016-01-29 07:38:40.304 com.apple.backupd[1954]: Failed to eject volume (null) (FSVolumeRefNum: -113; status: -35; dissenting pid: -1)
2016-01-29 07:38:40.305 com.apple.backupd[1954]: Failed to eject Time Machine disk image: /Volumes/FreeNAS Time Machine/cces3e-gjp4.sparsebundle



Goodness, post-recovery

2016-01-30 13-59-30 screenshot.png


Additional information; relevance

For yesterday morning's incident I have a variety of other screenshots and other information, including an outline of the sequence of actions, but I should treat those details as not relevant to this topic because:
  • recovery occurred
  • neither reboots, nor a scrub, caused breakage of the backup data.
 

Attachments

  • bootlog, root xnu, shutdown, Shutdown Cause, Sleep Cause, Wake reason.txt
    1.2 KB · Views: 283
  • com.apple.backupd FSGetVolumeInfo returned -35, error 31.txt
    52.1 KB · Views: 280
  • 2016-01-30 08-31-59 recovery complete.txt
    2.3 KB · Views: 251
Last edited:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Please be aware that 4GB is half the minimum required memory for FreeNAS. Results are not necessarily expected to be representative.
 

grahamperrin

Dabbler
Joined
Feb 18, 2013
Messages
27
Background

From irc://chat.freenode.net/#freenas:

Jan 17 14:17:46 <dlavigne> grahamperrin: i thought we had a recent bug on the scrub thing

Jan 17 14:18:16 <grahamperrin> if we can find a reference for that bug report it'll help me

Jan 17 14:18:41 <dlavigne> grahamperrin: it was either a bug or a recent forum report, i remember seeing it float by this past week

Jan 17 14:20:43 <grahamperrin> Side note: my own http://apple.stackexchange.com/a/93382/8546 involved ZFS, scrubbing and sparse bundle disk images for HFS Plus for Time Machine.
Jan 17 14:21:07 <grahamperrin> However that was locally attached storage (ZEVO), not NAS.

Jan 17 14:29:00 <grahamperrin> Back in ten. Restarting, and then I'll seek that scrub-related problem report.

Jan 17 15:11:16 <grahamperrin> dlavigne, I'm seeking but can't find any recent suggestion that scrubbing may cause a problem for a Time Machine client

Jan 17 15:31:32 <grahamperrin> dlavigne, I'm still seeking. https://encrypted.google.com/search?q="Time Machine"|AFP +scrub +FreeNAS finds nothing recent.

Jan 17 15:44:38 <grahamperrin> dlavigne, https://forums.freenas.org/index.php?threads/timemachine-breaks-after-reboot-scrub.12823/#post-61188 "… if a *scrub* operation is causing things to break, perhaps the hardware is on edge or woefully underspecced …"

– I spent more than an hour searching https://bugs.freenas.org/ – plus these forums and other online resources. In the absence of anything recent, I pinpointed the 2013 topic PROBLEM - TimeMachine breaks after reboot / scrub | FreeNAS Community.

@dlavigne or anyone: if you can find the more recent report, please add a note here so that this discussion may be moved/edited appropriately. Thanks :)


Test environments

… 4GB is half the minimum required memory for FreeNAS. …

Thanks.

As @JaimieV gave thought to under-specification, I'm happy to test whether the breakage described by @oracle_sod can be produced by provocative client uses of a server that does not meet requirements.

(From five years with AppleSeed projects, plus other testing that must remain confidential, I have a few tricks up my sleeve.)
 
Status
Not open for further replies.
Top