Resilvering performance

tannebil

Dabbler
Joined
Sep 6, 2023
Messages
20
I'm expanding a pool that consists of a single 2-drive mirror vdev by successively replacing the drives with larger ones (14 TB to 20 TB). The existing pool is about 70% full. I've done this before without any issue, but that was going from 4 TB drives that were 10% full to the 14 TB drives, and it only took a few minutes. I knew this one would take a lot longer because there was much more data to copy, so I wasn't surprised when the initial estimate was 24 hours, dropping to 12 hours once it had run for a few minutes. It's now 10 hours later and the estimated time has slipped a couple of hours to about 15 hours, but nothing appears wrong, so I'm not concerned that it won't finish.

However, I was curious about the slippage since it seems like it should be pretty much a straight copy. Not having anything better to do, I looked at some performance graphs and noticed a couple of long periods where disk I/O performance dropped by 90%, which seems odd. I'd turned off all the data protection tasks and there really isn't much load on the server. All the disk and CPU loads and temperatures look fine. There is one application running (Proxmox Backup Server), but it's not doing much at the moment other than occasional backups of some small Proxmox VMs/LXCs.

It's a Terramaster 2-bay NAS that I upgraded with 33GB RAM, running 22.12.4.2.

Any ideas?
 

Attachments

  • Image 11-9-23 at 2.09 PM.jpeg

tannebil

Maybe this is related? The periods of high ARC demand_metadata requests correspond to the low-throughput periods. Is this something a fast metadata special device might help with? The ARC is currently 15.5 GB.
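
For anyone who wants to check the same counters outside the graphs, here is a rough Python sketch. It assumes a Linux-based OpenZFS system where the ARC counters are exposed as text in /proc/spl/kstat/zfs/arcstats, with rows of the form "name type value" and counters named demand_metadata_hits and demand_metadata_misses. The function names and the sample numbers are made up for illustration and are not from my system.

```python
# Sketch only: assumes the Linux OpenZFS kstat text layout
# (two header rows, then "<name> <type> <value>" rows).
ARCSTATS_PATH = "/proc/spl/kstat/zfs/arcstats"  # assumed location on Linux

# Made-up sample in the assumed layout, used only to exercise the parser.
SAMPLE = """\
13 1 0x01 123 33456 1000000000 2000000000
name                            type data
hits                            4    9000
misses                          4    1000
demand_metadata_hits            4    750
demand_metadata_misses          4    250
"""


def parse_arcstats(text):
    """Return {counter_name: int_value} for every 3-column numeric row."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Header rows either have a different column count or a
        # non-numeric third column, so they are skipped here.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def metadata_miss_ratio(stats):
    """Fraction of demand metadata requests that missed the ARC."""
    hits = stats.get("demand_metadata_hits", 0)
    misses = stats.get("demand_metadata_misses", 0)
    total = hits + misses
    return misses / total if total else 0.0


def read_arcstats(path=ARCSTATS_PATH):
    """Read and parse the live counters (only works where the file exists)."""
    with open(path) as fh:
        return parse_arcstats(fh.read())
```

The counters are cumulative, so to see what happens during a slow period you would call read_arcstats() twice, some seconds apart, and compare the differences rather than the raw totals.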

A big chunk of the data is SMB/macOS files which heavily use extended attributes. Do the extended attributes get stored as metadata in ZFS?
 

Attachments

  • Image 11-9-23 at 2.50 PM.jpeg
    Image 11-9-23 at 2.50 PM.jpeg
    46.7 KB · Views: 34

tannebil

I had a number of old backup copies of Apple Photos libraries on the pool, made over the last 10 years, each about 100GB with 40K photos and 400K files. After deleting them, the second replacement drive resilvered in 16 hours with no extended slow periods, versus 26 hours for the first drive.

Apple restructured Photos a few versions back to reduce the number of files created, but these libraries were mostly in the old format. The drives copy at about 150 MiB/s, or roughly 540GB/hour, so 17 hours seems about right for 9.28TB of data.
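
As a sanity check on that estimate, here is the arithmetic as a small Python sketch. It assumes the 9.28TB figure is in decimal terabytes and that the drive sustains the full rate for the whole resilver, which a real resilver won't quite do. The function name is just something I made up for illustration.

```python
def resilver_hours(data_tb, rate_mib_per_s):
    """Hours to copy data_tb (decimal TB) at a sustained rate in MiB/s."""
    total_bytes = data_tb * 1e12                     # decimal terabytes -> bytes
    bytes_per_hour = rate_mib_per_s * 2**20 * 3600   # MiB/s -> bytes per hour
    return total_bytes / bytes_per_hour


# 150 MiB/s is about 566 GB per hour, so 9.28 TB takes roughly 16.4 hours.
print(round(resilver_hours(9.28, 150), 1))
```

With the rounder 540GB/hour figure the same division gives about 17 hours, so either way it lands close to the 16 hours the second drive actually took.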

I saw some ideas about creating a persistent L2ARC metadata cache, but it was easier to just delete them and get on with life.
 