slow zfs scrub, how to find the problem?

tryingtoflash · Mar 13, 2018

I have a newer HP ML10 server with a couple of LSI SAS2008 cards connected to an external case full of newer 2TB to 6TB drives. The pool is organized as 5 vdevs of mirrored pairs. I have a very old IBM rack server at another site with two of the same LSI SAS2008 cards, and I use zfs replication to back up the primary server to the older IBM server. In fact, I rotate disks out of the primary server when they pass their warranty date, so the older server is actually using disks that were retired from the primary server.

The problem I'm having is that the primary server takes too long to complete a zfs scrub. It's reporting scrub speeds like "60.0M/s" and takes several days to finish a scrub. The much older IBM server can scrub its pool (which is organized in the same 5x2 pattern) in 8-10 hours.

Can someone give me some ideas on what to look at to try to determine why the new server is so much slower at zfs scrubs?

Code:


>>>> Some key information about the primary server <<<<

FreeBSD 11.1-RELEASE-p4 #0: Tue Nov 14 06:12:40 UTC 2017
CPU: Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz (3092.91-MHz K8-class CPU)
real memory  = 17179869184 (16384 MB)
avail memory = 16569171968 (15801 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
mps0: <Avago Technologies (LSI) SAS2008> port 0x4000-0x40ff mem 0xfbef0000-0xfbef3fff,0xfbe80000-0xfbebffff irq 16 at device 0.0 on pci1
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps1: <Avago Technologies (LSI) SAS2008> port 0x5000-0x50ff mem 0xfbff0000-0xfbff3fff,0xfbf80000-0xfbfbffff irq 17 at device 0.0 on pci2
mps1: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
ses0 at ahciem0 bus 0 scbus8 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 1.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device

dmesg shows all the disks are connected at SATA III speeds:
da0: 600.000MB/s transfers
da1: 600.000MB/s transfers
da2: 600.000MB/s transfers
da3: 600.000MB/s transfers
---and so on 

and ashift of all disks in the pool is 12

>>>> Same information about the backup server <<<<

FreeBSD 11.1-RELEASE-p4 #0: Tue Nov 14 06:12:40 UTC 2017
CPU: Intel(R) Xeon(R) CPU		   E5530  @ 2.40GHz (2400.13-MHz K8-class CPU)
real memory  = 21474836480 (20480 MB)
avail memory = 20724740096 (19764 MB)
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 hardware threads
mps0: <Avago Technologies (LSI) SAS2008> port 0x3000-0x30ff mem 0x97b40000-0x97b43fff,0x97b00000-0x97b3ffff irq 26 at device 0.0 on pci4
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps1: <Avago Technologies (LSI) SAS2008> port 0x2000-0x20ff mem 0x97a40000-0x97a43fff,0x97a00000-0x97a3ffff irq 32 at device 0.0 on pci6
mps1: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
ses0 at mps0 bus 0 scbus0 target 9 lun 0
ses0: <LSILOGIC SASX28 A.1 7016> Fixed Enclosure Services SCSI-3 device
ses0: 300.000MB/s transfers
ses0: Command Queueing enabled
ses0: SCSI-3 ENC Device

dmesg shows a range of speeds, which makes sense because some of the disks are much older:

da0: 300.000MB/s transfers
da1: 300.000MB/s transfers
da2: 150.000MB/s transfers
da3: 150.000MB/s transfers
---and so on 

and ashift of all disks in the pool is 12

tryingtoflash · Mar 13, 2018

By the way, zfs scrub performance is the only problem I have noticed. I haven't done any formal benchmarking, but zfs send/recv to the single drive that I use to carry data between the two servers is limited by the speed of that single drive, so both servers can support reads from and writes to the pool at over 130 MiB/s. While running those backups, disks in the pool tend to show around 25% utilization, according to gstat.

DrKK · Mar 13, 2018

So, a few comments.

600 MB/s "SATA III" speed is moot. There is no spinning rust on the planet which can even remotely approach that speed. The interface may be at that speed, but of course the drive is not. SATA-2 speeds vs SATA-3 speeds will almost certainly make no difference in a scrub speed. But I am sure you already know this.

Second, depending on your CPU (scrubs can use quite a bit of CPU), the fragmentation state, the pool layout, the average file size, and the health and speed of the least healthy drive in each vdev, a scrub is going to take in the vicinity of 30 to 300 minutes per TB of data scrubbed in most cases (it could be slower for pathological or older setups). For round figures, 300 minutes per TB works out to roughly 55 MB/s, so about 60 MB/s sits at the slow end of that range. So this number, while disappointing, is not in and of itself indicative of a problem, especially if you're using older servers and/or older hard drives. When I juxtapose it with your 130 MiB/s zfs send/recv speed, I am hard-pressed to think the number is outrageous.
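To make the arithmetic behind that range concrete, here is a small sketch that converts a scrub pace in minutes per TB into a throughput figure. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the function name is made up for illustration.

```python
def scrub_rate_mb_per_s(minutes_per_tb: float) -> float:
    """Convert a scrub pace (minutes per decimal TB) into MB/s.

    Assumes 1 TB = 1e12 bytes and 1 MB = 1e6 bytes.
    """
    bytes_per_tb = 1e12
    seconds_per_tb = minutes_per_tb * 60
    return bytes_per_tb / seconds_per_tb / 1e6


if __name__ == "__main__":
    # The two ends of the 30-300 minutes-per-TB range quoted above.
    for pace in (30, 300):
        print(f"{pace} min/TB -> {scrub_rate_mb_per_s(pace):.1f} MB/s")
```

Under those assumptions the slow end of the range comes out in the mid-50s of MB/s, which is why a reported 60M/s is slow but not outside the range.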

This assumes an inactive pool. If your pool is actively serving files WHILE it is scrubbing, then naturally the scrub process will be competing there as well.

If you provide more information, we can try to brainstorm more ideas. But 60 MiB/s of scrub speed is not entirely out of the question, given a reasonable smattering of things that might be in your way with your particular setup/use case.

tryingtoflash · Mar 14, 2018

Thanks for the reply.

Yes, I understand the SATA III speed has little relationship to the actual speeds I can get. I was just including that information to show that there isn't a problem with the hardware. On the slower computer, all the drives are newer, are SATA III, and are being recognized as such. On the faster computer, the drives are older and slower, but zfs scrubs run much faster.

I've been doing a lot of reading about zfs fragmentation, and I think that is the likely culprit here. The pool has over a terabyte free, but that still puts it at 94% full, and zpool list shows 63% fragmentation.
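For anyone wanting to check these two figures from a script, the following is a rough sketch that parses one line of `zpool list -H -o name,fragmentation,capacity` output (tab-separated, no header). The pool name `tank` is a placeholder, the sample line simply reuses the 63% and 94% figures from this post, and the 80% fullness threshold is a commonly quoted rule of thumb used here as an assumption, not something taken from this thread.

```python
from typing import Optional


def parse_pct(field: str) -> Optional[int]:
    """Parse a percentage field such as '63%' or '63'; '-' means not available."""
    field = field.strip()
    if field == "-":
        return None
    return int(field.rstrip("%"))


def parse_zpool_line(line: str) -> dict:
    """Parse one line of `zpool list -H -o name,fragmentation,capacity`."""
    name, frag, cap = line.rstrip("\n").split("\t")
    return {"name": name, "frag_pct": parse_pct(frag), "cap_pct": parse_pct(cap)}


def is_nearly_full(stats: dict, cap_limit: int = 80) -> bool:
    """True if the pool is above cap_limit percent full (80 is an assumed default)."""
    return stats["cap_pct"] is not None and stats["cap_pct"] > cap_limit


if __name__ == "__main__":
    # Hypothetical sample line; 'tank' is a placeholder pool name.
    sample = "tank\t63%\t94%"
    stats = parse_zpool_line(sample)
    print(stats, "nearly full:", is_nearly_full(stats))
```

In practice the sample string would be replaced by the real command output, for example captured with `subprocess.run`.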

It seems like there ought to be a way to tell for sure. I'd think that if fragmentation were the problem, you'd see very high disk utilization while the scrub is running. But when I watched gstat while the scrubs were running, what I saw seemed to be the opposite. On the server where scrubs were fast, disks in the pool were running at 600-800 operations per second, while on the slow server, disks seemed to be mostly in the range of 200-300 operations per second. BTW, CPU utilization on both servers was very low, with top showing 98% idle or better.

DrKK · Mar 14, 2018

I'm sorry sir, did you say your pool is 94% full?

That is the culprit, with probability 100%.

rs225 · Mar 14, 2018

Not sure what version of FreeNAS that is, but 11.1U2 would have the sequential scrub code, with fixes. It should run faster with fragmented pools.
