Slow scrubs and resilvers on a FreeNAS mini

redoak42 · Jan 10, 2016

Hi all,

We have a pair of FreeNAS Minis, made by iXsystems. One is primary, and the other one gets replicated to nightly . The current storage config on them is 4x6tb WD RED in a RAIDZ1, lz4 compression, no dedupe. We're running FreeNAS-9.3-STABLE-201512121950, and the Minis have 32GB of ram, and an Atom C2750. We had a disk go bad in one (the backup) and the resilver is taking a very long time (about five days so far):

zpool status
pool: backup
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Jan 5 17:45:18 2016
8.45T scanned out of 8.91T at 1.72M/s, 78h18m to go
2.11T resilvered, 94.80% done

Short SMART checks show the disks are good, and there have been no r/w/c errors during the rebuild. The CPU is loafing, and the load (as reported by top) is a pretty steady .5 . The system is using 6.5 TB and has 8.1 TB left.

Running iostat -xc 10 shows the %b (time busy?) on the three "good" disks as 100% a lot of the time, and the disk the is rebuilding is much lower. Does this indicate what I hope it doesn't, which is my disks are just working as fast they can, and there's nothing much I can do?

This isn't the first time a scrub or resilver has taken many days on one of these systems. Anyone have any insight into why, and what can be done to speed things up?

Warmest regards,

Jordan

Ericloewe · Jan 10, 2016

Is being actively used? ZFS will throttle the resilver to allow for client requests to be handled.

That's also a rather full pool. That's never good for performance. You might want to upgrade (at least the primary system) to something larger that can handle more drives, to allow for a larger pool. The backup should be able to move along with the shingled 8TB drives.

redoak42 · Jan 10, 2016

The system is not being actively used. Right now, the only thing it should be doing is rebuilding. I don't think the pool is that full either (probably could have been clearer in how I described it) . There is 6.5 TB used and 8.5. TB free, meaning I'm less than 50% utilized at the moment.

Ericloewe · Jan 10, 2016

redoak42 said:
The system is not being actively used. Right now, the only thing it should be doing is rebuilding. I don't think the pool is that full either (probably could have been clearer in how I described it) . There is 6.5 TB used and 8.5. TB free, meaning I'm less than 50% utilized at the moment.

Ah, misread that as 6.5 free out of 8.5...

Well, the next likeliest option is a second dying drive. Have you monitored the drives properly with regular long tests?

redoak42 · Jan 10, 2016

No, I'm doing short only at this point (they report fine). I'll start doing weekly longs.

BigDave · Jan 10, 2016

I had one question. Did you buy the machines as diskless?
The reason I ask is that perhaps the mini's have the motherboards that have the Marvel controllers
and if you are using those SATA ports for your disks, this may have something to do with your issues
of slow speeds.

redoak42 · Jan 10, 2016

I'm not quite sure if this is the answer to the question you're asking. I bought the systems from iXsystems with disks included, but that just means the disks (wd red) are in the box, and we put them in the caddies, an loaded them into the system to use. So we have 4 WD RED 6TB disks (provided by iXsystems) in the system currently.

redoak42 · Jan 10, 2016

the %b on iostat makes me think that I'm just at the limit of what the system can do. There are (I'm told) lots of small files on these systems, but I don't know enough about scrubs and resilvers to be able to tell if file size has anything to do with rebuild and scan time.

BigDave · Jan 10, 2016

redoak42 said:
I'm not quite sure if this is the answer to the question you're asking. I bought the systems from iXsystems with disks included, but that just means the disks (wd red) are in the box, and we put them in the caddies, an loaded them into the system to use. So we have 4 WD RED 6TB disks (provided by iXsystems) in the system currently.

Yes, that tells me that you did not switch the data cables (inside the box) to different SATA ports before installing your drives.
Have you contacted iX Systems for help, or has the warranty expired?

redoak42 · Jan 10, 2016

Both systems are under warranty. I was under the impression that on the Minis the warranty was pretty bare bones (they'll replace components when they go bad), but you are right, I should give it a shot. I will, and will report back.

The Minis are really nice for our application, which is an office on the West coast that has very few people, moderate performance needs, but quite a bit of data that grows slowly, but steadily. I can even zfs send back to our freenas install on the east coast nightly, (got to love zfs send/recv).

redoak42 · Jan 11, 2016

I got in contact with support, and they were competent and friendly, and checked for hardware issues. They weren't able to find any, and that is about as far as freenas mini support goes. When I run a systat -vm, it shows the three healthy disks at about 100% utilization, and the rebuilding disk is pretty much idling. The system isn't really doing anything except rebuilding at this point, so my current conclusion (I hope I'm wrong and there is a config setting I can tinker with, but I can't find it) is that I've reached some sort of performance limit on this system config with the type of data we have on disk, and that rebuilds are going to take 6-7 days. I'm stuck with 6tb disks is a RAIDz1 for now, but when the 8tb WD RED come out (dunno when) I'll rebuild as a striped mirror (mirrored strip?) or raidz2, since 6-7 days rebuild is asking for trouble on a raidz. At least I have the two systems replicating to each other.

jgreco · Jan 11, 2016

redoak42 said:
the %b on iostat makes me think that I'm just at the limit of what the system can do. There are (I'm told) lots of small files on these systems, but I don't know enough about scrubs and resilvers to be able to tell if file size has anything to do with rebuild and scan time.

Yes. If you have lots of small files, that's probably it.

The way a ZFS scrub or resilver works is based on a metadata traversal of the pool, which means that it is effectively crawling the entire pool. This is unlike a traditional RAID5 controller which just does a linear traversal of the LBA's on the disk, without any knowledge (or need-to-know) of the layout of the filesystem.

This allows ZFS to go very fast on pools with large files and pools that aren't particularly full, but it stinks for lots of small files.

Your next suggestion might be that this is a bug in ZFS. Yes and no. If you look at the way RAIDZ works, for example, it becomes clear that this is a much more complicated issue due to the clever way ZFS stores its RAIDZ data. It's necessary to have some mechanism to identify which sectors in use, where the data blocks are, and to compute the parity as needed. We get a lot of benefits from RAIDZ but this isn't one of them.

redoak42 · Jan 11, 2016

Thanks for the explanation/confirmation jgreco. I'll take the slow rebuild and scrub times with small files and work around them for the other benefits ZFS provides.

Important Announcement for the TrueNAS Community.

Slow scrubs and resilvers on a FreeNAS mini

redoak42

Dabbler

Ericloewe

Server Wrangler

redoak42

Dabbler

Ericloewe

Server Wrangler

redoak42

Dabbler

BigDave

FreeNAS Enthusiast

redoak42

Dabbler

redoak42

Dabbler

BigDave

FreeNAS Enthusiast

redoak42

Dabbler

redoak42

Dabbler

jgreco

Resident Grinch

redoak42

Dabbler

Similar threads

Important Announcement for the TrueNAS Community.

Slow scrubs and resilvers on a FreeNAS mini

Dabbler

Server Wrangler

Dabbler

Server Wrangler

Dabbler

FreeNAS Enthusiast

Dabbler

Dabbler

FreeNAS Enthusiast

Dabbler

Dabbler

Resident Grinch

Dabbler

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Slow scrubs and resilvers on a FreeNAS mini"

Similar threads