Scrubs / SMART tests frequency

Meyers

Patron
Joined
Nov 16, 2016
Messages
211
I initially followed a guide on these forums to set up scrubs for my pools on the 1st and 15th, with SMART long and short tests spread throughout the month. For a variety of reasons, I'm now thinking about running the scrub only on the 1st, plus one long SMART test and two short SMART tests a month (the long tests take over a week now!). Anything more seems like overkill, especially on a fairly heavily loaded production server.

Thoughts?
 

Meyers

I think I will simply set up one SMART long test and one scrub a month, as I said. I just wanted to get some input on this.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
If the system is heavily used, a single scrub a month is fine. A normal read verifies a block's checksum just as a scrub does, so data that is read regularly is already being checked. One SMART long test a month is OK too, but I would try for twice a month. I would run short tests more frequently because they are essentially free and don't affect the overall system much.
 

Meyers

I get scrubs. It's just that on such a busy server the scrubs and long tests are running into each other, so I'm rethinking my configuration.

Any reasoning behind doing 2x SMART longs? Just curious.

Also, the long tests seem to be taking a lot longer since recent updates. I don't get why: load hasn't gone up. In fact, thanks to some heavy caching, it has probably gone DOWN. I have longs scheduled on the 8th and 22nd of the month, and they run into the scrubs on the 1st and 15th.
 

SweetAndLow

My SMART long tests take ~16h on my 8TB drives. SMART long tests are affected by how busy the disks are. Can you run them at night over a weekend? Really, SMART tests just give you early warning of failing drives, so it all depends on how soon you want to know about a failing drive.
 

Meyers

This is a production system that's busy 24x7. We have drops in traffic in the early AM, of course, but that wouldn't help here: long tests now run for over a week.

For now I've adjusted my testing schedule as follows:

Scrub: 1st
SMART long: 15th
SMART short: 14th, 29th

I had longs running on the 8th and 22nd before. The last long ran into the 15th, so I had to pause the scrub to let it finish.
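
The collision described above can be checked with a little date arithmetic. This is only a sketch under stated assumptions: the long test is taken to last 8 days ("over a week"), the scrub 4 days, and January 2019 is an arbitrary example month. The helper names `job_days` and `find_overlaps` are made up for illustration, and short tests are left out because they finish quickly.

```python
from datetime import date, timedelta
from itertools import combinations

def job_days(start, duration_days):
    """Return the set of calendar dates a job occupies."""
    return {start + timedelta(days=i) for i in range(duration_days)}

def find_overlaps(jobs):
    """jobs maps a name to (start date, duration in whole days).

    Returns (job_a, job_b, first_shared_date) for every pair that collides.
    """
    clashes = []
    for (a, (a_start, a_len)), (b, (b_start, b_len)) in combinations(jobs.items(), 2):
        shared = job_days(a_start, a_len) & job_days(b_start, b_len)
        if shared:
            clashes.append((a, b, min(shared)))
    return clashes

# Assumed durations: long test 8 days, scrub 4 days (rough figures from the thread).
old_schedule = {
    "scrub-1": (date(2019, 1, 1), 4),
    "long-8": (date(2019, 1, 8), 8),
    "scrub-15": (date(2019, 1, 15), 4),
    "long-22": (date(2019, 1, 22), 8),
}
new_schedule = {
    "scrub-1": (date(2019, 1, 1), 4),
    "long-15": (date(2019, 1, 15), 8),
}

print(find_overlaps(old_schedule))  # the long test from the 8th meets the scrub on the 15th
print(find_overlaps(new_schedule))  # no collisions under the assumed durations
```

With those assumed durations, the old schedule collides on the 15th and the new one leaves a gap on both sides. If the real durations are longer, the same check can be rerun with different numbers.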

Plus, with 11.2 I've had to increase vfs.zfs.scrub_delay again, from 20 to 40; otherwise the system becomes unusable. Scrubs now take about 3-4 days on 14.5TB.
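
For a rough sense of what that change does: the FreeBSD Handbook describes vfs.zfs.scrub_delay as a delay, in ticks, inserted between scrub I/Os while other I/O is active, which caps scrub IOPS at roughly kern.hz divided by the delay. Below is a minimal sketch of that arithmetic. It assumes the default kern.hz of 1000 and that FreeNAS 11.2 still applies the tunable this way, neither of which is confirmed in this thread. `scrub_iops_cap` is a hypothetical helper name.

```python
KERN_HZ = 1000  # assumed FreeBSD default tick rate; verify with `sysctl kern.hz`

def scrub_iops_cap(scrub_delay_ticks, hz=KERN_HZ):
    """Approximate ceiling on scrub I/Os per second while the pool is busy."""
    if scrub_delay_ticks <= 0:
        raise ValueError("a delay of 0 means no throttling, so there is no cap")
    return hz / scrub_delay_ticks

# 4 is the documented default; 20 and 40 are the values mentioned in this post.
for delay in (4, 20, 40):
    print(f"scrub_delay={delay}: ~{scrub_iops_cap(delay):.0f} IOPS")
```

Under those assumptions, going from 20 to 40 halves the cap from about 50 to about 25 scrub I/Os per second whenever the pool is busy, which would be consistent with multi-day scrubs on a server that is never idle.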

More research is probably needed but for now I'll go with this schedule.
 

SweetAndLow

Wow, I scrub 60TB used (109TB total) in 6h. What is your hardware? Something seems strange. What is the workload?
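
To put the two scrub times side by side, here is a quick back-of-the-envelope conversion to average throughput. It assumes decimal terabytes and that the 14.5TB figure is the amount of data actually scanned; the thread states neither. `scrub_rate_mb_s` is just an illustrative helper.

```python
TB = 10**12  # decimal terabyte; an assumption, the thread does not say TB vs TiB

def scrub_rate_mb_s(data_tb, hours):
    """Average scrub throughput in MB/s for data_tb terabytes scanned in `hours`."""
    return data_tb * TB / (hours * 3600) / 10**6

fast = scrub_rate_mb_s(60, 6)              # 60 TB in 6 hours
slow_low = scrub_rate_mb_s(14.5, 4 * 24)   # 14.5 TB in 4 days
slow_high = scrub_rate_mb_s(14.5, 3 * 24)  # 14.5 TB in 3 days

print(f"{fast:.0f} MB/s vs {slow_low:.0f}-{slow_high:.0f} MB/s")
```

On those assumptions, the first pool averages a bit under 2.8 GB/s while the throttled one averages roughly 42-56 MB/s, a gap of about 50x or more.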
 

Meyers

File server for a high-traffic website. Hardware is in my sig (production systems). Like I said, I had to dial down the scrubs because otherwise they affect the site.
 

SweetAndLow

Can't see signatures so I guess I'm done helping you.
 

Meyers

Motherboard: Supermicro X10DRH
CPU: Xeon E5-2620 v3 @ 2.40GHz (12 cores)
RAM: 64GB ECC
OS disk: 2 x Kingston DataTraveler 3.0 64GB mirrored
Data disk: 12 x HGST Ultrastar 7K6000 4 TB SAS 12Gb/s (6 x 2 disk mirrored vdevs)
FreeNAS 11.2-U1
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
If the read workload spans the entire 14.5TB or so that you have, then it's entirely possible that you're just dealing with outright disk contention, and the scrub simply adds too much active workload. What average read rate are you seeing from the unit, what's your ARC hit %, and how busy are the disks in iostat/gstat?
 

Meyers

Nowhere near the entire 14.5TB. I asked about this and was told that a recent change to scrub performance in 11.1 or 11.2 was responsible, so changing vfs.zfs.scrub_delay was the right thing to do.

Average read rate: 277 Mbps (looking at the network graphs for the past week because there doesn't appear to be a better way to gauge this)
ARC hit ratio (past week): 99%
Busy: about 23% since boot (23-25% over the past few minutes)
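
As a rough translation of those figures into disk-side load, the sketch below converts the network rate to MB/s and scales it by the ARC miss rate. It assumes "Mbps" means megabits per second and that the 99% hit ratio applies evenly to bytes served. The second assumption is crude, because ARC hit ratios count requests rather than bytes, and writes are ignored entirely.

```python
def mbps_to_mb_s(megabits_per_second):
    """Convert megabits per second to megabytes per second."""
    return megabits_per_second / 8

network_mb_s = mbps_to_mb_s(277)  # average read rate from the network graphs
arc_hit_ratio = 0.99              # reported ARC hit ratio for the past week

# Crude estimate: only ARC misses have to be read from the disks.
disk_mb_s = network_mb_s * (1 - arc_hit_ratio)

print(f"{network_mb_s:.1f} MB/s served, ~{disk_mb_s:.2f} MB/s estimated from disk")
```

If that estimate is anywhere near right, client reads account for well under 1 MB/s of disk traffic, so the reported 23% busy figure would have to come mostly from something else, such as writes or small random I/O.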
 