Why is my scrub taking so long?

Status
Not open for further replies.

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
  • Resilvers, and scrubs, tend to ACCELERATE (for some reason) over time. Thus, the estimates for "time left" often are unreasonably high at the beginning.

That's because at the start it's working its way through all sorts of crappy metadata, but as things progress, it starts hitting file data blocks where there's a lot more opportunity for many-KB-per-block reads, which boosts reported speeds. One should ignore reported resilver/scrub speeds until at least ~10% done.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
@jgreco
As you can see the scrub starts at 60 mbytes/s and ends up about 8 mbytes/s.
This is my first and only freenas server so i have no prior knowledge of scrubs or any other scrubs to compare.

is this a typical scrub when the pool gets to 80% full and 9% fragmented ?

It could be. What kind of data's on this pool, again? I'm not seeing it. 13TB(*)/400K files suggests an average filesize of ~32MB, but that has to be multiple blocks. If we were to say an IOPS for each 1MB, then that's like 12.8M IOPS or nearly a day and a half in the life of a disk drive.

This becomes dominated by seek speeds very quickly.

(*) I realize the amount of data's less than that.
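
As a rough sketch of that back-of-the-envelope math: the 100 random I/Os per second used below is an assumed ballpark for a single spinning disk (the post doesn't state a figure), and the function name is made up for illustration.

```python
def scrub_io_estimate(num_files, avg_file_mb, mb_per_io=1.0, disk_iops=100.0):
    """Return (io_count, hours) for reading every file in mb_per_io-sized chunks.

    disk_iops is an assumed ballpark for one spinning disk, not a measured value.
    """
    io_count = num_files * avg_file_mb / mb_per_io
    hours = io_count / disk_iops / 3600.0
    return io_count, hours


# Figures from the post: ~400K files averaging ~32MB each, one I/O per 1MB.
ios, hours = scrub_io_estimate(400_000, 32)
print(f"{ios / 1e6:.1f}M I/Os, about {hours:.1f} hours ({hours / 24:.2f} days)")
```

With those assumptions it comes out to roughly 35 to 36 hours, which is where "nearly a day and a half" comes from. Smaller blocks mean more I/Os for the same data, so the real number can be a lot worse.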
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
95% of the space is video files (about 4600 files).

but in total there are 385,982 files and i think there are many thousand smaller than 128k
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
If there are any that you can put into a zip or tar file, doing so might reduce the scrub time somewhat.
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
files.png


I think the majority of these files are related to my SQL database jail. I have an SSD boot device now - perhaps i can move the jail there.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Moving your data to a single drive has risks: if the drive dies, so does all your data.

Out of curiosity, your pool structure... You have what might be two pools, a 6x3TB pool and a 2x1TB + 2x3TB pool? Could you explain the structure a little please. I actually thought we had discussed this before but I don't see it in the previous postings. Or maybe a "zpool status" output would help me understand as well. Configuration is everything.
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
the pool with 4 hdds was 4 x 1tb (RaidZ1) and my intention was to replace the 1tb hdds with 3tb hdds when there was a problem with the drives. and once all the 1tb hdds were replaced i'd destroy the pool and make a RaidZ2 with the 4 3tb drives. (2 down 2 to go)


[root@freenas] ~# zpool status
  pool: Storage
 state: ONLINE
  scan: scrub repaired 0 in 28h29m with 0 errors on Wed Apr 13 12:24:36 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        Storage                                         ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/c8d701ef-d778-11e3-bc69-003067d55a0a  ONLINE       0     0     0
            gptid/c943beeb-d778-11e3-bc69-003067d55a0a  ONLINE       0     0     0
            gptid/c9ac29bd-d778-11e3-bc69-003067d55a0a  ONLINE       0     0     0
            gptid/a2b449d2-53ee-11e4-a54b-002590d51bcc  ONLINE       0     0     0
            gptid/b5bdff1f-4e6f-11e4-9354-002590d51bcc  ONLINE       0     0     0
            gptid/af67ec15-bbb0-11e4-8f6d-002590d51bcc  ONLINE       0     0     0

errors: No known data errors

  pool: Working
 state: ONLINE
  scan: scrub repaired 0 in 5h5m with 0 errors on Sun Apr 10 05:07:33 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        Working                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/5aae55c9-fcca-11e4-81fe-002590d51bcc  ONLINE       0     0     0
            gptid/981ce82d-efe2-11e4-9d8e-002590d51bcc  ONLINE       0     0     0
            gptid/82b56d33-d231-11e5-a548-002590d51bcc  ONLINE       0     0     0
            gptid/105fe507-d23a-11e5-a548-002590d51bcc  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Apr 10 03:45:13 2016
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     0
          gptid/b61600de-c6ae-11e5-a70e-002590d51bcc  ONLINE       0     0     0

errors: No known data errors
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
While it's scrubbing at slow speed, run gstat and monitor drive busy percent. If the drives in the pool in question are very busy (near 100%), but the data coming from the drives is relatively low, then you can infer that this particular point in the scrub involves a lot of random reads. Not much you can do at that point besides removing or rewriting data. Not really a big deal. A scrub takes as long as it takes.

I have two pools. The first one is almost all large files that rarely change (mostly media). Scrubbing this pool happens at a fairly uniform 1.5 GB/sec or so.

The second pool is a backup destination for all the PCs in the house. Lots of writes / deletes happening every day. This pool starts off scrubbing at very low speeds of a couple of MB/sec. It stays like that for quite a few hours until it finally gets past the random access part and speeds up to around 600MB/sec.
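
The "very busy but not much data" check can be sketched roughly like this. The thresholds and the sample readings are made up for illustration (they are not from anyone's system in this thread); the inputs correspond to gstat's r/s, kBps and %busy columns.

```python
def avg_kb_per_read(reads_per_s, kb_per_s):
    """Average size of one read, from gstat's r/s and kBps columns."""
    return kb_per_s / reads_per_s if reads_per_s else 0.0


def looks_seek_bound(busy_pct, kb_per_s, busy_threshold=90.0, slow_kb_per_s=20_000):
    """True when a drive is nearly saturated yet delivering little data.

    Both thresholds are assumptions chosen for illustration.
    """
    return busy_pct >= busy_threshold and kb_per_s < slow_kb_per_s


# Hypothetical readings (device, r/s, kBps, %busy), not taken from the thread.
samples = [
    ("da0", 150, 2_400, 98.5),    # busy, but only ~2.4 MB/s
    ("da1", 900, 115_200, 97.0),  # busy and streaming large blocks
]
for name, rps, kbps, busy in samples:
    verdict = "mostly random reads" if looks_seek_bound(busy, kbps) else "streaming"
    print(f"{name}: {avg_kb_per_read(rps, kbps):.0f} KB/read, {verdict}")
```

A small average read size together with high %busy is the pattern to look for; where exactly to draw the line depends on your drives.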
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
i'll try that. thank you. we seem to have similar type pools.

i have moved the mysql jail from my storage pool to working pool. this has reduced the number of files on storage from 385,982 to 33,184 - with under 10% of the original number of files i expect a faster scrub time. obviously slower scrub on working
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I see you use "TreeSize" by JAM software. That is another DrKK recommendation. Well done sir.
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
it is a very useful tool and is now an ethereal recommendation
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
@ethereal Which pool is causing the slow scrub, "Storage" or "Working"? I don't recall you mentioning that. If you are trying to scrub both at the same time, that could be causing an issue.
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
Date     |  Working  |  Storage
Jan 10   |  2h53m    |  21h10m
Jan 17   |  2h50m    |  21h18m
Jan 24   |  2h45m    |  20h59m
Jan 31   |  2h54m    |  21h04m
Mar 13   |  5h19m    |  At 9:30pm 61.54% 12h47m to go
Mar 20   |  4h30m    |  At 9:30pm 90.91% 2h8m to go
Mar 27   |  5h15m    |  At 9:30pm 29.61% 51h4m to go - took 67h47m

storage is the problem - both scrubs were always run at the same time
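
To put a number on how much slower storage got, here is a small sketch that converts the durations above to minutes and compares the Mar 27 scrub with the January runs. The helper name is made up for illustration.

```python
import re


def to_minutes(duration):
    """Convert a scrub duration such as '21h10m' to whole minutes."""
    match = re.fullmatch(r"(\d+)h(\d+)m", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


# Completed Storage scrubs from the list above.
january = ["21h10m", "21h18m", "20h59m", "21h04m"]
baseline = sum(to_minutes(d) for d in january) / len(january)
slow = to_minutes("67h47m")

print(f"January average: {baseline / 60:.1f} h")
print(f"Mar 27 scrub: {slow / 60:.1f} h, {slow / baseline:.1f}x the January average")
```

That works out to a bit over three times the January average.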
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
@titan_rw Great advice on running gstat during a slow scrub. I tested it on my system, and it makes you appreciate how hard your drives are working during a scrub.

@ethereal, I'd be curious what those results are as well. To run this command I'd open an SSH session (I like using PuTTY) and enter the command "gstat -a -I 5s -f ad". This averages the results over 5 seconds (you could increase the sample time if you see fit) and filters only on drives starting with the letters "ad" (to remove the pool assigned letter strings). To exit gstat just hit the "q" key. You don't need to run it all the time, just at the beginning and when you see it slow down like crazy, then compare the results. With the interval being 5s or longer, you have time to take a screen capture and share your results.

Additionally, what does your AV-GP hard drive report as the time it should take to complete a SMART extended test? It looks like it's taking almost 9 hours, which seems a bit excessive to me, but I have not been able to find anything on test times for AV-GP drives. I have had my doubts about using those drives for a NAS; after reading some testing about them I gave up that path, but now I'm headed back down it. Maybe you are doing some background work you are not aware of. gstat will also tell you that.
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
i will definitely do this during my scrub tomorrow.

i can't see where the drive tells you how long the test took - where can i find this information ? the storage scrub is scheduled for when there are no smart tests running. it is possible that when i started a scrub manually to do some testing a long test may have been running. there is not a lot running in the background normally.

i will have to check my smart test tasks - at least one of the long tests is not being run - i think it is because the device changed name after a reboot (da2 to da0)
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
I too wondered about ethereal's use of the AV-GP drives. I've always *thought* that these drives were optimized for recording video surveillance.

Here's a message on another forum that supports that theory - http://forums.anandtech.com/showpost.php?p=34347762&postcount=3

It would be nice to find something official from WD regarding the comment about ignoring bad/weak sectors. I did a quick look, but didn't find anything.

 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
anandtech has no evidence to back up the claim.

these hdds will act like ordinary drives unless the system sets the streaming command.

http://www.wdc.com/wdproducts/library/other/2579-772012.pdf

bottom of page 7

The primary purpose of an HDD is to store data reliably so the correction of any errors in the data before sending it to the host is of paramount importance – data that cannot be corrected (after all error correction attempts are exhausted) is not sent to the host. The only way to guarantee command completion time (while still returning the right number of sectors to the host) is to use the STREAMING command set.
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
for one of the av-gp (3tb)

Extended self-test routine
recommended polling time: ( 433) minutes.
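
For comparison with the "almost 9 hours" mentioned earlier, a quick sketch that pulls the polling time out of those two lines and converts it to hours and minutes. It assumes the text looks exactly like the lines pasted above; the function name is made up for illustration.

```python
import re


def extended_test_minutes(smart_text):
    """Return the recommended polling time in minutes, or None if absent."""
    match = re.search(
        r"Extended self-test routine\s+recommended polling time:\s*\(\s*(\d+)\)\s*minutes",
        smart_text,
    )
    return int(match.group(1)) if match else None


# The two lines quoted in the post above.
report = """Extended self-test routine
recommended polling time: ( 433) minutes."""

minutes = extended_test_minutes(report)
hours, rest = divmod(minutes, 60)
print(f"{minutes} minutes is {hours}h{rest:02d}m")
```

So the drive itself suggests a little over 7 hours for the extended test.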
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Where does it say that these drives "will act like ordinary drives ..."? From your same document, it does say:

Use the correct HDD for the purpose

"WD AV and WD AV-25 HDDs are specialized for the consumer electronics market, with mechanical and firmware features specifically designed to give quiet, vibration-free operation in DVR applications."


 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
where does it say it doesn't ?

you quoted -

"WD AV and WD AV-25 HDDs are specialized for the consumer electronics market, with mechanical and firmware features specifically designed to give quiet, vibration-free operation in DVR applications."

all that says is that it will be quiet and vibration free
 