Extensive Scrub Duration

cyberjock · May 4, 2014

NTFS isn't recommended for data that is important. There's a ticket on bugs.freenas.org to make NTFS imports read-only in the future because of how many NTFS partitions the importer has eaten. It's not a FreeNAS problem, it's a FreeBSD problem. :/

joelmusicman · May 5, 2014

Did anyone else notice that he has an i5-2500K processor (which means no ECC support)?

cyberjock · May 5, 2014

Yes, but I've given up on trying to recommend ECC anymore. If people don't want to read the well-written documentation we already provide I'm not going to waste my time further by forcing them to drink. :/

joelmusicman · May 5, 2014

I think there's a ticket in for this already but it'd be great if FreeNAS would check for ECC functionality and warn if it isn't active. That way it isn't just a bunch of guys telling people on forums about it. :)

Another thought: Take it one step further and add memtest functionality in the GUI that can be scheduled just like SMART tests & scrubs!

cyberjock · May 5, 2014

You can't check for ECC functionality. It's been discussed many, many times. There are ways you can test *some* Intel-based systems with *some* chipsets, but each chipset has its own way of testing for it, and it's not even shared across motherboard manufacturers.

amitkhas · May 5, 2014

I did not know about ECC support. When I purchased the hardware 2 years back, I was not aware of the recommendation for ECC.

I don't think I ever looked into it either. I didn't even know the i5-2500 did not support ECC. I thought as long as you had ECC-capable memory, that was sufficient. I suppose I figured that other hardware is typically ECC compliant. Apparently not :(

Ericloewe · May 5, 2014

amitkhas said:
I did not know about ECC support. When I purchased the hardware 2 years back, I was not aware of the recommendation for ECC :(

I don't think I ever looked into it either. I didn't even know the i5-2500 did not support ECC :( I thought as long as you had ECC-capable memory, that was sufficient. I suppose I figured that other hardware is typically ECC compliant. Apparently not :(

It's an artificial limitation, so don't feel stupid. Intel limits ECC to server chipsets (even though the chipset has no interaction with RAM these days) paired with Xeon CPUs (or some i3 and Celeron models). AMD is more liberal, but most motherboards don't support ECC.

amitkhas · May 15, 2014

So I've managed to back up most of the critical data to another drive. However, I'm unable to grab everything since some files will not fully transfer. The transfer starts, but after some time it fails. Most likely, it's because of the Samba/Windows issue that cyberjock mentioned.

As cyberjock previously mentioned, I should connect another drive directly to the system and copy the files that way. Since I don't have any more SATA ports, I am forced to use a USB drive. First, how do I get a USB drive to be detected by FreeNAS? Does the drive have to be formatted a certain way? Once FreeNAS sees the drive, how do I copy the files over to that drive? There is probably a command I need to issue via Shell.

Also, this is a really noob question: With the hard drives failing (i.e. having lots of bad sectors), could the ZFS scrub potentially correct this? It would take 200 hours to run, probably because of the number of bad sectors, but I wouldn't mind leaving it on if it means it may repair the disks.

Also, another very noob question: Based on the printouts of the SMART values, you mentioned 4 disks are "failing." What do you mean by failing? That there is a high number of bad sectors? I suppose I am still surprised that 4 of 6 drives are failing when they are only 2 years old. They were not running 24/7. Only occasionally, perhaps 30 hrs a week.

amitkhas · May 19, 2014

Anyone have any ideas on this?

Robert Trevellyan · May 22, 2014

I have a suggestion for a completely different approach. The idea is to try to rescue the contents of each failing drive independently, using GNU ddrescue.

1. Buy four replacements for your failing drives. Each replacement must be at least as large as the drive it will replace.
2. Download and burn a bootable CD of Ubuntu Rescue Remix, or install GNU ddrescue on an existing Ubuntu system.
3. Use ddrescue one disk at a time to clone each failing drive to a new drive.
4. Boot your system with the two old drives and the four clones and run a ZFS scrub.
5. Back up your data.
6. Create a new RAIDZ2 pool.
7. Restore your data.

I'm suggesting this because I've had great results rescuing data from failing drives using ddrescue.
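
For step 3, a common way to drive GNU ddrescue is a quick first pass followed by a slower retry pass, both sharing one map file so progress is kept between runs. The sketch below only prints the two commands instead of running them, so they can be checked before anything is overwritten. The function name `ddrescue_plan`, the device names `/dev/sdX` and `/dev/sdY`, and the map file name are placeholders, not values from this thread.

```shell
#!/bin/sh
# Print (do not run) a two-pass GNU ddrescue clone of one failing disk.
# Arguments: source device, destination device, map file (all placeholders).
ddrescue_plan() {
    src=$1; dst=$2; map=$3
    # Pass 1: copy everything that reads easily; -n skips the slow scraping phase,
    # -f is required to overwrite a destination device.
    echo "ddrescue -f -n $src $dst $map"
    # Pass 2: return to the bad areas with direct disc access (-d) and 3 retry passes (-r3).
    echo "ddrescue -f -d -r3 $src $dst $map"
}

# Hypothetical device names; double-check source and destination before running anything.
ddrescue_plan /dev/sdX /dev/sdY disk1.map
```

Getting source and destination the wrong way round would overwrite the failing disk, which is why this prints the commands rather than executing them.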

cyberjock · May 22, 2014

That could really screw ZFS up because multiple disks will appear to be the exact same disk to ZFS.

Robert Trevellyan · May 22, 2014

How so?

Did you think I was proposing making multiple clones of the good disks? That's not what I suggested (in step 3).

amitkhas · May 28, 2014

Hi All,

I figured I would keep everyone updated with the status.

Using FTP, I managed to back up the critical data. Of the 2.6 TB, 600 GB was critical, and the other 2 TB was important but not critical. It took a VERY long time (~100 hrs to back up 600 GB). The transfer rates were terribly slow, but it did transfer. Backing up the other 2 TB (less important) would take an exorbitant amount of time, and that data isn't worth the time and effort required to rescue it.

I purchased 4x 2 TB WD Red disks. Since I backed up the critical stuff, I figured I would try to rebuild the RAIDZ1. I replaced the disk with the most unreadable sectors. The resilver rate is 4.5 MB/sec. That means it will take approximately 150 hours (yikes!).

Is this a typical resilver rate for a 6 disk RAID-Z1?
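
As a rough sanity check on these figures, the time needed is just the amount of data divided by the rate. The sketch below assumes the resilver has to scan about 2.6 TB (the data total mentioned in this post; the real figure is whatever `zpool status` reports) and uses decimal units (1 TB = 1,000,000 MB). The function names `eta_hours` and `rate_mb_s` are made up for illustration.

```shell
#!/bin/sh
# Hours needed to process a given number of MB at a given rate in MB/s.
eta_hours() {
    awk -v mb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", mb / rate / 3600 }'
}

# Average rate in MB/s for a given number of MB moved in a given number of hours.
rate_mb_s() {
    awk -v mb="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", mb / (hours * 3600) }'
}

# Assumption: about 2.6 TB to scan at the reported 4.5 MB/s.
eta_hours 2600000 4.5
# The FTP backup: 600 GB in roughly 100 hours.
rate_mb_s 600000 100
```

Under that assumption the first call gives about 160 hours, in the same ballpark as the estimate of roughly 150 hours, and the second shows the FTP backup averaged under 2 MB/s.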

Ericloewe · May 28, 2014

amitkhas said:
Hi All,

I figured I would keep everyone updated with the status.

Using FTP, I managed to back up the critical data. Of the 2.6 TB, 600 GB was critical, and the other 2 TB was important but not critical. It took a VERY long time (~100 hrs to back up 600 GB). The transfer rates were terribly slow, but it did transfer. Backing up the other 2 TB (less important) would take an exorbitant amount of time, and that data isn't worth the time and effort required to rescue it.

I purchased 4x 2 TB WD Red disks. Since I backed up the critical stuff, I figured I would try to rebuild the RAIDZ1. I replaced the disk with the most unreadable sectors. The resilver rate is 4.5 MB/sec. That means it will take approximately 150 hours (yikes!).

Is this a typical resilver rate for a 6 disk RAID-Z1?

Well, 6 disks is a suboptimal configuration for RAIDZ1 which does lead to slowdowns, but it still sounds absurdly low, so it's probably because of the failing disks.

amitkhas · May 28, 2014

I started the "replace disk" last night. It's about 11% complete.

This is a bit of a noobish question:

When I use the shell, I have the view set to the maximum size. However, some of the text scrolls past after a command. There are no scroll bars. How do I view the full text (i.e. scroll up)?

Ericloewe · May 28, 2014

amitkhas said:
I started the "replace disk" last night. It's about 11% complete.

This is a bit of a noobish question:

When I use the shell, I have the view set to the maximum size. However, some of the text scrolls past after a command. There are no scroll bars. How do I view the full text (i.e. scroll up)?

I'm not sure you can, but you should be able to do so with an SSH client like PuTTY.

danb35 · May 29, 2014

You can also stop the output every page by adding | more to the end. Like this:

Code:

# smartctl -a /dev/ada0 | more

Or you can save the output to a file like this:

Code:

# smartctl -a /dev/ada0 > somefile.txt
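
Along the same lines, the output can be cut down to the attributes that usually track bad sectors instead of paging through all of it. This is only a sketch: the function name `bad_sector_attrs` is made up, it assumes an sh-style shell (run `sh` first if your login shell is csh), and it assumes the drive reports these attribute names, which not every model does.

```shell
#!/bin/sh
# Keep only the SMART attribute rows that usually track bad sectors.
# Reads `smartctl -A` output (the vendor attribute table) on stdin.
bad_sector_attrs() {
    grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
}

# Usage, with the same device name as the commands above:
#   smartctl -A /dev/ada0 | bad_sector_attrs
```

Non-zero raw values in those rows are the usual reason a disk gets described as failing.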

amitkhas · May 30, 2014

The resilvering is taking an immense amount of time. At this point, I want to stop the resilvering and simply rebuild the pool. All data has been backed up.

How do I stop the resilvering?

cyberjock · May 30, 2014

Then stop the scrub (zpool scrub -s <poolname>) and destroy the pool. :P
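
Before running `zpool scrub -s`, it can help to check which kind of scan is actually active, since a scrub and a resilver are reported differently. The sketch below reads `zpool status` output on stdin. It assumes the output has a `scan:` line such as "scan: resilver in progress since ..." or "scan: scrub in progress since ..."; the function name `scan_kind` and the pool name `tank` are hypothetical.

```shell
#!/bin/sh
# Report which kind of scan `zpool status` describes: resilver, scrub, or none.
# Assumption: an active scan appears on a "scan:" line containing "in progress".
scan_kind() {
    awk '/scan:/ && /in progress/ {
             if ($0 ~ /resilver/) kind = "resilver"
             else if ($0 ~ /scrub/) kind = "scrub"
         }
         END { print (kind == "" ? "none" : kind) }'
}

# Usage (hypothetical pool name):
#   zpool status tank | scan_kind
```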

amitkhas · May 30, 2014

It wouldn't let me stop the scrub because it said that the resilvering was in progress.

I ended up shutting down the system from the GUI. Now it won't boot up because of ada4 (one of the failing disks).

Can I simply replace the failing disks and create a pool? Or will it give errors because 2 of the drives will be from the previous RAIDZ1 pool?
