zpool Showing Errors After Drive Replacement

Status
Not open for further replies.

NicCrockett

Dabbler
Joined
Aug 1, 2013
Messages
20
Last week my server detached two drives from the RAID controller. I was able to restore the controller's configuration without replacing the drives. However, after the system had been up for a short time, one of the drives died. I've replaced the bad drive with a new one. The system says that the storage, datasets, and drives are healthy and online, but running zpool status -xv shows me that there are corrupted files. The system is also still showing a warning message about data corruption (message below). I've fixed the files that are actually mine, but the rest look like system files. How do I clear these errors from the system?

Warning Message:
WARNING: The volume PegasusVolume (ZFS) status is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected. Restore the file in question if possible. Otherwise restore the entire pool from backup.
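For anyone trying to tell "my files" apart from "system files" in the zpool status -xv output, here is a minimal sketch that splits the error list into ordinary paths and bare object references. The sample text is hypothetical (the path and object IDs are invented, not taken from the screenshot), and the function name is just for illustration. Entries printed as dataset:<0x...> or <metadata>:<0x...> are generally ones ZFS could not map back to a file path, which is why they cannot simply be restored like a normal file.

```python
import re

# Hypothetical sample of `zpool status -v` output; the path and the
# object IDs below are invented for illustration only.
SAMPLE = """\
  pool: PegasusVolume
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.

errors: Permanent errors have been detected in the following files:

        /mnt/PegasusVolume/backups/server1.img
        PegasusVolume:<0x0>
        <metadata>:<0x1b>
"""

# Matches entries such as "pool/dataset:<0x1f>" or "<metadata>:<0x1b>".
OBJECT_REF = re.compile(r"^.+:<0x[0-9a-fA-F]+>$")


def parse_permanent_errors(status_text):
    """Return (file_paths, object_refs) from the 'errors:' section."""
    files, objects = [], []
    in_errors = False
    for line in status_text.splitlines():
        if line.startswith("errors:"):
            # Only collect entries when an error list actually follows.
            in_errors = "Permanent errors" in line
            continue
        if not in_errors:
            continue
        entry = line.strip()
        if not entry:
            continue
        if OBJECT_REF.match(entry):
            objects.append(entry)
        else:
            files.append(entry)
    return files, objects


if __name__ == "__main__":
    files, objects = parse_permanent_errors(SAMPLE)
    print("restorable paths:", files)
    print("object references:", objects)
```

On a live system the text would come from the real command output rather than SAMPLE. The paths list is what can be restored from backup or deleted; the object references are the part that points at deeper damage.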
 

Attachment: Screenshot (1).png

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
You can't. Your pool has suffered irreparable damage to the pool's metadata. In effect, your pool could fail you at any time and you'll never be able to access the pool again. You need to do 2 things to solve the problem:

1. Get rid of that damn RAID. That's a recipe for nothing but problems. We keep telling people not to use RAID controllers and they keep ignoring the warnings.
2. Destroy and recreate your pool, and this time use RAIDZ2. Again, RAIDZ1 is a recipe for nothing but problems. We keep telling people not to use RAIDZ1 and they keep ignoring the warnings. It's been such a problem I've put a damn warning in my sig... and people still do it!
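To put rough numbers on the RAIDZ1 versus RAIDZ2 trade-off, here is a back-of-the-envelope sketch. It assumes a single vdev and ignores ZFS overhead, padding, and reserved space, so treat the results as approximate. The 2 TB disk size is an assumption for illustration; the thread only mentions four drives.

```python
def raidz_layout(disks, parity, disk_tb):
    """Approximate usable capacity and fault tolerance of one RAIDZ vdev.

    Rough estimate only: ignores metadata overhead, padding, and the
    free space ZFS wants to keep in reserve.
    """
    if parity not in (1, 2, 3):
        raise ValueError("RAIDZ parity level must be 1, 2, or 3")
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    return {
        "usable_tb": (disks - parity) * disk_tb,
        "failures_tolerated": parity,
    }


if __name__ == "__main__":
    # Four disks as in this thread; 2.0 TB per disk is a made-up size.
    for level in (1, 2):
        print(f"RAIDZ{level}:", raidz_layout(4, level, 2.0))
```

With four disks, RAIDZ2 gives up one more disk of capacity than RAIDZ1 in exchange for surviving a second failure, which is the situation described in the first post: two drives detached and one of them died shortly afterwards.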

The corruption you currently have on your pool cannot be fixed. Proper management of your hardware/disks, ZFS format, and ensuring pool resiliency is supposed to prevent the kind of corruption you have. But, due to the mistakes I've mentioned above, you have failed ZFS. I'd seriously consider destroying and recreating your pool from backup sooner rather than later!
 

NicCrockett

Dabbler
Joined
Aug 1, 2013
Messages
20
Cyberjock, please show me to this perfect utopia that you live in, because unfortunately I live in the real world. You know, the place where things cost money and budgets don't comprehend what's best. This is what I have to work with. I can make the most of it, or I can roll over, say it can't be done, and put the company I work for at risk. FYI, I'm not using RAID. I have 4 drives on a RAID controller. They are set up as individual disks, per what I found in the documentation and forum recommendations at the time I built the server. Also, RAID 5 isn't dead, it's just not recommended. I have several servers in the decade-old range that are RAID 5 and are running just fine. Before anyone goes off on me about why I shouldn't use RAID 5: stop. I understand the reasons and I agree with you. Again, I live in the real world where money doesn't grow on trees. Trust me, I would love to replace my entire infrastructure with new and shiny things, but this is what I have to work with.
 

survive

Behold the Wumpus
Moderator
Joined
May 28, 2011
Messages
875
Hi NicCrockett,

If that system you asked about is holding company data then I'd say the company you work for is already very much at risk.

That said, everything you wrote above pretty much avoids the problem you initially asked about. Really the only thing you can do is dump the data off and rebuild the pool. While you are at it (and the surviving data is somewhere safe) I would address whatever caused your controller to drop 2 drives... my guess is you have a RAID card presenting individual disks to FreeNAS, which, while not as bad as using proper hardware RAID, is still far from ideal.

-Will
 

NicCrockett

Dabbler
Joined
Aug 1, 2013
Messages
20
Will, thanks for the reply! The system is holding backups of my other servers. So losing the data isn't a problem as long as another server doesn't go down. I can move most of the backups off to a temporary location and then rebuild the pool. Fixing the RAID card might be problematic, but I'll look into it. Yes, the RAID card is presenting the disks as individual disks. Is there any better way for me to set them up using the same equipment or is this the best I'm going to get out of this server?

Thanks,
Nic
 

survive

Behold the Wumpus
Moderator
Joined
May 28, 2011
Messages
875
Hi NicCrockett,

The best thing you could do would be to replace the RAID card with a plain old dumb SAS HBA that presents the disks to the system directly. That might be kind of complex on a 2950 due to how the disk backplane connects to the existing RAID controller (what sort of cable/connector does it use?). Best case, there could be an SFF-8087 cable (or 2) you could simply plug directly into an IBM M1015 controller (~$125 on eBay), in which case you could be up & running in the time it takes to swap the card.

Now that I think about it, Dell should have a similar card, the SAS/6iR (http://accessories.us.dell.com/sna/...x?sku=341-9536&?~lt=popup&c=us&cs=&l=en&s=dfb), that might fit better. You would want to flash it to what's called "IT" mode, which dumbs the card down by removing the built-in RAID code.

-Will
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525

Sorry, you've already put the company at risk by using the RAID controller. Passthrough is NOT a solution for using RAID controllers. It's just a bad idea. Consider the cost of doing it right versus losing your pool. I just spent the last 4 days trying to recover the pool of someone who used a RAID card but did passthrough. Guess what? They have no data now! Their pool was horribly broken. You know why their pool was lost? They did passthrough but were unable to run SMART tests or SMART monitoring, and the RAID card doesn't tell the OS when a disk detaches itself. There are even more reasons why RAID controllers are bad, but hopefully you get the idea. As I told someone a year or so ago, if you worked for me and I asked you to build me a ZFS server and you included a hardware RAID controller, I'd have you fired from my business, immediately. That kind of mistake is far, far too dangerous for a business to ever consider. And you clearly aren't putting the business first if you are going to make such horrible choices as using a hardware RAID card with ZFS. If you want to do it with your personal data, feel free. I won't lose sleep over your lost bits though.
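One way to check whether SMART is actually reachable through a given controller is to look at smartctl's exit status, which is a bitmask. The sketch below only decodes that status. The bit meanings are paraphrased from the smartctl man page, and the helper names are made up for illustration. It assumes you obtain the status yourself, for example from subprocess.run(["smartctl", "-H", "/dev/da0"]).returncode, where /dev/da0 is a placeholder device name.

```python
# Bit meanings paraphrased from the smartctl man page ("EXIT STATUS").
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device did not identify itself",
    2: "a SMART/ATA command failed or a SMART data checksum error occurred",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes are at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains errors",
    7: "device self-test log contains errors",
}


def decode_smartctl_exit(status):
    """Return the list of conditions encoded in a smartctl exit status."""
    return [
        message
        for bit, message in sorted(SMARTCTL_EXIT_BITS.items())
        if status & (1 << bit)
    ]


def smart_is_reachable(status):
    """False when bits 0-2 are set, i.e. smartctl could not query the disk."""
    return (status & 0b111) == 0


if __name__ == "__main__":
    # Example statuses chosen for illustration, not captured from real hardware.
    for status in (0, 2, 64):
        print(status, smart_is_reachable(status), decode_smartctl_exit(status))
```

If smart_is_reachable comes back False for disks behind the RAID card, SMART tests and monitoring are not getting through to the drives, which is the failure mode described above.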

As for the real world... I hear you. I'm a disabled vet, and money is tight, so you don't need to give me a lesson on reality. I've lived in that reality for years, and I know what it's like. I also know that ZFS will NOT forgive you for whatever mistake you made because you couldn't afford to do it right. You either do it right or you shouldn't use ZFS. If that's too much to swallow, feel free to use another OS that doesn't use ZFS. It's not like we sit here and try to force people to spend money.

You need to realize ZFS doesn't forgive you for making bad decisions, no matter what reason or excuse you want to come up with. There are NO ZFS recovery tools in existence, so the consequence of doing things that are stupid is high. You'll have a false sense of security today, and tomorrow your data will be gone. No warning, no running some tool to get your data back, it's just 100% gone. Yes, this is exactly what happens to people. Yes, this is exactly what will continue to happen to people that want to come up with an excuse for doing it wrong.

Sorry if it seems harsh; ZFS can be harsh. It's not your standard file system. And as such, it doesn't require the same hardware as your standard file system. Like I said, if these requirements are too steep, consider going back to Windows or whatever OS you are natively familiar with.

Edit: And to be honest, I'm really not sure why you need this all explained to you. This has been covered in the forums at least 100 times, it's in my presentation, and it's in the manual. Are you really going to sit there and still argue that you just gotta use RAID "because"?
 