Drive will not go offline to replace

Status
Not open for further replies.

Drk

Dabbler
Joined
May 26, 2013
Messages
38
I have tried both from the command line and the GUI. Some background: I rebooted the system after a power outage (yes, it's on a UPS). Once it came up this drive was missing its serial, so SMART seems fine, but I was going to replace it with the cold spare, pull the drive out, and test it. I can not get it to go offline and replace. It tells me the drive is online and working, which worries me a little. This is on a Highpoint controller too, so it has been a big pain in the ass anyway. Any ideas?



NAME                                              STATE     READ WRITE CKSUM
Chenini                                           ONLINE       0     0     0
  raidz1-0                                        ONLINE       0     0     0
    gptid/c64cbd29-2a07-11e3-818b-0015173d5e02    ONLINE       0     0     0
    gptid/c6a19988-2a07-11e3-818b-0015173d5e02    ONLINE       0     0     0
    gptid/c6f9f17f-2a07-11e3-818b-0015173d5e02    ONLINE       0     0     0
    gptid/c7933b9e-2a07-11e3-818b-0015173d5e02    ONLINE       0     0     0
    gptid/c7e8bdb3-2a07-11e3-818b-0015173d5e02    ONLINE       0     0     0
  raidz1-1                                        ONLINE       0     0     0
    gptid/c846f74f-2a07-11e3-818b-0015173d5e02    ONLINE       0     0     0
    gptid/c89febb2-2a07-11e3-818b-0015173d5e02    ONLINE       0     0     0
    gptid/c8fb690c-2a07-11e3-818b-0015173d5e02    ONLINE       0     0     0
    gptid/c957be73-2a07-11e3-818b-0015173d5e02    ONLINE       0     0     0
    gptid/c9b13187-2a07-11e3-818b-0015173d5e02    ONLINE       0     0     0
  raidz1-2                                        ONLINE       0     0     0
    gptid/b8fc4f41-7a04-11e3-b27b-0015173d5e02    ONLINE       0     0     0
    gptid/cb1ee32e-2a07-11e3-818b-0015173d5e02    ONLINE       0     0     0
    gptid/cb8c6d02-2a07-11e3-818b-0015173d5e02    ONLINE       0     0     0
    gptid/cbec7049-2a07-11e3-818b-0015173d5e02    ONLINE       0     0     0
    gptid/2c6de7cb-79b7-11e3-b27b-0015173d5e02    ONLINE       0     0     0
spares
  gptid/ccb53c9c-2a07-11e3-818b-0015173d5e02      AVAIL


zpool offline -t Chenini gptid/c64cbd29-2a07-11e3-818b-0015173d5e02

cannot offline gptid/c64cbd29-2a07-11e3-818b-0015173d5e02: no such device in pool
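One workaround not mentioned in the thread (a sketch, assuming the pool config is readable): when zpool rejects a gptid name it printed itself, the vdev can usually be addressed by its numeric GUID instead, which zdb can show. The GUID value below is illustrative, not from this pool.

```shell
# Dump the cached pool config; each child vdev entry lists a
# numeric guid alongside its gptid path.
zdb -C Chenini

# Offline by GUID instead of by name; -t makes it temporary
# (the disk rejoins the pool after a reboot).
zpool offline -t Chenini 9876543210987654321
```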
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Try running a scrub on the pool first. Also, please put the whole zpool status output in CODE tags or a pastebin next time; the actual formatting really does matter.
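For reference, kicking off the scrub and checking the result looks something like this (pool name taken from the status output above):

```shell
zpool scrub Chenini                 # start the scrub in the background
zpool status Chenini | grep scan:   # the "scan:" line reports progress/result
```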

I used to be a fan of Highpoint for home hardware RAID setups. They still are "decent", but I'd never recommend them for ZFS. I wouldn't be the least bit surprised if this was related to your controller. I'm not sure I've seen a Highpoint controller that I would recommend with FreeNAS.
 

Drk

Dabbler
Joined
May 26, 2013
Messages
38
Yeah, I am sure it is; I have had so much trouble with them and ZFS. I need to get two M1015s, I just have not had the cash or the time to move the data over. I did the scrub first thing to make sure of that. http://pastebin.com/89V84NXq should be what you are looking for.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Umm.. you have bigger problems, bro...

scan: resilvered 242G in 0h49m with 0 errors on Fri Jan 10 09:29:29 2014

You should NOT be getting zpool status outputs like that. That means something more sinister is going on. You should be getting something like this...

scan: scrub repaired 0 in 20h26m with 0 errors on Fri Jan 3 13:52:28 2014

The fact that any data had to be resilvered is a bad sign. A few KB here or there due to small disk errors is common (but not good). In your case, a scrub resilvered an amazing 242GB. What that means is that 242GB of data didn't have the necessary redundancy in your pool. Naturally this is a very bad and very serious thing. If I were in your shoes I'd be SERIOUSLY ordering those M1015s while you still have your data.

This looks like yet another example of how Highpoint based pools are just ill conceived.
 

Drk

Dabbler
Joined
May 26, 2013
Messages
38
Yeah, that was me doing that. I removed the wrong drive and had to rebuild the RAID. I was using the command line and looked at the wrong pool. Here, I will do a scrub and post that in a sec. It is personal data too, so I am not all that worried about it; most of it is on backup.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ah. User error. ;)

Rule #1: Always use disk serial numbers to determine which disk to replace.
Rule #2: Always shut down the server before pulling disks so you don't have problems like the one you had. ;)
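A sketch of Rule #1 (the device names are illustrative, and, as noted later in the thread, smartctl may not work at all behind a Highpoint controller):

```shell
# Print each disk's serial number so the physical disk can be
# matched to its device node before anything gets pulled.
for d in /dev/da0 /dev/da1 /dev/da2; do
  printf '%s: ' "$d"
  smartctl -i "$d" | grep -i 'serial number'
done
```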

If you are 100% sure that everything is okay, you could just shut down the server and remove the disk you want to offline. That'll certainly offline the disk! But I'd consider this a rather dangerous choice, as offlining a disk first is the proper procedure for replacing disks in pools. If ZFS isn't letting you offline a disk for replacement, that usually means there's some kind of redundancy issue with your vdev.
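The proper procedure sketched as commands (the old gptid is the one from the earlier offline attempt; the new gptid is a placeholder, since FreeNAS normally creates the partition and label through the GUI's replace dialog):

```shell
# 1. Offline the failing member.
zpool offline Chenini gptid/c64cbd29-2a07-11e3-818b-0015173d5e02

# 2. Physically swap the disk, then partition/label the new one
#    (the FreeNAS GUI does this as part of its Replace action).

# 3. Point the pool at the replacement and watch the resilver.
zpool replace Chenini gptid/c64cbd29-2a07-11e3-818b-0015173d5e02 gptid/NEW-GPTID
zpool status Chenini
```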

Another note: Being that you have RAIDZ1 vdevs, there's a very good possibility that when you resilver your new disk you will be unable to complete the resilver without errors (possibly leading to the pool failing during the resilver). This is why I have the "RAID5 is dead" link in my signature. We've had quite a few of these lately, and I've had that in my sig for months hoping people would learn. But too many aren't familiar with the fact that single-disk redundancy basically stopped being statistically reliable in 2009, and 5 years later things aren't any better (hint: they are getting worse!)
 

Drk

Dabbler
Joined
May 26, 2013
Messages
38
I understand all that about RAID 5 and why, but really there was no reason to have RAIDZ2 for that few disks; there "should" not be a problem. The hot spare should take care of that problem as a fail-safe if all works right, and then I also have a cold spare. Yes, both of those things should always be done, but it is a test/backup box, so like I said I am not 100% worried about it. I will shut down the box, do a test on the drive, and then see what happens tonight when I am back at home. It is one of two things: SMART failed on the controller (no big shocker there) and the drive is bad, or the card is being dumb. Thanks for all that. I just do not understand, since it is all showing fine but it just will not let me remove the drive in the GUI or on the command line.

Rodney
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No, hot spares do NOT remove the issue of having zero redundancy during a rebuild. There is no correlation between the two. The problem is strictly related to UREs and the known error rates for disks. And that "should" becomes an "always bites you in the butt", as statistically it turns into a situation where 80%+ have lost data or entire pools. It's simple statistics: if the math says that 80% of the time you will not be able to recover, then 80% of users will lose data. And with each disk you add, the % chance of being unable to rebuild the pool grows exponentially.

Not to mention hot spares do not come online automatically (and never have). It's in the manual that these don't work.
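Since the spare won't attach itself, putting it into service by hand is just an ordinary replace (a sketch: the spare's gptid comes from the status output above, and the failing-disk gptid is the one from the earlier offline attempt):

```shell
# Swap the suspect member for the configured spare manually.
zpool replace Chenini \
    gptid/c64cbd29-2a07-11e3-818b-0015173d5e02 \
    gptid/ccb53c9c-2a07-11e3-818b-0015173d5e02
```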

Yeah, the fact that Highpoint doesn't support SMART is a major red flag for "do not use" if you care about your data. For a backup server and someone who accepts the risk, fine. I'd never ever use Highpoint (and I spent money on an M1015 instead of using one of the 5 Highpoints I had at my disposal).

And since you can't do SMART monitoring, remember the part above about the odds growing exponentially? Yeah, those odds just skyrocketed all over again, because you can't even easily predict a failing disk. That's what SMART is supposed to do, but that's exactly what you can't use, because Highpoint is crap on FreeBSD. :(

Just an FYI, I'm the unofficial Highpoint guru in these parts. I've been maintaining the Highpoint sticky that is around here, and I really wish people took my warnings in that thread seriously. There's a major problem with wanting ZFS (it's super redundant.. blah blah blah) and then turning around and using a controller that won't do SMART. There's a serious disconnect in mitigating risk if you somehow think that you can ignore SMART, go with ZFS, and still have things be better than a different OS and a different file system. But, to each their own.

Good luck!
 

Drk

Dabbler
Joined
May 26, 2013
Messages
38
I will just leave it at that; I was just looking for some idea of why this happened or how to fix it, but I get what I should be doing, etc. I understand, first, that I should not be running Highpoint, for that reason; beyond that, yeah, maybe I should have RAIDZ2, but where does it stop? We can go back and forth all day about that. The only safe RAID is two backup types, i.e. mirror and tape, etc. This is data that I have backed up for the most part: movies ripped from DVD or Blu-ray, MP3s, etc. It can all be gotten back. If this were a live server I would never have these problems, because it would have been set up right the first time.

The hot spare info I was going by was this: http://docs.oracle.com/cd/E19253-01/819-5461/gcvdi/index.html. I guess I was wrong in thinking FreeNAS would do this. I read up on FreeNAS and hot spares; yeah, no point. If I had known that, I would have used the disk for RAIDZ2 or as a cold spare.

Yes, I understand that SMART is really important to ZFS, but when I ordered these cards a while ago there was hope of getting it working, etc. That has gone by the wayside. Yes, I plan to get the cards when I have money for my project again next month. Then I will rebuild it again :)

I have been running Highpoint and FreeNAS for a long time now, so I have a fair deal of experience with them. Because of the FreeBSD problems, I do not think I will ever use them again; just too many problems.

Do not take any of this as me being a dick. I thank you for your time and am not trying to flame you at all. I know your time is as important as mine; I cannot thank you enough.

Rodney
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, unfortunately ZFS has been split, with Oracle closed-sourcing their versions of ZFS (v29+) while the open source community went to v5000. Their behaviors differ in some places. It's difficult even for a knowledgeable ZFS person (and I'm not that knowledgeable compared to what there is to know about ZFS) to figure out what is and isn't applicable. Even some documentation for Solaris isn't applicable to FreeBSD. So it's a total nightmare.

The whole "hot spare" thing was supposed to be in FreeBSD 10 as part of the zfsd code. But as someone in the forums has said, the zfsd code isn't even in the RC yet, so FreeBSD 10 isn't looking too likely right now.

Those cards work well for hardware RAID (or at least well enough for home use, which is how I used them for over 8 years).

Didn't think you were being a dick. Hopefully you didn't think I was being one, either. I just want people to understand the risks they are taking; too many people buy into the "it worked yesterday, it works today, it'll work tomorrow" mindset, which is proven wrong almost daily here by someone losing their data.
 