Unable to Mount Storage

Status
Not open for further replies.
Joined
Apr 17, 2015
Messages
6
Hi,

We have the following issue with one of our FreeNAS boxes:

FreeNAS version: 9.3 201504100216
Zpool 1 20x 4TB disk - setup with 4 x 5 Disk vdevs
Please see the status below, taken with the pool mounted read-only:

 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 10.5G in 7h52m with 56590606 errors on Thu Apr 2 19:40:09 2015
config:

        NAME                                              STATE     READ WRITE CKSUM
        pool1                                             ONLINE       0     0     0
          raidz1-0                                        ONLINE       0     0     0
            gptid/93b4b933-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/949b1fc7-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/955bd194-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/961a184c-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/96da185e-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
          raidz1-1                                        ONLINE       0     0     0
            gptid/97978983-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/9855bdc9-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/990b7193-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/99d37070-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/9a9969e2-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
          raidz1-2                                        ONLINE       0     0     0
            gptid/9b617d81-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/9c244c16-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/9ce14d78-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/9da84822-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            replacing-4                                   ONLINE       0     0     0
              gptid/9e6ea11e-b3ab-11e4-9dcc-0cc47a44f42e  ONLINE       0     0     0
              gptid/1c596109-d968-11e4-97e3-0cc47a44f42e  ONLINE       0     0     0
          raidz1-3                                        ONLINE       0     0     0
            gptid/9f2e4501-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/9ff3ee92-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/a0c598e1-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/a190bfac-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0
            gptid/a2597adb-b3ab-11e4-9dcc-0cc47a44f42e    ONLINE       0     0     0

The issue is that we had a disk failure, so we replaced the disk, and during the resilver the box crashed.

Since then I have been unable to boot, as it keeps crashing (panic) when mounting the zpool.

I have plugged in another USB disk to boot from, and can only mount the pool read-only.

When running zpool import -f <pool name>, we get the panic error.

I'm currently running zpool status -v and will post the output when it finishes.

I have run zdb -e pool1, and it fails while checking vdev 0.

I have also tried zpool clear -f pool1, with no joy.

Any thoughts?

Please note: I am using the LSI 9211-8i HBA. It was on firmware version 20 and has been downgraded to 16, as I thought this may have been the issue.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Since then I have been unable to boot, as it keeps crashing (panic) when mounting the zpool.
I'm not the person who's going to be able to solve this for you, but I can tell you two things:
  1. I've read enough threads in these forums about pool trouble to know that when the import crashes, you're in deep.
  2. If you want one of the experienced people who might be able to help you to come on board, you'll save time if you follow the forum rules from the start.
 

krikboh

Patron
Joined
Sep 21, 2013
Messages
209
Running the v20 firmware may have killed the pool.


 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
You seem to have lost 4 disks. You have definitely lost the entire RAIDZ1-4, and you are missing one drive on RAIDZ1-2, which means you cannot afford to lose another drive on RAIDZ1-2.
I don't know the structure of your vdevs, but you may possibly have lost your entire pool if all the RAIDZ1 vdevs were striped.
You will have to wait for an expert to give a more accurate prognosis.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You seem to have lost 4 disks. You have definitely lost the entire RAIDZ1-4, and you are missing one drive on RAIDZ1-2, which means you cannot afford to lose another drive on RAIDZ1-2.
I don't know the structure of your vdevs, but you may possibly have lost your entire pool if all the RAIDZ1 vdevs were striped.
You will have to wait for an expert to give a more accurate prognosis.
Great observation! There are only 15 disks in your listing, and you said you had 20.

Update: Reading on my phone makes things complicated when people don't use code tags. There are 20 disks.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
You seem to have lost 4 disks. You have definitely lost the entire RAIDZ1-4, and you are missing one drive on RAIDZ1-2, which means you cannot afford to lose another drive on RAIDZ1-2.
I don't know the structure of your vdevs, but you may possibly have lost your entire pool if all the RAIDZ1 vdevs were striped.
You will have to wait for an expert to give a more accurate prognosis.
Huh? I see 20 disks. I also see one pool made of 4 vdevs, so yes, if one vdev fails the pool is gone.
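To make the "one vdev fails, the pool is gone" rule concrete, here is a minimal Python sketch, assuming the layout the OP described (4 striped RAIDZ1 vdevs of 5 x 4TB disks). The class and function names are hypothetical, and the capacity figure is a rough raw number that ignores ZFS metadata and padding overhead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RaidzVdev:
    disks: int       # member disks in this vdev
    parity: int      # 1 for RAIDZ1, 2 for RAIDZ2, 3 for RAIDZ3
    failed: int = 0  # members currently missing or faulted

    def alive(self) -> bool:
        # A RAIDZ vdev keeps serving data while failures do not exceed parity.
        return self.failed <= self.parity

    def remaining_redundancy(self) -> int:
        # How many more disks this vdev can lose before it fails.
        return self.parity - self.failed


def pool_alive(vdevs: List[RaidzVdev]) -> bool:
    # Data is striped across every top-level vdev, so all of them must survive.
    return all(v.alive() for v in vdevs)


def approx_usable_tb(vdevs: List[RaidzVdev], disk_tb: float) -> float:
    # Rough upper bound: data disks x disk size, ignoring ZFS metadata,
    # padding and reserved space.
    return sum((v.disks - v.parity) * disk_tb for v in vdevs)


# The layout described in the thread: 4 striped RAIDZ1 vdevs of 5 x 4TB disks.
pool = [RaidzVdev(disks=5, parity=1) for _ in range(4)]
print(approx_usable_tb(pool, 4))                          # 64
pool[2].failed = 1                                        # one disk lost in raidz1-2
print(pool_alive(pool), pool[2].remaining_redundancy())   # True 0
pool[2].failed = 2                                        # a second loss in the same vdev
print(pool_alive(pool))                                   # False
```

With one disk out, raidz1-2 still works but has zero redundancy left, which is exactly the window a resilver has to get through on RAIDZ1.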
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Huh? I see 20 disks. I also see one pool made of 4 vdevs, so yes, if one vdev fails the pool is gone.
More like 21 disks. But who's counting.
This would indeed be 5 x 5-disk vdevs, totaling 25 disks. Is this correct, or is your pool messed up because it can't assign the proper vdev?
 

Starpulkka

Contributor
Joined
Apr 9, 2013
Messages
179
@Tom Stevensuk: "The issue is that we had a disk failure, so we replaced the disk, and during the resilver the box crashed."
Well, that I find easy to believe, as a box with a pool made of multiple RAIDZ1 vdevs like this is going to crash on a single HDD failure.

But I'm kind of curious: since you say you have more FreeNAS boxes, are the others the same config as this one?

Edit:
I almost get the feeling you don't realize the seriousness of what you have on your hands.
Or you have recent, working backups of all your data kept safe.

- I will be downgrading the firmware to 16 on them all. Will this cause any issues?
As cyberjock said, dry without any lubrication.
 
Joined
Apr 17, 2015
Messages
6
Thanks,
@Starpulkka, we have 3-4 smaller boxes with 4 hard drives in them, running for nearly a year with FreeNAS 9.2.x. Two of them have had disk failures, and we replaced the drives with no issues. We also have another two big boys like this one, one with 15 disks and the other with 20 disks, both running 9.2.x; these have been up for over a year with no issues.
There is another box with 20 disks running 9.3, which has been running for a couple of weeks.

The config of the smaller boxes is a 4-disk RAIDZ1 with a spare drive.

The other big boys are 3 x 5-disk or 4 x 5-disk RAIDZ1 vdevs, striped.

I believe the HBA firmware in some of these is version 16, some have version 17, and one has version 20.

I will be downgrading the firmware to 16 on them all. Will this cause any issues?

@Apollo: In this box there are 24 disks in total: 20 disks used for the zpool and the other 4 as spare drives. Once the disk failed, we used zpool replace to replace the drive with one of the spare disks.

Would downgrading to FreeNAS 9.2 bring the zpool back online? Or upgrading to version 10?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I have a feeling that during the resilver you had a read error in the vdev being resilvered. This would cause the pool to be corrupt. I'm not sure what steps to take to verify this, but I hope you have a backup or are quickly making one. This is an example of why not to use single-disk parity like RAIDZ1.
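A rough way to see why a read error during a RAIDZ1 rebuild is a real risk is to estimate the chance of at least one unrecoverable read error (URE) while reading the surviving disks. This Python sketch assumes an error rate of 1 per 10^14 bits (a typical datasheet-style figure for desktop drives, not a number from this thread) and treats every bit as an independent trial, which real drives do not strictly obey.

```python
import math


def p_read_error(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of hitting at least one unrecoverable read error (URE) while
    reading `bytes_read` bytes, modelling every bit as an independent trial.

    ASSUMPTION: 1e-14 errors per bit is a typical datasheet-style figure for
    desktop drives; it is not a measurement from this thread, and real errors
    are not independent."""
    bits = bytes_read * 8
    # 1 - (1 - p)^n, via log1p/expm1 so a tiny p does not round away.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 1e12  # decimal terabytes, as drives are marketed

# Worst case for one 5-disk RAIDZ1 vdev of 4TB drives: the rebuild has to read
# the 4 surviving disks end to end, with no parity left to fix a bad sector.
print(round(p_read_error(4 * 4 * TB), 2))  # 0.72
```

ZFS only resilvers allocated blocks, so a partly full vdev reads less than this worst case. On RAIDZ1, though, any error hit during the rebuild has no second parity to repair it, which is the failure mode described above.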
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Thanks,

Would downgrading to FreeNAS 9.2 bring the zpool back online? Or upgrading to version 10?
Downgrading from FreeNAS 9.3 to 9.2 is not possible if the pool has been upgraded with the ZFS feature flags that come with the 9.3 release.
It wouldn't matter anyhow.
I am just curious why the resilver only repaired 10.5G of data. If the entire pool was corrupted, it would have rewritten more than that. But I am not an expert.
You could try turning your system off, starting it again, and seeing whether the missing drives reconnect.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
More like 21 disks. But who's counting.
This would indeed be 5 x 5-disk vdevs, totaling 25 disks. Is this correct, or is your pool messed up because it can't assign the proper vdev?
We're both counting and your math seems a bit off. I'm counting the disk being replaced as one, which is arguable, but you seem to be finding anywhere from 15-25 disks, even though the OP said "Zpool 1 20x 4TB disk - setup with 4 x 5 Disk vdevs"
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
We're both counting and your math seems a bit off. I'm counting the disk being replaced as one, which is arguable, but you seem to be finding anywhere from 15-25 disks, even though the OP said "Zpool 1 20x 4TB disk - setup with 4 x 5 Disk vdevs"
My math is fine, my understanding is maybe off.
I see the list for RAIDZ1-0 through RAIDZ1-3 with one extra named replacing-4, so I thought there was a total of 5 RAIDZ1 arrays composed of 5 disks each, with one disk missing from RAIDZ1-2 and 3 missing from replacing-4.

I blame FreeBSD and ZFS for the total lack of a proper description.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
My math is fine, my understanding is maybe off.
I see the list for RAIDZ1-0 through RAIDZ1-3 with one extra named replacing-4, so I thought there was a total of 5 RAIDZ1 arrays composed of 5 disks each, with one disk missing from RAIDZ1-2 and 3 missing from replacing-4.
The replacing-4 entry isn't a separate array; if the OP had posted his zpool status output in code tags or using pastebin, it would have been clearer. It shows a disk in the process of being replaced in raidz1-2.
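A rough Python sketch of how the indentation resolves the counting confusion: it reads the NAME/STATE table of zpool status by indentation, so replacing-N is seen as a child of its RAIDZ vdev rather than a vdev of its own. The functions and the abbreviated sample (with made-up gptid names) are hypothetical; it assumes a single pool with no spares, log or cache sections, and the two-space nesting zpool status prints.

```python
from typing import Dict, List, Tuple

# Names zpool status uses for grouping rows rather than physical devices.
GROUP_PREFIXES = ("raidz", "mirror", "replacing", "spare")


def parse_config(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (indent, name) for each row of the NAME/STATE table."""
    rows: List[Tuple[int, str]] = []
    started = False
    for raw in lines:
        line = raw.expandtabs(8)            # in case the table is tab-indented
        stripped = line.strip()
        if not started:
            started = stripped.startswith("NAME") and "STATE" in stripped
            continue
        if not stripped:
            break                           # a blank line ends the table
        indent = len(line) - len(line.lstrip(" "))
        rows.append((indent, stripped.split()[0]))
    return rows


def summarize(rows: List[Tuple[int, str]]) -> Tuple[Dict[str, int], int]:
    """Count member slots per top-level vdev, and all physical device rows."""
    pool_indent = rows[0][0]                # first row is the pool itself
    slots: Dict[str, int] = {}
    device_rows = 0
    current = None
    for indent, name in rows[1:]:
        if indent == pool_indent + 2:       # top-level vdev, e.g. raidz1-0
            current = name
            slots[current] = 0
        elif indent == pool_indent + 4 and current is not None:
            slots[current] += 1             # a disk, or a whole replacing-N group
        if not name.startswith(GROUP_PREFIXES):
            device_rows += 1                # gptid/... rows, old and new alike
    return slots, device_rows


# Abbreviated, hypothetical sample (made-up gptid names) with the same shape
# as the OP's pool: a replacing-4 group nested inside a RAIDZ1 vdev.
SAMPLE = """\
config:

        NAME                STATE     READ WRITE CKSUM
        pool1               ONLINE       0     0     0
          raidz1-0          ONLINE       0     0     0
            gptid/disk-00   ONLINE       0     0     0
            gptid/disk-01   ONLINE       0     0     0
            gptid/disk-02   ONLINE       0     0     0
            gptid/disk-03   ONLINE       0     0     0
            gptid/disk-04   ONLINE       0     0     0
          raidz1-1          ONLINE       0     0     0
            gptid/disk-05   ONLINE       0     0     0
            gptid/disk-06   ONLINE       0     0     0
            gptid/disk-07   ONLINE       0     0     0
            gptid/disk-08   ONLINE       0     0     0
            replacing-4     ONLINE       0     0     0
              gptid/old-09  ONLINE       0     0     0
              gptid/new-09  ONLINE       0     0     0
"""

slots, device_rows = summarize(parse_config(SAMPLE.splitlines()))
print(slots)        # {'raidz1-0': 5, 'raidz1-1': 5}
print(device_rows)  # 11
```

Applied to the OP's full listing, indented as zpool prints it, the same counting gives five slots in each of the four vdevs but 21 gptid rows, because both the outgoing and the incoming disk under replacing-4 appear. That is where the 20 vs. 21 disk count comes from.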
 
Joined
Apr 17, 2015
Messages
6
Thanks. To confirm, there are only 20 disks.
And yes, I did copy and paste. It's clear on this end, so I'm not sure how you are seeing it.

We have rebooted/shut down the box a few times without any luck.

What about downgrading the firmware on the other FreeNAS boxes?

Should we be using firmware version 16 for all of them?

Also, why would running firmware version 20 "kill the pool"? Or is this a dev question?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Thanks. To confirm, there are only 20 disks.
And yes, I did copy and paste. It's clear on this end, so I'm not sure how you are seeing it.

We have rebooted/shut down the box a few times without any luck.

What about downgrading the firmware on the other FreeNAS boxes?

Should we be using firmware version 16 for all of them?

Also, why would running firmware version 20 "kill the pool"? Or is this a dev question?

The driver and firmware must be matched. Nothing else is supported.

FreeNAS 9.3 uses the P16 driver, so you must use the P16 firmware. P20 firmware is known to cause serious issues with the P16 driver.

There may be an update to the P20 driver on the way, but, until then, the P16 firmware is pretty much mandatory.
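To apply the matching rule across a small fleet like the OP's, here is a minimal Python sketch that flags boxes whose HBA firmware phase differs from the driver phase. The box names are hypothetical; the phases (16, 17, 20) and the P16 driver in FreeNAS 9.3 come from this thread, and how the firmware versions are collected from each box is left out.

```python
from typing import Dict, List


def mismatched_boxes(firmware_by_box: Dict[str, int], driver_phase: int) -> List[str]:
    """Return the names of boxes whose HBA firmware phase differs from the
    phase of the driver shipped with the OS (the two must match)."""
    return sorted(box for box, fw_phase in firmware_by_box.items()
                  if fw_phase != driver_phase)


# Hypothetical inventory using the firmware phases mentioned in this thread.
fleet = {"box-a": 16, "box-b": 17, "box-c": 20}
print(mismatched_boxes(fleet, driver_phase=16))  # prints ['box-b', 'box-c']
```

Note that a P17 box is flagged just like the P20 one: the rule is an exact match with the driver, not "P16 or newer".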
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Yes, unless damage has already been caused.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Uhh... without code tags for the zpool command outputs, it's all guesswork.

BUT, assuming you created 5 vdevs of 5-disk RAIDZ1 (and assumptions are often the bane of your data):

You've got 2 disks resilvering in the raidz1-4 vdev. No big deal, except you have RAIDZ1, so any error in that vdev is permanent, since you have no more redundancy for that vdev.

If mounting the zpool is crashing the box, you are probably screwed. So if you have backups, it's probably a good idea to get those out and dust them off. The odds of you needing them are pretty close to 100%. If you don't have them, polish up your resume; you'll be needing it pretty soon.

If your firmware was v20 and you've reflashed to v16 only now that you've figured out your zpool is corrupted and having problems, you waited much too long. Couple v20 firmware with RAIDZ1 and you've basically told the ZFS gods "give it to me hard, fast and without lube".

Unless you're about to add information to this thread that totally invalidates what you've already provided, I think you're on a ship that's sunk. Recreate the zpool, DO NOT USE RAIDZ1!!!!!!, and pay more attention to the warnings in my sig that say RAIDZ1 is dead and to the warnings the web GUI gives you. Warnings are there to warn you of very nasty things you should know about. They should never have been ignored.
 