Testing 8.0.1 - ZFS - Disk Replacement


newmember

Cadet
Joined
Aug 28, 2011
Messages
8
I have two mirrored drives plus a spare, i.e. 3 x 3TB drives per pool.
All drives are hot-swappable.
When I pull ada4p2, for example, the status of the pool stays the same: no degradation. When I pull a drive from the first pair, same result: no degradation.

My intent is to replace the HDD with the available spare.

I tried to replace the HDD through the FreeNAS GUI by selecting "Replace" for the failed drive on the "View Disks" page, but I just get the error "Sorry an error occured".

Shouldn't I be able to replace the failed HDD this way?
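
In case it matters, this is roughly what I expected to be able to do from the shell instead. It's just my sketch of the commands, assuming ada4p2 is the pulled disk and ada5p2 is that pool's spare (as in the status output below), and I don't know whether the GUI copes with pool changes made on the command line:

Code:
# tell ZFS the pulled disk is gone so the mirror degrades
zpool offline tank2 ada4p2
# rebuild onto the hot spare in its place
zpool replace tank2 ada4p2 ada5p2
# watch the resilver
zpool status tank2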



Even after 10 minutes with the two drives removed, I still have the following zpool status:


Code:
[root@freenas-2] ~# zpool status
  pool: tank1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
        spares
          ada2p2    AVAIL

errors: No known data errors

  pool: tank2
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank2       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0
            ada4p2  ONLINE       0     0     0
        spares
          ada5p2    AVAIL

errors: No known data errors
 

firestorm99

Dabbler
Joined
Sep 17, 2011
Messages
22
Can confirm this. Even when pulling out two drives on a raidz1, the status stays healthy. I wonder how an email notification on drive failure is supposed to work, then?
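
For now the only workaround I can think of is polling from cron and mailing the result myself, something like the sketch below. It's a hypothetical script, not part of FreeNAS, and it assumes "zpool status -x" prints "all pools are healthy" when nothing is wrong. Of course it only helps once zpool status actually notices the failure, which is exactly what seems broken here:

Code:
#!/bin/sh
# check_pools.sh -- hypothetical stopgap: run from cron, mail root
# whenever zpool reports anything other than a healthy state
STATUS=$(zpool status -x)
if [ "$STATUS" != "all pools are healthy" ]; then
    echo "$STATUS" | mail -s "zpool problem on $(hostname)" root
fi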
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Did you try to read any data from the NAS, and what version are you running?
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Email notifications are screwed up again in the 8.0.1 release; I'd try 8.0.1-RC2.

A bunch of people here have tried the same 'test' of pulling the cables on their drives and have had the same results. I think it's still being worked on; I recall a ticket about it. It might actually be a problem in FreeBSD that hadn't been fixed as of 8.2, not sure.
 

Durkatlon

Patron
Joined
Aug 19, 2011
Messages
414
This is not an email problem though, is it? It seems that in this case, when you pull a drive, "zpool status" still gives the all-clear. That sounds more like a ZFS issue, or perhaps it's caused by the fact that all the drives in the pool were asleep when the drive was pulled? I'm not sure ZFS can correctly ascertain the pool status if the pool has never been read from since the drive disappeared.
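
If that theory is right, forcing some I/O against the pool should make ZFS notice. Something like this (a sketch only, using the OP's tank2 pool as the example):

Code:
# force reads across every device in the pool
zpool scrub tank2
# then ask only for unhealthy pools; a missing disk should now show up
zpool status -x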
 

Milhouse

Guru
Joined
Jun 1, 2011
Messages
564
Even if email notifications were working, it's unlikely they would help here, as the volumes aren't being flagged as degraded even though one of the drives has been removed. This is a major issue and should be addressed as the #1 priority by the FreeNAS developers.

What good is FreeNAS if it isn't able to recognise when a disk has failed? All other problems tend to pale into insignificance. ;-)

Disk failure detection and, subsequent to that event, disk replacement need to be 100% foolproof (GUI) and bulletproof-reliable (back-end). So far we're a long way from that, and it's one of the main reasons why, IMHO, FreeNAS 8.0.1 is far from "Production Ready".
 

Milhouse

Guru
Joined
Jun 1, 2011
Messages
564
This is not an email problem though, is it? It seems that in this case, when you pull a drive, "zpool status" still gives the all-clear. That sounds more like a ZFS issue, or perhaps it's caused by the fact that all the drives in the pool were asleep when the drive was pulled? I'm not sure ZFS can correctly ascertain the pool status if the pool has never been read from since the drive disappeared.

When the disk is removed, it should be raising events - via camcontrol? - which should "tip off" the system that all is not well. Currently, these seem to go unnoticed. A scrub or reboot is often required before ZFS realises it's lost a disk and marks the volume as DEGRADED.

I just pulled a disk (/dev/da4) connected to a 9211-8i controller, and this is what I see in /var/log/messages:
Code:
Oct  4 23:53:01 freenas kernel: mps0: mpssas_remove_complete on target 0x0004, IOCStatus= 0x0
Oct  4 23:53:01 freenas kernel: (da4:mps0:0:4:0): lost device
Oct  4 23:53:01 freenas kernel: (da4:mps0:0:4:0): Synchronize cache failed, status == 0xa, scsi status == 0x0
Oct  4 23:53:01 freenas kernel:
Oct  4 23:53:01 freenas kernel: (da4:mps0:0:4:0): removing device entry


and after pulling the disk (da4 is the first disk in the second vdev) zpool status shows...

Code:
[root@freenas] /var/log# zpool status
  pool: share
 state: ONLINE
 scrub: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        share                                           ONLINE       0     0     0
          raidz1                                        ONLINE       0     0     0
            gptid/072310ef-bc9e-11e0-aa03-001b2188359c  ONLINE       0     0     0
            gptid/07e00db6-bc9e-11e0-aa03-001b2188359c  ONLINE       0     0     0
            gptid/08964a9b-bc9e-11e0-aa03-001b2188359c  ONLINE       0     0     0
            gptid/09544ae0-bc9e-11e0-aa03-001b2188359c  ONLINE       0     0     0
          raidz1                                        ONLINE       0     0     0
            gptid/1df2aa9d-bc9e-11e0-aa03-001b2188359c  ONLINE       0     0     0
            gptid/20768e35-bc9e-11e0-aa03-001b2188359c  ONLINE       0     0     0
            gptid/22e0fbfa-bc9e-11e0-aa03-001b2188359c  ONLINE       0     0     0
            gptid/2588c9d5-bc9e-11e0-aa03-001b2188359c  ONLINE       0     0     0

errors: No known data errors
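
So the kernel clearly reports the removal; it just isn't acted on. In principle something could listen for those device-departure events via devd(8). A very rough sketch of what such a hook might look like (my guess only; the event names assume the stock DEVFS notifications and I haven't tested this on FreeNAS 8.0.1):

Code:
# addition to /etc/devd.conf (untested sketch)
notify 10 {
        match "system"          "DEVFS";
        match "subsystem"       "CDEV";
        match "type"            "DESTROY";
        match "cdev"            "(ada|da)[0-9]+";
        # log it; from here one could also kick off a scrub or an email
        action "logger -p daemon.alert 'disk $cdev was detached'";
};

Even just logging the event would be an improvement over the current silence.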
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I wonder if this is due to the patching done to hide DRQ and other problems encountered with NanoBSD compatibility issues.
 

firestorm99

Dabbler
Joined
Sep 17, 2011
Messages
22
Did you try to read any data from the NAS, and what version are you running?
Yes, I did.
It's a raidz1 with 5 x 2TB drives.

Removed the first HDD: no problems accessing shares.
Removed the second HDD: not able to access the shares anymore.

It wasn't possible to replace the disk via the web GUI.

Version: FreeNAS 8.0.1 amd64
 

Durkatlon

Patron
Joined
Aug 19, 2011
Messages
414
It seems this should be the absolute #1 priority on the list of things to fix...
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
I agree, it should be a top priority. If you dig through the tickets, one of the developers was trying to help troubleshoot the problem. *I THINK* it was determined to be a FreeBSD issue. Sorry, I don't remember the ticket # right now.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I rifled through the tickets and probably missed it, but I didn't see any dealing with a drive not being detected as removed/failed. There are some on hot-swap issues. Someone else may need to take a look, and maybe a new ticket needs to be opened.

Since the failure was reported by FreeNAS when data access was attempted, I'm not certain that aspect is an issue. I guess it would depend on whether, or how often, the drive status is checked.

Something I think I understand (correct me if I'm wrong): you cannot install a replacement disk that still has data on it; it doesn't work, it must be a clean drive. I know that used to be an issue; I'm not sure if it still is.

Replacing a failed drive via the GUI should be a priority, and the user should be able to selectively remove and replace a drive.
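
As a stopgap for the clean-drive requirement, the shell route people have used is to wipe the old metadata off the replacement disk before handing it to FreeNAS. Roughly like this (a sketch with a placeholder device name; it is destructive, so triple-check you have the right disk):

Code:
# wipe the old partition table and labels from the replacement disk
# (ada2 here is only an example device name)
gpart destroy -F ada2
# if your gpart lacks -F, delete the partitions first with gpart delete
dd if=/dev/zero of=/dev/ada2 bs=1m count=32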
 

Durkatlon

Patron
Joined
Aug 19, 2011
Messages
414
Since the failure was reported by FreeNAS when data access was attempted, I'm not certain that aspect is an issue. I guess it would depend on whether, or how often, the drive status is checked.

Where do you get that? The OP reported:

Code:
Removed the first HDD: no problems accessing shares.
Removed the second HDD: not able to access the shares anymore.


So you remove one drive: no problem accessing shares, but also no errors reported. That seems like a really big problem to me.

It makes sense that you cannot access shares anymore after removing the second drive; a raidz1 can only survive the loss of a single disk, so with two gone the data simply isn't available. But from what I understand there is still no warning, just a busted share. And zpool status still reports everything is fine.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Where do you get that?

Don't pay attention to me, it's been a long day and you are absolutely correct. I'm gonna bow out for the evening, and maybe tomorrow my head will be screwed on right.
 