ZFS Pool with spare stopped working after a disk crash

vikozo6

Patron
Joined
Oct 16, 2015
Messages
290
Hello
I have FreeNAS with 6 x 3TB disks.
As far as I know, the 6 disks should be set up as:

4 x disks for the ~10TB volume where I have the ZFS pool
1 x spare disk
1 x hot spare

Two days ago a disk started making funny noises and died, and some VMs which used disks on that volume died as well.

So the question is: what is configured wrong, such that the system did not use the spare disk?

Is it possible to see in the GUI which disk is the spare?

have a nice day
vinc

FreeNAS-11.2-U6
(Build Date: Sep 17, 2019 0:16)
 
Joined
Oct 18, 2018
Messages
969
Hi @vikozo6, could you please provide a bit more information? To start, what is the output of zpool status? I suspect that will help make the situation more clear.
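That is, from a shell on the FreeNAS box (SSH or the Shell in the web UI):

Code:
zpool status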
 

vikozo6

Patron
Joined
Oct 16, 2015
Messages
290
To explain: in the GUI I removed (offlined) the defective disk, and after rebooting the VM it works again.
The VMs I mention are on another server (Proxmox) which uses the disks of the FreeNAS.
====================
Code:
# zpool status
  pool: NAS-04vol
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 0 days 08:35:09 with 0 errors on Sun Aug 25 08:35:10 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        NAS-04vol                                       DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/732dc3fb-dc5b-11e7-9959-0cc47a86a2f0  ONLINE       0     0     0
            gptid/748fa88e-dc5b-11e7-9959-0cc47a86a2f0  ONLINE       0     0     0
            16427590367862335297                        OFFLINE      0     0     0  was /dev/gptid/7551e57b-dc5b-11e7-9959-0cc47a86a2f0
            gptid/763d68bb-dc5b-11e7-9959-0cc47a86a2f0  ONLINE       0     0     0
            gptid/779aa00e-dc5b-11e7-9959-0cc47a86a2f0  ONLINE       0     0     0
            gptid/787d356e-dc5b-11e7-9959-0cc47a86a2f0  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:40 with 0 errors on Tue Sep 24 03:45:40 2019
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0

errors: No known data errors
 
Joined
Oct 18, 2018
Messages
969
Would you mind editing the above post to wrap the code in code tags, similar to the following?


[CODE]
some code here
some more code here
[/CODE]


You can copy-paste again to preserve formatting as well.

NAME STATE READ WRITE CKSUM
NAS-04vol DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
gptid/732dc3fb-dc5b-11e7-9959-0cc47a86a2f0 ONLINE 0 0 0
gptid/748fa88e-dc5b-11e7-9959-0cc47a86a2f0 ONLINE 0 0 0
16427590367862335297 OFFLINE 0 0 0 was /dev/gptid/7551e57b-dc5b-11e7-9959-0cc47a86a2f0
gptid/763d68bb-dc5b-11e7-9959-0cc47a86a2f0 ONLINE 0 0 0
gptid/779aa00e-dc5b-11e7-9959-0cc47a86a2f0 ONLINE 0 0 0
gptid/787d356e-dc5b-11e7-9959-0cc47a86a2f0 ONLINE 0 0 0
Based on this it looks like you don't have any spares configured for this pool; that would be why no spare automatically replaced the failed drive. In fact, I think you may have a pool of 6 drives? If you edit the above post with proper formatting, it would help make it clearer to me. Here is an example of my own output using the code tags I mentioned above:

Code:
    NAME                                            STATE     READ WRITE CKSUM
    vault                                           ONLINE       0     0     0
      raidz2-0                                      ONLINE       0     0     0
        gptid/5f0e7777-2f8d-11e9-89f4-ac1f66666b2c  ONLINE       0     0     0
        gptid/634d7777-2f8d-11e9-89f4-ac1f66666b2c  ONLINE       0     0     0
        gptid/677b7777-2f8d-11e9-89f4-ac1f66666b2c  ONLINE       0     0     0
        gptid/754d7777-36b8-11e9-8b81-ac1f66666b2c  ONLINE       0     0     0
        gptid/702f7777-2f8d-11e9-89f4-ac1ff666662c  ONLINE       0     0     0
        gptid/74c67777-2f8d-11e9-89f4-ac1f66666b2c  ONLINE       0     0     0
 

vikozo6

Patron
Joined
Oct 16, 2015
Messages
290
So, it should be changed with code tags now!

It works like this now, but with one disk missing.
 
Joined
Oct 18, 2018
Messages
969
Thanks @vikozo6. It looks to me like you do not have any spares, hot or cold. Did you intend to set up spares but perhaps accidentally did it incorrectly? What is the output of zpool history NAS-04vol | grep "zpool create" and zpool history NAS-04vol | grep spare?
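To be explicit, these are the two commands I mean, run from a shell on the FreeNAS box:

Code:
zpool history NAS-04vol | grep "zpool create"
zpool history NAS-04vol | grep spare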

Here is the User Guide for the most recent version discussing how to set up a spare disk.

If I understand your situation correctly, you have a few good options.

1. You can purchase a new 3TB disk and then follow the disk replacement procedure from the User Guide (a rough CLI sketch follows below).
2. You can re-create your pool following the User Guide's instructions for setting up a pool with a spare, and then restore your data from backup.
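For option 1, the GUI replacement procedure is the supported way on FreeNAS, since it handles the partitioning and gptid labels for you, but just to illustrate what happens underneath, the raw ZFS step is roughly the following; the new disk's gptid is only a placeholder here:

Code:
# replace the offlined member (listed as 16427590367862335297 in your status output)
# with the freshly partitioned new disk -- placeholder gptid below
zpool replace NAS-04vol 16427590367862335297 gptid/<new-disk-gptid>

# then watch the resilver
zpool status NAS-04vol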

I think you're actually not in a terrible situation. Spares are only useful if a drive completely dies. They don't kick in if a drive has a few reallocated sectors and you'll likely want to replace a drive when it starts having bad sectors. You're also using RAIDZ2 so losing a single disk isn't the end of the world so long as you replace it fairly quickly.
 

vikozo6

Patron
Joined
Oct 16, 2015
Messages
290
:D

Code:
# zpool history NAS-04vol | grep "zpool create"
2017-12-08.22:05:09 zpool create -o cachefile=/data/zfs/zpool.cache -o failmode=continue -o autoexpand=on -O compression=lz4 -O aclmode=passthrough -O aclinherit=passthrough -f -m /NAS-04vol -o altroot=/mnt NAS-04vol raidz2 /dev/gptid/732dc3fb-dc5b-11e7-9959-0cc47a86a2f0 /dev/gptid/748fa88e-dc5b-11e7-9959-0cc47a86a2f0 /dev/gptid/7551e57b-dc5b-11e7-9959-0cc47a86a2f0 /dev/gptid/763d68bb-dc5b-11e7-9959-0cc47a86a2f0 /dev/gptid/779aa00e-dc5b-11e7-9959-0cc47a86a2f0 /dev/gptid/787d356e-dc5b-11e7-9959-0cc47a86a2f0

The other command gives nothing, no information, but the system has also been working for a couple of years so far.
And yes, a disk is ordered and should arrive tomorrow.
I also found this video: https://www.youtube.com/watch?v=mb_1pKI398Y
 
Joined
Oct 18, 2018
Messages
969
Yeah, you never set up hot spares with that pool. You have a pool with 1 RAIDZ2 vdev of 6 disks. That is a totally fine setup.

This video seems to be basically correct, but I would recommend you use the User Guide for your version of FreeNAS as well because it mentions other situations which may come up. Use both, and where they disagree the User Guide should be followed instead. That video also didn't discuss burning in a new drive; you should be sure to stress test a new drive before using it.
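For the burn-in, a common minimal approach is to run a long SMART self-test on the new drive before adding it to the pool; the device name below is just an example, check yours with smartctl --scan:

Code:
# start a long (extended) SMART self-test on the new drive (example device name)
smartctl -t long /dev/ada6

# check progress and results once the test has had time to finish
smartctl -a /dev/ada6

Some people also do a destructive badblocks pass for a more thorough burn-in; only do that while the disk holds no data.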
 

vikozo6

Patron
Joined
Oct 16, 2015
Messages
290
Joined
Oct 18, 2018
Messages
969
But still, with this RAIDZ2, I should not be losing the disks in my VMs, right?
After taking the disk offline, the VM came back to life.
I'm sorry, I don't fully understand what you mean. Can you provide more information? Exact error messages, screenshots, etc would be helpful.
 

vikozo6

Patron
Joined
Oct 16, 2015
Messages
290
Having RAIDZ2 means there are some "spare" disks in this system!
Why, when a disk gets more or less lost, is the system not working?

And how long will this "resilvering" take?
 
Joined
Oct 18, 2018
Messages
969
Having RAIDZ2 means there are some "spare" disks in this system!
Ah, this is perhaps part of the confusion. RAIDZ2 does not mean there is a spare disk. A spare disk is one which is not in use at all and is ready to replace a disk which has failed. RAIDZ2 means that alongside the data you are storing, enough parity data is also stored so that you could lose 2 disks in the vdev and the vdev would still be functional. Data is still spread across all disks in the vdev. You may find this useful.
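To make the distinction concrete, here is a rough sketch of how the two concepts look at pool-creation time; the pool and device names are only placeholders, not a suggestion for your system:

Code:
# 6-disk RAIDZ2 vdev: all 6 disks carry data plus parity, and any 2 may fail
zpool create tank raidz2 da0 da1 da2 da3 da4 da5

# the same vdev plus a 7th disk declared as a hot spare:
# the spare sits idle until ZFS needs it to replace a failed member
zpool create tank raidz2 da0 da1 da2 da3 da4 da5 spare da6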

Why, when a disk gets more or less lost, is the system not working?
I can't say for sure; I'm not 100% sure what you mean. The disk can become "lost" if it experiences a catastrophic failure or is unable to communicate with the host system. I can't say more about why you are experiencing issues with your VMs without knowing more about how they are set up, what VMs you're running, what storage they have access to etc.

And how long will this "resilvering" take?
Many hours, likely less than a day.
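You can also watch the exact progress and the estimated time remaining from the command line:

Code:
zpool status NAS-04vol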
 

vikozo6

Patron
Joined
Oct 16, 2015
Messages
290
Found something:
Resilvering NAS-04vol - 16%
That is after 4 hours.
:rolleyes:

====
I have an NFS share on FreeNAS.

I have a VM on a Proxmox server (other hardware).
The VM created on Proxmox has its disk on this FreeNAS NFS share, which is attached to the Proxmox server.

I thought this would be a nice setup, and there is more free disk space on the FreeNAS.
 
Joined
Oct 18, 2018
Messages
969
I would expect that, unless your pool goes down or your NFS share is turned off, the share would still be available to your Proxmox VM.
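If you want to check from the Proxmox side that the export is still being offered while the pool is degraded or resilvering, something like this should work; the IP is a placeholder for your FreeNAS address, and showmount comes from the standard NFS client tools on Debian/Proxmox:

Code:
# list the NFS exports the FreeNAS box is currently offering
showmount -e 192.168.1.10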
 