Disk failure, data complete but situation ugly!

Goose · Sep 15, 2017

I had a disk fail on me the other day that I should have known about but didn't, long story... I decided at the time to replace all the disks (I had 4 * 2TB) with their bigger brothers (4 * 4TB) and started with the failed disk by adding one of the new 4TB disks and using the UI to replace the failed disk. The pool resilvered and all looked to be well. I scrubbed the pool and was told that the pool had way too many checksum errors. Interesting, as the resilvering completed without error, but there you go.

I moved on to the next disk, offlined it, replaced it, and then did the same with the next. So I have 3 new disks and the last one still needing to be done. The date for my nightly scrub came up before I got to the 4th disk and now I have this.

Code:

NAME                                              STATE     READ   WRITE  CKSUM
Data                                              DEGRADED  0      0      1.69K
  replacing-0                                     DEGRADED  0      0      3.38K
    gptid/0b63a81e-10f6-11e4-b2a7-6cf049e04edf    DEGRADED  0      0      3.38K  too many errors
    12009836222180542762                          UNAVAIL   0      0      0      was /dev/gptid/f621acb1-95ff-11e7-b8b5-6cf049e04edf
    gptid/d90bd830-963c-11e7-b8b5-6cf049e04edf    ONLINE    0      0      3.38K
  mirror-1                                        DEGRADED  0      0      3.38K
    replacing-0                                   DEGRADED  3.38K  0      0
      10203767184787037519                        UNAVAIL   0      0      0      was /dev/gptid/0c23ad2d-10f6-11e4-b2a7-6cf049e04edf
      gptid/dfc4a8c5-9190-11e7-95b3-6cf049e04edf  ONLINE    0      0      3.38K
    gptid/28028392-833c-11e7-b186-6cf049e04edf    ONLINE    0      0      3.38K

errors: Permanent errors have been detected in the following files:

  Data:<0x0>

So, yes, the pool has issues: disks that were removed and replaced are still showing in the pool list, and the list of files with permanent errors names no actual files. What does that mean?

My question is how to deal with this. I can rebuild it completely, I could try and force detach disks and hope that it works itself out etc. I have copied the whole pool off to the 4th disk I didn't get to replace so I have all options open. If I force detach disks and it all looks to be ok, could anything residual come back and bite me later? I'm on 9.10.1-U4 as I have been running VBox jails and cannot move to 11 until I change my CPU to support UG.

Input gratefully received!

Cheers,

Goose

Stux · Sep 15, 2017

Are you using ECC ram?

The Data:<0x0> bit means you have pool corruption.

Goose · Sep 15, 2017

Sadly not, I repurposed an old desktop and never got round to putting the hardware in that perhaps I should have! That said it is a home server and really only a cache as all data is backed up to cold storage on AWS... So I'm dumping the pool and recreating it whatever hardware decision I make?

Stux · Sep 15, 2017

Goose said:
Sadly not, I repurposed an old desktop and never got round to putting the hardware in that perhaps I should have! That said it is a home server and really only a cache as all data is backed up to cold storage on AWS... So I'm dumping the pool and recreating it whatever hardware decision I make?

The lack of ECC is probably where your pool corruption came from, i.e. a memory error. It's strange to have the same number of checksum errors on all disks.

You should try restarting, and doing a scrub.
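One way to check the "same number on all disks" observation against the status output posted earlier is to pull the CKSUM column for each gptid device and compare the values. This is only a rough Python sketch: the function names and the pasted sample are illustrative, it assumes the five-column layout (NAME, STATE, READ, WRITE, CKSUM) shown in that output, and it compares the counts as displayed rather than converting the K suffix.

```python
# Sample rows copied from the status output in this thread (whitespace normalised).
SAMPLE = """
gptid/0b63a81e-10f6-11e4-b2a7-6cf049e04edf  DEGRADED  0  0  3.38K  too many errors
12009836222180542762                        UNAVAIL   0  0  0      was /dev/gptid/f621acb1-95ff-11e7-b8b5-6cf049e04edf
gptid/d90bd830-963c-11e7-b8b5-6cf049e04edf  ONLINE    0  0  3.38K
gptid/dfc4a8c5-9190-11e7-95b3-6cf049e04edf  ONLINE    0  0  3.38K
gptid/28028392-833c-11e7-b186-6cf049e04edf  ONLINE    0  0  3.38K
"""


def cksum_by_device(status_text):
    """Map each gptid device to its CKSUM value exactly as displayed."""
    counts = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Assumed column order: NAME STATE READ WRITE CKSUM [note...]
        if len(fields) >= 5 and fields[0].startswith("gptid/"):
            counts[fields[0]] = fields[4]
    return counts


def all_nonzero_equal(counts):
    """True if every device with a non-zero CKSUM shows the same value."""
    nonzero = {value for value in counts.values() if value != "0"}
    return len(nonzero) <= 1


if __name__ == "__main__":
    counts = cksum_by_device(SAMPLE)
    for device, value in counts.items():
        print(device, value)
    print("identical non-zero counts:", all_nonzero_equal(counts))
```

Identical counts on every member point away from one bad disk and towards something the disks share, which fits the memory theory above.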

Goose · Sep 15, 2017

Thanks, I'll have a play and see what I end up with!

Goose · Oct 7, 2017

So, data off, two files that I couldn't get back. One was a VM HD which I needed so I used dd to block copy it and pad the bits it couldn't read. I'm kind of surprised but it worked and I recovered the OS on the VHD internally. The second file really was a bit odd: if I tried to delete it the server would reboot! I'm not even sure how that is possible but I could do it every time...
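For anyone wondering what "block copy and pad" amounts to, here is a minimal Python sketch of the idea, similar in spirit to dd's conv=noerror,sync. It is not the command I actually ran; the function name, the default block size and the injectable read_block parameter are all made up for illustration, and it relies on os.pread, so Unix only.

```python
import os


def salvage_copy(src_path, dst_path, block_size=4096, read_block=os.pread):
    """Copy src to dst block by block, zero-filling blocks that cannot be read.

    Returns the byte offsets of the blocks that raised an error.
    read_block is a hypothetical hook so the failure path can be exercised.
    """
    bad_offsets = []
    size = os.path.getsize(src_path)
    fd = os.open(src_path, os.O_RDONLY)
    try:
        with open(dst_path, "wb") as dst:
            for offset in range(0, size, block_size):
                want = min(block_size, size - offset)
                try:
                    data = read_block(fd, want, offset)
                except OSError:
                    # Unreadable block: note it and fall through to padding.
                    data = b""
                    bad_offsets.append(offset)
                # Pad failed or short reads so later blocks keep their offsets.
                dst.write(data.ljust(want, b"\0"))
    finally:
        os.close(fd)
    return bad_offsets
```

Keeping every block at its original offset is what lets a filesystem inside the image still be found afterwards; whatever lived in the zeroed blocks is simply gone.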

Anyway, on to the reason for my post. I had to restore part of the file set back to the system as soon as possible (my wife's business) so I put three of the four new 4TB WD Red drives in the chassis and created a raidz1 volume just to get some storage up and accessible. I'm currently running an extended memory test on my new motherboard, CPU and ECC memory that I plan to fit tomorrow. I have one 4TB disk with all the data on it and ideally I don't want to stay with raidz1. Is there any way I can get the data off the 4TB disk, add the disk to the raidz1 volume and then convert it to mirrored? Using only the four 4TB disks, the FreeNAS system, and my workstation!

styno · Oct 8, 2017

No, but if 4TB initial space is enough you should be able to start on the new system with a single disk, copy data over, create a mirror from the single disk and then extend the pool with another mirror. But be very careful with any data that has been put on the old system. As long as you don't know what caused the corruption, the data integrity can't be trusted.
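To make the three steps concrete, here is a toy Python model of the layout at each stage. It is only a sketch: the pool is represented as a plain list of vdevs (each a list of disk sizes in bytes), the helper names are invented, and it ignores ZFS metadata and partitioning overhead.

```python
DISK = 4 * 10**12  # a nominal 4 TB disk, in bytes


def usable_bytes(pool):
    """Striped mirrors: each vdev contributes the size of its smallest disk."""
    return sum(min(vdev) for vdev in pool)


def survives_one_failure(pool):
    """True if every vdev has at least two disks, i.e. no single point of failure."""
    return all(len(vdev) >= 2 for vdev in pool)


def migration_steps():
    """Return the pool layout after each step of the plan described above."""
    step1 = [[DISK]]                      # start with a single-disk pool
    step2 = [[DISK, DISK]]                # second disk attached: now a mirror
    step3 = [[DISK, DISK], [DISK, DISK]]  # pool extended with a second mirror
    return [step1, step2, step3]


if __name__ == "__main__":
    for number, pool in enumerate(migration_steps(), start=1):
        print(number, usable_bytes(pool) // 10**12, "TB usable,",
              "redundant" if survives_one_failure(pool) else "NOT redundant")
```

Note that the pool has no redundancy at all during step 1, so keep the source copy untouched until the mirror has finished resilvering.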

Goose · Oct 8, 2017

Ok, that sounds workable but I'm a little confused by the order in which I should approach this. I deleted the pool and added a single disk as a new volume. I then added a single disk as a mirror and now I have a mirrored pair with the usable capacity of one 4TB disk, which reports as being 3.6TiB. If I look to add the third disk I only have the options of Stripe, Log, Cache and Spare. Do I pick Spare or wait until I can get the 4th disk into the chassis and then add them both to the pool?

danb35 · Oct 8, 2017

Goose said:
wait until I can get the 4th disk into the chassis and then add them both to the pool?

This.

Goose · Oct 8, 2017

Great, thanks!

Goose · Oct 12, 2017

Hmm... Ok, what am I missing here? It seems that the UI doesn't support extending the component parts of the mirror?

Looks like I'm going to build this in a VM and practice so I don't trash my volume doing it wrong manually!

danb35 · Oct 12, 2017

Goose said:
It seems that the UI doesn't support extending the component parts of the mirror?

What do you mean? The screen shot you posted shows you expanding the pool data with a mirror of two 4 TB disks. You'll end up with two mirrors striped.

Goose · Oct 12, 2017

Ok, I read it as I add a further two 4TB drives and my whole pool is still only 3.64TiB! Is the capacity mentioned only what I'm adding then?

danb35 · Oct 12, 2017

Goose said:
Is the capacity mentioned only what I'm adding then?

Yes.
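The 3.64 figure is just the decimal-versus-binary unit difference: drives are sold in TB (10^12 bytes) while the UI reports TiB (2^40 bytes). A quick Python check, ignoring ZFS metadata and any partitioning overhead (the function name is only for illustration):

```python
TB = 10**12   # decimal terabyte, as printed on the drive label
TIB = 2**40   # binary tebibyte, as reported by the UI


def tb_to_tib(terabytes):
    """Convert a capacity in TB to TiB."""
    return terabytes * TB / TIB


if __name__ == "__main__":
    print(round(tb_to_tib(4), 2))  # one 4 TB mirror
    print(round(tb_to_tib(8), 2))  # two 4 TB mirrors striped
```

So one mirror of 4 TB disks comes out at about 3.64 TiB, and the pool of two striped mirrors at roughly 7.28 TiB before overhead.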

Goose · Oct 12, 2017

Once again, thank you! ;-)
