ZFS pool RW import causes reboot

Status
Not open for further replies.

Cyderic

Cadet
Joined
Oct 30, 2016
Messages
3
Hi everyone,

I know there are already many threads about reboots on pool import, but my situation is a bit different.

First of all my system:
Build: FreeNAS-9.10.1-U2 (f045a8b)
Platform: AMD Athlon(tm) II X3 450 Processor
Memory: 8124MB (non-ECC)
HDD: 8x 4TB RED WD40EFRX

Short story:
I'm not able to import my ZFS pool anymore since I started a scrub that caused a system reboot (please see the pre-story below for why I'm trying to import).
Every time I try to import the pool, hundreds of messages scroll across the screen - partly overlapping each other (it looks very weird) - and finally the screen goes blank and the system reboots.
The system comes back up again, but without the pool.
I also tried to upload my last config backup to a freshly installed FreeNAS. After the config-import reboot, the system hangs in a "kdb panic" state.

What I can do is import my pool read-only. Everything is fine then.
zpool import -f -o readonly=on pool01
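For reference, this is roughly what I do to get at the data read-only. A minimal sketch only; the -R altroot and the paths are just examples, not my exact layout:

Code:
# import read-only under an alternate root so nothing collides with the fresh install
zpool import -f -o readonly=on -R /mnt pool01
# browse the datasets and files to confirm they are reachable
zfs list -r pool01
ls /mnt/pool01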

What can I do?


Here is some more info:
Code:
[root@freenas ~]# zpool import
   pool: pool01
     id: 12316858264337946
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
         the '-f' flag.
    see: http://illumos.org/msg/ZFS-8000-EY
 config:

        pool01                                          ONLINE
          raidz1-0                                      ONLINE
            gptid/a4340e86-fe3e-11e4-86b6-3085a93c9a92  ONLINE
            gptid/50c291bd-095f-11e5-bd5a-3085a93c9a92  ONLINE
            gptid/36980e9b-279c-11e5-96b6-3085a93c9a92  ONLINE
            gptid/e4d99054-04a3-11e6-89a9-3085a93c9a92  ONLINE
          raidz1-1                                      ONLINE
            gptid/985ad909-9e3c-11e6-8250-3085a93c9a92  ONLINE
            gptid/f75484c6-9dab-11e6-a69a-3085a93c9a92  ONLINE
            gptid/cde8b704-9d40-11e6-819e-3085a93c9a92  ONLINE
            gptid/8fac6843-9c65-11e6-97c1-3085a93c9a92  ONLINE
[root@freenas ~]#


Code:
[root@freenas ~]# zpool import -f -o readonly=on pool01 mnt
[root@freenas ~]# zpool status mnt
  pool: mnt
 state: ONLINE
  scan: scrub in progress since Sun Oct 30 03:04:09 2016
        9.04G scanned out of 13.6T at 1/s, (scan is slow, no estimated time)
        0 repaired, 0.06% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        mnt                                             ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/a4340e86-fe3e-11e4-86b6-3085a93c9a92  ONLINE       0     0     0
            gptid/50c291bd-095f-11e5-bd5a-3085a93c9a92  ONLINE       0     0     0
            gptid/36980e9b-279c-11e5-96b6-3085a93c9a92  ONLINE       0     0     0
            gptid/e4d99054-04a3-11e6-89a9-3085a93c9a92  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            gptid/985ad909-9e3c-11e6-8250-3085a93c9a92  ONLINE       0     0     0
            gptid/f75484c6-9dab-11e6-a69a-3085a93c9a92  ONLINE       0     0     0
            gptid/cde8b704-9d40-11e6-819e-3085a93c9a92  ONLINE       0     0     0
            gptid/8fac6843-9c65-11e6-97c1-3085a93c9a92  ONLINE       0     0     0

errors: No known data errors
[root@freenas ~]#

I can list my files under the mount point in this state, so I would say my pool is not corrupted or destroyed.

Pre Story so far:
This system ran smoothly for years with 8x 2TB Green HDDs. I also had some drive failures in the past, but ZFS handled them after I replaced the failed disks.

Over the last few weeks I started to expand this pool by replacing one drive after another with a 4TB RED.

Today I replaced the last one successfully and the resilver process also finished without errors.

After that I was able to expand the pool and everything was fine; access was possible.


And then... I obviously did something bad: still on FreeNAS 9.3, I started the scrub. After a few minutes I realised that the system was rebooting. However, it didn't come back, because it hung every time with a "kdb panic" error.

So I decided to try a fresh installation of the latest FreeNAS. With the freshly installed system I tried to import the volume, but... (see the short story above).
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
  1. Add more RAM, if possible.
  2. Did you burn in the new disks?
  3. Are you sure your PSU can handle the increased load?
  4. Post the output of smartctl -a for each disk (in CODE tags); a quick way to gather that is sketched below.
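For point 4, something along these lines should collect it on FreeBSD (the adaX device names are only an assumption; use whatever camcontrol devlist shows for your disks):

Code:
# list the attached disks first
camcontrol devlist
# then dump full SMART data for each one (device names are examples)
for d in ada0 ada1 ada2 ada3 ada4 ada5 ada6 ada7; do
    smartctl -a /dev/$d
done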
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It looks a lot like the pool is hosed.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Run a memory test on the hardware, just to rule that out.

After that, transfer all your data out of that pool and re-create it from scratch.
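Since the pool still imports read-only, replication off it should still work. A rough sketch only; 'pool01/data', the snapshot name, and the 'backup' target pool are all placeholders for whatever you actually have:

Code:
# a read-only pool can't take new snapshots, so look for an existing one to send;
# if there is none, fall back to rsync/cp from the mounted filesystems
zfs list -t snapshot -r pool01
# replicate the dataset (and its children) to another pool
zfs send -R pool01/data@snap | zfs receive -F backup/data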
 

Cyderic

Cadet
Joined
Oct 30, 2016
Messages
3
  1. Add more RAM, if possible.
  2. Did you burn in the new disks?
  3. Are you sure your PSU can handle the increased load?
  4. Post the output of smartctl -a for each disk (in CODE tags).
Thanks for this reply. Really appreciate that one.


Today I swapped the RAM for 3x 8GB ECC RAM. Nothing changed.

No, I didn't burn in the disks.

Yes, the PSU can handle the load.

SMART doesn't show any errors.


Guys, I don't understand this. I can mount it RO but not writable anymore. This is weird.
Sorry, but ZFS is designed to be reliable and robust and shouldn't (at least IMHO) crash because of a random reboot.

Why should I even use ZFS when such a dramatic thing can happen? Don't get me wrong... I know that nothing can "replace" a backup. This is totally clear.

However, in this situation nothing really bad happened, like electrical issues, random drive failures or anything like that. It was just a "random restart". Maybe caused by a RAM issue, but I really can't believe that this can destroy the whole pool. One of the good things about ZFS is the fast "rebuild" time compared to classical RAID solutions. But now I think this is a joke. In my case I would have to transfer 10TB out of a backup system, which will take much longer than a rebuild - even in enterprise environments.

It is just sooooo disappointing to me. I don't know why I should continue recommending ZFS to my business customers.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Sorry, but ZFS is designed to be reliable and robust and shouldn't (at least IMHO) crash because of a random reboot.
Your timeline is wrong. What probably happened is serious corruption; when ZFS comes across it, it panics the kernel because there's nothing safe it can do.
It was just a "random restart".
Again, probably due to a kernel panic.
Maybe caused by a RAM issue, but I really can't believe that this can destroy the whole pool.
You'd better believe it.
One of the good things about ZFS is the fast "rebuild" time compared to classical RAID solutions. But now I think this is a joke.
None of that has anything to do with the current problem.
In my case I would have to transfer 10TB out of a backup system, which will take much longer than a rebuild - even in enterprise environments.
If it bothers you, use ECC RAM on recommended hardware and don't use RAIDZ1.
It is just sooooo disappointing to me. I don't know why I should continue recommending ZFS to my business customers.
Again, you did not follow the recommendations, so it's rather unfair to blame ZFS. If you ruin your car's engine by running it with some slop made out of whatever used oils you could find, are you going to go around telling people your car sucks, despite not having used proper fuels?
 

Cyderic

Cadet
Joined
Oct 30, 2016
Messages
3
Thanks for your response, Ericloewe.

Your timeline is wrong. What probably happened is serious corruption; when ZFS comes across it, it panics the kernel because there's nothing safe it can do.
Well, I thought ZFS was designed to detect that and fix it. Why else do I run a scrub every second week?
It just feels so weak to me at the moment, sorry.

If it bothers you, use ECC RAM on recommended hardware and don't use RAIDZ1.

OK, what would you recommend for an 8x 4TB system?

Since I don't have any options left, I will start from scratch again.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
I agree; it doesn't seem that this should happen.

However, if you assume corruption in the spacemap in RAM, which then gets to disk as 'valid' metadata, it does make sense that very bad things will happen. My understanding is that readonly mode doesn't access the spacemaps, because they are only needed to find free space for writing.

Some explanations from ZFS developers would be very useful to understand what might be happening and if there are any ways to reduce it or detect it earlier. For instance, an invalid spacemap could just be considered permanently 100% full and the pool continues operating normally.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, I thought ZFS was designed to detect that and fix it.
Yes, but ZFS is not magic.
However, if you assume corruption in the spacemap in RAM, which then gets to disk as 'valid' metadata, it does make sense that very bad things will happen. My understanding is that readonly mode doesn't access the spacemaps, because they are only needed to find free space for writing.
Interesting theory that explains what might have happened.
For instance, an invalid spacemap could just be considered permanently 100% full and the pool continues operating normally.
"Operating normally" would be "read only", as literally no changes could be made. Not too dissimilar to the current situation. Might actually be exactly what's happening - vdev write fails, kernel panics.
Some explanations from ZFS developers would be very useful to understand what might be happening
Yeah, it would be fascinating.

OK, what would you recommend for an 8x 4TB system?
RAIDZ2, it's much safer.
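For eight disks that means a single 8-wide RAIDZ2 vdev. In FreeNAS you would build it through the volume manager in the GUI; on the command line it would look roughly like this (the 'tank' pool name and daX device names are placeholders):

Code:
# one 8-wide RAIDZ2 vdev: any two disks can fail without losing the pool
zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7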
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
ZFS is very, very good, but it isn't magic. We don't know what happened to your pool, whether it was a hardware problem, a bug in ZFS, or some other layer of the FreeNAS software; we can only speculate. However, we do know that you can still access your data, despite whatever disaster happened, by importing the pool read-only. That's a pretty impressive outcome in my book.
SMART doesn't show any errors
I'm still curious to see those smartctl outputs if you feel like posting them.

EDIT: we also know that, at some point, you were running a 32TB pool on 8GB of non-ECC RAM.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Guys, I don't understand this. I can mount it RO but not writable anymore. This is weird.
Sorry, but ZFS is designed to be reliable and robust and shouldn't (at least IMHO) crash because of a random reboot.
If you can mount RO, but not RW (which causes a crash), then my guess is you have corrupt pool metadata. Maybe it's from non-error-correcting RAM, maybe it's from a pool version upgrade in the middle of a scrub. I don't know what caused it, but that's what it smells like to me.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
"Operating normally" would be "read only", as literally no changes could be made. Not too dissimilar to the current situation. Might actually be exactly what's happening - vdev write fails, kernel panics.

I think it would be read-write. A pool is usually broken up into about 200 metaslabs per vdev, each with a separate spacemap. So, if the current spacemap is bad, just stop using that spacemap for writes; 99.5% of the spacemaps would still be available.

This presumes that a 'bad' spacemap can even be detected.
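If anyone wants to look at the metaslabs and space maps being discussed here, zdb can dump them from a pool. A sketch only, with 'pool01' as the example name, and expect it to be slow on a big pool:

Code:
# -m prints each top-level vdev's metaslabs and their allocation summary;
# -mm additionally dumps the individual space map entries
zdb -m pool01
zdb -mm pool01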
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
This presumes that a 'bad' spacemap can even be detected.
Well, finding a block where there should be none should handle that.
I think it would be read-write. A pool is usually broken up into about 200 metaslabs per vdev, each with a separate spacemap. So, if the current spacemap is bad, just stop using that spacemap for writes; 99.5% of the spacemaps would still be available.
Possibly, but I'd probably stop trusting all spacemaps, since something is very wrong.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Well, finding a block where there should be none should handle that.
That is actually harder to determine than you would think. The spacemap is the only record, and the only way to know whether you have a valid block (vs. an abandoned block) is to walk the entire metadata tree looking for a reference. Think about those multi-hour dedup pool imports for an idea of what that means.
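zdb's block traversal is essentially that walk: it follows every block pointer in the pool and checks the result against the space accounting, which gives a feel for how expensive such detection would be. A sketch only, with 'pool01' as the example name; it can run for hours on a pool this size:

Code:
# traverse all block pointers and verify the space accounting (reports leaked blocks)
zdb -b pool01
# -bb adds per-object-type block statistics to the report
zdb -bb pool01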

Ericloewe said:
Possibly, but I'd probably stop trusting all spacemaps, since something is very wrong.
Maybe; that is where some ZFS developer analysis would be nice. If it is survivable, and the user can be warned after the first occurrence, it might improve reliability. It is a fair assumption that for every person who manages to recover their pool, there are a couple more who have no idea what has happened.
 