Permanent errors have been detected...

mloiterman · Aug 1, 2016

I had a power problem at my FreeNAS server this morning, and it ended up shutting down when the battery in my UPS ran out.

When I rebooted the server, I was greeted by this error:

The volume tank (ZFS) state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected.

Initially:
zpool status -xv
errors: Permanent errors have been detected in the following files:

var/db/system/syslog-afc0e33369874e5c93e97ec2a8fb0019/log/debug.log

I foolishly thought: no problem, I don't need that .log file; I'll just create a blank one and keep logging. So I deleted the file and created a new one with the same permissions and ownership. The system seems OK and is now using the new log file.

However...

Now:
zpool status -xv
errors: Permanent errors have been detected in the following files:

tank/.system/syslog-afc0e33369874e5c93e97ec2a8fb0019:<0x97>

Both my storage pool and my boot pool show as healthy, and all my drives show as online.

Why is the path different now?

Before I screw this up further, can I fix this or is my entire volume screwed up?
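On the path question: as far as I understand ZFS error reporting, zpool status -v lists a damaged file by its path while the file still exists, and falls back to the form dataset:<0xN> (dataset name plus internal object number) once it can no longer resolve the object to a name, for example after the file has been deleted. The Python sketch below just makes that distinction explicit for output like the two listings in this post. The function name parse_permanent_errors is hypothetical, not part of FreeNAS or ZFS, and it assumes the header and entry formats match what is quoted here.

```python
import re

# Header that precedes the list of damaged files in `zpool status -v` output
# (copied from the output quoted in this thread).
HEADER = "errors: Permanent errors have been detected in the following files:"

# Entry of the form "<dataset>:<0xHEX>". Assumption: ZFS prints this form when
# it can no longer resolve the damaged object to a file path (e.g. the file
# was deleted after the error was logged).
OBJECT_ENTRY = re.compile(r"^(?P<dataset>\S+):<0x(?P<obj>[0-9a-fA-F]+)>$")


def parse_permanent_errors(status_text):
    """Hypothetical helper (not part of FreeNAS/ZFS): return the entries
    listed under HEADER, each tagged as a 'path' or an 'object' entry."""
    entries = []
    in_list = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line == HEADER:
            in_list = True  # every non-blank line after this is an entry
            continue
        if not in_list or not line:
            continue
        match = OBJECT_ENTRY.match(line)
        if match:
            entries.append({
                "kind": "object",
                "dataset": match.group("dataset"),
                "object": int(match.group("obj"), 16),  # 0x97 -> 151
            })
        else:
            entries.append({"kind": "path", "path": line})
    return entries


if __name__ == "__main__":
    after_delete = HEADER + "\n\n" + \
        "  tank/.system/syslog-afc0e33369874e5c93e97ec2a8fb0019:<0x97>\n"
    print(parse_permanent_errors(after_delete))
```

Fed the second listing, this reports a single object entry for the tank/.system/syslog-... dataset with object number 151 (0x97), presumably the deleted debug.log now that its name is gone. Output of errors: No known data errors yields an empty list, since that header never matches.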

DrKK · Aug 1, 2016

I wouldn't worry about that file.

What I'd worry about is this:

1) Why didn't your FreeNAS shut down when the UPS got low? Are you not using a suitable UPS that can communicate with the FreeNAS box?

2) Why did you have a corrupted file? What is your pool layout?

3) Also, as per the forum rules, we should be seeing a full list of your components, and your storage volume layout, for a question like this, sir :)

mloiterman · Aug 2, 2016

1. The UPS is not yet connected via USB. I recently moved, and the server is set up in a temporary location.
2. I don't know, and I'd like to understand the root cause. The pool is a single RAIDZ2 vdev (raidz2-0):
   NAME                                            STATE     READ WRITE CKSUM
   tank                                            ONLINE       0     0     0
     raidz2-0                                      ONLINE       0     0     0
       gptid/80a32995-40bb-11e6-bbbd-6805ca1d22f6  ONLINE       0     0     0
       gptid/c0989d2e-8508-11e3-b2bc-6805ca1d22f6  ONLINE       0     0     0
       gptid/c0e1eedc-8508-11e3-b2bc-6805ca1d22f6  ONLINE       0     0     0
       gptid/c14cc407-8508-11e3-b2bc-6805ca1d22f6  ONLINE       0     0     0
       gptid/c1981a83-8508-11e3-b2bc-6805ca1d22f6  ONLINE       0     0     0
       gptid/c1e02cfe-8508-11e3-b2bc-6805ca1d22f6  ONLINE       0     0     0
3. Configuration is as follows:
a. Dell Precision T7400
b. Xeon E5420 @ 2.50GHz
c. 22496MB RAM
d. 6 x (2.0TB) WD2002FAEX-007B via 3ware 9000 series RAID Controller in JBOD configuration (required for additional SATA ports)
e. RAIDZ2-0

As a follow-up, I ran a scrub, and now:

zpool status -xv
all pools are healthy

So I guess deleting that file and performing the scrub cleared it up.

I would like to know why this happened, however.

DrKK · Aug 2, 2016

Best guess? Some kind of pernicious interaction with the RAID controller brought on by the power outage; perhaps some caching mechanism told ZFS the data was on disk when it wasn't, and then the power went out? That's all I've got. Total speculation. Maybe someone can bring actual science here.

DrKK · Aug 3, 2016

You're going to want to get that UPS set up with proper USB communications ASAP I think. Sudden power losses in the middle of writes have been known to thrash pools. You may have gotten lucky.

philhu · Aug 3, 2016

I think he had a cross-connected file sharing a block, so deleting one caused the other to show up, since it then changed as well.

Always do a scrub when presented with that message.

HoneyBadger · Aug 3, 2016

mloiterman said:
d. 6 x (2.0TB) WD2002FAEX-007B via 3ware 9000 series RAID Controller in JBOD configuration (required for additional SATA ports)

Just confirming here - you have the 3Ware configured with Export JBOD disks = Yes in its BIOS, correct?

mloiterman · Aug 3, 2016

HoneyBadger said:
Just confirming here - you have the 3Ware configured with Export JBOD disks = Yes in its BIOS, correct?

I'm not sure that's EXACTLY what it says, but that's the setting. Each disk is treated individually. The 3Ware card is effectively disabled and only serves to provide the physical connection for the drives.

DrKK said:
You're going to want to get that UPS set up with proper USB communications ASAP I think. Sudden power losses in the middle of writes have been known to thrash pools. You may have gotten lucky.

Yep. I have two APC 1500 rack-mount units ready to go; I just need to install the shelves and clean up a few things.

Stux · Aug 3, 2016

I'd suspect/blame the raid card losing a sync write during the crash.

Would consider replacing with a true HBA

DrKK · Aug 3, 2016

Stux said:
I'd suspect/blame the raid card losing a sync write during the crash.

Would consider replacing with a true HBA

That's where I'm at.

mloiterman · Aug 5, 2016

Stux said:
I'd suspect/blame the raid card losing a sync write during the crash.

Would consider replacing with a true HBA

What is an HBA?

I'm likely to replace the entire machine with a SuperMicro X11-based system shortly.

rs225 · Aug 6, 2016

If you can see the activity lights for these drives, do they light up in sequence during writes? I've seen 3ware controllers exhibit a strange behavior that serializes writes across all drives when one drive does a flush (or maybe when all drives flush at the same time). That could explain this corruption if the power failure happened in the middle of it.

Setting the storsave policy to performance seems to stop the serialization, but it may be just as risky if a power failure happens.
