Permanent errors have been detected...

Status
Not open for further replies.

mloiterman

Dabbler
Joined
Jan 30, 2013
Messages
45
I had a power problem with my FreeNAS server this morning, and it ended up shutting down because the battery in my battery backup ran out.

When I rebooted the server, I was greeted with this error:

The volume tank (ZFS) state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected.

Initially:
zpool status -xv
errors: Permanent errors have been detected in the following files:


var/db/system/syslog-afc0e33369874e5c93e97ec2a8fb0019/log/debug.log


I foolishly thought, no problem, I don't need that .log file. I'll just create a blank file and continue logging. So I wiped out that file and created a new file with the same permissions and ownership. System seems ok and is now using the new log file.

However...

Now:
zpool status -xv
errors: Permanent errors have been detected in the following files:


tank/.system/syslog-afc0e33369874e5c93e97ec2a8fb0019:<0x97>

My storage and boot both show status as being healthy and all my drives show as being online.

Why the difference in path?

Before I screw this up further, can I fix this or is my entire volume screwed up?
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I wouldn't worry about that file.

What I'd worry about is this:

1) Why didn't your FreeNAS shut down when the UPS got low? Are you not using an appropriate UPS with appropriate communication ability to the FreeNAS?

2) Why did you have a corrupted file? What is your pool layout?

3) Also, as per the forum rules, we should be seeing a full list of your components, and your storage volume layout, for a question like this, sir :)
 

mloiterman

Dabbler
Joined
Jan 30, 2013
Messages
45
DrKK said:
I wouldn't worry about that file.

What I'd worry about is this:

1) Why didn't your FreeNAS shut down when the UPS got low? Are you not using an appropriate UPS with appropriate communication ability to the FreeNAS?

2) Why did you have a corrupted file? What is your pool layout?

3) Also, as per the forum rules, we should be seeing a full list of your components, and your storage volume layout, for a question like this, sir :)


1. The UPS is not yet connected via USB. I recently moved, and the server is physically set up in a temporary environment.
2. I don't know, and I would like to understand the root cause. The pool is a raidz2-0:
NAME                                            STATE     READ WRITE CKSUM
tank                                            ONLINE       0     0     0
  raidz2-0                                      ONLINE       0     0     0
    gptid/80a32995-40bb-11e6-bbbd-6805ca1d22f6  ONLINE       0     0     0
    gptid/c0989d2e-8508-11e3-b2bc-6805ca1d22f6  ONLINE       0     0     0
    gptid/c0e1eedc-8508-11e3-b2bc-6805ca1d22f6  ONLINE       0     0     0
    gptid/c14cc407-8508-11e3-b2bc-6805ca1d22f6  ONLINE       0     0     0
    gptid/c1981a83-8508-11e3-b2bc-6805ca1d22f6  ONLINE       0     0     0
    gptid/c1e02cfe-8508-11e3-b2bc-6805ca1d22f6  ONLINE       0     0     0

3. Configuration is as follows:
a. Dell Precision T7400
b. Xeon E5420 @ 2.50GHz
c. 22496MB RAM
d. 6 x (2.0TB) WD2002FAEX-007B via 3ware 9000 series RAID Controller in JBOD configuration (required for additional SATA ports)
e. RAIDZ2-0


As a follow up, I did a scrub and now:

zpool status -xv
all pools are healthy


So I guess deleting that file and performing the scrub cleared it up.

I would like to know why this happened, however.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Best guess? Some kind of pernicious interaction with the RAID controller brought on by the power outage; perhaps some caching mechanism tricked ZFS into thinking the data was written to disk, and then the power went out? That's all I got. Total speculation. Maybe someone can bring actual science here.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
You're going to want to get that UPS set up with proper USB communications ASAP I think. Sudden power losses in the middle of writes have been known to thrash pools. You may have gotten lucky.
 

philhu

Patron
Joined
May 17, 2016
Messages
258
I think he had a cross-connected file, sharing a block. So deleting one caused the other one to show up, since it then changed too.

Always do a scrub when presented with that message.
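
For anyone checking this kind of output by script, here is a minimal Python sketch that pulls the "Permanent errors" entries out of `zpool status -v` text and separates the two forms seen earlier in this thread. As far as I understand it, ZFS prints a file path when it can still resolve the damaged object to a file, and falls back to the dataset name plus an object number (such as `:<0x97>`) once the file is gone, which would explain the change in path after the log file was deleted. The function name `parse_permanent_errors` and the returned dictionary layout are made up for illustration; they are not part of any ZFS tool.

```python
import re

# Header line that "zpool status -v" prints before the list of damaged files.
HEADER = "Permanent errors have been detected in the following files:"

# Entries like "tank/.system/syslog-...:<0x97>" carry a dataset name and a
# hexadecimal object number instead of a file path.
OBJECT_ENTRY = re.compile(r"^(?P<dataset>.+):<(?P<obj>0x[0-9a-fA-F]+)>$")


def parse_permanent_errors(status_text):
    """Return the permanent-error entries found in zpool status text.

    Hypothetical helper: each entry is a dict with kind "path" or "object".
    """
    entries = []
    in_list = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if not in_list:
            # Skip everything until the error-list header is seen.
            if line.endswith(HEADER):
                in_list = True
            continue
        if not line:
            continue
        if line.startswith("pool:"):
            # The report for the next pool begins; this error list is over.
            in_list = False
            continue
        match = OBJECT_ENTRY.match(line)
        if match:
            entries.append({
                "kind": "object",
                "dataset": match.group("dataset"),
                "object": int(match.group("obj"), 16),
            })
        else:
            entries.append({"kind": "path", "path": line})
    return entries
```

An empty list means no permanent errors were listed in the text. Any "object" entries that remain after a completed scrub would be worth a closer look.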
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
mloiterman said:
d. 6 x (2.0TB) WD2002FAEX-007B via 3ware 9000 series RAID Controller in JBOD configuration (required for additional SATA ports)

Just confirming here - you have the 3Ware configured with Export JBOD disks = Yes in its BIOS, correct?
 

mloiterman

Dabbler
Joined
Jan 30, 2013
Messages
45
HoneyBadger said:
Just confirming here - you have the 3Ware configured with Export JBOD disks = Yes in its BIOS, correct?

I'm not sure that's EXACTLY what it says, but that's the setting. Each disk is treated as an individual. The 3Ware card is effectively disabled and only serves to facilitate the physical connection of the drives.

DrKK said:
You're going to want to get that UPS set up with proper USB communications ASAP I think. Sudden power losses in the middle of writes have been known to thrash pools. You may have gotten lucky.

Yep. I have two APC 1500 rack mount units ready to go; I just need to install the shelves and clean up a few things.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I'd suspect/blame the raid card losing a sync write during the crash.

Would consider replacing with a true HBA
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Stux said:
I'd suspect/blame the raid card losing a sync write during the crash.

Would consider replacing with a true HBA

That's where I'm at.
 

mloiterman

Dabbler
Joined
Jan 30, 2013
Messages
45
Stux said:
I'd suspect/blame the raid card losing a sync write during the crash.

Would consider replacing with a true HBA

What is an HBA?

I'm likely to replace the entire machine with a SuperMicro based X11 shortly.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
If you can see drive lights for these drives, do they come on in a sequence during writes? I've seen that 3ware controllers have a strange behavior which causes them to serialize writes across all drives when one drive does a flush (or maybe when all drives do a flush at the same time). It could be the cause of this corruption if the power failure happened in the middle.

Setting the storsave policy to performance seems to stop the serialization, but it could be doing something just as risky if a power failure happened.
 