Unhealthy pool, no errors at all

Migsi · Jul 15, 2021

Hello Community,

Today I am reaching out to ask for help with some strange behavior I recently found on one of my TrueNAS systems. It consists of consumer-grade hardware: an Athlon 3000G, 16GB RAM (non-ECC), and an Asrock A320M-HDV R4.0 motherboard. Its single pool consists of two 500GB NVMe SSDs (one in the board's native slot, the other in a generic PCIe-to-M.2 adapter card in the x16 slot, which runs at x4 with this CPU) plus three 2TB Seagate IronWolf Red HDDs. The OS (TrueNAS-12.0-U4) runs on its own single 128GB SATA M.2 drive (more details can be found in the following screenshots), and there is an HP NC523SFP card in the x1 slot via a riser cable. The drives were bought new this February. All in all it's nothing too special; its sole purpose is to serve as a home NAS at my parents' place and "back up" most of their data (which stays in sync with their PCs).

Long story short, two weeks ago I received a critical warning about data corruption in the pool. I checked the system immediately: pool status, checksums, SMART data. I checked literally everything that could help me figure out what was going on, but found nothing. Besides that alert, the system ran fine (and is still running fine), except for an additional warning that popped up yesterday about a core file that was dumped. I still have no clue what happened and I'm unsure what to do. The only possibility I see is that a recent unscheduled reboot broke something, but that happened a week before the alert, so I doubt that was the cause. Should I wait for the pool's upcoming scrub tomorrow and, if it goes fine, clear the pool's error state and go on as if nothing happened, or are there other checks I could/should run?

If you need any more information to help me out, I'll gladly provide it ASAP.

Best regards

EDIT: I just ran zpool status -v <pool> for the first time and that cleared things up a little. Apparently some files got corrupted, which might indeed have been caused by unscheduled reboots. I'd appreciate any confirmation though.
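
For anyone who wants to pull the affected files out of that output programmatically, here is a minimal Python sketch. It assumes the usual layout of zpool status -v, where the damaged files follow a line starting with "errors:". The pool name, paths, and sample text are made up for illustration and are not output from the system in this thread; damaged_entries is a hypothetical helper, not part of any ZFS tooling.

```python
# Illustrative sample only; NOT real output from the poster's system.
SAMPLE_STATUS = """\
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /mnt/tank/photos/img_0001.jpg
        tank/documents:<0x1a2b>
"""


def damaged_entries(status_text: str) -> list[str]:
    """Return the entries listed after the 'errors:' line, if any."""
    lines = status_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("errors:"):
            # Every non-empty line after the 'errors:' header is one entry.
            return [entry.strip() for entry in lines[i + 1:] if entry.strip()]
    return []


for entry in damaged_entries(SAMPLE_STATUS):
    print(entry)
```

On a live system the text could come from running zpool status -v <pool> via subprocess.run(..., capture_output=True, text=True) and passing its stdout to the function. A healthy pool reports "errors: No known data errors" on a single line, so the function returns an empty list in that case.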

Arwen · Jul 16, 2021

ZFS was specifically designed from the very beginning to survive a sudden power loss (or reboot) WITHOUT DATA LOSS.

That said, any data in flight could be lost, but it simply would not show up, and it would not be listed as an error. Further, while ZFS itself can survive, if the hardware lies to ZFS (like a hardware RAID controller re-ordering writes and using its write cache), then data loss can occur.

It would be helpful for you to (per forum rules / suggestions):
- List the complete hardware (for example, you have only one M.2 slot, so how are the other 2 wired up?)
- State the version of TrueNAS

QonoS · Jul 16, 2021

Well, a lot could have happened to cause that error. You should provide more data to narrow it down.

System logs from around the time the error happened would be valuable.
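
One way to narrow the logs down to that time window is sketched below in Python. It assumes classic syslog-style lines that start with a timestamp like "Jul  1 03:12:45" (no year, hence the year parameter). The sample lines, host name, and timestamps are placeholders, not entries from the system in this thread, and lines_near is a hypothetical helper.

```python
from datetime import datetime, timedelta


def lines_near(log_lines, when, window=timedelta(minutes=30), year=2021):
    """Keep syslog-style lines whose timestamp is within `window` of `when`."""
    kept = []
    for line in log_lines:
        try:
            # Classic syslog prefix, e.g. "Jul  1 03:12:45"; it carries no year.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # line does not start with a timestamp, skip it
        if abs(stamp - when) <= window:
            kept.append(line)
    return kept


# Placeholder lines for illustration only.
sample = [
    "Jul  1 02:00:00 nas kernel: example line far from the event",
    "Jul  1 03:12:45 nas kernel: example line close to the event",
    "not a timestamped line",
]

for line in lines_near(sample, datetime(2021, 7, 1, 3, 0)):
    print(line)
```

The time passed as `when` would be the moment the alert was raised; the 30-minute default window is an arbitrary starting point and can be widened if nothing relevant shows up.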

Migsi · Jul 17, 2021

Arwen said:
ZFS was specifically designed from the very beginning to survive a sudden power loss (or reboot) WITHOUT DATA LOSS.

That said, any data in flight could be lost, but it simply would not show up, and it would not be listed as an error. Further, while ZFS itself can survive, if the hardware lies to ZFS (like a hardware RAID controller re-ordering writes and using its write cache), then data loss can occur.

It would be helpful for you to (per forum rules / suggestions):
- List the complete hardware (for example, you have only one M.2 slot, so how are the other 2 wired up?)
- State the version of TrueNAS

Good to have it confirmed that ZFS itself is designed to survive power losses; that makes it unlikely that an outage caused the issue. I never touched the native RAID functions of this board, knowing well that they would interfere with ZFS's capabilities. But I can't say for sure that the controller isn't doing any shady stuff regardless, though I very much hope that's not the case.

Pardon me for leaving out those two important pieces of information; I must have overlooked that they were missing before submitting the post. I've edited the post and added them.

QonoS said:
Well, a lot could have happened to cause that error. You should provide more data to narrow it down.

System logs from around the time the error happened would be valuable.

I'll add logs as soon as I have time to access the system again.

RAIDZ1	Special, Mirror
3x HDD	2x NVMe

