VDEV shows degraded but fine after restart?

DMDComposer

Cadet
Joined
Feb 1, 2023
Messages
5
Hi,
I'm new to NAS servers in general. One of my VDEVs was showing as degraded: "sda" and "sdb" both show some errors, with "sdb" being the one faulted. After a restart it now looks fine, but "sdb" still has some errors, just far fewer. I've provided two pictures below.

[two screenshots of the pool status, before and after the restart]

My question is: is one of my HDDs, presumably "sdb", at risk of failing? Is this a situation where I need to replace the drive ASAP, or do I just keep an eye on it? I would appreciate any insight!

Cheers,
DMDComposer
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
please read the forum rules.
a reboot will reset error counts. you should plan to replace at least one of those drives asap, because if they both do die that pool is dead.
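if you want numbers you can compare across reboots, save the output first. a quick sketch, assuming your pool is named main:

  # show per-disk read/write/checksum error counters plus any affected files
  zpool status -v main

  # the counters live in memory, so a reboot zeroes them;
  # save a copy before rebooting if you want to compare later
  zpool status -v main > /root/zpool-status-$(date +%F).txt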

backups are also a really really good idea.

note that, with this pool layout, you might have been better off with mirrors. both your layout (two 4-wide raidz2 vdevs) and all mirrors come out to 50% usable space.
usually raidz2 is used with 6-12 drives to get the space efficiency.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
you should plan to replace at least one of those drives asap, because if they do both die that pool is dead.
Not yet, my friend, not yet. It's a RAIDZ2 vdev, so two disks may fail. Still, @DMDComposer should run SMART tests and plan for replacement.
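Something along these lines, assuming the disk is still enumerated as /dev/sdb:

  # start a long (full-surface) self-test; it runs in the background on the drive itself
  smartctl -t long /dev/sdb

  # once it finishes, review the self-test log and the attribute table
  smartctl -a /dev/sdb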
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
Not yet, my friend, not yet. It's a RAIDZ2 vdev, so two disks may fail
oops, you are right. correction: if they both die, the pool will be at risk, as there will be zero redundancy left.
my brain is basically yelling at me "it's red; replace it, replace it, REPLACE IT NOW!"
 

DMDComposer

Cadet
Joined
Feb 1, 2023
Messages
5
Thank you both for your responses. I purchased another drive this morning to prepare. I'm going to run a SMART test and see what results I get.

Thank you also for the suggestion that mirrors might have been the better choice instead of RAIDZ2. I'm still new at this, so I thought RAIDZ2 was better. I will be upgrading to 12 drives soon, though! Backups upon backups, where does the space go? Haha!

Cheers,
DMDComposer
 

DMDComposer

Cadet
Joined
Feb 1, 2023
Messages
5
Hi Guys,

Running a long SMART test on "sdd" now (should be done tomorrow), as that's the drive that faulted this time, whereas last time it was "sdb". I'm starting to think I should grab another drive (making two) for replacements.
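For reference, the test's progress can be checked while it runs; a sketch (smartctl ships with SCALE):

  # the "Self-test execution status" line reports the percentage remaining
  smartctl -a /dev/sdd | grep -A1 "Self-test execution"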

[screenshot: pool status showing "sdd" faulted]


Is this normal to have the faulted drives switch like that?

Cheers,
DMDComposer
  • System Information:
    • Platform: Generic
    • Version: TrueNAS-SCALE-22.12.4.2
    • Hostname: truenas
    • Uptime: 1 day, as of 10:41
  • CPU:
    • Model: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
    • Cores: 4 cores (8 threads)
    • Average Usage: 19%
    • Highest Usage: 55% (Thread #0)
    • Hottest: 46°C (All Cores)
  • Memory:
    • Total Available: 31.2 GiB
    • Free: 3.7 GiB
    • ZFS Cache: 15.6 GiB
    • Services: 11.9 GiB
  • Storage (main pool):
    • Pool Status: DEGRADED
    • Path: /mnt/main
    • Data: 2 vdevs
    • Used Space: 71%
    • Free Space: 18.48 TiB
    • Disks with Errors: 2
    • Total Disks: 8
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
Is this normal to have the faulted drives switch like that?
no. if the errors are moving around, that's usually an indicator of deeper issues, like cabling or the storage controller.

I see you have a signature with some info, but you seem to have skipped one of the most important parts... the motherboard?
Platform: Generic
this tells us nothing. hardware info includes the motherboard model. give your best guess if you don't know it specifically, but at least something.
the fact that you have an i7 tells me it's likely a consumer motherboard, and those do have issues.
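if you don't know the model offhand, you can usually read it straight from the firmware tables; a sketch (dmidecode should be present on SCALE):

  # print the motherboard manufacturer and model from the SMBIOS tables
  sudo dmidecode -t baseboard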

Storage (main pool):

how are these disks connected? HBA? onboard SATA?
magic fairies?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Is this normal to have the faulted drives switch like that?
But did they actually switch? Or is it the same disk, and just showing up under a different name? The latter isn't that uncommon. If it actually is a different disk, that points strongly to the problem being somewhere other than with the disk itself.

There are a number of reasons ZFS could show errors that have nothing to do with the disk itself, power and cabling being the main ones. It's also not uncommon for SMART tests to show errors while ZFS doesn't; if ZFS hasn't stored any data on the sectors in question, for example, it won't see anything wrong.
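The easy way to tell is by serial number, since the sdX names can be re-assigned between boots while serials stay put. Something like this, as a sketch:

  # list whole disks with their model and serial numbers
  lsblk -do NAME,MODEL,SERIAL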
 

DMDComposer

Cadet
Joined
Feb 1, 2023
Messages
5
Hi,

Sorry about that, I legit just copied all the info from my dashboard on TrueNAS. Here is the corrected info!
  • System Information:
    • Version: TrueNAS-SCALE-22.12.4.2
    • Hostname: TrueNAS
  • Motherboard:
    • Manufacturer: Gigabyte Technology Co., Ltd.
    • Product Name (Model): Z170XP-SLI
  • CPU:
    • Model: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
    • Cores: 4 cores (8 threads)
  • Memory:
    • Total Available: 32 GiB (2 x 16 GB)
    • Type: DDR4
  • Storage (main pool):
    • Pool Status: DEGRADED
    • Path: /mnt/main
    • Data: 2 vdevs
      • 4 Drives connected to Motherboard via SATA
      • 4 Drives connected by PCI-E Expansion. (Card Listed Below)
        • LSI Broadcom SAS 9300-8i 8-port 12Gb/s SATA+SAS PCI-Express 3.0 Low Profile Host Bus Adapter
    • Used Space: 71%
    • Free Space: 18.48 TiB
    • Disks with Errors: 2
    • Total Disks: 8


how are these disks connected? HBA? onboard SATA?
If I'm reading the vdev IDs correctly, and assuming the alphabetical IDs are in order (I recently expanded the pool with the extra 4 drives), the degraded drives (sda - sdd) are connected by SATA directly to the motherboard.

My second set of drives, for reference, is connected through the PCIe expansion card, the "LSI Broadcom SAS 9300-8i 8-port 12Gb/s SATA+SAS PCI-Express 3.0 Low Profile Host Bus Adapter"; I'm assuming those are (sde - sdh).
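Rather than assuming the letters line up, the controller each disk actually sits on can be listed directly; a sketch:

  # each symlink name encodes the controller and port the disk is attached to,
  # and points at the sdX name currently assigned to it
  ls -l /dev/disk/by-path/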
 

DMDComposer

Cadet
Joined
Feb 1, 2023
Messages
5
But did they actually switch? Or is it the same disk, and just showing up under a different name? The latter isn't that uncommon. If it actually is a different disk, that points strongly to the problem being somewhere other than with the disk itself.
Hi,

I checked the serial for "sdb", which at first was the drive with 12 errors and after the restart is showing 4. Its serial number is identical to the one I noted last night before the restart, so it's the same physical drive. Unfortunately, I didn't take note of any other drive's serial number in the pool.

There are a number of reasons ZFS could show errors and have nothing to do with the disk itself, power and cabling being the main ones. It's also not uncommon for SMART tests to show errors, and ZFS not--if ZFS hasn't stored any data on the sectors in question, for example, it won't see anything wrong.
The only thing I changed was adding the additional 4 drives, but the system had been running for a week without issues, if that helps narrow things down. Are there any other tests or stability checks I can do besides SMART?

Cheers,
DMDComposer
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
Sorry about that, I legit just copied all the info from my dashboard on Truenas. Here is the corrected info!
that looks much better. I'm not thrilled about the board... but at least it's an intel NIC!
The degraded pool drives (sda - sdd) are connected by SATA directly to the motherboard.
I would suggest moving them to the LSI. as this is a gaming motherboard, the SATA controller could very well be a little flaky, especially if it's old.
zfs usually works fine with modern motherboard sata, but that LSI is going to be superior in almost every way; it's designed to manage 256 drives or whatever sas3 maxes out at nowadays.
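as for other tests besides SMART: a scrub makes zfs re-read and verify everything it has stored, which exercises the disks, cabling, and controller all at once. a sketch, assuming your pool is still named main:

  # re-read and checksum every block in the pool; problems show up as error counts
  zpool scrub main

  # watch scrub progress and the per-disk error counters
  zpool status main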
 