VDEV shows degraded but fine after restart?

DMDComposer

Cadet
Joined
Feb 1, 2023
Messages
5
Hi,
I'm new to NAS servers in general. One of my VDEVs was showing as degraded: "sda" and "sdb" both show some errors, with "sdb" being the one faulted. After a restart it now looks fine, but "sdb" still has some errors, just far fewer. I've provided two pictures below.

[two screenshots of the pool status, before and after the restart]

My question is: is one of my HDDs, presumably "sdb", at risk of failing? Is this a situation where I need to replace the drive ASAP, or do I just keep an eye on it? I would appreciate any insight!

Cheers,
DMDComposer
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
please read the forum rules.
a reboot will reset error counts. you should plan to replace at least one of those drives asap, because if they both do die that pool is dead.
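if you want numbers you can compare across reboots, save the output first. a quick sketch, assuming your pool is named main:

  # show per-disk read/write/checksum error counters plus any affected files
  zpool status -v main

  # the counters live in memory, so a reboot zeroes them;
  # save a copy before rebooting if you want to compare later
  zpool status -v main > /root/zpool-status-$(date +%F).txt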

backups are also a really really good idea.

note that, with this pool layout, you might have been better off with mirrors. both your layout (two 4-wide raidz2 vdevs) and all mirrors come out to 50% usable space.
usually raidz2 is used with 6-12 drives to get the space efficiency.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
you should plan to replace at least one of those drives asap, because if they do both die that pool is dead.
Not yet, my friend, not yet. It's a RAIDZ2 vdev, so two disks may fail. Still, @DMDComposer should run SMART tests and plan for replacement.
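Something along these lines, assuming the disk is still enumerated as /dev/sdb:

  # start a long (full-surface) self-test; it runs in the background on the drive itself
  smartctl -t long /dev/sdb

  # once it finishes, review the self-test log and the attribute table
  smartctl -a /dev/sdb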
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
Not yet, my friend, not yet. It's a RAIDZ2 vdev, so two disks may fail
oops, you are right. correction: if they both die, the pool will be at risk, as there will be zero redundancy left.
my brain is basically yelling at me "it's red; replace it, replace it, REPLACE IT NOW!"
 

DMDComposer

Cadet
Joined
Feb 1, 2023
Messages
5
Thank you both for your responses. I purchased another drive this morning to prepare. I'm going to run a SMART test and see what results I get.

Thank you also for the suggestion that mirrors might have been the better choice instead of RAIDZ2. I'm still new at this, so I thought RAIDZ2 was better. I will be upgrading to 12 drives soon, though! Backups upon backups, where does the space go? Haha!

Cheers,
DMDComposer
 

DMDComposer

Cadet
Joined
Feb 1, 2023
Messages
5
Hi Guys,

Running a long SMART test on "sdd" now (should be done tomorrow), as that's the drive that faulted this time, whereas last time it was "sdb". I'm starting to think I should grab another drive (making two) for replacements.
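For reference, the test's progress can be checked while it runs; a sketch (smartctl ships with SCALE):

  # the "Self-test execution status" line reports the percentage remaining
  smartctl -a /dev/sdd | grep -A1 "Self-test execution"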

[screenshot: pool status showing "sdd" faulted]


Is this normal to have the faulted drives switch like that?

Cheers,
DMDComposer
  • System Information:
    • Platform: Generic
    • Version: TrueNAS-SCALE-22.12.4.2
    • Hostname: truenas
    • Uptime: 1 day, as of 10:41
  • CPU:
    • Model: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
    • Cores: 4 cores (8 threads)
    • Average Usage: 19%
    • Highest Usage: 55% (Thread #0)
    • Hottest: 46°C (All Cores)
  • Memory:
    • Total Available: 31.2 GiB
    • Free: 3.7 GiB
    • ZFS Cache: 15.6 GiB
    • Services: 11.9 GiB
  • Storage (main pool):
    • Pool Status: DEGRADED
    • Path: /mnt/main
    • Data: 2 vdevs
    • Used Space: 71%
    • Free Space: 18.48 TiB
    • Disks with Errors: 2
    • Total Disks: 8
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
Is this normal to have the faulted drives switch like that?
no. if the errors are moving around, that's usually an indicator of deeper issues, like cabling or the storage controller.

I see you have a signature with some info, but you seem to have skipped one of the most important parts... the motherboard?
Platform: Generic
this tells us nothing. hardware info includes the motherboard model. give your best guess if you don't know it specifically, but at least something.
the fact that you have an i7 tells me it's likely a consumer motherboard, and those do have issues.
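if you don't know the model offhand, you can usually read it straight from the firmware tables; a sketch (dmidecode should be present on SCALE):

  # print the motherboard manufacturer and model from the SMBIOS tables
  sudo dmidecode -t baseboard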

Storage (main pool):

how are these disks connected? HBA? onboard SATA?
magic fairies?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Is this normal to have the faulted drives switch like that?
But did they actually switch? Or is it the same disk, and just showing up under a different name? The latter isn't that uncommon. If it actually is a different disk, that points strongly to the problem being somewhere other than with the disk itself.

There are a number of reasons ZFS could show errors that have nothing to do with the disk itself, power and cabling being the main ones. It's also not uncommon for SMART tests to show errors while ZFS doesn't; if ZFS hasn't stored any data on the sectors in question, for example, it won't see anything wrong.
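The easy way to tell is by serial number, since the sdX names can be re-assigned between boots while serials stay put. Something like this, as a sketch:

  # list whole disks with their model and serial numbers
  lsblk -do NAME,MODEL,SERIAL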
 

DMDComposer

Cadet
Joined
Feb 1, 2023
Messages
5
Hi,

Sorry about that, I legit just copied all the info from my dashboard on TrueNAS. Here is the corrected info!
  • System Information:
    • Version: TrueNAS-SCALE-22.12.4.2
    • Hostname: TrueNAS
  • Motherboard:
    • Manufacturer: Gigabyte Technology Co., Ltd.
    • Product Name (Model): Z170XP-SLI
  • CPU:
    • Model: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
    • Cores: 4 cores (8 threads)
  • Memory:
    • Total Available: 32 GiB (2 x 16 GB)
    • Type: DDR4
  • Storage (main pool):
    • Pool Status: DEGRADED
    • Path: /mnt/main
    • Data: 2 vdevs
      • 4 Drives connected to Motherboard via SATA
      • 4 Drives connected by PCI-E Expansion. (Card Listed Below)
        • LSI Broadcom SAS 9300-8i 8-port 12Gb/s SATA+SAS PCI-Express 3.0 Low Profile Host Bus Adapter
    • Used Space: 71%
    • Free Space: 18.48 TiB
    • Disks with Errors: 2
    • Total Disks: 8


how are these disks connected? HBA? onboard SATA?
If I'm reading the vdev IDs correctly, and assuming the alphabetical IDs are in order (I recently expanded the pool with the extra 4 drives), the degraded drives (sda - sdd) are connected by SATA directly to the motherboard.

My second set of drives, for reference, is connected through the PCIe expansion card, the "LSI Broadcom SAS 9300-8i 8-port 12Gb/s SATA+SAS PCI-Express 3.0 Low Profile Host Bus Adapter"; I'm assuming those are (sde - sdh).
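Rather than assuming the letters line up, the controller each disk actually sits on can be listed directly; a sketch:

  # each symlink name encodes the controller and port the disk is attached to,
  # and points at the sdX name currently assigned to it
  ls -l /dev/disk/by-path/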
 

DMDComposer

Cadet
Joined
Feb 1, 2023
Messages
5
But did they actually switch? Or is it the same disk, and just showing up under a different name? The latter isn't that uncommon. If it actually is a different disk, that points strongly to the problem being somewhere other than with the disk itself.
Hi,

I checked the serial for "sdb", which at first was the drive with 12 errors and after the restart is showing 4. Its serial number is identical to the one I noted last night before the restart, so it's the same physical drive. Unfortunately, I didn't take note of any other drive's serial number in the pool.

There are a number of reasons ZFS could show errors and have nothing to do with the disk itself, power and cabling being the main ones. It's also not uncommon for SMART tests to show errors, and ZFS not--if ZFS hasn't stored any data on the sectors in question, for example, it won't see anything wrong.
The only thing I changed was adding the additional 4 drives, but the system had been running for a week without issues, if that helps narrow things down. Are there any other tests or stability checks I can do besides SMART?

Cheers,
DMDComposer
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
Sorry about that, I legit just copied all the info from my dashboard on Truenas. Here is the corrected info!
that looks much better. I'm not thrilled about the board... but at least it's an intel NIC!
The degraded pool drives (sda - sdd) are connected by SATA directly to the motherboard.
I would suggest moving them to the LSI. as this is a gaming motherboard, the SATA controller could very well be a little flaky, especially if it's old.
zfs usually works fine with modern motherboard sata, but that LSI is going to be superior in almost every way; it's designed to manage 256 drives or whatever sas3 maxes out at nowadays.
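as for other tests besides SMART: a scrub makes zfs re-read and verify everything it has stored, which exercises the disks, cabling, and controller all at once. a sketch, assuming your pool is still named main:

  # re-read and checksum every block in the pool; problems show up as error counts
  zpool scrub main

  # watch scrub progress and the per-disk error counters
  zpool status main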
 