2 Disk Errors on all drives (how to fix?)

vaursechs

Cadet
Joined
Feb 10, 2024
Messages
5
Hello everyone,

I am sorry if this has been asked before; I could not find a thread with the same issue.
First of all, let me say that I am a 100% newbie to TrueNAS and I only use it for storage (video files) and to run Plex.

I have been using TrueNAS SCALE for over a year and never had any problems with it. However, a month ago I noticed that my pool status was unhealthy, and ZFS shows that there are 2 errors on all 8 disks. It is possible that the disks are failing, but I highly doubt that 8 new Seagate IronWolf 14TB disks are all failing at the same time.

I am running TrueNAS SCALE 23.10.1.3 on a Ryzen 9 5950X, an ASRock B550 motherboard, 64GB ECC memory, 8x Seagate IronWolf 14TB (4 are connected to the motherboard's SATA ports, the other 4 to a StarTech PCIe to 8x SATA card), and an Intel 10Gbit Converged Network Adapter.

[Attached image: dashboard.jpg]
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What's the output of zpool status?
 

vaursechs

Where do I type that command? I tried to type it in the shell, but it said "zsh: command not found: zpool"
(like I said, I am a total noob at this :oops: )
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
vaursechs said:
> Where do I type that command? I tried to type it in the shell, but it said "zsh: command not found: zpool"
> (like I said, I am a total noob at this :oops: )
System Settings -> Shell
Enter the command there. Cut and paste the results here in [code] [/code] brackets
 

Ericloewe

Great, they've removed it from $PATH. Bah. You'd need to use the full path to the zpool binary, and I can never remember where it is (if I had a time machine, I'd figure out a way to standardize where Unix-like systems put their binaries).

The GUI must have an equivalent page, showing the details of the pool's status. That should suffice, too.
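Ericloewe's full-path suggestion can be sketched as a small POSIX shell check. It assumes, without confirmation from this thread, that on Debian-based SCALE the zpool binary sits in /usr/sbin or /sbin, which a non-root user's PATH may not include; the directory list is a guess to adjust. The script only reports what it finds and never runs zpool itself.

```shell
#!/bin/sh
# Sketch: locate zpool when the current user's PATH does not include it.
# Assumption: on Debian-based SCALE it lives in /usr/sbin or /sbin; the
# directory list below is a guess, adjust it for your system.
ZPOOL=$(command -v zpool 2>/dev/null || true)

if [ -z "$ZPOOL" ]; then
  # Not on PATH: probe the usual system-binary directories directly.
  for dir in /usr/sbin /sbin /usr/local/sbin; do
    if [ -x "$dir/zpool" ]; then
      ZPOOL="$dir/zpool"
      break
    fi
  done
fi

if [ -n "$ZPOOL" ]; then
  echo "zpool found at: $ZPOOL"
  echo "Try: sudo $ZPOOL status"
else
  echo "zpool not found on PATH or in the usual sbin directories"
fi
```

If it reports a path, that full path can be typed in place of plain zpool.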
 

joeschmuck

Ericloewe said:
> Great, they've removed it from $PATH. Bah.
Mine works fine on SCALE 23.10.1.3. The funny thing is, I thought I was having a $PATH issue as well the other day. The application I was using was in ./sbin, and when I checked $PATH, it was there. I re-established $PATH and all was working. Very odd. A reboot didn't break it.

It's another gremlin.
 

Ericloewe

Hmm, this is the second such case I've come across in the past few days, but I don't use Scale myself... But this is really irritating and worth a bug report if it indeed is unintentional...
 

PhilD13

Patron
Joined
Sep 18, 2020
Messages
203
When you are told to run a command in the shell (System Settings -> Shell), such as "zpool status", and you get the error "zsh: command not found: zpool", you just need to put sudo in front of it, as shown below:
sudo zpool status
Enter the admin password when prompted and the command will run. This has something to do with the change from the root user to the admin user as the system administrator, and it affects commands that need additional privileges to run. Yeah, it's probably an unfixed path issue or command-privileges issue with TrueNAS and Debian, but all my SCALE installs act and respond the same way, whereas older ones and CORE, which used root, did not need sudo.
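PhilD13's "path issue" guess can be checked by looking at whether the sbin directories are on the admin user's PATH. This is a sketch under an assumption not verified in this thread: that SCALE keeps Debian's default sudoers secure_path, which includes /usr/sbin and /sbin, so sudo would find zpool even when the plain command does not. The script only reads PATH and changes nothing.

```shell
#!/bin/sh
# Sketch: report whether the sbin directories are on the current PATH.
# Assumption: zpool is installed in one of them (Debian layout), and sudo
# substitutes its own secure_path that includes them, which would be why
# "sudo zpool status" works while plain "zpool status" is not found.

# path_has DIR: succeed if DIR is one of the colon-separated PATH entries.
path_has() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

for dir in /usr/sbin /sbin; do
  if path_has "$dir"; then
    echo "$dir is on PATH"
  else
    echo "$dir is NOT on PATH: use sudo or the full path"
  fi
done
```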
 

joeschmuck

I log in as 'root' myself. Not good practice, but it's in my house and not on the internet.

Ericloewe said:
> Hmm, this is the second such case I've come across in the past few days, but I don't use Scale myself... But this is really irritating and worth a bug report if it indeed is unintentional...
Agreed. If I see the issue again, I will submit it. I honestly thought it was just me until you posted about it. I do a lot of crap to my NAS, and I switch between CORE and SCALE all the time.
 

PhilD13

joeschmuck said:
> I log in as 'root' myself. Not good practice, but it's in my house and not on the internet.
Did they change the process? The only thing I had set up before was CORE, which set up root, and that was what I used, but that was over a year ago.

I set up a bare-metal Bluefin SCALE system last July, and during the install it sent me through the setup for the admin user as Local Administrator (950) with the (root) password I chose during install, and it disabled the root account, though the account is still there. If I enable the root account, Bluefin complains loudly that I should not be using the root account for login, so I'm fine with the admin account and any extra steps needed to get some things to work.

In December I set up a bare-metal Cobia system, and it did not send me through the admin setup and did not disable root. On first login, a popup appeared strongly suggesting I disable root, change the password, and use the admin Local Administrator (950) user, so I went through the prompts, rebooted, and now use the admin account. I think it disabled root after I went through the modals, but I might have taken care of it manually. I thought it was very strange that Cobia would do this so differently and in such a convoluted way, but whatever; I was in a hurry and ignored the strangeness, as I had what I needed.

I quite often see in the forums that a command's output is requested and someone has to point out that the command must be preceded by sudo to work; otherwise the user gets the "zsh: command not found: xxxx" error. This is how some Linux distributions work when you are an administrator rather than root: some command-line commands need sudo in front of them in order to run. I consider needing sudo for certain commands to be normal operation.
 

vaursechs

Finally managed to get the command to work:
Code:
admin@truenas[~]$ sudo zpool status
[sudo] password for admin: 
  pool: NICKNAS
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 08:53:21 with 2 errors on Sun Feb  4 08:53:23 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        NICKNAS                                   ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            a82155ff-d72e-40d1-a5a6-1c8eaa1c9bed  ONLINE       0     0     0
            b773f97e-697a-43e5-8342-0fe15b8b6c41  ONLINE       0     0     0
            9c9d1d9e-cae6-412a-8621-708961958507  ONLINE       0     0     0
            6e92f1e4-b959-440f-90b1-ce62e1e9081c  ONLINE       0     0     0
            5873ea13-0ae2-440c-a2f4-a703b67f0dd0  ONLINE       0     0     0
            92008db1-a32d-4a80-b9bf-a4d733d597fe  ONLINE       0     0     0
            f99e0e23-7ce3-4b68-a2bc-9aadd67493f0  ONLINE       0     0     0
            7d354229-90e3-4cf2-adce-5af2eb78a563  ONLINE       0     0     0

errors: 2 data errors, use '-v' for a list

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:00:05 with 0 errors on Fri Feb  9 03:45:06 2024
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          nvme0n1p3  ONLINE       0     0     0

errors: No known data errors
admin@truenas[~]$ 
 

vaursechs

After checking with sudo zpool status -v, it turned out to be a single video file that was corrupted.
I deleted the file and replaced it with the correct one (from a different backup).
I guess I need to do a fresh scrub now to check if all errors are gone?
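The steps described here (list the affected files, replace them, scrub again) can be sketched as a short script. POOL defaults to NICKNAS, the pool name in the zpool status output above; the run helper and DRY_RUN switch are hypothetical conveniences added for safety. By default the script only prints the sudo commands; it executes them only when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the clean-up sequence from this thread. POOL defaults to the
# pool name shown above; DRY_RUN=1 (default) only prints the commands.
# The run helper and DRY_RUN switch are hypothetical, added for safety.
POOL="${POOL:-NICKNAS}"
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the sudo command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ sudo $*"
  else
    sudo "$@"
  fi
}

# 1. List the files affected by permanent data errors.
run zpool status -v "$POOL"

# 2. Delete or restore those files from backup (a manual step, not scripted).

# 3. Start a new scrub; it runs in the background, re-reading the pool's
#    data and verifying checksums.
run zpool scrub "$POOL"

# 4. Check progress and the final result (repeat until the scrub finishes).
run zpool status -v "$POOL"
```

Running it as DRY_RUN=0 sh recheck.sh (the file name is hypothetical) would execute the commands for real.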
 

Ericloewe

Did you reboot since the errors first popped up? I'm not liking the lack of errors in that output; all the counters are at zero.
vaursechs said:
> I guess I need to do a fresh scrub now to check if all errors are gone?
Definitely a good plan. Let's work from there.
 

vaursechs

The scrub did not fix the issue; however,
Code:
sudo zpool clear NICKNAS
did the trick!
Pool is now healthy again with no errors.
 

Ericloewe

zpool clear doesn't resolve anything; it only resets the error counters and stuff.

That said, if the scrub is done and no errors were found, that's a good start - but the question remains of what happened.
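Since zpool clear only resets the counters, the result of a scrub that finishes afterwards is the more useful signal. The two helper functions below are hypothetical names that grep the zpool status text of a single pool; the patterns assume the "scan:" and "errors:" line formats shown earlier in this thread and may need adjusting for other OpenZFS versions.

```shell
#!/bin/sh
# Sketch: judge a pool by the result of a scrub, not by counters that
# "zpool clear" has just reset. Pipe in the status of ONE pool, e.g.
#   sudo zpool status NICKNAS | last_scrub_clean && echo "scrub clean"
# With several pools in the input, another pool's lines could match.
# Function names are hypothetical; the patterns assume the line formats
# shown in this thread.

# last_scrub_clean: succeed if the "scan:" line reports a finished scrub
# with 0 errors (a scrub still in progress does not match).
last_scrub_clean() {
  grep -q 'scan: scrub repaired .* with 0 errors'
}

# no_data_errors: succeed if the pool lists no permanent data errors.
no_data_errors() {
  grep -q '^errors: No known data errors'
}

# Example on the scan line posted earlier in this thread.
SAMPLE_SCAN='  scan: scrub repaired 0B in 08:53:21 with 2 errors on Sun Feb  4 08:53:23 2024'
if printf '%s\n' "$SAMPLE_SCAN" | last_scrub_clean; then
  echo "last scrub: clean"
else
  echo "last scrub: not clean"
fi
```

Because an unfinished scrub does not match, it is reported as not clean, which errs on the cautious side.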
 