Pool State Degraded with no disk failure

sajid

Cadet
Joined
Feb 28, 2024
Messages
2
We have a Cisco UCS C240 M4 server where we have installed FreeNAS. Overall pool utilization is at 97%, and the pool was healthy. We planned to install a JBOD to create an additional pool, so we shut down the server to install the controller. After the controller was installed, we turned the server back on and found that the pool status had changed to degraded, and one dataset volume was reported as raw. We did not find any failed drives, but the pool is still reporting as degraded. Kindly assist in recovering the pool.

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Per forum rules, (link above), please give more details on your hardware. Especially your pool configuration from these commands:

zpool status
zpool list
zfs list

It is recommended to keep a ZFS pool at or below 80% full to prevent fragmentation. With extremely large pools, 90% can be okay. But 97% is a bit too full and will likely impact performance.
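For example, this will show how full and fragmented each pool is:

zpool list -o name,size,allocated,free,capacity,fragmentation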

Your description of "one dataset volume was reported as raw" does not make sense. Perhaps you used "dataset" when you meant "zVol", which is a special ZFS dataset that is basically a container for raw data, like for a VM's file system. The above command, "zfs list", will clarify that issue.
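For example, the TYPE column tells the two apart, (using "tank" as a stand-in for your actual pool name):

zfs list -o name,type,used,available -r tank

A regular dataset shows up as "filesystem" and a zVol shows up as "volume" in that column.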


As to the cause of "degraded", I can only guess. But on occasion a disk data cable can come loose and cause a drive to log quite a few errors. Too many, and ZFS will consider the drive failed and degrade that ZFS vDev. If you have proper redundancy, (the ZFS command "zpool status" will show what you have), you don't need to "recover the pool"; a simple cable fix would do.
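If a cable does turn out to be the cause, then after reseating it you would clear the old error counts and let ZFS re-check the disk. Something like this, (again, "tank" is a stand-in for your pool's name):

zpool status -v tank    # check the READ / WRITE / CKSUM counters per disk
zpool clear tank        # reset the error counters after the fix
zpool scrub tank        # optional, verifies that all data is readable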

Now I happen to know a bit about Cisco C240 M4 servers, so I am guessing there are no individual drive cables. So we would need to see more information before we can assist further.

sajid

Cadet
Joined
Feb 28, 2024
Messages
2
Thank you for the input.

Following are the server specs:
Cisco UCS C240 M4
Single Intel Xeon E5-2630 v4
64GB memory
10x 10TB Seagate Exos NL-SAS 6Gbps
1x 480GB for the OS

As the pool utilization was at 97%, we planned to attach a JBOD by installing an additional LSI controller. We attached the JBOD to the server and booted it. Initially, the existing pool appeared healthy, but within a minute the server experienced a kernel panic and restarted automatically. After rebooting, the server began reporting I/O errors, and the pool became degraded despite no disk failures being detected.

We performed troubleshooting to the best of our ability but have been unable to resolve the degraded pool issue. Notably, no drive is showing as failed. It appears that the storage appliance is retrieving outdated disk information for a drive that failed long ago and was subsequently replaced. As a result of this degraded state, the customer's backup volume, the F drive with a size of 25TB, has been rendered a raw partition.

As soon as we execute the import, the server triggers a kernel panic and restarts. We then upgraded FreeNAS to TrueNAS and executed the pool import again; now the server does not restart, but it reports I/O errors.
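For reference, the import was run roughly along these lines, ("poolname" stands in for the actual pool name, which is not shown here):

zpool import             # lists the pools available for import
zpool import poolname    # this import step is what triggers the panic / I/O errors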

Before the upgrade, this is the error we were getting when we executed the import:
[Screenshot attachment: 1709238469677.png]


After the upgrade, these are the errors we are getting:
[Screenshot attachment: 1709238546128.png]
[Screenshot attachment: 1709238577172.png]