Replaced disk fails resilver and marks drive REMOVED

ArsenalNAS

Cadet
Joined
Apr 21, 2022
Messages
3
I am new to this forum. I am running TrueNAS CORE 12.0-U6 with 22 drives (2 groups of 11). I have replaced drives previously, but I am now running into a problem with rebuilding this time around. Pool usage is high (~95%) and I now have 2 troubled drives, both in the same group. When I attempt to offline and replace a drive with a new one, the process looks standard: the new drive shows up in the replacement pulldown, I start the replacement process (resilver), and shortly thereafter the system pauses and then continues. If I refresh after that pause, the replacement disk is now listed as REMOVED, with its GPTID string shown.
[Attached screenshot: Screen Shot 2022-04-21 at 11.36.07 AM.png]


I have not discovered a way to get the system to make another attempt at using the replacement disk. The system has labeled the failed disk as REMOVED and will not proceed once the UI has been refreshed; the drive has been marked out. If I put in another drive, I can start the process again. However, I have done that, and the second replacement disk failed with the same error.
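From reading around, it sounds like the CLI may let me detach the stuck replacement and retry without burning another disk. A rough sketch of what I am considering (untested; "tank" stands in for my pool name and the gptid strings are placeholders I would copy from zpool status):

```sh
# Show the pool layout; a stuck replacement appears as a "replacing" vdev
zpool status -v tank

# Detach the REMOVED replacement disk (use the gptid/... string shown for it)
zpool detach tank gptid/<failed-replacement-gptid>

# Retry the replacement once the new disk is visible again
zpool replace tank gptid/<old-disk-gptid> da21
```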

When I look at the console at the disk serial history for da21, I can see the different disks that failed the replacement attempts.

[Attached screenshot: Screen Shot 2022-04-21 at 12.25.08 PM.png]


I would appreciate any insights into how to best resolve the issue.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Sounds like a problem with that port on the backplane or the cable to it.

Have a look at dmesg and see if CAM is reporting errors that point at CRC errors or something else.
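Something along these lines should surface them (adjust the pattern to taste; da21 is the slot in question):

```sh
# Filter the kernel message buffer for CAM-layer errors on the suspect disk
dmesg | grep -iE 'da21|cam|crc'

# Confirm the disk is still attached and on which controller/target
camcontrol devlist
```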
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
What brand and model of disk, exactly?
 

ArsenalNAS

Cadet
Joined
Apr 21, 2022
Messages
3
The 2 disk types I have used are both Seagate. The initial installation used 3TB drives, and I have been swapping in 6TB replacements (ST6000NM0044). I am inclined to suspect a faulty connector (or the midplane). Since it is a unified midplane, it is a bit more difficult to replace; I may have to map out the connector location. Although I will then have an imbalanced RAID config between the two groups.
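For mapping the connector, I am thinking sesutil might help, assuming the enclosure's SES processor is supported on FreeBSD; a rough sketch:

```sh
# Map enclosure slots to da device names
sesutil map

# Blink the locate LED on the suspect drive's slot to find it physically
sesutil locate da21 on
# ...and turn it off again afterwards
sesutil locate da21 off
```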

The "dmesg" simply lists SCSI ERROR. (I am not familiar with the CDB addressing)
[Attached screenshot: Screen Shot 2022-04-22 at 1.05.00 PM.png]
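Since the CDB output is opaque to me, I am planning to ask the drive itself what it has logged; a sketch, with da21 as the suspect device:

```sh
# Full SMART report, including the drive's own logged errors and attributes
smartctl -a /dev/da21

# Kick off a long self-test (runs inside the drive in the background)
smartctl -t long /dev/da21
```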
 

ArsenalNAS

Cadet
Joined
Apr 21, 2022
Messages
3
As a side question: if I have a replacement disk which has failed (in this case, a connection failure), would I be able to use it in another slot as a replacement disk? I know if I use it in the same slot, the system seems to ignore it. Or would it be best to erase it first and use it as a replacement in a different slot?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
if I have a replacement disk which has failed (in this case, a connection failure), would I be able to use it in another slot as a replacement disk? I know if I use it in the same slot, the system seems to ignore it. Or would it be best to erase it first and use it as a replacement in a different slot?
I'd start by running badblocks on it for a bit to make sure you're not just introducing trouble for yourself... just search for badblocks and you'll find some good resources on how to run it.
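For example, something along these lines. This is a destructive write test, so make absolutely sure of the device name before running it:

```sh
# DESTRUCTIVE write-mode test with progress output; wipes everything on the disk.
# The 4096-byte block size avoids badblocks' 32-bit block-count limit on a 6TB drive.
badblocks -ws -b 4096 /dev/da21
```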

The errors are certainly pointing at a failure in the path between the system and the disk(s) (only da21 implicated at this point).

Maybe the backplane port, so try working around it and see how that goes.
 