Problem detaching 2 replaced drives

Status
Not open for further replies.

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
Since two disks in my volume were degraded, I replaced them through the web GUI. After replacing them, they immediately started resilvering, a process I didn't want to interrupt. When that finished (6 hours later), the new disks were online, but the old disks still appear in the system. Detaching them in the GUI doesn't work: it reports that they were successfully detached, but they are still there, even after a reboot. See the screenshot below. Any ideas on how to fix this?
[Attached screenshot: 01.04.2018-12.28.png]

Thanks!
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
Well, no one else has given you the bad news...
I did this myself a few years back, and it is an easy mistake to make.
You did not follow the correct drive-replacement procedure from the manual; you just added a new drive to the pool.
The only way back is to destroy the pool and restore from a backup.
Since it happened to me, I have kept a full backup of my storage pool just in case.
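For reference, the documented replacement procedure boils down to roughly the following ZFS steps. This is only a sketch with placeholder pool/device names; in FreeNAS the GUI's "Replace" button handles the partitioning and gptid details for you.

Code:
# 1. Take the failing member offline (pool and gptid names here are placeholders)
zpool offline tank gptid/OLD-DISK-GPTID

# 2. Shut down, pull the old disk, and install the new one in its place

# 3. Resilver onto the new disk; the GUI's "Replace" ends up issuing
#    something equivalent to this after partitioning the new drive
zpool replace tank gptid/OLD-DISK-GPTID gptid/NEW-DISK-GPTID

# 4. Wait for the resilver to finish before doing anything else
zpool status tank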
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
@Alecmascot Actually no, that's not what he has done; the zpool status output clearly shows a 3-disk RAID-Z1 with two disks currently being replaced.

@OP AFAIK you can't detach the drives because they are not available. Are they still connected and powered?
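(For what it's worth, once a replace has fully completed, or the missing half of a "replacing" vdev is available again, the stale member can normally be dropped from the shell by its numeric GUID. A sketch only; the pool name and GUID below are placeholders and would have to come from your own zpool status output.)

Code:
# Example only: detach the old, now-redundant half of a replacing vdev by its GUID
# (substitute your pool name and the GUID shown by `zpool status`)
zpool detach mypool 12345678901234567890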
 

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
Thanks for getting back to me. I did start the whole procedure by clicking the 'replace' option (the disks were already offline; the 'detach' option was not available, only 'replace'). And no, the disks are no longer connected, but I obviously still have them, and I have enough SATA and power cables to reconnect them - would that help? Would it matter that they would no longer be connected to the same SATA ports? Please advise - I will hold off on doing anything until I get a reply. Thanks!
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
"3 disks RAID-Z1 with two disks currently being replaced."
That indicates the pool integrity is destroyed.
If the OP got here by strictly following the drive replacement instructions then he should open a bug report.
Anyway the pool is toast :-((
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Since he still has the two original disks, the pool may be recoverable (though with some corruption).

So yeah, reconnect the missing disks; it should start to resilver.
 

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
I put the two 'degraded' disks back in (alongside the three that were already in there), on different SATA ports. FreeNAS did start resilvering them and says it will take 17 hours, which is about twice the time each of them took to resilver when I replaced them. Fingers crossed. BTW, everything keeps running perfectly in the meantime...
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
OK, don't touch anything until it's over, then post the output of zpool status -v so we can see the pool's exact state and decide what to do next.
 

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
Hmmmm. No cigar.
Code:
login as: root
root@192.168.1.10's password:
Last login: Sat Jan  6 02:56:25 2018 from 192.168.1.17
FreeBSD 11.1-STABLE (FreeNAS.amd64) #0 r321665+d4625dcee3e(freenas/11.1-stable): Wed Dec 13 16:33:42 UTC 2017

		FreeNAS (c) 2009-2017, The FreeNAS Development Team
		All rights reserved.
		FreeNAS is released under the modified BSD license.

		For more information, documentation, help or support, go here:
		http://freenas.org
Welcome to Stephan's FreeNAS

Warning: settings changed through the CLI are not written to
the configuration database and will be reset on reboot.

root@freenas:~ # zpool status -v
  pool: Stephan
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
		corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
		entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 656K in 0 days 06:05:48 with 71 errors on Sat Jan  6 13:16:06 2018
config:

        NAME                                              STATE     READ WRITE CKSUM
        Stephan                                           DEGRADED     0     0    74
          raidz1-0                                        DEGRADED     0     0   148
            3404696166831701594                           REMOVED      0     0     0  was /dev/ada0p2
            replacing-1                                   DEGRADED     0     0     0
              12001118326471259940                        UNAVAIL      0     0     0  was /dev/gptid/d1e4cf38-2c1a-11e7-94fe-3085a943b0e3
              gptid/9e6a269c-ef4d-11e7-9531-3085a943b0e3  ONLINE       0     0     0
            replacing-2                                   DEGRADED     0     0     0
              10566938263912300466                        UNAVAIL      0     0     0  was /dev/gptid/d2adb9c6-2c1a-11e7-94fe-3085a943b0e3
              gptid/e3b1c9fa-f011-11e7-bff4-3085a943b0e3  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_busy_percent-da0.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_ops-ada0p1.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_latency-ada0p1.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/cputemp-0/temperature.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_metadata-demand_metadata_misses.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_ops-ada1p1.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_busy_percent-ada1p1.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_latency-ada1p1.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_bw-ada1p2.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_latency-ada1p2.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_ops-ada1p2.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_ops-ada0.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/disk-ada0/disk_time.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/interface-re0/if_packets.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_bw-ada0.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/df-mnt-Stephan-Media/df_complex-free.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_bw-da0p2.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/disk-ada1/disk_io_time.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/ctl-ioctl/disk_time.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_ops_rwd-ada1.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/df-mnt-Stephan-jails-apacheserver/df_complex-used.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/disk-ada2/disk_octets.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/interface-epair1a/if_packets.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/df-mnt-Stephan-jails-sabnzbd_1/df_complex-free.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/disk-ada2/disk_io_time.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/processes/ps_state-idle.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/cpu-0/cpu-nice.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/cpu-1/cpu-user.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/swap/swap-free.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_size-anon_size.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_operation-allocated.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/mutex_operations-miss.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/hash_collisions.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_eviction-ineligible.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_result-demand_metadata-hit.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/aggregation-cpu-min/cpu-idle.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_result-demand_data-miss.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_result-demand_metadata-miss.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/aggregation-cpu-max/cpu-idle.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/aggregation-cpu-average/cpu-interrupt.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_result-prefetch_metadata-miss.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/aggregation-cpu-average/cpu-nice.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_result-mru_ghost-hit.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/aggregation-cpu-min/cpu-system.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_demand-demand_metadata_hits.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/aggregation-cpu-stddev/cpu-user.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_mru-mfu_hits.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_counts-deleted.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_evict-evict_skip.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_evict-evict_l2_eligible.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_hash-hash_elements_max.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_hash-hash_chain_max.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_cp-p.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_l2evict-l2_evict_reading.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_l2_compress-l2_compress_successes.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw-l2_hdr_size.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_l2write-l2_write_io_in_progress.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_arcmeta-arc_meta_max.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_arc-hits.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_mu-mru_hits.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_data-demand_data_hits.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_arc-l2_misses.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_metadata-demand_metadata_hits.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_metadata-prefetch_metadata_misses.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_bw-da0.rrd

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:36 with 0 errors on Fri Jan  5 03:45:36 2018
config:

		NAME		STATE	 READ WRITE CKSUM
		freenas-boot  ONLINE	   0	 0	 0
		  da0p2	 ONLINE	   0	 0	 0

errors: No known data errors
root@freenas:~ #

 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Do you have a backup? If not, it would be a good idea to make one right now. The only corrupted files are the ones used for the graphs in the GUI, so nothing important is corrupted for now.

What's the output of camcontrol devlist, please?
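Once the resilvers have settled and you have a backup, the usual way to deal with permanent errors that only affect throwaway files is roughly the following. This is a sketch: only do it after backing up, and adjust the paths to whatever files zpool status -v actually lists.

Code:
# Remove the damaged RRD graph files (repeat for each file listed by zpool status -v)
rm /var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_busy_percent-da0.rrd

# Reset the error counters, then scrub to re-verify everything that's left
zpool clear Stephan
zpool scrub Stephan
zpool status -v Stephan   # the permanent-error list should clear once the scrub completes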
 

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
Hah - I only now see that the system didn't 'see' one of the drives. I'll check that too (maybe a cable issue...). And I'll start a backup then.

root@freenas:~ # camcontrol devlist
<WDC WD40EFRX-68N32N0 82.00A82> at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus1 target 0 lun 0 (ada1,pass1)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus2 target 0 lun 0 (ada2,pass2)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus3 target 0 lun 0 (ada3,pass3)
<SanDisk SanDisk Ultra PMAP> at scbus8 target 0 lun 0 (pass4,da0)
root@freenas:~ #

 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
How are your drives connected to the system?
 

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
Uh... normally. There are 6 SATA ports on the motherboard. I had 3 drives connected to 3 of them; I then re-added the two degraded ones, but maybe the cable of one of them wasn't properly connected. I'll check that in a bit when I'm back home - maybe that's the one I clicked 'detach' on.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Ok, what's your motherboard model?
 

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
Here's the new output. And the motherboard is an Asus M5A78L LE.

Code:
root@freenas:~ # zpool status
  pool: Stephan
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jan  6 23:44:54 2018
        1.26T scanned at 490M/s, 700G issued at 267M/s, 7.26T total
        233G resilvered, 9.42% done, 0 days 07:10:56 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        Stephan                                           DEGRADED     0     0     2
          raidz1-0                                        DEGRADED     0     0     6
            ada0p2                                        ONLINE       0     0     1  (resilvering)
            replacing-1                                   DEGRADED     0     0     0
              12001118326471259940                        UNAVAIL      0     0     0  was /dev/gptid/d1e4cf38-2c1a-11e7-94fe-3085a943b0e3
              gptid/9e6a269c-ef4d-11e7-9531-3085a943b0e3  ONLINE       0     0     0  (resilvering)
            replacing-2                                   ONLINE       0     0     3
              gptid/d2adb9c6-2c1a-11e7-94fe-3085a943b0e3  ONLINE       0     0     0
              gptid/e3b1c9fa-f011-11e7-bff4-3085a943b0e3  ONLINE       0     0     0  (resilvering)

errors: 65 data errors, use '-v' for a list

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:36 with 0 errors on Fri Jan  5 03:45:36 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors
root@freenas:~ # camcontrol devlist
<WDC WD40EFRX-68N32N0 82.00A82> at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus1 target 0 lun 0 (ada1,pass1)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus2 target 0 lun 0 (ada2,pass2)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus3 target 0 lun 0 (ada3,pass3)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus5 target 0 lun 0 (ada4,pass4)
<SanDisk SanDisk Ultra PMAP> at scbus8 target 0 lun 0 (pass5,da0)
root@freenas:~ #

I guess I should try to find out which gptid number corresponds to which serial number, so that I can at least identify them physically too.
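Something like this should do it (a sketch: glabel maps the gptid labels back to device nodes, and smartctl reads the serial number off each device):

Code:
# Map gptid labels to adaX partitions
glabel status

# Then read the serial number of each disk (repeat for ada1..ada4)
smartctl -i /dev/ada0 | grep -i serial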
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
The motherboard uses part of the AMD SB710 chipset as its SATA controller, and I don't know how well it is supported by FreeNAS; we tend to prefer Intel and LSI controllers because we know there are no problems with them. That could explain why your drives drop out like that.

It could also be the cables, SATA and/or power. Or it could be the power supply; BTW, what PSU do you have?

Finally, it could of course be the drives; what's the output of smartctl -a /dev/adaX for each disk?
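If it helps, here is a quick way to dump them all at once (a sketch; it invokes /bin/sh because the loop syntax isn't valid in csh):

Code:
# Print the full SMART report for every data disk in one go
sh -c 'for d in ada0 ada1 ada2 ada3 ada4; do echo "===== $d ====="; smartctl -a /dev/$d; done'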

I made a script to identify the drives if you want: https://forums.freenas.org/index.ph...d-identification-and-backup-the-config.27365/ or you can also pick commands in my other thread: https://forums.freenas.org/index.php?threads/useful-commands.30314/ ;)
 