Problem detaching 2 replaced drives

Status
Not open for further replies.

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
Since two disks in my volume were degraded, I replaced them through the web GUI. After replacing them, they immediately started resilvering, a process I didn't want to interrupt. When that finished (6 hours later), the new disks were online, but the old disks still appear in the system. Detaching them in the GUI doesn't work: it reports that they were successfully detached, but they are still there, even after a reboot. See the screenshot below. Any ideas on how to fix this?
[Attached screenshot: 01.04.2018-12.28.png]

Thanks!
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
Well, no one else has given you the bad news...
I did this myself a few years back, and it is an easy mistake to make.
You did not follow the correct drive-replacement procedure from the manual; you just added a new drive to the pool.
The only way back is to destroy the pool and restore from a backup.
Since it happened to me, I have kept a full backup of my storage pool just in case.
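For reference, the documented replacement procedure boils down to roughly the following ZFS steps. This is only a sketch with placeholder pool/device names; in FreeNAS the GUI's "Replace" button handles the partitioning and gptid details for you.

Code:
# 1. Take the failing member offline (pool and gptid names here are placeholders)
zpool offline tank gptid/OLD-DISK-GPTID

# 2. Shut down, pull the old disk, and install the new one in its place

# 3. Resilver onto the new disk; the GUI's "Replace" ends up issuing
#    something equivalent to this after partitioning the new drive
zpool replace tank gptid/OLD-DISK-GPTID gptid/NEW-DISK-GPTID

# 4. Wait for the resilver to finish before doing anything else
zpool status tank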
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
@Alecmascot Actually no, that's not what he has done; the zpool status output clearly shows a 3-disk RAID-Z1 with two disks currently being replaced.

@OP AFAIK you can't detach the drives because they are not available. Are they still connected and powered?
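(For what it's worth, once a replace has fully completed, or the missing half of a "replacing" vdev is available again, the stale member can normally be dropped from the shell by its numeric GUID. A sketch only; the pool name and GUID below are placeholders and would have to come from your own zpool status output.)

Code:
# Example only: detach the old, now-redundant half of a replacing vdev by its GUID
# (substitute your pool name and the GUID shown by `zpool status`)
zpool detach mypool 12345678901234567890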
 

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
Thanks for getting back to me. I did start the whole procedure by clicking the 'replace' option (the disks were already offline; the 'detach' option was not available, only 'replace'). And no, the disks are no longer connected, but I obviously still have them, and I have enough SATA and power cables to reconnect them - would that help? Would it matter that they would no longer be connected to the same SATA ports? Please advise - I will hold off on doing anything until I get a reply. Thanks!
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
"3 disks RAID-Z1 with two disks currently being replaced."
That indicates the pool integrity is destroyed.
If the OP got here by strictly following the drive replacement instructions then he should open a bug report.
Anyway the pool is toast :-((
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Since he still has the two original disks, the pool may be recoverable (though with some corruption).

So yeah, reconnect the missing disks; it should start to resilver.
 

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
I put the two 'degraded' disks back in (alongside the three that were already in there), on different SATA ports. FreeNAS did start resilvering them and says it will take 17 hours, which is about twice the time each of them took to resilver when I replaced them. Fingers crossed. BTW, everything keeps running perfectly in the meantime...
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
OK, don't touch anything until it's over, then post the output of zpool status -v so we can see the pool's exact state and decide what to do next.
 

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
Hmmmm. No cigar.
Code:
login as: root
root@192.168.1.10's password:
Last login: Sat Jan  6 02:56:25 2018 from 192.168.1.17
FreeBSD 11.1-STABLE (FreeNAS.amd64) #0 r321665+d4625dcee3e(freenas/11.1-stable): Wed Dec 13 16:33:42 UTC 2017

		FreeNAS (c) 2009-2017, The FreeNAS Development Team
		All rights reserved.
		FreeNAS is released under the modified BSD license.

		For more information, documentation, help or support, go here:
		http://freenas.org
Welcome to Stephan's FreeNAS

Warning: settings changed through the CLI are not written to
the configuration database and will be reset on reboot.

root@freenas:~ # zpool status -v
  pool: Stephan
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
		corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
		entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 656K in 0 days 06:05:48 with 71 errors on Sat Jan  6 13:16:06 2018
config:

        NAME                                              STATE     READ WRITE CKSUM
        Stephan                                           DEGRADED     0     0    74
          raidz1-0                                        DEGRADED     0     0   148
            3404696166831701594                           REMOVED      0     0     0  was /dev/ada0p2
            replacing-1                                   DEGRADED     0     0     0
              12001118326471259940                        UNAVAIL      0     0     0  was /dev/gptid/d1e4cf38-2c1a-11e7-94fe-3085a943b0e3
              gptid/9e6a269c-ef4d-11e7-9531-3085a943b0e3  ONLINE       0     0     0
            replacing-2                                   DEGRADED     0     0     0
              10566938263912300466                        UNAVAIL      0     0     0  was /dev/gptid/d2adb9c6-2c1a-11e7-94fe-3085a943b0e3
              gptid/e3b1c9fa-f011-11e7-bff4-3085a943b0e3  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_busy_percent-da0.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_ops-ada0p1.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_latency-ada0p1.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/cputemp-0/temperature.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_metadata-demand_metadata_misses.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_ops-ada1p1.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_busy_percent-ada1p1.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_latency-ada1p1.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_bw-ada1p2.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_latency-ada1p2.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_ops-ada1p2.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_ops-ada0.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/disk-ada0/disk_time.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/interface-re0/if_packets.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_bw-ada0.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/df-mnt-Stephan-Media/df_complex-free.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_bw-da0p2.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/disk-ada1/disk_io_time.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/ctl-ioctl/disk_time.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_ops_rwd-ada1.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/df-mnt-Stephan-jails-apacheserver/df_complex-used.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/disk-ada2/disk_octets.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/interface-epair1a/if_packets.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/df-mnt-Stephan-jails-sabnzbd_1/df_complex-free.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/disk-ada2/disk_io_time.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/processes/ps_state-idle.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/cpu-0/cpu-nice.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/cpu-1/cpu-user.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/swap/swap-free.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_size-anon_size.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_operation-allocated.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/mutex_operations-miss.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/hash_collisions.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_eviction-ineligible.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_result-demand_metadata-hit.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/aggregation-cpu-min/cpu-idle.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_result-demand_data-miss.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_result-demand_metadata-miss.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/aggregation-cpu-max/cpu-idle.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/aggregation-cpu-average/cpu-interrupt.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_result-prefetch_metadata-miss.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/aggregation-cpu-average/cpu-nice.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc/cache_result-mru_ghost-hit.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/aggregation-cpu-min/cpu-system.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_demand-demand_metadata_hits.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/aggregation-cpu-stddev/cpu-user.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_mru-mfu_hits.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_counts-deleted.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_evict-evict_skip.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_evict-evict_l2_eligible.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_hash-hash_elements_max.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_hash-hash_chain_max.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_cp-p.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_l2evict-l2_evict_reading.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_l2_compress-l2_compress_successes.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw-l2_hdr_size.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_l2write-l2_write_io_in_progress.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/gauge_arcstats_raw_arcmeta-arc_meta_max.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_arc-hits.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_mu-mru_hits.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_data-demand_data_hits.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_arc-l2_misses.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_metadata-demand_metadata_hits.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/zfs_arc_v2/arcstat_ratio_metadata-prefetch_metadata_misses.rrd
		/var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_bw-da0.rrd

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:36 with 0 errors on Fri Jan  5 03:45:36 2018
config:

		NAME		STATE	 READ WRITE CKSUM
		freenas-boot  ONLINE	   0	 0	 0
		  da0p2	 ONLINE	   0	 0	 0

errors: No known data errors
root@freenas:~ #

 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Do you have a backup? If not, it would be a good idea to make one right now. The only corrupted files are the ones used for the graphs in the GUI, so nothing important is corrupted for now.

What's the output of camcontrol devlist, please?
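Once the resilvers have settled and you have a backup, the usual way to deal with permanent errors that only affect throwaway files is roughly the following. This is a sketch: only do it after backing up, and adjust the paths to whatever files zpool status -v actually lists.

Code:
# Remove the damaged RRD graph files (repeat for each file listed by zpool status -v)
rm /var/db/system/rrd-7c35bc62b22f460fb3766e1c156d5c44/localhost/geom_stat/geom_busy_percent-da0.rrd

# Reset the error counters, then scrub to re-verify everything that's left
zpool clear Stephan
zpool scrub Stephan
zpool status -v Stephan   # the permanent-error list should clear once the scrub completes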
 

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
Hah - I only now see that the system didn't 'see' one of the drives. I'll check that too (maybe a cable issue...). And I'll start a backup then.

root@freenas:~ # camcontrol devlist
<WDC WD40EFRX-68N32N0 82.00A82> at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus1 target 0 lun 0 (ada1,pass1)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus2 target 0 lun 0 (ada2,pass2)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus3 target 0 lun 0 (ada3,pass3)
<SanDisk SanDisk Ultra PMAP> at scbus8 target 0 lun 0 (pass4,da0)
root@freenas:~ #

 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
How are your drives connected to the system?
 

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
Uh... normally. There are 6 SATA ports on the motherboard. I had 3 drives connected to 3 of them; I then re-added the two degraded ones, but maybe the cable of one of them wasn't properly connected. I'll check that in a bit when I'm back home - maybe that's the one I clicked 'detach' on.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Ok, what's your motherboard model?
 

sdspieg

Contributor
Joined
Aug 6, 2012
Messages
168
Here's the new output. And the motherboard is an Asus M5A78L LE.

Code:
root@freenas:~ # zpool status
  pool: Stephan
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jan  6 23:44:54 2018
        1.26T scanned at 490M/s, 700G issued at 267M/s, 7.26T total
        233G resilvered, 9.42% done, 0 days 07:10:56 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        Stephan                                           DEGRADED     0     0     2
          raidz1-0                                        DEGRADED     0     0     6
            ada0p2                                        ONLINE       0     0     1  (resilvering)
            replacing-1                                   DEGRADED     0     0     0
              12001118326471259940                        UNAVAIL      0     0     0  was /dev/gptid/d1e4cf38-2c1a-11e7-94fe-3085a943b0e3
              gptid/9e6a269c-ef4d-11e7-9531-3085a943b0e3  ONLINE       0     0     0  (resilvering)
            replacing-2                                   ONLINE       0     0     3
              gptid/d2adb9c6-2c1a-11e7-94fe-3085a943b0e3  ONLINE       0     0     0
              gptid/e3b1c9fa-f011-11e7-bff4-3085a943b0e3  ONLINE       0     0     0  (resilvering)

errors: 65 data errors, use '-v' for a list

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:36 with 0 errors on Fri Jan  5 03:45:36 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors
root@freenas:~ # camcontrol devlist
<WDC WD40EFRX-68N32N0 82.00A82> at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus1 target 0 lun 0 (ada1,pass1)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus2 target 0 lun 0 (ada2,pass2)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus3 target 0 lun 0 (ada3,pass3)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus5 target 0 lun 0 (ada4,pass4)
<SanDisk SanDisk Ultra PMAP> at scbus8 target 0 lun 0 (pass5,da0)
root@freenas:~ #

I guess I should try to find out which gptid number corresponds to which serial number, so that I can at least identify them physically too.
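Something like this should do it (a sketch: glabel maps the gptid labels back to device nodes, and smartctl reads the serial number off each device):

Code:
# Map gptid labels to adaX partitions
glabel status

# Then read the serial number of each disk (repeat for ada1..ada4)
smartctl -i /dev/ada0 | grep -i serial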
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
The motherboard uses part of the AMD SB710 chipset as its SATA controller, and I don't know how well it is supported by FreeNAS; we tend to prefer Intel and LSI controllers because we know there are no problems with them. That could explain why your drives drop out like that.

It could also be the cables, SATA and/or power. Or it could be the power supply; BTW, what PSU do you have?

Finally, it could of course be the drives; what's the output of smartctl -a /dev/adaX for each disk?
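If it helps, here is a quick way to dump them all at once (a sketch; it invokes /bin/sh because the loop syntax isn't valid in csh):

Code:
# Print the full SMART report for every data disk in one go
sh -c 'for d in ada0 ada1 ada2 ada3 ada4; do echo "===== $d ====="; smartctl -a /dev/$d; done'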

I made a script to identify the drives if you want: https://forums.freenas.org/index.ph...d-identification-and-backup-the-config.27365/ or you can also pick commands in my other thread: https://forums.freenas.org/index.php?threads/useful-commands.30314/ ;)
 