SOLVED Replace HD FreeNAS 9.10.2

forbyd · May 13, 2019

Hello again. I tried to replace a hard disk, but I received the message below telling me to force the operation.
Is there any reason not to do that, or is there another way to replace the disk?

smartctl -a /dev/da8

[root@nasbkp01] ~# smartctl -a /dev/da8
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Desktop HDD.15
Device Model: ST4000DM000-1F2168
Serial Number: S301KSHD
LU WWN Device Id: 5 000c50 081042946
Firmware Version: CC54
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Mon May 13 10:42:12 2019 BRT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 97) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 498) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   109   098   006    Pre-fail  Always       -       176390952
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       9
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       928
  7 Seek_Error_Rate         0x000f   083   060   030    Pre-fail  Always       -       9003914440
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       25933
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       9
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   040   040   000    Old_age   Always       -       60
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       1 1 1
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   072   063   045    Old_age   Always       -       28 (Min/Max 22/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       5507
194 Temperature_Celsius     0x0022   028   040   000    Old_age   Always       -       28 (0 22 0 0 0)
197 Current_Pending_Sector  0x0012   001   001   000    Old_age   Always       -       51032
198 Offline_Uncorrectable   0x0010   001   001   000    Old_age   Offline      -       51032
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       25551h+08m+22.858s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       63216693726
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       11427498986333

SMART Error Log Version: 1
ATA Error Count: 60 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 60 occurred at disk power-on lifetime: 24186 hours (1007 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 10 ff ff ff 4f 00 48d+17:59:20.932 READ FPDMA QUEUED
60 00 10 ff ff ff 4f 00 48d+17:59:20.931 READ FPDMA QUEUED
60 00 10 90 02 40 40 00 48d+17:59:20.931 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 48d+17:59:20.931 READ FPDMA QUEUED
2f 00 01 10 00 00 00 00 48d+17:59:20.909 READ LOG EXT

Error 59 occurred at disk power-on lifetime: 24186 hours (1007 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 20 ff ff ff 4f 00 48d+17:59:16.845 READ FPDMA QUEUED
60 00 10 ff ff ff 4f 00 48d+17:59:16.845 READ FPDMA QUEUED
60 00 10 ff ff ff 4f 00 48d+17:59:16.845 READ FPDMA QUEUED
60 00 10 90 02 40 40 00 48d+17:59:16.845 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 48d+17:59:16.845 READ FPDMA QUEUED

Error 58 occurred at disk power-on lifetime: 24186 hours (1007 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 20 ff ff ff 4f 00 48d+17:59:11.846 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 48d+17:59:11.846 READ FPDMA QUEUED
60 00 10 90 02 40 40 00 48d+17:59:11.846 READ FPDMA QUEUED
60 00 10 ff ff ff 4f 00 48d+17:59:11.846 READ FPDMA QUEUED
60 00 10 ff ff ff 4f 00 48d+17:59:11.845 READ FPDMA QUEUED

Error 57 occurred at disk power-on lifetime: 24186 hours (1007 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: WP at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 08 ff ff ff 4f 00 48d+17:59:05.224 WRITE FPDMA QUEUED
61 00 08 ff ff ff 4f 00 48d+17:59:05.224 WRITE FPDMA QUEUED
61 00 08 ff ff ff 4f 00 48d+17:59:05.224 WRITE FPDMA QUEUED
61 00 08 ff ff ff 4f 00 48d+17:59:05.224 WRITE FPDMA QUEUED
61 00 28 ff ff ff 4f 00 48d+17:59:05.223 WRITE FPDMA QUEUED

Error 56 occurred at disk power-on lifetime: 24186 hours (1007 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: WP at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 08 ff ff ff 4f 00 48d+17:59:04.403 WRITE FPDMA QUEUED
60 00 20 ff ff ff 4f 00 48d+17:59:00.499 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 48d+17:59:00.498 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 48d+17:59:00.494 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 48d+17:59:00.491 READ FPDMA QUEUED

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
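
For readers skimming the long smartctl output above: attributes 5, 187, 197 and 198 are the ones most often treated as failure indicators, and all four have non-zero raw values on this disk. Below is a minimal sketch that pulls them out of the attribute table. The function name smart_red_flags is made up for illustration, and picking these four IDs is a common rule of thumb, not something stated in this thread.

```shell
# smart_red_flags: read `smartctl -A` (or -a) output on stdin and print
# NAME=RAW for attributes 5, 187, 197 and 198 whose raw value is above zero.
# The function name and the choice of IDs are illustrative assumptions.
smart_red_flags() {
  awk '($1 ~ /^(5|187|197|198)$/) && ($3 ~ /^0x/) && ($10 + 0 > 0) {
         print $2 "=" $10
       }'
}
```

Typical use would be smartctl -A /dev/da8 | smart_red_flags. Fed the attribute table above, it prints Reallocated_Sector_Ct=928, Reported_Uncorrect=60, Current_Pending_Sector=51032 and Offline_Uncorrectable=51032, one per line.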

zpool status

[root@nasbkp01] ~# zpool status
pool: DATA
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: scrub repaired 0 in 41h23m with 0 errors on Mon Apr 8 17:23:34 2019
config:

NAME                                            STATE     READ WRITE CKSUM
DATA                                            DEGRADED     0     0     0
  raidz3-0                                      DEGRADED     0     0     0
    gptid/f92e0fff-56c7-11e7-8ab3-d89d67230870  ONLINE       0     0     0
    gptid/f9ea1d11-56c7-11e7-8ab3-d89d67230870  ONLINE       0     0     0
    gptid/fab4857c-56c7-11e7-8ab3-d89d67230870  ONLINE       0     0     0
    gptid/fb754095-56c7-11e7-8ab3-d89d67230870  ONLINE       0     0     0
    gptid/fc4b77ce-56c7-11e7-8ab3-d89d67230870  ONLINE       0     0     0
    gptid/fd049811-56c7-11e7-8ab3-d89d67230870  ONLINE       0     0     0
    gptid/fdc797da-56c7-11e7-8ab3-d89d67230870  ONLINE       0     0     0
    gptid/fe87b2af-56c7-11e7-8ab3-d89d67230870  FAULTED     50   416 17.6K  too many errors
    gptid/ff52cf0e-56c7-11e7-8ab3-d89d67230870  ONLINE       0     0     0
    gptid/006b8c5f-56c8-11e7-8ab3-d89d67230870  ONLINE       0     0     0
    gptid/015a138f-56c8-11e7-8ab3-d89d67230870  ONLINE       0     0     0
spares
  gptid/024f1cc8-56c8-11e7-8ab3-d89d67230870    AVAIL

errors: No known data errors

pool: freenas-boot
state: ONLINE
scan: scrub repaired 0 in 0h0m with 0 errors on Sun May 12 03:45:43 2019
config:

NAME STATE READ WRITE CKSUM
freenas-boot ONLINE 0 0 0
da0p2 ONLINE 0 0 0

errors: No known data errors
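
The two members that matter in the zpool status output above are the FAULTED one and the AVAIL spare. Below is a small sketch for picking them out of a long member list. The function name pool_members_not_online is hypothetical, and the filter assumes members are listed by gptid/ label, as FreeNAS does here.

```shell
# pool_members_not_online: read `zpool status` output on stdin and print
# "STATE NAME" for every gptid/ member that is not ONLINE.
# Assumes members are listed by gptid/ label, as in the output above.
pool_members_not_online() {
  awk '($1 ~ /^gptid\//) && ($2 != "ONLINE") { print $2, $1 }'
}
```

Typical use would be zpool status DATA | pool_members_not_online. For the DATA pool above, it prints the FAULTED member gptid/fe87b2af-56c7-11e7-8ab3-d89d67230870 and the AVAIL spare gptid/024f1cc8-56c8-11e7-8ab3-d89d67230870. On FreeBSD, glabel status shows which daN partition carries each gptid label.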

forbyd · May 13, 2019

I tried forcing it and got this message:

dlavigne · May 15, 2019

Are you reusing a disk from another (or the same) pool?

forbyd · May 15, 2019

I tried to replace da8 with da11 (the spare), which is in the same pool, I guess (you can see it in the first screenshot). I'm not a FreeNAS expert yet :/
Thank you for the fast reply.

dlavigne · May 20, 2019

Were you able to resolve this?

forbyd · May 21, 2019

not yet

zperetz · May 21, 2019

Looks like you are trying to replace the "bad" disk with another "new" one, which was already used before in another (or the same) pool.
If so, you need to remove the label of this old pool from your "new" disk (wipe with dd or zpool labelclear) before choosing it as a replacement.

forbyd · May 22, 2019

zperetz said:
Looks like you are trying to replace the "bad" disk with another "new" one, which was already used before in another (or the same) pool.
If so, you need to remove the label of this old pool from your "new" disk (wipe with dd or zpool labelclear) before choosing it as a replacement.

Yes, I am trying to replace da8 (faulted) with da11 (the spare disk).

Do I need to run zpool labelclear on the spare disk?
Is the command zpool labelclear /dev/da11?

Sorry about my limited knowledge. The person who ran FreeNAS left the company.

zperetz · May 22, 2019

forbyd said:
Do I need to run zpool labelclear on the spare disk?
Is the command zpool labelclear /dev/da11?
Sorry about my limited knowledge. The person who ran FreeNAS left the company.

According to the error you are getting, you need to clear all partitions from the disk and then run labelclear.
gpart show da11 should not list any unexpected partitions. If it does, you can run gpart destroy -F da11 to clear the disk, then check again.

If you still can't do the replacement and the label error persists, you need to run zpool labelclear /dev/da11.
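
To make the "should not list any unexpected partitions" check above concrete, here is a small sketch that counts the partitions gpart show reports. The function name count_partitions is hypothetical. The parsing assumes the usual gpart show layout: a header row starting with "=>", then rows whose third column is a partition index, or "-" for free space.

```shell
# count_partitions: read `gpart show DISK` output on stdin and print how many
# partitions it lists. Assumes the usual layout: a header row starting with
# "=>", then rows whose third column is a partition index ("-" marks free space).
count_partitions() {
  awk '($1 != "=>") && ($3 ~ /^[0-9]+$/) { n++ } END { print n + 0 }'
}
```

Typical use would be gpart show da11 | count_partitions. A result of 0 means there is nothing left for gpart destroy -F to clear. The spare in this pool is referenced by a gptid/ label, so it already carries partitions and a non-zero count is expected for it.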

forbyd · May 22, 2019

The gpart show output for da8 and da11 looks the same.

Can I try destroying da11?

zperetz · May 22, 2019

forbyd said:
The gpart show output for da8 and da11 looks the same.
Can I try destroying da11?

I don't think that's needed. Just clear the label.

forbyd · May 22, 2019

It returned the message "failed to open /dev/da11: Operation not permitted".

zperetz · May 22, 2019

It looks to me like you will have to work with dd here. Look for something like this solution.

forbyd · May 22, 2019

zperetz said:
It looks to me like you will have to work with dd here. Look for something like this solution.

I don't know whether I missed something in the dd command:

zperetz · May 22, 2019

Did you set the disk to OFFLINE first (via the GUI)?

forbyd · May 27, 2019

I can't; the option isn't shown:

da11: (shows "Edit Disk", "Replace", "Remove")

da8: (shows "Edit Disk", "Replace")

All the other disks show the "Offline" option.

ethereal · May 27, 2019

zperetz said:
Did you set the disk to OFFLINE first (via the GUI)?

They told you to offline the disk so that you can wipe it. You need to offline da11, because it has to be wiped before it can replace the faulted disk, but you are trying to offline da8, which is the wrong disk.

Do you have another computer? In the past I have used another computer to wipe disks. Then you'll be able to replace da8.
Since da11 is the spare, you should be able to just shut down FreeNAS and pull da11. It is a spare, after all, and is not in use.

zperetz · May 27, 2019

forbyd said:
I can't; the option isn't shown:
da11: (shows "Edit Disk", "Replace", "Remove")

As it's a spare, you should be able to remove it from the volume without any damage to the pool. I think you can try using the "Remove" button on da11, and then you should be able to "Replace" da8 with da11 even without wiping it. I'm not sure, but you can try.

ethereal · May 27, 2019

zperetz said:
As it's a spare, you should be able to remove it from the volume without any damage to the pool. I think you can try using the "Remove" button on da11, and then you should be able to "Replace" da8 with da11 even without wiping it. I'm not sure, but you can try.

It looked to me like he was trying to offline da8 instead of da11 (we know he needs to offline da11 so that he can wipe it).

forbyd · May 27, 2019

zperetz said:
As it's a spare, you should be able to remove it from the volume without any damage to the pool. I think you can try using the "Remove" button on da11, and then you should be able to "Replace" da8 with da11 even without wiping it. I'm not sure, but you can try.

Yes!!!
I removed da11 and tried replacing da8 with da11 again, and it showed this message:

Thank you!
If everything works, I will come back here to confirm.
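
For anyone who lands here with the same problem: the fix that worked was done in the GUI, by using "Remove" on the spare da11 and then "Replace" on da8 with da11 as the target. Below is a rough command-line sketch of the same two steps. Treat it as an assumption about what the GUI does underneath, not something this thread confirms; in particular, whether ZFS accepts the former spare without further wiping was not tested here from the shell. It only prints the commands unless DRY_RUN=0 is set. The names run and replace_with_spare are hypothetical helpers.

```shell
#!/bin/sh
# Dry-run by default: each command is printed, and only executed when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# replace_with_spare POOL FAULTED SPARE
#   1. remove the hot spare from the pool so it is no longer reserved
#   2. replace the faulted member with it, which starts a resilver
#   3. show pool status to follow the resilver
replace_with_spare() {
  pool=$1 faulted=$2 spare=$3
  run zpool remove "$pool" "$spare"
  run zpool replace "$pool" "$faulted" "$spare"
  run zpool status "$pool"
}
```

With the labels from the zpool status output in the first post, the call would be replace_with_spare DATA gptid/fe87b2af-56c7-11e7-8ab3-d89d67230870 gptid/024f1cc8-56c8-11e7-8ab3-d89d67230870. Printing first makes it easy to check that the faulted member and the spare are not swapped. On FreeNAS the GUI is still the safer route, since it is what actually worked in this thread.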
