Burlumpu Bumpu
Dear community
It's still a learning process for me, as I shall demonstrate now:
I am running a RAIDZ2 on a DELL R510 with a crossflashed H200. Today my RAIDZ2 pool was marked degraded; there were some write errors on one disk.
My plan was (and is) to get a replacement for this disk and resilver the pool (as I have done before). To make sure (now comes the beginner part :) ) there's nothing wrong with the backplane or some cabling, I shut down the server and took a look at the failing hard drive, the backplane and generally my server.
I then restarted and now there's a problem: FreeNAS keeps trying to resilver my pool, and all the disks (including the faulty one) are now online. The resilver process works its way to 1% and then starts again from 0%.
What should I do now in order to replace the failed disk? Can I replace a disk while the pool is resilvering? There are no SMART errors in any of the disks.
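To confirm that the resilver really restarts (rather than the percentage merely looking stuck), one option is to sample `zpool status` twice and compare the "in progress since" timestamp. The following is only a rough sketch: the helper names `parse_resilver`, `resilver_restarted` and `read_status` are made up for illustration, and the regular expressions assume the scan lines look like the output shown in this post.

```python
import re
import subprocess
from datetime import datetime

# Patterns assume the "scan:" block format seen in this thread.
SINCE_RE = re.compile(r"resilver in progress since (.+)")
DONE_RE = re.compile(r"([\d.]+)% done")


def parse_resilver(status_text):
    """Return (start_time, percent_done), or None if no resilver is reported."""
    since = SINCE_RE.search(status_text)
    done = DONE_RE.search(status_text)
    if not since or not done:
        return None
    start = datetime.strptime(since.group(1).strip(), "%a %b %d %H:%M:%S %Y")
    return start, float(done.group(1))


def resilver_restarted(previous, current):
    """True if the resilver start time moved forward between two samples."""
    if previous is None or current is None:
        return False
    return current[0] > previous[0]


def read_status(pool):
    """Run `zpool status <pool>` and return its text (needs the zpool CLI)."""
    result = subprocess.run(["zpool", "status", pool],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On the server itself, calling `parse_resilver(read_status("RAIDZ2"))` a few minutes apart and passing both results to `resilver_restarted` would show whether the start time keeps jumping forward.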
Code:
[root@freenas ~]# zpool status
pool: RAIDZ2
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Wed May 1 00:48:46 2019
78.7G scanned at 167M/s, 50.9G issued at 108M/s, 16.4T total
5.96G resilvered, 0.30% done, 1 days 20:11:54 to go
config:
NAME STATE READ WRITE CKSUM
RAIDZ2 ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
gptid/fbbfea97-78cb-11e8-8f71-842b2b4f1e06 ONLINE 0 0 0
gptid/ff414dca-78cb-11e8-8f71-842b2b4f1e06 ONLINE 0 0 0
gptid/02bb0a27-78cc-11e8-8f71-842b2b4f1e06 ONLINE 0 0 0
gptid/0645a295-78cc-11e8-8f71-842b2b4f1e06 ONLINE 0 0 0
gptid/7f2b5a7c-f102-11e8-84c2-842b2b4f1e06 ONLINE 0 3667
gptid/0d636c3e-78cc-11e8-8f71-842b2b4f1e06 ONLINE 0 0 0
gptid/10eaa592-78cc-11e8-8f71-842b2b4f1e06 ONLINE 0 0 0
gptid/14762089-78cc-11e8-8f71-842b2b4f1e06 ONLINE 0 0 0
cache
gptid/73396a40-b812-11e8-8f7e-842b2b4f1e06 ONLINE 0 0 0
errors: No known data errors
pool: freenas-boot
state: ONLINE
scan: scrub repaired 0 in 0 days 00:59:13 with 0 errors on Wed Apr 24 04:44:13 2019
config:
NAME STATE READ WRITE CKSUM
freenas-boot ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
da9p2 ONLINE 0 0 0
da8p2 ONLINE 0 0 0
errors: No known data errors
da7 is the failing drive:
Code:
[root@freenas ~]# smartctl -a /dev/da7
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: IBM-XIV
Product: ST4000NM0043 C1
Revision: EC5C
Compliance: SPC-4
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Logical block size: 512 bytes
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c50083c28b3f
Serial number: Z1Z8TF6D0000R546K5WD
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Wed May 1 00:57:44 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 38 C
Drive Trip Temperature: 65 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 0
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 23000.42
number of minutes until next internal SMART test = 4
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   3928830803        0         0  3928830803          0     600550.787           0
write:           0        0         0           0          0     224974.613           0
verify:  988837325        0         0   988837325          0       6966.842           0
Non-medium error count: 161
SMART Self-test log
Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
Description number (hours)
# 1 Background short Completed - 23000 - [- - -]
# 2 Background short Completed - 22975 - [- - -]
# 3 Background short Completed - 22951 - [- - -]
# 4 Background short Completed - 22927 - [- - -]
# 5 Background short Completed - 22903 - [- - -]
# 6 Background short Completed - 22879 - [- - -]
# 7 Background short Completed - 22855 - [- - -]
# 8 Background short Completed - 22831 - [- - -]
# 9 Background short Completed - 22807 - [- - -]
#10 Background short Completed - 147 - [- - -]
#11 Background short Completed - 113 - [- - -]
#12 Background long Completed - 56 - [- - -]
#13 Background short Completed - 50 - [- - -]
#14 Background short Aborted (by user command) - 37 - [- - -]
Long (extended) Self Test duration: 32700 seconds [545.0 minutes]
Thanks and kind regards
Alex
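
For anyone wanting to pick the affected member out of a long `zpool status` listing automatically, here is a small sketch that flags rows whose error counters are not all zero. The function name `devices_with_errors` and the set of state keywords are assumptions for illustration; counters are kept as strings because zpool may abbreviate large values.

```python
# Vdev states that can appear in the STATE column (assumed list, not exhaustive).
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def devices_with_errors(status_text):
    """Map device name -> list of counter strings for rows with non-zero counters."""
    flagged = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Device rows look like: NAME STATE READ WRITE CKSUM
        if len(fields) < 3 or fields[1] not in STATES:
            continue
        # Keep at most three counters; ignore trailing notes after them.
        counters = fields[2:5]
        if any(value != "0" for value in counters):
            flagged[fields[0]] = counters
    return flagged
```

On FreeBSD, `glabel status` lists which daX partition each gptid label belongs to, so a flagged gptid can be matched to the physical disk.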