rmccullough
I suspect I have a failed drive, but want to be sure before I "replace" it with a hot backup.
I received an alert the other day:
* Pool Tank state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
- Disk 14681939649493252335 is FAULTED
So I logged into the shell and ran a couple of commands:
Code:
root@freenas[~]# zpool status
  pool: Tank
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 2.63M in 04:50:57 with 0 errors on Sat Jan 1 04:51:00 2022
config:

        NAME                                            STATE     READ WRITE CKSUM
        Tank                                            DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/3aceec98-a7d1-11e8-b311-0cc47a303600  ONLINE       0     0     0
            gptid/850946ad-bd0e-11e8-9a6f-0cc47a303600  ONLINE       0     0     0
            gptid/3fe310b6-a7d1-11e8-b311-0cc47a303600  FAULTED    147     0     0  too many errors
            gptid/40f281b9-a7d1-11e8-b311-0cc47a303600  ONLINE       0     0     0
            gptid/447abee3-a7d1-11e8-b311-0cc47a303600  ONLINE       0     0     0
            gptid/458f64b0-a7d1-11e8-b311-0cc47a303600  ONLINE       0     0     0
            gptid/491ca6b0-a7d1-11e8-b311-0cc47a303600  ONLINE       0     0     0
            gptid/4a41f4ea-a7d1-11e8-b311-0cc47a303600  ONLINE       0     0     0
            gptid/4dcf9801-a7d1-11e8-b311-0cc47a303600  ONLINE       0     0     0

errors: No known data errors
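To double-check which pool member the alert refers to, the config table in the `zpool status` output can be filtered for anything that is not ONLINE. This is only a minimal sketch: it assumes the output has been saved to a string and that member disks are listed by a `gptid/...` path, as they are in this pool. The function name `unhealthy_devices` is made up for illustration and is not part of any ZFS tooling.

```python
import re

# Matches config rows such as:
#   gptid/3fe3...  FAULTED  147  0  0  too many errors
# Assumption: error counters are plain integers (abbreviated counts are not handled).
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<read>\d+)\s+(?P<write>\d+)\s+(?P<cksum>\d+)"
)

def unhealthy_devices(status_text):
    """Return (name, state, read, write, cksum) for member disks that are not ONLINE."""
    found = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        # Assumption: member disks appear as paths (gptid/...); this skips
        # the pool row and the raidz2-0 vdev row.
        if "/" not in m["name"]:
            continue
        if m["state"] != "ONLINE":
            found.append(
                (m["name"], m["state"], int(m["read"]), int(m["write"]), int(m["cksum"]))
            )
    return found

sample = """\
        NAME                                            STATE     READ WRITE CKSUM
        Tank                                            DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/3aceec98-a7d1-11e8-b311-0cc47a303600  ONLINE       0     0     0
            gptid/3fe310b6-a7d1-11e8-b311-0cc47a303600  FAULTED    147     0     0  too many errors
"""
print(unhealthy_devices(sample))
```

On the shortened sample it reports only the `gptid/3fe310b6-...` entry, with its 147 read errors.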
I read some posts here that also suggested getting the output of smartctl:
Code:
root@freenas[~]# smartctl -a /dev/da1
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUS724020ALS641
Revision:             MS04
Compliance:           SPC-4
User Capacity:        2,000,398,934,016 bytes [2.00 TB]
Logical block size:   512 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca06d0c350c
Serial number:        P5G6R3RV
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sat Jan 1 10:30:55 2022 MST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     23 C
Drive Trip Temperature:        55 C

Accumulated power on time, hours:minutes 29488:15
Manufactured in week 06 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  26
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1255
Elements in grown defect list: 571

Vendor (Seagate Cache) information
  Blocks sent to initiator = 22796998390317056

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:      12348      600         0     12948    4737783      52691.712           4
write:         0        0         0         0     547287      28130.329           0
verify:        0        0         0         0     171204          0.000           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status     segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                  number   (hours)
# 1  Background short  Completed        -     29382              - [- - -]
# 2  Background short  Completed        -     29214              - [- - -]
# 3  Background long   Completed        -     29050              - [- - -]
# 4  Background short  Completed        -     28878              - [- - -]
# 5  Background short  Completed        -     28662              - [- - -]
# 6  Background short  Completed        -     28494              - [- - -]
# 7  Background long   Completed        -     28330              - [- - -]
# 8  Background short  Completed        -     28157              - [- - -]
# 9  Background short  Completed        -     27917              - [- - -]
#10  Background short  Completed        -     27749              - [- - -]
#11  Background long   Completed        -     27588              - [- - -]
#12  Background short  Completed        -     27412              - [- - -]
#13  Background short  Completed        -     27197              - [- - -]
#14  Background short  Completed        -     27029              - [- - -]
#15  Background long   Completed        -     26865              - [- - -]
#16  Background short  Completed        -     26693              - [- - -]
#17  Background short  Completed        -     26453              - [- - -]
#18  Background short  Completed        -     26285              - [- - -]
#19  Background long   Completed        -     26122              - [- - -]
#20  Background short  Completed        -     25949              - [- - -]

Long (extended) Self-test duration: 6 seconds [0.1 minutes]
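The fields in this output that are commonly checked on a SAS drive are the overall health status, the grown defect list, and the uncorrected read errors in the error counter log. The following is a minimal sketch for pulling those three values out of saved `smartctl -a` text. The function name `sas_health_summary` and the dictionary keys are hypothetical, and it assumes the SAS-style layout shown above (SATA drives report attributes in a different format).

```python
import re

def sas_health_summary(smart_text):
    """Extract a few failure indicators from `smartctl -a` output for a SAS drive."""
    summary = {}

    m = re.search(r"SMART Health Status:\s*(.+)", smart_text)
    if m:
        summary["health_status"] = m.group(1).strip()

    m = re.search(r"Elements in grown defect list:\s*(\d+)", smart_text)
    if m:
        summary["grown_defects"] = int(m.group(1))

    # The last column of the "read:" row is the total uncorrected error count.
    m = re.search(r"^read:\s+(.*)$", smart_text, re.MULTILINE)
    if m:
        summary["read_uncorrected"] = int(m.group(1).split()[-1])

    return summary

sample = """\
SMART Health Status: OK
Elements in grown defect list: 571
read:      12348      600         0     12948    4737783      52691.712           4
write:         0        0         0         0     547287      28130.329           0
"""
print(sas_health_summary(sample))
```

For this drive the summary shows a health status of OK alongside 571 grown defects and 4 uncorrected read errors.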
I hate to just assume the drive is bad, but it looks like it is. Anything else I should try before I start the replacement process?
Is this the right process to replace the disk: Replacement