Hi everyone
I just bought an HP DL380 G8 with 8x2TB SAS drives.
I was able to install and configure FreeNAS-11.3-U2 and create one RAIDZ2 pool using all 8 drives.
In addition, I added one M.2 NVMe drive for caching and one SSD for logging.
I wanted to test the server's hot-swap capability and a potential drive replacement procedure, so I removed one of the drives, booted another OS, and formatted that drive. I then booted back into FreeNAS and re-added the disk. The system started a replacement, and a resilvering process began.
Code:
pool: FR4G
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Apr 21 09:33:29 2020
7.07T scanned at 3.22G/s, 256G issued at 261M/s, 7.07T total
8.59G resilvered, 3.53% done, 0 days 07:36:58 to go
config:
NAME                                              STATE     READ WRITE CKSUM
FR4G                                              DEGRADED     0     0     0
  raidz2-0                                        DEGRADED     0     0     0
    gptid/1f782fce-8198-11ea-b536-2c44fd830388    ONLINE       0     0     0
    gptid/20024e2c-8198-11ea-b536-2c44fd830388    ONLINE       0     0     0
    gptid/20c6721c-8198-11ea-b536-2c44fd830388    ONLINE       0     0     0
    replacing-3                                   DEGRADED     0     0     0
      5134065479397326922                         UNAVAIL      0     0     0  was /dev/gptid/20e82911-8198-11ea-b536-2c44fd830388
      gptid/bebdfb0e-8266-11ea-899c-2c44fd830388  ONLINE       0     0     0
    gptid/210337fe-8198-11ea-b536-2c44fd830388    ONLINE       0     0    12
    gptid/212dd557-8198-11ea-b536-2c44fd830388    ONLINE       0     0     0
    gptid/21487bc4-8198-11ea-b536-2c44fd830388    ONLINE       0     0     0
    gptid/2115e086-8198-11ea-b536-2c44fd830388    ONLINE       0     0    13
logs
  gptid/685d7224-8345-11ea-9ec5-2c44fd830388      ONLINE       0     0     0
cache
  gptid/68c9e10b-8345-11ea-9ec5-2c44fd830388      ONLINE       0     0     0
errors: No known data errors
pool: freenas-boot
state: ONLINE
scan: none requested
config:
NAME          STATE     READ WRITE CKSUM
freenas-boot  ONLINE       0     0     0
  da8p2       ONLINE       0     0     0
errors: No known data errors
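To pick out which pool members are reporting errors without reading the whole table by eye, here is a minimal Python sketch that scans `zpool status` text for device rows with a non-zero READ, WRITE, or CKSUM counter. The function name `devices_with_errors` is my own, and the sketch assumes plain integer counters (very large counts can be printed in abbreviated form, which it would skip).

```python
import re

# One device row of `zpool status`: name, state, READ, WRITE, CKSUM.
# Assumption: counters are plain integers; abbreviated values are not matched.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|UNAVAIL|FAULTED|OFFLINE|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
)


def devices_with_errors(status_text):
    """Return {device: (read, write, cksum)} for rows with any non-zero counter."""
    found = {}
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m:
            counts = tuple(int(x) for x in m.group(3, 4, 5))
            if any(counts):
                found[m.group(1)] = counts
    return found
```

Fed the output above, it should report only the two gptid entries with checksum counts of 12 and 13.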
I know that this should take a few hours, but I noticed that it was taking too long. Looking at the server log, I saw that the resilvering process kept restarting after it reached ~4%.
Looking at the server output, I see the following errors:
Code:
Apr 21 09:36:27 fr4g (da5:ciss0:32:5:0): Command Specific Info: 0x11181200
Apr 21 09:36:27 fr4g (da5:ciss0:32:5:0): Actual Retry Count: 4
Apr 21 09:36:28 fr4g (da5:ciss0:32:5:0): READ(10). CDB: 28 00 02 5c a0 60 00 01 00 00
Apr 21 09:36:28 fr4g (da5:ciss0:32:5:0): CAM status: SCSI Status Error
Apr 21 09:36:28 fr4g (da5:ciss0:32:5:0): SCSI status: Check Condition
Apr 21 09:36:28 fr4g (da5:ciss0:32:5:0): SCSI sense: RECOVERED ERROR asc:18,5 (Recovered data - recommend reassignment)
Apr 21 09:36:28 fr4g (da5:ciss0:32:5:0): Info: 0x25ca090
Apr 21 09:36:28 fr4g (da5:ciss0:32:5:0): Field Replaceable Unit: 1
Apr 21 09:36:28 fr4g (da5:ciss0:32:5:0): Command Specific Info: 0x11040400
Apr 21 09:36:28 fr4g (da5:ciss0:32:5:0): Actual Retry Count: 7
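Since the log is long, a small tally by device helps show whether the errors all come from one disk. This is only a sketch in Python: it assumes the message layout shown in the excerpt above (device tag in parentheses, then `SCSI sense: <KEY> asc:<asc>,<ascq>`), and `tally_sense` is a name I made up.

```python
import re
from collections import Counter

# Matches e.g. "(da5:ciss0:32:5:0): SCSI sense: RECOVERED ERROR asc:18,5"
# Assumption: the sense key is upper-case words and asc/ascq are hex digits.
SENSE = re.compile(
    r"\((da\d+):[^)]*\): SCSI sense: ([A-Z ]+?) asc:([0-9a-f]+),([0-9a-f]+)"
)


def tally_sense(log_text):
    """Count SCSI sense reports per (device, sense key, 'asc,ascq')."""
    return Counter(
        (dev, key, f"{asc},{ascq}")
        for dev, key, asc, ascq in SENSE.findall(log_text)
    )
```

Running it over the full log file would give one counter entry per device and sense code, which makes it easy to see whether any disk other than da5 is involved.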
The interesting part here is that da5 is not even the drive I removed (it was da7).
Here is the output of the smart test for da5:
Code:
smartctl -a /dev/da5
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST2000NM0001
Revision: 0001
Compliance: SPC-4
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Logical block size: 512 bytes
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c5004129ea67
Serial number: Z1P1KMH500009232N4PY
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Tue Apr 21 09:51:09 2020 EDT
SMART support is: Unavailable - device lacks SMART capability.
=== START OF READ SMART DATA SECTION ===
Current Drive Temperature: 32 C
Drive Trip Temperature: 68 C
Manufactured in week 09 of year 2012
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 49
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 49
Elements in grown defect list: 3539
Vendor (Seagate Cache) information
Blocks sent to initiator = 873378287
Blocks received from initiator = 638587946
Blocks read from cache and sent to initiator = 37263257
Number of read and write commands whose size <= segment size = 633436
Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 64723.22
number of minutes until next internal SMART test = 41
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   1542301948      605         0  1542302553        711        447.178         106
write:           0        0         0           0          0        328.511           0
verify:     554223      145         0      554368        190          0.000          45
Non-medium error count: 1
SMART Self-test log
Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
Description number (hours)
# 1 Background short Completed - 64713 - [- - -]
# 2 Background short Completed - 64701 - [- - -]
Here is the same output for da7:
Code:
smartctl -a /dev/da7
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST2000NM0001
Revision: 0002
Compliance: SPC-4
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Logical block size: 512 bytes
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c500573e7883
Serial number: Z1P66CMH0000940661R9
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Tue Apr 21 09:53:32 2020 EDT
SMART support is: Unavailable - device lacks SMART capability.
=== START OF READ SMART DATA SECTION ===
Current Drive Temperature: 30 C
Drive Trip Temperature: 68 C
Manufactured in week 35 of year 2013
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 54
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 54
Elements in grown defect list: 0
Vendor (Seagate Cache) information
Blocks sent to initiator = 462698561
Blocks received from initiator = 1051950447
Blocks read from cache and sent to initiator = 3971534
Number of read and write commands whose size <= segment size = 1939050
Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 50599.40
number of minutes until next internal SMART test = 33
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:    860180076        0         0   860180076          0        236.902           0
write:           0        0         0           0          0        538.893           0
Non-medium error count: 2
SMART Self-test log
Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
Description number (hours)
# 1 Background short Completed - 50589 - [- - -]
# 2 Background short Completed - 50577 - [- - -]
# 3 Background short Aborted (by user command) - 50577 - [- - -]
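To compare the two drives on the numbers that seem most relevant, here is a minimal Python sketch that pulls the grown defect count and the "total uncorrected errors" column out of `smartctl -a` text for a SAS disk. It assumes the layout shown above (the uncorrected count is the last field of the `read:`, `write:`, and `verify:` rows); `smart_summary` is a hypothetical helper name.

```python
import re


def smart_summary(text):
    """Extract grown-defect count and uncorrected errors from SAS smartctl output."""
    summary = {}
    m = re.search(r"Elements in grown defect list:\s*(\d+)", text)
    if m:
        summary["grown_defects"] = int(m.group(1))
    # Rows of the error counter log; the last column is "total uncorrected errors".
    for op, fields in re.findall(
        r"^\s*(read|write|verify):\s+(.+)$", text, re.MULTILINE
    ):
        summary[f"{op}_uncorrected"] = int(fields.split()[-1])
    return summary
```

On the da5 output this gives 3539 grown defects with 106 uncorrected read errors and 45 uncorrected verify errors, while da7 shows 0 grown defects and 0 uncorrected read errors.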
I tried swapping the bays to make sure it is not a cable problem, but I wonder if there is a disk problem here that I need to consider.
Here are more details on the system:
- HP Proliant 665553-B21 DL380p Gen8
- 2 x Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
- 128GB RAM
- 8 x Seagate ST2000NM0001 2TB SAS 3.5-inch drives
- 1 x Samsung SSD 850 EVO 500GB for log drive
- 1 x PC401 NVMe SK hynix 512GB for cache
Itamar