pnunn (Dabbler, joined Jan 31, 2015, 39 messages)
Hi guys, I have one disk in my array that keeps failing.
I changed the HDD and that seemed to fix the issue; however, I then started getting SCSI errors again and the disk dropped out of the array.
I re-seated the cables on the controller and rebooted, but the SCSI errors came so fast that the machine couldn't finish booting. I pulled the disk and the machine then booted normally.
The host is a Dell R520 whose HBA has been replaced with one reflashed to non-RAID mode.
Running smartctl -x /dev/da7 gives:
Warning: settings changed through the CLI are not written to
the configuration database and will be reset on reboot.
root@freenas2[~]# smartctl -x /dev/da7
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: HP
Product: EG0600JETKA
Revision: HPD2
Compliance: SPC-4
User Capacity: 600,127,266,816 bytes [600 GB]
Logical block size: 512 bytes
Rotation Rate: 10000 rpm
Form Factor: 2.5 inches
Logical Unit id: 0x50000397281b32e9
Serial number: 76C0A1D7FUYB1628
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Sat Mar 12 19:24:01 2022 AEDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
Read Cache is: Enabled
Writeback Cache is: Disabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 27 C
Drive Trip Temperature: 60 C
Manufactured in week 28 of year 2016
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 89
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 89
Elements in grown defect list: 0
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0     1607         0         0          0    1207745.270           0
write:         0     1227         0         0          0      38174.543           0

Non-medium error count: 12409
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   39613                 - [-   -    -]
# 2  Background short  Completed                   -   39573                 - [-   -    -]
Long (extended) Self-test duration: 5089 seconds [84.8 minutes]
Background scan results log
Status: scan is active
Accumulated power on time, hours:minutes 39708:51 [2382531 minutes]
Number of background scans performed: 1539, scan progress: 9.91%
Number of background medium scans performed: 0
Protocol Specific port log page for SAS SSP
relative target port id = 1
generation code = 2
number of phys = 1
phy identifier = 0
attached device type: SAS or SATA device
attached reason: loss of dword synchronization
reason: loss of dword synchronization
negotiated logical link rate: phy enabled; 6 Gbps
attached initiator port: ssp=1 stp=1 smp=1
attached target port: ssp=0 stp=0 smp=0
SAS address = 0x50000397281b32ea
attached SAS address = 0x5c81f660e19e1c06
attached phy identifier = 2
Invalid DWORD count = 8
Running disparity error count = 8
Loss of DWORD synchronization = 2
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 8
Running disparity error count: 8
Loss of dword synchronization count: 2
Phy reset problem count: 0
Elasticity buffer overflow count: 0
Received abandon-class OPEN_REJECT count: 0
Transmitted BREAK count: 0
Received BREAK count: 0
Transmitted SSP frame error count: 195
Received SSP frame error count: 0
relative target port id = 2
generation code = 2
number of phys = 1
phy identifier = 1
attached device type: no device attached
attached reason: unknown
reason: unknown
negotiated logical link rate: phy enabled; unknown
attached initiator port: ssp=0 stp=0 smp=0
attached target port: ssp=0 stp=0 smp=0
SAS address = 0x50000397281b32eb
attached SAS address = 0x0
attached phy identifier = 0
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 0
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 0
Running disparity error count: 0
Loss of dword synchronization count: 0
Phy reset problem count: 0
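As a rough triage aid, here is a minimal Python sketch that pulls the per-port PHY counters and the non-medium error count out of saved smartctl -x text. It assumes the output keeps the "name = value" / "name: value" layout shown above; the helper names (parse_phy_counters, nonzero_link_errors, non_medium_errors) and the chosen list of counters are hypothetical, not part of smartmontools. Invalid-dword, running-disparity and dword-sync counters are link-layer errors, so non-zero values tend to implicate the signal path (cable, backplane, connector or HBA port) more than the media, but that is a rule of thumb rather than a diagnosis.

```python
import re
from typing import Optional

# Hypothetical helper regex: matches "name = 123" or "name: 123" lines from smartctl -x.
COUNTER_RE = re.compile(r"^\s*([A-Za-z][A-Za-z \-]*?)\s*[=:]\s*(\d+)\s*$")
# Each SAS port section of the port log page starts with this line.
PORT_RE = re.compile(r"^\s*relative target port id = (\d+)\s*$")

# Assumed selection of PHY/link-level counters worth flagging when non-zero.
LINK_ERROR_COUNTERS = (
    "Invalid DWORD count",
    "Running disparity error count",
    "Loss of DWORD synchronization",
    "Phy reset problem",
    "Transmitted SSP frame error count",
    "Received SSP frame error count",
)


def parse_phy_counters(text: str) -> dict[int, dict[str, int]]:
    """Group integer counters in smartctl -x text by SAS relative target port."""
    ports: dict[int, dict[str, int]] = {}
    current = None
    for line in text.splitlines():
        port = PORT_RE.match(line)
        if port:
            current = int(port.group(1))
            ports[current] = {}
            continue
        if current is None:
            continue  # ignore everything before the first port section
        counter = COUNTER_RE.match(line)
        if counter:
            ports[current][counter.group(1).strip()] = int(counter.group(2))
    return ports


def nonzero_link_errors(ports: dict[int, dict[str, int]]) -> dict[int, dict[str, int]]:
    """Keep only ports that report a non-zero value for a link-level counter."""
    result: dict[int, dict[str, int]] = {}
    for port, counters in ports.items():
        bad = {name: counters[name] for name in LINK_ERROR_COUNTERS
               if counters.get(name, 0) > 0}
        if bad:
            result[port] = bad
    return result


def non_medium_errors(text: str) -> Optional[int]:
    """Return the 'Non-medium error count' value, or None if it is absent."""
    match = re.search(r"^Non-medium error count:\s*(\d+)", text, re.MULTILINE)
    return int(match.group(1)) if match else None


# Trimmed excerpt of the output above, used as sample input.
SAMPLE = """\
Non-medium error count: 12409
relative target port id = 1
  generation code = 2
  attached device type: SAS or SATA device
  SAS address = 0x50000397281b32ea
  Invalid DWORD count = 8
  Running disparity error count = 8
  Loss of DWORD synchronization = 2
  Phy reset problem = 0
  Transmitted SSP frame error count: 195
  Received SSP frame error count: 0
relative target port id = 2
  attached device type: no device attached
  Invalid DWORD count = 0
  Running disparity error count = 0
  Loss of DWORD synchronization = 0
  Phy reset problem = 0
"""

if __name__ == "__main__":
    print("Non-medium errors:", non_medium_errors(SAMPLE))
    for port, bad in nonzero_link_errors(parse_phy_counters(SAMPLE)).items():
        print(f"target port {port}: {bad}")
```

Fed the full output instead of the excerpt, it would flag only target port 1 (the one with an attached initiator), with the 8 invalid dwords, 8 disparity errors, 2 dword-sync losses and 195 transmitted SSP frame errors listed above; port 2 has nothing attached and stays at zero.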
Should I change the HBA? Is it the disk? Or should I give up and replace the R520?
Peter.