Checksum Errors on 2 Seagate Ironwolf 4TB Drives

xknight2k10

Dabbler
Joined
Jul 19, 2023
Messages
10
I have six 4TB HDDs in my system. The newest two are both 4TB Seagate IronWolf drives, and both are showing identical checksum errors, which have left my pool degraded and my apps no longer starting.

I have run several SMART tests, but they never show any errors. Since my system is live, the checksum errors on both drives keep climbing at the same rate.

I have moved the drives from a dedicated SATA card back to the motherboard SATA ports. I have also replaced the SATA cables and even swapped the PSU, but the errors are still there.

My machine is as follows:
CPU - Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz
RAM - 8GB non-ECC
Motherboard - ASUS P7P55D-E
Hard drives with errors - ST4000VN006-3CW104
 

xknight2k10

Screenshot 2023-07-18 081505.png
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
Please include details of the SATA controllers, HBAs, and connection topology. Also post smartctl -a output for each drive, including a good one. Output from lspci would help too, along with zpool status and zpool list -vL.
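
A minimal sketch of gathering that information in one go, assuming Python 3.8+ is available on the box and the commands are run as root. The device names are the six pool members that appear in the zpool list -vL output further down this thread; the helper names (`diagnostic_commands`, `as_shell_lines`, `collect`) are made up for illustration. `-v` is added to `zpool status` so that the affected files are listed as well.

```python
import shlex
import subprocess

# Read-only pool and bus diagnostics requested in this thread.
POOL_COMMANDS = [
    ["lspci"],
    ["zpool", "status", "-v"],
    ["zpool", "list", "-vL"],
]


def diagnostic_commands(devices):
    """Return argv lists: one `smartctl -a` per device, then the pool commands."""
    smart = [["smartctl", "-a", f"/dev/{name}"] for name in devices]
    return smart + POOL_COMMANDS


def as_shell_lines(commands):
    """Render argv lists as copy-pasteable shell lines."""
    return [shlex.join(argv) for argv in commands]


def collect(commands):
    """Run each command and return {shell line: captured stdout}.

    Not called below; smartctl and zpool normally need root.
    """
    results = {}
    for argv in commands:
        done = subprocess.run(argv, capture_output=True, text=True)
        results[shlex.join(argv)] = done.stdout
    return results


if __name__ == "__main__":
    # Hypothetical device list taken from the pool layout posted in this thread.
    pool_disks = ["sda", "sdc", "sdd", "sde", "sdf", "sdg"]
    for line in as_shell_lines(diagnostic_commands(pool_disks)):
        print(line)
```

Printing the command lines first makes it easy to check what will be run before calling `collect` on the same list.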
 

xknight2k10

For SDA

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.107+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: ST4000VN006-3CW104
Serial Number: ###
LU WWN Device Id: 5 000c50 0e6615ec8
Firmware Version: SC60
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Jul 20 08:21:33 2023 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 464) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x70bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 081 064 006 Pre-fail Always - 119559600
3 Spin_Up_Time 0x0003 096 095 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 85
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 074 060 045 Pre-fail Always - 24967647
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1155
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 85
183 Runtime_Bad_Block 0x0032 069 069 000 Old_age Always - 31
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 4295032834
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 078 064 040 Old_age Always - 22 (Min/Max 21/25)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 111
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 133
194 Temperature_Celsius 0x0022 022 040 000 Old_age Always - 22 (0 18 0 0 0)
195 Hardware_ECC_Recovered 0x001a 081 064 000 Old_age Always - 119559600
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 1152 (174 123 0)
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 8061874732
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 29502102352

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 1155 -
# 2 Conveyance offline Completed without error 00% 1144 -
# 3 Short offline Completed without error 00% 1144 -
# 4 Extended offline Interrupted (host reset) 00% 1144 -
# 5 Extended offline Completed without error 00% 1142 -
# 6 Extended offline Interrupted (host reset) 90% 1118 -
# 7 Extended offline Completed without error 00% 1108 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

xknight2k10

For SDC

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.107+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Desktop HDD.15
Device Model: ST4000DM000-1F2168
Serial Number: ###
LU WWN Device Id: 5 000c50 069a9e4c7
Firmware Version: CC52
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Jul 20 08:23:43 2023 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 612) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 513) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 118 099 006 Pre-fail Always - 176289464
3 Spin_Up_Time 0x0003 092 091 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 094 094 020 Old_age Always - 6182
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 069 060 030 Pre-fail Always - 90389597394
9 Power_On_Hours 0x0032 048 048 000 Old_age Always - 45805
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 943
183 Runtime_Bad_Block 0x0032 001 001 000 Old_age Always - 107
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 1 1 1
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 072 052 045 Old_age Always - 28 (Min/Max 17/32)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 120
193 Load_Cycle_Count 0x0032 097 097 000 Old_age Always - 6264
194 Temperature_Celsius 0x0022 028 048 000 Old_age Always - 28 (0 14 0 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 39322h+06m+35.694s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 35910839445
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 68362015258

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 00% 45794 -
# 2 Extended offline Completed without error 00% 45790 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

xknight2k10

lspci

00:00.0 Host bridge: Intel Corporation Core Processor DRAM Controller (rev 12)
00:01.0 PCI bridge: Intel Corporation Core Processor PCI Express x16 Root Port (rev 12)
00:1a.0 USB controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 06)
00:1b.0 Audio device: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio (rev 06)
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 06)
00:1c.1 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 (rev 06)
00:1c.2 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 3 (rev 06)
00:1c.3 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 4 (rev 06)
00:1c.4 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 5 (rev 06)
00:1c.5 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 6 (rev 06)
00:1c.6 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 7 (rev 06)
00:1c.7 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 8 (rev 06)
00:1d.0 USB controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 06)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a6)
00:1f.0 ISA bridge: Intel Corporation P55 Chipset LPC Interface Controller (rev 06)
00:1f.2 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 4 port SATA IDE Controller (rev 06)
00:1f.3 SMBus: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller (rev 06)
00:1f.5 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 2 port SATA IDE Controller (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller (rev a1)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 03)
03:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
03:00.1 IDE interface: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
04:00.0 IDE interface: Marvell Technology Group Ltd. Device 914d (rev 10)
05:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
08:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller (rev 11)
0a:04.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0)
3f:00.0 Host bridge: Intel Corporation Core Processor QuickPath Architecture Generic Non-core Registers (rev 02)
3f:00.1 Host bridge: Intel Corporation Core Processor QuickPath Architecture System Address Decoder (rev 02)
3f:02.0 Host bridge: Intel Corporation Core Processor QPI Link 0 (rev 02)
3f:02.1 Host bridge: Intel Corporation 1st Generation Core i3/5/7 Processor QPI Physical 0 (rev 02)
3f:02.2 Host bridge: Intel Corporation 1st Generation Core i3/5/7 Processor Reserved (rev 02)
3f:02.3 Host bridge: Intel Corporation 1st Generation Core i3/5/7 Processor Reserved (rev 02)



Zpool Status

pool: NAS 4tb
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: resilvered 7.25M in 00:00:02 with 0 errors on Mon Jul 17 20:22:35 2023
remove: Removal of vdev 2 copied 11.5M in 0h0m, completed on Sat Jun 3 14:28:32 2023
1.22K memory used for removed device mappings
config:

NAME STATE READ WRITE CKSUM
NAS 4tb DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
25d96a6a-8297-4f26-9797-6842598ee77b DEGRADED 0 0 0 too many errors
16ddd527-a1aa-4efb-88cd-cedde5aac775 ONLINE 0 0 0
mirror-1 DEGRADED 0 0 0
f51f935a-11aa-4042-bf0e-42a97aafe895 DEGRADED 0 0 108K too many errors
1f668bcf-a140-437e-a20d-f0791a773c3c DEGRADED 0 0 108K too many errors
mirror-3 DEGRADED 0 0 0
92e67055-b812-4f32-87b1-d1068c4eb163 DEGRADED 0 0 0 too many errors
a596351c-321c-48d7-955f-d0cb1fce7715 ONLINE 0 0 0

errors: 6 data errors, use '-v' for a list

pool: boot-pool
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: scrub repaired 0B in 00:01:57 with 0 errors on Mon Jul 17 03:46:59 2023
config:

NAME STATE READ WRITE CKSUM
boot-pool ONLINE 0 0 0
sdb3 ONLINE 0 0 0

errors: No known data errors



Zpool list -vL

NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
NAS 4tb 10.9T 7.38T 3.49T - - 9% 67% 1.00x DEGRADED /mnt
mirror-0 3.62T 3.52T 104G - - 27% 97.2% - DEGRADED
sdg2 3.64T - - - - - - - DEGRADED
sde2 3.64T - - - - - - - ONLINE
mirror-1 3.62T 2.76T 889G - - 1% 76.0% - DEGRADED
sdd2 3.64T - - - - - - - DEGRADED
sda2 3.64T - - - - - - - DEGRADED
indirect-2 - - - - - - - - ONLINE
mirror-3 3.62T 1.10T 2.52T - - 0% 30.5% - DEGRADED
sdc2 3.64T - - - - - - - DEGRADED
sdf2 3.64T - - - - - - - ONLINE
boot-pool 220G 18.3G 202G - - 2% 8% 1.00x ONLINE -
sdb3 222G 18.3G 202G - - 2% 8.32% - ONLINE
 

xknight2k10

Thanks in advance for any help and support on this.

Both drives are under warranty, but since it's happening to two identical drives I don't feel it's a hardware issue. Any help to confirm and fix this would be much appreciated.
 

samarium

Please enclose blocks of text in code blocks; it makes them significantly easier to read.

[ code]
text
[ /code]

But remove the space after the [

Your pool looks ugly. I'm not sure what "removal of vdev 2" means; I've never seen it before. Maybe the removed vdev is a new feature I haven't used yet, so I'm not sure of the implications.

Looking at the lspci output, you have JMicron and Marvell controllers. That's not something I would use with important data: cheap, yes, but for a reason, and it looks like they crap out under load. So cheap might not be quite so good a deal; we will see if the pool survives. The Intel isn't doing too well either.

Check the resources section for better suggestions, and you might have to do that sooner rather than later. I wouldn't trust the existing disks on the existing controllers, and would want to move all the ZFS pool disks, even the ones on the Intel controller, but leave the boot disk.

Something like an LSI 2008, 2308, or 3008 would be relatively cheap second-hand on eBay, and they are often pre-flashed with IT mode firmware for ZFS. Assuming you are in the US, some of the US-based people might be able to suggest a reputable vendor.

You need to check all your disk models to ensure they are CMR. SMR drives are known to cause problems with ZFS, including really slow performance.

Looking at the SMART data provided, 183 Runtime_Bad_Block is of some concern, but I'm not familiar with it. The attributes I normally look at seem OK to my eyes. No errors are logged, which is good. sdc has a high load cycle count, which is not great for disks, but it also has 45,000 hours of runtime. You didn't provide SMART data for the other drives, so I can't comment on them.
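
For what it's worth, the large raw numbers can be unpacked. sda is reported as not in the smartctl database, so its Command_Timeout raw value is printed as one integer instead of the three 16-bit words shown for sdc (`1 1 1`). The sketch below splits the values posted above. Treating the upper 16 bits of Seagate's Raw_Read_Error_Rate and Seek_Error_Rate as an error count and the lower 32 bits as an operation count is a widely repeated community interpretation, not something confirmed in this thread, so take it as an assumption. The function names are illustrative.

```python
def split_words16(raw):
    """Split a 48-bit SMART raw value into three 16-bit words, high word first."""
    return ((raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF)


def split_seagate_rate(raw):
    """Split a 48-bit raw value into (upper 16 bits, lower 32 bits).

    ASSUMPTION: on Seagate drives the upper part is often read as an error
    count and the lower part as an operation count. This is not verified here.
    """
    return ((raw >> 32) & 0xFFFF, raw & 0xFFFFFFFF)


if __name__ == "__main__":
    # Raw values copied from the smartctl output posted in this thread.
    print("sda Command_Timeout    ", split_words16(4295032834))
    print("sda Raw_Read_Error_Rate", split_seagate_rate(119559600))
    print("sda Seek_Error_Rate    ", split_seagate_rate(24967647))
    print("sdc Raw_Read_Error_Rate", split_seagate_rate(176289464))
    print("sdc Seek_Error_Rate    ", split_seagate_rate(90389597394))
```

On these numbers sda's Command_Timeout splits into the words 1, 1 and 2, the upper field of Raw_Read_Error_Rate is zero on both drives, and only sdc's Seek_Error_Rate has a non-zero upper field (21). That fits the view that SMART itself is not flagging the disks.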

I'm not sure what to do about the pool apart from move the disks to a better controller and see what happens then. You already have data errors.
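
One thing that may help when moving disks: the zpool status rows use partition UUIDs, while zpool list -vL uses sdX names. A rough sketch of pairing them up by position is below, using the rows posted above. It assumes both commands list the leaf devices in the same order (an assumption, not verified here), and the function names are illustrative only.

```python
import re

POOL = "NAS 4tb"

# Config rows copied from the `zpool status` output posted in this thread.
STATUS = """\
NAS 4tb DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
25d96a6a-8297-4f26-9797-6842598ee77b DEGRADED 0 0 0 too many errors
16ddd527-a1aa-4efb-88cd-cedde5aac775 ONLINE 0 0 0
mirror-1 DEGRADED 0 0 0
f51f935a-11aa-4042-bf0e-42a97aafe895 DEGRADED 0 0 108K too many errors
1f668bcf-a140-437e-a20d-f0791a773c3c DEGRADED 0 0 108K too many errors
mirror-3 DEGRADED 0 0 0
92e67055-b812-4f32-87b1-d1068c4eb163 DEGRADED 0 0 0 too many errors
a596351c-321c-48d7-955f-d0cb1fce7715 ONLINE 0 0 0
"""

# Rows copied from the `zpool list -vL` output posted in this thread.
LIST_VL = """\
NAS 4tb 10.9T 7.38T 3.49T - - 9% 67% 1.00x DEGRADED /mnt
mirror-0 3.62T 3.52T 104G - - 27% 97.2% - DEGRADED
sdg2 3.64T - - - - - - - DEGRADED
sde2 3.64T - - - - - - - ONLINE
mirror-1 3.62T 2.76T 889G - - 1% 76.0% - DEGRADED
sdd2 3.64T - - - - - - - DEGRADED
sda2 3.64T - - - - - - - DEGRADED
indirect-2 - - - - - - - - ONLINE
mirror-3 3.62T 1.10T 2.52T - - 0% 30.5% - DEGRADED
sdc2 3.64T - - - - - - - DEGRADED
sdf2 3.64T - - - - - - - ONLINE
"""

# name, state, READ, WRITE, CKSUM; the name may contain a space ("NAS 4tb").
ROW = re.compile(
    r"^(.+?)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)


def is_leaf(name, pool):
    """True for disk rows, False for the pool row and grouping vdevs."""
    return name != pool and not name.startswith(("mirror-", "indirect-", "raidz"))


def status_leaves(text, pool):
    """Return (uuid, state, cksum) for each leaf row of `zpool status`."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m and is_leaf(m.group(1), pool):
            rows.append((m.group(1), m.group(2), m.group(5)))
    return rows


def list_leaves(text, pool):
    """Return the leaf device names from `zpool list -vL`, in order."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(pool):
            continue
        name = line.split()[0]
        if name != "NAME" and is_leaf(name, pool):
            names.append(name)
    return names


def pair_devices(status_text, list_text, pool):
    """Pair sdX names with status rows by position (same-order assumption)."""
    names = list_leaves(list_text, pool)
    rows = status_leaves(status_text, pool)
    if len(names) != len(rows):
        raise ValueError("leaf counts differ between the two outputs")
    return [(dev, uuid, state, ck) for dev, (uuid, state, ck) in zip(names, rows)]


def with_checksum_errors(pairs):
    """Keep only the devices whose CKSUM column is not 0."""
    return [(dev, ck) for dev, _uuid, _state, ck in pairs if ck != "0"]


if __name__ == "__main__":
    for dev, ck in with_checksum_errors(pair_devices(STATUS, LIST_VL, POOL)):
        print(dev, ck)
```

With the posted output, the two devices carrying the 108K checksum counts pair up with sdd2 and sda2, the two halves of mirror-1, while sdc2 is degraded with a checksum count of 0. SMART output was posted for sda and sdc only, so sdd's would be worth adding.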

Maybe some other eyes on this will add something more.
 

xknight2k10

For SDA

[CODE]
Device Model: ST4000VN006-3CW104
Serial Number: ###
LU WWN Device Id: 5 000c50 0e6615ec8
Firmware Version: SC60
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Jul 20 08:21:33 2023 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 0) seconds.
Offline data collection capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 464) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x70bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 081 064 006 Pre-fail Always - 119559600
3 Spin_Up_Time 0x0003 096 095 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 85
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 074 060 045 Pre-fail Always - 24967647
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1155
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 85
183 Runtime_Bad_Block 0x0032 069 069 000 Old_age Always - 31
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 4295032834
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 078 064 040 Old_age Always - 22 (Min/Max 21/25)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 111
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 133
194 Temperature_Celsius 0x0022 022 040 000 Old_age Always - 22 (0 18 0 0 0)
195 Hardware_ECC_Recovered 0x001a 081 064 000 Old_age Always - 119559600
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 1152 (174 123 0)
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 8061874732
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 29502102352

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 1155 -
# 2 Conveyance offline Completed without error 00% 1144 -
# 3 Short offline Completed without error 00% 1144 -
# 4 Extended offline Interrupted (host reset) 00% 1144 -
# 5 Extended offline Completed without error 00% 1142 -
# 6 Extended offline Interrupted (host reset) 90% 1118 -
# 7 Extended offline Completed without error 00% 1108 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.[/CODE]


For SDC

Code:
Model Family: Seagate Desktop HDD.15
Device Model: ST4000DM000-1F2168
Serial Number: ###
LU WWN Device Id: 5 000c50 069a9e4c7
Firmware Version: CC52
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Jul 20 08:23:43 2023 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 612) seconds.
Offline data collection capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 513) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 118 099 006 Pre-fail Always - 176289464
3 Spin_Up_Time 0x0003 092 091 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 094 094 020 Old_age Always - 6182
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 069 060 030 Pre-fail Always - 90389597394
9 Power_On_Hours 0x0032 048 048 000 Old_age Always - 45805
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 943
183 Runtime_Bad_Block 0x0032 001 001 000 Old_age Always - 107
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 1 1 1
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 072 052 045 Old_age Always - 28 (Min/Max 17/32)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 120
193 Load_Cycle_Count 0x0032 097 097 000 Old_age Always - 6264
194 Temperature_Celsius 0x0022 028 048 000 Old_age Always - 28 (0 14 0 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 39322h+06m+35.694s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 35910839445
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 68362015258

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 00% 45794 -
# 2 Extended offline Completed without error 00% 45790 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



lspci

Code:
00:00.0 Host bridge: Intel Corporation Core Processor DRAM Controller (rev 12)
00:01.0 PCI bridge: Intel Corporation Core Processor PCI Express x16 Root Port (rev 12)
00:1a.0 USB controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 06)
00:1b.0 Audio device: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio (rev 06)
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 06)
00:1c.1 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 (rev 06)
00:1c.2 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 3 (rev 06)
00:1c.3 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 4 (rev 06)
00:1c.4 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 5 (rev 06)
00:1c.5 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 6 (rev 06)
00:1c.6 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 7 (rev 06)
00:1c.7 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 8 (rev 06)
00:1d.0 USB controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 06)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a6)
00:1f.0 ISA bridge: Intel Corporation P55 Chipset LPC Interface Controller (rev 06)
00:1f.2 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 4 port SATA IDE Controller (rev 06)
00:1f.3 SMBus: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller (rev 06)
00:1f.5 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 2 port SATA IDE Controller (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller (rev a1)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 03)
03:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
03:00.1 IDE interface: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
04:00.0 IDE interface: Marvell Technology Group Ltd. Device 914d (rev 10)
05:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
08:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller (rev 11)
0a:04.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0)
3f:00.0 Host bridge: Intel Corporation Core Processor QuickPath Architecture Generic Non-core Registers (rev 02)
3f:00.1 Host bridge: Intel Corporation Core Processor QuickPath Architecture System Address Decoder (rev 02)
3f:02.0 Host bridge: Intel Corporation Core Processor QPI Link 0 (rev 02)
3f:02.1 Host bridge: Intel Corporation 1st Generation Core i3/5/7 Processor QPI Physical 0 (rev 02)
3f:02.2 Host bridge: Intel Corporation 1st Generation Core i3/5/7 Processor Reserved (rev 02)
3f:02.3 Host bridge: Intel Corporation 1st Generation Core i3/5/7 Processor Reserved (rev 02)



Zpool Status

Code:
pool: NAS 4tb
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: resilvered 7.25M in 00:00:02 with 0 errors on Mon Jul 17 20:22:35 2023
remove: Removal of vdev 2 copied 11.5M in 0h0m, completed on Sat Jun 3 14:28:32 2023
1.22K memory used for removed device mappings
config:

NAME STATE READ WRITE CKSUM
NAS 4tb DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
25d96a6a-8297-4f26-9797-6842598ee77b DEGRADED 0 0 0 too many errors
16ddd527-a1aa-4efb-88cd-cedde5aac775 ONLINE 0 0 0
mirror-1 DEGRADED 0 0 0
f51f935a-11aa-4042-bf0e-42a97aafe895 DEGRADED 0 0 108K too many errors
1f668bcf-a140-437e-a20d-f0791a773c3c DEGRADED 0 0 108K too many errors
mirror-3 DEGRADED 0 0 0
92e67055-b812-4f32-87b1-d1068c4eb163 DEGRADED 0 0 0 too many errors
a596351c-321c-48d7-955f-d0cb1fce7715 ONLINE 0 0 0

errors: 6 data errors, use '-v' for a list

pool: boot-pool
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: scrub repaired 0B in 00:01:57 with 0 errors on Mon Jul 17 03:46:59 2023
config:

NAME STATE READ WRITE CKSUM
boot-pool ONLINE 0 0 0
sdb3 ONLINE 0 0 0

errors: No known data errors



Zpool list -vL

Code:
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
NAS 4tb 10.9T 7.38T 3.49T - - 9% 67% 1.00x DEGRADED /mnt
mirror-0 3.62T 3.52T 104G - - 27% 97.2% - DEGRADED
sdg2 3.64T - - - - - - - DEGRADED
sde2 3.64T - - - - - - - ONLINE
mirror-1 3.62T 2.76T 889G - - 1% 76.0% - DEGRADED
sdd2 3.64T - - - - - - - DEGRADED
sda2 3.64T - - - - - - - DEGRADED
indirect-2 - - - - - - - - ONLINE
mirror-3 3.62T 1.10T 2.52T - - 0% 30.5% - DEGRADED
sdc2 3.64T - - - - - - - DEGRADED
sdf2 3.64T - - - - - - - ONLINE
boot-pool 220G 18.3G 202G - - 2% 8% 1.00x ONLINE -
sdb3 222G 18.3G 202G - - 2% 8.32% - ONLINE
 

xknight2k10

Thank you for your advice. I'm totally new to home servers and a bit of a noob at this. This is an old PC that I had to use as a replacement for a NAS drive that I had before.

I will purchase an LSI 2008, 2308, or 3008 and see how that goes. I am in the UK, but I am seeing them on eBay pretty cheap.

I'm sure I will be back asking for more advice, but thank you so much for the help.

Also sorry for the double posting and not using the code bit on the posts.
 

somethingweird

Contributor
Joined
Jan 27, 2022
Messages
183
SDC - ST4000DM000, that worries me - is it SMR or CMR?
 

xknight2k10

The two drives giving the issues are Seagate IronWolf drives that I bought especially for the server, as they were advertised for NAS usage. I don't know if they are SMR or CMR, but I will check.

The other four drives are 4TB drives that I already had from another standard PC, so they are not server-specific, but they are not giving issues. It's just the supposedly NAS-specific drives.
 

xknight2k10

I have a Dell H310 SAS2008 LSI 9211-8i HBA - P20 IT Mode - ZFS on order from eBay so hopefully this will help.
 