Device is causing slow I/O on pool, couple questions

adamzwakk

Cadet
Joined
Apr 17, 2023
Messages
1
My pool started getting the dreaded
Device /dev/gptid/2e598723-7f38-11ed-89d7-6805ca23d2cc is causing slow I/O on pool Vasquez.
on the same drive a couple of times over the past few days while doing some zfs sends and other intentionally high-I/O work on my pool. It's a Seagate Ironwolf (ST8000VN004) so it SHOULDN'T be SMR; almost all the other drives in there are WD Reds and have no messages. Out of 12, it's one of 2 Ironwolf drives in there, and they are the only 7200 RPM drives.

The short test shows a Seek_Error_Rate of 1 (raw value decoded with https://yksi.ml/#0x000113323225). I'm running an extended SMART test on both Ironwolf drives now out of paranoia.
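For anyone decoding that raw value by hand: Seagate's Seek_Error_Rate raw value is commonly read as a 48-bit number, with the top 16 bits holding the error count and the low 32 bits the total number of seeks. That layout comes from community documentation rather than anything published by Seagate, so the sketch below is only valid under that assumption. The function name is made up for illustration.

```python
def decode_seagate_raw(raw: int) -> tuple[int, int]:
    """Split a 48-bit Seagate raw value into (errors, operations).

    Assumption: upper 16 bits = error count, lower 32 bits = total
    operations (seeks for attribute 7, reads for attribute 1).
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


# Raw value taken from the decoder link in the post above.
errors, seeks = decode_seagate_raw(0x000113323225)
print(f"{errors} seek error(s) in {seeks:,} seeks")
# prints "1 seek error(s) in 322,056,741 seeks"
```

Read this way, a large raw number on a Seagate drive is mostly a seek counter, and only the upper bits indicate actual errors.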

Ironically this is happening just after I built a backup machine and started replicating data across (hence the high I/O), but my questions are:
  1. Does this mean I should replace the drive in question regardless? I've done most of my homework and figured out the serial number and everything already; if the extended test shows a 'worse' result then I'll just do it anyway.
  2. Would the difference in RPM produce that kind of result? All other drives are ~5400 RPM, so maybe the drive was complaining about not going at speed during high I/O? (but then I'd assume the other Ironwolf would complain too...)
Thanks in advance for any insight!
 

unholyeyebrows

Explorer
Joined
Apr 19, 2012
Messages
55
Hi. I don’t have an answer but am struggling with the same issue. I have two newly built TrueNAS Scale servers, each with a 4-disk pool. All disks are CMR. One server is RAIDZ2 and the other RAIDZ.

I am trying to back up the pool from the Z2 to the Z and have tried both ZFS replication and rsync. Both technically work but the performance is very poor (about 100 GB per day on a 1 Gbps LAN connection).

I find the initial transfer starts quickly, then after about an hour the transfer rate drops and I start getting intermittent slow I/O errors sent to my email. These can be on any of the 4 disks, and the errors also clear themselves after a time before coming back again.

At this rate it will take about a month to back up my 10 TB pool, so I really need a solution. Does anyone know some good tips to diagnose this?
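To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes the figures in the post are decimal gigabytes and terabytes (not gigabits) and ignores protocol overhead; the function names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rate_mb_per_s(gb_per_day: float) -> float:
    """Average throughput in MB/s for a given number of decimal GB per day."""
    return gb_per_day * 1000 / SECONDS_PER_DAY


def days_to_copy(total_tb: float, gb_per_day: float) -> float:
    """Days needed to copy total_tb (decimal TB) at gb_per_day."""
    return total_tb * 1000 / gb_per_day


observed = rate_mb_per_s(100)   # the ~100 GB/day reported above
line_rate = 1000 / 8            # 1 Gbps is 125 MB/s before overhead

print(f"observed: {observed:.2f} MB/s")
print(f"share of 1 Gbps line rate: {observed / line_rate:.1%}")
print(f"days for 10 TB: {days_to_copy(10, 100):.0f}")
```

Under those assumptions the transfer averages a little over 1 MB/s, which is around 1% of what the link could carry. The same arithmetic puts a full 10 TB at roughly 100 days, so "about a month" would correspond to around 3 TB of data actually being copied.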

Thanks.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
adamzwakk said:
It's a Seagate Ironwolf (ST8000VN004) so it SHOULDN'T be SMR,
I can confirm that's a CMR disk.

adamzwakk said:
but my questions are:
  1. Does this mean I should replace the drive in question regardless? I've done most of my homework and figured out the serial number and everything already; if the extended test shows a 'worse' result then I'll just do it anyway.
  2. Would the difference in RPM produce that kind of result? All other drives are ~5400 RPM, so maybe the drive was complaining about not going at speed during high I/O? (but then I'd assume the other Ironwolf would complain too...)
Thanks in advance for any insight!
  1. No; posting the SMART long test would be a step toward troubleshooting the issue. You should also post your complete system specs so we can understand what's going on; I suspect a non-grinch-compliant HBA.
  2. No difference whatsoever.

unholyeyebrows said:
Hi. I don’t have an answer but am struggling with the same issue. I have two newly built TrueNAS Scale servers, each with a 4-disk pool. All disks are CMR. One server is RAIDZ2 and the other RAIDZ.

I am trying to back up the pool from the Z2 to the Z and have tried both ZFS replication and rsync. Both technically work but the performance is very poor (about 100 GB per day on a 1 Gbps LAN connection).

I find the initial transfer starts quickly, then after about an hour the transfer rate drops and I start getting intermittent slow I/O errors sent to my email. These can be on any of the 4 disks, and the errors also clear themselves after a time before coming back again.

At this rate it will take about a month to back up my 10 TB pool, so I really need a solution. Does anyone know some good tips to diagnose this?

Thanks.
Please post your complete hardware specs; it smells like a Realtek NIC or a non-grinch-compliant HBA.
 

unholyeyebrows

Explorer
Joined
Apr 19, 2012
Messages
55
Hey, thanks for the advice. I’ll send my full specs when I’m back home. The server reporting slow I/O is an HP Microserver Gen8 with 16 GB ECC RAM and a Core i3. As for the rest, I’ll report back later. Thanks again.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222

unholyeyebrows

Explorer
Joined
Apr 19, 2012
Messages
55
Hi again.

Right, details are:

Source server:
Motherboard: Gigabyte H610I with 32 GB non-ECC RAM (conscious this is consumer spec - this is a home server).
NIC (from lspci): 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (17) I219-V (rev 11)

Destination (which is complaining about the slow I/O):
HP Microserver Gen 8 with 16 GB ECC RAM
NIC (from lspci): 03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe

Unfortunately I cannot check the HBA, as the camcontrol devlist command doesn't work on TrueNAS Scale. Any other ideas on how I can check this?

Thanks
 

unholyeyebrows

Explorer
Joined
Apr 19, 2012
Messages
55
Full lspci for both devices if this helps:

Code:
SOURCE: lspci
00:00.0 Host bridge: Intel Corporation Device 4610 (rev 05)
00:01.0 PCI bridge: Intel Corporation Device 460d (rev 05)
00:02.0 VGA compatible controller: Intel Corporation Device 4693 (rev 0c)
00:14.0 USB controller: Intel Corporation Device 7ae0 (rev 11)
00:14.2 RAM memory: Intel Corporation Device 7aa7 (rev 11)
00:15.0 Serial bus controller [0c80]: Intel Corporation Device 7acc (rev 11)
00:15.1 Serial bus controller [0c80]: Intel Corporation Device 7acd (rev 11)
00:15.2 Serial bus controller [0c80]: Intel Corporation Device 7ace (rev 11)
00:15.3 Serial bus controller [0c80]: Intel Corporation Device 7acf (rev 11)
00:16.0 Communication controller: Intel Corporation Device 7ae8 (rev 11)
00:17.0 SATA controller: Intel Corporation Device 7ae2 (rev 11)
00:19.0 Serial bus controller [0c80]: Intel Corporation Device 7afc (rev 11)
00:19.1 Serial bus controller [0c80]: Intel Corporation Device 7afd (rev 11)
00:1c.0 PCI bridge: Intel Corporation Device 7ab8 (rev 11)
00:1c.4 PCI bridge: Intel Corporation Device 7abc (rev 11)
00:1f.0 ISA bridge: Intel Corporation Device 7a87 (rev 11)
00:1f.3 Audio device: Intel Corporation Device 7ad0 (rev 11)
00:1f.4 SMBus: Intel Corporation Device 7aa3 (rev 11)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Device 7aa4 (rev 11)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (17) I219-V (rev 11)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev e5)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X]
03:00.0 Non-Volatile memory controller: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive (rev 03)

DESTINATION: lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:06.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1c.6 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 7 (rev b5)
00:1c.7 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 8 (rev b5)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation C204 Chipset LPC Controller (rev 05)
00:1f.2 IDE interface: Intel Corporation 6 Series/C200 Series Chipset Family Desktop SATA Controller (IDE mode, ports 0-3) (rev 05)
00:1f.5 IDE interface: Intel Corporation 6 Series/C200 Series Chipset Family Desktop SATA Controller (IDE mode, ports 4-5) (rev 05)
01:00.0 System peripheral: Hewlett-Packard Company Integrated Lights-Out Standard Slave Instrumentation & System Support (rev 05)
01:00.1 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200EH
01:00.2 System peripheral: Hewlett-Packard Company Integrated Lights-Out Standard Management Processor Support and Messaging (rev 05)
01:00.4 USB controller: Hewlett-Packard Company Integrated Lights-Out Standard Virtual USB Controller (rev 02)
03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
03:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
04:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
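Since camcontrol is not available on Scale, one way to see which storage controllers are present is to filter the lspci listing by device class. The sketch below does that on a few lines copied from the destination output above. The class names in STORAGE_CLASSES are the usual lspci labels, but the list is an assumption and may not be exhaustive, and the function name is invented for this example.

```python
# lspci device classes that normally indicate a disk controller (assumed list).
STORAGE_CLASSES = (
    "SATA controller",
    "IDE interface",
    "RAID bus controller",
    "Serial Attached SCSI controller",
    "SCSI storage controller",
    "Non-Volatile memory controller",
)


def storage_controllers(lspci_output: str) -> list[str]:
    """Return the lspci lines whose device class is a storage controller."""
    found = []
    for line in lspci_output.splitlines():
        # Each line looks like "00:1f.2 IDE interface: Intel ... (rev 05)".
        _, _, rest = line.strip().partition(" ")
        device_class = rest.split(":", 1)[0]
        if device_class.startswith(STORAGE_CLASSES):
            found.append(line.strip())
    return found


# A few lines copied from the DESTINATION listing above.
sample = """\
00:1f.0 ISA bridge: Intel Corporation C204 Chipset LPC Controller (rev 05)
00:1f.2 IDE interface: Intel Corporation 6 Series/C200 Series Chipset Family Desktop SATA Controller (IDE mode, ports 0-3) (rev 05)
00:1f.5 IDE interface: Intel Corporation 6 Series/C200 Series Chipset Family Desktop SATA Controller (IDE mode, ports 4-5) (rev 05)
03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
"""

for controller in storage_controllers(sample):
    print(controller)
```

On the destination listing this picks out only the two onboard Intel entries, both reported as "IDE interface" in IDE mode, so there is no separate HBA card in that listing. Whether the IDE mode setting has anything to do with the slow I/O is not established in this thread.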
 

unholyeyebrows

Explorer
Joined
Apr 19, 2012
Messages
55
And here are the disks:

Code:
Source:
All Seagate Ironwolf 6Tb:
ST6000VN001-2BB186
ST6000VN001-2BB186
ST6000VN001-2BB186
ST6000VN001-2BB186

Destination:
All WD Red 4TB:
WDC_WD40EFRX-68N32N0
WDC_WD40EFRX-68N32N0
WDC_WD40EFRX-68N32N0
WDC_WD40EFRX-68N32N0
 

unholyeyebrows

Explorer
Joined
Apr 19, 2012
Messages
55
Regarding HBA, I have successfully run smartctl on both servers (which I believe suggests I'm OK):

N.B. I am running a long SMART test on both servers overnight....

Code:
SOURCE:
sudo smartctl /dev/sda -a

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST6000VN001-2BB186
Serial Number:    ZR13CWY8
LU WWN Device Id: 5 000c50 0e429dd76
Firmware Version: SC60
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed May 24 21:21:24 2023 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 698) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   081   064   006    Pre-fail  Always       -       136810982
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       61
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   086   060   045    Pre-fail  Always       -       419880121
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       9819
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       61
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   061   055   040    Old_age   Always       -       39 (Min/Max 33/39)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       3
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       87
193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       5981
194 Temperature_Celsius     0x0022   039   045   000    Old_age   Always       -       39 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   081   064   000    Old_age   Always       -       136810982
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       9611h+20m+58.595s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       29912752927
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       177895291043

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 90%      9819         -
# 2  Short offline       Completed without error       00%      9798         -
# 3  Short offline       Completed without error       00%      9774         -
# 4  Short offline       Completed without error       00%      9750         -
# 5  Short offline       Completed without error       00%      9726         -
# 6  Short offline       Completed without error       00%      9702         -
# 7  Short offline       Completed without error       00%      9679         -
# 8  Short offline       Completed without error       00%      9655         -
# 9  Short offline       Completed without error       00%      9631         -
#10  Short offline       Completed without error       00%      9607         -
#11  Short offline       Completed without error       00%      9583         -
#12  Short offline       Completed without error       00%      9559         -
#13  Short offline       Completed without error       00%      9539         -
#14  Short offline       Completed without error       00%      9515         -
#15  Short offline       Completed without error       00%      9491         -
#16  Short offline       Completed without error       00%      9467         -
#17  Extended offline    Completed without error       00%      9459         -
#18  Short offline       Completed without error       00%      9419         -
#19  Short offline       Completed without error       00%      9395         -
#20  Short offline       Completed without error       00%      9371         -
#21  Short offline       Completed without error       00%      9347         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


DESTINATION:
sudo smartctl /dev/sda -a
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD40EFRX-68N32N0
Serial Number:    WD-WCC7K5AFUAT3
LU WWN Device Id: 5 0014ee 2657ca49a
Firmware Version: 82.00A82
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed May 24 21:20:28 2023 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                (42720) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 453) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   206   158   021    Pre-fail  Always       -       4700
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       146
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7555
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       37
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       248
194 Temperature_Celsius     0x0022   117   109   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      7541         -
# 2  Short offline       Completed without error       00%      7510         -
# 3  Extended offline    Interrupted (host reset)      90%      7497         -
# 4  Short offline       Completed without error       00%      7497         -
# 5  Short offline       Completed without error       00%      7486         -
# 6  Short offline       Completed without error       00%      7463         -
# 7  Short offline       Completed without error       00%      7439         -
# 8  Short offline       Completed without error       00%      7415         -
# 9  Short offline       Completed without error       00%      7391         -
#10  Short offline       Completed without error       00%      7367         -
#11  Short offline       Completed without error       00%      7343         -
#12  Short offline       Completed without error       00%      7319         -
#13  Short offline       Completed without error       00%      7295         -
#14  Short offline       Completed without error       00%      5492         -
#15  Short offline       Completed without error       00%      5329         -
#16  Short offline       Completed without error       00%      5161         -
#17  Short offline       Completed without error       00%      4996         -
#18  Extended offline    Completed without error       00%      4905         -
#19  Short offline       Completed without error       00%      4826         -
#20  Short offline       Completed without error       00%      4658         -
#21  Short offline       Completed without error       00%      4490         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
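Reading these attribute tables by eye gets tedious across several drives, so here is a small sketch that scans the table portion of smartctl -a output. It flags any attribute whose normalized VALUE is at or below its THRESH, and any non-zero raw count on a handful of attributes. The choice of watched IDs (5, 187, 197, 198, 199) is an assumption about what is usually worth a look, not an official list, and the function name is made up.

```python
# Attribute IDs whose raw value is normally expected to stay at zero (assumed list).
WATCHED_RAW = {5, 187, 197, 198, 199}


def check_attributes(table: str) -> list[str]:
    """Return warnings for rows of a smartctl attribute table."""
    warnings = []
    for line in table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name = int(fields[0]), fields[1]
        value, thresh, raw = int(fields[3]), int(fields[5]), fields[9]
        if thresh > 0 and value <= thresh:
            warnings.append(f"{name}: value {value} at or below threshold {thresh}")
        if attr_id in WATCHED_RAW and raw.isdigit() and int(raw) > 0:
            warnings.append(f"{name}: raw value {raw}")
    return warnings


# Rows copied from the destination WD Red output above.
sample_rows = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

print(check_attributes(sample_rows) or "nothing flagged")
```

For the rows shown, nothing is flagged, which matches the PASSED health result in the output. A clean result here only says SMART has not recorded a problem; it does not rule out a controller, cabling, or workload cause for the slow I/O messages.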
 

unholyeyebrows

Explorer
Joined
Apr 19, 2012
Messages
55
Here is the long SMART test output for the destination (i.e. where the I/O errors were seen). The source server tests are still running.

Code:
admin@saturn[~]$ sudo smartctl -a /dev/sda   
[sudo] password for admin:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD40EFRX-68N32N0
Serial Number:    WD-WCC7K5AFUAT3
LU WWN Device Id: 5 0014ee 2657ca49a
Firmware Version: 82.00A82
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu May 25 07:19:30 2023 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (42720) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 453) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   206   158   021    Pre-fail  Always       -       4700
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       146
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7565
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       37
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       248
194 Temperature_Celsius     0x0022   119   109   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7564         -
# 2  Short offline       Completed without error       00%      7541         -
# 3  Short offline       Completed without error       00%      7510         -
# 4  Extended offline    Interrupted (host reset)      90%      7497         -
# 5  Short offline       Completed without error       00%      7497         -
# 6  Short offline       Completed without error       00%      7486         -
# 7  Short offline       Completed without error       00%      7463         -
# 8  Short offline       Completed without error       00%      7439         -
# 9  Short offline       Completed without error       00%      7415         -
#10  Short offline       Completed without error       00%      7391         -
#11  Short offline       Completed without error       00%      7367         -
#12  Short offline       Completed without error       00%      7343         -
#13  Short offline       Completed without error       00%      7319         -
#14  Short offline       Completed without error       00%      7295         -
#15  Short offline       Completed without error       00%      5492         -
#16  Short offline       Completed without error       00%      5329         -
#17  Short offline       Completed without error       00%      5161         -
#18  Short offline       Completed without error       00%      4996         -
#19  Extended offline    Completed without error       00%      4905         -
#20  Short offline       Completed without error       00%      4826         -
#21  Short offline       Completed without error       00%      4658         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

admin@saturn[~]$ sudo smartctl -a /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD40EFRX-68N32N0
Serial Number:    WD-WCC7K3FSLAFX
LU WWN Device Id: 5 0014ee 2669c7409
Firmware Version: 82.00A82
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu May 25 07:19:33 2023 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (42720) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 453) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   213   167   021    Pre-fail  Always       -       4350
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       180
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   061   061   000    Old_age   Always       -       28919
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       76
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       43
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       496
194 Temperature_Celsius     0x0022   120   108   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     28919         -
# 2  Short offline       Completed without error       00%     28888         -
# 3  Short offline       Completed without error       00%     28864         -
# 4  Extended offline    Interrupted (host reset)      90%     28851         -
# 5  Short offline       Completed without error       00%     28851         -
# 6  Short offline       Completed without error       00%     28841         -
# 7  Short offline       Completed without error       00%     26848         -
# 8  Short offline       Completed without error       00%     26685         -
# 9  Short offline       Completed without error       00%     26518         -
#10  Short offline       Completed without error       00%     26354         -
#11  Extended offline    Completed without error       00%     26261         -
#12  Short offline       Completed without error       00%     26182         -
#13  Short offline       Completed without error       00%     26014         -
#14  Short offline       Completed without error       00%     25846         -
#15  Short offline       Completed without error       00%     25678         -
#16  Short offline       Completed without error       00%     25515         -
#17  Short offline       Completed without error       00%     25347         -
#18  Short offline       Completed without error       00%     25179         -
#19  Short offline       Completed without error       00%     25012         -
#20  Extended offline    Completed without error       00%     24948         -
#21  Short offline       Completed without error       00%     24844         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

admin@saturn[~]$ sudo smartctl -a /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD40EFRX-68N32N0
Serial Number:    WD-WCC7K7YFZ2TV
LU WWN Device Id: 5 0014ee 211472709
Firmware Version: 82.00A82
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu May 25 07:19:34 2023 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (43260) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 460) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   210   166   021    Pre-fail  Always       -       4500
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       184
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   060   060   000    Old_age   Always       -       29275
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       76
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       43
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       570
194 Temperature_Celsius     0x0022   119   108   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     29275         -
# 2  Short offline       Completed without error       00%     29252         -
# 3  Short offline       Completed without error       00%     29222         -
# 4  Extended offline    Interrupted (host reset)      90%     29207         -
# 5  Short offline       Completed without error       00%     29207         -
# 6  Short offline       Completed without error       00%     29196         -
# 7  Short offline       Completed without error       00%     27202         -
# 8  Short offline       Completed without error       00%     27040         -
# 9  Short offline       Completed without error       00%     26872         -
#10  Short offline       Completed without error       00%     26706         -
#11  Extended offline    Completed without error       00%     26616         -
#12  Short offline       Completed without error       00%     26536         -
#13  Short offline       Completed without error       00%     26368         -
#14  Short offline       Completed without error       00%     26201         -
#15  Short offline       Completed without error       00%     26033         -
#16  Short offline       Completed without error       00%     25868         -
#17  Short offline       Completed without error       00%     25701         -
#18  Short offline       Completed without error       00%     25534         -
#19  Short offline       Completed without error       00%     25366         -
#20  Extended offline    Completed without error       00%     25302         -
#21  Short offline       Completed without error       00%     25199         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

admin@saturn[~]$ sudo smartctl -a /dev/sdd
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD40EFRX-68N32N0
Serial Number:    WD-WCC7K4KEL734
LU WWN Device Id: 5 0014ee 2114726bf
Firmware Version: 82.00A82
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu May 25 07:19:36 2023 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (43260) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 460) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   207   162   021    Pre-fail  Always       -       4641
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       184
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   061   061   000    Old_age   Always       -       29154
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       78
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       44
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       582
194 Temperature_Celsius     0x0022   119   108   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       7
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     29153         -
# 2  Short offline       Completed without error       00%     29130         -
# 3  Short offline       Completed without error       00%     29099         -
# 4  Extended offline    Interrupted (host reset)      90%     29086         -
# 5  Short offline       Completed without error       00%     29086         -
# 6  Short offline       Completed without error       00%     29075         -
# 7  Short offline       Completed without error       00%     27081         -
# 8  Short offline       Completed without error       00%     26918         -
# 9  Short offline       Completed without error       00%     26750         -
#10  Short offline       Completed without error       00%     26584         -
#11  Extended offline    Completed without error       00%     26494         -
#12  Short offline       Completed without error       00%     26415         -
#13  Short offline       Completed without error       00%     26247         -
#14  Short offline       Completed without error       00%     26079         -
#15  Short offline       Completed without error       00%     25911         -
#16  Short offline       Completed without error       00%     25745         -
#17  Short offline       Completed without error       00%     25580         -
#18  Short offline       Completed without error       00%     25412         -
#19  Short offline       Completed without error       00%     25245         -
#20  Extended offline    Completed without error       00%     25181         -
#21  Short offline       Completed without error       00%     25077         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

admin@saturn[~]$ 
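For anyone skimming long `smartctl -a` dumps like the ones above, here is a minimal Python sketch that pulls out the handful of attributes with a non-zero raw value. The watch list and the function name `flag_attributes` are assumptions made for illustration, not an official smartmontools list, and the sample rows are copied from the /dev/sdd output in this post.

```python
# Attributes whose non-zero raw value usually deserves a closer look.
# This watch list is an assumption for illustration only.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def flag_attributes(smart_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have 10+ columns: ID, name, flag (0x....), ..., raw value.
        if len(fields) < 10 or not fields[0].isdigit() or not fields[2].startswith("0x"):
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


# Rows copied from the /dev/sdd attribute table above.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       7
"""

print(flag_attributes(sample))  # {'UDMA_CRC_Error_Count': 7}
```

On these four disks the only hit is the CRC error count of 7 on /dev/sdd. CRC errors are usually associated with the cable or link rather than the platters, so that count on its own probably does not point to a failing disk.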
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222

unholyeyebrows

Explorer
Joined
Apr 19, 2012
Messages
55
Can you tell us exactly what the error says?
Here we are - this is on the destination server (the HP Microserver)


Code:
Current alerts:

Device /dev/disk/by-partuuid/a94bfc6c-e7b9-43e4-9e88-c7962e2db362 is causing slow I/O on pool disk0.
Device /dev/disk/by-partuuid/279491cb-84e7-455c-bae5-b96058be1c49 is causing slow I/O on pool disk0.
Device /dev/disk/by-partuuid/b50f529b-fd33-48be-b048-dd18d9a0aa07 is causing slow I/O on pool disk0.
Device /dev/disk/by-partuuid/f8df5b1b-6ad6-410f-972f-7ea74742c33d is causing slow I/O on pool disk0.
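
If the alerts keep arriving by email, a small sketch like the following could group them by pool, so it is easy to see whether one device or every device is being flagged. The regular expression is inferred only from the alert wording shown above, and `slow_devices` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Pattern inferred from the alert text in this thread; other TrueNAS
# versions may word the alert differently (assumption).
ALERT = re.compile(
    r"^Device (?P<device>\S+) is causing slow I/O on pool (?P<pool>\S+?)\.?$"
)


def slow_devices(alert_text):
    """Map each pool name to the list of device paths flagged for slow I/O."""
    by_pool = defaultdict(list)
    for line in alert_text.splitlines():
        match = ALERT.match(line.strip())
        if match:
            by_pool[match.group("pool")].append(match.group("device"))
    return dict(by_pool)


# Two of the alert lines quoted above.
sample = """\
Current alerts:

Device /dev/disk/by-partuuid/a94bfc6c-e7b9-43e4-9e88-c7962e2db362 is causing slow I/O on pool disk0.
Device /dev/disk/by-partuuid/279491cb-84e7-455c-bae5-b96058be1c49 is causing slow I/O on pool disk0.
"""

for pool, devices in slow_devices(sample).items():
    print(pool, len(devices), "device(s) flagged")
```

When every member of the pool shows up, as it does here, the cause is more likely something the disks share (controller, memory, software stack) than a single bad drive.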
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222

unholyeyebrows

Explorer
Joined
Apr 19, 2012
Messages
55
Honestly I have no clue about what is going on.
:smile: Massive thanks anyway for giving it a go and at least hopefully confirming my hardware should be OK. I have ordered a PCI-E Startech Intel i210 NIC for the new server just in case this improves things. I'll post back here if I find anything. Thanks again.
 

unholyeyebrows

Explorer
Joined
Apr 19, 2012
Messages
55
Quick update: the new NIC made no difference and the replication is still going slowly with the occasional "slow I/O" error from the destination.

One observation is the destination server is using very little ZFS cache:

16 GB RAM total
Services 4 GB
ZFS Cache 0.9 GB
Free 10.1 GB

Is this normal as on the other server the ZFS cache generally uses all free RAM?

This destination server is doing nothing except receive the replication from the source server.
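
To check the cache size outside the dashboard, the ARC counters on a Linux-based OpenZFS system such as Scale are exposed in `/proc/spl/kstat/zfs/arcstats`. Below is a rough sketch of reading the current size and the configured ceiling from that file. The helper names are hypothetical, and the numbers in `sample` are made up to resemble the figures above; they are not taken from this server.

```python
def parse_arcstats(text):
    """Parse 'name type data' rows from arcstats text into {name: int}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three columns, the last two numeric.
        if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def arc_summary(stats):
    """Return (current ARC size, ARC maximum) in GiB, rounded to 2 places."""
    gib = 2 ** 30
    return round(stats["size"] / gib, 2), round(stats["c_max"] / gib, 2)


def read_arcstats(path="/proc/spl/kstat/zfs/arcstats"):
    """Read the live counters; only works on a host with the ZFS module loaded."""
    with open(path) as handle:
        return parse_arcstats(handle.read())


# Illustrative values only (assumption), laid out like the real file.
sample = """\
name                            type data
hits                            4    123456
size                            4    966367641
c_max                           4    8589934592
"""

size_gib, max_gib = arc_summary(parse_arcstats(sample))
print(f"ARC size {size_gib} GiB of max {max_gib} GiB")
```

If `size` stays far below `c_max` while data is streaming in, the cache is not growing the way it normally would.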
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Is this normal as on the other server the ZFS cache generally uses all free RAM?
Cache grows with time. If you don't see it grow that's potentially abnormal behaviour.
 

unholyeyebrows

Explorer
Joined
Apr 19, 2012
Messages
55
OK, for my problem I have a solution, which is to install TrueNAS Core.

I thought the problem might be related to the ZFS stack, so I reinstalled the destination server, switching from Scale to Core. The result is a sustained 900 Mbps transfer for my ZFS replication.

So right now I’m putting this down to an incompatibility between Scale and the HP Gen 8 Microserver.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So right now I’m putting this down to an incompatibility between Scale and the HP Gen 8 Microserver.

Possibly, but it is just as likely to be the poor handling of ARC in Linux hobbling your replication I/O. You need 24-32GB of RAM in Linux to be equal to 16GB of RAM in CORE.
 

aklibisz

Dabbler
Joined
Aug 23, 2023
Messages
13
Hi @jgreco, did you manage to figure this out? I'm also seeing extremely poor write performance with almost the same setup as yours: Scale on an HP Gen 8 Microserver w/ 16GB ECC memory and 4x 4TB Seagate Ironwolf ST4000VN006.

I'm getting the same " is causing slow I/O on pool" warnings for all four drives. These are completely empty, brand new drives, and the product description explicitly states that they are CMR drives.

At first I thought it was a networking or SMB bottleneck, getting < 200Mbps writes to the SMB share over cat6 and a 2.5Gb switch.

But now I've SSHed into the box and when I directly `cp` a 1.6GB file from the SSD boot drive (/tmp) to the dataset (/mnt/pool/dataset), it takes ~70s (~0.18 Gbps), or it just completely hangs for > 10 minutes.
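
For reference, the arithmetic behind that figure, as a small sketch. It assumes decimal units (1 GB = 1000 MB = 8 Gb), and the function names are just for illustration.

```python
def throughput_gbps(size_gb, seconds):
    """Gigabits per second for size_gb decimal gigabytes moved in seconds."""
    return size_gb * 8 / seconds


def throughput_mb_per_s(size_gb, seconds):
    """Megabytes per second for the same transfer (1 GB = 1000 MB)."""
    return size_gb * 1000 / seconds


# The 1.6 GB file that took about 70 s to copy.
print(round(throughput_gbps(1.6, 70), 2))      # 0.18
print(round(throughput_mb_per_s(1.6, 70), 1))  # 22.9
```

That works out to roughly 23 MB/s for a purely local copy, so the network and SMB are not involved in this slowdown.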

I'll likely try re-installing with TrueNAS Core next.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Hi @jgreco, did you manage to figure this out? I'm also seeing extremely poor write performance with almost the same setup as yours: Scale on an HP Gen 8 Microserver w/ 16GB ECC memory and 4x 4TB Seagate Ironwolf ST4000VN006.

I'm getting the same " is causing slow I/O on pool" warnings for all four drives. These are completely empty, brand new drives, and the product description explicitly states that they are CMR drives.

At first I thought it was a networking or SMB bottleneck, getting < 200Mbps writes to the SMB share over cat6 and a 2.5Gb switch.

But now I've SSHed into the box and when I directly `cp` a 1.6GB file from the SSD boot drive (/tmp) to the dataset (/mnt/pool/dataset), it takes ~70s (~0.18 Gbps), or it just completely hangs for > 10 minutes.

I'll likely try re-installing with TrueNAS Core next.
What is said boot drive? How are you connecting those drives to the motherboard?

Please look at this post.
 