Hello all,
I noticed on my old system (completely different hardware except for the drives) that a scrub would sometimes be very slow. Last time I thought it was stuck and stopped it; after a reboot it was fine.
Now, after changing my hardware, I had a similar issue and wanted to investigate. I will try to give as much information as possible. The last scrub on this hardware took around 4 hours (I can't see it anymore, at least in the GUI, since I rebooted in between and started this new scrub).
Code:
TrueNAS-SCALE-22.12.4.2
Supermicro X10SRi-F
Xeon 2640v4
4x Samsung 32GB 2Rx4 PC4-2400T RA1-11-DC0 Server RAM ECC HP M393A4K40CB1-CRC0Q running at 2133 MHz, tested with memtest for 48 hours prior to installation
Seasonic PX-750
Data pool: 4*4TB WD RED PLUS (CMR) raidz2
VM pool: 2*500GB SSD (Samsung Evo 850 / Crucial mx500) mirror
Pool is 50 % full
I started the scrub on my VM pool (SSD mirror) and data pool (raidz2) simultaneously. The scrub of the VM pool took 8 minutes.
This is the disk I/O for the data pool:

As you can see, it stayed below 10 MB/s for a long time, then got up to speed and dropped again.
I did some investigation and saw the following correlation between the disk I/O and ARC Requests demand_metadata:
In case the color scheme isn't clear: violet is miss, blue is hit. So apparently, while there are many demand_metadata ARC misses the disk I/O stays low, and when the misses drop, the I/O increases.
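For anyone who wants to watch the same counters without the GUI graph, here is a minimal Python sketch. It assumes OpenZFS on Linux exposes the ARC counters in /proc/spl/kstat/zfs/arcstats with the fields demand_metadata_hits and demand_metadata_misses; the function names and the sampling interval are my own choices, not anything TrueNAS ships.

```python
import time
from pathlib import Path

# Assumed location of the ARC kstats on OpenZFS for Linux.
ARCSTATS_PATH = Path("/proc/spl/kstat/zfs/arcstats")


def parse_arcstats(text):
    """Parse kstat text ("name type data" rows) into {name: int}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields with a numeric last field;
        # this skips the two header lines at the top of the file.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def demand_metadata_miss_ratio(before, after):
    """Share of demand_metadata requests that missed between two snapshots."""
    hits = after["demand_metadata_hits"] - before["demand_metadata_hits"]
    misses = after["demand_metadata_misses"] - before["demand_metadata_misses"]
    total = hits + misses
    return misses / total if total else 0.0


def watch(interval_s=10, samples=6):
    """Print the miss ratio for a few consecutive intervals (hypothetical helper)."""
    previous = parse_arcstats(ARCSTATS_PATH.read_text())
    for _ in range(samples):
        time.sleep(interval_s)
        current = parse_arcstats(ARCSTATS_PATH.read_text())
        ratio = demand_metadata_miss_ratio(previous, current)
        print(f"demand_metadata miss ratio: {ratio:.1%}")
        previous = current
```

Calling watch() in one shell while watching the pool throughput in another should show whether the slow phases of the scrub line up with a high miss ratio, which is the correlation I think I see in the graphs.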
While the scrub is still running, I pulled the output of
smartctl -A /dev/sdX
If I understand correctly, the last SMART test that ran was a short one. I will run a long test after the scrub is finished.
Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   222   221   021    Pre-fail  Always       -       3858
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       30
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       6582
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       26
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       37
194 Temperature_Celsius     0x0022   119   105   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   221   221   021    Pre-fail  Always       -       3916
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       30
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       6582
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       26
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       54
194 Temperature_Celsius     0x0022   116   109   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   223   223   021    Pre-fail  Always       -       3825
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       29
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       6582
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       26
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       35
194 Temperature_Celsius     0x0022   112   104   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   222   222   021    Pre-fail  Always       -       3875
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       30
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       6582
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       26
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       38
194 Temperature_Celsius     0x0022   114   106   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
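Since the four outputs are hard to compare by eye, here is a rough Python sketch that parses the attribute table printed by smartctl -A and lists only the attributes whose raw values differ between disks. It assumes the usual ten-column layout shown above; the function names and the disk labels are placeholders I made up for illustration.

```python
def parse_smart_attributes(text):
    """Map ATTRIBUTE_NAME -> RAW_VALUE for each row of a smartctl -A table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns;
        # headers and banner lines are skipped by this check.
        if len(fields) >= 10 and fields[0].isdigit():
            # Join the tail in case a drive reports a raw value with spaces.
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def differing_attributes(per_disk):
    """Given {disk_label: attrs}, return {attribute: {disk_label: raw_value}}
    for every attribute whose raw value is not identical on all disks."""
    names = sorted({name for attrs in per_disk.values() for name in attrs})
    diff = {}
    for name in names:
        values = {disk: attrs.get(name) for disk, attrs in per_disk.items()}
        if len(set(values.values())) > 1:
            diff[name] = values
    return diff
```

Fed with the four outputs above, this should flag only Spin_Up_Time, Start_Stop_Count, Load_Cycle_Count and Temperature_Celsius; every error-related counter is 0 on all disks.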
Any input is appreciated! If you need more information please ask.