Drew Heath
Explorer
- Joined
- Mar 7, 2016
- Messages
- 80
I am having an issue after upgrading to SCALE: the system becomes unresponsive (GUI/SSH/shares) after an OS disk scrub. Strangely enough, the apps that are running still function. After a reboot, the system returns to normal.
The following is listed in the GUI:
Boot pool status is DEGRADED: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
I ran an extended SMART test on the boot SSD, and the drive looks okay to me:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.142+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: SandForce Driven SSDs
Device Model: Corsair Force LS SSD
Serial Number: 16088024000104780263
Firmware Version: S9FM02.6
User Capacity: 60,022,480,896 bytes [60.0 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 (minor revision not indicated)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Dec 5 23:00:41 2022 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 30) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 2) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0/0
5 Retired_Block_Count 0x0013 100 100 050 Pre-fail Always - 0
9 Power_On_Hours_and_Msec 0x0012 100 100 000 Old_age Always - 13714h+00m+00.000s
12 Power_Cycle_Count 0x0012 100 100 000 Old_age Always - 62
162 Unknown_SandForce_Attr 0x0003 081 081 000 Pre-fail Always - 54
170 Reserve_Block_Count 0x0002 100 100 000 Old_age Always - 172
172 Erase_Fail_Count 0x0012 100 100 000 Old_age Always - 0
173 Unknown_SandForce_Attr 0x0000 100 100 000 Old_age Offline - 1704118
174 Unexpect_Power_Loss_Ct 0x0012 100 100 000 Old_age Always - 5
181 Program_Fail_Count 0x0012 100 100 000 Old_age Always - 0
187 Reported_Uncorrect 0x0012 100 100 000 Old_age Always - 1
192 Power-Off_Retract_Count 0x0012 100 100 000 Old_age Always - 5
194 Temperature_Celsius 0x0023 070 070 000 Pre-fail Always - 30
196 Reallocated_Event_Count 0x0000 100 100 000 Old_age Offline - 0
218 Unknown_SandForce_Attr 0x0000 100 100 000 Old_age Offline - 73
231 SSD_Life_Left 0x0013 100 100 000 Pre-fail Always - 99
241 Lifetime_Writes_GiB 0x0012 100 100 000 Old_age Always - 701
242 Lifetime_Reads_GiB 0x0012 100 100 000 Old_age Always - 1685
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 13699 -
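For anyone who wants to scan a report like this without reading every row, here is a minimal Python sketch that picks out error-style attributes with a non-zero raw value. The sample rows are copied from the table above; the `WATCHED` set and the function name are my own choices for illustration, not anything smartctl defines.

```python
import re

# A few rows copied from the smartctl attribute table above.
SAMPLE = """\
  5 Retired_Block_Count 0x0013 100 100 050 Pre-fail Always - 0
172 Erase_Fail_Count 0x0012 100 100 000 Old_age Always - 0
181 Program_Fail_Count 0x0012 100 100 000 Old_age Always - 0
187 Reported_Uncorrect 0x0012 100 100 000 Old_age Always - 1
196 Reallocated_Event_Count 0x0000 100 100 000 Old_age Offline - 0
"""

# Assumption: this list of "error-style" attributes is my own pick.
WATCHED = {
    "Retired_Block_Count",
    "Erase_Fail_Count",
    "Program_Fail_Count",
    "Reported_Uncorrect",
    "Reallocated_Event_Count",
}

# Columns: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(.+)$"
)


def nonzero_watched(text):
    """Return {attribute_name: raw_value} for watched attributes whose raw value is not 0."""
    hits = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        name = match.group(2)
        raw = match.group(9).strip()
        # Only plain integer raw values are compared; others (e.g. "0/0") are skipped.
        if name in WATCHED and raw.isdigit() and int(raw) != 0:
            hits[name] = int(raw)
    return hits


print(nonzero_watched(SAMPLE))
```

On the rows above, the only watched attribute with a non-zero raw value is Reported_Uncorrect, at 1.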
However, zpool status shows the boot pool as degraded:
pool: freenas-boot
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 0B in 00:02:20 with 0 errors on Thu Dec 1 03:47:21 2022
config:
NAME STATE READ WRITE CKSUM
freenas-boot DEGRADED 0 0 0
sdb DEGRADED 4 440 0 too many errors
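To make the mismatch with the SMART report easier to see, here is a minimal Python sketch that pulls the per-device error counters out of the config section. The sample text is copied from the output above; the function name and the tuple layout are hypothetical, chosen only for this example.

```python
# The config section copied from the zpool status output above.
SAMPLE = """\
NAME STATE READ WRITE CKSUM
freenas-boot DEGRADED 0 0 0
sdb DEGRADED 4 440 0 too many errors
"""


def devices_with_errors(config_text):
    """Return (name, state, read, write, cksum) for rows with any non-zero counter."""
    rows = []
    for line in config_text.splitlines():
        parts = line.split()
        # Skip the header row and anything too short to hold the three counters.
        if len(parts) < 5 or parts[0] == "NAME":
            continue
        name, state = parts[0], parts[1]
        try:
            read, write, cksum = (int(p) for p in parts[2:5])
        except ValueError:
            # Rows whose counters are not plain integers are skipped in this sketch.
            continue
        if read or write or cksum:
            rows.append((name, state, read, write, cksum))
    return rows


print(devices_with_errors(SAMPLE))
```

For the output above this reports only sdb, with 4 read errors, 440 write errors, and 0 checksum errors, while the SMART error log for the same drive shows nothing.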
I have a new drive on order, but based on the SMART results I am not sure the disk is the issue. Thank you in advance for your time and advice!