Poor disk performance

Youri Andropov

Dabbler
Joined
Apr 8, 2014
Messages
34
Hello,

I'm currently experiencing unusually poor disk transfer performance (SMB share), and I suspect a failing disk is the cause.
How can I investigate and find the faulty disk? Note that the pool is online.

My setup is 6x 8TB drives in a RAIDZ2 pool on FreeNAS-11.3-U5, with 24 GB system memory.

Thank you.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
zpool status -v

smartctl -a /dev/daX (replacing X with the disk numbers... maybe adaX in your case, depending on the type of SATA controller)

This assumes you're already running the scheduled SMART tests (long and short).

Also:
zpool iostat -v poolname (replace poolname with the name of your pool)
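The scheduled SMART tests can also be started and checked by hand. Below is a minimal sketch in POSIX sh. It assumes `adaN`-style device names, and `start_long_test` and `failed_selftests` are hypothetical helper names, not FreeNAS commands. `smartctl -t long` starts an extended test that runs inside the drive. `smartctl -l selftest` prints the self-test log, and the filter reduces it to entries whose status is anything other than "Completed without error".

```sh
#!/bin/sh
# Hypothetical helpers around smartctl; device names such as /dev/ada0 are assumed.

# Start an extended (long) self-test; the drive runs it in the background
# and it can take many hours on an 8TB disk.
start_long_test() {
    smartctl -t long "$1"
}

# Read `smartctl -l selftest` output on stdin and print only the log entries
# ("# 1", "# 2", ...) whose status is not a clean pass.
failed_selftests() {
    awk '/^# *[0-9]+/ && !/Completed without error/'
}

# Usage on the NAS, once the test has finished:
#   smartctl -l selftest /dev/ada0 | failed_selftests
```

Aborted or interrupted tests are listed as well, because only clean passes are filtered out.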
 

Youri Andropov

# zpool status -v
  pool: MEDIATHEQUE
 state: ONLINE
  scan: scrub repaired 0 in 0 days 07:47:46 with 0 errors on Sun Oct 11 07:48:04 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        MEDIATHEQUE                                     ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/181964f7-3721-11e8-86da-002590722e5d  ONLINE       0     0     0
            gptid/75f8d62e-8871-11e8-8cb7-002590722e5d  ONLINE       0     0     0
            gptid/8a57dfa2-c726-11e9-93f6-002590722e5d  ONLINE       0     0     0
            gptid/fb287aa9-0a5e-11e9-8524-002590722e5d  ONLINE       0     0     0
            gptid/1d181c76-3721-11e8-86da-002590722e5d  ONLINE       0     0     0
            gptid/2f354524-8fc7-11e9-b8f0-002590722e5d  ONLINE       0     0     0


I couldn't see any SMART error using smartctl.
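To run smartctl against the right disk, the gptid labels shown by `zpool status` have to be matched to device names. On FreeBSD-based FreeNAS, `glabel status` lists each label next to the partition it sits on (for example `ada0p2`). The sketch below is minimal and assumes that three-column layout (Name, Status, Components). `gptid_map` is a hypothetical helper name.

```sh
#!/bin/sh
# Read `glabel status` output on stdin and print "gptid/... adaN" pairs.
# The partition suffix (p1, p2, ...) is stripped so the device name can be
# passed straight to smartctl. Swap partitions also carry gptid labels, so a
# disk may appear twice; the pool members are the labels listed by zpool status.
gptid_map() {
    awk '$1 ~ /^gptid\// {
        dev = $3
        sub(/p[0-9]+$/, "", dev)   # ada0p2 -> ada0
        print $1, dev
    }'
}

# Usage on the NAS:
#   glabel status | gptid_map
```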
 

sretalla

Youri Andropov said: "I couldn't see any SMART error using smartctl."
Do you know what you're looking for?

"Test completed without error" isn't an indication that the disk has no error, just that the test was able to complete.

Also try this command:
zpool iostat -v poolname (replace poolname with the name of your pool)
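Instead of relying on the self-test status, look at the raw values of the attributes that most directly indicate media or link problems. The sketch below is minimal. It assumes the standard `smartctl -A` column layout, where RAW_VALUE is the tenth column. It also assumes a commonly used short watch list: 5 Reallocated_Sector_Ct, 187 Reported_Uncorrect, 188 Command_Timeout, 197 Current_Pending_Sector, 198 Offline_Uncorrectable and 199 UDMA_CRC_Error_Count. Both the list and the helper name `smart_watch` are assumptions, not an official threshold set.

```sh
#!/bin/sh
# Read `smartctl -A /dev/adaX` output on stdin and print ID, name and raw value
# for the watched attributes whose raw value is not zero.
# Assumed watch list: 5, 187, 188, 197, 198, 199.
smart_watch() {
    awk '$1 ~ /^(5|187|188|197|198|199)$/ && $10 != "0" { print $1, $2, $10 }'
}

# Usage on the NAS (no output means all watched counters are zero):
#   smartctl -A /dev/ada0 | smart_watch
```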
 

Youri Andropov

Here's a SMART status:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   064   006    Pre-fail  Always       -       532944
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       49
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   090   060   045    Pre-fail  Always       -       963507053
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9363h+47m+22.997s
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       49
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   051   040    Old_age   Always       -       36 (Min/Max 21/37)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       361
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       423
194 Temperature_Celsius     0x0022   036   049   000    Old_age   Always       -       36 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   100   064   000    Old_age   Always       -       532944
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       9345h+52m+00.023s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       11982601314
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       77910796405

# zpool iostat -v MEDIATHEQUE
                                                  capacity     operations    bandwidth
pool                                          alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
MEDIATHEQUE                                     16.1T  27.4T      4     42   425K   530K
  raidz2                                        16.1T  27.4T      4     42   425K   530K
    gptid/181964f7-3721-11e8-86da-002590722e5d      -      -      1     10  74.2K   174K
    gptid/75f8d62e-8871-11e8-8cb7-002590722e5d      -      -      1     10  76.8K   174K
    gptid/8a57dfa2-c726-11e9-93f6-002590722e5d      -      -      1      9  75.6K   174K
    gptid/fb287aa9-0a5e-11e9-8524-002590722e5d      -      -      1     10  74.0K   174K
    gptid/1d181c76-3721-11e8-86da-002590722e5d      -      -      1     10  74.7K   174K
    gptid/2f354524-8fc7-11e9-b8f0-002590722e5d      -      -      1      9  73.7K   174K
----------------------------------------------  -----  -----  -----  -----  -----  -----
 

sretalla

Try the iostat during and/or right after a "slow" transfer.

Right now it's showing that all disks are performing the same.
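To catch a single slow disk, sample the per-disk numbers while a transfer is actually slow. Two ways to do that are `zpool iostat -v MEDIATHEQUE 5`, which repeats every 5 seconds, and FreeBSD's `gstat -p`, which shows %busy for each physical disk. A disk sitting near 100% busy while its siblings are mostly idle is a likely suspect. The sketch below is minimal and assumes gstat's layout, where %busy is followed by the device name (sometimes joined by a `|`). `busiest_disk` is a hypothetical helper name.

```sh
#!/bin/sh
# Read gstat output on stdin and print the physical disk with the highest %busy.
# Assumes %busy is the second-to-last column and the device name the last one;
# any "|" separator is replaced by a space so both layouts split the same way.
busiest_disk() {
    awk 'BEGIN { max = -1 }
         { gsub(/[|]/, " ") }
         $NF ~ /^a?da[0-9]+$/ && $(NF - 1) + 0 > max { max = $(NF - 1) + 0; dev = $NF }
         END { if (dev != "") print dev, max }'
}

# Usage on the NAS during a slow copy (-b: print one sample and exit,
# -p: physical disks only):
#   gstat -b -p | busiest_disk
```

A single sample can be misleading, so it is worth running it a few times during the same slow transfer.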

Have a look here:

Maybe you need to run smartctl with the extra switches, since it's a Seagate drive. Those numbers look a little odd to me.
TL;DR: smartctl -v 1,hex48 -v 7,hex48 -A /dev/adaX
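The reason for the `hex48` switches is how many Seagate drives report attributes 1 and 7. Their 48-bit raw value is commonly read as two packed counters: the upper 16 bits hold an error count and the lower 32 bits hold the number of operations. Seagate does not officially document this, so treat it as an interpretation rather than a specification. Below is a minimal sketch of the split using POSIX shell arithmetic. `seagate_split` is a hypothetical helper name.

```sh
#!/bin/sh
# Split a 48-bit raw value printed with `-v N,hex48` (e.g. 0x00243d81e919)
# according to the commonly assumed Seagate layout:
#   bits 47..32 -> error count, bits 31..0 -> operation count.
seagate_split() {
    errors=$(( ($1 >> 32) & 0xffff ))
    operations=$(( $1 & 0xffffffff ))
    printf 'errors=%d operations=%d\n' "$errors" "$operations"
}
```

For example, `seagate_split 0x00243d81e919` prints `errors=36 operations=1031923993`.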
 

Youri Andropov

[Attached image: smartctl.png]
 

sretalla

You might also try it with the ECC_recovered attribute, so:

smartctl -v 1,hex48 -v 7,hex48 -v 195,hex48 -A /dev/ada5

I don't know if that's really the right way to interpret that field, but it's probably better than seeing a really big number that can't possibly be right.
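To compare all six disks side by side, the same command can be run in a loop. The sketch below is minimal and assumes the disks appear as /dev/ada0 to /dev/ada9. `hex48_rows` and `hex48_all` are hypothetical names, and the filter keeps only the three attribute rows of interest from the `smartctl -A` table.

```sh
#!/bin/sh
# Keep only attribute rows 1, 7 and 195 from `smartctl -A` output on stdin.
hex48_rows() {
    awk '$1 == "1" || $1 == "7" || $1 == "195"'
}

# Run the Seagate-specific hex48 formatting against every adaN disk.
hex48_all() {
    for d in /dev/ada[0-9]; do
        [ -e "$d" ] || continue    # the glob matched nothing
        echo "== $d"
        smartctl -v 1,hex48 -v 7,hex48 -v 195,hex48 -A "$d" | hex48_rows
    done
}
```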
 

Youri Andropov

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   064   006    Pre-fail  Always       -       0x0000001cf9f8
  7 Seek_Error_Rate         0x000f   090   060   045    Pre-fail  Always       -       0x00003970c88f
195 Hardware_ECC_Recovered  0x001a   100   064   000    Old_age   Always       -       0x0000001cf9f8


It seems fine...

Well, not quite:

# smartctl -v 1,hex48 -v 7,hex48 -v 195,hex48 -A /dev/ada3
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       0x000009137c30
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       0x00243d81e919
195 Hardware_ECC_Recovered  0x001a   117   099   000    Old_age   Always       -       0x000009137c30

# smartctl -v 1,hex48 -v 7,hex48 -v 195,hex48 -A /dev/ada1
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       0x000009c53f30
  7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always       -       0x0006422b85f5
195 Hardware_ECC_Recovered  0x001a   117   099   000    Old_age   Always       -       0x000009c53f30

# smartctl -v 1,hex48 -v 7,hex48 -v 195,hex48 -A /dev/ada0
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       0x000004643640
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always       -       0x001448d791c5
195 Hardware_ECC_Recovered  0x001a   114   099   000    Old_age   Always       -       0x000004643640


How bad can this be?


------
Model Family: Seagate Archive HDD
Device Model: ST8000AS0002-1NA17Z
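To put "how bad" in comparable terms, the packed counters can be turned into an error count per billion operations for each disk. The sketch below is minimal. It reads the `-v 1,hex48 -v 7,hex48` table on stdin and assumes the commonly cited Seagate layout (upper 16 bits: error count, lower 32 bits: operation count), which Seagate does not officially document. `seagate_rates` is a hypothetical helper name.

```sh
#!/bin/sh
# Read `smartctl -v 1,hex48 -v 7,hex48 -A` output on stdin and, for attributes
# 1 (Raw_Read_Error_Rate) and 7 (Seek_Error_Rate), print the decoded counters
# and the number of errors per billion operations (integer arithmetic).
seagate_rates() {
    while read -r id name flag value worst thresh kind updated failed raw rest; do
        case "$id" in
            1|7) ;;
            *) continue ;;
        esac
        case "$raw" in
            0x*) ;;
            *) continue ;;   # not printed as hex48, cannot be split
        esac
        errors=$(( ($raw >> 32) & 0xffff ))
        ops=$(( $raw & 0xffffffff ))
        if [ "$ops" -gt 0 ]; then
            per_billion=$(( errors * 1000000000 / ops ))
        else
            per_billion=0
        fi
        printf '%s %s errors=%d operations=%d per_billion=%d\n' \
            "$id" "$name" "$errors" "$ops" "$per_billion"
    done
}

# Usage on the NAS:
#   smartctl -v 1,hex48 -v 7,hex48 -A /dev/ada3 | seagate_rates
```

For the ada3 table above, attribute 7 decodes to 36 errors over 1,031,923,993 seeks, which is roughly 34 per billion.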
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I sometimes hate when my instincts are right.

Model Family: Seagate Archive HDD
Device Model: ST8000AS0002-1NA17Z

Going to say this is another victim of SMR. Are all six of these disks the same model identified above (Seagate Archive/ST8000AS0002)?

Any one SMR drive will drag the entire vdev down with it, but if all six are SMR, you have much higher odds of a drive being busy reshingling when you ask it to do something useful.
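To answer whether all six disks are the same model, collect the Device Model line of `smartctl -i` for every disk. The sketch below is minimal and assumes /dev/adaN device names. `model_of` and `list_models` are hypothetical helpers. The only model flagged is the ST8000AS0002 (Seagate Archive, SMR) identified in this thread.

```sh
#!/bin/sh
# Print the "Device Model" value from `smartctl -i` output on stdin.
model_of() {
    awk -F': *' '/^Device Model:/ { print $2 }'
}

# List each adaN disk with its model, marking the SMR Archive model named in this thread.
list_models() {
    for d in /dev/ada[0-9]; do
        [ -e "$d" ] || continue
        m=$(smartctl -i "$d" | model_of)
        case "$m" in
            ST8000AS0002*) note=" (Seagate Archive, SMR)" ;;
            *) note="" ;;
        esac
        echo "$d $m$note"
    done
}
```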
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994