Poor disk performance

Youri Andropov · Nov 9, 2020

Hello,

I'm currently experiencing unusually poor disk transfer performance (SMB share), and I suspect a failing disk is the cause.
How can I investigate and find the faulty disk? Note that the pool is online.

My setup is 6x 8TB drives in a RAIDZ2 pool on FreeNAS-11.3-U5, with 24 GB of system memory.

Thank you.

sretalla · Nov 9, 2020

zpool status -v

smartctl -a /dev/daX (replace X with the disk number; it may be adaX in your case, depending on the type of SATA controller)

This assumes you're already running the scheduled SMART tests (long and short).
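To avoid typing each device name by hand, a small loop can run smartctl against every disk. This is a minimal sketch, assuming FreeBSD's kern.disks sysctl (which returns the disk device names separated by spaces) and that the pool members appear as adaN or daN; list_disks is a hypothetical helper name, not something FreeNAS provides.

```sh
#!/bin/sh
# list_disks (hypothetical helper): print the ada/da disks the kernel knows
# about, skipping other devices such as cd0. kern.disks is a FreeBSD sysctl
# that returns the disk names separated by spaces.
list_disks() {
    for d in $(sysctl -n kern.disks); do
        case $d in
            ada[0-9]*|da[0-9]*) echo "$d" ;;
        esac
    done
}
# Example:
#   for d in $(list_disks); do echo "== /dev/$d"; smartctl -H -A "/dev/$d"; done
```

smartctl -H prints the overall health verdict and -A the attribute table, so the loop gives a compact per-disk view to compare side by side.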

Also:
zpool iostat -v poolname (replace poolname with the name of your pool)

Youri Andropov · Nov 9, 2020

# zpool status -v
  pool: MEDIATHEQUE
 state: ONLINE
  scan: scrub repaired 0 in 0 days 07:47:46 with 0 errors on Sun Oct 11 07:48:04 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        MEDIATHEQUE                                     ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/181964f7-3721-11e8-86da-002590722e5d  ONLINE       0     0     0
            gptid/75f8d62e-8871-11e8-8cb7-002590722e5d  ONLINE       0     0     0
            gptid/8a57dfa2-c726-11e9-93f6-002590722e5d  ONLINE       0     0     0
            gptid/fb287aa9-0a5e-11e9-8524-002590722e5d  ONLINE       0     0     0
            gptid/1d181c76-3721-11e8-86da-002590722e5d  ONLINE       0     0     0
            gptid/2f354524-8fc7-11e9-b8f0-002590722e5d  ONLINE       0     0     0

I couldn't see any SMART error using smartctl.
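zpool status reports pool members by gptid label, while smartctl needs the device name (adaN). A minimal sketch for translating one into the other, assuming glabel status prints three columns (Name, Status, Components) with the partition as the component, e.g. "gptid/<uuid>  N/A  ada3p2"; gptid_to_disk is a hypothetical helper name.

```sh
#!/bin/sh
# gptid_to_disk (hypothetical helper): map a gptid label, as printed by
# "zpool status", to the underlying disk name that smartctl expects.
# Assumes "glabel status" prints Name, Status and Components columns.
gptid_to_disk() {
    want="gptid/${1#gptid/}"          # accept the id with or without prefix
    glabel status | awk -v want="$want" '
        $1 == want {
            disk = $3
            sub(/p[0-9]+$/, "", disk) # ada3p2 -> ada3 (drop the partition)
            print disk
        }'
}
# Example:
#   for id in $(zpool status MEDIATHEQUE | awk '$1 ~ /^gptid\// {print $1}'); do
#       echo "$id -> $(gptid_to_disk "$id")"
#   done
```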

sretalla · Nov 9, 2020

Youri Andropov said:
I couldn't see any SMART error using smartctl.

Do you know what you're looking for?

"Test completed without error" isn't an indication that the disk has no error, just that the test was able to complete.

Also try that command:
zpool iostat -v poolname (replace poolname with the name of your pool)
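As for what to look for in the smartctl output: a common rule of thumb (not stated in this thread) is that attributes 5 (Reallocated_Sector_Ct), 187 (Reported_Uncorrect), 188 (Command_Timeout), 197 (Current_Pending_Sector), 198 (Offline_Uncorrectable) and 199 (UDMA_CRC_Error_Count) most often point at real media or cabling trouble, and any nonzero raw value there deserves a closer look. A minimal filter sketch, assuming the standard 10-column smartctl -A table where RAW_VALUE is the tenth field; smart_watchlist is a hypothetical helper name. It deliberately skips attributes 1, 7 and 195, whose Seagate raw values are packed and look alarming at face value, as the thread goes on to discuss.

```sh
#!/bin/sh
# smart_watchlist (hypothetical helper): keep only a short list of SMART
# attributes and mark any with a nonzero raw value. Reads "smartctl -A"
# output on stdin; RAW_VALUE is assumed to be field 10 of each row.
smart_watchlist() {
    awk '
        $1 ~ /^[0-9]+$/ &&
        ($1 == 5 || $1 == 187 || $1 == 188 || $1 == 197 || $1 == 198 || $1 == 199) {
            note = ($10 + 0 > 0) ? "  <-- nonzero" : ""
            printf "%-4s %-24s %s%s\n", $1, $2, $10, note
        }'
}
# Example: smartctl -A /dev/ada0 | smart_watchlist
```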

Youri Andropov · Nov 9, 2020

Here's the SMART output for one of the drives:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 064 006 Pre-fail Always - 532944
3 Spin_Up_Time 0x0003 092 092 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 49
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 090 060 045 Pre-fail Always - 963507053
9 Power_On_Hours 0x0032 090 090 000 Old_age Always - 9363h+47m+22.997s
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 49
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 064 051 040 Old_age Always - 36 (Min/Max 21/37)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 361
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 423
194 Temperature_Celsius 0x0022 036 049 000 Old_age Always - 36 (0 19 0 0 0)
195 Hardware_ECC_Recovered 0x001a 100 064 000 Old_age Always - 532944
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 9345h+52m+00.023s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 11982601314
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 77910796405

# zpool iostat -v MEDIATHEQUE
                                                 capacity     operations    bandwidth
pool                                          alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
MEDIATHEQUE                                   16.1T  27.4T      4     42   425K   530K
  raidz2                                      16.1T  27.4T      4     42   425K   530K
    gptid/181964f7-3721-11e8-86da-002590722e5d      -      -      1     10  74.2K   174K
    gptid/75f8d62e-8871-11e8-8cb7-002590722e5d      -      -      1     10  76.8K   174K
    gptid/8a57dfa2-c726-11e9-93f6-002590722e5d      -      -      1      9  75.6K   174K
    gptid/fb287aa9-0a5e-11e9-8524-002590722e5d      -      -      1     10  74.0K   174K
    gptid/1d181c76-3721-11e8-86da-002590722e5d      -      -      1     10  74.7K   174K
    gptid/2f354524-8fc7-11e9-b8f0-002590722e5d      -      -      1      9  73.7K   174K
----------------------------------------------  -----  -----  -----  -----  -----  -----

sretalla · Nov 9, 2020

Try the iostat during and/or right after a "slow" transfer.

Right now it's showing that all disks are performing the same.
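The per-disk comparison can be scripted. A minimal sketch, assuming the 7-column zpool iostat -v layout shown above (name, alloc, free, read ops, write ops, read bandwidth, write bandwidth) and FreeBSD-style member names (gptid/..., adaN, daN). It compares each disk's read+write bandwidth with the average of all member disks; the helper name iostat_outliers and the 25% default tolerance are arbitrary choices for illustration.

```sh
#!/bin/sh
# iostat_outliers (hypothetical helper): read "zpool iostat -v" output and
# print member disks whose read+write bandwidth differs from the average of
# all member disks by more than a tolerance (default 0.25, i.e. 25%).
iostat_outliers() {
    awk -v tol="${1:-0.25}" '
        # Turn values such as 74.2K, 1.5M or 10 into plain numbers.
        function tonum(s,    mult, unit) {
            mult = 1
            unit = substr(s, length(s), 1)
            if (unit == "K") mult = 1024
            else if (unit == "M") mult = 1024 * 1024
            else if (unit == "G") mult = 1024 * 1024 * 1024
            if (mult > 1) s = substr(s, 1, length(s) - 1)
            return s * mult
        }
        # Compare the disks collected for one sample block, then reset.
        function flush(    i, sum, mean, dev) {
            if (n == 0) return
            sum = 0
            for (i = 1; i <= n; i++) sum += bw[i]
            mean = sum / n
            for (i = 1; i <= n; i++) {
                dev = (mean > 0) ? (bw[i] - mean) / mean : 0
                if (dev > tol || dev < -tol)
                    printf "%s %+.0f%%\n", name[i], dev * 100
            }
            n = 0
        }
        /^-/ { flush(); next }             # separator lines end a block
        $1 ~ /^(gptid\/|a?da[0-9]|nvd[0-9])/ && NF >= 7 {
            n++
            name[n] = $1
            bw[n] = tonum($6) + tonum($7)
        }
        END { flush() }'
}
```

Run it with an interval while a slow transfer is in progress, e.g. zpool iostat -v MEDIATHEQUE 5 | iostat_outliers, and ignore the first block, which shows averages since boot. Keep in mind that RAIDZ spreads every write across all members, so a struggling disk may slow the whole vdev without standing out much in these numbers.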

Have a look here:

Reading SMART Data on Seagate Drives - sdx1.net

Maybe you need to run smartctl with the extra switches, since it's a Seagate drive. Those numbers look a little odd to me.
TL;DR: smartctl -v 1,hex48 -v 7,hex48 -A /dev/hda (the article uses the Linux device name; on FreeNAS use /dev/adaX)

Youri Andropov · Nov 9, 2020

sretalla · Nov 9, 2020

You might also add the Hardware_ECC_Recovered attribute (195):

smartctl -v 1,hex48 -v 7,hex48 -v 195,hex48 -A /dev/ada5

I don't know if that's really the right way to interpret that field, but it may be better than seeing a huge number that can't possibly be right.
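The hex48 value can be split by hand. The commonly cited (unofficial, not a Seagate specification) reading of these Seagate fields is that the top 16 bits count errors and the low 32 bits count operations. A minimal sketch under that assumption; seagate_hex48 is a hypothetical helper name, and the shell must do 64-bit arithmetic (bash and FreeBSD sh do).

```sh
#!/bin/sh
# seagate_hex48 (hypothetical helper): split a hex48 raw value into the two
# counters it is commonly believed to pack together (unofficial reading):
# bits 47..32 = error count, bits 31..0 = operation count.
seagate_hex48() {
    v=$(( $1 ))                           # shell arithmetic accepts 0x... input
    errors=$(( (v >> 32) & 0xFFFF ))
    ops=$(( v & 0xFFFFFFFF ))
    echo "errors=$errors ops=$ops"
}
# Example: seagate_hex48 0x00243d81e919
```

Under that reading, the Seek_Error_Rate values posted further down for ada3, ada1 and ada0 (0x0024..., 0x0006..., 0x0014...) decode to 36, 6 and 20 seek errors, while every Raw_Read_Error_Rate value posted decodes to 0 errors.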

HoneyBadger · Nov 9, 2020

Sorry if I've missed it, but @Youri Andropov, can you post the model number?

I'm seeing "Seagate" and "8TB" and I'm worried the next thing that I find out about these drives will be "Archive"

Youri Andropov · Nov 9, 2020

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 064 006 Pre-fail Always - 0x0000001cf9f8
7 Seek_Error_Rate 0x000f 090 060 045 Pre-fail Always - 0x00003970c88f
195 Hardware_ECC_Recovered 0x001a 100 064 000 Old_age Always - 0x0000001cf9f8

It seems fine...

Well, not quite:

# smartctl -v 1,hex48 -v 7,hex48 -v 195,hex48 -A /dev/ada3
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 0x000009137c30
7 Seek_Error_Rate 0x000f 075 060 030 Pre-fail Always - 0x00243d81e919
195 Hardware_ECC_Recovered 0x001a 117 099 000 Old_age Always - 0x000009137c30

# smartctl -v 1,hex48 -v 7,hex48 -v 195,hex48 -A /dev/ada1
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 0x000009c53f30
7 Seek_Error_Rate 0x000f 082 060 030 Pre-fail Always - 0x0006422b85f5
195 Hardware_ECC_Recovered 0x001a 117 099 000 Old_age Always - 0x000009c53f30

# smartctl -v 1,hex48 -v 7,hex48 -v 195,hex48 -A /dev/ada0
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 114 099 006 Pre-fail Always - 0x000004643640
7 Seek_Error_Rate 0x000f 077 060 030 Pre-fail Always - 0x001448d791c5
195 Hardware_ECC_Recovered 0x001a 114 099 000 Old_age Always - 0x000004643640

How bad is this?

------
Model Family: Seagate Archive HDD
Device Model: ST8000AS0002-1NA17Z

HoneyBadger · Nov 9, 2020

I sometimes hate when my instincts are right.

Youri Andropov said:
Model Family: Seagate Archive HDD
Device Model: ST8000AS0002-1NA17Z

Going to say this is another victim of SMR. Are all six of these disks the same model identified above (Seagate Archive/ST8000AS0002)?
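To answer that for all six disks at once, the identity lines of smartctl -i can be collected in one pass. A minimal sketch, assuming smartctl prints "Model Family:" and "Device Model:" lines as in the output quoted above; disk_model is a hypothetical helper name.

```sh
#!/bin/sh
# disk_model (hypothetical helper): read "smartctl -i" output on stdin and
# print the model family and device model on one line.
disk_model() {
    awk -F': *' '
        /^(Model Family|Device Model):/ { printf "%s%s", sep, $2; sep = " / " }
        END { print "" }'
}
# Example:
#   for d in $(sysctl -n kern.disks); do
#       printf '%s: ' "$d"; smartctl -i "/dev/$d" | disk_model
#   done
```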

Any one SMR drive will drag the entire vdev down with it, but if all six are there you have much higher odds of a drive being busy reshingling when you ask it to do something useful.

joeschmuck · Nov 9, 2020

HoneyBadger said:
I sometimes hate when my instincts are right.

Good catch! Bad for the OP.
