Should I be seeing symmetric drive usage on a RAIDZ?


scurrier (Patron; joined Jan 2, 2014; 297 messages)
I pulled four 4-year-old drives from my old NAS and plugged them into my shiny new FreeNAS box. They are Seagate 1.5TB LP (Low Power) drives, ST31500541AS. I put them in a four-drive RAIDZ2 to test performance. I used dd to test them (with compression turned off on the dataset):
Code:
dd if=/dev/zero of=/mnt/fourseagates/testfiledeleteme.dat bs=2048k count=25k
23314038784 bytes transferred in 1733.059391 secs (13452533 bytes/sec)

...and the performance is atrocious, 13 MB/s. So I looked at the drive usage report plots:
[Attached image: nonsymmetric drive usage.png (drive usage report plots)]

...hmm, da6 seems to be working harder and is nonsymmetric compared to the other drives, which appear identical. You'll also notice that da6 isn't even on the same y-scale as the other plots; it is on a smaller one.

I checked the SMART data and found that no tests had ever been run on these drives. Blame my former ignorance. But nothing seemed to be wrong with them as far as I could tell: no reallocated sectors or anything. I am running a SMART long test on them now, and we'll see what that tells me.

But in the meantime, a question for posterity: Should drive usage always appear symmetric on a very healthy array? If you were just browsing and noticed the above reports on your NAS, would you be alarmed?
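As a sanity check on the dd figure at the top of this post, the summary line can be parsed and converted to MB/s. This is a minimal Python sketch; the regex assumes the summary format shown in the post, and the helper names are hypothetical. It also shows that the byte count is exactly 11,117 records of 2 MiB (bs=2048k), well short of the 25,600 records that count=25k requests (assuming dd's k suffix means 1024 for count as well). That suggests the summary was printed before the run completed, for example because it was interrupted.

```python
import re

# Matches a dd summary line like the one quoted in the post, e.g.
# "23314038784 bytes transferred in 1733.059391 secs (13452533 bytes/sec)".
DD_SUMMARY = re.compile(r"(?P<bytes>\d+) bytes transferred in (?P<secs>[\d.]+) secs")

BLOCK_SIZE = 2048 * 1024        # bs=2048k
REQUESTED_RECORDS = 25 * 1024   # count=25k, assuming k means 1024 here too


def parse_dd_summary(line):
    """Return (bytes, seconds) from a dd summary line (hypothetical helper)."""
    m = DD_SUMMARY.search(line)
    if m is None:
        raise ValueError(f"not a dd summary line: {line!r}")
    return int(m.group("bytes")), float(m.group("secs"))


def throughput_mb_per_s(nbytes, secs):
    """Decimal megabytes (10**6 bytes) per second, matching the '13 MB/s' above."""
    return nbytes / secs / 1e6


if __name__ == "__main__":
    line = "23314038784 bytes transferred in 1733.059391 secs (13452533 bytes/sec)"
    nbytes, secs = parse_dd_summary(line)
    print(f"{throughput_mb_per_s(nbytes, secs):.2f} MB/s")
    records, remainder = divmod(nbytes, BLOCK_SIZE)
    print(f"{records} full records of {REQUESTED_RECORDS} requested, {remainder} bytes left over")
```

The bytes/sec figure dd prints is the same quantity; the sketch only makes the unit and the record count explicit.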
 

no_connection (Patron; joined Dec 15, 2013; 480 messages)
Cables are all fine?
What happens if you reorder the drives (just swap two of them)?

I usually run HDTune to see if there are any oddities or issues with the drive.
 

cyberjock (Inactive Account; joined Mar 25, 2012; 19,525 messages)
First, that chart was NEVER EVER meant to be compared in the way you are comparing. gstat is far more accurate because it provides better resolution. SMART data and SMART testing would be the first thing I'd look at.
zpool iostat 1 will give you 1-second breakdowns for pool usage.
 

scurrier (Patron)
no_connection said:
Cables are all fine?
What happens if you reorder the drives (just swap two of them)?

I usually run HDTune to see if there are any oddities or issues with the drive.
I have messed with the cables and ended up confusing myself even more. At first I thought I had found an issue with one of them, but now I don't think so. I guess that is the nature of intermittent problems like electrical issues. I ended up putting the cables back the way they were originally and then troubleshooting further. I purposely bought Nippon Labs cables, which cost about twice as much as other cables, to make sure I got quality cables. But I haven't ruled them out as the cause yet.

Here are my rough notes from the further troubleshooting I did. The SATA# references are the backplane's designations for its SATA drive bays. The four-character codes are abbreviated drive serials (the last four characters). These are all 4-year-old Seagate 1.5TB drives, and they share the controller with new Seagate 4TB NAS drives. The controller is the onboard LSI 2308 on my Supermicro X10SL7-F.
Code:
9XW028EV gives 37 then 32 MB/s when plugged into SATA4.  All others test in the 90s.
T87Y has 0 Command_Timeout and 0 UDMA_CRC_Error_Count.
9XW028EV has 2 Command_Timeout and 2 UDMA_CRC_Error_Count.
PVPH has 1 Command_Timeout and 0 UDMA_CRC_Error_Count.
Y95A has 1 Command_Timeout and 0 UDMA_CRC_Error_Count.
 
SATA2, 4, and 5 are working, and the drives PVPH, T87Y, and Y95A all appear in the GUI.  28EV is missing and is in SATA3.  The 1.5TB drives that show up are all achieving ~90 MB/s.  Serials aren't showing for the 4TB drives.
 
Without rebooting, moved 28EV to SATA6.  Tests at 85 MB/s.
Without rebooting, moved 28EV to SATA7.  Tests at 37 MB/s.
Without rebooting, moved 28EV to SATA6.  Drive is not visible to system.
Without rebooting, reinserted 28EV to SATA6.  Drive is not visible to system.
Without rebooting, reinserted 28EV to SATA6.  Drive is not visible to system.
Pulled 28EV from SATA6.  Pulled T87Y from SATA5.  Inserted T87Y to SATA6 and 28EV to SATA5.  28EV (SATA5) is detected but T87Y (SATA6) is not.  28EV (SATA5) tests at 85 MB/s.
 
Rebooted.  All drives recognized.
PVPH (SATA2) tests at 100 MB/s.
T87Y (SATA6) tests at 95 MB/s.
28EV (SATA5) tests at 86 MB/s.
Y95A (SATA4) tests at 90 MB/s.
 
Swapped drives around w/o reboot.  All are recognized.
PVPH (SATA3) tests at 98 MB/s.
T87Y (SATA5) tests at 101 MB/s.
28EV (SATA4) tests at 57 MB/s.  <---stands out as bad
Y95A (SATA2) tests at 93 MB/s.
 
Swapped drives around w/o reboot.  All are recognized.
PVPH (SATA5) tests at 93 MB/s.
T87Y (SATA2) tests at 102 MB/s.
28EV (SATA6) tests at 91 MB/s.
Y95A (SATA7) tests at 44 MB/s.  <---stands out as bad
 
Rebooted. Didn't move any drives.  I *think* the drives are assigned different names now, even though I didn't move anything.
PVPH (SATA5) tests at 91 MB/s.
T87Y (SATA2) tests at 102 MB/s.
28EV (SATA6) tests at 91 MB/s.
Y95A (SATA7) tests at 43 MB/s.  <---stands out as bad
 
Rebooted. Didn't move any drives.  I *think* the drives did not change name this time.
PVPH (SATA5) tests at 91 MB/s.
T87Y (SATA2) tests at 102 MB/s.
28EV (SATA6) tests at 90 MB/s.
Y95A (SATA7) tests at 43 MB/s.  <---stands out as bad
 
Swapped drives around w/o reboot:
PVPH (SATA7) tests at 78 MB/s.
T87Y (SATA4) tests at 92 MB/s.
28EV (SATA2) tests at 83 MB/s.
Y95A (SATA6) tests at 65 MB/s.
 
Seems like I'm getting overall slower performance now so I'll retest:
PVPH (SATA7) tests at 73 MB/s.
T87Y (SATA4) tests at 81 MB/s.
28EV (SATA2) tests at 85 MB/s.
Y95A (SATA6) tests at 64 MB/s.
 
Tested again to make sure:
PVPH (SATA7) tests at 72 MB/s.
T87Y (SATA4) tests at 83 MB/s.
28EV (SATA2) tests at 85 MB/s.
Y95A (SATA6) tests at 65 MB/s.
 
Rebooted, did not move any drives:
PVPH (SATA7) tests at 72 MB/s.
T87Y (SATA4) tests at 85 MB/s.
28EV (SATA2) tests at 87 MB/s.
Y95A (SATA6) tests at 67 MB/s.


One trend I thought I was seeing while troubleshooting with the RRD graphs was that the SATA4 and SATA7 drive bays were the only ones acting up. But it seems like maybe only certain drive/bay combinations have the issue, which makes no sense to me. I am just making myself more confused.
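One way to test the bay-versus-drive question is to flag, in each round of the notes, any result that falls well below that round's median. The Python sketch below transcribes the rounds from the notes starting at the first reboot; the round letters A through I are labels added here, not from the notes, and the 75%-of-median cutoff and the helper names are arbitrary, hypothetical choices.

```python
from collections import defaultdict
from statistics import median

# Results transcribed from the notes above: (drive, bay, MB/s).
# Round labels "A".."I" are hypothetical, added only for reference.
ROUNDS = {
    "A": [("PVPH", "SATA2", 100), ("T87Y", "SATA6", 95), ("28EV", "SATA5", 86), ("Y95A", "SATA4", 90)],
    "B": [("PVPH", "SATA3", 98), ("T87Y", "SATA5", 101), ("28EV", "SATA4", 57), ("Y95A", "SATA2", 93)],
    "C": [("PVPH", "SATA5", 93), ("T87Y", "SATA2", 102), ("28EV", "SATA6", 91), ("Y95A", "SATA7", 44)],
    "D": [("PVPH", "SATA5", 91), ("T87Y", "SATA2", 102), ("28EV", "SATA6", 91), ("Y95A", "SATA7", 43)],
    "E": [("PVPH", "SATA5", 91), ("T87Y", "SATA2", 102), ("28EV", "SATA6", 90), ("Y95A", "SATA7", 43)],
    "F": [("PVPH", "SATA7", 78), ("T87Y", "SATA4", 92), ("28EV", "SATA2", 83), ("Y95A", "SATA6", 65)],
    "G": [("PVPH", "SATA7", 73), ("T87Y", "SATA4", 81), ("28EV", "SATA2", 85), ("Y95A", "SATA6", 64)],
    "H": [("PVPH", "SATA7", 72), ("T87Y", "SATA4", 83), ("28EV", "SATA2", 85), ("Y95A", "SATA6", 65)],
    "I": [("PVPH", "SATA7", 72), ("T87Y", "SATA4", 85), ("28EV", "SATA2", 87), ("Y95A", "SATA6", 67)],
}


def flag_slow(rounds, ratio=0.75):
    """Return (round, drive, bay, MB/s) for results below ratio * that round's median."""
    slow = []
    for name, results in rounds.items():
        cutoff = ratio * median(mbps for _, _, mbps in results)
        for drive, bay, mbps in results:
            if mbps < cutoff:
                slow.append((name, drive, bay, mbps))
    return slow


def results_by_bay(rounds):
    """Collect every (drive, MB/s) result seen in each bay, in round order."""
    by_bay = defaultdict(list)
    for results in rounds.values():
        for drive, bay, mbps in results:
            by_bay[bay].append((drive, mbps))
    return dict(by_bay)


if __name__ == "__main__":
    for entry in flag_slow(ROUNDS):
        print("slow:", *entry)
    bays = results_by_bay(ROUNDS)
    for bay in ("SATA4", "SATA7"):
        print(bay, bays[bay])
```

With this data the flagged results are 28EV in SATA4 and Y95A in SATA7, while PVPH in SATA7 and T87Y in SATA4 stay above the cutoff. That is consistent with the suspicion that particular drive/bay pairings, rather than whole bays, are the slow ones.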


cyberjock said:
First, that chart was NEVER EVER meant to be compared in the way you are comparing. gstat is far more accurate because it provides better resolution. SMART data and SMART testing would be the first thing I'd look at.
zpool iostat 1 will give you 1-second breakdowns for pool usage.

I will look into the commands you mentioned.

Any other good diagnostics that I could try? I am really at a loss at this point.
 

scurrier (Patron)
I should also mention that I am experiencing an issue where the serials do not appear in the "View Disks" screen, even though I can get the serials via SMART commands. Not sure whether the two issues are related.
 

scurrier (Patron)
I moved the four disks to the other onboard controller. None of them performed very poorly relative to the others, but performance still wasn't great:

I put them into a striped mirror (the RAID 10 equivalent) and got the following zpool iostat 1 output while running dd if=/dev/zero of=/mnt/testraid10/deleteme1.dat bs=2048k count=50k. Notice how the speed swings up and down. This matches my many other observations and troubles with these disks. Maybe the disks just weren't cut out for this type of workload. Is this normal? Am I running a valid dd test?
Code:
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0  1.40K      0  177M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0  1.06K      0  134M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0  1.33K      0  169M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0  1.03K      0  130M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0  1.57K      0  200M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    944      0  116M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    574      0  71.3M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    663      0  82.6M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    870      0  108M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    826      0  103M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    885      0  110M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    695      0  75.7M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    810      0  101M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    728      0  91.0M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    894      0  112M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    855      0  107M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    634      0  79.3M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    779      0  97.4M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    399      0  49.9M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    370      0  46.3M
----------  -----  -----  -----  -----  -----  -----
testraid10  78.5G  2.64T      0    621      0  72.4M
----------  -----  -----  -----  -----  -----  -----
testraid10  78.5G  2.64T      0  1.35K      0  171M
----------  -----  -----  -----  -----  -----  -----
testraid10  78.5G  2.64T      0  1.23K      0  155M
----------  -----  -----  -----  -----  -----  -----
testraid10  78.5G  2.64T      0  1.36K      0  171M
----------  -----  -----  -----  -----  -----  -----
testraid10  78.5G  2.64T      0    991      0  122M
----------  -----  -----  -----  -----  -----  -----
testraid10  78.5G  2.64T      0  1.15K      0  145M
----------  -----  -----  -----  -----  -----  -----
testraid10  78.5G  2.64T      0  1.28K      0  161M
----------  -----  -----  -----  -----  -----  -----
testraid10  78.5G  2.64T      0  1.38K      0  174M
----------  -----  -----  -----  -----  -----  -----
testraid10  78.5G  2.64T      0    774      0  94.0M
----------  -----  -----  -----  -----  -----  -----
testraid10  78.5G  2.64T      0    785      0  85.9M
----------  -----  -----  -----  -----  -----  -----
testraid10  81.2G  2.64T      0  1.23K      0  155M
----------  -----  -----  -----  -----  -----  -----
testraid10  81.2G  2.64T      0  1.29K      0  163M
----------  -----  -----  -----  -----  -----  -----
testraid10  81.2G  2.64T      0  1.27K      0  160M
----------  -----  -----  -----  -----  -----  -----
testraid10  81.2G  2.64T      0  1.32K      0  167M
----------  -----  -----  -----  -----  -----  -----
testraid10  81.2G  2.64T      0  1.17K      0  148M
----------  -----  -----  -----  -----  -----  -----
testraid10  81.2G  2.64T      0    973      0  120M
----------  -----  -----  -----  -----  -----  -----
testraid10  81.2G  2.64T      0    815      0  99.0M
----------  -----  -----  -----  -----  -----  -----
testraid10  81.2G  2.64T      0    598      0  63.5M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0  1.18K      0  148M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0  1.25K      0  157M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0  1.48K      0  186M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0  1.24K      0  156M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0  1.34K      0  169M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0  1.29K      0  163M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0    877      0  107M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0    241      0  29.7M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0    248      0  30.6M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0    828      0  92.0M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0    946      0  117M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0  1.02K      0  128M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0  1.69K      0  214M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0  1.30K      0  164M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0  1.15K      0  147M
----------  -----  -----  -----  -----  -----  -----
testraid10  82.2G  2.64T      0    716      0  89.2M
----------  -----  -----  -----  -----  -----  -----
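To put numbers on the swings, the samples can be parsed and summarized. The Python sketch below assumes the default zpool iostat column order (pool, alloc, free, read ops, write ops, read bandwidth, write bandwidth) and treats the K/M/G/T suffixes as powers of 1024, which is an assumption about how zpool formats these numbers. It uses only the first eight samples above as input; the function names are hypothetical.

```python
from statistics import mean, pstdev

# First eight samples from the zpool iostat 1 output above.
SAMPLE = """\
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0  1.40K      0  177M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0  1.06K      0  134M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0  1.33K      0  169M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0  1.03K      0  130M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0  1.57K      0  200M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    944      0  116M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    574      0  71.3M
----------  -----  -----  -----  -----  -----  -----
testraid10  77.4G  2.64T      0    663      0  82.6M
"""

# Suffix multipliers, assuming zpool prints 1024-based units.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a zpool iostat number such as '1.40K', '177M' or '0' to a float."""
    suffix = text[-1] if text[-1] in "KMGT" else ""
    number = text[:-1] if suffix else text
    return float(number) * UNITS[suffix]


def write_bandwidth_mib(text):
    """Return write bandwidth in MiB/s for each data row, skipping separator rows."""
    values = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or fields[0].startswith("-"):
            continue
        values.append(parse_size(fields[6]) / 1024**2)
    return values


def summarize(values):
    """Mean, min, max and population standard deviation of the samples."""
    return {"mean": mean(values), "min": min(values), "max": max(values), "stdev": pstdev(values)}


if __name__ == "__main__":
    stats = summarize(write_bandwidth_mib(SAMPLE))
    print(", ".join(f"{name} {value:.1f} MiB/s" for name, value in stats.items()))
```

Pasting the whole capture into SAMPLE summarizes the full run; a standard deviation that is large relative to the mean is one way to quantify the up-and-down behavior.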
 

scurrier (Patron)
Here's some more; this might be easier to read.
Code:
testraid10  32.4G  2.69T      0    970      0  110M
testraid10  32.4G  2.69T      0  1.35K      0  173M
testraid10  32.4G  2.69T      0    685      0  85.6M
testraid10  32.4G  2.69T      0    483      0  60.4M
testraid10  32.4G  2.69T      0    148      0  18.6M
testraid10  33.5G  2.69T      0  1.18K      0  147M
testraid10  33.5G  2.69T      0  1.30K      0  164M
testraid10  33.5G  2.69T      0  1.21K      0  152M
testraid10  33.5G  2.69T      0  1.35K      0  171M
testraid10  33.5G  2.69T      0  1.32K      0  167M
testraid10  33.5G  2.69T      0  1.29K      0  162M
testraid10  33.5G  2.69T      0  1.16K      0  146M
testraid10  33.5G  2.69T      0  1.21K      0  153M
testraid10  33.5G  2.69T      0    996      0  121M
testraid10  33.5G  2.69T      0    613      0  75.6M
testraid10  36.2G  2.68T      0    708      0  77.5M
testraid10  36.2G  2.68T      0  1.29K      0  163M
testraid10  36.2G  2.68T      0  1.30K      0  164M
testraid10  36.2G  2.68T      0  1.25K      0  158M
testraid10  36.2G  2.68T      0  1.37K      0  173M
testraid10  36.2G  2.68T      0  1.28K  4.00K  161M
testraid10  36.2G  2.68T      0  1.25K      0  158M
testraid10  36.2G  2.68T      0  1.17K      0  146M
testraid10  36.2G  2.68T      0    558      0  68.7M
testraid10  37.4G  2.68T      0    932      0  105M
testraid10  37.4G  2.68T      0  1.04K      0  131M
testraid10  37.4G  2.68T      0  1.23K      0  155M
testraid10  37.4G  2.68T      0  1.25K      0  157M
testraid10  37.4G  2.68T      0  1.39K      0  175M
testraid10  37.4G  2.68T      0  1.36K      0  172M
testraid10  37.4G  2.68T      0  1.20K      0  150M
testraid10  37.4G  2.68T      0    528      0  65.1M
testraid10  37.4G  2.68T      0  1.06K      0  124M
testraid10  37.4G  2.68T      0  1.23K  1.58K  155M
testraid10  37.4G  2.68T      0    817      0  101M
testraid10  37.4G  2.68T      0    315      0  39.0M
testraid10  37.4G  2.68T      0  1.82K      0  231M
testraid10  37.4G  2.68T      0    707      0  87.8M
testraid10  37.4G  2.68T      0  1.03K      0  131M
testraid10  37.4G  2.68T      0  1.12K      0  142M
testraid10  37.4G  2.68T      0    906      0  113M
testraid10  37.4G  2.68T      0    772      0  95.9M
testraid10  37.4G  2.68T      0    801      0  98.2M
testraid10  37.4G  2.68T      0    770      0  95.9M
testraid10  37.4G  2.68T      0    743      0  92.5M
testraid10  37.4G  2.68T      0    683      0  85.2M
testraid10  37.4G  2.68T      0    670      0  83.4M
testraid10  37.4G  2.68T      0    638      0  79.6M
testraid10  37.4G  2.68T      0    736      0  91.7M
testraid10  37.4G  2.68T      0    496      0  52.0M
testraid10  37.4G  2.68T      0  1007      0  126M
testraid10  37.4G  2.68T      0    787      0  98.4M
testraid10  37.4G  2.68T      0    371      0  46.3M
testraid10  37.4G  2.68T      0    488      0  61.1M
testraid10  37.4G  2.68T      0    295      0  37.0M
testraid10  37.4G  2.68T      0    398      0  49.8M
testraid10  37.4G  2.68T      0    310      0  38.8M
testraid10  38.5G  2.68T      0    889      0  104M
testraid10  38.5G  2.68T      0  1.24K      0  157M
testraid10  38.5G  2.68T      0  1.33K      0  168M
testraid10  38.5G  2.68T      0  1.15K      0  145M
testraid10  38.5G  2.68T      0  1.30K      0  165M
testraid10  38.5G  2.68T      0  1.29K      0  163M
testraid10  38.5G  2.68T      0  1.02K      0  129M
testraid10  38.5G  2.68T      0  1.29K      0  161M
testraid10  38.5G  2.68T      0    648      0  79.8M
testraid10  41.1G  2.68T      0    783      0  86.3M
testraid10  41.1G  2.68T      0  1.36K      0  171M
testraid10  41.1G  2.68T      0  1.40K      0  177M
testraid10  41.1G  2.68T      0  1.28K      0  161M
testraid10  41.1G  2.68T      0  1.36K      0  172M
testraid10  41.1G  2.68T      0  1.33K      0  168M
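In this capture the alloc column changes in steps (32.4G, 33.5G, 36.2G, ...), and the slowest seconds tend to fall right before a step. The Python sketch below groups consecutive samples by alloc value and reports each group's length, final write bandwidth and median write bandwidth, using the first 15 samples above. Reading an alloc step as a transaction group being committed is an assumption, not something the output itself states; the helper names are hypothetical.

```python
from itertools import groupby
from statistics import median

# First 15 samples from the second zpool iostat capture above.
SAMPLE = """\
testraid10  32.4G  2.69T      0    970      0  110M
testraid10  32.4G  2.69T      0  1.35K      0  173M
testraid10  32.4G  2.69T      0    685      0  85.6M
testraid10  32.4G  2.69T      0    483      0  60.4M
testraid10  32.4G  2.69T      0    148      0  18.6M
testraid10  33.5G  2.69T      0  1.18K      0  147M
testraid10  33.5G  2.69T      0  1.30K      0  164M
testraid10  33.5G  2.69T      0  1.21K      0  152M
testraid10  33.5G  2.69T      0  1.35K      0  171M
testraid10  33.5G  2.69T      0  1.32K      0  167M
testraid10  33.5G  2.69T      0  1.29K      0  162M
testraid10  33.5G  2.69T      0  1.16K      0  146M
testraid10  33.5G  2.69T      0  1.21K      0  153M
testraid10  33.5G  2.69T      0    996      0  121M
testraid10  33.5G  2.69T      0    613      0  75.6M
"""

# Multipliers to MiB for the write-bandwidth column (assumed 1024-based units).
TO_MIB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def rows(text):
    """Yield (alloc, write MiB/s) pairs from zpool iostat data lines."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue
        alloc, write = fields[1], fields[6]
        if write[-1] in TO_MIB:
            mib = float(write[:-1]) * TO_MIB[write[-1]]
        else:
            mib = float(write) / 1024**2  # bare number of bytes, e.g. "0"
        yield alloc, mib


def runs_by_alloc(text):
    """Group consecutive samples sharing an alloc value: (alloc, count, last, median)."""
    result = []
    for alloc, group in groupby(rows(text), key=lambda r: r[0]):
        writes = [w for _, w in group]
        result.append((alloc, len(writes), writes[-1], median(writes)))
    return result


if __name__ == "__main__":
    for alloc, count, last, med in runs_by_alloc(SAMPLE):
        print(f"alloc {alloc}: {count} samples, last {last} MiB/s, median {med} MiB/s")
```

For these samples the last second of each alloc group is its slowest (18.6 and 75.6 MiB/s against medians of 85.6 and 152.5 MiB/s).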
 

cyberjock (Inactive Account)
Not a clue since you are just posting random command output and providing ZERO information on what you are doing. You're basically saying "see.. look at my broken shit" with no context at all.

And like I said before... SMART is what I'd look at first, yet you provided nothing on that...
 

HoneyBadger ("actually does care"; Administrator, Moderator, iXsystems; joined Feb 6, 2014; 5,112 messages)
As cyberjock suggests, run smartctl -a (device) on each drive and post the results for each, in separate code blocks or on Pastebin.

I'm not sure what the idle-sleep timer is on the Seagate Greens, but generally speaking, "low power" equals "low performance."
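When comparing smartctl -a output across several drives, it can help to pull out the few attributes discussed in this thread and line them up. The Python sketch below parses the attribute table of smartctl's plain-text output; it assumes the ten-column layout shown in the dumps later in this thread, with RAW_VALUE as the last column (which may contain spaces). The WATCH list and the helper names are illustrative choices, not a recommendation from the posters.

```python
from typing import NamedTuple

# Excerpt of the attribute table from the 6XW0PVPH dump later in the thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   080   056   045    Old_age   Always       -       20 (Min/Max 18/24)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

# Attributes mentioned in this thread (illustrative selection).
WATCH = ["Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable",
         "Command_Timeout", "UDMA_CRC_Error_Count"]


class Attribute(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: str


def parse_attributes(text):
    """Parse smartctl attribute-table rows into {name: Attribute}; other lines are skipped."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)  # at most 10 fields; RAW_VALUE keeps its spaces
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[fields[1]] = Attribute(int(fields[0]), fields[1], int(fields[3]),
                                     int(fields[4]), int(fields[5]), fields[9].strip())
    return attrs


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    for name in WATCH:
        attr = attrs.get(name)
        print(f"{name:24} {attr.raw if attr else 'n/a'}")
```

Running it once per drive's dump gives a compact side-by-side view instead of scanning each full listing.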
 

scurrier (Patron)
cyberjock said:
And like I said before... SMART is what I'd look at first, yet you provided nothing on that...

Sorry. I looked at SMART myself but did not see much besides a few Command_Timeout and UDMA_CRC_Error_Count errors. The counts are very low, which isn't alarming to me considering these drives have been running for four years. I would appreciate others' takes on this data, though. Here's the SMART output for the disks.
Code:
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda LP
Device Model:     ST31500541AS
Serial Number:    6XW0PVPH
LU WWN Device Id: 5 000c50 01b863741
Firmware Version: CC35
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5900 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Mar 11 16:03:52 2014 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  653) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 380) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       82231326
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2868
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   080   060   030    Pre-fail  Always       -       103585853
  9 Power_On_Hours          0x0032   061   061   000    Old_age   Always       -       34888
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       67
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       1
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   080   056   045    Old_age   Always       -       20 (Min/Max 18/24)
194 Temperature_Celsius     0x0022   020   044   000    Old_age   Always       -       20 (0 15 0 0 0)
195 Hardware_ECC_Recovered  0x001a   045   038   000    Old_age   Always       -       82231326
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       49164490648223
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       3500702345
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1956796253
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     34836         -
 
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
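The very large raw values for Raw_Read_Error_Rate and Seek_Error_Rate in the dump above look alarming, but on Seagate drives they are commonly explained as packed counters: the low 32 bits hold a total operation count and the bits above that hold the error count. This interpretation comes from community analysis rather than official Seagate documentation, so treat it as an assumption. Under that reading, the Python sketch below splits a raw value into its two parts; both raw values in this dump are below 2^32, which would mean zero recorded errors. The function name is hypothetical.

```python
def split_seagate_raw(raw):
    """Split a 48-bit Seagate raw value into (errors, operations).

    Assumes the community-reported layout: bits above 32 = error count,
    low 32 bits = total operations. Not an official Seagate specification.
    """
    if not 0 <= raw < 1 << 48:
        raise ValueError("expected a 48-bit raw value")
    return raw >> 32, raw & 0xFFFFFFFF


if __name__ == "__main__":
    # Raw values copied from the 6XW0PVPH dump above.
    for name, raw in [("Raw_Read_Error_Rate", 82231326), ("Seek_Error_Rate", 103585853)]:
        errors, ops = split_seagate_raw(raw)
        print(f"{name}: {errors} errors in {ops} operations")
```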


Code:
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda LP
Device Model:     ST31500541AS
Serial Number:    6XW0T87Y
LU WWN Device Id: 5 000c50 01ba868a4
Firmware Version: CC35
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5900 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Mar 11 16:07:18 2014 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  643) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 370) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       164601148
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1973
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       81456548
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       25769
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       56
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   097   097   000    Old_age   Always       -       3
190 Airflow_Temperature_Cel 0x0022   080   058   045    Old_age   Always       -       20 (Min/Max 19/23)
194 Temperature_Celsius     0x0022   020   042   000    Old_age   Always       -       20 (0 17 0 0 0)
195 Hardware_ECC_Recovered  0x001a   051   039   000    Old_age   Always       -       164601148
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       86616605467331
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2529618934
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1258677798
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     25715         -
 
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Code:
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda LP
Device Model:     ST31500541AS
Serial Number:    9XW028EV
LU WWN Device Id: 5 000c50 01a132819
Firmware Version: CC35
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5900 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Mar 11 16:07:57 2014 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  653) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 383) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       185202905
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2824
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   079   060   030    Pre-fail  Always       -       95325169
  9 Power_On_Hours          0x0032   061   061   000    Old_age   Always       -       34880
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       67
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       2
189 High_Fly_Writes         0x003a   085   085   000    Old_age   Always       -       15
190 Airflow_Temperature_Cel 0x0022   081   054   045    Old_age   Always       -       19 (Min/Max 18/22)
194 Temperature_Celsius     0x0022   019   046   000    Old_age   Always       -       19 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   050   030   000    Old_age   Always       -       185202905
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       2
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       15169824500633
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       3599353894
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       107579849
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     34827         -
 
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Code:
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda LP
Device Model:     ST31500541AS
Serial Number:    6XW0Y95A
LU WWN Device Id: 5 000c50 01ede4991
Firmware Version: CC35
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5900 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Mar 11 16:08:32 2014 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  653) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 390) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       148512518
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2058
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   080   060   030    Pre-fail  Always       -       109049767
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       25658
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       36
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       1
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   080   055   045    Old_age   Always       -       20 (Min/Max 19/23)
194 Temperature_Celsius     0x0022   020   045   000    Old_age   Always       -       20 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   051   045   000    Old_age   Always       -       148512518
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       269869975085892
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2003166542
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2369612417
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     25604         -
 
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


I'm not sure what the idle-sleep timer is on the Seagate Greens, but generally speaking "low power" equals "low performance".
My understanding is that green drives are kind of a myth. Yes, there is some power to be saved by running at a lower RPM, but that translates into higher rotational latency and lost performance, and they really don't save much power. I think they take longer to spin up too, because they try to limit the maximum current draw during spin-up. If I could buy again, I probably wouldn't get low-power drives, since the power savings aren't worth the performance loss.
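The rotational-latency point can be put in numbers. Average rotational latency is the time for half a revolution, which is 30000 / RPM milliseconds. A small sh/awk sketch; 5900 rpm is the rate smartctl reports for these ST31500541AS drives, while 7200 rpm is only an assumed comparison point:

```sh
#!/bin/sh
# Average rotational latency = half a revolution
#   = 0.5 * (60 s / RPM) = 30000 / RPM milliseconds.
# Only spindle speed is modelled; seek time and transfer rate are ignored.
avg_rot_latency_ms() {
    awk -v rpm="$1" 'BEGIN { printf "%.2f\n", 30000 / rpm }'
}

# Compare the 5900 rpm LP drives with a typical 7200 rpm drive.
for rpm in 5900 7200; do
    echo "$rpm rpm: $(avg_rot_latency_ms "$rpm") ms"
done
```

That works out to roughly 5.1 ms versus 4.2 ms per random access. The gap matters mostly for random I/O; a long sequential dd run is dominated by transfer rate rather than rotational latency.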

That being said, given what these drives are, I'm OK with low performance. What I'm not OK with is getting less out of the drives than they are capable of. That's why I'm trying to figure out why they show this whiplash behavior, where they shoot up in speed and then slow down. It doesn't seem right or normal. Maybe it has something to do with the cache? I don't know. Maybe I should figure out how to disable NCQ? Again, I don't know. I appreciate anyone's comments on this.

Thanks guys.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Not a clue, but almost every disk has a non-zero value for "command timeout". That's a bad thing and indicative of some kind of problem. Whether that problem is hardware or software (i.e. a driver or a misconfigured BIOS setting) is anyone's guess. They are relatively old, though, all being over 25k hours. Other than the command timeouts, they appear to be in good shape. I've never seen a non-zero value for command timeout from anyone here before (and I've seen a lot of disks).
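Checking those two counters across all four disks is easy to script. A minimal sketch, assuming `smartctl -A` output in the attribute-table format pasted above (name in the second column, RAW_VALUE last); the function name `smart_counters` is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print the raw values of Command_Timeout and
# UDMA_CRC_Error_Count from `smartctl -A` output read on stdin.
# Field 2 is ATTRIBUTE_NAME; for these two attributes RAW_VALUE is the last field.
smart_counters() {
    awk '$2 == "Command_Timeout" || $2 == "UDMA_CRC_Error_Count" { print $2, $NF }'
}

# Usage (as root on the FreeNAS box; adjust device names to your system):
#   for d in ada0 ada1 ada2 ada3; do
#       echo "== $d"; smartctl -A /dev/$d | smart_counters
#   done
```

Running it periodically shows whether the raw values are climbing; a counter that keeps increasing is usually more telling than a small, static value on a drive with 25k+ hours.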

And despite your fears of lower performance, those 4 drives should be able to saturate Gb LAN. So unless you plan to run 10Gb LAN, those drives aren't going to be your bottleneck; your NIC is.
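The reasoning behind that claim, as a rough sketch: in a RAIDZ vdev, large sequential transfers are spread across the data disks, so streaming throughput is roughly (disks - parity) times one disk's sequential rate. The 80 MB/s per-disk figure below is an assumption for old 5900 rpm drives, not a measurement, and `raidz_estimate` is a made-up name:

```sh
#!/bin/sh
# Back-of-envelope streaming estimate for a single RAIDZ vdev:
#   (disks - parity disks) * per-disk sequential MB/s
# Ignores ZFS overhead, record size and seeks, so treat it as a rough ceiling.
raidz_estimate() {
    disks=$1; parity=$2; per_disk=$3
    echo $(( (disks - parity) * per_disk ))
}

gbe=125                          # 1 Gb/s / 8 bits per byte, before protocol overhead
est=$(raidz_estimate 4 2 80)     # 4-disk RAIDZ2, assumed 80 MB/s per disk
echo "RAIDZ2 estimate: $est MB/s vs. gigabit ceiling: $gbe MB/s"
```

Even with conservative inputs the estimate lands above what gigabit Ethernet can carry, and far above the 13 MB/s the local dd test measured.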

To be honest, my first guess is user error of some sort. By user error I mean something isn't configured properly somewhere: the BIOS, your RAID/IT controller, or FreeNAS. I'm not sure which, because there are so many ways you could have something wrong that I don't even know where to start narrowing down the problem. It could even be some setting I'd never consider to be out of place. It happens.

Don't take this personally, but it's far easier to get stuff wrong than to get stuff right. At least once a week someone shows up here who has set some setting to something so incredibly stupid that I couldn't have guessed it given 100 guesses and an offer of 1 million dollars.

What I'd do next is:

1. Destroy your pool (if you can).
2. Open 4 ssh windows and run dd if=/dev/XXXX of=/dev/null bs=1m on all 4 disks at the same time. See what gstat shows for the disks (a screenshot or paste here would be nice). This will simply prove that the 4 disks can provide good speed individually (and they'd better, since this test is their bread and butter for setting benchmarks).
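If juggling four ssh windows is awkward, the same parallel read test can run from one session. A sketch with a hypothetical `parallel_read` helper; it only reads from the devices, and the device names in the usage comment are examples to replace with your own:

```sh
#!/bin/sh
# Hypothetical helper mirroring the "4 ssh windows" test: read each device
# in parallel and discard the data. It only reads; nothing is written to disk.
# Usage: parallel_read COUNT|all DEVICE...
#   e.g. parallel_read all /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3
# Meanwhile, watch per-disk throughput from another session,
# e.g. gstat -f 'ada[0-3]$'
parallel_read() {
    count=$1; shift
    pids=""
    for dev in "$@"; do
        if [ "$count" = all ]; then
            # bs=1048576 is 1 MiB; FreeBSD's dd also accepts bs=1m
            dd if="$dev" of=/dev/null bs=1048576 &
        else
            dd if="$dev" of=/dev/null bs=1048576 count="$count" &
        fi
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1   # remember whether any dd failed
    done
    return "$status"
}
```

The return status tells you whether any dd failed, for example because of a mistyped device name, and each dd prints its own throughput summary when it finishes.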
 

scurrier

Patron
Joined
Jan 2, 2014
Messages
297

Here are some gstat screenshots taken while running the dd command on all disks, as you suggested.

[gstat screenshot 1]

[gstat screenshot 2]

[gstat screenshot 3]

That third screenshot shows something that I noticed happening every 10 seconds or so when I first started the test. Disk ada2 would hiccup like you can see in the screenshot. I happened to catch the worst one I ever saw in this third screenshot. Most of them were more like dips to 65000 or 75000 kBps.

Now that I'm about 40 minutes in, the disks seem to run along pretty evenly, each independently slowing down or speeding up by +/- 10000 kBps or so.

But I did see something strange about 30 minutes ago. I didn't catch it in a screenshot, but I'll reproduce it here.

Code:
dT: 1.001s  w: 1.000s  filter: ada?.$
L(q)  ops/s    r/s  kBps  ms/r    w/s  kBps  ms/w  %busy Name
    1    651    651  83373    1.5      0      0    0.0  99.7| ada0
    1    781    781  99996    1.3      0      0    0.0  99.5| ada1
*
    1    746    746  95520    1.3      0      0    0.0  99.6| ada3


Again, ada2 seems to be behaving differently than the others. Seems pretty damning for ada2 (serial 9XW028EV). I looked back at the SMART data and that is the only disk that had a non-zero UDMA_CRC_Error_Count. When I posted the SMART data it had a value of 2 and it still has a value of 2. Same for the Command_Timeout parameter.
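Spotting the odd disk out can also be done mechanically rather than by eye. A rough sketch, assuming text in the gstat layout pasted above (copied from the screen or from batch output, not the live curses display); the `slow_disks` name and the 0.8 threshold are arbitrary choices:

```sh
#!/bin/sh
# Hypothetical helper: read one gstat snapshot on stdin and print any disk
# whose read throughput is below FRACTION (default 0.8) of the snapshot mean.
# In gstat's default layout, field 4 is read kBps and the last field is the name.
slow_disks() {
    awk -v frac="${1:-0.8}" '
        BEGIN { n = 0; sum = 0 }
        $NF ~ /^a?da[0-9]+$/ { name[n] = $NF; kbps[n] = $4 + 0; sum += $4; n++ }
        END {
            if (n == 0) exit 1          # no disk rows found
            mean = sum / n
            for (i = 0; i < n; i++)
                if (kbps[i] < frac * mean) print name[i], kbps[i]
        }'
}
```

A single snapshot can't tell a struggling disk from a momentary hiccup, so this is more useful run against many snapshots; a disk that keeps showing up is the one to look at.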

The only thing I can think to do now is to let FreeNAS grind on these disks for a while and see whether that forces a failure or a significant change in the SMART data. I could also try moving them to different bays to see if their behavior changes. I'm not married to these disks, since they are 4 years old, but I would still like to use them if it's feasible. Maybe I could relegate them to a less critical application.

Is there any way to do a longer test that will somehow let me plot a time history?
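One way to get a time history is to sample gstat at a fixed interval and append the per-disk numbers to a CSV file, which a spreadsheet or gnuplot can then plot. A sketch under assumptions: it relies on gstat's batch mode (`-b`) and `-I` interval option, which may differ between FreeBSD releases (check gstat(8) on yours), and the helper names and output path are made up:

```sh
#!/bin/sh
# Hypothetical helper: turn gstat disk rows read on stdin into CSV lines
#   epoch,disk,read_kBps,write_kBps,busy_pct
# using the timestamp passed as $1. Assumes gstat's default column layout.
rows_to_csv() {
    awk -v t="$1" '$NF ~ /^a?da[0-9]+$/ {
        busy = $9; sub(/[|]$/, "", busy)   # drop the "|" glued to %busy
        print t "," $NF "," $4 "," $7 "," busy
    }'
}

# Hypothetical logger: one gstat batch sample per INTERVAL seconds, forever.
# Usage: log_disks 10 'ada[0-3]$' /tmp/disks.csv   (stop with Ctrl-C)
log_disks() {
    interval=$1; filter=$2; out=$3
    echo "epoch,disk,read_kBps,write_kBps,busy_pct" >> "$out"
    while :; do
        now=$(date +%s)
        gstat -b -I "${interval}s" -f "$filter" | rows_to_csv "$now" >> "$out"
    done
}
```

Plotting read_kBps per disk over an hour or two should make a recurring dip on one disk, like the ada2 hiccups described above, stand out clearly.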
 

Attachments

  • 2014-03-11 21_16_05-root@freenas_~.png (4.6 KB)
  • 2014-03-11 21_16_31-root@freenas_~.png (4.7 KB)
  • 2014-03-11 21_18_15-root@freenas_~.png (4.5 KB)

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
UDMA errors normally don't cause problems like you are seeing. A disk with that many hours on it is going to have a few UDMA errors. That's pretty much par for the course.

If you can, I'd try creating a 3-disk RAIDZ1 without ada2 and see how performance changes. My guess is it'll perform fine and you'll have basically validated that ada2 (or something unique to ada2) is the problem.
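To compare a 3-disk RAIDZ1 against the original 4-disk RAIDZ2 on the same footing, it helps to reduce each dd run to one number. A small sketch; the helper name is made up, and the parsing assumes FreeBSD's dd summary format as quoted at the start of the thread (GNU dd on Linux words its summary differently):

```sh
#!/bin/sh
# Hypothetical helper: convert FreeBSD dd's summary line
#   "N bytes transferred in S secs (R bytes/sec)"
# into MB/s (1 MB = 10^6 bytes). Field 1 is bytes, field 5 is seconds.
dd_mbps() {
    awk '/bytes transferred in/ { printf "%.1f MB/s\n", $1 / $5 / 1000000 }'
}

# dd prints its summary on stderr, hence 2>&1. Keep compression off on the
# test dataset, as in the original test. Pool and file names are placeholders:
#   dd if=/dev/zero of=/mnt/testpool/testfile bs=2048k count=25k 2>&1 | dd_mbps
```

Fed the summary from the original RAIDZ2 run, this reports 13.5 MB/s, which gives a baseline for the 3-disk pool.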
 

scurrier

Patron
Joined
Jan 2, 2014
Messages
297
Thanks. I'll follow up on this after I finish using them to temporarily move some data around.
 