Questions about SolNet Drive Burn-In

Status
Not open for further replies.

tealcomp

Explorer
Joined
Sep 5, 2016
Messages
59
Hi Fellow Members:

I have been diligently burning in my hardware for about the past month, and everything was going quite well with all of the various tests. Once I successfully completed the SMART testing (short, long, conveyance) and badblocks, I decided to put some mileage on the new drives with the solnet array test.

The very first time I ran it, I hit a few issues. I have the Fractal Design 804, which, as most of you know, has 3 fans controlled by a built-in three-way speed switch. I noticed that when I changed the fan speed with the switch on the back, it would trigger some sense errors on reads for the drives attached to the M1015. I was into the last part of the test, but decided to abort and investigate. While I am quite sure the problem was electrical interference, I have never seen this kind of issue before. I ended up re-routing the cables away from that switch, and from what I can tell, that has resolved the issue.

With all of that said, I restarted the solnet test. From what I can tell, everything looked fine up until the last test, where I saw a mix of SLOW and FAST results (see below). I did a search on the forum and noticed others have experienced similar behavior.
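For context, the burn-in steps that preceded solnet looked roughly like this per drive (a sketch, with da0 as the example; flags may need adjusting for your drives, and the badblocks pass is destructive):

Code:
smartctl -t short /dev/da0        # quick electrical/mechanical self-test
smartctl -t conveyance /dev/da0   # checks for damage in shipping (ATA drives)
smartctl -t long /dev/da0         # full surface self-test
badblocks -b 4096 -ws /dev/da0    # 4-pattern write/read test; ERASES the disk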

First solnet array output results

Code:
sol.net disk array test v2

1) Use all disks (from camcontrol)
2) Use selected disks (from camcontrol|grep)
3) Specify disks
4) Show camcontrol list

Option:
Enter grep match pattern (e.g. ST150176):
Selected disks: da0 da1 da2 da3 da4 da5 da6 da7 ada0 ada1
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 0 lun 0 (pass0,da0)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 1 lun 0 (pass1,da1)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 2 lun 0 (pass2,da2)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 3 lun 0 (pass3,da3)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 4 lun 0 (pass4,da4)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 5 lun 0 (pass5,da5)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 6 lun 0 (pass6,da6)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 7 lun 0 (pass7,da7)
<WDC WD60EFRX-68L0BN1 82.00A82>    at scbus1 target 0 lun 0 (pass8,ada0)
<WDC WD60EFRX-68L0BN1 82.00A82>    at scbus2 target 0 lun 0 (pass9,ada1)
Is this correct? (y/N): Performing initial serial array read (baseline speeds)
Mon Jan 30 22:21:16 EST 2017
Mon Jan 30 22:43:50 EST 2017
Completed: initial serial array read (baseline speeds)

Array's average speed is 165.032 MB/sec per disk

Disk    Disk Size  MB/sec %ofAvg
------- ---------- ------ ------
da0      5723166MB    165    100
da1      5723166MB    174    105
da2      5723166MB    164    100
da3      5723166MB    164     99
da4      5723166MB    159     96
da5      5723166MB    161     98
da6      5723166MB    172    104
da7      5723166MB    165    100
ada0     5723166MB    161     97
ada1     5723166MB    165    100

Performing initial parallel array read
Mon Jan 30 22:43:50 EST 2017
The disk da0 appears to be 5723166 MB.     
Disk is reading at about 161 MB/sec     
This suggests that this pass may take around 593 minutes

                   Serial Parall % of
Disk    Disk Size  MB/sec MB/sec Serial
------- ---------- ------ ------ ------
da0      5723166MB    165    161     98
da1      5723166MB    174    169     97
da2      5723166MB    164    160     97
da3      5723166MB    164    160     97
da4      5723166MB    159    154     97
da5      5723166MB    161    158     98
da6      5723166MB    172    168     98
da7      5723166MB    165    160     97
ada0     5723166MB    161    157     98
ada1     5723166MB    165    165    100

Awaiting completion: initial parallel array read
Tue Jan 31 11:41:04 EST 2017
Completed: initial parallel array read

Disk's average time is 44910 seconds per disk

Disk    Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
da0         6001175126016   45251    101
da1         6001175126016   42373     94
da2         6001175126016   45310    101
da3         6001175126016   45488    101
da4         6001175126016   46635    104
da5         6001175126016   46213    103
da6         6001175126016   43211     96
da7         6001175126016   45676    102
ada0        6001175126016   45760    102
ada1        6001175126016   43181     96

Performing initial parallel seek-stress array read
Tue Jan 31 11:41:04 EST 2017
The disk da0 appears to be 5723166 MB.     
Disk is reading at about 159 MB/sec     
This suggests that this pass may take around 599 minutes

                   Serial Parall % of
Disk    Disk Size  MB/sec MB/sec Serial
------- ---------- ------ ------ ------
da0      5723166MB    165    159     96
da1      5723166MB    174    166     96
da2      5723166MB    164    160     97
da3      5723166MB    164    162     99
da4      5723166MB    159    160    101
da5      5723166MB    161    175    109
da6      5723166MB    172    174    101
da7      5723166MB    165    173    105
ada0     5723166MB    161    226    141
ada1     5723166MB    165    228    138

Awaiting completion: initial parallel seek-stress array read
Fri Feb  3 11:37:27 EST 2017
Completed: initial parallel seek-stress array read

Disk's average time is 113705 seconds per disk

Disk    Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
da0         6001175126016   56652     50 ++FAST++
da1         6001175126016  133325    117 --SLOW--
da2         6001175126016  109032     96
da3         6001175126016   56621     50 ++FAST++
da4         6001175126016  184004    162 --SLOW--
da5         6001175126016   56756     50 ++FAST++
da6         6001175126016  150088    132 --SLOW--
da7         6001175126016   67028     59 ++FAST++
ada0        6001175126016  146952    129 --SLOW--
ada1        6001175126016  176589    155 --SLOW--


I ran the long SMART tests on all of the drives to ensure there were no issues, and nothing stood out. There were no pending sector counts or retries; frankly, nothing that would be worthy of a return or RMA (those results are attached).
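I checked the usual suspects on every drive with a quick loop along these lines (a sketch; attribute names are from smartctl's ATA attribute table):

Code:
for d in da0 da1 da2 da3 da4 da5 da6 da7 ada0 ada1; do
  echo "== $d =="
  smartctl -A /dev/$d | egrep "Reallocated_Sector|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error"
done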
I did make one mistake: I had a RAIDZ2 array (just a test array at this point) defined and mounted during the run, and my thought is that it could have interfered with the end results of the solnet array testing.


So, I decided to run the solnet array test a second complete time, just to see if anything looked out of the ordinary. I made certain there were no volumes mounted this time. Again, everything seemed to look OK (see below) until the last test, where once again the results looked questionable.
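To be safe, I exported the test pool before kicking off the run, along these lines (the pool name is just an example):

Code:
zpool export testpool   # detach the test pool so nothing touches the disks
zpool list              # should now report "no pools available"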

Second solnet array output results

Code:

sol.net disk array test v2

1) Use all disks (from camcontrol)
2) Use selected disks (from camcontrol|grep)
3) Specify disks
4) Show camcontrol list

Option:
Enter grep match pattern (e.g. ST150176):
Selected disks: da0 da1 da2 da3 da4 da5 da6 da7 ada0 ada1
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 0 lun 0 (pass0,da0)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 1 lun 0 (pass1,da1)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 2 lun 0 (pass2,da2)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 3 lun 0 (pass3,da3)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 4 lun 0 (pass4,da4)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 5 lun 0 (pass5,da5)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 6 lun 0 (pass6,da6)
<ATA WDC WD60EFRX-68L 0A82>        at scbus0 target 7 lun 0 (pass7,da7)
<WDC WD60EFRX-68L0BN1 82.00A82>    at scbus1 target 0 lun 0 (pass8,ada0)
<WDC WD60EFRX-68L0BN1 82.00A82>    at scbus2 target 0 lun 0 (pass9,ada1)
Is this correct? (y/N): Performing initial serial array read (baseline speeds)
Sun Feb  5 06:31:51 PST 2017
Sun Feb  5 09:54:24 EST 2017
Completed: initial serial array read (baseline speeds)

Array's average speed is 165.971 MB/sec per disk

Disk    Disk Size  MB/sec %ofAvg
------- ---------- ------ ------
da0      5723166MB    165    100
da1      5723166MB    174    105
da2      5723166MB    165     99
da3      5723166MB    164     99
da4      5723166MB    159     96
da5      5723166MB    162     98
da6      5723166MB    172    104
da7      5723166MB    166    100
ada0     5723166MB    163     98
ada1     5723166MB    170    102

Performing initial parallel array read
Sun Feb  5 09:54:24 EST 2017
The disk da0 appears to be 5723166 MB.     
Disk is reading at about 166 MB/sec     
This suggests that this pass may take around 573 minutes

                   Serial Parall % of
Disk    Disk Size  MB/sec MB/sec Serial
------- ---------- ------ ------ ------
da0      5723166MB    165    166    101
da1      5723166MB    174    175    100
da2      5723166MB    165    165    100
da3      5723166MB    164    166    101
da4      5723166MB    159    159    100
da5      5723166MB    162    163    101
da6      5723166MB    172    173    101
da7      5723166MB    166    166    100
ada0     5723166MB    163    162    100
ada1     5723166MB    170    170    100

Awaiting completion: initial parallel array read
Sun Feb  5 22:42:21 EST 2017
Completed: initial parallel array read

Disk's average time is 44378 seconds per disk

Disk    Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
da0         6001175126016   44721    101
da1         6001175126016   41849     94
da2         6001175126016   44795    101
da3         6001175126016   44958    101
da4         6001175126016   46078    104
da5         6001175126016   45677    103
da6         6001175126016   42678     96
da7         6001175126016   45132    102
ada0        6001175126016   45206    102
ada1        6001175126016   42684     96

Performing initial parallel seek-stress array read
Sun Feb  5 22:42:22 EST 2017
The disk da0 appears to be 5723166 MB.     
Disk is reading at about 160 MB/sec     
This suggests that this pass may take around 595 minutes

                   Serial Parall % of
Disk    Disk Size  MB/sec MB/sec Serial
------- ---------- ------ ------ ------
da0      5723166MB    165    161     97
da1      5723166MB    174    171     98
da2      5723166MB    165    179    109
da3      5723166MB    164    180    110
da4      5723166MB    159    179    113
da5      5723166MB    162    156     97
da6      5723166MB    172    181    105
da7      5723166MB    166    162     98
ada0     5723166MB    163    165    102
ada1     5723166MB    170    245    144

Awaiting completion: initial parallel seek-stress array read
Thu Feb  9 02:16:56 EST 2017
Completed: initial parallel seek-stress array read

Disk's average time is 115534 seconds per disk

Disk    Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
da0         6001175126016  171068    148 --SLOW--
da1         6001175126016   67394     58 ++FAST++
da2         6001175126016  118882    103
da3         6001175126016   56520     49 ++FAST++
da4         6001175126016   57467     50 ++FAST++
da5         6001175126016  117406    102
da6         6001175126016  178994    155 --SLOW--
da7         6001175126016  134442    116 --SLOW--
ada0        6001175126016  137955    119 --SLOW--
ada1        6001175126016  115210    100

Taking a closer look, here is a side-by-side comparison of the final seek-stress results from the two solnet runs.



Code:
           PASS 1                                                      PASS 2
Disk    Bytes Transferred Seconds %ofAvg                    Disk    Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------                    ------- ----------------- ------- ------
da0     6001175126016   56652     50 ++FAST++                da0        6001175126016  171068    148 --SLOW--
da1     6001175126016  133325    117 --SLOW--                da1        6001175126016   67394     58 ++FAST++
da2     6001175126016  109032     96                         da2        6001175126016  118882    103
da3     6001175126016   56621     50 ++FAST++                da3        6001175126016   56520     49 ++FAST++
da4     6001175126016  184004    162 --SLOW--                da4        6001175126016   57467     50 ++FAST++
da5     6001175126016   56756     50 ++FAST++                da5        6001175126016  117406    102
da6     6001175126016  150088    132 --SLOW--                da6        6001175126016  178994    155 --SLOW--
da7     6001175126016   67028     59 ++FAST++                da7        6001175126016  134442    116 --SLOW--
ada0    6001175126016  146952    129 --SLOW--               ada0        6001175126016  137955    119 --SLOW--
ada1    6001175126016  176589    155 --SLOW--               ada1        6001175126016  115210    100



Maybe they are fine; and even if they aren't, if the SMART tests come back clean, I guess there is not a whole lot I can do other than make sure email alerting is set up (check) and that I am running the proper tests at the proper intervals.
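On the alerting side, the underlying smartd can send a test message at startup, which is a handy way to prove mail actually flows (shown as a smartd.conf line purely for illustration; FreeNAS configures this through the GUI, and the address is a placeholder):

Code:
/dev/da0 -a -m admin@example.com -M test   # emails a test alert when smartd starts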
I would very much appreciate any input you all might have. I know that @cyberjock has this same model of drive, so I am curious whether anything here gives reason for concern.

Incidentally, I attached the smartctl -x output as files rather than embedding it. If you would prefer I embed it as code, I can do that instead.
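For reference, a loop like this produces dumps named like the attachments (a sketch, not necessarily verbatim what I ran):

Code:
for d in ada0 ada1 da0 da1 da2 da3 da4 da5 da6 da7; do
  smartctl -x /dev/$d > ${d}_long_$(date +%Y%m%d.%H%M).txt
done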

I am running another long SMART test, and I will post those results tomorrow.

Thanks for your feedback!

-Dan
 

Attachments

  • ada0_long_20170204.0928.txt (5.1 KB)
  • ada1_long_20170204.0931.txt (5.1 KB)
  • da0_long_20170204.0932.txt (6.1 KB)
  • da1_long_20170204.0932.txt (5.1 KB)
  • da2_long_20170204.0932.txt (5.1 KB)
  • da3_long_20170204.0932.txt (5.1 KB)
  • da4_long_20170204.0933.txt (5.1 KB)
  • da5_long_20170204.0933.txt (5.1 KB)
  • da6_long_20170204.0933.txt (5.1 KB)
  • da7_long_20170204.0933.txt (5.1 KB)
John Digital

Joined
Jan 7, 2015
Messages
1,155
While I'm not 100% sure there is anything here that is cause for alarm, I think these readings are fairly common. My first time around, I bought drives from Amazon, and when several of them tested slow I returned them; the replacement drives then did basically the same thing.

On the several tests I have run for myself and others using this method, there always seem to be a fair number of "slow" drives. At a glance, the drives that tested slow on your first pass then tested fast on the second, save da6 and ada0. If you didn't shuffle cables or anything, it seems to me that genuinely faulty drives would test "slow" fairly consistently.

In my opinion, if the drives all pass a FULL badblocks run with 0/0/0 (the read/write/corruption error counts in its final summary) and SMART looks good, then I think you are good to go. Set up your testing routines, at least 2 short tests and at least 1 long test per month (a smartd-style sketch follows), and keep temps as low as you can. Keep an eye on your terminal window/dmesg for anything goofy going on. Based on what you have shown here, I would put some data on these drives, yeehaw!
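In smartd.conf terms, that kind of schedule might look something like this (an illustration only; FreeNAS sets this up through the GUI, and the -s regex fields are type/month/day/weekday/hour):

Code:
# short tests on the 1st and 15th at 3am, long test on the 8th at 4am
/dev/da0 -a -s (S/../(01|15)/./03|L/../08/./04)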

You got yourself a great-looking system there, bud!
 
nightshade00013

Joined
Apr 9, 2015
Messages
1,258
Agreed with the above post.

The fan speed switch may be backfeeding noise (AC ripple) into the power, so if you noticed weird test behavior with it, either power the fans directly from the PSU, run them off the board, or replace them with ones that can be. You probably want as much airflow over the drives as you can get anyway, so why let a controller rob power and slow them down?
 

tealcomp

Explorer
Joined
Sep 5, 2016
Messages
59
Thanks @John Digital and @nightshade00013, I appreciate your valued input.

I actually have the (3) 120 mm fans that came with the case; those are in the motherboard compartment, (2) front and (1) back. In the drive bay chamber, I added (2) 120 mm Cougars and (1) 140 mm Noctua to exhaust that chamber. Mostly, I am happy with the temperatures I have been seeing (under 40°C while under load), though that load has not been a scrub, since I have not made it that far yet. All of this testing takes some time, LOL. I have been trying to determine whether the badblocks and solnet testing has already shown me the highest temperatures I should expect. Do either of you think a scrub is going to be more aggressive (temperature-wise) or about the same? I tried searching the forum for this information, but haven't had any success finding answers.

I may look at some design changes to the cooling if I want to get the temperatures a bit lower. All told, the drives at idle are in the lower 30s, and under the load described above a few of the 10 drives hit a high of 38°C, whereas most of the others are in the 35-37°C range.
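For anyone curious, I spot-check temperatures during these runs with a quick loop along these lines (device names as above):

Code:
for d in da0 da1 da2 da3 da4 da5 da6 da7 ada0 ada1; do
  printf "%s: " $d
  smartctl -A /dev/$d | awk '/Temperature_Celsius/ {print $10}'   # raw value = degrees C
done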

I am also using the fan control script, which helps ramp the fans up and down as needed.

Thanks!
-Dan
 
John Digital

Joined
Jan 7, 2015
Messages
1,155
I suspect resilvering is the most strenuous activity your drives will ever face under normal circumstances, with scrubbing right behind it. 40°C seems like it should be the hottest your drives ever get with all that cooling, but if your fans just blow hot air around, it isn't going to make tons of difference. I have three fans in front pulling cool air in over the drives and two 120s in the rear to exhaust. During the warmer months I even pull the side panels off my case. My server resides in my basement, and the ambient temp never really gets above 70°F, except for maybe the dog days of summer.

I've had hard drives last about seven years where I know I saw max temps of 65°C; this was back in IDE days, and I am unsure if that's just the way it was or if I was negligent about temps. Mine hover around 30°C now, and short of some form of advanced cooling that I'm unaware of, I feel like anywhere under 40°C is well within spec; given the warranty on the Reds, I wouldn't stress over it. 10 spinning motors stacked on each other are hot. It is what it is. I would just make sure you have done what you can to keep the drives as cool as you can; as numerous people have pointed out, there is a strong correlation between average drive temps and premature failure rates. With that said, your average temps are most likely not 40°C.

My opinion is that you are done. Testing is over. You've done your due diligence, and in the end it will not only provide you with peace of mind, you will get maximum life per $$$ spent, and that is what it's all about.

One last thing: is noise an issue? Can you do away with the fan scripts and let the fans run at full blast? Not that a few degrees less is any sort of big deal in the grand scheme, but I'm just fully on board with cooler = better, cold = gold. If noise isn't an issue, let those props fly.

Cheers bud!
 

tealcomp

Explorer
Joined
Sep 5, 2016
Messages
59
Thanks!

I am not quite done, though; I need to test the spares, and since I also have 2 4TB spares from my last QNAP build, I decided I might as well put those through the testing too, lest I find, when I need them, that a 2-year-old spare that has never been used won't even start :) So I need to run those 4 disks through the process. Once that is done, I should probably do some CPU stress testing; I ran the memory tests for several days, but figured I should go back and do the CPU to keep things tidy. After that, I can finally build the array. I have already built the array once and done some basic testing, and frankly the Z3 configuration worked really well. While Z2 would no doubt give me a bit more storage, I rather like the level of redundancy with Z3. So, I am just having some fun with my expensive "habit".
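For the CPU pass, I am thinking of something like stress from ports (an assumption on my part, not a settled plan; adjust the worker count to your core count):

Code:
stress --cpu 8 --timeout 24h   # spin 8 CPU workers for 24 hours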

Oh, regarding your question about the fans: this server is probably going to be tucked away in a corner of my family room, so I don't want to run the fans full bore while we are present, although during the badblocks testing I did exactly that, just to keep the temps within the <40°C guidance. I spent a fair amount of my adult life in server rooms, LOL, so I am not really too sensitive to fan noise.

I am somewhat surprised I haven't heard input from any of the other "experts" on the forum; it made me wonder whether I was posting in the wrong spot or the like :)

I do plan to post some pics soon, just need to crunch them down and clean them up.

Thanks, guys; I will be in touch.

-Dan
 