Harddrive burn-in questions

Status
Not open for further replies.

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
I am in the middle of my hard drive burn-in process. I did not notice it at first, but in the device list all drives are listed as (passX,daX), except for the boot drive, which is the only drive connected via the chipset SATA ports (the rest are connected via an LSI 3008 SAS controller with P12 IT firmware), and da5, which also has this reversed order.

Code:
[root@Darwin] ~# tmxo show-buffer
tmxo: Command not found.
[root@Darwin] ~# tmux show-buffer
[root@Darwin] ~# camcontrol devlist
<ATA WDC WD40EFRX-68W 0A82>		at scbus0 target 0 lun 0 (pass0,da0)
<ATA WDC WD40EFRX-68W 0A82>		at scbus0 target 1 lun 0 (pass1,da1)
<ATA WDC WD40EFRX-68W 0A82>		at scbus0 target 2 lun 0 (pass2,da2)
<ATA WDC WD40EFRX-68W 0A82>		at scbus0 target 3 lun 0 (pass3,da3)
<ATA WDC WD40EFRX-68W 0A82>		at scbus0 target 4 lun 0 (pass4,da4)
<ATA WDC WD40EFRX-68W 0A82>		at scbus0 target 5 lun 0 (da5,pass5)
<ATA WDC WD40EFRX-68W 0A82>		at scbus0 target 6 lun 0 (pass6,da6)
<ATA WDC WD40EFRX-68W 0A82>		at scbus0 target 7 lun 0 (pass7,da7)
<Corsair CSSD-F90GB2 2.0>		  at scbus2 target 0 lun 0 (ada0,pass8)


The SMART test for da5 returned the following message:
Code:
Device: /dev/da5 [SAT], Self-Test Log error count increased from 0 to 1


This naturally made me suspicious, but I thought I would run all the tests on the other drives before I RMA'd it, so I started badblocks testing.

As I half expected, da5 started showing ridiculous amounts of errors. The test has now been running for about 120 hours (I have been running more tests than I have threads, so I expect it to take a while). All WD Red drives except da5 are so far without errors, which is pleasing. However, I also received the following output by email from the FreeNAS SMART service (I think?):

http://pastebin.com/aSa6YFWf

Since I am a bit of a novice on these things, I have the following questions:

1. Is there any reason why da5 (and the boot drive) have their device name and their pass device listed in reversed order? Does it mean anything of significance?

2. Does the Pastebin output log indicate that the error could be outside the drive (e.g. the SAS controller)? That seems unlikely, since all the other drives seem to be working.
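For anyone wanting to check a longer device list for the same pattern, here is a minimal sketch that picks out the entries where the da/ada name comes before the pass name. It only reports the ordering and says nothing about whether the ordering matters. The sample lines are copied from the `camcontrol devlist` output above; the function name `peripheral_first` is made up for this illustration.

```python
import re

# Sample lines copied from the `camcontrol devlist` output above.
DEVLIST = """\
<ATA WDC WD40EFRX-68W 0A82>  at scbus0 target 4 lun 0 (pass4,da4)
<ATA WDC WD40EFRX-68W 0A82>  at scbus0 target 5 lun 0 (da5,pass5)
<Corsair CSSD-F90GB2 2.0>    at scbus2 target 0 lun 0 (ada0,pass8)
"""

# Matches the trailing "(name1,name2)" pair at the end of each line.
PAIR = re.compile(r"\((\w+),(\w+)\)\s*$")


def peripheral_first(devlist: str) -> list:
    """Return the device names whose da/ada node is listed before the pass node."""
    found = []
    for line in devlist.splitlines():
        match = PAIR.search(line)
        if match and not match.group(1).startswith("pass"):
            found.append(match.group(1))
    return found


print(peripheral_first(DEVLIST))
```

With the sample above, this reports da5 and the boot drive ada0, matching what was noticed by eye.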
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
The SMART test for da5 returned the following message:
Code:
Device: /dev/da5 [SAT], Self-Test Log error count increased from 0 to 1
My interpretation is that a self-test did not complete properly and generated this message.
Run the following and post the output to give additional clues:
Code:
smartctl -a /dev/da5
 

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
I also got the following output by email just today.
Code:
Device: /dev/da5 [SAT], failed to read SMART Attribute Data
Device: /dev/da5 [SAT], Self-Test Log error count increased from 1 to 2
Device: /dev/da6 [SAT], failed to read SMART Attribute Data
Device: /dev/da4 [SAT], Read SMART Self-Test Log Failed
Device: /dev/da4 [SAT], Read SMART Error Log Failed
Device: /dev/da5 [SAT], Read SMART Error Log Failed
Device: /dev/da4 [SAT], not capable of SMART self-check
Device: /dev/da6 [SAT], Read SMART Self-Test Log Failed
Device: /dev/da4 [SAT], failed to read SMART Attribute Data
Device: /dev/da6 [SAT], not capable of SMART self-check
Device: /dev/da5 [SAT], not capable of SMART self-check
Device: /dev/da6 [SAT], Read SMART Error Log Failed
Device: /dev/da5 [SAT], Read SMART Self-Test Log Failed

Here is the SMART output for all the drives. I ran a long SMART test on the last of October, last week, after badblocks was done.

Code:
[root@Darwin] ~# smartctl -A /dev/da0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   178   177   021	Pre-fail  Always	   -	   8100
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   11
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   527
10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   11
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   9
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   365
194 Temperature_Celsius	 0x0022   122   117   000	Old_age   Always	   -	   30
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

[root@Darwin] ~# smartctl -A /dev/da1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   177   177   021	Pre-fail  Always	   -	   8108
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   11
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   527
10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   11
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   9
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   391
194 Temperature_Celsius	 0x0022   122   119   000	Old_age   Always	   -	   30
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

[root@Darwin] ~# smartctl -A /dev/da2
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   175   175   021	Pre-fail  Always	   -	   8208
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   11
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   527
10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   11
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   9
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   387
194 Temperature_Celsius	 0x0022   122   119   000	Old_age   Always	   -	   30
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

[root@Darwin] ~# smartctl -A /dev/da3
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   179   178   021	Pre-fail  Always	   -	   8008
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   11
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   527
10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   11
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   9
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   386
194 Temperature_Celsius	 0x0022   122   118   000	Old_age   Always	   -	   30
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

[root@Darwin] ~# smartctl -A /dev/da4
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   178   177   021	Pre-fail  Always	   -	   8091
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   11
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   527
10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   11
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   0
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   412
194 Temperature_Celsius	 0x0022   123   120   000	Old_age   Always	   -	   29
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

[root@Darwin] ~# smartctl -A /dev/da5
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   197   173   021	Pre-fail  Always	   -	   7108
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   13
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   199   190   000	Old_age   Always	   -	   78
  9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   527
10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   13
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   11
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   315
194 Temperature_Celsius	 0x0022   120   116   000	Old_age   Always	   -	   32
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   17

[root@Darwin] ~# smartctl -A /dev/da6
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   176   175   021	Pre-fail  Always	   -	   8183
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   11
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   527
10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   11
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   0
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   420
194 Temperature_Celsius	 0x0022   120   115   000	Old_age   Always	   -	   32
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

[root@Darwin] ~# smartctl -A /dev/da7
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   182   181   021	Pre-fail  Always	   -	   7900
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   11
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   527
10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   11
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   9
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   387
194 Temperature_Celsius	 0x0022   121   115   000	Old_age   Always	   -	   31
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The problem is that @Dice asked you to run smartctl -a, not smartctl -A. Many Unix programs are case-sensitive in their options. In particular, using -A instead of -a strips off a lot of information, including (critically) the self-test log. The data you posted for da5 is questionable in the seek error rate and multi-zone error rate though.
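To make the comparison between drives less error-prone than reading eight tables by eye, here is a minimal sketch that scans `smartctl -A` attribute rows and reports watched attributes with a non-zero raw value. The rows are copied from the da5 output posted above (whitespace normalised). The set of watched IDs and the function name `nonzero_raw` are assumptions for this illustration, not an official list; on drives from other vendors some of these raw values are normally non-zero.

```python
# Rows copied from the da5 output posted above (whitespace normalised).
DA5 = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   199   190   000    Old_age   Always       -       78
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       17
"""

# ASSUMPTION: attribute IDs picked for this illustration only.
WATCHED = {5, 7, 197, 198, 199, 200}


def nonzero_raw(table: str, watched=WATCHED) -> dict:
    """Map attribute name -> raw value for watched attributes with raw > 0."""
    flagged = {}
    for line in table.splitlines():
        fields = line.split()
        # A data row has 10 columns and starts with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in watched and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


print(nonzero_raw(DA5))
```

For da5 this flags Seek_Error_Rate (78) and Multi_Zone_Error_Rate (17), the two values called questionable above. It still cannot replace `smartctl -a`, because the self-test log is not part of the `-A` output.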
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
When you do post the full output of smartctl -a /dev/da5 I'm sure more information can be provided to you, however in the meantime I thought I'd ask a few questions to provoke the thinking process, and because I'd like to learn a few things (yes, I do still learn here).

1) I noticed that you have an LSI 3008 controller flashed to IT P12 firmware. Is that the proper firmware for FreeNAS? I ask this because it seems like everyone is using firmware version P20/P21. I'm not a controller expert so I ask and learn. (EDIT: I just looked up firmware from the LSI site, looks like P12 is the current, answered my own question)

2) ID 200 Multi-Zone Errors can very well indicate a failure, though generally not a failed SMART test. My question is: what testing were you running for your burn-in that generated the errors? badblocks or maybe "dd"?

3) What version of FreeNAS are you running?

4) Last thing would be to connect the hard drive directly to the motherboard SATA port and then test the drive directly and I'd run a SMART Long test and then see what the results are.

Let's hope your drive doesn't have infant mortality.

Good Luck!
 

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
When you do post the full output of smartctl -a /dev/da5 I'm sure more information can be provided to you, however in the meantime I thought I'd ask a few questions to provoke the thinking process, and because I'd like to learn a few things (yes, I do still learn here).

One should always keep learning, and I have found these forums to be a place full of knowledge, both in the form of text and a community of many "teachers", yourself included.
1) I noticed that you have an LSI 3008 controller flashed to IT P12 firmware. Is that the proper firmware for FreeNAS? I ask this because it seems like everyone is using firmware version P20/P21. I'm not a controller expert so I ask and learn. (EDIT: I just looked up firmware from the LSI site, looks like P12 is the current, answered my own question)

I took special care to make sure I got this right. I had a feeling that doing it wrong would be expensive.
2) ID 200 Multi-Zone Errors can very well indicate a failure, though generally not a failed SMART test. My question is: what testing were you running for your burn-in that generated the errors? badblocks or maybe "dd"?

4) Last thing would be to connect the hard drive directly to the motherboard SATA port and then test the drive directly and I'd run a SMART Long test and then see what the results are.

Let's hope your drive doesn't have infant mortality.
I considered this.
It is next on my to-do list to investigate the controller and/or cables.
I think it might; at least it seems to have neonatal illnesses ;) (da5)
Good Luck!
Thank you. I'll update my sig with OS info.
New attempt
SMART-Info
 

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
2) ID 200 Multi-Zone Errors can very well indicate a failure, though generally not a failed SMART test. My question is: what testing were you running for your burn-in that generated the errors? badblocks or maybe "dd"?
So far, I have been running short and long SMART tests, badblocks, and long SMART tests again. If memory serves, da5 started acting up during the first long SMART test, and seemingly went haywire during badblocks. See the image for a snippet of what it was like.
 

Attachments

  • 14889809_10154726750693060_4994332487990442608_o.jpg

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
:oops: You got troubles.

That has been running for 7 days on a 4TB drive?

Looking at the SMART information, you have grounds to RMA the drive. If possible, you might be able to return it to where you purchased it for a new drive. Either way, the drive is bad and nothing you can do will fix it.

I'll explain it a little. The SMART long test failed at 90% into the read. This is a completely self-contained test, meaning your SATA cable and interface have nothing to do with the test other than to tell the drive to start and then to read back the results. So the drive is really not worth putting any extra time into. Sorry for the bad news.
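The "tell the drive to start, then read the results" pattern can be sketched as two separate `smartctl` calls. This is a minimal sketch, assuming smartmontools is installed and the script runs as root; the helper names are made up for this illustration, and nothing is executed unless `run` is called on the actual machine.

```python
import subprocess


def long_test_cmd(device: str) -> list:
    """Command that asks the drive to start an extended (long) self-test."""
    return ["smartctl", "-t", "long", device]


def selftest_log_cmd(device: str) -> list:
    """Command that prints the drive's self-test log after the test has ended."""
    return ["smartctl", "-l", "selftest", device]


def run(cmd: list) -> str:
    """Run a command and return its stdout; needs root and smartmontools."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # Only print the commands here; call run(...) on the real system.
    print(" ".join(long_test_cmd("/dev/da5")))
    print(" ".join(selftest_log_cmd("/dev/da5")))
```

The test itself runs inside the drive between the two calls, which is why a failed long test points at the drive rather than at the cable or controller.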

EDIT: The rest of your drives look to be in good working condition.
 

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
No bad news really. Better to catch it now than when the pool is up and running. Thanks @joeschmuck, @danb35 and @Dice :)
I had that feeling fairly early on from the badblocks results, so I have already initiated RMA contact. What worries me more is the failure of da4 and da6 to read their SMART error logs. Is that cause for concern?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
What worries me more is the failure of da4 and da6 to read their SMART error logs. Is that cause for concern?
I suspect that is an issue with the LSI 3008 or the data cables. Connect those hard drives directly to the motherboard SATA ports and see if that allows you to communicate properly. If that works, then you can try to swap the SATA cables around and see if it's a bad cable, but as I said earlier, I'm not a RAID/HBA card expert, even though I know yours is the built-in SAS controller.
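When several drives behind the same controller report read failures in one email, tallying the messages per device makes the pattern easier to see. Here is a minimal sketch using a few of the smartd lines quoted earlier in the thread; the function name `count_by_device` is made up for this illustration, and the tally by itself does not prove where the fault lies.

```python
import re
from collections import Counter

# A few of the smartd email lines quoted earlier in the thread.
MAIL = """\
Device: /dev/da5 [SAT], failed to read SMART Attribute Data
Device: /dev/da6 [SAT], failed to read SMART Attribute Data
Device: /dev/da4 [SAT], Read SMART Self-Test Log Failed
Device: /dev/da4 [SAT], Read SMART Error Log Failed
Device: /dev/da5 [SAT], Read SMART Error Log Failed
"""

# Captures the device node and the message text of one smartd line.
LINE = re.compile(r"^Device: (/dev/\w+) \[SAT\], (.+)$")


def count_by_device(mail: str) -> Counter:
    """Count smartd messages per device node."""
    counts = Counter()
    for line in mail.splitlines():
        match = LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


print(sorted(count_by_device(MAIL).items()))
```

Running the same tally again after moving a drive to a motherboard SATA port gives a simple before/after comparison.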
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Load cycle count seems high on a drive with 500 hours.

Nearly a load an hour. Have you got the drives configured to spin down?
 

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
Load cycle count seems high on a drive with 500 hours.

Nearly a load an hour. Have you got the drives configured to spin down?

Nope, not yet. I suspect it could be because, for a while, I had multiple badblocks runs going at once for some of the drives. That was a mistake that happened because I was still learning tmux.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I think the load cycle count is when the drive wakes from sleep
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Load cycle count seems high on a drive with 500 hours.

Nearly a load an hour. Have you got the drives configured to spin down?

Well, with a cycle per hour the drive will be gone from another cause before the LCC will even be at half the max rated value... I wouldn't worry about that :)
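The arithmetic behind this can be sketched as follows. The load cycle count and power-on hours come from the da6 table posted above; the rated limit of 600,000 load/unload cycles is an assumption for illustration and should be checked against the datasheet for the actual drive.

```python
HOURS_PER_YEAR = 24 * 365


def cycles_per_hour(load_cycles: int, power_on_hours: int) -> float:
    """Average head load cycles per power-on hour."""
    return load_cycles / power_on_hours


def years_to_reach(target_cycles: int, rate_per_hour: float) -> float:
    """Years of continuous operation needed to reach target_cycles."""
    return target_cycles / rate_per_hour / HOURS_PER_YEAR


# From the da6 table above: Load_Cycle_Count 420, Power_On_Hours 527.
rate = cycles_per_hour(420, 527)

# ASSUMPTION: rated load/unload cycles; verify against the drive's datasheet.
ASSUMED_RATED_CYCLES = 600_000

half_rating_years = years_to_reach(ASSUMED_RATED_CYCLES // 2, 1.0)

print(f"{rate:.2f} cycles per hour")
print(f"{half_rating_years:.1f} years to half the assumed rating at 1 cycle/hour")
```

Under that assumption, one cycle per hour takes over three decades to reach half the rating, which is the point being made above.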
 

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
Got this from the server earlier today. Can't be good, can it? I guess I should bring this to lsi3008 Issues. I have a feeling it might be firmware related. Any opinions?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I thought the LCCs were a bit high too but didn't know what was driving them up.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, with a cycle per hour the drive will be gone from another cause before the LCC will even be at half the max rated value... I wouldn't worry about that :)
Still, worth investigating after the immediate issues are solved. It's that weird middle ground that suggests something's wrong without being an immediate danger sign.
 

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
Still, worth investigating after the immediate issues are solved. It's that weird middle ground that suggests something's wrong without being an immediate danger sign.
Is there any particular way this can be investigated? What LCC should I expect?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Is there any particular way this can be investigated? What LCC should I expect?
Roughly the same as the power cycle count.
 