SMART strangeness

Status
Not open for further replies.
Joined
Oct 8, 2016
Messages
48
Hi all, I'm new here and I admit it: I don't use FreeNAS :(
I'm here because, reading your forum, I've seen that many people here are very knowledgeable.

I have a question: on some servers, our SAS disks report the following:

Code:
smartctl 5.41 2011-06-09 r3365 [i686-linux-2.6.32.43-0.4.1.xs1.8.0.835.170778xen] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:			   SEAGATE 
Product:			  ST3600057SS	
Revision:			 0008
User Capacity:		600,127,266,816 bytes [600 GB]
Logical block size:   512 bytes
Logical Unit id:	  0x5000c500777acdf7
Serial number:		6SL95B550000N506770E
Device type:		  disk
Transport protocol:   SAS
Local Time is:		Sat Oct  8 21:28:27 2016 CEST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:	 51 C
Drive Trip Temperature:		68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 3309149431
  Blocks received from initiator = 162212934
  Blocks read from cache and sent to initiator = 2621964573
  Number of read and write commands whose size <= segment size = 2276337835
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 16374,93
  number of minutes until next internal SMART test = 27

Error counter log:
		   Errors Corrected by		   Total   Correction	 Gigabytes	Total
			   ECC		  rereads/	errors   algorithm	  processed	uncorrected
		   fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   3494281588		1		 0  3494281589   3494281589	 221886,840		   0
write:		 0		0		 0		 0		  0	  35407,108		   0
verify: 668376162		0		 0  668376162   668376162	  56884,374		   0

Non-medium error count:		0

SMART Self-test log
Num  Test			  Status				 segment  LifeTime  LBA_first_err [SK ASC ASQ]
	 Description							  number   (hours)
# 1  Background short  Completed				   -   16355				 - [-   -	-]
# 2  Background short  Completed				   -   16331				 - [-   -	-]
# 3  Background short  Completed				   -   16307				 - [-   -	-]
# 4  Background short  Completed				   -   16283				 - [-   -	-]
# 5  Background short  Completed				   -   16259				 - [-   -	-]
# 6  Background short  Completed				   -   16235				 - [-   -	-]
# 7  Background long   Completed				   -   16217				 - [-   -	-]
# 8  Background short  Completed				   -   16211				 - [-   -	-]
# 9  Background short  Completed				   -   16187				 - [-   -	-]
#10  Background short  Completed				   -   16163				 - [-   -	-]
#11  Background short  Completed				   -   16139				 - [-   -	-]
#12  Background short  Completed				   -   16115				 - [-   -	-]
#13  Background short  Completed				   -   16091				 - [-   -	-]
#14  Background short  Completed				   -   16067				 - [-   -	-]
#15  Background long   Completed				   -   16049				 - [-   -	-]
#16  Background short  Completed				   -   16043				 - [-   -	-]
#17  Background short  Completed				   -   16019				 - [-   -	-]
#18  Background short  Completed				   -   15995				 - [-   -	-]
#19  Background short  Completed				   -   15971				 - [-   -	-]
#20  Background short  Completed				   -   15947				 - [-   -	-]

Long (extended) Self Test duration: 6400 seconds [106,7 minutes]

Background scan results log
  Status: waiting until BMS interval timer expires
	Accumulated power on time, hours:minutes 16374:56 [982496 minutes]
	Number of background scans performed: 228,  scan progress: 0,00%
	Number of background medium scans performed: 33356

   #  when		lba(hex)	[sk,asc,ascq]	reassign_status
   1 12681:19  000000000bcc2103  [1,17,1]   Recovered via rewrite in-place
   2 12807:37  000000000bcc2e80  [1,17,1]   Recovered via rewrite in-place
   3 13254:46  000000000bcc2202  [1,17,1]   Recovered via rewrite in-place
   4 13349:44  000000000bcc1504  [1,17,1]   Recovered via rewrite in-place
   5 14298:03  000000000bbf05dd  [1,17,1]   Recovered via rewrite in-place
   6 14298:03  000000000bbf05de  [1,17,1]   Recovered via rewrite in-place
   7 14298:03  000000000bbf05df  [1,17,1]   Recovered via rewrite in-place
   8 14823:45  000000000bcc2106  [1,17,1]   Recovered via rewrite in-place
   9 14991:35  000000000bcc2109  [1,17,1]   Recovered via rewrite in-place
  10 15135:43  000000000bcc2115  [1,17,1]   Recovered via rewrite in-place
  11 15254:13  000000001ce87601  [1,17,1]   Recovered via rewrite in-place
  12 15856:40  0000000027f8ff69  [1,17,1]   Recovered via rewrite in-place
  13 16219:00  0000000027f8ff68  [1,17,1]   Recovered via rewrite in-place
 33024 13836:10  008001350e1d0080  [e,18,c9]   Require Write or Reassign Blocks command
 33025 13836:10  0080803504808781  [e,18,c9]   Require Write or Reassign Blocks command
 33026 14292:13  003001330e1d0080  [8,19,2b]   Successfully reassigned
 33027 14292:13  0030803304808781  [8,19,2b]   Successfully reassigned
 33028 14483:55  008001320e1d0080  [4,a9,9d]   Require Write or Reassign Blocks command
 33029 14483:55  0080803204808781  [4,a9,9d]   Require Write or Reassign Blocks command
 33030 14699:53  007001330e1d0080  [b,cc,14]   Reserved [0x0]
 33031 14699:53  0070803304808781  [b,cc,14]   Reserved [0x0]
 33032 15203:47  001001320e1d0080  [5,5,af]   Require Write or Reassign Blocks command
 33033 15203:47  0010803204808781  [5,5,af]   Require Write or Reassign Blocks command
 33034 15371:45  008001330e1d0080  [f,bd,a8]   Reserved [0x0]
 33035 15371:45  0080803304808781  [f,bd,a8]   Reserved [0x0]
 33036 15731:41  000801360e1d0080  [c,ea,30]   Require Write or Reassign Blocks command
 33037 15731:41  0008803604808781  [c,ea,30]   Require Write or Reassign Blocks command
 33038 16187:35  008001330e1d0080  [c,e9,2d]   Require Write or Reassign Blocks command
 33039 16187:35  0080803304808781  [c,e9,2d]   Require Write or Reassign Blocks command
Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 8
  number of phys = 1
  phy identifier = 0
	attached device type: end device
	attached reason: unknown
	reason: unknown
	negotiated logical link rate: phy enabled; 6 Gbps
	attached initiator port: ssp=1 stp=1 smp=1
	attached target port: ssp=0 stp=0 smp=0
	SAS address = 0x5000c500777acdf5
	attached SAS address = 0x5d4ae520b3be3307
	attached phy identifier = 3
	Invalid DWORD count = 0
	Running disparity error count = 0
	Loss of DWORD synchronization = 952
	Phy reset problem = 0
	Phy event descriptors:
	 Invalid word count: 0
	 Running disparity error count: 0
	 Loss of dword synchronization count: 952
	 Phy reset problem count: 0
relative target port id = 2
  generation code = 8
  number of phys = 1
  phy identifier = 1
	attached device type: no device attached
	attached reason: unknown
	reason: unknown
	negotiated logical link rate: phy enabled; 1.5 Gbps
	attached initiator port: ssp=0 stp=0 smp=0
	attached target port: ssp=0 stp=0 smp=0
	SAS address = 0x5000c500777acdf6
	attached SAS address = 0x0
	attached phy identifier = 0
	Invalid DWORD count = 0
	Running disparity error count = 0
	Loss of DWORD synchronization = 0
	Phy reset problem = 0
	Phy event descriptors:
	 Invalid word count: 0
	 Running disparity error count: 0
	 Loss of dword synchronization count: 0
	 Phy reset problem count: 0


I need some advice about this.
1) "Elements in grown defect list: 0" should mean that there aren't any reallocated sectors on this disk, right? The grown list should count the total number of reallocated sectors.

2) All "Background long" tests completed properly, right?

3) " 1 12681:19 000000000bcc2103 [1,17,1] Recovered via rewrite in-place": messages like this mean that the sector was written properly by issuing a rewrite. This should be normal and not an issue.

4) "33028 14483:55 008001320e1d0080 [4,a9,9d] Require Write or Reassign Blocks command": what does it mean? Is it something I should worry about?

5) "33030 14699:53 007001330e1d0080 [b,cc,14] Reserved [0x0]": reserved by whom, and why?

6) "33026 14292:13 003001330e1d0080 [8,19,2b] Successfully reassigned": this should indicate a reassigned sector. But why is it not counted in the grown list?
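
To make the fields in those background scan lines easier to talk about, here is a minimal Python sketch that splits one line into its parts and names the sense key. The regular expression and the helper names (`parse_bms_line`, `describe`) are my own assumptions about the layout shown above, not part of smartctl. The sense key table only lists the standard SCSI values I am sure of. Decoding a line this way says nothing about whether the entry itself is trustworthy.

```python
import re

# Standard SCSI sense keys (subset); anything else is reported as unknown.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0xB: "ABORTED COMMAND",
    0xE: "MISCOMPARE",
}

# Assumed layout: number, hours:minutes, 16 hex digits of LBA,
# [sk,asc,ascq] in hex, then the reassign status text.
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\d+):(\d+)\s+([0-9a-fA-F]{16})\s+"
    r"\[([0-9a-fA-F]+),([0-9a-fA-F]+),([0-9a-fA-F]+)\]\s+(.*)$"
)


def parse_bms_line(line):
    """Return the fields of one background scan line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    num, hours, minutes, lba, sk, asc, ascq, status = m.groups()
    return {
        "num": int(num),
        "power_on_minutes": int(hours) * 60 + int(minutes),
        "lba": int(lba, 16),
        "sense_key": int(sk, 16),
        "asc": int(asc, 16),
        "ascq": int(ascq, 16),
        "status": status.strip(),
    }


def describe(entry):
    """Human-readable summary of the sense data of a parsed entry."""
    name = SENSE_KEYS.get(entry["sense_key"], "unknown")
    return (
        f"sense key {entry['sense_key']:#x} ({name}), "
        f"asc/ascq {entry['asc']:02x}h/{entry['ascq']:02x}h"
    )


sample = "   1 12681:19  000000000bcc2103  [1,17,1]   Recovered via rewrite in-place"
entry = parse_bms_line(sample)
print(describe(entry))  # sense key 0x1 (RECOVERED ERROR), asc/ascq 17h/01h
```

In the standard SCSI tables, additional sense code 17h with qualifier 01h means "recovered data with retries", which fits the "Recovered via rewrite in-place" status on entries 1 to 13.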

In other words, should disks like this be replaced, or are they fine? (I'm using a hardware RAID that is not reporting any kind of issue, even during the "Patrol Read" scan, in which the controller reads the whole array looking for bad sectors.)

Thank you in advance
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
i'm using an hardware raid
WHHHHHYYYYYYYYYYYYYYYYYYYYYYYYYYY??????

Not using FreeNAS, that's why.
Current Drive Temperature: 51 C
Woah, waaaayyy too high.
Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 3494281588 1 0 3494281589 3494281589 221886,840 0 write: 0 0 0 0 0 35407,108 0 verify: 668376162 0 0 668376162 668376162 56884,374 0 Non-medium error count: 0
This is an unreadable jumbled mess and I have a very hard time interpreting what is going on.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
"This should probably be moved to the off-topic section.
@Ericloewe @joeschmuck"
Didn't even notice that.

Moving to offtopic mostly so that no one gets the idea that HW RAID is an acceptable solution for FreeNAS.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Okay, so you do not use FreeNAS, you are just looking for a little hard drive advice. I can respect that.

First, SAS drives are still a bit new to me but deciphering the data shouldn't be too difficult, at least the important parts.

1) The drive status is "OK", a big thumbs up!
2) As you mentioned, the defect list is "0", another thumbs up! This value directly correlates with any new sectors being marked bad after manufacturing so you are good to go.
3) The drive temp is 51C, not great but not a fail either.
4) You have 16,300+ hours (1.86 years) on the drive, not too bad.
5) You have a "short" test running once a day which passes with flying colors!
6) You have a "long" test running once a week which passes with flying colors!
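
The figures in points 4 to 6 can be checked with a little arithmetic. This is a minimal sketch using the power-on hours and a few values from the LifeTime column of the self-test log in the first post. The variable and function names are only illustrative.

```python
HOURS_PER_YEAR = 24 * 365

# "number of hours powered up" from the first post, fraction dropped.
power_on_hours = 16374
print(round(power_on_hours / HOURS_PER_YEAR, 2))  # 1.87

# LifeTime (hours) of the most recent short and long self-tests, newest first.
short_tests = [16355, 16331, 16307, 16283, 16259, 16235]
long_tests = [16217, 16049]


def intervals(lifetimes):
    """Hours between consecutive tests in a newest-first list."""
    return [newer - older for newer, older in zip(lifetimes, lifetimes[1:])]


print(intervals(short_tests))  # [24, 24, 24, 24, 24] -> once a day
print(intervals(long_tests))   # [168] -> once a week
```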

Overall your drive is fine. If you see the defect list increment at all, the drive is likely going bad. A count of 3 would not have me worried but a count of 3 today and 5 next week, well if it keeps going up, you will need to replace it.
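
Watching the defect list over time could be automated roughly like this. It is only a sketch: it assumes you save the text output of smartctl yourself at regular intervals, and the helper names and the sample history of 3, 3, 5 are hypothetical and simply mirror the example above.

```python
import re

GROWN_RE = re.compile(r"Elements in grown defect list:\s*(\d+)")


def grown_defects(smartctl_text):
    """Extract the grown defect count from saved smartctl output, or None if absent."""
    m = GROWN_RE.search(smartctl_text)
    return int(m.group(1)) if m else None


def is_growing(history):
    """True if the count increased between any two consecutive samples."""
    return any(later > earlier for earlier, later in zip(history, history[1:]))


# The line as it appears in the output posted in this thread.
print(grown_defects("Elements in grown defect list: 0"))  # 0

# Hypothetical weekly samples: stable at 3, then 5 the following week.
print(is_growing([3, 3]))     # False
print(is_growing([3, 3, 5]))  # True
```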
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
It's not too off topic other than it has nothing to do with FreeNAS. But it will be moved.

Edit: Me thinks someone beat me to it.
 
Joined
Oct 8, 2016
Messages
48
As I wrote, I'm not using FreeNAS.
What I really don't understand is the meaning of those "warning" messages reported by the background media scan.
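
One sanity check that can be done on those entries is to compare each reported LBA with the number of blocks the drive actually has. The sketch below uses the capacity and block size from the smartctl output in the first post. The function name `lba_in_range` is my own. This only shows whether an LBA can exist on this drive. It cannot explain why an out-of-range entry is listed.

```python
# From the first post: User Capacity and Logical block size.
CAPACITY_BYTES = 600_127_266_816
BLOCK_SIZE = 512
TOTAL_BLOCKS = CAPACITY_BYTES // BLOCK_SIZE  # 1_172_123_568 blocks


def lba_in_range(lba_hex):
    """True if the hex LBA addresses a block that exists on this drive."""
    return int(lba_hex, 16) < TOTAL_BLOCKS


# A few entries copied from the background scan results log (number -> LBA).
entries = {
    1: "000000000bcc2103",
    12: "0000000027f8ff69",
    33026: "003001330e1d0080",
    33028: "008001320e1d0080",
}

for num, lba in entries.items():
    verdict = "in range" if lba_in_range(lba) else "beyond last block"
    print(num, lba, verdict)
```

With these values, entries 1 and 12 fall inside the drive, while 33026 and 33028 point far beyond its last block, so those lines may not describe real sectors on this disk.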

Keep in mind that these are 15k rpm disks, so the temperature is a little higher than on 7200 rpm disks.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Again, I'm no SAS expert. If the drive had to mark the block as bad, then the defect list would not be zero. Is the drive failing? I personally don't think so, not yet at least.
"I can see that properly even on a mobile phone; I don't know why you are seeing it like this."
The formatting didn't come out very cleanly, not your fault, long lines can make it a bear to read.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Not worth debating, but when I read it I had to be mindful of the spacing in the "Error counter log:" section. The bulk of the text was fine.
 