HDD Serial number not displaying, SMART tests not consistent

Status
Not open for further replies.

Marc Locchi

Dabbler
Joined
May 15, 2014
Messages
12
Hi all,
I have an issue with what seems to be my SATA4 port: the drive on it does not report a serial number, nor the right number of SMART test results. For example, all other drives return 21 SMART short test results, while the drive on SATA4 returns 11 when I run smartctl via PuTTY (see output below).

I am not sure what to try next, as I cannot understand why only this drive port is not showing the drive details. Could it be related to this bug?

What I have tried:
- New SATA cable (literally new)
- New drive (I have a brand-new spare WD Se)
- Changed the drive's position in the chain; da4 always shows with a missing serial.

My config:
- FreeNAS-9.2.1.6-RC2-72b8479-x64 - running off USB key
- Intel Server Board S2600CP4 Dual Xeon socket, Latest Firmware - 02.03.0003
- 10x 4TB HDD - WD Se Enterprise Drives, JBOD, no hardware RAID, set up as RSTE (Intel SATA), AHCI, BIOS set to staggered spin-up for drives.
- Xeon E2630 V3, 6 Core, 2.3GHz
- 32GB DDR3 HP ECC Server Memory
- 1x RAID-Z2 volume (RAID 6 equivalent)
- 4x Gb NIC aggregated
- Board Link: http://ark.intel.com/products/56334

Image of the "View Disks" panel:
FreeNas.jpg

Example output of smartctl -a -q noserial /dev/da4
Code:
[root@LSERVER] ~# smartctl -a -q noserial /dev/da4
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD4000F9YZ-09N20L0
Firmware Version: 01.01A01
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Jul 29 18:30:52 2014 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Warning! SMART Attribute Thresholds Structure error: invalid SMART checksum.
=== START OF READ SMART DATA SECTION ===
SMART STATUS RETURN: incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (41040) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 444) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   162   145   021    Pre-fail  Always       -       10900
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       154
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       887
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       154
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       153
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   122   112   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       779         -
# 2  Short offline       Completed without error       00%       605         -
# 3  Short offline       Completed without error       00%       525         -
# 4  Short offline       Completed without error       00%       523         -
# 5  Short offline       Completed without error       00%       359         -
# 6  Short offline       Completed without error       00%       347         -
# 7  Short offline       Completed without error       00%       335         -
# 8  Short offline       Completed without error       00%       323         -
# 9  Short offline       Completed without error       00%       311         -
#10  Short offline       Completed without error       00%       167         -
#11  Short offline       Completed without error       00%       155         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Example of another drive:
Code:
[root@LSERVER] ~# smartctl -a -q noserial /dev/ada0
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD4000F9YZ-09N20L0
Firmware Version: 01.01A01
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Jul 29 18:30:03 2014 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (40680) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 440) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   152   148   021    Pre-fail  Always       -       11400
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       151
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       888
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       150
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       100
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       50
194 Temperature_Celsius     0x0022   123   103   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       887         -
# 2  Short offline       Completed without error       00%       875         -
# 3  Short offline       Completed without error       00%       861         -
# 4  Short offline       Completed without error       00%       859         -
# 5  Short offline       Completed without error       00%       848         -
# 6  Short offline       Completed without error       00%       844         -
# 7  Short offline       Completed without error       00%       832         -
# 8  Short offline       Completed without error       00%       820         -
# 9  Short offline       Completed without error       00%       808         -
#10  Short offline       Completed without error       00%       805         -
#11  Short offline       Completed without error       00%       793         -
#12  Short offline       Completed without error       00%       792         -
#13  Short offline       Completed without error       00%       780         -
#14  Short offline       Completed without error       00%       779         -
#15  Short offline       Completed without error       00%       767         -
#16  Short offline       Completed without error       00%       761         -
#17  Short offline       Completed without error       00%       752         -
#18  Short offline       Completed without error       00%       741         -
#19  Short offline       Completed without error       00%       723         -
#20  Short offline       Completed without error       00%       711         -
#21  Short offline       Completed without error       00%       700         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
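
To make the differences between the two reports easier to spot, here is a minimal Python sketch that summarizes a saved `smartctl -a` text dump: it counts the self-test log rows, picks out the negotiated link speed, and collects the warning lines. The function name `summarize_smart_report` and the abbreviated sample are hypothetical, and the parsing assumes the plain-text layout shown above; it is an illustration, not a replacement for reading the full report.

```python
import re

def summarize_smart_report(text):
    """Return self-test entry count, negotiated link speed, and warning lines."""
    # Self-test log rows look like "# 1  Short offline ..." or "#10  Short offline ...".
    selftests = re.findall(r"^#\s*\d+\s+\S", text, flags=re.MULTILINE)
    # Lines smartctl prints when the device response is damaged or partial.
    warnings = [
        line.strip()
        for line in text.splitlines()
        if line.startswith("Warning") or "incomplete response" in line
    ]
    link = re.search(r"\(current: ([\d.]+ Gb/s)\)", text)
    return {
        "selftests": len(selftests),
        "link_speed": link.group(1) if link else None,
        "warnings": warnings,
    }

# Abbreviated sample modelled on the da4 output above (not a full report).
sample = """\
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Warning! SMART Attribute Thresholds Structure error: invalid SMART checksum.
SMART STATUS RETURN: incomplete response, ATA output registers missing
Warning: This result is based on an Attribute check.
# 1  Short offline       Completed without error       00%       779         -
#10  Short offline       Completed without error       00%       167         -
"""
summary = summarize_smart_report(sample)
print(summary["selftests"], summary["link_speed"], len(summary["warnings"]))
```

Run against the two full dumps above, this should report 11 self-test rows at 3.0 Gb/s with three warning lines for da4, and 21 rows at 6.0 Gb/s with no warnings for ada0.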
 

Marc Locchi

Dabbler
Joined
May 15, 2014
Messages
12
I have just updated to 9.2.1.6 and the problem still persists. Any suggestions on where to check are welcome!
 

Z300M

Guru
Joined
Sep 9, 2011
Messages
882
After the last-but-one reboot, one of the drive serial numbers was missing. After the most recent reboot, all serial numbers are displayed. The only change I had made was to add another drive cage.
 

Marc Locchi

Dabbler
Joined
May 15, 2014
Messages
12
Hi @Z300M, thanks for the reply. I cannot add another drive as my board is fully populated.

I found this in the boot log, which shows the hardware seems to be reporting all details OK, but FreeNAS is not using them in the GUI:

Aug 13 08:50:14 LSERVER kernel: da4 at isci0 bus 1 scbus1 target 0 lun 0
Aug 13 08:50:14 LSERVER kernel: da4: <ATA WDC WD4000F9YZ-0 1A01> Fixed Direct Access SCSI-5 device
Aug 13 08:50:14 LSERVER kernel: da4: Serial Number WD-WCC132087714
Aug 13 08:50:14 LSERVER kernel: da4: 300.000MB/s transfers
Aug 13 08:50:14 LSERVER kernel: da4: Command Queueing enabled
Aug 13 08:50:14 LSERVER kernel: da4: 3815447MB (7814037168 512 byte sectors: 255H 63S/T 486401C)
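
As a rough cross-check of what the kernel saw at boot, the serial numbers can be pulled out of log lines like the ones above. This is a minimal Python sketch; the helper name `serials_from_boot_log` is hypothetical, and the pattern assumes the exact "kernel: daN: Serial Number ..." wording shown in this log.

```python
import re

# Matches lines such as "... kernel: da4: Serial Number WD-WCC132087714".
SERIAL_LINE = re.compile(r"kernel: (\w+): Serial Number (\S+)")

def serials_from_boot_log(lines):
    """Map device name (e.g. 'da4') to the serial number the kernel logged."""
    found = {}
    for line in lines:
        match = SERIAL_LINE.search(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found

log = [
    "Aug 13 08:50:14 LSERVER kernel: da4 at isci0 bus 1 scbus1 target 0 lun 0",
    "Aug 13 08:50:14 LSERVER kernel: da4: Serial Number WD-WCC132087714",
    "Aug 13 08:50:14 LSERVER kernel: da4: 300.000MB/s transfers",
]
print(serials_from_boot_log(log))
```

The same function could be fed the lines of a saved boot log file; where that file lives on a given system is an assumption to verify.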

Anybody else out there with any suggestions?
 

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
Have you tried booting with a configless USB stick yet, to rule out some sort of config issue or corruption of the data where that information is stored?

Also, if you have a moment, try logging into the shell via your method of choice and running the "fsck" command. When I encounter random issues like this springing up with nothing having been changed to cause them, it usually finds something and fixes the issue.
 

Z300M

Guru
Joined
Sep 9, 2011
Messages
882
"Anybody, anybody, Bueller...? Maybe Synology is the next thing to try then?"
That would be a move far too drastic, surely. For me, at least, the missing serial number thing is intermittent: sometimes they all show up, sometimes one (or more? I don't recall) will be missing. I don't think I've ever seen a serial number go missing after it was shown immediately after booting. I can always use a command-line script to check the drive serial numbers and temperatures if I need to -- either from the shell in the FreeNAS GUI or from the IPMI console.
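
A command-line check of that kind might look like the following Python sketch. It is not the script referred to above, only an illustration: the function names are hypothetical, it assumes Python 3.7+ and smartmontools are available, and it assumes the raw temperature is the tenth column of the attribute 194 row, as in the outputs earlier in this thread. Note that it calls smartctl without `-q noserial`, since that option suppresses the serial number.

```python
import subprocess

def parse_serial_and_temp(text):
    """Extract the serial number and raw temperature from smartctl text output."""
    serial, temp = None, None
    for line in text.splitlines():
        if line.startswith("Serial Number:"):
            serial = line.split(":", 1)[1].strip() or None
        fields = line.split()
        # Attribute table rows have at least 10 columns; the tenth is RAW_VALUE.
        if len(fields) >= 10 and fields[0] == "194":
            temp = int(fields[9])
    return serial, temp

def check_drive(device):
    """Run smartctl for one device; needs root and smartmontools installed."""
    out = subprocess.run(
        ["smartctl", "-i", "-A", device],
        capture_output=True, text=True, check=False,
    ).stdout
    return parse_serial_and_temp(out)

# Abbreviated sample combining lines of the kind shown earlier in the thread.
sample = """\
Serial Number:    WD-WCC132087714
194 Temperature_Celsius     0x0022   122   112   000    Old_age   Always       -       30
"""
print(parse_serial_and_temp(sample))
```

On a live system, `check_drive("/dev/da4")` would return a `(serial, temperature)` pair, with `None` for whichever value the drive does not report.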

da4 is the only one with the "incomplete" SMART report? Does your motherboard have an additional (currently unused) port to which you can connect that drive and see whether the problem remains? Or can you switch the connections of two drives and see whether the problem follows the drive or stays with the port? Faulty port or cable?
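
The reasoning behind the swap test can be written down as a small decision function. This is only a sketch of the logic; the function name and the port labels are hypothetical.

```python
def fault_location(faulty_before, faulty_after, moved_to):
    """Classify the outcome of swapping the suspect drive to another port.

    faulty_before: port that showed the problem before the swap
    faulty_after:  port that shows the problem after the swap (None if gone)
    moved_to:      port the suspect drive was moved to
    """
    if faulty_after is None:
        return "cleared (possibly a loose connection or intermittent fault)"
    if faulty_after == faulty_before:
        return "stays with the port (port, cable, or controller)"
    if faulty_after == moved_to:
        return "follows the drive"
    return "inconclusive"

print(fault_location("SATA4", "SATA4", "SATA2"))
```

If the problem stays on the same port after the swap, the drive itself is an unlikely cause.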

When I wrote previously that the missing serial no. had reappeared when I rebooted after adding another drive cage, all I meant was that I had made no change that I considered to be significant: it was an empty cage -- no additional drives were installed.

BTW, I recall reading a post a month or two back by one of the very experienced FreeNAS people expressing the view that, although Intel CPUs are very good, their motherboards are not that great.
 