CAM status: ATA Status Error (WRITE_FPDMA_QUEUED)

Status
Not open for further replies.

Archaeopteryx (Dabbler; joined May 9, 2014; 12 messages)

Hi,

First, thank you for this magnificent (and powerful) operating system.

So far, my personal experience is that I have seen every error that FreeNAS (FreeBSD) can produce, twice ;). But after consulting the forum, I noticed that I'm not alone, and that the time and effort spent on configuration at the beginning pays off in the long run once the system runs unattended.

After I defeated the Samba daemon and nearly everything ran fine, I got the error message quoted in the thread title.


At that moment, I was writing big chunks of data to the HDD. After this error message, I couldn't log in via SSH and the GUI did not respond. The write performance over CIFS dropped to a few kB/s. I could still press 11 for reboot directly at the system (console setup menu). Not every service wanted to exit, so I did a hard reboot.

To make troubleshooting easier, I stuck fairly closely to the hardware recommendations:

1 x Supermicro MBD-X10SLM+-F-B - Bulk
(Server mainboard, not a gaming/multimedia board)
1 x Intel Xeon E3-1230v3 4x 3.30GHz So.1150 BOX
(Server Processor)
2 x 8GB Kingston ValueRAM DDR3-1333 ECC DIMM CL9 Single
(ECC and more than 8GB for ZFS)
1 x 400 Watt be quiet! Straight Power E9 Non-Modular 80+ Gold
(I'm very satisfied with be quiet!; a be quiet! power supply has been running in my personal computer for over 5 years.)


My HDD is (at the moment) a Samsung 1.5 TB SATA disk. I use it as a single disk with ZFS and encryption. The error only appears after some time, and only when I write large amounts of data to the disk.

After this error message, I ran the following commands:
Code:
[root@freenas] ~# smartctl -a -q noserial /dev/ada0
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:    SAMSUNG SpinPoint F2 EG
Device Model:    SAMSUNG HD154UI
Firmware Version: 1AG01118
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA/ATAPI-7, ATA8-ACS T13/1699-D revision 3b
Local Time is:    Fri May  9 17:27:52 2014 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (19109) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (  2) minutes.
Extended self-test routine
recommended polling time:        ( 319) minutes.
Conveyance self-test routine
recommended polling time:        (  33) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   076   076   011    Pre-fail  Always       -       8070
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       325
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       11673
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       7296
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       290
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       3
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   076   050   000    Old_age   Always       -       24 (Min/Max 24/24)
194 Temperature_Celsius     0x0022   077   049   000    Old_age   Always       -       23 (Min/Max 23/24)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       2785
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       161
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0
 
SMART Error Log Version: 1
ATA Error Count: 2
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
 
Error 2 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 d2 01 4f e5 e4 45  at LBA = 0x05e4e54f = 98886991
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 eb 65 e4 e4 45 08      01:13:49.310  WRITE DMA
  ef 02 00 00 00 00 40 00      01:13:49.300  SET FEATURES [Enable write cache]
  ef aa 00 00 00 00 40 00      01:13:49.300  SET FEATURES [Enable read look-ahead]
  c6 00 10 00 00 00 40 00      01:13:49.300  SET MULTIPLE MODE
  ef 10 02 00 00 00 40 00      01:13:49.300  SET FEATURES [Enable SATA feature]
 
Error 1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 2f 89 be 50 e0  Error: ICRC, ABRT 47 sectors at LBA = 0x0050be89 = 5291657
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 b8 bd 50 e0 00      01:02:07.210  READ DMA
  c8 00 20 20 cb 48 e0 00      01:02:07.210  READ DMA
  c8 00 08 10 cb 48 e0 00      01:02:07.210  READ DMA
  c8 00 08 f8 ca 48 e0 00      01:02:07.200  READ DMA
  c8 00 00 b8 bc 50 e0 00      01:02:07.180  READ DMA
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%      7292        -
# 2  Short offline      Completed without error      00%      7279        -
# 3  Short offline      Completed without error      00%      7278        -
# 4  Extended offline    Completed without error      90%      7277        -
# 5  Short offline      Completed without error      00%      7275        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
[root@freenas] ~#
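
As a cross-check on the two error-log entries above, here is a minimal Python sketch that rebuilds the LBA from the SN, CL, CH and DH register bytes. It assumes the usual 28-bit LBA register layout (SN = bits 0-7, CL = bits 8-15, CH = bits 16-23, low nibble of DH = bits 24-27); the function name `lba28_from_registers` is my own and not part of smartctl.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file register bytes into a 28-bit LBA.

    Assumed layout: SN holds bits 0-7, CL bits 8-15, CH bits 16-23,
    and the low nibble of DH holds bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register bytes copied from the "After command completion" rows above.
error2 = lba28_from_registers(sn=0x4F, cl=0xE5, ch=0xE4, dh=0x45)
error1 = lba28_from_registers(sn=0x89, cl=0xBE, ch=0x50, dh=0xE0)

print(hex(error2), error2)  # 0x5e4e54f 98886991
print(hex(error1), error1)  # 0x50be89 5291657
```

Both results agree with the LBA values smartctl printed next to the registers, so the log entries are at least internally consistent.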


I also ran gpart list, in case it helps:
Code:
[root@freenas] ~# gpart list
Geom name: ada0
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 2930277134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada0p1
  Mediasize: 2147483648 (2.0G)
  Sectorsize: 512
  Stripesize: 0
  Stripeoffset: 65536
  Mode: r1w1e1
  rawuuid: b7c28d33-d6b5-11e3-8ef4-0cc47a06f15f
  rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
  label: (null)
  length: 2147483648
  offset: 65536
  type: freebsd-swap
  index: 1
  end: 4194431
  start: 128
2. Name: ada0p2
  Mediasize: 1498154343936 (1.4T)
  Sectorsize: 512
  Stripesize: 0
  Stripeoffset: 2147549184
  Mode: r0w0e0
  rawuuid: b7d71a80-d6b5-11e3-8ef4-0cc47a06f15f
  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
  label: (null)
  length: 1498154343936
  offset: 2147549184
  type: freebsd-zfs
  index: 2
  end: 2930277134
  start: 4194432
Consumers:
1. Name: ada0
  Mediasize: 1500301910016 (1.4T)
  Sectorsize: 512
  Mode: r1w1e2
 
Geom name: da0
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 62259199
first: 63
entries: 4
scheme: MBR
Providers:
1. Name: da0s1
  Mediasize: 988291584 (942M)
  Sectorsize: 512
  Stripesize: 0
  Stripeoffset: 32256
  Mode: r1w0e1
  attrib: active
  rawtype: 165
  length: 988291584
  offset: 32256
  type: freebsd
  index: 1
  end: 1930319
  start: 63
2. Name: da0s2
  Mediasize: 988291584 (942M)
  Sectorsize: 512
  Stripesize: 0
  Stripeoffset: 988356096
  Mode: r0w0e0
  rawtype: 165
  length: 988291584
  offset: 988356096
  type: freebsd
  index: 2
  end: 3860639
  start: 1930383
3. Name: da0s3
  Mediasize: 1548288 (1.5M)
  Sectorsize: 512
  Stripesize: 0
  Stripeoffset: 1976647680
  Mode: r0w0e0
  rawtype: 165
  length: 1548288
  offset: 1976647680
  type: freebsd
  index: 3
  end: 3863663
  start: 3860640
4. Name: da0s4
  Mediasize: 21159936 (20M)
  Sectorsize: 512
  Stripesize: 0
  Stripeoffset: 1978195968
  Mode: r1w1e2
  rawtype: 165
  length: 21159936
  offset: 1978195968
  type: freebsd
  index: 4
  end: 3904991
  start: 3863664
Consumers:
1. Name: da0
  Mediasize: 31876710400 (29G)
  Sectorsize: 512
  Mode: r2w1e4
 
Geom name: da0s1
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 1930256
first: 0
entries: 8
scheme: BSD
Providers:
1. Name: da0s1a
  Mediasize: 988283392 (942M)
  Sectorsize: 512
  Stripesize: 0
  Stripeoffset: 40448
  Mode: r1w0e1
  rawtype: 0
  length: 988283392
  offset: 8192
  type: !0
  index: 1
  end: 1930256
  start: 16
Consumers:
1. Name: da0s1
  Mediasize: 988291584 (942M)
  Sectorsize: 512
  Stripesize: 0
  Stripeoffset: 32256
  Mode: r1w0e1
 
[root@freenas] ~#
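
To see where the two error LBAs from the SMART log sit on the disk, here is a small Python sketch that compares them with the ada0 partition boundaries from the gpart output above. It assumes the error LBAs are absolute 512-byte sector numbers on the device, the same unit as the gpart start/end values; the names `ADA0_PARTITIONS` and `partition_for_lba` are made up for this illustration.

```python
# (name, start, end) in 512-byte sectors, copied from the gpart output above.
ADA0_PARTITIONS = [
    ("ada0p1", 128, 4194431),         # freebsd-swap
    ("ada0p2", 4194432, 2930277134),  # freebsd-zfs
]


def partition_for_lba(lba, partitions=ADA0_PARTITIONS):
    """Return the name of the partition whose range contains lba, or None."""
    for name, start, end in partitions:
        if start <= lba <= end:
            return name
    return None


# Error LBAs as printed in the SMART error log.
for lba in (98886991, 5291657):
    print(lba, partition_for_lba(lba))
```

Under those assumptions both LBAs fall inside ada0p2, the freebsd-zfs partition, and none in the swap partition.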


After that error message, I moved the drive to a different port on the mainboard (from white to black, 6G -> 3G)
and swapped the SATA cable.

I booted Linux Mint from USB and ran a badblocks check:
lVhtvuA.jpg

(Short translation: Completed. 0 errors found.) This ran for over 13 hours.

After that, I ran a short SMART test:
XTgxPFI.jpg

3GErNxU.jpg


Finally, I ran the "smartctl -a -q noserial /dev/ada0" command, whose output I posted at the top of this thread.
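
To pick the interesting counters out of that attribute table, here is a rough Python sketch that lists watched attributes with a non-zero raw value. The watch list is my own choice and not an official one, and the regular expression only assumes the column order shown in the smartctl output. UDMA_CRC_Error_Count is commonly read as counting transfer errors on the link between drive and controller, so it is included here as a hint towards cable or port problems rather than as proof.

```python
import re

# A few rows copied from the attribute table in the smartctl output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       3
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       161
"""

# Hypothetical watch list (an assumption for this sketch, not a standard).
WATCHED = {
    "Reallocated_Sector_Ct",
    "Runtime_Bad_Block",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def nonzero_watched(text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    hits = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # not an attribute row
        name, raw = match.group(2), int(match.group(9))
        if name in WATCHED and raw > 0:
            hits[name] = raw
    return hits


print(nonzero_watched(SAMPLE))
```

For the sample rows this reports Runtime_Bad_Block (3) and UDMA_CRC_Error_Count (161). These are lifetime totals, so a single reading does not show whether they are still increasing; comparing two readings taken some time apart would.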

Can you tell me what is wrong with the drive, even though badblocks found nothing? Even under heavy use in the Linux live system for over 13 hours, no error message appeared. Don't worry about the data: I have multiple backups and the system holds only (an estimated) 20 GB of data. If I need to format it, it wouldn't be a big deal.

Your help is appreciated,

Archaeopteryx

As a short-term workaround, I read that it would help to switch the SATA mode from AHCI to IDE. After 1 hour of uptime, I haven't experienced this error again. But I will now go and test it by writing some data.
 