Unrecoverable error: Logical block address out of range

Status
Not open for further replies.

soulbabel

Cadet
Joined
Nov 23, 2013
Messages
8
Over the last few weeks I keep getting the following error:

Code:
> (da4:mps0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 18 48 3f 90 00 00 00 08 00 00
> (da4:mps0:0:6:0): CAM status: SCSI Status Error
> (da4:mps0:0:6:0): SCSI status: Check Condition
> (da4:mps0:0:6:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
> (da4:mps0:0:6:0): Info: 0x118483f90
> (da4:mps0:0:6:0): Error 22, Unretryable error


zpool status tells me that a small amount of data has been resilvered. The amount has been different each of the three times this has happened so far (156K, 180K, and now 80K):

Code:
  pool: tank1
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 80K in 0h0m with 0 errors on Sun Feb 16 13:34:37 2014
config:
 
        NAME                                            STATE    READ WRITE CKSUM
        tank1                                          ONLINE      0    0    0
          raidz2-0                                      ONLINE      0    0    0
            gptid/6177eafb-4c2c-11e3-a43a-002590d72d3b  ONLINE      0    0    0
            gptid/91cabea9-8b48-11e3-a7b6-002590d72d3b  ONLINE      0    0    0
            gptid/634c34ad-4c2c-11e3-a43a-002590d72d3b  ONLINE      0    0    0
            gptid/64368d6a-4c2c-11e3-a43a-002590d72d3b  ONLINE      0    0    0
            gptid/6524b559-4c2c-11e3-a43a-002590d72d3b  ONLINE      0    0    0
            gptid/6a4eecce-75d3-11e3-8a04-002590d72d3b  ONLINE      0    1    0
        logs
          gptid/349c2eca-4c2d-11e3-a43a-002590d72d3b    ONLINE      0    0    0
 
errors: No known data errors
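
For anyone who wants to pick out the affected device without reading the table by eye, here is a rough sketch in Python. It assumes the `zpool status` text has already been captured into a string and that the counters are plain integers, as they are above (larger counts may be abbreviated by zpool and would be skipped here). The function name `devices_with_errors` is made up for illustration, and the sample is an abridged copy of the output above.

```python
import re

# Matches config rows such as:
#   gptid/6a4e...  ONLINE  0  1  0
# The column order (NAME STATE READ WRITE CKSUM) is taken from the output above.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for every row with a nonzero counter."""
    hits = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, status text, or blank line
        name, _state, read, write, cksum = match.groups()
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            hits.append((name, *counts))
    return hits


SAMPLE = """\
        NAME                                            STATE    READ WRITE CKSUM
        tank1                                          ONLINE      0    0    0
          raidz2-0                                      ONLINE      0    0    0
            gptid/6524b559-4c2c-11e3-a43a-002590d72d3b  ONLINE      0    0    0
            gptid/6a4eecce-75d3-11e3-8a04-002590d72d3b  ONLINE      0    1    0
"""

for name, read, write, cksum in devices_with_errors(SAMPLE):
    print(f"{name}: read={read} write={write} cksum={cksum}")
```

On the sample, only the gptid with the single WRITE error is reported.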


Every time this happens I run an extended SMART test to look for problems with the drive, but each time no errors are reported.

Code:
=== START OF INFORMATION SECTION ===
Device Model:    WDC WD40EFRX-68WT0N0
Serial Number:    WD-WCC4E0477430
LU WWN Device Id: 5 0014ee 2b3e0e355
Firmware Version: 80.00A80
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:    512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:  ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Feb 17 06:45:10 2014 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (53760) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (  2) minutes.
Extended self-test routine
recommended polling time:        ( 537) minutes.
Conveyance self-test routine
recommended polling time:        (  5) minutes.
SCT capabilities:              (0x703d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x002f  200  200  051    Pre-fail  Always      -      0
  3 Spin_Up_Time            0x0027  178  176  021    Pre-fail  Always      -      8100
  4 Start_Stop_Count        0x0032  100  100  000    Old_age  Always      -      24
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x002e  200  200  000    Old_age  Always      -      0
  9 Power_On_Hours          0x0032  099  099  000    Old_age  Always      -      1100
10 Spin_Retry_Count        0x0032  100  253  000    Old_age  Always      -      0
11 Calibration_Retry_Count 0x0032  100  253  000    Old_age  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      24
192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      21
193 Load_Cycle_Count        0x0032  196  196  000    Old_age  Always      -      12736
194 Temperature_Celsius    0x0022  122  095  000    Old_age  Always      -      30
196 Reallocated_Event_Count 0x0032  200  200  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0030  100  253  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x0032  200  200  000    Old_age  Always      -      0
200 Multi_Zone_Error_Rate  0x0008  200  200  000    Old_age  Offline      -      0
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%      1093        -
# 2  Extended offline    Completed without error      00%      1073        -
# 3  Extended offline    Interrupted (host reset)      10%      1056        - (Note: I rebooted the server before this test finished)
# 4  Short offline      Completed without error      00%      1045        -
# 5  Short offline      Completed without error      00%      1021        -
# 6  Short offline      Completed without error      00%      997        -
# 7  Short offline      Completed without error      00%      974        -
# 8  Short offline      Completed without error      00%      950        -
# 9  Short offline      Completed without error      00%      926        -
#10  Short offline      Completed without error      00%      902        -
#11  Short offline      Completed without error      00%      878        -
#12  Extended offline    Completed without error      00%      867        -
#13  Short offline      Completed without error      00%      857        -
#14  Short offline      Completed without error      00%      854        -
#15  Short offline      Completed without error      00%      830        -
#16  Short offline      Completed without error      00%      806        -
#17  Extended offline    Completed without error      00%      794        -
#18  Short offline      Completed without error      00%      782        -
#19  Short offline      Completed without error      00%      758        -
#20  Short offline      Completed without error      00%      734        -
#21  Short offline      Completed without error      00%      715        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


To make sure I wasn't testing the wrong disk, I pulled the problem disk's serial number from FreeNAS under Volume Status > Edit Disk and confirmed that it matches the disk I tested. When I received this disk about four months ago, Newegg had shipped it in a bubble-wrap HDD sleeve. The sleeve has only a few bubbles, each spanning the length of the disk, and several of them were completely deflated. I ran an extended SMART test and a badblocks -wsv test on the drive when it arrived, but neither revealed a problem, so I didn't send the drive back to Newegg.

All the drives in the NAS are WD40EFRX connected to an IT flashed IBM ServeRAID M1015 via mini-SAS to SATA cables. I am using 32GB ECC RAM in this system. I'd appreciate any help in trying to fix this problem. I've already RMA'd two of the WD40EFRX drives that were in the system, but they had SMART test errors. Since this particular drive keeps passing the SMART tests, I'm afraid of sending it in for an Advanced RMA and getting charged for sending in a good drive.

Thanks in advance for any help.
 

soulbabel

Cadet
Joined
Nov 23, 2013
Messages
8
In case it is helpful, here is the SCSI error from the most recent time, and the time before that:

Code:
> (da4:mps0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 18 48 3f 90 00 00 00 08 00 00
> (da4:mps0:0:6:0): CAM status: SCSI Status Error
> (da4:mps0:0:6:0): SCSI status: Check Condition
> (da4:mps0:0:6:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
> (da4:mps0:0:6:0): Info: 0x118483f90
> (da4:mps0:0:6:0): Error 22, Unretryable error


Code:
> (da4:mps0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 18 4a a5 68 00 00 00 08 00 00
> (da4:mps0:0:6:0): CAM status: SCSI Status Error
> (da4:mps0:0:6:0): SCSI status: Check Condition
> (da4:mps0:0:6:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
> (da4:mps0:0:6:0): Info: 0x1184aa568
> (da4:mps0:0:6:0): Error 22, Unretryable error
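
The CDB bytes in these messages can be decoded to see which address the drive rejected. Below is a minimal Python sketch, assuming the standard WRITE(16) layout (opcode 0x8A, an 8-byte big-endian LBA in bytes 2-9, a 4-byte transfer length in bytes 10-13) and taking the capacity and 512-byte logical sector size from the SMART output in the first post. The helper name `decode_write16` is made up for illustration.

```python
SECTOR_BYTES = 512                  # logical sector size from the SMART output
CAPACITY_BYTES = 4_000_787_030_016  # "User Capacity" from the SMART output
TOTAL_SECTORS = CAPACITY_BYTES // SECTOR_BYTES


def decode_write16(cdb_hex):
    """Return (lba, block_count) from a WRITE(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between the hex bytes are ignored
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")
    blocks = int.from_bytes(cdb[10:14], "big")
    return lba, blocks


# The two CDBs from the error messages above.
CDBS = [
    "8a 00 00 00 00 01 18 48 3f 90 00 00 00 08 00 00",
    "8a 00 00 00 00 01 18 4a a5 68 00 00 00 08 00 00",
]

for cdb_hex in CDBS:
    lba, blocks = decode_write16(cdb_hex)
    last = lba + blocks - 1
    verdict = "inside" if last < TOTAL_SECTORS else "outside"
    print(f"LBA {lba:#x}, {blocks} blocks: {verdict} the 0..{TOTAL_SECTORS - 1} range")
```

For both errors the decoded LBA matches the Info: field, the transfer is 8 blocks (4 KiB), and the last block written is below the sector count implied by the reported capacity. If those assumptions hold, the requests themselves were in range, so the "out of range" answer does not look like ZFS asking for a bogus address.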
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, the first thing I'd ask is if your phase firmware version matches the driver version. Everything looks normal on the surface, but clearly something is wrong.

Considering you've replaced 2 disks.. that makes me wonder if you have something else wrong. Bad PSU perhaps? Bad SATA cabling?
 

soulbabel

Cadet
Joined
Nov 23, 2013
Messages
8
First off, thanks for your reply, Cyberjock, and thanks for all the helpful information you've provided; I found it essential in this, my first FreeNAS build.

if your phase firmware version matches the driver version
I am unfamiliar with this topic. How do I go about verifying this information?


So far I've replaced 2 out of the 14 WD40EFRX disks in the NAS. I always figured it was average based on the failure reports in the Newegg reviews, although I now have a spare WD40EFRX sitting around since I've started to anticipate drive failure despite initially stressing these disks with badblocks and extended SMART testing when I received them. In the case of the other two drives, I received a SMART error notification from FreeNAS. After performing an extended SMART test on those drives to confirm the problem, I removed them from the NAS and placed them in a separate computer. I ran the Western Digital diagnostics utility to confirm that the disks themselves were failing, before sending them back to WD.

This drive that I am currently having problems with has so far returned successful SMART results. The error it keeps giving me is unlike the previous errors from the other two disks.

I suppose it's possible that the power supply could be bad. I'm hoping it's not though, as I put extra money towards buying a quality one when I built the machine. The PSU is a PC Power & Cooling Silencer Mk III Series 750W. Is there an easier way to test the PSU other than swapping out the PSU with a new one?

I suppose I could take two of the mini-SAS to SATA cables and swap them, and see if errors start occurring on the other drives. I currently have two IBM M1015 cards installed to support the 14 drives. I don't use any multipliers other than mini-SAS to 4-SATA breakout cables.

If you have any advice in the priority of how I should conduct these tests, I'd be very grateful. Downtime is not a big deal with this machine as it is just being used in a home environment.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You can check the driver version for your FreeNAS system at bootup. dmesg will tell you the version. It'll say something like

mps0.. driver: X.Y.Z. firmware: A.B.C.

As long as the major versions match (both 14, both 15, both 16, etc.), you are fine.
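
If there is more than one controller, the comparison can be scripted. This is only a sketch in Python, assuming the mps driver logs a line of the form `mps0: Firmware: A.B.C.D, Driver: W.X.Y.Z-fbsd` (the exact wording may differ between FreeBSD versions, so adjust the pattern to what your dmesg actually shows). The version numbers in the sample are invented for illustration and are not taken from this system; `phase_mismatches` is a made-up helper name.

```python
import re

# Assumed line format; only the leading (major) number of each version is compared.
LINE = re.compile(r"^(mps\d+): Firmware: (\d+)\.[\d.]+, Driver: (\d+)\.[\d.]+", re.M)


def phase_mismatches(dmesg_text):
    """Return (device, firmware_major, driver_major) where the majors differ."""
    mismatches = []
    for dev, fw, drv in LINE.findall(dmesg_text):
        if int(fw) != int(drv):
            mismatches.append((dev, int(fw), int(drv)))
    return mismatches


# Hypothetical dmesg excerpt for a box with two controllers.
SAMPLE = """\
mps0: Firmware: 16.00.00.00, Driver: 16.00.00.00-fbsd
mps1: Firmware: 15.00.00.00, Driver: 16.00.00.00-fbsd
"""

for dev, fw, drv in phase_mismatches(SAMPLE):
    print(f"{dev}: firmware phase {fw} does not match driver {drv}")
```

In the invented sample only mps1 is reported, because its firmware major version (15) differs from the driver's (16).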

I'm not a fan of PC Power and Cooling.. at all. The best advice is to swap with another PSU. There's no other "good" test unless you want to use an oscilloscope.

Cable problems are very unlikely. I'd look at those last.
 

soulbabel

Cadet
Joined
Nov 23, 2013
Messages
8
Hi all, I was hoping we could revisit this post. Back in late February I swapped out the drive that was giving the "SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)" error noted in the opening post, replacing it with a spare WD40EFRX. The pool has been running error free for almost 6 months now with the rest of the configuration unchanged (PSU, cables, memory, etc.). As far as I can tell, swapping the HDD alone got rid of the error, so I'm pretty sure the drive itself was the problem.

My question now is how to go about RMAing this drive, because no errors are found when I run an extended test on it with Western Digital's own tool. I've made multiple attempts, both in the FreeNAS box and in a separate Windows machine. I know the drive is probably bad, but since it passes every diagnostic test I know about, I'm afraid I might get charged for sending in a supposedly good drive. Do they usually run additional tests that are likely to catch the error?
 

Z300M

Guru
Joined
Sep 9, 2011
Messages
882
Does WD have a bootable diagnostics utility (CD or Flash drive) that detects more than just SMART errors? (Seagate does, so maybe WD does also.)
 

soulbabel

Cadet
Joined
Nov 23, 2013
Messages
8
I went to their website to check, and they only seem to list a "Data Lifeguard Diagnostic" for DOS and for Windows. The download page for the DOS version describes the available tests, and they look nearly identical to those in the Windows version, which is the one I tested with previously. Unfortunately I didn't see anything with more advanced testing capabilities.
 

FreeN@s!

Dabbler
Joined
Aug 18, 2016
Messages
12
Hi,

Just for the record, I'm hitting the same error on one of my disks, /dev/da4, which is a WD30EFRX.
The resilvered amount is very low, around 200K.

I have not replaced the drive yet; I'll see how it goes. The drive is under warranty, but SMART did not report anything, so I don't think I'll get it replaced.

@soulbabel, did the warranty replace your drive?

Code:
root@freenas:~ # zpool status
  pool: ZFSMirror
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 176K in 0 days 00:00:01 with 0 errors on Wed Aug 29 19:54:49 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        ZFSMirror                                       ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/ad7bf1a7-3f90-11e7-8925-645106d88468  ONLINE       0     0     0
            gptid/4ce56666-fd68-11e7-84ff-000c29e400bb  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/19b82900-c37c-11e7-b4c2-000c29e400bb  ONLINE       0     1     0
            gptid/592450be-fb78-11e7-a9f2-000c29e400bb  ONLINE       0     0     0

errors: No known data errors



IceBoosteR

Guru
Joined
Sep 27, 2016
Messages
503
How long had the system been running before you observed the error?
Has it been running for a year or more on the same hardware, or is it something new?
Is SMART showing you anything?
Has this error happened in the past, or is this the first time?
 

FreeN@s!

Dabbler
Joined
Aug 18, 2016
Messages
12
@IceBoosteR,

The system has been running for over a year with the same configuration on the same hardware.
The error started about 1-2 months ago. I get it once every 2-3 weeks or so.

SMART looks good:

Code:
root@freenas:~ # smartctl -a /dev/da4
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD30EFRX-68EUZN0
Serial Number:	WD-WCC4N6YPZVP6
LU WWN Device Id: 5 0014ee 2629d98a2
Firmware Version: 82.00A82
User Capacity:	3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5400 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Wed Aug 29 21:33:44 2018 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
										was never started.
										Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				(40680) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 408) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x703d) SCT Status supported.
										SCT Error Recovery Control supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   182   181   021	Pre-fail  Always	   -	   5866
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   240
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   080   080   000	Old_age   Always	   -	   14788
 10 Spin_Retry_Count		0x0032   100   100   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   100   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   224
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   209
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   633
194 Temperature_Celsius	 0x0022   118   112   000	Old_age   Always	   -	   32
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed without error	   00%	 14672		 -
# 2  Short offline	   Completed without error	   00%	 14409		 -
# 3  Extended offline	Completed without error	   00%	  3882		 -
# 4  Extended offline	Completed without error	   00%	  3162		 -
# 5  Extended offline	Completed without error	   00%	  2421		 -
# 6  Extended offline	Completed without error	   00%	  1760		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

IceBoosteR

Guru
Joined
Sep 27, 2016
Messages
503
Hi,
the drive looks good. Normally when these errors occur, something else is wrong. It could be anything from a bad SATA/SAS cable to a faulty or too hot storage controller to a faulty PSU, which makes it really hard to troubleshoot. Is it always the same drive?
If yes, you may want to switch cables/ports to check whether the problem follows the drive, which may reveal the root cause. I know this means work, but I can say from personal experience that it is worth the time; otherwise you may end up with data loss...
-Ice
 

FreeN@s!

Dabbler
Joined
Aug 18, 2016
Messages
12
Hi,
thanks for the reply.
I changed the backplane port (moved the drive to another slot) and tried another controller. It still looks the same, and it is always the same drive.
I strongly believe it is a drive problem: in the original post in this topic, the problem went away after the drive was changed. I'll try that and swap in a WD30EFRX from my remote server, then report back. It looks like this is quite a rare issue.
 

IceBoosteR

Guru
Joined
Sep 27, 2016
Messages
503
That's true. Those SCSI errors are rare. It depends on the error message, but they're often hard to troubleshoot.
Have you run any scrubs in the past? I ask because this is only a write error...

Anyway I will wait for any information/update you can provide.
 

FreeN@s!

Dabbler
Joined
Aug 18, 2016
Messages
12
Could be something from a bad SATA/SAS cable to a faulty or to hot storage controller to a faulty
-Ice
You mean a storage controller that is too hot can cause this? In my Dell T320 there is only one 120mm fan, at the back, and the duct limits airflow to the disks and CPU.
No air passes over the LSI controller, and it gets really, really hot, along with the Intel (Dell) i350-T4 NIC. I'll consider adding a small fan.

Yes, I'm doing scrubs; no problems, all good.
 

IceBoosteR

Guru
Joined
Sep 27, 2016
Messages
503
Hi,
yes, I have read about something similar. Maybe you have an IR thermometer with which you can measure the surface temperature of the controller heatsink during operation. This problem is very rare, and normally you would see it on all drives, but unusual conditions can lead to unusual problems. On the other hand, your disk temperature looks very good, so maybe that applies to the LSI controller as well...
 