Errors during scrub, but all hard drives show normal SMART data

Status
Not open for further replies.

joel3452

Dabbler
Joined
Oct 13, 2016
Messages
36
I have been running my FreeNAS server for about 4 months, and during that time 3 of the scrubs have come up with minor errors similar to the ones below. I will also attach all of the information as one big file for easier viewing. The latest scrub gave the following error messages:

Code:
freenas.local kernel log messages:
>	   (da8:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 a0 87 57 b0 00 00 00 e0 00 00 length 114688 SMID 770 terminated ioc 804b scsi 0 state 0 xfer 0
> (da8:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 a0 87 57 b0 00 00 00 e0 00 00
> (da8:mpr0:0:7:0): CAM status: CCB request completed with an error
> (da8:mpr0:0:7:0): Retrying command
> (da8:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 a0 87 56 c8 00 00 00 e8 00 00
> (da8:mpr0:0:7:0): CAM status: SCSI Status Error
> (da8:mpr0:0:7:0): SCSI status: Check Condition
> (da8:mpr0:0:7:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da8:mpr0:0:7:0): Info: 0x1a08756c8
> (da8:mpr0:0:7:0): Error 5, Unretryable error
>	   (da3:mpr0:0:2:0): READ(16). CDB: 88 00 00 00 00 01 a2 0b 66 08 00 00 00 30 00 00 length 24576 SMID 509 terminated ioc 804b scsi 0 state 0 xfer 0
> (da3:mpr0:0:2:0): READ(16). CDB: 88 00 00 00 00 01 a2 0b 66 08 00 00 00 30 00 00
> (da3:mpr0:0:2:0): CAM status: CCB request completed with an error
> (da3:mpr0:0:2:0): Retrying command
> (da3:mpr0:0:2:0): READ(16). CDB: 88 00 00 00 00 01 a2 0b 65 28 00 00 00 e0 00 00
> (da3:mpr0:0:2:0): CAM status: SCSI Status Error
> (da3:mpr0:0:2:0): SCSI status: Check Condition
> (da3:mpr0:0:2:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da3:mpr0:0:2:0): Info: 0x1a20b6528
> (da3:mpr0:0:2:0): Error 5, Unretryable error
-- End of security output --

[root@freenas] ~# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon Jan 16 03:45:11 2017
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub repaired 176K in 12h56m with 0 errors on Wed Jan 18 10:56:07 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/c375e1a1-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c42a5d2c-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c4d3f78e-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c576f47a-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c621429c-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c6c8e874-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c771fa0c-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c82043ca-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0

errors: No known data errors



My System specs are:
Code:
Motherboard: Supermicro X11SSH-CTF
CPU: Intel Xeon E3-1230 V5 3.4GHz Quad-Core Processor
Memory: Crucial 32GB Kit (2 x 16GB) DDR4-2133 ECC
2* Storage: Sandisk Extreme Pro 240GB 2.5" Solid State Drive for ESX datastore
8* Storage: Western Digital Red 6TB 3.5" 5400RPM Internal Hard Drive ($234.52 @ Newegg)
Case: Fractal Design Define R5 (Black) ATX Mid Tower Case ($119.98 @ Newegg)
Power Supply: SeaSonic 660W 80+ Platinum Certified Fully-Modular ATX Power Supply ($104.99 @ Newegg)
FreeNAS-9.10.2 (a476f16)
ESX 6.0U2 with the motherboard's LSI controller flashed to IT mode and passed through to the FreeNAS VM. The FreeNAS VM is assigned 2 vCPUs and 20 GB of reserved memory.


I have the system run a scrub every 2 weeks, and the hard drives run a short SMART test daily and a long test weekly. All of the hard drives show "Completed without error" for all tests and have none of the typical trouble signs (e.g. pending sectors, reallocated sectors, etc.). I have attached the smartctl output for all of the hard drives in freenas logs.txt. The machine is running ECC memory and has a UPS. Before bringing the server online, I tested the memory and hard drives for over a week each. I checked and reattached the SAS and power cables in case any were loose. I am using these SAS cables: https://www.amazon.com/gp/product/B01GPDBHDY/ and the stock power cables that came with my PSU (i.e. no goofy adapters). I don't believe I lost any data, since the scrubs show "No known data errors", but I still can't fully trust the server until these issues go away.
Thanks for any help
 

Attachments

  • freenas logs.txt
    59.1 KB · Views: 272
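
For anyone who wants to cross-check the kernel log in the first post, here is a minimal Python sketch that decodes the READ(16) CDBs into a starting LBA and a block count. It relies on the standard READ(16) layout (opcode 0x88, an 8-byte big-endian LBA in bytes 2-9, a 4-byte transfer length in bytes 10-13). The function name `decode_read16` is made up for illustration, and the 512-byte block size is an assumption inferred from the logged `length 114688` for a 0xe0-block transfer.

```python
from typing import Tuple

BLOCK_SIZE = 512  # assumption: inferred from "length 114688" for 0xe0 (224) blocks


def decode_read16(cdb_hex: str) -> Tuple[int, int]:
    """Return (starting LBA, transfer length in blocks) for a READ(16) CDB."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


# CDBs copied verbatim from the kernel log in the first post.
LOGGED_CDBS = {
    "da8 terminated": "88 00 00 00 00 01 a0 87 57 b0 00 00 00 e0 00 00",
    "da8 medium error": "88 00 00 00 00 01 a0 87 56 c8 00 00 00 e8 00 00",
    "da3 terminated": "88 00 00 00 00 01 a2 0b 66 08 00 00 00 30 00 00",
    "da3 medium error": "88 00 00 00 00 01 a2 0b 65 28 00 00 00 e0 00 00",
}

for label, cdb in LOGGED_CDBS.items():
    lba, blocks = decode_read16(cdb)
    print(f"{label}: LBA {lba:#x}, {blocks} blocks, {blocks * BLOCK_SIZE} bytes")
```

Decoded this way, the two reads that ended in MEDIUM ERROR start at 0x1a08756c8 (da8) and 0x1a20b6528 (da3), which are the same values as the `Info:` lines in the log. On each disk, the terminated read begins exactly where the failing read ends.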

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
Do you know if the errors always occur on da3 and da8?
 

joel3452

Dabbler
Joined
Oct 13, 2016
Messages
36
The last time this happened during a scrub, it was on da4. But I did restart between the scrubs, so I'm not sure whether the da numbers always map to the same physical ports or vary across reboots.
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
Well, the SMART data looks good to me. Maybe something is going on with the controller. Do you have a PCIe HBA you could try?

Edit: removed incorrect info
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
You seem to be running an All-in-One w/ ESXi booting FreeNAS and passing an HBA through to it, etc. I've got two of these; they work flawlessly. Do you lock all of the memory allocated to the FreeNAS VM? I do this and reserve CPU as well, as shown in this vSphere screenshot:

[Attached screenshot: vmware-freenas-vm-memory-resources.jpg]


Doing this may help or it may not. Other than this suggestion... I got nothin'!

Good luck
 

joel3452

Dabbler
Joined
Oct 13, 2016
Messages
36
bigphil, I have an extra PCIe HBA, but my motherboard only has one x16 PCIe slot, and that slot currently holds an HBA running RAID 1 for the ESX datastore. So it would be some work to reconfigure it. I would probably have to install FreeNAS on a flash drive, import my config, and boot it natively to use the extra HBA. And I don't think I can pass through the Intel SATA ports on the board to try those.

Spearfoot, yes, I am running an All-in-One, passing the motherboard's LSI SAS controller through to the FreeNAS VM. My resource allocation has all 20 GB of memory reserved. What does the CPU reservation do in this instance, just guarantee a minimum amount of CPU time? I only have about 7 vCPUs assigned across all of my VMs, on a 4-core, 8-thread CPU, so I have almost 0% readiness in the cluster. My readiness stats for the last 2 hours are 0.09 percent max and 0.068 percent average, and I have never seen it go past half a percent, so it is well below the recommended 5% readiness. Are there any drawbacks to having a CPU reservation?

Here are my passthrough options, but I don't see the Intel SATA unless I am missing it.
[Attached screenshot: upload_2017-1-19_20-11-18.png]
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Quoting bigphil: "Device numbers won't change. Well, the SMART data looks good to me. Maybe something going on with the controller. Do you have a PCIe HBA you could try?"
Incorrect, disk numbers are not guaranteed to be consistent through a reboot.
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
Never personally seen it, but I'll take your word for it.
 

joel3452

Dabbler
Joined
Oct 13, 2016
Messages
36
So what would be the best order of steps to figure out the root cause? Swap the SAS cables first and then try a different HBA, or something else? With tested ECC memory and clean SMART tests, I'm not sure where to go next.

If the next step is cables, can anyone recommend good-quality SFF-8643 to SATA cables? The ones I got from Amazon were decent quality, with locking snaps. Looking around on Amazon, Newegg, etc., it seems like fairly slim pickings for SFF-8643 for anything except 8643 to backplane.

Edit: To try the HBA idea, the motherboard's 2nd slot is listed as: 1 PCI-E 3.0 x2 (in x4). So that means it will fit an x4 card but operate at x2 speed, correct?
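
On the x2 question, a rough back-of-the-envelope sketch in Python may help. PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, which gives the raw per-lane figure below before any protocol overhead. The per-drive throughput is purely an assumed placeholder, not a measured number for these WD Reds, and the names `link_mb_per_s` and `ASSUMED_DRIVE_MB_PER_S` are made up for illustration.

```python
# PCIe 3.0 signals at 8 GT/s per lane and uses 128b/130b line encoding.
TRANSFERS_PER_S = 8e9
ENCODING_EFFICIENCY = 128 / 130


def link_mb_per_s(lanes: int) -> float:
    """Raw one-direction bandwidth in MB/s, ignoring packet/protocol overhead."""
    bytes_per_s_per_lane = TRANSFERS_PER_S * ENCODING_EFFICIENCY / 8
    return lanes * bytes_per_s_per_lane / 1e6


# Assumption: a hypothetical sequential rate per disk; substitute a measured value.
ASSUMED_DRIVE_MB_PER_S = 175
DRIVES = 8

x2_link = link_mb_per_s(2)
all_drives = DRIVES * ASSUMED_DRIVE_MB_PER_S

print(f"x2 link:  {x2_link:.0f} MB/s raw")
print(f"8 drives: {all_drives} MB/s at the assumed per-drive rate")
print("x2 link is the bottleneck" if all_drives > x2_link else "x2 link has headroom")
```

Under that assumed drive speed, an x2 link (roughly 1.97 GB/s raw) would still have headroom for eight spinning disks, so a link that trains at x2 should be good enough for a test scrub. Real-world throughput will be somewhat lower than the raw figure.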
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You could try booting FreeNAS off a USB drive and running a couple of scrubs to see whether the problem is related to your virtualization.
 

dcevansiii

Dabbler
Joined
Sep 9, 2013
Messages
22
A couple of thoughts:

1. Your data is safe, because ZFS is doing its job and correcting any corruption that has occurred. Even if you never find the root cause, you could still run the pool with these errors. (Am I wrong in this statement? anyone?) The errors certainly don't give you peace of mind though.
2. I'll second what SweetAndLow said: The device numbers can definitely change on reboot.
3. Do the read errors happen to occur during the scrubs of the pools? I couldn't tell if that was the case from what you provided.
4. Does it always happen with the same drive? (tracked by serial number, not daX)
5. How's your backup plan? ;)
 

joel3452

Dabbler
Joined
Oct 13, 2016
Messages
36
SweetAndLow,
So I just export the config, install the same version on a flash drive, and then import the config? And since ESX has been passing the card through, there should be no difference, right? That is what I did during the initial setup, in reverse: I used FreeNAS standalone for a few weeks and then imported it into an ESX VM.

dcevansiii,
1. Glad to hear the data is okay
2. noted
3. Yes, the errors only occur during scrubs. There have never been any errors in normal operation or in any SMART tests, etc.
4. What is the best way to match serial numbers to daX so I can track this in the future?
5. My current backup is the actually important stuff (pictures, documents, etc.) copied to a USB hard drive stored in a firebox in the closet. About the only feasible backup for movies/TV would be to put them on cold-storage drives and keep those at work or similar. I would like to set up a cloud backup or something automated for the actually important things, but most of the options seem to have severe drawbacks. It looks like CrashPlan is about the only one supported natively, and it is apparently slow for many users in plugin form. But I guess I could install it in a VM running Linux and mount the shares.
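
One possible way to approach the serial-number tracking in point 4, as a hedged Python sketch rather than a tested FreeNAS recipe: pull the daX names that logged a MEDIUM ERROR out of the kernel messages, then ask `smartctl -i` for each disk's serial. It assumes `smartctl` is on the PATH and that its identity output contains a "Serial Number:" line (ATA) or "Serial number:" line (SCSI). All function names here are made up for illustration.

```python
import re
import subprocess
from typing import Dict, List, Optional

# Matches the device name in kernel lines such as
# "(da8:mpr0:0:7:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)"
MEDIUM_ERROR = re.compile(r"\((da\d+):[^)]*\): SCSI sense: MEDIUM ERROR")

# Case-insensitive so both the ATA and SCSI spellings of the label match.
SERIAL = re.compile(r"^Serial number:\s*(\S+)", re.IGNORECASE | re.MULTILINE)


def devices_with_medium_errors(log_text: str) -> List[str]:
    """Return the daX names that logged a MEDIUM ERROR, in first-seen order."""
    seen: List[str] = []
    for name in MEDIUM_ERROR.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def parse_serial(smartctl_info: str) -> Optional[str]:
    """Extract the serial number from `smartctl -i` output, if present."""
    match = SERIAL.search(smartctl_info)
    return match.group(1) if match else None


def serial_for(device: str) -> Optional[str]:
    """Ask smartctl for the serial of /dev/<device>; None if unavailable."""
    try:
        result = subprocess.run(
            ["smartctl", "-i", f"/dev/{device}"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # smartctl is not installed on this machine
        return None
    return parse_serial(result.stdout)


def errors_by_serial(log_text: str) -> Dict[str, Optional[str]]:
    """Map each erroring daX name to its current serial number."""
    return {dev: serial_for(dev) for dev in devices_with_medium_errors(log_text)}
```

Calling `errors_by_serial()` on the kernel log text right after a scrub, before any reboot, and saving the result with a date would give a record keyed by serial number. That record stays meaningful even if the daX numbering shifts later.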
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Quoting joel3452: "Here are my passthrough options, but I don't see the Intel SATA unless I am missing it."
That's not unusual. Intel SATA is notoriously hit-and-miss for passthrough.
 