Accumulating Checksum Errors on a mirror

Status
Not open for further replies.

myoung

Explorer
Joined
Mar 14, 2018
Messages
70
Over the last few weeks I have been accumulating checksum errors on one of the mirrors in my pool. These errors look like they are accumulating in one of the zvols in the pool. My understanding is that ZFS + regular scrubs should prevent/correct these errors. Is there something I am missing that will prevent these in the future?

Freenas-9.10.2-U4
Xeon E5-2680v2
128GB RAM
36x2x4TB HDs

Code:
zpool status -v pool0
  pool: pool0
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
		corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
		entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 2h34m with 1 errors on Sun Mar 11 08:19:38 2018
config:

		NAME											STATE	 READ WRITE CKSUM
		pool0										   DEGRADED	 0	 0	26
		  mirror-0									  ONLINE	   0	 0	 0
			gptid/a1f2cc80-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/a2975dcd-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-1									  ONLINE	   0	 0	 0
			gptid/a3455744-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/a3f634a9-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-2									  ONLINE	   0	 0	 0
			gptid/a4a7d6b4-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/a552702d-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-3									  ONLINE	   0	 0	 0
			gptid/a603e1a2-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/a6b19c7c-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-4									  ONLINE	   0	 0	 0
			gptid/a76721bb-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/a80fb9e1-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-5									  ONLINE	   0	 0	 0
			gptid/a8c7b83c-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/a970c846-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-6									  ONLINE	   0	 0	 0
			gptid/aa359282-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/aae02163-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-7									  ONLINE	   0	 0	 0
			gptid/ab9d8dcb-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/ac4ce475-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-8									  ONLINE	   0	 0	 0
			gptid/ad03bb4f-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/adb22bc6-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-9									  ONLINE	   0	 0	 0
			gptid/ae6ffe1c-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/af1e587f-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-10									 DEGRADED	 0	 0	52
			gptid/afd9a82a-4ae7-11e7-9b14-002590e79fed  DEGRADED	 0	 0	52  too many errors
			gptid/b0945ec6-4ae7-11e7-9b14-002590e79fed  DEGRADED	 0	 0	52  too many errors
		  mirror-11									 ONLINE	   0	 0	 0
			gptid/b15866e8-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/b206dbdd-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-12									 ONLINE	   0	 0	 0
			gptid/b2cf1fda-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/b37f4e47-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 1
		  mirror-13									 ONLINE	   0	 0	 0
			gptid/b444fa3e-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/b4f655ba-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-14									 ONLINE	   0	 0	 0
			gptid/b5bc4bf0-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/b67151f8-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-15									 ONLINE	   0	 0	 0
			gptid/b73f4c10-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/b8030e3a-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-16									 ONLINE	   0	 0	 0
			gptid/b8d4ad3e-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/b9932b09-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-17									 ONLINE	   0	 0	 0
			gptid/ba60fc21-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/bb131986-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-18									 ONLINE	   0	 0	 0
			gptid/bbdf9570-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/bc997455-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-19									 ONLINE	   0	 0	 0
			gptid/bd65681b-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/be293d57-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-20									 ONLINE	   0	 0	 0
			gptid/bef608af-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/bfb35937-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-21									 ONLINE	   0	 0	 0
			gptid/c088c29f-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/c1443922-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-22									 ONLINE	   0	 0	 0
			gptid/c22121f9-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/c2da5c57-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-23									 ONLINE	   0	 0	 0
			gptid/c3b58017-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/c476cfad-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-24									 ONLINE	   0	 0	 0
			gptid/c54ec618-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/c6154410-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-25									 ONLINE	   0	 0	 0
			gptid/c6ef4f5a-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/c7b02b5d-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-26									 ONLINE	   0	 0	 0
			gptid/c8917dbf-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/c956c7e4-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-27									 ONLINE	   0	 0	 0
			gptid/ca37da6e-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/cb04eda7-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-28									 ONLINE	   0	 0	 0
			gptid/cbf01c83-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/ccc898c5-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-29									 ONLINE	   0	 0	 0
			gptid/cdade9b8-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/ce717b87-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 1
		  mirror-30									 ONLINE	   0	 0	 0
			gptid/cf5e5659-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/d02b8969-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-31									 ONLINE	   0	 0	 0
			gptid/d11764ec-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/d1dcd157-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-32									 ONLINE	   0	 0	 0
			gptid/d2d2ec61-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/d3a4fb8e-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-33									 ONLINE	   0	 0	 0
			gptid/d49c18fc-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/d5686387-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-34									 ONLINE	   0	 0	 0
			gptid/d65e53b8-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/d72ba8a5-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
		  mirror-35									 ONLINE	   0	 0	 0
			gptid/d821ebed-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0
			gptid/d8f26635-4ae7-11e7-9b14-002590e79fed  ONLINE	   0	 0	 0

errors: Permanent errors have been detected in the following files:

		pool0/dss-vdi00:<0x1>
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
These errors look like they are accumulating in one of the zvols in the pool.
It's one of the vdevs, not one of the zvols (zvols are made on a pool, they don't comprise the pool). It's curious that both disks are showing the same number of errors. Is there something that's unique about those two? Are they on a different controller? Different power connector? Something along those lines?
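
To make the "same number of errors on both disks" observation easy to check on a pool this size, here is a minimal Python sketch that groups leaf devices with a nonzero CKSUM count under their parent mirror vdev. It assumes the config rows look like the `zpool status` output posted above (name, state, then three plain integer counters); the function name and the sample text are illustrative only and are not part of any ZFS tooling.

```python
import re

# Matches config rows such as:
#   gptid/...  DEGRADED  0  0  52  too many errors
# Assumption: counters are plain integers, as in the output posted in this thread.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
)

def checksum_errors_by_vdev(status_text):
    """Return {vdev: [(device, cksum), ...]} for leaf devices with CKSUM > 0."""
    result = {}
    current_vdev = None
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, blank, or non-config line
        name, _state, _read, _write, cksum = m.groups()
        if name.startswith("mirror-"):
            current_vdev = name  # remember which mirror the next disks belong to
        elif name.startswith("gptid/") and int(cksum) > 0:
            result.setdefault(current_vdev, []).append((name, int(cksum)))
    return result

# Rows copied from the zpool status output in this thread (whitespace normalized).
sample = """\
  pool0        DEGRADED  0 0 26
    mirror-10  DEGRADED  0 0 52
      gptid/afd9a82a-4ae7-11e7-9b14-002590e79fed  DEGRADED  0 0 52  too many errors
      gptid/b0945ec6-4ae7-11e7-9b14-002590e79fed  DEGRADED  0 0 52  too many errors
    mirror-12  ONLINE  0 0 0
      gptid/b2cf1fda-4ae7-11e7-9b14-002590e79fed  ONLINE  0 0 0
      gptid/b37f4e47-4ae7-11e7-9b14-002590e79fed  ONLINE  0 0 1
"""

for vdev, devices in checksum_errors_by_vdev(sample).items():
    print(vdev, devices)
```

Run against the full output above, this would separate mirror-10, where both members report 52, from mirror-12 and mirror-29, where only one member reports a single error.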
 

myoung

Explorer
Joined
Mar 14, 2018
Messages
70
The disks use the same power supply as the rest of the chassis. Both disks are on the same controller, but also share it with several other disks.

Code:
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.

Read configuration has been initiated for controller 2
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type						 : SAS2308_1
  BIOS version							: 7.31.00.00
  Firmware version						: 16.00.01.00
  Channel description					 : 1 Serial Attached SCSI
  Initiator ID							: 0
  Maximum physical devices				: 1023
  Concurrent commands supported		   : 10240
  Slot									: 5
  Segment								 : 0
  Bus									 : 3
  Device								  : 0
  Function								: 0
  RAID Support							: No
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
Initiator at ID #0

...

Device is a Hard disk
  Enclosure #							 : 2
  Slot #								  : 7
  SAS Address							 : 5000cca-0-5c1f-b4e9
  State								   : Ready (RDY)
  Size (in MB)/(in sectors)			   : 3815447/7814037167
  Manufacturer							: HGST
  Model Number							: HUS724040ALS640
  Firmware Revision					   : A1C4
  Serial No							   : PCGKEKHX
  GUID									: N/A
  Protocol								: SAS
  Drive Type							  : SAS_HDD

Device is a Hard disk
  Enclosure #							 : 2
  Slot #								  : 8
  SAS Address							 : 5000cca-0-5c2a-603d
  State								   : Ready (RDY)
  Size (in MB)/(in sectors)			   : 3815447/7814037167
  Manufacturer							: HGST
  Model Number							: HUS724040ALS640
  Firmware Revision					   : A1C4
  Serial No							   : PCGS9GBX
  GUID									: N/A
  Protocol								: SAS
  Drive Type							  : SAS_HDD

...

------------------------------------------------------------------------
Enclosure information
------------------------------------------------------------------------
  Enclosure#							  : 1
  Logical ID							  : 50030480:1506cd00
  Numslots								: 8
  StartSlot							   : 0
  Enclosure#							  : 2
  Logical ID							  : 50030480:00cfb9bf
  Numslots								: 25
  StartSlot							   : 0
------------------------------------------------------------------------
SAS2IRCU: Command DISPLAY Completed Successfully.
SAS2IRCU: Utility Completed Successfully.


I mentioned that the errors were accumulating in a zvol because all the errors were in a zvol named dss-vdi00.

Code:
errors: Permanent errors have been detected in the following files:

	   pool0/dss-vdi00:<0x1>
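
As a rough illustration of how the "Permanent errors" list ties damage to a dataset, the Python sketch below pulls `dataset:<0x...>` entries out of `zpool status -v` text and counts them per dataset. It assumes entries use the `dataset:<hex object number>` form shown above; entries reported as ordinary file paths are not handled. The function name and sample string are made up for the example.

```python
import re
from collections import Counter

# Matches entries like "pool0/dss-vdi00:<0x1>" (dataset name, then a hex object number).
ENTRY = re.compile(r"^\s*(?P<dataset>[^:\s]+):<(?P<obj>0x[0-9a-fA-F]+)>\s*$")

def parse_error_entries(status_text):
    """Return a list of (dataset, object_number) tuples from the errors section."""
    entries = []
    for line in status_text.splitlines():
        m = ENTRY.match(line)
        if m:
            entries.append((m.group("dataset"), int(m.group("obj"), 16)))
    return entries

# Errors section as posted in this thread.
sample = """\
errors: Permanent errors have been detected in the following files:

        pool0/dss-vdi00:<0x1>
"""

entries = parse_error_entries(sample)
per_dataset = Counter(dataset for dataset, _obj in entries)
print(entries)
print(per_dataset)
```

This only shows which dataset the damaged object belongs to; the vdev that holds the bad blocks is identified by the CKSUM column in the config section, not by this list.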
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
How are the data cables configured? Expander backplane or discrete cables?

You do have a good backup, yes?
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Are you running the IR firmware on your controller? You need to update your firmware on your controller to the latest IT version. There should have been a warning in the GUI telling you that your driver and firmware are mismatched.
 

myoung

Explorer
Joined
Mar 14, 2018
Messages
70
How are the data cables configured? Expander backplane or discrete cables?

Expander backplane

You do have a good backup, yes?

We have good snapshots

Are you running the IR firmware on your controller? You need to update your firmware on your controller to the latest IT version. There should have been a warning in the GUI telling you that your driver and firmware are mismatched.

I don't see any errors in the GUI, but it looks like that controller is running an older firmware version:

Code:
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

		Adapter Selected is a LSI SAS: SAS2308_1(D1)

Num   Ctlr			FW Ver		NVDATA		x86-BIOS		 PCI Addr
----------------------------------------------------------------------------

0  SAS2308_1(D1)   19.00.00.00	11.00.00.04	07.37.00.00	 00:01:00:00
1  SAS2308_1(D1)   19.00.00.00	11.00.00.04	07.37.00.00	 00:02:00:00
2  SAS2308_1(D1)   16.00.01.00	10.00.00.04	07.31.00.00	 00:03:00:00

		Finished Processing Commands Successfully.
		Exiting SAS2Flash.
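
For anyone comparing firmware across several HBAs, the listing above can be checked mechanically. The Python sketch below parses the controller rows and reports which controllers differ from the most common firmware version, and which are below a target version. The row layout is assumed to match the listing posted here; the helper names are hypothetical, and the target "20.00.07" is the version cited as current later in this thread.

```python
import re
from collections import Counter

# One controller row: index, controller name, FW version, NVDATA, BIOS, PCI address.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\d+(?:\.\d+){3})\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)

def parse_controllers(listing):
    """Return one dict per controller row found in the listing text."""
    rows = []
    for line in listing.splitlines():
        m = ROW.match(line)
        if m:
            num, ctlr, fw, nvdata, bios, pci = m.groups()
            rows.append({"num": int(num), "ctlr": ctlr, "fw": fw,
                         "nvdata": nvdata, "bios": bios, "pci": pci})
    return rows

def version_tuple(version):
    """Turn '16.00.01.00' into (16, 0, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def odd_ones_out(rows):
    """Controller indexes whose firmware differs from the most common version."""
    common, _count = Counter(r["fw"] for r in rows).most_common(1)[0]
    return [r["num"] for r in rows if r["fw"] != common]

def older_than(rows, target):
    """Controller indexes whose firmware is below the target version."""
    return [r["num"] for r in rows
            if version_tuple(r["fw"]) < version_tuple(target)]

# Rows copied from the listing in this thread (whitespace normalized).
sample = """\
Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
0  SAS2308_1(D1)   19.00.00.00   11.00.00.04   07.37.00.00   00:01:00:00
1  SAS2308_1(D1)   19.00.00.00   11.00.00.04   07.37.00.00   00:02:00:00
2  SAS2308_1(D1)   16.00.01.00   10.00.00.04   07.31.00.00   00:03:00:00
"""

rows = parse_controllers(sample)
print(odd_ones_out(rows))
print(older_than(rows, "20.00.07"))
```

Comparing versions as integer tuples avoids the pitfalls of string comparison on dotted version numbers.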
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
There should have been a warning in the GUI telling you that your driver and firmware are mismatched.
No, they took that back out a while back.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

myoung

Explorer
Joined
Mar 14, 2018
Messages
70
All of them are--current is 20.00.07.

Does this seem like something that would cause data corruption in a mirror?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Does this seem like something that would cause data corruption in a mirror?
I wouldn't expect so; I don't recall hearing of issues with those firmware versions, but it probably wouldn't hurt to flash them all to current.
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
We have good snapshots
Snapshots aren't backups. You are in extreme danger of losing your pool... before you go down the troubleshooting path too far, I would suggest an immediate backup of whatever you can pull off of that pool to some other storage medium.
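
One common way to get data onto separate storage is ZFS replication with `zfs send` piped into `zfs receive` on another machine. The Python sketch below only builds the command strings so they can be reviewed before anything is run; it does not execute them. The snapshot name, host name, and target dataset are hypothetical placeholders, and whether a send completes cleanly for a dataset that already has permanent errors is not addressed in this thread.

```python
import shlex

def replication_commands(dataset, snap_name, host, target):
    """Build (but do not run) snapshot and send/receive commands for one dataset."""
    snapshot = f"{dataset}@{snap_name}"
    q = shlex.quote  # quote each argument so unusual names cannot break the shell line
    return [
        # Take a recursive snapshot to replicate from.
        f"zfs snapshot -r {q(snapshot)}",
        # Stream it to a pool on a different machine over ssh.
        f"zfs send -R {q(snapshot)} | ssh {q(host)} zfs receive {q(target)}",
    ]

# Hypothetical names: 'rescue', 'backuphost', and 'backup/dss-vdi00' are placeholders.
commands = replication_commands(
    "pool0/dss-vdi00", "rescue", "backuphost", "backup/dss-vdi00"
)
for command in commands:
    print(command)
```

The point of the exercise is that the copy lives on different disks and a different pool, which a snapshot on pool0 does not provide.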
 