SOLVED One or more devices has experienced an error resulting in data corruption (freenas-boot)

Status
Not open for further replies.

tomasi

Cadet
Joined
Nov 18, 2017
Messages
7
I have a new HP MicroServer Gen8 with the following disk configuration:
Slot 1: WD SSD (boot)
Slot 2: 8 TB HDD
Slot 3: 8 TB HDD
Slot 4: 8 TB HDD
The storage controller is in AHCI mode and boots from slot 1; the disks in slots 2-4 form a zpool in RAIDZ1.
I moved the system dataset to freenas-boot (because it is on the SSD).

After boot, the server shows this alert:
CRITICAL: Nov. 19, 2017, 8:19 a.m. - The boot volume state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected.

I need help identifying what is wrong. I have already reinstalled FreeNAS several times. Is the system SSD a lemon?
Or can someone point me in the right direction?

In detail, it looks like this:

Code:
root@freenas:/var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log # zpool status -v
  pool: freenas-boot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: none requested
config:

	NAME		STATE	 READ WRITE CKSUM
	freenas-boot  ONLINE	   0	 0	 7
	  ada0p2	ONLINE	   0	 0	17

errors: Permanent errors have been detected in the following files:

		/var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log/utx.lastlogin
		/var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log/samba4/log.wb-FREENAS
		/var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log/samba4/log.winbindd-idmap
		/var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/freenas.local/df-mnt-h8gen_nas/df_complex-used.rrd

  pool: h8gen_nas
 state: ONLINE
  scan: none requested
config:

	NAME											STATE	 READ WRITE CKSUM
	h8gen_nas									   ONLINE	   0	 0	 0
	  raidz1-0									  ONLINE	   0	 0	 0
		gptid/ff3ec5c5-c57b-11e7-84fa-00fd45fc8468  ONLINE	   0	 0	 0
		gptid/ffe3e4db-c57b-11e7-84fa-00fd45fc8468  ONLINE	   0	 0	 0
		gptid/00871210-c57c-11e7-84fa-00fd45fc8468  ONLINE	   0	 0	 0

errors: No known data errors
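
For anyone who wants to compare these counters across reboots, here is a minimal Python sketch (not part of the original session) that picks out the rows of `zpool status` output with a nonzero READ, WRITE, or CKSUM counter. The function name `devices_with_errors` and the regular expression are assumptions made for illustration, tuned only to the output format shown in this thread.

```python
import re

# One row of the "config:" table: NAME STATE READ WRITE CKSUM [notes].
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<read>\S+)\s+(?P<write>\S+)\s+(?P<cksum>\S+)"
)


def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for rows with a nonzero counter.

    Counters are kept as strings because zpool abbreviates large values
    (for example "2.57K"); anything other than "0" is treated as nonzero.
    """
    flagged = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m is None or m.group("name") == "NAME":
            continue  # not a device row, or the table header itself
        counters = (m.group("read"), m.group("write"), m.group("cksum"))
        if any(c != "0" for c in counters):
            flagged.append((m.group("name"), m.group("state"), *counters))
    return flagged


# Rows copied from the freenas-boot output above.
SAMPLE = """
    NAME          STATE     READ WRITE CKSUM
    freenas-boot  ONLINE       0     0     7
      ada0p2      ONLINE       0     0    17
"""

for row in devices_with_errors(SAMPLE):
    print(row)
```

In practice the text would come from saving the output of `zpool status -v` to a file after each boot; the healthy h8gen_nas rows, with all counters at 0, are not reported.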


Code:
root@freenas:/var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log # less /var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log/samba4/log.wb-FREENAS
read error  (press RETURN)


Code:
root@freenas:/ # df -h .
Filesystem				   Size	Used   Avail Capacity  Mounted on
freenas-boot/ROOT/default	108G	727M	107G	 1%	/



Then I ran a long SMART self-test with smartctl:
Code:
root@freenas:/var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log # camcontrol devlist
<WDC WDS120G2G0A-00JH30 UE300000>  at scbus0 target 0 lun 0 (pass0,ada0)
<ST8000VN0022-2EL112 SC61>		 at scbus1 target 0 lun 0 (pass1,ada1)
<ST8000VN0022-2EL112 SC61>		 at scbus2 target 0 lun 0 (pass2,ada2)
<ST8000VN0022-2EL112 SC61>		 at scbus3 target 0 lun 0 (pass3,ada3)
root@freenas:/var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log # smartctl -t long /dev/ada0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 21 minutes for test to complete.
Test will complete after Sun Nov 19 09:26:24 2017


The test completed without error:

Code:
root@freenas:~ # smartctl -l selftest /dev/ada0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed without error	   00%	   112		 -
# 2  Short offline	   Completed without error	   00%		27		 -




Here is the hardware configuration:

Build
FreeNAS-11.0-U4 (54848d13b)
Platform Intel(R) Xeon(R) CPU E3-1220L V2 @ 2.30GHz
Memory 16312MB (ECC)
 

tomasi

Cadet
Joined
Nov 18, 2017
Messages
7
Edit:
After a restart, more corruption appeared, in
freenas-boot/ROOT/default:<0x0> and
freenas-boot/ROOT/default:<0x203bb>:
Code:
root@freenas:/nonexistent # zpool status -v
  pool: freenas-boot
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: none requested
config:

	NAME		STATE	 READ WRITE CKSUM
	freenas-boot  DEGRADED	 0	 0	 4
	  ada0p2	DEGRADED	 0	 0	 8  too many errors

errors: Permanent errors have been detected in the following files:

		freenas-boot/ROOT/default:<0x0>
		freenas-boot/ROOT/default:<0x203bb>
		/var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log/utx.lastlogin
		/var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log/samba4/log.wb-FREENAS
		/var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log/samba4/log.winbindd-idmap
		/var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/freenas.local/df-mnt-h8gen_nas/df_complex-used.rrd
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Your boot SSD is clearly crapping out for some reason. Replace it and reinstall to new media.

Might as well take a look at the SMART data before doing so, though.
 

tomasi

Cadet
Joined
Nov 18, 2017
Messages
7
It must be the SSD...
SMART data is below, but there is nothing I can point to.


Code:
root@freenas:/nonexistent # smartctl  /dev/ada0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

ATA device successfully opened

Use 'smartctl -a' (or '-x') to print SMART (and more) information

root@freenas:/nonexistent # smartctl -a /dev/ada0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:	 WDC WDS120G2G0A-00JH30
Serial Number:	173920800722
LU WWN Device Id: 5 001b44 8b43bdd5a
Firmware Version: UE300000
User Capacity:	120,040,980,480 bytes [120 GB]
Sector Size:	  512 bytes logical/physical
Rotation Rate:	Solid State Device
Form Factor:	  2.5 inches
Device is:		Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sun Nov 19 20:00:58 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:	  (  32)	The self-test routine was interrupted
					by the host with a hard or soft reset.
Total time to complete Offline
data collection:		 (  120) seconds.
Offline data collection
capabilities:			 (0x15) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Abort Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:			(0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:		(0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time:	 (   2) minutes.
Extended self-test routine
recommended polling time:	 (  21) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   113
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   20
165 Unknown_Attribute	   0x0032   100   100   000	Old_age   Always	   -	   5
166 Unknown_Attribute	   0x0032   100   100   ---	Old_age   Always	   -	   0
167 Unknown_Attribute	   0x0032   100   100   ---	Old_age   Always	   -	   0
168 Unknown_Attribute	   0x0032   100   100   ---	Old_age   Always	   -	   0
169 Unknown_Attribute	   0x0032   100   100   ---	Old_age   Always	   -	   53
170 Unknown_Attribute	   0x0032   100   100   ---	Old_age   Always	   -	   0
171 Unknown_Attribute	   0x0032   100   100   000	Old_age   Always	   -	   0
172 Unknown_Attribute	   0x0032   100   100   000	Old_age   Always	   -	   0
173 Unknown_Attribute	   0x0032   100   100   000	Old_age   Always	   -	   0
174 Unknown_Attribute	   0x0032   100   100   000	Old_age   Always	   -	   9
184 End-to-End_Error		0x0032   100   100   ---	Old_age   Always	   -	   0
187 Reported_Uncorrect	  0x0032   100   100   000	Old_age   Always	   -	   0
188 Command_Timeout		 0x0032   100   100   ---	Old_age   Always	   -	   0
194 Temperature_Celsius	 0x0022   070   055   000	Old_age   Always	   -	   30 (Min/Max 12/55)
199 UDMA_CRC_Error_Count	0x0032   100   100   ---	Old_age   Always	   -	   0
230 Unknown_SSD_Attribute   0x0032   100   100   000	Old_age   Always	   -	   4294967297
232 Available_Reservd_Space 0x0033   100   100   005	Pre-fail  Always	   -	   100
233 Media_Wearout_Indicator 0x0032   100   100   ---	Old_age   Always	   -	   0
234 Unknown_Attribute	   0x0032   100   100   000	Old_age   Always	   -	   28
241 Total_LBAs_Written	  0x0030   100   100   000	Old_age   Offline	  -	   8
242 Total_LBAs_Read		 0x0030   100   100   000	Old_age   Offline	  -	   20
244 Unknown_Attribute	   0x0032   000   100   ---	Old_age   Always	   -	   0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed without error	   00%	   112		 -
# 2  Short offline	   Completed without error	   00%		27		 -

Selective Self-tests/Logging not supported

root@freenas:/nonexistent #
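
A table like the one above is easier to check with a small parser. The following is a minimal Python sketch (not part of the original session) that reads the attribute rows of `smartctl -a` output and reports which of a few watched attributes have a nonzero raw value. The helper names and the choice of IDs 5, 187 and 199 are assumptions for illustration; since smartctl says this drive is not in its database, the attribute names it prints may not match the vendor's real definitions. Attribute 199 (UDMA_CRC_Error_Count) conventionally counts link transfer errors, so a zero there is only a weak hint against a cable problem.

```python
# Attribute IDs to watch; an illustrative choice, not a vendor recommendation.
WATCHED = (5, 187, 199)


def parse_smart_attributes(text):
    """Map attribute ID -> {"name", "raw"} for rows of the SMART attribute table."""
    attrs = {}
    for line in text.splitlines():
        # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        parts = line.split(None, 9)
        if len(parts) == 10 and parts[0].isdigit() and parts[2].startswith("0x"):
            attrs[int(parts[0])] = {"name": parts[1], "raw": parts[9].strip()}
    return attrs


def nonzero_watched(attrs, watched=WATCHED):
    """Return the watched attributes whose raw value does not start with 0."""
    found = {}
    for attr_id in watched:
        if attr_id in attrs and attrs[attr_id]["raw"].split()[0] != "0":
            found[attr_id] = attrs[attr_id]
    return found


# Rows copied from the smartctl output above.
SAMPLE_SMART = """
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   070   055   000    Old_age   Always       -       30 (Min/Max 12/55)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
"""

print(nonzero_watched(parse_smart_attributes(SAMPLE_SMART)))
```

An empty result only says the watched counters are zero; as this thread shows, ZFS can still report checksum errors while SMART looks clean.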
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Yeah, this version of smartctl has close to no idea what the SMART parameters really are on the drive.
 

Green750one

Dabbler
Joined
Mar 16, 2015
Messages
36
You could try replacing the cable on the SSD first. I've had numerous issues because of crappy cables!

 

tomasi

Cadet
Joined
Nov 18, 2017
Messages
7
I got the WD Green SSD back from RMA: a new one in a new box.
Guess what?
The same error appeared after a reboot:

pool: freenas-boot
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://illumos.org/msg/ZFS-8000-8A
scan: none requested
config:
NAME STATE READ WRITE CKSUM
freenas-boot DEGRADED 0 0 658
ada0p2 DEGRADED 0 0 2.57K too many errors
errors: Permanent errors have been detected in the following files:
//usr/local/www/freenasUI/system/ixselftests/__pycache__
//usr/local/www/freenasUI/support/__pycache__
//usr/local/www/freenasUI/system/alertmods/__pycache__
/var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/freenas.local/geom_stat/geom_ops_rwd-ada0p1.rrd


So the SSD was good, but something was still going badly wrong... Somehow, restarting the server was corrupting files.

I ordered a new SSD of a different type, a Kingston 120GB SSDNow UV400, and after a fresh install it now looks fine.
My idea is that the controller of the WD Green (Silicon Motion SM2256S) is incompatible with FreeBSD, or with the HP MicroServer Gen8, or with the B120i controller...
The Kingston has a Marvell 88SS1074 controller.
 

Green750one

Dabbler
Joined
Mar 16, 2015
Messages
36
tomasi said:
I got WD green SSD back from RMA. Guess what? Same error appeared after reboot [...]
I ordered new SSD - another type Kingston 120GB SSDNow UV400, now after fresh install looks fine. (I made about 10 installs of freenas ...)

The other thing to check is the PSU and how you have everything cabled. I've had issues with running too many devices on a single channel.
And I know you've re-cabled, but I'd do it again. I've also had issues with rubbish SATA cables.

 

tomasi

Cadet
Joined
Nov 18, 2017
Messages
7
But there is no PSU/cable problem; see the pictures:

The disks are connected via a SAS cable:
dEdsLGb.jpg


The SSD is in a 3.5" bay converter in slot 1:

HNEtLe2.jpg


Still, after more than 24 hours and several restarts/shutdowns, there is no problem with the Kingston SSD.
 