SOLVED Critical error on boot volume

Status
Not open for further replies.

strangelove

Dabbler
Joined
Dec 27, 2017
Messages
14
I am new to FreeNAS and just finished my first build (Supermicro X11SSM-F, SanDisk Plus SSD as boot device, 16 GB Kingston ECC memory). I have tried several times to install FreeNAS 11.1 onto the SanDisk Plus SSD. No matter what I try, sooner or later I end up with a critical error:
Code:
The boot volume state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected.


zpool status -v
Code:
  pool: freenas-boot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     1
          ada0p2      ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

        //conf/base/etc/remote
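Reading the config table: a non-zero CKSUM count means ZFS read data whose checksum did not match, here on the pool and on its only device, `ada0p2`, with no READ or WRITE errors. A minimal Python sketch, assuming the plain-text layout shown above (the parser and its names are hypothetical, not part of FreeNAS or ZFS), that pulls those counters out and lists the entries with errors:

```python
from __future__ import annotations

from dataclasses import dataclass

# Excerpt of the config section from the `zpool status -v` output posted above.
SAMPLE = """\
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     1
          ada0p2      ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:
"""


@dataclass
class VdevCounters:
    name: str
    state: str
    read: int
    write: int
    cksum: int


def parse_config(status_text: str) -> list[VdevCounters]:
    """Collect the NAME/STATE/READ/WRITE/CKSUM rows of the config table."""
    rows: list[VdevCounters] = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields:  # the first blank line after the header ends the table
            break
        if len(fields) < 5:  # section labels such as "logs"; skip them
            continue
        name, state, read, write, cksum = fields[:5]
        # Assumes plain integer counters; abbreviated values like "1.2K" are not handled.
        rows.append(VdevCounters(name, state, int(read), int(write), int(cksum)))
    return rows


def vdevs_with_errors(rows: list[VdevCounters]) -> list[str]:
    """Names of pool/vdev entries with any non-zero error counter."""
    return [r.name for r in rows if r.read or r.write or r.cksum]


if __name__ == "__main__":
    rows = parse_config(SAMPLE)
    for r in rows:
        print(f"{r.name:<14} {r.state:<8} read={r.read} write={r.write} cksum={r.cksum}")
    print("entries with errors:", vdevs_with_errors(rows))
```

For the output above this reports both `freenas-boot` and `ada0p2`; the pool-level count simply reflects the error seen on its single device.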


smartctl -a /dev/ada0

Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SanDisk SSD PLUS 120 GB
Serial Number:    174500469107
LU WWN Device Id: 5 001b44 4a9b2567f
Firmware Version: UE3000RL
User Capacity:    120,040,980,480 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jan 11 05:26:10 2018 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  32) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete Offline
data collection:                 (  120) seconds.
Offline data collection
capabilities:                    (0x15) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  21) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       4
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       32
165 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       3
166 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
167 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
168 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
169 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       130
170 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       20
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   070   037   000    Old_age   Always       -       30 (Min/Max 0/37)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
230 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       0
232 Available_Reservd_Space 0x0033   100   100   005    Pre-fail  Always       -       100
233 Media_Wearout_Indicator 0x0032   100   100   ---    Old_age   Always       -       0
234 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       19
241 Total_LBAs_Written      0x0030   100   100   000    Old_age   Offline      -       5
242 Total_LBAs_Read         0x0030   100   100   000    Old_age   Offline      -       2
244 Unknown_Attribute       0x0032   000   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported
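Worth noting in this output: SMART reports PASSED, and the attributes most often associated with media or link trouble (reallocated sectors, uncorrectable errors, command timeouts, UDMA CRC errors) all have a raw value of 0, even though ZFS had already found checksum errors on the same drive. The drive's own self-assessment did not flag the problem. A minimal Python sketch that checks those raw values, assuming the column layout shown above; the watch list and function names are hypothetical choices for illustration, not smartmontools features:

```python
from __future__ import annotations

# Attributes whose raw value usually should stay at 0 on a healthy drive.
# This watch list is an illustrative assumption, not an official threshold set;
# vendors interpret many attributes differently.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    184: "End-to-End_Error",
    187: "Reported_Uncorrect",
    188: "Command_Timeout",
    199: "UDMA_CRC_Error_Count",
}

# Rows copied from the `smartctl -a /dev/ada0` output posted above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       4
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   070   037   000    Old_age   Always       -       30 (Min/Max 0/37)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
"""


def parse_raw_values(text: str) -> dict[int, int]:
    """Map attribute ID to the leading integer of its RAW_VALUE column."""
    raw: dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            raw[int(fields[0])] = int(fields[9])
    return raw


def nonzero_watched(raw: dict[int, int]) -> list[str]:
    """Names of watched attributes whose raw value is above zero."""
    return [name for attr_id, name in WATCHED.items() if raw.get(attr_id, 0) > 0]


if __name__ == "__main__":
    raw = parse_raw_values(SAMPLE)
    print("watched attributes above zero:", nonzero_watched(raw) or "none")
```

On the rows from this drive the check comes back empty, which matches the thread's outcome: the fault only became visible through ZFS checksums and was resolved by swapping the SSD.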


Also, I did run a complete pass of memtest86+, which did not report any errors.

I have 6 x WD Red 4TB drives. The X11SSM-F has 8 SATA ports, and the first two also support SATA DOM. I have the boot device (SanDisk Plus SSD) connected to one of those; this should be fine, right? (It is also connected to power.)

Another side note: I have only tried installing in UEFI mode. Maybe I should try installing in legacy BIOS mode (although I don't think it should make a difference)?

Any ideas?
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
The immediate and obvious assumption from what you've provided is that your SSD is bad. What happens if you install FreeNAS on a USB device, or try a different SSD?
 

strangelove

Dabbler
Nick2253 said:
> The immediate and obvious assumption from what you've provided is that your SSD is bad. What happens if you install FreeNAS on a USB device, or try a different SSD?

I just installed FreeNAS on a USB device and it is working. Unfortunately, I don't have another SSD I can use, but I think I am just going to RMA the SSD.
 

strangelove

Dabbler
Quick update: I just noticed the beeps the mainboard makes during POST (which I had completely ignored before).

I observed that the pattern is not always the same. For example, while I have a USB device plugged in, the pattern is 4 short beeps followed by 1 more beep a little later. According to this article https://www.computerhope.com/beep.htm, and assuming that my mainboard (Supermicro X11SSM-F) uses an AMI BIOS, four short beeps indicate a system timer failure. However, when I unplug the USB device and reboot the computer, I hear only 2 short beeps followed by 1 more beep later. Does this make any sense at all?
 

Nick2253

Wizard

strangelove

Dabbler
I guess it makes sense to look in the mainboard manual first, before consulting the rest of the internet :p. Anyway, as I understand it, the short beeps during POST indicate connected USB devices, one beep per device. That's why the pattern changes once I disconnect the (two) external USB devices. The remaining two beeps probably correspond to the front-panel USB ports (the case has two of those), and the last, delayed beep indicates that the system is ready to boot. So nothing I have to worry about :).

In the meantime, I replaced the SSD with a new one, and the critical error (data corruption on the boot device) has not appeared since. Thanks once again for the kind support.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
@strangelove So you had an actual SSD failure? I'm curious why you had this failure. Did you ever try a BIOS/Legacy installation just to rule out UEFI as a problem? If you still have the SSD, are you planning to run any tests on it?

I'm asking these questions in case there is something unique to this hardware causing the error you experienced, or if the SSD is just bad. As you know, we recommend parts for people and also warn about parts not to use, so if this is a part they should not use, it would be nice to educate our community on that. Of course, if you replaced the SSD with an identical model, then that is not the issue.

Thanks
 

strangelove

Dabbler
joeschmuck said:
> @strangelove So you had an actual SSD failure? I'm curious why you had this failure. Did you ever try a BIOS/Legacy installation just to rule out UEFI as a problem? If you still have the SSD, are you planning to run any tests on it?
>
> I'm asking these questions in case there is something unique to this hardware causing the error you experienced, or if the SSD is just bad. As you know, we recommend parts for people and also warn about parts not to use, so if this is a part they should not use, it would be nice to educate our community on that. Of course, if you replaced the SSD with an identical model, then that is not the issue.
>
> Thanks

I just returned the SSD and got a new one (of the same model) as a replacement. The only thing I tried (prior to returning it) was installing FreeNAS on a USB thumb drive rather than the SSD. Since this seemed to resolve the data corruption on the boot pool, and I had already tested the memory (with one pass of memtest86+), I did not investigate further what the actual issue with the SSD was. In retrospect, I would probably run some more tests, which would also allow me to share some insights here... But unfortunately, I already returned the "broken" SSD.

So, as of now I am not able to gain any more insight into what the actual error was. The only thing I can say for sure is that I did not change anything besides replacing the SSD (I even used the exact same SATA connector, etc.), and now the system seems to run smoothly and stably. I also did not try installing in legacy BIOS mode (but since UEFI is working now, I guess we can more or less rule it out as a cause).
 

Nick2253

Wizard
strangelove said:
> with one pass of memtest86+

For future reference, one pass of a memory test is insufficient. I would consider 3 passes the bare minimum for general-purpose testing, and 5 passes the minimum acceptable for something critical like a data server.

strangelove said:
> SanDisk Plus SSD as boot device

Just to provide a data point on these drives: I use them pretty widely in my home servers without any issues (including on Dell, Supermicro, ASRock, and HP motherboards).
 

joeschmuck

Old Man
Moderator
strangelove said:
> I just returned the SSD and got a new one (of the same model) as a replacement.

Sounds good. I was just hoping it wasn't some odd SSD firmware issue. Yours is the first SSD that basically failed instantly yet could still be accessed and written to. An odd failure for sure, but I'm glad to know the fix was easy.
 