SOLVED FreeNAS-9.10.2-u1 error getting available space 2 SSDs


maul0r

Dabbler
Joined
Jul 15, 2013
Messages
13
Hey there!

I am running FreeNAS 9.10 on an HP ProLiant MicroServer Gen8 at the moment. After I (shame on me) had to hard power off the server, I noticed on boot that there is a problem with my SSDs: I am getting "error getting available space" for them.

I was wondering if there is anything I can do about that?

Code:
[root@nas ~]# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h3m with 0 errors on Wed Aug 30 03:48:48 2017
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors

  pool: vol1
 state: ONLINE
  scan: scrub repaired 0 in 6h23m with 0 errors on Sun Sep 17 06:23:59 2017
config:

        NAME                                          STATE     READ WRITE CKSUM
        vol1                                          ONLINE       0     0     0
          gptid/65e0fc56-3e3b-11e2-b6ef-50465d4eba36  ONLINE       0     0     0
          gptid/66574a88-3e3b-11e2-b6ef-50465d4eba36  ONLINE       0     0     0

errors: No known data errors
[root@nas ~]# camcontrol devlist
<Samsung SSD 840 EVO 500GB EXT0BB6Q>  at scbus0 target 0 lun 0 (pass0,ada0)
<KINGSTON SV300S37A480G 603ABBF0>     at scbus1 target 0 lun 0 (pass1,ada1)
<WDC WD30EFRX-68AX9N0 80.00A80>       at scbus2 target 0 lun 0 (pass2,ada2)
<WDC WD30EFRX-68AX9N0 80.00A80>       at scbus3 target 0 lun 0 (pass3,ada3)
<SanDisk Cruzer Fit 1.27>             at scbus7 target 0 lun 0 (pass4,da0)



disk.png

error.png
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Do you think this message could have been there before and you missed it?
Are you missing any of your pools?
What are the SSDs used for?
Have you tried to shut down and then power it back on? (not a reboot)
 

maul0r

Dabbler
Joined
Jul 15, 2013
Messages
13
Hi! Thanks for your answer.
I wouldn't bet on it, but it's possible. If so, it must have happened just before the power-down. I am fairly certain I wouldn't have missed it, because that pool is my VM storage.

zpool status doesn't report the pool anymore. I have two pools, vol1 (HDDs) and SSD (Guess :) )

I am going to try that right now.

I just noticed the disks show up in the Volume Manager (I believe that means the data on them is gone?)

shit.png


It wouldn't matter that much; it's just some VMs that I was in the process of building up... I am mainly wondering how this happened so I can avoid it next time.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Please post your images on the forums, not on a remote website. This is for everyone's protection. Many of us do not like to click on links to these sites, even if they are well known. Someone could create a website of their own with a URL that just has two letters swapped, and BAM, we are infected. The link you provided was not a screen capture, it was an ad to use that website. Maybe it didn't show up because I'm in the US, I don't know.

Anyway, I doubt your data is gone, so don't do anything to mess with the pool, like forcing it to mount or recreating it, unless someone specifically tells you to. The good thing is that it looks like your two drives are operational.

If rebooting fails to fix it, the next step I'd take is to find a spare USB flash drive and do the following:
1) Shutdown your machine.
2) Disconnect the SATA cables of your two hard drives.
3) Replace the USB Flash drive with a new one and install a clean copy of FreeNAS on it.
4) Once your system boots up, abort the automatic setup.
5) Now try to Import Volume and cross your fingers it works.
6) If this works, run a Scrub on the volume.
7) After the scrub you can shut the system down and reconnect your other two hard drives.
8) Power up your system and restore your configuration file, the one we tell everyone to keep a copy of. A copy from just before the failure would be best, but you may be able to retrieve one from the old USB flash drive. If you cannot restore your configuration file, run Import Volume again to add your HDD pool, and then finish setting up FreeNAS.

I actually hope that just rebooting the system fixes it for you; easy is good.
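
For reference, steps 5 and 6 boil down to roughly this at the console, although the GUI's Import Volume is the preferred route (a rough sketch; it assumes your SSD pool is still named SSD):

Code:
# list pools that are available for import
zpool import
# import the pool by name (the GUI's Import Volume does the equivalent plus FreeNAS bookkeeping)
zpool import SSD
# scrub the pool and check the result
zpool scrub SSD
zpool status -v SSD
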
 

maul0r

Dabbler
Joined
Jul 15, 2013
Messages
13
Thanks for your reply.

I tried the reboot; unfortunately, that didn't work. I came back in the morning and couldn't reach the forum, so I tried the following on my own:

I disconnected the drives and deleted the remnants of the SSD Pool from my FreeNAS. I then connected them again, and I saw my SSD pool. However, I couldn't import it.

I did manage to import the pool on the console though.
Code:
zpool import 628557363397984385


I got this message
Code:
For the delegated permission list, run: zfs allow|unallow
[root@nas ~]# zpool import
   pool: SSD
     id: 628557363397984385
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        SSD                                           ONLINE
          gptid/9fe0b862-ad10-11e6-858d-d0bf9c451910  ONLINE
          gptid/a00ffee6-ad10-11e6-858d-d0bf9c451910  ONLINE
[root@nas ~]# zpool import 628557363397984385

cannot import 'SSD': I/O error
        Recovery is possible, but will result in some data loss.
        Returning the pool to its state as of Mon Sep 18 22:36:49 2017
        should correct the problem.  Approximately 15 seconds of data
        must be discarded, irreversibly.  After rewind, several
        persistent user-data errors will remain.  Recovery can be attempted
        by executing 'zpool import -F SSD'.  A scrub of the pool
        is strongly recommended after recovery.


I then ran:
Code:
zpool import -F SSD

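For the record, the rewind can apparently be previewed before committing to it. A sketch of what that would have looked like (not what I actually ran):

Code:
# dry run: report whether the -F rewind would succeed, without discarding any data
zpool import -Fn SSD
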

I then started scrubbing:
Code:
zpool scrub SSD


I noticed that during the scrub the CKSUM value kept increasing.

Code:
[root@nas ~]# zpool status SSD
  pool: SSD
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h48m with 62 errors on Sat Sep 23 11:09:26 2017
config:

        NAME                                          STATE     READ WRITE CKSUM
        SSD                                           DEGRADED     0     0    62
          gptid/9fe0b862-ad10-11e6-858d-d0bf9c451910  DEGRADED     0     0   124  too many errors
          gptid/a00ffee6-ad10-11e6-858d-d0bf9c451910  ONLINE       0     0     0

errors: 1 data errors, use '-v' for a list


Code:
[root@nas ~]# zpool status -v SSD
  pool: SSD
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h48m with 62 errors on Sat Sep 23 11:09:26 2017
config:

        NAME                                          STATE     READ WRITE CKSUM
        SSD                                           DEGRADED     0     0    62
          gptid/9fe0b862-ad10-11e6-858d-d0bf9c451910  DEGRADED     0     0   124  too many errors
          gptid/a00ffee6-ad10-11e6-858d-d0bf9c451910  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        SSD/SSD:<0x1>


I can try your method this afternoon and I will report back!
 

maul0r

Dabbler
Joined
Jul 15, 2013
Messages
13
So I tried what you suggested. The pool imported and at first showed as healthy, but it quickly became degraded.

Here is the latest zpool status:
Code:
[root@freenas ~]# zpool status -v
  pool: SSD
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h50m with 62 errors on Sat Sep 23 14:45:41 2017
config:

        NAME                                          STATE     READ WRITE CKSUM
        SSD                                           DEGRADED     0     0    62
          gptid/9fe0b862-ad10-11e6-858d-d0bf9c451910  DEGRADED     0     0   124  too many errors
          gptid/a00ffee6-ad10-11e6-858d-d0bf9c451910  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        SSD/SSD:<0x1>


What can I do now?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Since you were able to import the pool, is your data still available? If yes, copy your data off, destroy the pool, recreate it, and then put your data back on the new pool.
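
Roughly, once the data is safely copied off, it would look something like this (a sketch only; the device names are placeholders, and the GUI Volume Manager is the safer way to recreate the pool since it handles partitioning and gptid labels for you):

Code:
# only after everything has been copied off the pool!
zpool destroy SSD
# recreate as a two-way mirror (placeholder device names; prefer the GUI Volume Manager)
zpool create SSD mirror ada0 ada1
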
 

maul0r

Dabbler
Joined
Jul 15, 2013
Messages
13
I think it is; at least it shows the used capacity. The problem is that this pool backs a VMFS datastore, and I cannot mount it in ESXi. I am guessing this is because the pool is degraded. Any tips on how I could move the data over to vol1 (the HDD pool) so that I can recreate the SSD pool? Would that even fix the CKSUM errors?

I ran SMART tests. I still think there might be a problem with that Kingston disk.

Samsung:
Code:
=== START OF INFORMATION SECTION ===
Model Family:	 Samsung based SSDs
Device Model:	 Samsung SSD 840 EVO 500GB
Serial Number:	S1DHNSAF780213X
LU WWN Device Id: 5 002538 8a05c4619
Firmware Version: EXT0BB6Q
User Capacity:	500,107,862,016 bytes [500 GB]
Sector Size:	  512 bytes logical/physical
Rotation Rate:	Solid State Device
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sun Sep 24 14:13:54 2017 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010	Pre-fail  Always	   -	   0
  9 Power_On_Hours		  0x0032   097   097   000	Old_age   Always	   -	   13750
12 Power_Cycle_Count	   0x0032   099   099   000	Old_age   Always	   -	   635
177 Wear_Leveling_Count	 0x0013   098   098   000	Pre-fail  Always	   -	   22
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010	Pre-fail  Always	   -	   0
181 Program_Fail_Cnt_Total  0x0032   100   100   010	Old_age   Always	   -	   0
182 Erase_Fail_Count_Total  0x0032   100   100   010	Old_age   Always	   -	   0
183 Runtime_Bad_Block	   0x0013   100   100   010	Pre-fail  Always	   -	   0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000	Old_age   Always	   -	   0
190 Airflow_Temperature_Cel 0x0032   066   044   000	Old_age   Always	   -	   34
195 ECC_Error_Rate		  0x001a   200   200   000	Old_age   Always	   -	   0
199 CRC_Error_Count		 0x003e   100   100   000	Old_age   Always	   -	   0
235 POR_Recovery_Count	  0x0012   099   099   000	Old_age   Always	   -	   120
241 Total_LBAs_Written	  0x0032   099   099   000	Old_age   Always	   -	   30777666890

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%		47		 -
# 2  Extended offline	Completed without error	   00%		25		 -
# 3  Extended offline	Completed without error	   00%	   132		 -
# 4  Extended offline	Completed without error	   00%		36		 -
# 5  Extended offline	Completed without error	   00%	   120		 -
# 6  Extended offline	Completed without error	   00%		14		 -
# 7  Extended offline	Completed without error	   00%	   131		 -
# 8  Extended offline	Completed without error	   00%		 2		 -
# 9  Extended offline	Completed without error	   00%		 6		 -
#10  Extended offline	Completed without error	   00%		 1		 -
#11  Extended offline	Completed without error	   00%		 7		 -
#12  Extended offline	Completed without error	   00%	   123		 -
#13  Extended offline	Completed without error	   00%		27		 -
#14  Extended offline	Completed without error	   00%	  1513		 -
#15  Extended offline	Completed without error	   00%	  1345		 -
#16  Extended offline	Completed without error	   00%	  1178		 -
#17  Extended offline	Completed without error	   00%	  1033		 -
#18  Extended offline	Completed without error	   00%	  1009		 -


Kingston
Code:
smartctl -a /dev/ada1

smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 SandForce Driven SSDs
Device Model:	 KINGSTON SV300S37A480G
Serial Number:	50026B725501F24F
LU WWN Device Id: 5 0026b7 25501f24f
Firmware Version: 603ABBF0
User Capacity:	480,103,981,056 bytes [480 GB]
Sector Size:	  512 bytes logical/physical
Rotation Rate:	Solid State Device
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sun Sep 24 14:13:57 2017 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED



SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x0032   095   095   050	Old_age   Always	   -	   0/213557719
  5 Retired_Block_Count	 0x0033   100   100   003	Pre-fail  Always	   -	   0
  9 Power_On_Hours_and_Msec 0x0032   100   100   000	Old_age   Always	   -	   32h+39m+50.680s
12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   2
171 Program_Fail_Count	  0x000a   100   100   000	Old_age   Always	   -	   0
172 Erase_Fail_Count		0x0032   100   100   000	Old_age   Always	   -	   0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000	Old_age   Offline	  -	   1
177 Wear_Range_Delta		0x0000   000   000   000	Old_age   Offline	  -	   1
181 Program_Fail_Count	  0x000a   100   100   000	Old_age   Always	   -	   0
182 Erase_Fail_Count		0x0032   100   100   000	Old_age   Always	   -	   0
187 Reported_Uncorrect	  0x0012   100   100   000	Old_age   Always	   -	   0
189 Airflow_Temperature_Cel 0x0000   030   045   000	Old_age   Offline	  -	   30 (Min/Max 14/45)
194 Temperature_Celsius	 0x0022   030   045   000	Old_age   Always	   -	   30 (Min/Max 14/45)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000	Old_age   Offline	  -	   0/213557719
196 Reallocated_Event_Count 0x0033   100   100   003	Pre-fail  Always	   -	   0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000	Old_age   Offline	  -	   0/213557719
204 Soft_ECC_Correct_Rate   0x001c   120   120   000	Old_age   Offline	  -	   0/213557719
230 Life_Curve_Status	   0x0013   100   100   000	Pre-fail  Always	   -	   100
231 SSD_Life_Left		   0x0000   099   099   011	Old_age   Offline	  -	   21474836481
233 SandForce_Internal	  0x0032   000   000   000	Old_age   Always	   -	   6
234 SandForce_Internal	  0x0032   000   000   000	Old_age   Always	   -	   2
241 Lifetime_Writes_GiB	 0x0032   000   000   000	Old_age   Always	   -	   2
242 Lifetime_Reads_GiB	  0x0032   000   000   000	Old_age   Always	   -	   407
244 Unknown_Attribute	   0x0000   100   100   010	Old_age   Offline	  -	   2883598

SMART Error Log not supported

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%		47		 -
# 2  Extended offline	Completed without error	   00%		26		 -
# 3  Extended offline	Completed without error	   00%	   133		 -
# 4  Extended offline	Completed without error	   00%		37		 -
# 5  Extended offline	Completed without error	   00%	   121		 -
# 6  Extended offline	Interrupted (host reset)	  10%	   240		 -
# 7  Extended offline	Completed without error	   00%	   133		 -
# 8  Extended offline	Completed without error	   00%		84		 -
# 9  Extended offline	Completed without error	   00%		 8		 -
#10  Extended offline	Completed without error	   00%		 3		 -
#11  Extended offline	Interrupted (host reset)	  10%		13		 -
#12  Extended offline	Completed without error	   00%	  3099		 -
#13  Extended offline	Completed without error	   00%	  3003		 -
#14  Extended offline	Completed without error	   00%	  2836		 -
#15  Extended offline	Completed without error	   00%	  2668		 -
#16  Extended offline	Completed without error	   00%	  2501		 -
#17  Extended offline	Completed without error	   00%	  2357		 -
#18  Extended offline	Completed without error	   00%	  2333		 -



Edit: I am going to wait until tomorrow, but all of these VMs are easily recreatable, so if there is no way to recover them it won't be a drama. I am more concerned about the disk, because if I recreate the pool I really don't want this to happen again.

Btw: Thanks for staying with me and helping me out!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I don't see anything wrong with either SSD except the SMART self-test logs: both list the test hours in a very weird order, not organized properly by runtime hours. Weird.

As for grabbing your data... if this is a VMFS datastore then you should be able to access the datastore from VMware and copy the VMs to your HDD pool. After that I'd destroy the SSD pool and recreate it. If you'd feel better, you could run a Secure Erase on the SSDs and then create a new mirror pair; the Secure Erase forces the SSD to wipe all of its data internally. As for how to do this, sorry, I don't have the time to look that up, but Google is your friend. I don't know if you can do that from FreeNAS/FreeBSD; you may need to use Windoze or Ubuntu. You don't have to do the secure erase, but since you question your SSDs, I'd recommend it.
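
That said, FreeBSD's camcontrol does have a security subcommand that can issue an ATA Secure Erase, so it may be doable straight from the FreeNAS shell. An untested sketch (ada1 and the password are placeholders, and the drive must not be in the security-frozen state):

Code:
# show the drive's current ATA security state
camcontrol security ada1
# set a temporary user password, then issue the secure erase with it
camcontrol security ada1 -U user -s erasepass
camcontrol security ada1 -U user -e erasepass -y
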

Please post what you end up doing.
 

maul0r

Dabbler
Joined
Jul 15, 2013
Messages
13
Quick update: not done yet, but I'm getting somewhere.

The ESXi GUI wasn't able to mount the degraded datastore, but the ESXi command line (esxcfg-volume) was.

I found the drive like this:
Code:
[root@esxi01:/dev/disks] esxcfg-volume -l
Scanning for VMFS-3/VMFS-5 host activity (512 bytes/HB, 2048 HBs).
VMFS UUID/label: 582e298c-ce173b02-c525-0025905dc9bc/SSD Datastore
Can mount: Yes
Can resignature: Yes
Extent name: naa.6589cfc000000a27f77d47b6330cbcab:1    range: 0 - 818943 (MB)


and mounted like this:
Code:
[root@esxi01:/dev/disks] esxcfg-volume -m 582e298c-ce173b02-c525-0025905dc9bc Datastore
Mounting volume 582e298c-ce173b02-c525-0025905dc9bc


Now I've noticed that the NFS volumes I'd like to copy to are apparently not writable, so I am troubleshooting NFS next. Will report back.
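
Once NFS is writable, the plan is roughly this from the ESXi shell (a sketch; "somevm" and the "vol1-nfs" datastore name are placeholders):

Code:
# copy a VM folder from the recovered SSD datastore to an NFS datastore backed by vol1
mkdir -p "/vmfs/volumes/vol1-nfs/somevm"
cp -r "/vmfs/volumes/SSD Datastore/somevm" "/vmfs/volumes/vol1-nfs/"
# or clone just the virtual disk, which also rewrites the vmdk descriptor
vmkfstools -i "/vmfs/volumes/SSD Datastore/somevm/somevm.vmdk" "/vmfs/volumes/vol1-nfs/somevm/somevm.vmdk"
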
 

maul0r

Dabbler
Joined
Jul 15, 2013
Messages
13
Issue solved; I am moving my data now. Very slow, about 80 Mbps, but that's another issue... Thank you for your help!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Glad that you are discovering some solutions on your own. We like people who take initiative.
 