Drive Failure, won't boot unless I remove the drive

Status
Not open for further replies.

stupes

Dabbler
Joined
Oct 4, 2018
Messages
25
Hi, I really hope someone can help. I have a FreeNAS 9.3 ZFS RAID array of 3x 2TB disks. One of these has developed a UNC failure which will not let me boot past it (see attached). If I remove the disk and boot, the remaining disks show in an error state and do not give me any options when I select them. In fact, they say something to the effect of Error and "Cannot get drive size".

I feel confident that because the disks are RAIDed I probably have not lost data; I just have no idea what to do to bring my volume back online.

Any pointers greatly welcomed.
Stu
 

Attachments

  • IMG_7438.JPG (260.1 KB)

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
What is the output of zpool status -v (in [ CODE][/ CODE] tags) if you boot without the failed drive?
 

stupes

Dabbler
Joined
Oct 4, 2018
Messages
25
Apologies, I have an update. After 3 hours the device booted with all 3 drives attached.
Here is that output. I am a bit unsure what to do next. Is my drive dead, or does it need to be reformatted or something? TIA:
Code:
[root@stu-nas1 ~]# zpool status -v
  pool: data1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 107h24m with 219 errors on Thu Oct  4 11:24:52 2018
config:

        NAME                                          STATE     READ WRITE CKSUM
        data1                                         ONLINE     127     0     0
          gptid/57bba673-0f46-11e4-a241-000ea6a4c176  ONLINE     127     0     0
          gptid/585f2dcd-0f46-11e4-a241-000ea6a4c176  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /mnt/data1/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 85.zip
        /mnt/data1/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 86.zip
        /mnt/data1/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 87.zip
        /mnt/data1/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 88.zip
        /mnt/data1/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 89.zip
        /mnt/data1/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 102.zip
        data1@auto-20180917.0900-2w:/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 85.zip
        data1@auto-20180917.0900-2w:/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 86.zip
        data1@auto-20180917.0900-2w:/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 87.zip
        data1@auto-20180917.0900-2w:/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 88.zip
        data1@auto-20180917.0900-2w:/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 89.zip
        data1@auto-20180917.0900-2w:/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 102.zip

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Thu Aug 30 03:46:24 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          ada3p2      ONLINE       0     0     0

errors: No known data errors
 

stupes

Dabbler
Joined
Oct 4, 2018
Messages
25
I have now removed those files and, as the errors seem to have cleared, I think I will run a SMART test. Thanks for your help.
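For anyone following along, a SMART check from the shell could look roughly like the sketch below. The device node /dev/ada0 is an assumption, not something shown in this thread; on FreeBSD-based FreeNAS, glabel status should show which adaN device sits behind each gptid label. The run function is a hypothetical dry-run wrapper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumption: replace with the real device node of the suspect disk.
DISK="${DISK:-/dev/ada0}"

run glabel status                # map gptid labels to adaN devices (FreeBSD)
run smartctl -t long "$DISK"     # start an extended self-test on the drive
run smartctl -a "$DISK"          # later: full report, including the self-test log
```

The extended test runs on the drive itself and can take hours on a 2TB disk, so the second smartctl call is only meaningful once it has finished.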
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
The output shows only 2 drives in your pool, so be extremely careful about what you do next... you have no redundancy at this point if that output is right.

You should take a separate backup of the data on those disks (seems one of the remaining 2 also has read errors) and use SMART data to figure out if you just need to replace them all or not (I certainly would if it was me).
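One rough way to confirm the layout from the shell: in zpool status output, a redundant pool lists mirror-N or raidzN-N lines between the pool name and the disks, while a plain stripe lists the disks directly under the pool, as data1 does above. The has_redundant_vdev function below is a hypothetical helper and only a heuristic: it answers yes if any such line appears, so a pool mixing mirrors with single disks would still need checking by eye.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status <pool>` output on stdin and prints
# "yes" if the config section contains any mirror-N or raidzN-N vdev line,
# otherwise "no" (the disks sit directly under the pool, i.e. a stripe).
has_redundant_vdev() {
    awk '
        $1 == "NAME"  { in_cfg = 1; next }   # header row starts the config table
        /^errors:/    { in_cfg = 0 }         # errors line ends it
        in_cfg && $1 ~ /^(mirror|raidz[123]?)-[0-9]+$/ { found = 1 }
        END { print (found ? "yes" : "no") }
    '
}

# Usage on the NAS (not executed here):
#   zpool status data1 | has_redundant_vdev
```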
 

stupes

Dabbler
Joined
Oct 4, 2018
Messages
25
Thanks for this. It seems I have 3 disks but only added 2 of them to the pool back in the day. I am fairly happy with my backups, so I am happy to proceed. Would it be a good idea to add a 3rd disk to the pool and monitor? I have already ordered a 4th, though I suspect its specs will be slightly better than the old 3.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
You are not done. The system still has significant problems. Your data is at risk.

 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
So, what you have seems to be a 2 disk striped pool (which can't be RAIDZ1... and anyway couldn't be expanded even if it were).

2 more disks could be set up as a new mirrored pool and you could consider either a second mirrored pool (if you still trust the disks you have), or extending the pool with an additional mirror VDEV (if you really trust the disks you have).

It looks like you were lucky to have gotten away with such a small amount of data loss in this case... you should consider if that would have been a problem or not if you lost it all (which it seems nearly happened and may yet happen).
 

stupes

Dabbler
Joined
Oct 4, 2018
Messages
25
So I think you are saying I should set up "data2" using my spare disk and the disk I have on order, then copy over my data and retire the 2 current disks. Does that sound like the best solution?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
stupes said:
… or replace the failed disk with my spare disk?

There's no point replacing the failed disk, as all indications are that it wasn't doing anything anyway--it certainly wasn't part of your data1 pool. Best answer:
  1. Get two more disks, each large enough to hold all the data on your pool.
  2. Create a mirrored pool from those two disks.
  3. Move your data onto that mirrored pool.
  4. Destroy the pool on your current two disks.
  5. Add those two disks to your new pool as a second mirrored vdev.
Except for step 3, all can be done through the GUI. However, the new pool will have a new name, so all your shares and anything else that uses a path on your pool will have to be reconfigured. Alternative answer:
  1. Get two more disks, each at least the same size as your existing two disks (they can be larger, but you'll be wasting that additional space).
  2. Add each as a mirror of one of your existing disks.
Fewer steps, no paths need to change, but step 2 involves a bit of fiddling at the CLI to do it right (as the devs still haven't gotten around to adding this ability in the GUI).
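For the alternative route, the CLI part centres on zpool attach, which turns a single-disk vdev into a mirror by attaching a new device to an existing one. A rough sketch follows. The two existing gptid labels come from the zpool status output earlier in the thread; NEW1 and NEW2 are placeholder names for the ZFS partitions on the new disks, and the partitioning that FreeNAS normally does first (GPT, swap partition, ZFS partition) is not shown. The run function is a hypothetical dry-run wrapper, so nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

POOL=data1
# Existing stripe members, as listed by zpool status in this thread.
OLD1=gptid/57bba673-0f46-11e4-a241-000ea6a4c176
OLD2=gptid/585f2dcd-0f46-11e4-a241-000ea6a4c176
# Placeholders (assumptions): the ZFS partitions on the two new disks.
NEW1="${NEW1:-gptid/NEW-DISK-1}"
NEW2="${NEW2:-gptid/NEW-DISK-2}"

# Attach one new device to each existing disk; each pair becomes a mirror vdev.
run zpool attach "$POOL" "$OLD1" "$NEW1"
run zpool attach "$POOL" "$OLD2" "$NEW2"
# Watch the resilver; afterwards the config should list mirror-0 and mirror-1.
run zpool status "$POOL"
```

Each attach triggers a resilver that reads the whole existing disk, so with one disk already throwing read errors it would be wise to have the data copied elsewhere before trying this.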
 

stupes

Dabbler
Joined
Oct 4, 2018
Messages
25
Thank you for this. To be clear, the current situation is …

- I have a vdev as part of the data1 zpool that has 2 disks, one of which is showing signs of age, I think.
- I have another (identical) disk which is OK, which is in the system but not used.
- I have another, slightly higher-performance disk on order, arriving tomorrow.

I am not entirely sure I understand the alternative method shown above. Could you expand a little, please?
That is to say, how can you set up mirror disks on disks that make up an existing vdev?

Another complication is that I only have room for 3 data disks in my system at a time, which is annoying as I would need 4, but I am sure I can finesse this somehow.
 

stupes

Dabbler
Joined
Oct 4, 2018
Messages
25
I don't entirely understand why the dodgy files have reappeared when I thought I had deleted them. According to the filesystem they do not exist. Perhaps they will disappear once the scrub has completed.

Code:
[root@stu-nas1 ~]# zpool status -v
  pool: data1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Thu Oct  4 13:18:54 2018
        601G scanned out of 2.34T at 262M/s, 1h56m to go
        0 repaired, 25.10% done
config:

        NAME                                          STATE     READ WRITE CKSUM
        data1                                         ONLINE       0     0     0
          gptid/57bba673-0f46-11e4-a241-000ea6a4c176  ONLINE       0     0     0
          gptid/585f2dcd-0f46-11e4-a241-000ea6a4c176  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        data1:<0x72208>
        data1:<0x7220e>
        data1:<0x72214>
        data1:<0x7221a>
        data1:<0x72220>
        data1:<0x7226e>
        data1@auto-20180917.0900-2w:/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 85.zip
        data1@auto-20180917.0900-2w:/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 86.zip
        data1@auto-20180917.0900-2w:/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 87.zip
        data1@auto-20180917.0900-2w:/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 88.zip
        data1@auto-20180917.0900-2w:/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 89.zip
        data1@auto-20180917.0900-2w:/backups/desktop/DESKTOP-3R1HTA8/Backup Set 2018-09-14 000003/Backup Files 2018-09-14 000003/Backup files 102.zip

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Thu Aug 30 03:46:24 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          ada3p2      ONLINE       0     0     0

errors: No known data errors
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Looks like those are snapshots. You can destroy the snapshots now or wait for them to expire.

In any case, things don't look good for your data... make a copy ASAP.
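If you want them gone now rather than at expiry, something along these lines should do it. The snapshot name is taken from the zpool status output above; the run function is a hypothetical dry-run wrapper that prints the commands unless DRY_RUN=0 is set. As far as I know, the error list in zpool status is only refreshed by scrubbing, so the entries may linger until a later scrub completes.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

POOL=data1
SNAP="data1@auto-20180917.0900-2w"   # snapshot named in the error list above

run zfs list -t snapshot -r "$POOL"  # see which snapshots exist first
run zfs destroy "$SNAP"              # removes this one snapshot (add -r for child datasets too)
run zpool scrub "$POOL"              # start only after the scrub in progress has finished
run zpool status -v "$POOL"          # re-check the permanent error list afterwards
```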
 

stupes

Dabbler
Joined
Oct 4, 2018
Messages
25
Thanks, I have a copy (multiple, actually). Those files are a backup of a home PC that runs weekly. I have deleted that one from the filesystem, which is why I was surprised to see it reported.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
stupes said:
Thank you for this. To be clear, the current situation is …

- I have a vdev as part of the data1 zpool that has 2 disks, one of which is showing signs of age, I think.
- I have another (identical) disk which is OK, which is in the system but not used.
- I have another, slightly higher-performance disk on order, arriving tomorrow.

You don't understand your own situation, and it sounds like you are not listening to what you are being told.
 

stupes

Dabbler
Joined
Oct 4, 2018
Messages
25
I apologise if it sounds like that, Chris. I am getting many useful messages from many different folk and struggling to put them together. I have been unable to find a consistent set of instructions or manual-ware that I can work through, hence I regrettably am having to throw myself at your mercy. Previously I worked with simple Samba shares in Debian, so I am not used to the added complexity.
Can you explain which bits you think I am not getting or understanding? For instance, the message below was particularly useful, but I wanted to get some clarification.

danb35 said:
There's no point replacing the failed disk, as all indications are that it wasn't doing anything anyway--it certainly wasn't part of your data1 pool. Best answer:
  1. Get two more disks, each large enough to hold all the data on your pool.
  2. Create a mirrored pool from those two disks.
  3. Move your data onto that mirrored pool.
  4. Destroy the pool on your current two disks.
  5. Add those two disks to your new pool as a second mirrored vdev.
Except for step 3, all can be done through the GUI. However, the new pool will have a new name, so all your shares and anything else that uses a path on your pool will have to be reconfigured. Alternative answer:
  1. Get two more disks, each at least the same size as your existing two disks (they can be larger, but you'll be wasting that additional space).
  2. Add each as a mirror of one of your existing disks.
Fewer steps, no paths need to change, but step 2 involves a bit of fiddling at the CLI to do it right (as the devs still haven't gotten around to adding this ability in the GUI).

I really need to understand my options better at this point, and really appreciate your help.
I have 3 SATA ports, one of which is used by the faulty drive, and 4 drives (by tomorrow).
What steps do I take next?
- Replace the existing faulty drive? (I am currently getting some sort of rollbar exception when I try to do this; see attached.)
- Or create a new volume? (In that case, can I create it on one drive and then extend it once the data is copied over? Or do I find a way to reduce the current volume to a single drive and set up a new 2-drive volume?)
 

Attachments

  • grab.jpg (352.8 KB)

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
stupes said:
I am not entirely sure I understand the alternative method shown above. Could you expand a little, please?
That is to say, how can you set up mirror disks on disks that make up an existing vdev?

Another complication is that I only have room for 3 data disks in my system at a time, which is annoying as I would need 4, but I am sure I can finesse this somehow.

Basically, your NAS is improperly configured and it is on its last legs, about to die entirely. If it is only a backup and you don't care if it dies, then there is no real loss. You need to reconstruct your pool because the way you have it now has no redundancy. It is a stripe set of two disks, and if either of them has an error (what happened) or fails, then you can lose some (what happened this time) or all of your data.
You should probably start over and build a new NAS using proper hardware.

Hardware Requirements
http://www.freenas.org/hardware-requirements/

Did you read the manual?
http://doc.freenas.org/11/freenas.html

Updated Forum Rules 4/11/17
https://forums.freenas.org/index.php?threads/updated-forum-rules-4-11-17.45124/

Slideshow explaining VDev, zpool, ZIL and L2ARC
https://forums.freenas.org/index.ph...ning-vdev-zpool-zil-and-l2arc-for-noobs.7775/

Terminology and Abbreviations Primer
https://forums.freenas.org/index.php?threads/terminology-and-abbreviations-primer.28174/

Why not to use RAID-5 or RAIDz1
https://www.zdnet.com/article/why-raid-5-stops-working-in-2009/

FreeNAS® Quick Hardware Guide
https://forums.freenas.org/index.php?resources/freenas®-quick-hardware-guide.7/

Hardware Recommendations Guide (Rev 1e) 2017-05-06
https://forums.freenas.org/index.php?resources/hardware-recommendations-guide.12/
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
PS. Only having room for 3 disks doesn't give you enough room to build a proper array of any kind.
 

stupes

Dabbler
Joined
Oct 4, 2018
Messages
25
OK, thanks. BTW, I have not lost my data; it appears to be there. Also, as far as I was aware, I have 1 drive providing the redundancy for the other, which is why the data is still there even though one drive is misbehaving. I'll have a read through, and hopefully things will become clearer. I am open to the possibility that this is too complicated for my needs and that redundant copies on Debian/Samba may suit me better.

By the sound of it, best practice dictates that I offload my data while I can and rebuild my system. (Is this what you suggest?) I will check the hardware guide and see what I could manage with my current hardware. It's an old HP ProLiant ML115 box with 4 SATA ports and 1 IDE. One SATA SSD is used for the boot drive. It has been OK for 4 years, so the drives are up for replacement anyhow.
 