Need help squashing some nightmare FreeNAS gremlins, guys...

Status
Not open for further replies.

Fr33nasnoobi3

Dabbler
Joined
Mar 11, 2017
Messages
19
I don't even know where to begin with this FreeNAS issue, so I'll start from the very beginning and hope the puzzle pieces fall into place. A few days ago I got an alert that one of the volumes (named data2; unencrypted, with no valuable data on it) had a drive "disconnected" and was running in a degraded state. I thought that was very odd, because this thing has been running flawlessly for several years, aside from the 3 hard drives that have failed over time (alert of errors on the drive > replaced the drive > resilvered > everything OK again).

So I treated this as a failed hard drive: I shut down the system, replaced the drive and brought it back up to resilver, and now NONE of the drives in that volume show up at all... keep in mind this is 6x4TB drives in RAIDZ2! So I figured the little 5.25" to 5x3.5" hard drive adapter (SilverStone FS305) had for some reason taken a dump. Just to see if my theory about the SilverStone adapter dying was right, I removed 6 of the drives from my locked data1 volume (the one with the important files, which by the way I had not tried to unlock this whole time). I set the data1 drives aside, plugged the data2 drives into the other enclosures and booted up, and it saw all the drives, even the one it thought had become disconnected. So problem solved, right? WRONG...

A few days later, when the new enclosure came in, I shut down and put the drives I had taken out back in, using the new enclosure. Booted up: data2 had 2 disconnected drives this time, and data1 failed to unlock :( At this point I figured something was wrong with the install. I pulled the boot drives out, set them aside, got a few new USB drives, reloaded FreeNAS, yada yada, and couldn't import data1 or data2... *panic setting in* I only know enough about Unix/Linux/BSD command-line stuff to be considered mildly clueless, but I know enough not to do anything too boneheaded that would jeopardize my data. Things I have tried so far: removing the enclosures completely to eliminate that possibility, swapping HBA controller cards, swapping SAS-to-SATA cables, swapping motherboards (to an identical backup motherboard, because redundancy, right?), swapping RAM, swapping power supplies, removing the HBA controller cards entirely and using the on-board SATA connectors, and swapping to new SATA cables...

So as it sits now, the HBA cards have been removed. I don't have any data I care about on the data2 drives, so those drives have been removed too. I'm running on the new motherboard with the drives connected directly to its SATA ports, using the original boot disks, and this is the error I get when I try to unlock data1:


Code:
Environment:

Software Version: FreeNAS-11.1-U4 (89e3d93bc)
Request Method: POST
Request URL: http://10.10.10.103/storage/volume/1/unlock/?X-Progress-ID=a69edec6-d03c-4e0f-86c7-d2a8e103ff30


Traceback:
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py" in inner
  42.			 response = get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _legacy_get_response
  249.			 response = self._get_response(request)
File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
  178.			 response = middleware_method(request, callback, callback_args, callback_kwargs)
File "./freenasUI/freeadmin/middleware.py" in process_view
  162.		 return login_required(view_func)(request, *view_args, **view_kwargs)
File "/usr/local/lib/python3.6/site-packages/django/contrib/auth/decorators.py" in _wrapped_view
  23.				 return view_func(request, *args, **kwargs)
File "./freenasUI/storage/views.py" in volume_unlock
  1033.			 form.done(volume=volume)
File "./freenasUI/storage/forms.py" in done
  2822.			 raise MiddlewareError(msg)

Exception Type: MiddlewareError at /storage/volume/1/unlock/
Exception Value: [MiddlewareError: Volume could not be imported: 1 devices failed to decrypt]


Code:
[root@freenas ~]# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:05:23 with 0 errors on Fri Apr 27 03:50:27 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da1p2     ONLINE       0     0     0
            da0p2     ONLINE       0     0     0

errors: No known data errors


Code:
[root@freenas ~]# zpool import
   pool: data1
     id: 13473567347727306844
  state: UNAVAIL
 status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
    see: http://illumos.org/msg/ZFS-8000-EY
 config:

        data1                                               UNAVAIL  insufficient replicas
          raidz2-0                                          UNAVAIL  insufficient replicas
            13880681796121711656                            UNAVAIL  cannot open
            gptid/d859e08c-be99-11e7-960c-d050992ecffc.eli  ONLINE
            4943386155760196097                             UNAVAIL  cannot open
            gptid/d39fc645-c01b-11e7-960c-d050992ecffc.eli  ONLINE
            6756959280491151806                             UNAVAIL  cannot open
            5062403531773914712                             UNAVAIL  cannot open
          raidz2-1                                          ONLINE
            gptid/6d5a535f-e2aa-11e6-81a3-d050992ecffc.eli  ONLINE
            gptid/6e92dd63-e2aa-11e6-81a3-d050992ecffc.eli  ONLINE
            gptid/6fc79c28-e2aa-11e6-81a3-d050992ecffc.eli  ONLINE
            gptid/7103a362-e2aa-11e6-81a3-d050992ecffc.eli  ONLINE
            gptid/723c9664-e2aa-11e6-81a3-d050992ecffc.eli  ONLINE
            gptid/737f80ad-e2aa-11e6-81a3-d050992ecffc.eli  ONLINE

Keep in mind that just yesterday, on a new installation, all the drives in both raidz2-0 and raidz2-1 came up ONLINE, and the drives that become unavailable are random, not the same ones every time! :/
And yesterday, on the old installation, only 2 drives in raidz2-0 were unavailable, and as I understand it, 2 should be the max I could lose without losing data...

Please, any help would be greatly appreciated. And if there is a way of mounting and unlocking just raidz2-1 WITHOUT possibly compromising the data in the volume, how do you do that?
 
Last edited by a moderator:

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Fr33nasnoobi3 said: "And if there is a way of mounting and unlocking just raidz2-1 WITHOUT possibly compromising the data in the volume, how do you do that?"
Sorry, the whole pool is needed. You can't access raidz2-1 without raidz2-0, because the data is spread across both vdevs.
It looks like the content of the drives may have become damaged, because 'cannot open' tends to indicate that the drive is present but can't be accessed. Do you have a backup at all?
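The redundancy arithmetic behind the last few posts can be made concrete. A RAIDZ2 vdev survives at most 2 missing members, and because the pool stripes data across every top-level vdev, all of them must be within their parity at the same time. The minimal Python sketch below assumes the plain-text layout of the `zpool import` config posted above; the helper names (vdev_health, pool_importable) and the set of states counted as "missing" are hypothetical choices for illustration, not FreeNAS code. It only checks redundancy; it says nothing about whether the remaining disks' data or GELI layers are intact.

```python
import re

# Number of missing members each RAIDZ level can tolerate.
PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}
# Member states this sketch counts as "missing" (an assumption, not an exhaustive list).
MISSING_STATES = {"UNAVAIL", "FAULTED", "REMOVED", "OFFLINE"}
# Top-level vdev names as printed by zpool, e.g. "raidz2-0" or "mirror-0".
VDEV_RE = re.compile(r"^(raidz[123]|mirror)-\d+$")


def vdev_health(config_text):
    """Map each raidzN vdev in `zpool import` config text to (missing, tolerated)."""
    counts = {}
    current = None
    for line in config_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or no state column
        match = VDEV_RE.match(fields[0])
        if match:
            kind = match.group(1)
            # Start tracking a new raidz vdev; mirrors are ignored by this sketch.
            current = fields[0] if kind in PARITY else None
            if current is not None:
                counts[current] = [0, PARITY[kind]]
        elif current is not None and fields[1] in MISSING_STATES:
            counts[current][0] += 1
    return {name: tuple(pair) for name, pair in counts.items()}


def pool_importable(health):
    """True if every raidz vdev is within its parity (redundancy check only)."""
    return all(missing <= tolerated for missing, tolerated in health.values())


# Trimmed copy of the config from the thread (raidz2-1 shortened to two disks).
SAMPLE = """
data1                   UNAVAIL  insufficient replicas
  raidz2-0              UNAVAIL  insufficient replicas
    13880681796121711656  UNAVAIL  cannot open
    gptid/d859e08c-be99-11e7-960c-d050992ecffc.eli  ONLINE
    4943386155760196097   UNAVAIL  cannot open
    gptid/d39fc645-c01b-11e7-960c-d050992ecffc.eli  ONLINE
    6756959280491151806   UNAVAIL  cannot open
    5062403531773914712   UNAVAIL  cannot open
  raidz2-1              ONLINE
    gptid/6d5a535f-e2aa-11e6-81a3-d050992ecffc.eli  ONLINE
    gptid/6e92dd63-e2aa-11e6-81a3-d050992ecffc.eli  ONLINE
"""

health = vdev_health(SAMPLE)
print(health)                   # {'raidz2-0': (4, 2), 'raidz2-1': (0, 2)}
print(pool_importable(health))  # False
```

With 4 of 6 members of raidz2-0 unavailable the pool cannot be assembled, no matter how healthy raidz2-1 is; with only 2 missing (as reported on the old installation) the same check would pass.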
 

Fr33nasnoobi3

Dabbler
Joined
Mar 11, 2017
Messages
19
:( Ummmm, no. I live in the country, so my upload bandwidth is under 1 Mbps; offsite/cloud backup is not an option at all, or believe me, I would have done that from day one... I did a ton of reading before choosing this setup/RAID level, because I figured the chances of 2 drives just dying overnight were slim to none, and I still have no idea what happened. The computer was off, with a drive in another volume going bad... basically just a reboot, and things went to hell quick! By fiddling with swapping this and that around, I got all the drives to show up as ONLINE last night, but the pool still wouldn't unlock :/ so I don't know what's going on, to be honest... I can't get consistent, repeatable results... Worst-case scenario, do you think the drives are physically damaged (i.e., call DriveSavers), or is it a software issue that somehow happened?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
How old is the system? You said it had been running for several years. What are the approximate ages of the drives?
Have you been running scheduled SMART tests on the drives? Were they all reporting as healthy?
What kind of HBA were you using to connect the drives?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Uhmm, it kinda looks like the middleware just aborted the unlocking process after one disk failed to do so. Manually unlock the drives with your keys and you might be able to get at your pool.
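To illustrate the difference between an unlock that aborts on the first failing disk and a manual one that tries every disk, here is a minimal Python sketch. The function name attach_all is hypothetical, and it only wraps the geli attach command line that appears later in the thread; the key options are passed in by the caller, and the subprocess runner is injectable so the loop can be exercised without real disks.

```python
import subprocess


def attach_all(providers, key_args, run=subprocess.run):
    """Run `geli attach` on each provider independently.

    Instead of stopping at the first failure, keep going and return a list
    of (provider, error message) pairs for the providers that failed.
    """
    failures = []
    for provider in providers:
        argv = ["geli", "attach", *key_args, provider]
        result = run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((provider, result.stderr.strip()))
    return failures
```

For example, attach_all(["ada0p2", "ada1p2"], ["-p", "-k", "/mnt/recovery.key"]) would try a passphrase-less key on two disks and report each failure separately; the provider names and key path are placeholders, not values from this system.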
 

Fr33nasnoobi3

Dabbler
Joined
Mar 11, 2017
Messages
19
Originally I started off in... September 2014 with an ASRock Intel Avoton C2750 motherboard and 6x4TB drives. Eventually I outgrew that and replaced those drives with 6x6TB drives around... January 2017. After migrating everything off the 6x4TB volume, I figured I might as well leave those drives in and use them for less critical stuff, because most of them were getting older... I think one of them was fairly new, having been replaced in 2016. At that point I was still running everything off the 12 onboard SATA ports. Then I started doing a lot of 4K video with my new drone and Sony camera, so I figured I would get a jump on the storage crisis and got another 6x6TB drives in May 2017, but I needed a cheap little RocketRAID 2720/2710 card for them.
 

Fr33nasnoobi3

Dabbler
Joined
Mar 11, 2017
Messages
19
So when the one drive started acting strange (disconnecting itself), I went overboard and bought a HighPoint Rocket 750 (for future expansion) and to eliminate anything that could be causing the drive to just disconnect itself... As for SMART tests, I was running short tests once a week along with scrubs, and long tests once a month, and no issues were ever reported on the health of the drives.
 

Fr33nasnoobi3

Dabbler
Joined
Mar 11, 2017
Messages
19
Ericloewe said: "Uhmm, it kinda looks like the middleware just aborted the unlocking process after one disk failed to do so. Manually unlock the drives with your keys and you might be able to get at your pool."
That's where things get a bit dicey... I have a key and a recovery key saved to my Dropbox and Google Drive, with the date I expanded the volume as part of the file name... and I did try geli attach -p -k /mnt/geli.key da0p2, yada yada, but it came back with a wrong-key error or whatever it was, so I'm not sure why they weren't working... So my only option is to use the password to unlock in the GUI, correct?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Did you try the recovery key? The "regular" key only works together with the password (which naturally works on the command line, too).
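The practical difference between the two keys shows up in how geli attach is called: -p tells geli not to ask for a passphrase, which suits a passphrase-less recovery key, while leaving -p off makes geli prompt for the passphrase that pairs with the regular key. That is why a -p attempt with the regular key, like the one described earlier in the thread, fails. A minimal Python sketch that only builds the argument list; the function name geli_attach_argv and the key paths are hypothetical.

```python
def geli_attach_argv(provider, keyfile, recovery_key):
    """Build a `geli attach` command line for one provider.

    recovery_key=True:  the recovery key has no passphrase, so pass -p.
    recovery_key=False: the regular key is paired with the passphrase;
                        omitting -p lets geli prompt for it on the terminal.
    """
    argv = ["geli", "attach"]
    if recovery_key:
        argv.append("-p")  # do not ask for a passphrase
    argv += ["-k", keyfile, provider]
    return argv


# Hypothetical key paths and provider name.
print(" ".join(geli_attach_argv("ada0p2", "/mnt/geli.key", recovery_key=False)))
# geli attach -k /mnt/geli.key ada0p2
print(" ".join(geli_attach_argv("ada0p2", "/mnt/geli_recovery.key", recovery_key=True)))
# geli attach -p -k /mnt/geli_recovery.key ada0p2
```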
 

Fr33nasnoobi3

Dabbler
Joined
Mar 11, 2017
Messages
19
Ericloewe said: "Did you try the recovery key? The 'regular' key only works together with the password (which naturally works on the command line, too)."
Yeah, I tried both, no dice... How do you use the GELI key with the password from the command line, if that's possible? And can the recovery key be downloaded somehow with just knowledge of the password, given that the volume won't unlock and so the GUI never offers the option to download it?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Fr33nasnoobi3 said: "So when the one drive started acting strange (disconnecting itself), I went overboard and bought a HighPoint Rocket 750"
I sure wish you had come to the forum and looked at the suggested hardware first. I have had some experience with those at work, and they can be the source of a great deal of grief. It is best to stay far away from any of those Rocket cards.
So sorry for your loss.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Yet another reason not to run encryption unless required to by law or company policy.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Chris Moore said: "I have had some experience with those at work, and they can be the source of a great deal of grief."
On what OS? We knew they sucked for FreeBSD, but it wouldn't surprise me too much to know that their driver is crap everywhere.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Ericloewe said: "On what OS? We knew they sucked for FreeBSD, but it wouldn't surprise me too much to know that their driver is crap everywhere."
We have a CentOS server that we finally got to be stable. We had to replace the card, though, so I think it was defective. It had made 4 drives drop from the pool (drives that later testing showed were good) before it failed to even detect almost a dozen drives, and it was not always the same drives.

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

Fr33nasnoobi3

Dabbler
Joined
Mar 11, 2017
Messages
19
Chris Moore said: "We have a CentOS server that we finally got to be stable. We had to replace the card, though, so I think it was defective. It had made 4 drives drop from the pool (drives that later testing showed were good) before it failed to even detect almost a dozen drives, and it was not always the same drives."


So this dropping/disconnecting is standard behavior for these cards? I did read a lot about the approved hardware prior to using it, but what I gathered from it was that the driver was buggy years ago and no one had anything really bad to say about it more recently... How were you able to recover from the 4 drives that dropped off the pool (if you were able to)? I figured the driver being buggy was 1. fixed and 2. a works-or-doesn't-work situation, not the gremlin it has become...
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Fr33nasnoobi3 said: "I did read a lot about the approved hardware prior to using it, but what I gathered from it was that the driver was buggy years ago and no one had anything really bad to say about it more recently..."
I did specify that only LSI HBAs are known to work well on FreeBSD in the Hardware Recommendations Guide. There was some hope when the Rocket 750 came out, but the driver was crap and they didn't bother to fix it. That hope lasted about half a year.
 

Fr33nasnoobi3

Dabbler
Joined
Mar 11, 2017
Messages
19
Ericloewe said: "I did specify that only LSI HBAs are known to work well on FreeBSD in the Hardware Recommendations Guide. There was some hope when the Rocket 750 came out, but the driver was crap and they didn't bother to fix it. That hope lasted about half a year."


I don't mean for this to sound rude, but I find a great difference between "known to work well / not work well" and destroying hard drives, offlining drives and possibly leaving data unrecoverable! LOL *sigh*
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It is pretty extreme and I do feel there is probably still some recovery that can be done here. Unfortunately, I'm not very familiar with GELI and can't be of much help. Maybe the FreeBSD forums might be able to help you get GELI to unlock the disks.
 

Fr33nasnoobi3

Dabbler
Joined
Mar 11, 2017
Messages
19
Ericloewe said: "It is pretty extreme and I do feel there is probably still some recovery that can be done here. Unfortunately, I'm not very familiar with GELI and can't be of much help. Maybe the FreeBSD forums might be able to help you get GELI to unlock the disks."

Yeah, I haven't tried unlocking them with the GELI key + password yet, because I just learned that only the recovery key is password-free, so even if I was using the right GELI key, without the password it would have thrown the same error. The good news is DriveSavers says no big deal, just deposit your soul into this jar along with half a year's wages and voilà, files recovered! Q_Q
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Fr33nasnoobi3 said: "but what I gathered from it was that the driver was buggy years ago and no one had anything really bad to say about it more recently..."
You are not hearing about it, not because it got better, but because we are not using it.
Fr33nasnoobi3 said: "How were you able to recover from the 4 drives that dropped off the pool?"
I actually had to replace the controller. The one I had was apparently bad/defective. It was a couple of years ago and I can't remember the command, but it was possible to force enough of the drives back online to get the pool functioning again.
Fr33nasnoobi3 said: "the good news is DriveSavers says no big deal, just deposit your soul into this jar along with half a year's wages and voilà, files recovered!"
Isn't that interesting. Only one soul...
 