Volume "Disappeared" After Improper Shutdown

Status
Not open for further replies.

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
My two-year-old daughter, who likes to push buttons, got into the room and improperly shut down my FN 11.1 system (while I had a movie playing on Plex, so the disks were under load). I've got five 8TB drives in a RAID-Z2 pool that is now unavailable. View Volumes still shows the volume name, but says "UNAVAIL." Meanwhile, when I open Volume Manager, it recognizes the five drives and wants to add them to a new pool. When I click the volume itself, it gives me the option to detach, scrub, view volume status, or upgrade.
I'm pretty good at reading and solving my own problems, but it took me weeks to transfer all of my data from the old NAS and every device in the house, and I don't want to lose that time/data. Help would be greatly appreciated.
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
We're going to need more information.
  • How are the drives hooked up? HBA, etc?
  • Was your pool encrypted?
  • What's the output of zpool import and zpool status?
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
Thanks for your help. The host bus adapter is a Broadcom 2308. The pool was not encrypted. zpool import returned nothing. Below is the output of zpool status. It seems to instruct me to run zpool clear, but this is scary territory. Thank you.
Code:
zpool status
  pool: MAINSPIN
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://illumos.org/msg/ZFS-8000-JQ
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        MAINSPIN                  UNAVAIL      0     0     0
          raidz2-0                UNAVAIL      0     5     0
            2760554092486413937   REMOVED      0     0     0  was /dev/gptid/ead229c8-f655-11e7-ac4f-002590ed9fc4
            5839067863432692929   REMOVED      0     0     0  was /dev/gptid/ebe14794-f655-11e7-ac4f-002590ed9fc4
            12391365849909013340  REMOVED      0     0     0  was /dev/gptid/ecf07c2d-f655-11e7-ac4f-002590ed9fc4
            13734883185564571814  REMOVED      0     0     0  was /dev/gptid/ee066b24-f655-11e7-ac4f-002590ed9fc4
            11380074299853842333  REMOVED      0     0     0  was /dev/gptid/ef14066a-f655-11e7-ac4f-002590ed9fc4

errors: 2 data errors, use '-v' for a list

  pool: SECURITY
 state: ONLINE
  scan: none requested

  pool: SPARE_SSD
 state: ONLINE
  scan: none requested

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:06 with 0 errors on Sun Feb 11 03:45:06 2018
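
For anyone who wants to pull the key facts out of a long status dump like the one above, here is a minimal Python sketch. It is not a FreeNAS tool: the `summarize` function and the `SAMPLE` text are hypothetical, and the parsing simply assumes the layout shown in this post (a `pool:` line, a `state:` line, and `REMOVED` rows ending in `was <path>`).

```python
import re

# Abbreviated copy of the zpool status output posted above (assumed layout).
SAMPLE = """\
  pool: MAINSPIN
 state: UNAVAIL
config:

        NAME                      STATE     READ WRITE CKSUM
        MAINSPIN                  UNAVAIL      0     0     0
          raidz2-0                UNAVAIL      0     5     0
            2760554092486413937   REMOVED      0     0     0  was /dev/gptid/ead229c8-f655-11e7-ac4f-002590ed9fc4
            5839067863432692929   REMOVED      0     0     0  was /dev/gptid/ebe14794-f655-11e7-ac4f-002590ed9fc4
            12391365849909013340  REMOVED      0     0     0  was /dev/gptid/ecf07c2d-f655-11e7-ac4f-002590ed9fc4
            13734883185564571814  REMOVED      0     0     0  was /dev/gptid/ee066b24-f655-11e7-ac4f-002590ed9fc4
            11380074299853842333  REMOVED      0     0     0  was /dev/gptid/ef14066a-f655-11e7-ac4f-002590ed9fc4

  pool: SECURITY
 state: ONLINE
"""


def summarize(status_text):
    """Return {pool: {"state": str, "removed": [last known device paths]}}."""
    pools = {}
    current = None
    for line in status_text.splitlines():
        m = re.match(r"\s*pool:\s+(\S+)", line)
        if m:
            # A new pool section starts here.
            current = m.group(1)
            pools[current] = {"state": None, "removed": []}
            continue
        if current is None:
            continue
        m = re.match(r"\s*state:\s+(\S+)", line)
        if m:
            pools[current]["state"] = m.group(1)
            continue
        # Rows for missing members end with "was <last known path>".
        m = re.match(r"\s*\S+\s+REMOVED\b.*\bwas\s+(\S+)", line)
        if m:
            pools[current]["removed"].append(m.group(1))
    return pools


if __name__ == "__main__":
    for name, info in summarize(SAMPLE).items():
        print(name, info["state"], "removed members:", len(info["removed"]))
```

On the output above, this reports all five RAID-Z2 members of MAINSPIN as REMOVED, which suggests the system lost sight of the disks rather than that the data on them was damaged.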
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
Can you also post the output of camcontrol devlist?

At the moment, it appears none of your drives are connected or powered on. Have you verified they're getting power and the cables are connected?

Since you're using an HBA, move the drives to the on-board SATA ports and see whether they show up.
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
All disks are showing up (or, as I count, perhaps four disks are showing up out of five ST8000s). Here's the output. The reading I've done gives me some confidence that running zpool clear, followed by zpool online, could fix the problem. I just continue to be reluctant to move too quickly.
Code:
root@FreeNAS:~ # camcontrol devlist
<ATA ST8000VN0022-2EL SC61>		at scbus0 target 2 lun 0 (da3,pass3)
<ATA ST8000VN0022-2EL SC61>		at scbus0 target 3 lun 0 (da1,pass1)
<ATA Hitachi HUS72403 A5F0>		at scbus0 target 4 lun 0 (pass2,da2)
<ATA ST8000VN0022-2EL SC61>		at scbus0 target 6 lun 0 (da4,pass4)
<ATA ST8000VN0022-2EL SC61>		at scbus0 target 7 lun 0 (da0,pass0)
<Crucial CT120M500SSD1 MU02>	   at scbus1 target 0 lun 0 (ada0,pass6)
<Samsung SSD 850 EVO 120GB EMT02B6Q>  at scbus2 target 0 lun 0 (ada1,pass7)
root@FreeNAS:~ #
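
To double-check the drive count, the listing above can be tallied per bus and model. This is only a rough Python sketch: `models_by_bus` is a hypothetical helper, `DEVLIST` is copied from the output in this post, and it assumes each line has the form `<vendor model revision> at scbusN ...`.

```python
import re
from collections import Counter

# camcontrol devlist output as posted above.
DEVLIST = """\
<ATA ST8000VN0022-2EL SC61>        at scbus0 target 2 lun 0 (da3,pass3)
<ATA ST8000VN0022-2EL SC61>        at scbus0 target 3 lun 0 (da1,pass1)
<ATA Hitachi HUS72403 A5F0>        at scbus0 target 4 lun 0 (pass2,da2)
<ATA ST8000VN0022-2EL SC61>        at scbus0 target 6 lun 0 (da4,pass4)
<ATA ST8000VN0022-2EL SC61>        at scbus0 target 7 lun 0 (da0,pass0)
<Crucial CT120M500SSD1 MU02>       at scbus1 target 0 lun 0 (ada0,pass6)
<Samsung SSD 850 EVO 120GB EMT02B6Q>  at scbus2 target 0 lun 0 (ada1,pass7)
"""


def models_by_bus(devlist_text):
    """Count devices per (bus, model), ignoring the firmware revision."""
    counts = Counter()
    for line in devlist_text.splitlines():
        m = re.match(r"<(.+?)>\s+at (scbus\d+) ", line)
        if m:
            # The last token inside <...> is the firmware revision; drop it.
            model = m.group(1).rsplit(" ", 1)[0]
            counts[(m.group(2), model)] += 1
    return counts


if __name__ == "__main__":
    for (bus, model), n in sorted(models_by_bus(DEVLIST).items()):
        print(bus, model, n)
```

For this listing the tally is four ST8000VN0022 drives and one Hitachi on scbus0, so one of the five 8TB pool members is indeed not being detected.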
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
I'd hate to think that an improper shutdown would cause a hard drive to actually fail, but I suppose it's possible. Perhaps a bad SATA port? Could the system handle it if I unplug the drives and put them into different ports, or do the port assignments have to remain constant to maintain the pool?
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
No, the port assignments do not have to be constant. Drives can be attached to any available port. @m0nkey_ already suggested you move them from your HBA to the mobo SATA ports for a test...
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
May also be worth checking to see if the pool was 'destroyed'. You can try running zpool import -D.
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
zpool import -D did nothing; it produced no output.
Update today: I restarted the machine, and the remaining disks connected to that Broadcom 2308 have disappeared. If there's a chance that FN has forgotten how to talk to the HBA, I can investigate that. Otherwise, perhaps the HBA has truly gone bad.
I guess I'll pull the machine apart in the next day or two and check all of my cables/ports - reconnect to the mainboard controller.
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
Question: is there a device file that could have been corrupted during the improper shutdown? Any chance I can delete that and/or rebuild it? My question is, should I further investigate software issues, rather than go straight to hardware?
I'll check my IPMI logs over lunch.
 
Joined
Oct 9, 2016
Messages
4
I am a newbie with FN, but I can say:
- If some files of the FN base system got corrupted, you can always deploy a fresh copy of FN and reconfigure it. It should have no problem recognizing the disks/pool afterwards (assuming no HW error).
- The first thing I would do is make sure the OS recognizes the HBA and drives at the OS level, without HW errors.
Just some thoughts.
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
Have you tried connecting the drives direct to the SATA ports?
First, thanks for sticking with me. That is my "next" step. After the baby hit the power button, I spent two hours wedging the case into a new location in the basement, that I'll have to unwedge in order to access the drives. I had hoped there might be another option, like resetting the hardware list, before I pull everything apart. I was just reading a thread about "camcontrol rescan all" to see if that might be viable.
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
I've been reading about a bug with older HBAs and FN installs, from about a year ago. Another thread discussed firmware/driver compatibility. It was suggested that FW16 / driver 20 was the farthest stretch without issues. I've got 14/21... I also have the IS version of the firmware, and might be helped by a newer IT version. This is all just reading for fun, at the moment.
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
Supermicro has firmware v20 available:
ftp://ftp.supermicro.com/driver/SAS/LSI/2308/Firmware/IT/
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
I'll take your advice, @m0nkey. As of this afternoon, two of the six hard drives have suddenly re-appeared - and I've done nothing. I'm also getting some funky error readouts at my system info screen, so I think a full re-install is in order. But when I pull the box open, I'll swap ports over to the mobo controller and post results. Then, when there's nothing connected to the HBA, I'll update the firmware.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I concur.

Take the HBA out of the equation. Plug the drives into SATA ports.
 

ere109

Contributor
Joined
Aug 22, 2017
Messages
190
I believe you both. Last night, I restarted the machine for the third or fourth time, and all of my disks and pools were back online! Definitely an odd series of events, and it gives me a lot more to scratch my head about.
 