Random shutdowns during resilver

Status
Not open for further replies.

Yaniv

Dabbler
Joined
Feb 5, 2015
Messages
12
Hey all,

Having some issues with my box, hopefully someone could point me in the right direction.

Background:
- Been using the system for 6 or more years. Never really had any major problems. No hardware changes apart from the occasional hard disk replacement and boot USB change.
- I am using raidz2. I've had to resilver drives in the past; it takes a while but there were no issues.
- I know the hardware etc. is probably all wrong. I plan on building a new system soon, but for now I need the data on the drives. I haven't lost any yet, but I can't keep the system on long enough to back it up even if I wanted to (I do keep backups, but some of them are a bit stale and need a refresh).

Problem:
It's taking a VERY long time to get to the boot menu options (if it even gets there). If I check zpool status, I can see it's resilvering. Whenever it hits a certain %, the box just shuts down: immediate power off. Nothing shows up in the logs. Sometimes it doesn't make it to the boot menu, it just powers off. I can't find anything special in any of the logs, but maybe I'm looking in the wrong ones? (I don't have it open right now, but I checked /var/db/... message etc.)

Things I've tried:
- Tried a new boot USB with a fresh install -> if I do this, the system stays on and everything is fine, but my pool isn't there. So I upload my most recent config, and then the problem is back.
- Tried mirroring the USB, tried different USB ports etc.
- I tried to offline the disk while the resilver is in progress... doesn't seem to do anything. It goes offline but the resilver doesn't restart. Eventually it shuts down. If the disk is offline and it does happen to restart, it still randomly shuts down at that magic number %.
- I've tried to offline another disk which may or may not have an issue... I don't think it does, but I figured let me offline 2 and see what happens... same thing -> resilver does not restart, and it's still randomly shutting down.
- I've tried moving the logs to persistent storage, copying them with AFP and looking at them post shutdown to see if there is anything there... nothing stood out.
- I've tried putting in a new hard drive and disconnecting the old hard drive... not quite sure how to online it from the CLI, and I can't get to the UI because the system is so degraded the UI barely opens. If it does open, it will shut down before I can log in.
- I wanted to try the replace command from the CLI, but I'm not sure how to find the new drive's serial. It doesn't come up with glabel status... even if I find it, I think the pool would need to know about it first.
- I've started using the latest master release build (11.2 something), but I also tried downgrading to different versions of FreeNAS etc. No difference.
- A bit extreme, but I've also hooked up the drives to a different PC, new power supply, MB, CPU, memory, etc... so it's just the drives... didn't get too far with this because I couldn't seem to boot into the FreeNAS drive... tried an installer USB but it threw an error about not knowing which disk to use... might be a BIOS issue... gave up on this one.
- I've tried turning off all the services that start at boot and disabling the only jail I use - plex.

Hardware:
- 6 x WD Red 3TB drives (they are not encrypted).
- Boot drive : Corsair Flash Voyager Vega 32GB Ultra Compact Low Profile USB 3.0 Flash Drive
- PSU: not sure of the brand, but it's about 675W
- Motherboard and CPU: Biostar Mini ITX DDR3 1333 Socket P Motherboards A68I-350 DELUXE R2.0
- Memory 8GB - Kingston KHX16C9B1RK2/8X HyperX Red 8GB (4GB 512M x 64-Bit x 2 pcs.) DDR3-1600 CL9 240-Pin DIMM Kit
- Sata card: HighPoint Rocket 640L Lite Version 4-Port PCI-Express 2.0 x4 SATA 6Gb/s RAID Controller
- Plugged into a battery / surge thing. I've also tried plugging directly into the wall, different power outlets etc.

That's all I can think of off the top of my head. I can try to post logs, but they are kinda hard to get when the system just shuts down on me... also, since it's an immediate power off, I don't think there is anything too helpful in them, but hopefully I'm wrong. Please let me know what logs to post etc. and I'll try to get them.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Could be any number of bad parts. Do you have another PSU you could test? Also how are the temps on each part?
 

Yaniv

Dabbler
Joined
Feb 5, 2015
Messages
12
kdragon75 said: "Could be any number of bad parts. Do you have another PSU you could test? Also how are the temps on each part?"

I guess I could try the PSU from the other system I tried to test with (I think it's just over 700W or so). I'll try that tonight and post back. Like I said, if I don't upload my config, i.e. use a clean FreeNAS without my pool, it stays on without issue, so I'm skeptical, but it's definitely worth a shot! I've seen weirder things happen.

It's pretty well ventilated and cooled, and nothing has stood out temp-wise. If you know any commands to check the temps while the resilver is going, I can try to post the info here. Haven't tried a memtest yet - I don't think it will stay on long enough.
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626
Yaniv said: "If you know any commands to check the temps while the resilver is going"
smartctl -a /dev/blahblah
Or temp history:
smartctl -l scttemp /dev/blahblah
(replace /dev/blahblah with the actual disk device)
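If you'd rather see all the disks in one go, here is a rough sketch. It assumes the drives report SMART attribute 194 (Temperature_Celsius) in the usual attribute table, and that the disks show up under names like ada0 to ada5 (those names are a guess, check yours with camcontrol devlist). The function names parse_temp and print_temps are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: pull the raw temperature out of `smartctl -A` output.
# Assumes the drive reports SMART attribute 194 (Temperature_Celsius),
# whose raw value is the 10th column of the attribute table.
parse_temp() {
    awk '$1 == 194 { print $10 }'
}

# Print one line per disk. The device names passed in are assumptions;
# list the real ones with `camcontrol devlist`.
print_temps() {
    for disk in "$@"; do
        temp=$(smartctl -A "/dev/$disk" | parse_temp)
        echo "$disk: ${temp:-unknown} C"
    done
}

# Example call (device names are placeholders):
# print_temps ada0 ada1 ada2 ada3 ada4 ada5
```

To keep an eye on it during the resilver, you could wrap the call in a loop, e.g. `while true; do print_temps ada0 ada1; sleep 60; done`.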

 

Yaniv

Dabbler
Joined
Feb 5, 2015
Messages
12
kdragon75 said: "Could be any number of bad parts. Do you have another PSU you could test? Also how are the temps on each part?"

Put in a new power supply and managed to hit replace on one of the disks through the GUI. It got past the % it kept shutting down at, which is a great sign. It eventually tried to run something during the night, which caused all the disks to go offline, so I had no choice but to restart. It has picked up the resilvering from past the shutdown %. So far so good. Not sure if it's the newer power supply or that I managed to successfully hit replace without it going crazy... For now, I'm just going to wait it out and hope it finishes.

All the HDD temps are in the low to mid 30s.

I'll mark this as solved and post back once it finishes - if it finishes :)
 