James Mason
Dabbler
Joined: Jan 21, 2019
Messages: 16
Hi. I'm just after some thoughts really on this one. I've only been using FreeNAS for a few months and it would be good to see if anyone has any comments.
Recently I had a bit of a strange situation: I returned home to find that the power had tripped and my UPS was beeping (a lot!). It's an APC Smart UPS 1000, and a bit of research showed me that this type of beeping was the current overload warning. Since it's a used item I thought something had probably gone wrong with the UPS, but in fact one of the power supplies in the PC used for FreeNAS had violently died.
The FreeNAS system was a Dell T310 with dual redundant 400 W PSUs. When I tried to start it up without the UPS there was a reasonably loud bang and sparks actually came from one of the PSUs. I concluded that the UPS was fine and, as a temporary measure, booted the T310 with the single remaining PSU. It worked fine for a while, but a couple of days later I woke to one email saying that one of the HDDs had a bad sector and another reporting an "Unscheduled system reboot".
With sparks flying around inside the system, it didn't surprise me that this might have caused other issues, although it had worked perfectly for 2 days on the remaining PSU. It had also worked flawlessly from around January until the PSU died. I decided to buy a completely new system (Fujitsu TX150 S7). I moved the 2x USB boot drives, the 4x HDDs (running in RAIDZ2 - I know, very inefficient but it's a work in progress) and the RAM into it, and it has been rebooting every few hours. I have narrowed the rebooting down to the watchdog resetting the machine after a system crash: when I disabled the watchdog, it crashed and just remained crashed - no reboot.
Thinking at this point that it was the RAM (2x 4GB Kingston DDR3 1333MHz ECC - I don't have the part number to hand but I can get it if necessary), I ran Memtest86 and it completed several passes with no errors. I also ran the test with the RAM in another machine and again had no errors.
I am waiting on delivery of a set of RAM that is an original Fujitsu part (the Fujitsu does complain at boot that the Kingston RAM isn't an authorised part), to see if the RAM is at fault (or just a bit incompatible). Once that is fitted, the only hardware common to the original system, which was running fine, and the new system will be the HDDs and USB drives. Could these cause a system to crash? I was running 11.2-U7; I tried booting into U6 and it still crashed.
Would it be worth taking out one of the HDDs and seeing if it still crashes, and then repeating this with a different HDD removed until all have been tested? Or the same with the USB boot drives? Is this particularly risky? I have backups of the data.
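To make the one-at-a-time idea concrete, here is a rough sketch of the test rounds I have in mind. The drive labels are hypothetical placeholders, and it assumes the two USB boot drives are mirrored so the system can still boot from either one. It only enumerates the rounds and checks that no round removes more drives than RAIDZ2 can tolerate; it does not touch any hardware.

```python
# Hypothetical labels; the real device names would come from the system itself.
DATA_DRIVES = ["hdd1", "hdd2", "hdd3", "hdd4"]  # 4x HDDs in RAIDZ2
BOOT_DRIVES = ["usb1", "usb2"]                  # 2x USB boot drives (assumed mirrored)
RAIDZ2_PARITY = 2  # RAIDZ2 tolerates up to two missing drives


def removal_rounds(data_drives, boot_drives):
    """Build one test round per component, pulling exactly one at a time."""
    rounds = []
    for drive in data_drives + boot_drives:
        rounds.append({
            "removed": drive,
            "data_left": [d for d in data_drives if d != drive],
            "boot_left": [b for b in boot_drives if b != drive],
        })
    return rounds


def pool_still_importable(round_, total_data=len(DATA_DRIVES)):
    """True if the round removes no more data drives than the parity level."""
    missing = total_data - len(round_["data_left"])
    return missing <= RAIDZ2_PARITY


rounds = removal_rounds(DATA_DRIVES, BOOT_DRIVES)
for r in rounds:
    degraded = len(r["data_left"]) < len(DATA_DRIVES)
    status = "degraded" if degraded else "complete"
    print(f"pull {r['removed']}: pool {status}, boot drives left: {len(r['boot_left'])}")
```

Each round that pulls an HDD leaves the pool degraded but within RAIDZ2's tolerance, so the open question is how comfortable it is to let the box crash in that state.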
If anyone can see anything blindingly obvious that I have missed, that would be great.
For clarity, here is the system as it is currently set up:
Fujitsu TX150 S7
Xeon X3430
8GB Kingston ECC DDR3 1333MHz (running at 800MHz because of the CPU - I have an X3450 to fix that)
4x HGST 3TB SATA HDDs in RAIDZ2 (about 4TB used)
2x SanDisk USB 3.1 16GB boot drives
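For reference, the arithmetic behind the "very inefficient" comment above can be sketched as follows. This is a simplified estimate that ignores ZFS metadata and padding overhead and the TB/TiB difference, so the real usable figure will be somewhat lower; the function name is just for illustration.

```python
def raidz_usable_tb(drive_count, drive_size_tb, parity):
    """Rough usable capacity of a single RAIDZ vdev: data drives times drive size."""
    if drive_count <= parity:
        raise ValueError("need more drives than the parity level")
    return (drive_count - parity) * drive_size_tb


DRIVES = 4       # 4x HGST HDDs
SIZE_TB = 3      # 3TB each
PARITY = 2       # RAIDZ2
USED_TB = 4      # about 4TB used

raw_tb = DRIVES * SIZE_TB
usable_tb = raidz_usable_tb(DRIVES, SIZE_TB, PARITY)
efficiency = usable_tb / raw_tb
fill = USED_TB / usable_tb

print(f"raw: {raw_tb} TB, usable: ~{usable_tb} TB ({efficiency:.0%} of raw)")
print(f"pool fill: ~{fill:.0%}")
```

So half of the raw capacity goes to parity in this layout, and the pool is roughly two-thirds full by this estimate.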
Finally, how risky is it to have it crash whilst I'm trying to figure this out? I have backups and would assume that data loss is fairly unlikely but I'd prefer not to run into problems from repeated crashing.
Thanks in advance.