Server becomes unresponsive over time

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
I'm having issues with my TrueNAS setup that was working fine until recently.
For the past week or so, the box has generally worked fine on boot and for hours afterwards, but eventually it becomes unresponsive: the middleware reports timeouts, SMB is unusable, and even the console itself is slow.

However, when I check the processes during those periods, nothing is taking up the CPU, swap is unused, and I believe even the disks aren't seeing much activity. Despite that, the slowdown is severe enough that the system is virtually useless until rebooted.

On reboot, it generally regains all its lost performance and is usable again.
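
Since CPU, swap, and disk activity all look idle during the slowdown, one way to pin down when it starts is to time a fixed amount of work at regular intervals and watch the numbers drift. The following is a minimal Python sketch, not something taken from this thread: the function names, the workload sizes, and writing the temp file to the default temp directory are all assumptions. It uses only the standard library (`os.getloadavg` needs a Unix-like OS).

```python
import os
import tempfile
import time


def time_cpu_task(iterations=200_000):
    """Wall-clock seconds for a fixed arithmetic loop (hypothetical workload)."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i * i
    return time.perf_counter() - start


def time_disk_task(directory=None, size_bytes=1_000_000):
    """Wall-clock seconds to write and fsync a small temporary file."""
    payload = b"\0" * size_bytes
    start = time.perf_counter()
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())
    return time.perf_counter() - start


def take_sample(directory=None):
    """Collect one timestamped sample of load average and task timings."""
    load1, _load5, _load15 = os.getloadavg()
    return {
        "timestamp": time.time(),
        "load1": load1,
        "cpu_seconds": time_cpu_task(),
        "disk_seconds": time_disk_task(directory),
    }


def format_sample(sample):
    """Render a sample as a single log line."""
    return (
        "{timestamp:.0f} load1={load1:.2f} "
        "cpu={cpu_seconds:.4f}s disk={disk_seconds:.4f}s"
    ).format(**sample)
```

Calling `take_sample()` once a minute from a loop or cron job and appending `format_sample(...)` to a log file would give a history to compare between a fresh boot and the slow state. Passing a directory on the data pool as `directory` would time the pool instead of the default temp location.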
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
More details of your hardware, please.
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
Dell C2100
2x Intel E5640
40GB RAM (It used to be 72GB, but something is off with the MB or CPU or something: 32GB no longer passes memtest despite the memory itself being fine. This issue is rather old at this point.)
OS Drives - 2x Intel 710 100GB

I'm not entirely certain this is a hardware issue, since the slowdown is a rather recent change and the box will generally run well for the day if restarted in the morning, then be slow the next day.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
How are your data drives connected? You've still not provided full details, although you've already revealed you have RAM hardware issues.
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
Apologies, I didn't think it was necessary since the performance issues include the webUI and console which I thought would likely be entirely in RAM and not involve the data drives.

The OS drives are directly connected to the motherboard SATA ports.
The Data drives are connected via a SAS backplane to both the motherboard SATA ports and the HBA Riser card.

Data drives are 14TB WD Reds shucked from external HDs, as is popular at the moment.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Could be cooling. HBAs tend to run hot, and you have a dual socket board.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
I don't think the HBA itself reports temps to the UI, but you might want to point an IR thermometer at it to see. If it's above 60 C, you'll need more cooling for its slot.
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
I don't have an IR thermometer to test that sort of thing. However, that still should only explain slow pool access and not every single thing being slow.

I think the most shocking thing about this whole thing is the lack of errors being reported (outside of all the timeouts in the python scripting for the middleware).

Memory usage looks fine, CPU usage looks fine, Disk usage looks fine.

It's just somehow perpetually slow. Also, as of recently, it seems like reboots aren't fixing the issue. I'm starting to think that maybe something is dying in a weird way that allows it to continue (sort of) functioning but at an insanely reduced speed throughout. Like maybe a CPU is dying or something, but in a way where it's self-correcting 1000x until it succeeds.

EDIT: I will note that the one temperature that the IPMI reports that I'd think would be high (relative to everything else) is the IOH temperature which is >60C when the CPUs are 45-48C. So maybe it's the motherboard dying?
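
For keeping an eye on those IPMI readings over time rather than checking them by hand, here is a rough Python sketch that parses pipe-separated sensor text of the kind `ipmitool sdr type Temperature` prints and lists anything above a limit. The sensor names and values in `SAMPLE` are made up to mirror the numbers in this post, not captured from a real C2100, and the 60 C default is only borrowed from the HBA figure mentioned earlier in the thread; it is not a documented limit for the IOH.

```python
import re

# Matches a reading field such as "45 degrees C".
READING = re.compile(r"(-?\d+(?:\.\d+)?)\s*degrees C")

# Hypothetical sample text; real sensor names will differ per board.
SAMPLE = """\
CPU1 Temp        | 01h | ok  |  3.1 | 45 degrees C
CPU2 Temp        | 02h | ok  |  3.2 | 48 degrees C
IOH Temp         | 03h | ok  |  7.1 | 62 degrees C
Ambient Temp     | 04h | ns  |  7.2 | No Reading
"""


def parse_temperatures(text):
    """Return {sensor_name: degrees_c} for each line with a Celsius reading."""
    readings = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2:
            continue
        # The first field is the sensor name; look for a reading in the rest.
        for field in fields[1:]:
            match = READING.fullmatch(field)
            if match:
                readings[fields[0]] = float(match.group(1))
                break
    return readings


def hot_sensors(readings, limit_c=60.0):
    """Return the sorted names of sensors reading above limit_c."""
    return sorted(name for name, value in readings.items() if value > limit_c)


print(hot_sensors(parse_temperatures(SAMPLE)))
```

On a live system the text would come from running the IPMI tool and capturing its output instead of `SAMPLE`; lines without a Celsius reading are simply skipped.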
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
If your ICH is running hot, that's a bad sign. The ICH is one of the main IO hubs on your motherboard, so it could very well be the case that your board is going south.
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
Which sucks, because this is not the market to look for new hardware in. Even used server prices seem rather high atm (I've seen old posts referencing the servers I'm looking at for half the price they are now).

Any recommendations?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
You could perform a full transplant of your disks into another C2100. A quick check of eBay shows these running around $100. Just the motherboard will run around $80.
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
I'll keep that in mind, but I was thinking of doing a full change-over to a newer system that should hopefully be in a little better condition and a little more efficient. I've not "outgrown" it, and it wasn't generally slow before these issues, but I was thinking it would be good practice to phase it out.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
How large is your pool? If it's 4-6 drives, you could look at the HP MicroServers. Folks here have had good track records with them.
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
Forgot to mention, sorry. 6x14TB 3.5" drives.
So if I'm going to expand my pool eventually, I'd probably have to add a new vdev, since it's unlikely to be worth growing in place again (I moved from 4TB drives a while back).
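
To put rough numbers on the "new vdev" option, here is a back-of-the-envelope Python sketch. The pool layout is not stated in this thread, so the RAIDZ parity levels below are pure assumptions, and the arithmetic ignores ZFS metadata, padding, and the TB/TiB difference, so real usable space will be lower.

```python
def raidz_usable_tb(drive_count, drive_tb, parity):
    """Rough usable capacity of one RAIDZ vdev: data drives times drive size."""
    if not 0 < parity < drive_count:
        raise ValueError("parity must be at least 1 and less than drive_count")
    return (drive_count - parity) * drive_tb


def pool_usable_tb(vdevs):
    """Sum the rough usable capacity of (drive_count, drive_tb, parity) vdevs."""
    return sum(raidz_usable_tb(*vdev) for vdev in vdevs)


# Six 14TB drives under an assumed RAIDZ2 layout.
print(raidz_usable_tb(6, 14, 2))
# The same pool after adding a second, identical vdev (hypothetical).
print(pool_usable_tb([(6, 14, 2), (6, 14, 2)]))
```

The point of the comparison is only that a second vdev adds its own data-drive capacity to the pool, whereas growing in place means replacing every drive in the existing vdev.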
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
I've been looking around on eBay for a new server to replace this one, with a preference for Supermicro since I've heard so many good things about them, and it seems like I should be able to just replace the MB/CPU/RAM as prices come down without needing a fully new unit.

What would you think about either of these:
https://www.ebay.com/itm/233694317173
- Newer than mine but a little older than the other and I could reuse my RAM, should fit a GPU, my network card, and SAS HBA without issues.
https://www.ebay.com/itm/144303815140
- Newer still, but that comes with a higher cost and I'll need to spend even more to get new RAM. Plus I lose the rear 2x2.5 bay, which seems like it'd be expensive to regain (seeing over $100 easy for it), and it doesn't have rails. (Realizing now after looking again that it'd likely be cheaper to just get the older one and buy a new motherboard at the cost of rails and 2x2.5 bay.)

I should also mention that the other one has a SAS3 backplane but that likely doesn't matter since I am planning on using HDDs in all the bays and if I get SSDs for cache or a separate pool, I'd likely get NVMe drives anyways.
 