ioquatix
Dabbler
- Joined: May 9, 2017
- Messages: 48
I've got an old N40L. It's been an awesome first server but it's a bit long in the tooth.
Recently, it's been crashing on a semi-regular basis. It started off at about once every few months and eventually got to once a week.
I chalked it up to ZFS on Linux issues and not enough RAM. After upgrading my system to 2x8GB, I had even more frequent problems.
When I ran memtest86, the system would almost always crash at around the 9 minute mark. In some cases, the boot screen reported an unrecoverable ECC error. I checked the BIOS and there were a LOT of errors in the log, going back quite a while. So the system was detecting the errors and locking up to prevent any further damage.
While this was far from ideal, ZFS was fine through all of it (can't say the same for my ext4 boot drive). It survived numerous hard crashes, and as far as I can tell (I've checked and scrubbed), all my data is absolutely fine.
I'm absolutely certain that without ECC, I would have experienced silent data corruption. I ingest 10-100 GB of data per week as backups, and the system has 16 GB of RAM, so it seems plausible that every bit of memory was used on a regular basis.
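For anyone who wants to watch for the same thing from a running Linux system rather than the BIOS log, here is a rough sketch. It assumes the kernel's EDAC driver is loaded for your memory controller and exposes its counters under `/sys/devices/system/edac/mc`; I haven't confirmed that this works on the N40L specifically, and the function name `read_edac_counts` is just mine.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int, "ue": int}} for each EDAC memory controller.

    "ce" is the corrected-error count and "ue" the uncorrected-error count.
    The default path assumes a Linux kernel with an EDAC driver loaded.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        # EDAC is not available (driver not loaded, or unsupported hardware).
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = {"ce": ce, "ue": ue}
    return counts


if __name__ == "__main__":
    for name, c in read_edac_counts().items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

A corrected count that keeps climbing would be an early hint that a DIMM is on its way out, before it gets to the point of uncorrectable errors and lockups.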
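To put rough numbers on that, here is a back-of-the-envelope sketch. It only compares weekly ingest volume against RAM size; it says nothing about which physical addresses were actually touched, so treat it as a plausibility check rather than proof. The helper name `ram_turnover` is made up for the example.

```python
def ram_turnover(ingest_gb_per_week, ram_gb):
    """How many times the RAM capacity the weekly ingest volume amounts to."""
    return ingest_gb_per_week / ram_gb


low = ram_turnover(10, 16)    # quiet week
high = ram_turnover(100, 16)  # busy week
print(f"Weekly ingest is {low} to {high} times the installed RAM")
```

So even a quiet week pushes more than half the RAM's worth of data through the machine, and a busy week pushes several times its capacity.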
I will always use ECC in my servers, and I'm seriously considering it for my workstation too now.