Recently, after testing and going to beta by uploading our VMs to our new FreeNAS build, we started seeing "random" reboots during (thankfully) off-hours, about one every day and a half. I checked the IPMI on our system; normally during memory events, if it's an ECC error, I'll see the problematic DIMM in question.
However, this time I'm seeing a CPLD error that causes the reboot, and thus no faulty-DIMM information. Unfortunately, the only text is "OEM CPLD CATTER - Asserted", which then trips a watchdog timer interrupt (on the board), which in turn hard resets the system.
My guesses are, at this point, that I either don't have enough memory for my system or that my mobo is having problems. Given the "OEM" part of the message above, I'm thinking more that it's the FreeNAS interaction with the memory that caused the hang.
My configuration is a Supermicro 6027R-E1R12L, two CPUs, 192GB of ECC RAM, eight 800GB Intel S3700 SSDs in a RAID10, an Intel P3600 ZIL, and two Chelsio 10Gb NICs, running the latest FreeNAS 9.3.1. Those SSDs make a pool of 2.6TB, from which I carved a 1.8TB volume for VMDKs and created an iSCSI target file of 1.5TB. Currently we're using 250GB of that iSCSI storage.
-- Also attached via SAS expander is a Supermicro JBOD box (just disks and the SAS controller) with twelve 2TB WD RE series (SAS) drives in another pool with 6.5TB of usable space. That's another iSCSI target for our Windows VM, where we've got about 700GB of files stored on that Windows system.
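For reference, here is the rough arithmetic on how the SSD pool is carved up, as a small Python sketch. The sizes are the ones quoted above; the helper name `share` and the assumption that all figures are decimal terabytes are mine, so treat the percentages as approximate.

```python
# Rough allocation arithmetic for the SSD pool, using the sizes quoted above.
# Assumption: all figures are decimal TB; `share` is a hypothetical helper.

def share(part_tb, whole_tb):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part_tb / whole_tb, 1)

POOL_TB = 2.6     # SSD pool size
ZVOL_TB = 1.8     # volume carved out for VMDKs
EXTENT_TB = 1.5   # iSCSI target file
USED_TB = 0.25    # iSCSI storage currently in use (250GB)

zvol_of_pool = share(ZVOL_TB, POOL_TB)
extent_of_zvol = share(EXTENT_TB, ZVOL_TB)
used_of_extent = share(USED_TB, EXTENT_TB)

print(f"volume is {zvol_of_pool}% of the pool")
print(f"iSCSI target file is {extent_of_zvol}% of the volume")
print(f"used data is {used_of_extent}% of the iSCSI target file")
```

So most of the pool is already allocated to the volume, even though only a small fraction of the iSCSI target is actually in use.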
We've had 6 VMs on this system for a few weeks with no problems, but over the last few days we migrated another 2 (one of which had a rather large lazy-zeroed partition associated with it). That's when the reboots started.
I've got the ability to swap out that memory for 32GB LRDIMMs, but that'll be a blow to our budget that I'd like to avoid if it's just a matter of further tuning or fixing our configuration.