About a year ago, I built a new FreeNAS server for backups. All of the components in it were picked from the HCL.
FreeNAS-9.10.1-U4 (ec9a7d3)
Intel Xeon CPU E3-1220 v3 @ 3.10GHz
32GB RAM Crucial 240-pin DIMM, DDR3 PC3-12800
SUPERMICRO Supermicro X10SLL-F-B
SUPERMICRO 4U 24-BAY 846E1-R900B
24X WD Red 2TB NAS WD20EFRX
Lenovo IBM Intel I350-T2 2XGBE BaseT Ethernet Adapter
LSI Logic Controller Card MegaRAID SAS 9211-8i
Pool configured as RAIDZ2.
I use this as a multipathed iSCSI device presented to VMware 5.5, with various VMDKs attached to virtual machines across a farm of 5 ESXi servers.
During normal operation, with daily backups running and so on, I have no problems. It's only when I turn up the heat that the box starts freaking out. As an example, right now I'm backing up 7TB from one ESXi host and about 1.5TB from another, and I'm getting checksum errors spread across the disks. These climb into the thousands before ZFS eventually kicks a disk offline. It happened last weekend while I was copying over a server for archiving; I lost that server because the disk went offline and the data was corrupted.
I'm at a bit of a loss here. I've seen this behavior since I first installed the box. So far I've swapped out a few disks, replaced the controller with an identical one, and changed the controller-to-backplane cable.
What I'm seeing while under load:
Interface utilization isn't even close to maxing out either GbE interface; about 150Mbps on one NIC.
CPU is maybe 5%.
System load is less than 1.
Disk I/O is about 25MB/s.
RAM is full, which I understand is expected with ZFS.
No swap utilization.
iSCSI read ~9MB/s, write ~15MB/s.
ARC size: 28GB
ARC hit ratio: 88%
ARC demand_data: 667
ARC demand_metadata: 593
ARC prefetch_data: 66
ARC prefetch_metadata: 187
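In case it's useful, this is roughly how I've been watching the errors while the backups run (a sketch only: "tank" and /dev/da0 are placeholders for the actual pool name and disk devices):

```shell
# Per-disk read/write/checksum error counters, plus any files
# ZFS has flagged as damaged ("tank" is a placeholder pool name):
zpool status -v tank

# SMART health for one member disk behind the 9211-8i
# (/dev/da0 is a placeholder; repeat for each disk):
smartctl -a /dev/da0

# FreeBSD per-disk latency/throughput while the load is applied:
gstat -p
```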
I'm not sure where else I need to look.
Thanks in advance.