My system previously had faulty memory, which caused it to reboot and hang during the boot process.
Now things seem to be working much better, but recently I've been seeing ctl_datamove errors at 5 minutes past the hour, every hour, in a burst of about three within a few seconds, like so:
> ctl_datamove: tag 0x166c1 on (1:4:0:0) aborted
> ctl_datamove: tag 0x594e7 on (3:4:0:0) aborted
> ctl_datamove: tag 0xe5cfd8 on (0:4:0:1) aborted
Also, about once a month I get a batch of additional SAS enclosure (ses) messages:
> ses0: da0,pass0: Element descriptor: 'Slot 01'
> ses0: da0,pass0: SAS Device Slot Element: 1 Phys at Slot 0
> ses0: phy 0: SAS device type 1 id 0
> ses0: phy 0: protocols: Initiator( None ) Target( SSP )
> ses0: phy 0: parent 50030480013ccd3f addr 50000c0f01ed1402
... across all slots in the array. The hardware is a Supermicro 6027R-E1R12L with 192GB of RAM, a 10GbE Chelsio NIC (with Chelsio optics), an Intel P3700 as the ZIL, and 12 WD 2TB RE SAS disks. iSCSI runs over those two 10GbE connections to my ESXi hosts, round-robin, on their own network. I'm not running dedup, and the ESXi installs are very vanilla (I haven't toyed with any drivers; the hosts use Intel X520-DA2s). FreeNAS is hosting the VMDK storage for the ESXi hosts, and the RPM speed has been set to 7200.
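Since the paths are round-robin, one thing that can be verified from the ESXi side is the path selection policy and round-robin settings on the FreeNAS LUN. A minimal sketch, where `naa.xxxx` is a placeholder for the actual device ID:

```sh
# Show each device's current path selection policy (PSP);
# the FreeNAS LUN should report VMW_PSP_RR for round-robin
esxcli storage nmp device list

# Inspect the round-robin knobs (IOPS limit, etc.) for one LUN;
# naa.xxxx is a placeholder for the device ID from the list above
esxcli storage nmp psp roundrobin deviceconfig get --device=naa.xxxx
```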
The regularity of the ctl_datamove errors makes me think the ESXi hosts (there are two) are "checking in" on some schedule. I've only got one VM guest running right now, and it isn't doing anything; accessing the guest and performing tasks on it doesn't cause errors either.
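For what it's worth, the hourly pattern is easy to verify from the FreeNAS shell; a quick sketch, assuming the aborts land in the default FreeBSD system log:

```sh
# Pull the timestamp of every ctl_datamove abort from the system log;
# a clean HH:05 pattern would support the "scheduled check-in" theory
grep ctl_datamove /var/log/messages | awk '{print $1, $2, $3}'
```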
When the ctl_datamove errors occur, the VMs become unresponsive for about a minute, but then pop right back to working normally.
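When the stall happens, it might be worth checking whether the initiator sessions are actually dropping on the FreeNAS side; a sketch using the FreeBSD CTL admin tool:

```sh
# List the iSCSI sessions CTL currently sees; if the ESXi initiators
# vanish and reconnect around HH:05, the sessions are being reset
ctladm islist
```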
My guess is that there's something I should look at in the ESXi network settings, but I'm drawing a blank as to which ones to check.
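In case it points at a specific setting, the same minute can be correlated on the ESXi side; a sketch, assuming the standard vmkernel log location:

```sh
# On an ESXi host, pull iSCSI and multipath (NMP) events from the
# vmkernel log to line up against the HH:05 abort times
grep -iE 'iscsi|nmp' /var/log/vmkernel.log
```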