High CPU and dropping iSCSI connections on SAN server - where to look?

Status
Not open for further replies.

yanky83

Dabbler
Joined
Mar 1, 2016
Messages
15
Hey,

I have already searched the forum (and Google, obviously) but couldn't find anything particularly helpful. Any suggestions would be much appreciated!

The problem is:
I have an iSCSI target server. Under normal circumstances, the server runs fine (load average: 0.38, 0.26, 0.25).
However, when I move large files on and off the iSCSI volumes, the CPU load explodes at some point. iSCSI connections are then dropped and some system processes (e.g. ntpd, sshd) are killed, with the kernel claiming to be out of swap space, although swap doesn't seem to be full. After some time, the system resumes normal operation.

I was thinking of memory issues (missing sysctl tuning?), network issues (TSO, LRO, etc.), or just an elderly CPU.
I would be very glad to get some pointers on where to look, as I'm slightly out of ideas at this point.

Hardware:
  • Supermicro X8DTH-iF
  • 96GB RAM
  • LSI SAS 9207-8i HBA
  • 10 x Ultrastar 7K4000 4TB (multipath enabled on one HBA)
  • 2 x 240GB Samsung SM863 SSD
  • 4 x 1 Gbit network ports
Setup:
  • Mirrored vdevs (5x2)
  • Mirrored SSDs for ZIL, no L2ARC
  • 2 NICs bundled using LAGG for management
  • 2 NICs with jumbo frames for iSCSI
  • About 8 servers connecting via iSCSI (Xen, VMware, Hyper-V)
  • About 20-30 VMs running their storage on this server
  • 12GB swap
No particular tuning in sysctl or loader.conf.

Code:
# sysctl -a | egrep -i 'hw.machine|hw.model|hw.ncpu'
hw.machine: amd64
hw.model: Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
hw.ncpu: 16
hw.machine_arch: amd64


Code:
# netstat -m
32738/16132/48870 mbufs in use (current/cache/total)
16368/4064/20432/6127332 mbuf clusters in use (current/cache/total/max)
16368/3998 mbuf+clusters out of packet secondary zone in use (current/cache)
0/9643/9643/3063666 4k (page size) jumbo clusters in use (current/cache/total/max)
16368/10103/26471/907752 9k jumbo clusters in use (current/cache/total/max)
0/0/0/510611 16k jumbo clusters in use (current/cache/total/max)
188232K/141660K/329892K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
2/11091729/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
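
One figure in the output above stands out: the "requests for jumbo clusters denied" line reports 11091729 denials for the 9k size. That may be worth watching while a large transfer is running, although nothing in this thread confirms it as the cause. A minimal sketch, assuming the netstat -m line format shown above; jumbo_denied is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -m` output on stdin and print the
# denied jumbo-cluster counters (4k/9k/16k) from the matching line.
jumbo_denied() {
    awk '/requests for jumbo clusters denied/ {
        # First field looks like "2/11091729/0"
        split($1, n, "/")
        printf "4k=%s 9k=%s 16k=%s\n", n[1], n[2], n[3]
    }'
}

# On the FreeBSD host (not executed here):
#   netstat -m | jumbo_denied
```

Sampling this every few seconds during a file move would show whether the 9k counter climbs at the moment connections start dropping.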


Code:
# top -SIz
last pid: 83860;  load averages:  0.11,  0.21,  0.23                                            up 117+00:21:44 11:15:05
46 processes:  2 running, 43 sleeping, 1 waiting
CPU:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
Mem: 1108K Active, 22M Inact, 92G Wired, 1728K Cache, 1780M Free
ARC: 52G Total, 14G MFU, 33G MRU, 14M Anon, 2630M Header, 1557M Other
Swap: 12G Total, 9216K Used, 12G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    0 root        613   8    -     0K  9808K icl_rx 14 337.1H   0.59% kernel
    4 root         17  -8    -     0K   304K zvol:i  6  18.5H   0.29% zfskern


Code:
# zpool iostat -v
                                           capacity     operations    bandwidth
pool                                    alloc   free   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
tank1                                   5.52T  12.6T      3     62  9.44K   771K
  mirror                                1.10T  2.52T      0      0  1.89K      0
    multipath/PEGKD3KX                      -      -      0      0  1.89K      0
    multipath/PEGKDS1X                      -      -      0      0      0      0
  mirror                                1.10T  2.52T      0      0      0      0
    multipath/PEGM3K8X                      -      -      0      0      0      0
    multipath/PEGM7Z9X                      -      -      0      0      0      0
  mirror                                1.10T  2.52T      0      0      0      0
    multipath/PEGM8YLX                      -      -      0      0      0      0
    multipath/PEGM8ZMX                      -      -      0      0      0      0
  mirror                                1.10T  2.52T      0      0      0      0
    multipath/PEGMAMWX                      -      -      0      0      0      0
    multipath/PEGMB93X                      -      -      0      0      0      0
  mirror                                1.10T  2.52T      2      0  7.55K      0
    multipath/PEGMDREX                      -      -      0      0      0      0
    multipath/PEGMK1GX                      -      -      2      0  7.55K      0
logs                                        -      -      -      -      -      -
  mirror                                 220M   222G      0     62      0   771K
    diskid/DISK-S2L4NXAG800381N%20%20%20%20%20      -      -      0     62      0   771K
    diskid/DISK-S2L4NXAG800356Z%20%20%20%20%20      -      -      0     62      0   771K
--------------------------------------  -----  -----  -----  -----  -----  -----
zroot                                   20.9G  36.6G      0      0      0      0
  mirror                                20.9G  36.6G      0      0      0      0
    ada0p3                                  -      -      0      0      0      0
    ada1p3                                  -      -      0      0      0      0
--------------------------------------  -----  -----  -----  -----  -----  -----


Code:
# zpool status
  pool: tank1
state: ONLINE
  scan: scrub repaired 0 in 12h42m with 0 errors on Thu Aug 11 16:15:30 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank1                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            multipath/PEGKD3KX                          ONLINE       0     0     0
            multipath/PEGKDS1X                          ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            multipath/PEGM3K8X                          ONLINE       0     0     0
            multipath/PEGM7Z9X                          ONLINE       0     0     0
          mirror-2                                      ONLINE       0     0     0
            multipath/PEGM8YLX                          ONLINE       0     0     0
            multipath/PEGM8ZMX                          ONLINE       0     0     0
          mirror-3                                      ONLINE       0     0     0
            multipath/PEGMAMWX                          ONLINE       0     0     0
            multipath/PEGMB93X                          ONLINE       0     0     0
          mirror-4                                      ONLINE       0     0     0
            multipath/PEGMDREX                          ONLINE       0     0     0
            multipath/PEGMK1GX                          ONLINE       0     0     0
        logs
          mirror-5                                      ONLINE       0     0     0
            diskid/DISK-S2L4NXAG800381N%20%20%20%20%20  ONLINE       0     0     0
            diskid/DISK-S2L4NXAG800356Z%20%20%20%20%20  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Sun Aug 14 03:02:48 2016
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0

errors: No known data errors


Code:
extract from /var/log/messages
Sep 12 23:48:18 san3 kernel: WARNING: 10.252.28.203 (iqn.2002-07.au.com.company:xen-host03): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:48:19 san3 kernel: WARNING: 10.252.28.201 (iqn.2002-07.au.com.company:xen-host01): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:48:21 san3 kernel: WARNING: 10.252.38.202 (iqn.2002-07.au.com.company:xen-host02): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:48:28 san3 kernel: WARNING: 10.252.38.202 (iqn.2002-07.au.com.company:xen-host02): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:48:36 san3 kernel: WARNING: 10.252.38.202 (iqn.2002-07.au.com.company:xen-host02): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:48:36 san3 kernel: WARNING: 10.252.38.202 (iqn.2002-07.au.com.company:xen-host02): waiting for CTL to terminate 1 tasks
Sep 12 23:48:36 san3 kernel: WARNING: 10.252.38.202 (iqn.2002-07.au.com.company:xen-host02): tasks terminated
Sep 12 23:48:37 san3 kernel: WARNING: 10.252.38.101 (iqn.2002-07.au.com.company:barhost1): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:10 san3 kernel: WARNING: 10.252.28.201 (iqn.2002-07.au.com.company:xen-host01): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:10 san3 kernel: WARNING: 10.252.28.104 (iqn.2002-07.au.com.company:barhost4): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:10 san3 kernel: WARNING: 10.252.38.202 (iqn.2002-07.au.com.company:xen-host02): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:12 san3 kernel: WARNING: 10.252.28.12 (iqn.2002-07.au.com.company:vmware-host02): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:23 san3 kernel: WARNING: 10.252.38.202 (iqn.2002-07.au.com.company:xen-host02): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:32 san3 kernel: WARNING: 10.252.28.201 (iqn.2002-07.au.com.company:xen-host01): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:32 san3 kernel: WARNING: 10.252.28.11 (iqn.2002-07.au.com.company:vmware-host01): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:45 san3 kernel: WARNING: 10.252.28.101 (iqn.2002-07.au.com.company:barhost1): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:45 san3 kernel: WARNING: 10.252.28.201 (iqn.2002-07.au.com.company:xen-host01): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:46 san3 kernel: WARNING: 10.252.28.102 (iqn.2002-07.au.com.company:barhost2): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:53 san3 kernel: WARNING: 10.252.28.13 (iqn.2002-07.au.com.company:vmware-host03-temp): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:54 san3 kernel: WARNING: 10.252.28.201 (iqn.2002-07.au.com.company:xen-host01): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:57 san3 kernel: WARNING: 10.252.38.202 (iqn.2002-07.au.com.company:xen-host02): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:58 san3 kernel: WARNING: 10.252.28.16 (iqn.2013-07.com.example:d86434b0): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:49:58 san3 kernel: WARNING: 10.252.28.11 (iqn.2002-07.au.com.company:vmware-host01): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:50:06 san3 kernel: WARNING: 10.252.28.201 (iqn.2002-07.au.com.company:xen-host01): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:50:26 san3 kernel: WARNING: 10.252.28.104 (iqn.2002-07.au.com.company:barhost4): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:50:31 san3 kernel: WARNING: 10.252.28.201 (iqn.2002-07.au.com.company:xen-host01): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:50:45 san3 kernel: WARNING: 10.252.38.13 (iqn.2002-07.au.com.company:vmware-host03-temp): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:50:51 san3 kernel: WARNING: 10.252.28.201 (iqn.2002-07.au.com.company:xen-host01): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 12 23:52:33 san3 kernel: pid 71395 (ntpd), uid 0, was killed: out of swap space
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c70e53 on (5:3:1) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625ec on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a74c on (11:3:0) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c70e54 on (5:3:1) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c70e55 on (5:3:1) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c70e56 on (5:3:1) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8cd7d on (10:3:1) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625ed on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625ee on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625ef on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625f0 on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625f1 on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625f2 on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625f3 on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a756 on (11:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a757 on (11:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a758 on (11:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625f4 on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625f5 on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625f6 on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625f7 on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625f8 on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625f9 on (9:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a759 on (11:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a761 on (11:3:2) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a74e on (11:3:0) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a74f on (11:3:0) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a750 on (11:3:0) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a751 on (11:3:0) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a752 on (11:3:0) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a753 on (11:3:0) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a754 on (11:3:0) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625e4 on (9:3:0) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625e6 on (9:3:0) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x2c625e7 on (9:3:0) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a75a on (11:3:0) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a75b on (11:3:0) aborted
Sep 12 23:56:30 san3 kernel: ctl_datamove: tag 0x1d8a75c on (11:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c626f5 on (9:3:2) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c626f7 on (9:3:2) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c626f8 on (9:3:2) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x1d8cdb7 on (10:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c626f9 on (9:3:2) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x1d8cdb8 on (10:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c626fa on (9:3:2) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x1d8cdb9 on (10:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x1d8a7b9 on (11:3:2) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x1d8cdba on (10:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x1d8a7ba on (11:3:2) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c626ef on (9:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x1d8cdbb on (10:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c626f6 on (9:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x1d8cdbe on (10:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c626fd on (9:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c626fb on (9:3:2) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x1d8cdc2 on (10:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x1d8cdc1 on (10:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x1d8cdc3 on (10:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c626fc on (9:3:2) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c62706 on (9:3:2) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c62700 on (9:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c626fe on (9:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c626ff on (9:3:0) aborted
Sep 12 23:56:46 san3 kernel: ctl_datamove: tag 0x2c62701 on (9:3:0) aborted
Sep 12 23:57:59 san3 kernel: pid 73756 (sshd), uid 0, was killed: out of swap space
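
To see whether the drops cluster on particular initiators or are spread across all of them, the warnings above can be tallied per initiator name. A rough sketch, assuming the message format shown in this extract; drops_per_initiator is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and count the
# "dropping connection" warnings per iSCSI initiator name.
drops_per_initiator() {
    awk '/dropping connection/ {
        for (i = 1; i <= NF; i++) {
            # The initiator is the field that looks like "(iqn.<...>):"
            if ($i ~ /^\(iqn\./) {
                name = $i
                sub(/^\(/, "", name)   # strip the leading parenthesis
                sub(/\):$/, "", name)  # strip the trailing "):"
                count[name]++
                break
            }
        }
    }
    END {
        for (name in count)
            printf "%d %s\n", count[name], name
    }' | sort -k1,1nr -k2,2
}

# On the host (not executed here):
#   drops_per_initiator < /var/log/messages
```

The output is one line per initiator, highest count first.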


Code:
# ifconfig -m
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        capabilities=507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
        ether 00:25:90:3b:3f:c2
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        supported media:
                media autoselect
                media 1000baseT
                media 1000baseT mediaopt full-duplex
                media 100baseTX mediaopt full-duplex
                media 100baseTX
                media 10baseT/UTP mediaopt full-duplex
                media 10baseT/UTP
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9014
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        capabilities=507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
        ether 00:25:90:3b:3f:c3
        inet 10.252.38.3 netmask 0xffffff00 broadcast 10.252.38.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        supported media:
                media autoselect
                media 1000baseT
                media 1000baseT mediaopt full-duplex
                media 100baseTX mediaopt full-duplex
                media 100baseTX
                media 10baseT/UTP mediaopt full-duplex
                media 10baseT/UTP
igb2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9014
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        capabilities=507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
        ether 00:1b:21:bb:2b:24
        inet 10.252.28.3 netmask 0xffffff00 broadcast 10.252.28.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        supported media:
                media autoselect
                media 1000baseT
                media 1000baseT mediaopt full-duplex
                media 100baseTX mediaopt full-duplex
                media 100baseTX
                media 10baseT/UTP mediaopt full-duplex
                media 10baseT/UTP
igb3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        capabilities=507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
        ether 00:25:90:3b:3f:c2
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        supported media:
                media autoselect
                media 1000baseT
                media 1000baseT mediaopt full-duplex
                media 100baseTX mediaopt full-duplex
                media 100baseTX
                media 10baseT/UTP mediaopt full-duplex
                media 10baseT/UTP
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        capabilities=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        capabilities=507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
        ether 00:25:90:3b:3f:c2
        inet 10.252.18.3 netmask 0xffffff00 broadcast 10.252.18.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        supported media:
                media autoselect
        laggproto lacp lagghash l2,l3,l4
        laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>


Code:
extract from /etc/ctl.conf
portal-group pg0 {
        discovery-auth-group no-authentication

        discovery-filter "portal-name"
        listen 10.252.28.3:3260
        listen 10.252.38.3:3260
        }
auth-group vmware {
        auth-type none
        initiator-name "iqn.2002-07.au.com.company:vmware-host01"
        initiator-name "iqn.2002-07.au.com.company:vmware-host02"
        initiator-name "iqn.2002-07.au.com.company:vmware-host03"
        initiator-name "iqn.2002-07.au.com.company:vmware-host03-temp"
        }
target iqn.2002-07.au.com.company:san3.vmware {
        alias "VMWare"
        portal-group pg0

        auth-group vmware

        lun 0 {
                serial "3000000111"
                device-id "san3__tank1__vol_vmware01"
                path /dev/zvol/tank1/vol_vmware01
                blocksize 512
                size 4T
                option unmap on
                option vendor "FreeBSD"
                option product "iSCSI Disk"
                option revision "0123"
                option naa 0x3cdc137065d89be7
                }
        lun 1 {
                serial "3000000112"
                device-id "san3__tank1__vol_vmware02"
                path /dev/zvol/tank1/vol_vmware02
                blocksize 512
                size 2T
                option unmap on
                option vendor "FreeBSD"
                option product "iSCSI Disk"
                option revision "0123"
                option naa 0x75e791391626356b
                }
        lun 2 {
                serial "3000000113"
                device-id "san3__tank1__vol_vmware03"
                path /dev/zvol/tank1/vol_vmware03
                blocksize 512
                size 2T
                option unmap on
                option vendor "FreeBSD"
                option product "iSCSI Disk"
                option revision "0123"
                option naa 0x156fa47af070775a
                }
        }
 

dlavigne

Guest
Were you able to figure this out? If not, which build version of FreeNAS are you running (see System -> Information)?
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
What version of FreeNAS are you using?

How full is your iSCSI zvol? iSCSI performance craters if your utilization percentage gets too high (>50-60%, depending).
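
One way to answer the utilization question per zvol is to compare the space each volume references with its volsize. This is only a sketch under assumptions: zvol_fill is a hypothetical helper, it expects tab-separated name, referenced and volsize values in bytes, and using referenced rather than used (which can include a reservation) is a choice made here, not something stated in the thread.

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>referenced<TAB>volsize" (bytes) on
# stdin and print each zvol with its fill percentage.
zvol_fill() {
    awk -F '\t' '$3 > 0 { printf "%s %.1f%%\n", $1, 100 * $2 / $3 }'
}

# On the host, assuming this zfs version supports -H and -p (not executed here):
#   zfs list -Hp -t volume -o name,referenced,volsize | zvol_fill
```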

How long has this machine been in service? I ask because using jumbo packets can be a problem, as noted in "The Jumbo Notes" thread. If you've been using jumbo packets for some time, then they're less likely to be the source of the problem. But if this is a new build, you might try dropping back to a standard MTU.
 

yanky83

Thanks for the interest.

I forgot to mention that it's not actually FreeNAS; it's plain FreeBSD 10.3. Sorry.

No zvol is over 60% full, and the pool is at 32%.

I had seen the jumbo thread. All devices along the path handle jumbo frames (checked with ping), but I wouldn't be surprised if they caused some unexpected buffer or memory issue somewhere in the system.
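
For anyone repeating the ping check: the usual approach is to send an ICMP echo with the Don't Fragment bit set and a payload sized to fill the MTU exactly. A minimal sketch, assuming IPv4 without IP options (20-byte IP header plus 8-byte ICMP header); icmp_payload_for_mtu is a hypothetical helper name and the target address is only taken from the log extract as an example:

```shell
#!/bin/sh
# Hypothetical helper: largest ICMP echo payload that fits in one
# unfragmented IPv4 packet for a given MTU (20 bytes IP + 8 bytes ICMP).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# On the FreeBSD host (not executed here); -D sets the Don't Fragment bit:
#   ping -D -s "$(icmp_payload_for_mtu 9000)" 10.252.28.201
```

If the echo gets through at that size, every hop on the path accepted the full-size frame without fragmenting it.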

The unpleasant thing is that it happens only every now and then, apparently after a period of sustained high load. Most of the time it's relatively fine.

Thanks a lot!
 

yanky83

I couldn't confirm it as the cause, but it smells like it might be related. FreeBSD 11 is around the corner; let's hope it gets fixed.

Thanks for the hint.
 