A newly installed FreeNAS server froze and could barely be restarted.
It is used as an NFS server for 2 VMware hosts. NFS is the only sharing
protocol, with a small number of filesystems. Disk total is 29TB.
The sequence as we experienced it started Monday morning.
A migration of a number of VMware hosts was started (from another
fileserver), which caused some load.
During this transfer the decision was taken to remove 2 large files
that were created "to have something to scrub on". In total 15TB.
These files were removed with 'rm BIG BIG2' from a shell.
The 'rm' did not immediately return to a shell prompt, which did
not concern me as I could continue with other tasks.
Some minutes later the VMware hosts began to fail.
The shell worked; the GUI worked but was very slow.
Trying to mount a filesystem from another machine failed:
(fs01, 10.220.55.18, is the failing server)
[root@fs02] ~# mount 10.220.55.18:/mnt/vol0/vmware-vmdk02 /tmp/fs01
[tcp] 10.220.55.18:/mnt/vol0/vmware-vmdk02: NFSPROC_NULL: RPC: Unable to receive; errno = Connection reset by peer
^C
[root@fs02] ~# mount 10.220.55.18:/mnt/vol0/vmware-vmdk02 /tmp/fs01
[tcp] 10.220.55.18:/mnt/vol0/vmware-vmdk02: RPCPROG_NFS: RPC: Program not registered
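The second error ("Program not registered") suggests the NFS service was no longer registered with rpcbind on fs01. A minimal sketch of how that could be checked from the client, assuming `rpcinfo` is available there; the helper name `nfs_registered` is made up for illustration, and 100003 is the standard RPC program number for NFS:

```shell
#!/bin/sh
# nfs_registered (hypothetical helper): reads `rpcinfo -p` output on stdin
# and succeeds only if the NFS program (RPC program number 100003) is listed.
nfs_registered() {
    awk '$1 == 100003 { found = 1 } END { exit !found }'
}

# Usage against the server from this report (needs network access):
#   rpcinfo -p 10.220.55.18 | nfs_registered \
#       && echo "nfs registered" || echo "nfs NOT registered"
```

If rpcbind itself does not answer, `rpcinfo` prints nothing useful and the helper reports "not registered", so this only separates "NFS is listed" from everything else.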
From the still-working GUI we tried to stop/start the NFS services, but it
never completed. A reboot from the console was attempted.
This got as far as syncing disks but never rebooted. A hard reset was done
after several minutes.
After boot it hung on 'Mounting local file systems:'
^T showed
load 1.1 cmd zpool: 884 [zio->io_cvl]
for a while, then
load 3.1 cmd zpool: 884 [tx->tx_sync_done_cv]
which after a long wait stopped accumulating CPU time. At that point
we pressed control-alt-del and the box rebooted.
The next boot also spent a long time in [zio->io_cvl]
but later it became [spa->spa_scrub_in_cv] and finally the system came alive.
The large files were gone, but everything else seemed to be correct.
What happened? And will it happen again?
Particulars: this is a FreeNAS 9.3
System Information
Hostname fs01.vtd.volvo.se
Build FreeNAS-9.3-STABLE-201503071634
Platform Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
Memory 65476MB
Hardware is a Supermicro 4U ZFS server, Xeon E5-2600
CPU: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz (2100.04-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 24 CPUs
FreeBSD/SMP: 2 package(s) x 6 core(s) x 2 SMT threads
igb0: <Intel(R) PRO/1000 Network Connection version - 2.4.0>
igb1: <Intel(R) PRO/1000 Network Connection version - 2.4.0>
igb2: <Intel(R) PRO/1000 Network Connection version - 2.4.0>
igb3: <Intel(R) PRO/1000 Network Connection version - 2.4.0>
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
ahci0: <Intel Patsburg AHCI SATA controller>
mps2: <LSI SAS2308>
mps2: Firmware: 20.00.00.00, Driver: 16.00.00.00-fbsd
mps2: IOCCapabilities: 5285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
da0 at mps0 bus 0 scbus0 target 0 lun 0
da0: <SEAGATE ST3000NM0023 0004> Fixed Direct Access SCSI-6 device
da0: Serial Number Z1Z4QYS40000C451AW6G
da0: 600.000MB/s transfers
da0: Command Queueing enabled
da0: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)
(repeated for da1 through da13)
booted from :
da15 at umass-sim0 bus 0 scbus11 target 0 lun 0
da15: <Kingston DataTraveler 3.0 PMAP> Removable Direct Access SCSI-6 device
da15: Serial Number 08606E6D414CBEA1470C93FD
da15: 40.000MB/s transfers
da15: 30008MB (61457664 512 byte sectors: 255H 63S/T 3825C)
da15: quirks=0x2<NO_6_BYTE>
which is a mirror with :
da16 at umass-sim1 bus 1 scbus12 target 0 lun 0
da16: <Kingston DataTraveler 3.0 PMAP> Removable Direct Access SCSI-6 device
da16: Serial Number 08606E6D418ABEA14717F4DE
da16: 40.000MB/s transfers
da16: 30008MB (61457664 512 byte sectors: 255H 63S/T 3825C)
da16: quirks=0x2<NO_6_BYTE>
log & ZIL on :
ada0 at ahcich0 bus 0 scbus3 target 0 lun 0
ada0: <INTEL SSDSC2BA100G3 5DV10270> ATA-9 SATA 3.x device
ada0: Serial Number BTTV434403FP100FGN
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 95396MB (195371568 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus4 target 0 lun 0
ada1: <SAMSUNG MZ7WD960HMHP-00003 DXV8C03Q> ATA-9 SATA 3.x device
ada1: Serial Number S1E4NYAFA01902
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 915715MB (1875385008 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6