Intermittently Unresponsive

tripodal (Dabbler, joined Oct 8, 2020, 19 messages)
I've deployed a system that uses SMB to store and replicate ~30 TB of backup data from one site to another. The system gets into a state where, after large datasets or snapshots are removed, it grows increasingly unresponsive. The amount of time it spends freeing space is unreasonable; it's taking nearly an hour to free <100 GB. While it's in the midst of this, SMB frequently goes unresponsive, and the system has hard-locked twice now. I'm relatively new at diagnosing this, but this process previously worked on an older system with far less horsepower and shucked USB drives, so I wonder if it's a bug in 11.3, and am considering downgrading to 11.2.

root@ken-zyw-nas02[~]# zpool list -o freeing
FREEING
0
877G
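For anyone following along, adding the name column shows which pool that 877G of freeing belongs to (name and freeing are both standard zpool list columns):

zpool list -o name,freeing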


Version: FreeNAS-11.3-U5

Dell R720xd, 192 GB RAM
14x 1 TB Dell SSD
7x 2 TB Samsung SSD
PERC in HBA mode.

There is no L2ARC, nor a SLOG.
Boot is 2x Dell SAS drives.
 

tripodal
Of course, shortly after posting this, the freeing process sped up dramatically.
FREEING
0
591G
 

tripodal
That task has finished; however, SMB keeps going offline, and gstat shows the following. I'm not sure what the pool is doing.

dT: 1.065s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
64 61 0 0 0.0 0 0 0.0 131.0| da0
64 60 0 0 0.0 0 0 0.0 91.7| da1
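Those ops/s with zero r/s and w/s make me suspect the outstanding requests are deletes (BIO_DELETE/TRIM), which gstat doesn't break out by default. The -d flag adds delete columns, and -p restricts the view to physical disks:

gstat -dp

If the d/s column accounts for the missing ops, the pool is busy pushing TRIM for the freed blocks down to the SSDs.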


[2020/10/13 10:57:41.852983, 0] ../../source3/smbd/server.c:1788(main)
smbd version 4.10.18 started.

[2020/10/13 11:08:34.816887, 0] ../../source3/smbd/server.c:1788(main)
smbd version 4.10.18 started.
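Each "smbd version ... started" line is a fresh start of the daemon, so counting those lines in the Samba log (the path below is where FreeNAS keeps it on my box) shows how often SMB has bounced:

grep -c 'smbd version' /var/log/samba4/log.smbd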
 

tripodal
While I haven't found a solution, here's an update: shortly after deleting a large snapshot, the pool becomes unresponsive to further I/O.

If I run this command, it never returns

dd if=/dev/zero of=/mnt/backups/test.txt bs=32k count=1
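To keep a stuck test from tying up a shell every time, the write can be wrapped in timeout(1) from the FreeBSD base system (SIGTERM after 30 seconds, SIGKILL 5 seconds later), though a process stuck in uninterruptible disk wait may survive even that:

timeout -k 5 30 dd if=/dev/zero of=/mnt/backups/test.txt bs=32k count=1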

Gstat shows this:

dT: 1.032s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
64 63 0 0 0.0 0 0 0.0 164.7| da0
64 62 0 0 0.0 0 0 0.0 83.7| da1
64 63 0 0 0.0 0 0 0.0 167.3| da2
64 63 0 0 0.0 0 0 0.0 167.1| da3
64 123 0 0 0.0 0 0 0.0 152.3| da4
64 63 0 0 0.0 0 0 0.0 169.8| da5
64 123 0 0 0.0 0 0 0.0 171.0| da6
64 62 0 0 0.0 0 0 0.0 128.9| da7
64 62 0 0 0.0 0 0 0.0 82.6| da8
64 62 0 0 0.0 0 0 0.0 127.3| da9
64 62 0 0 0.0 0 0 0.0 126.2| da10
64 62 0 0 0.0 0 0 0.0 85.4| da11
64 123 0 0 0.0 0 0 0.0 169.9| da12
64 62 0 0 0.0 0 0 0.0 128.8| da13
64 62 0 0 0.0 0 0 0.0 84.9| da14
64 123 0 0 0.0 0 0 0.0 170.9| da15
64 123 0 0 0.0 0 0 0.0 172.0| da16
64 62 0 0 0.0 0 0 0.0 86.4| da17
64 62 0 0 0.0 0 0 0.0 127.2| da18
64 62 0 0 0.0 0 0 0.0 83.7| da19
64 62 0 0 0.0 0 0 0.0 86.4| da20

zpool list -o freeing shows approximately 5 TB still to go; however, both the web UI and the SSH interface stop responding.
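Since sessions keep dying, a background loop started with nohup can keep logging the freeing counter after SSH drops (adjust the pool name to whatever zpool list reports; mine is mounted at /mnt/backups):

nohup sh -c 'while :; do date; zpool list -Ho freeing backups; sleep 300; done' >> /root/freeing.log &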
 

tripodal
I managed to work around this issue by using dataset encryption instead of pool encryption. Performance is similar, and snapshot cleanup happens in moments rather than hours.
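For reference, an encrypted dataset is created along these lines; this assumes a ZFS version with native dataset encryption, and "backups/secure" plus the passphrase key format are just illustrative choices:

zfs create -o encryption=aes-256-gcm -o keyformat=passphrase backups/secure

Unlike GELI-style pool encryption, the encryption sits at the dataset layer, so freeing blocks presumably no longer funnels every delete through the disk-level encryption layer, which would explain why cleanup went from hours to moments.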
 