Intermittently Unresponsive

tripodal (Dabbler, joined Oct 8, 2020, 19 messages)
I've deployed a system that uses SMB to store and replicate ~30 TB of backup data from one site to another. The system gets into a state where, after large datasets or snapshots are removed, it grows increasingly unresponsive. The amount of time it spends freeing space is unreasonable; it's taking nearly an hour to free <100 GB. While it's in the midst of this, SMB frequently goes unresponsive, and the system has hard-locked twice now. I'm relatively new at diagnosing this, but this process previously worked on an older system with far less horsepower and shucked USB drives, so I wonder if it's a bug in 11.3, and am considering downgrading to 11.2.

root@ken-zyw-nas02[~]# zpool list -o freeing
FREEING
0
877G
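For anyone following along, adding the name column shows which pool that 877G of freeing belongs to (name and freeing are both standard zpool list columns):

zpool list -o name,freeing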


Version: FreeNAS-11.3-U5

Dell R720xd, 192 GB RAM
14x 1 TB Dell SSD
7x 2 TB Samsung SSD
PERC in HBA mode.

There is no L2ARC, nor a SLOG.
Boot is 2x Dell SAS drives.
 

tripodal
Of course, shortly after posting this, the freeing process sped up dramatically.
FREEING
0
591G
 

tripodal
That task has finished; however, SMB keeps going offline, and gstat shows the following. I'm not sure what the pool is doing.

dT: 1.065s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
64 61 0 0 0.0 0 0 0.0 131.0| da0
64 60 0 0 0.0 0 0 0.0 91.7| da1
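Those ops/s with zero r/s and w/s make me suspect the outstanding requests are deletes (BIO_DELETE/TRIM), which gstat doesn't break out by default. The -d flag adds delete columns, and -p restricts the view to physical disks:

gstat -dp

If the d/s column accounts for the missing ops, the pool is busy pushing TRIM for the freed blocks down to the SSDs.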


[2020/10/13 10:57:41.852983, 0] ../../source3/smbd/server.c:1788(main)
smbd version 4.10.18 started.

[2020/10/13 11:08:34.816887, 0] ../../source3/smbd/server.c:1788(main)
smbd version 4.10.18 started.
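Each "smbd version ... started" line is a fresh start of the daemon, so counting those lines in the Samba log (the path below is where FreeNAS keeps it on my box) shows how often SMB has bounced:

grep -c 'smbd version' /var/log/samba4/log.smbd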
 

tripodal
While I haven't found a solution, here's an update: shortly after deleting a large snapshot, the pool becomes unresponsive to further I/O.

If I run this command, it never returns

dd if=/dev/zero of=/mnt/backups/test.txt bs=32k count=1
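To keep a stuck test from tying up a shell every time, the write can be wrapped in timeout(1) from the FreeBSD base system (SIGTERM after 30 seconds, SIGKILL 5 seconds later), though a process stuck in uninterruptible disk wait may survive even that:

timeout -k 5 30 dd if=/dev/zero of=/mnt/backups/test.txt bs=32k count=1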

Gstat shows this:

dT: 1.032s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
64 63 0 0 0.0 0 0 0.0 164.7| da0
64 62 0 0 0.0 0 0 0.0 83.7| da1
64 63 0 0 0.0 0 0 0.0 167.3| da2
64 63 0 0 0.0 0 0 0.0 167.1| da3
64 123 0 0 0.0 0 0 0.0 152.3| da4
64 63 0 0 0.0 0 0 0.0 169.8| da5
64 123 0 0 0.0 0 0 0.0 171.0| da6
64 62 0 0 0.0 0 0 0.0 128.9| da7
64 62 0 0 0.0 0 0 0.0 82.6| da8
64 62 0 0 0.0 0 0 0.0 127.3| da9
64 62 0 0 0.0 0 0 0.0 126.2| da10
64 62 0 0 0.0 0 0 0.0 85.4| da11
64 123 0 0 0.0 0 0 0.0 169.9| da12
64 62 0 0 0.0 0 0 0.0 128.8| da13
64 62 0 0 0.0 0 0 0.0 84.9| da14
64 123 0 0 0.0 0 0 0.0 170.9| da15
64 123 0 0 0.0 0 0 0.0 172.0| da16
64 62 0 0 0.0 0 0 0.0 86.4| da17
64 62 0 0 0.0 0 0 0.0 127.2| da18
64 62 0 0 0.0 0 0 0.0 83.7| da19
64 62 0 0 0.0 0 0 0.0 86.4| da20

zpool list -o freeing shows approximately 5 TB still to go; however, both the web UI and the SSH interface stop responding.
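Since sessions keep dying, a background loop started with nohup can keep logging the freeing counter after SSH drops (adjust the pool name to whatever zpool list reports; mine is mounted at /mnt/backups):

nohup sh -c 'while :; do date; zpool list -Ho freeing backups; sleep 300; done' >> /root/freeing.log &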
 

tripodal
I managed to work around this issue by using dataset encryption instead of pool encryption. Performance is similar, and snapshot cleanup happens in moments rather than hours.
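For reference, an encrypted dataset is created along these lines; this assumes a ZFS version with native dataset encryption, and "backups/secure" plus the passphrase key format are just illustrative choices:

zfs create -o encryption=aes-256-gcm -o keyformat=passphrase backups/secure

Unlike GELI-style pool encryption, the encryption sits at the dataset layer, so freeing blocks presumably no longer funnels every delete through the disk-level encryption layer, which would explain why cleanup went from hours to moments.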
 