zfs userspace takes too long to be executed

Joined
Mar 12, 2024
Messages
4
Hi all, I'm Carlos, and I'm having issues with one of my TrueNAS systems, where executing a simple 'zfs userspace' command takes more than 1 minute.

Here is the information about my server:

Code:
# uname -a
Linux bblhome 6.1.63-production+truenas #2 SMP PREEMPT_DYNAMIC Mon Dec 18 19:34:42 UTC 2023 x86_64 GNU/Linux

Code:
# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm

Code:
# zpool status
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:25 with 0 errors on Mon Mar 11 03:45:28 2024
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb3    ONLINE       0     0     0
            sda3    ONLINE       0     0     0

errors: No known data errors

  pool: home
 state: ONLINE
  scan: scrub repaired 0B in 00:06:40 with 0 errors on Sun Feb 18 00:06:42 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        home                                      ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            ce4b235c-1f08-49c2-a981-b81819e11da0  ONLINE       0     0     0
            ec3d84f1-f008-4b9a-99a6-c905a82008fe  ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            e0a20c4e-41cb-4829-9f8d-4a876fa28ac2  ONLINE       0     0     0
            1a4be869-d567-43ab-88fe-e676e1bdef5d  ONLINE       0     0     0
          mirror-2                                ONLINE       0     0     0
            b4cd97d3-914b-4668-9bce-9cfd60dc379a  ONLINE       0     0     0
            69c66f0c-f9a9-4caa-936e-8002a1e5cb19  ONLINE       0     0     0

errors: No known data errors

And this is the result for 'time' on 'zfs userspace pool'

... zfs userspace home 0.03s user 0.05s system 0% cpu 2:44.40 total

I have the LDAP directory service enabled on this system, and the pool is shared over NFS.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
How large is your pool, what devices are you using, what are the specifications of the rest of the system, ...?
 
Joined
Mar 12, 2024
Messages
4
Hi Eric, thanks for your response:

The server is a Dell Inc. PowerEdge R630 with an Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 128GB of DDR4 RAM, and 10Gb Ethernet.

The pools:
Code:
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
boot-pool   556G  2.43G   554G        -         -     0%     0%  1.00x    ONLINE  -
home       10.9T   264G  10.6T        -         -     3%     2%  1.00x    ONLINE  /mnt


The disk setup:
Code:
# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda           8:0    0 558.8G  0 disk
├─sda1        8:1    0     1M  0 part
├─sda2        8:2    0   512M  0 part
└─sda3        8:3    0 558.3G  0 part
sdb           8:16   0 558.8G  0 disk
├─sdb1        8:17   0     1M  0 part
├─sdb2        8:18   0   512M  0 part
└─sdb3        8:19   0 558.3G  0 part
sdc           8:32   0   3.6T  0 disk
├─sdc1        8:33   0     2G  0 part
│ └─md127     9:127  0     2G  0 raid1
│   └─md127 253:2    0     2G  0 crypt [SWAP]
└─sdc2        8:34   0   3.6T  0 part
sdd           8:48   0   3.6T  0 disk
├─sdd1        8:49   0     2G  0 part
│ └─md126     9:126  0     2G  0 raid1
│   └─md126 253:1    0     2G  0 crypt [SWAP]
└─sdd2        8:50   0   3.6T  0 part
sde           8:64   0   3.6T  0 disk
├─sde1        8:65   0     2G  0 part
│ └─md126     9:126  0     2G  0 raid1
│   └─md126 253:1    0     2G  0 crypt [SWAP]
└─sde2        8:66   0   3.6T  0 part
sdf           8:80   0   3.6T  0 disk
├─sdf1        8:81   0     2G  0 part
│ └─md127     9:127  0     2G  0 raid1
│   └─md127 253:2    0     2G  0 crypt [SWAP]
└─sdf2        8:82   0   3.6T  0 part
sdg           8:96   0   3.6T  0 disk
├─sdg1        8:97   0     2G  0 part
│ └─md125     9:125  0     2G  0 raid1
│   └─md125 253:0    0     2G  0 crypt [SWAP]
└─sdg2        8:98   0   3.6T  0 part
sdh           8:112  0   3.6T  0 disk
├─sdh1        8:113  0     2G  0 part
│ └─md125     9:125  0     2G  0 raid1
│   └─md125 253:0    0     2G  0 crypt [SWAP]
└─sdh2        8:114  0   3.6T  0 part


For the OS we are using two 500GB HDDs (HUC10906_CLAR600); the rest of the disks are Samsung_SSD_870_EVO_4TB.

Anything else missing?
 
Joined
Mar 12, 2024
Messages
4
Code:
...
00:11.4 SATA controller: Intel Corporation C610/X99 series chipset sSATA Controller [AHCI mode] (rev 05)
...
00:1f.2 SATA controller: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] (rev 05)
...
02:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Samsung_SSD_870_EVO_4TB
Watch out for very early failures on those things. Supposedly fixed in recent units, but it's one of those things where it's hard to trust them. Look for excessive NAND writes. And be sure to scrub often.
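
A rough way to keep an eye on writes, sketched in Python. It assumes the drives report SMART attribute 241 (`Total_LBAs_Written`) in 512-byte units, which counts host writes and is only a proxy for actual NAND writes. The function names and the sample rows are made up for illustration; feed it the real text from `smartctl -A` for each disk (behind a MegaRAID controller, smartctl may need a `-d megaraid,N` device type).

```python
SECTOR_BYTES = 512  # assumption: attribute 241 counts 512-byte LBAs


def parse_smart_attributes(text):
    """Map attribute name -> raw value from `smartctl -A` table output."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            attrs[parts[1]] = int(parts[9])
    return attrs


def terabytes_written(attrs):
    """Return host writes in TB (10**12 bytes), or None if not reported."""
    lbas = attrs.get("Total_LBAs_Written")
    if lbas is None:
        return None
    return lbas * SECTOR_BYTES / 1e12


# Made-up sample rows, not taken from the system in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       5
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       1000000000
"""
```

Comparing the result between disks of the same age, and against the drive's rated endurance, gives a first hint of whether any one of them is being written to unusually hard.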

02:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
Are you running an H730P Mini? Sure looks like you are, and it's the only red flag I see. The MegaRAID firmware is terrible at doing basic HBA things. Definitely replace it with an HBA330 Mini - and hopefully performance will improve.

This seems a bit worse than I'd have expected, but userspace is something of an expensive operation, so maybe it's just down to it being an extreme case combined with a crap SAS controller.
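
One way to narrow this down: the `time` output earlier in the thread shows almost no CPU use, so the command is mostly waiting on something. A minimal Python sketch, assuming the wait might partly be user and group name lookups against LDAP rather than disk I/O (an untested guess, not something established in this thread), is to time the command with and without `-n`, which prints numeric IDs instead of names. The helper names `time_command`, `userspace_commands`, and `compare` are made up for illustration.

```python
import subprocess
import time


def time_command(argv):
    """Run argv, discard its output, and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return time.perf_counter() - start


def userspace_commands(dataset):
    """Build the two command lines to compare for one dataset."""
    return {
        "names": ["zfs", "userspace", dataset],          # default: resolve names
        "numeric": ["zfs", "userspace", "-n", dataset],  # numeric IDs only
    }


def compare(dataset):
    """Time both variants and return {label: seconds}."""
    return {
        label: time_command(argv)
        for label, argv in userspace_commands(dataset).items()
    }
```

Calling `compare("home")` as root on the affected system returns the two timings. If the numeric run is much faster, name resolution is worth a look; if both take minutes, the time is more likely going to the pool and the controller.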
 