I wonder why mdadm is used for SWAP instead of ZFS

JOduMonT

Dabbler
Joined
Jan 27, 2015
Messages
29
Out of curiosity, I was wondering why the boot of TrueNAS SCALE 21.08-BETA.1 was slow; then I discovered that it rebuilds a soft-RAID via mdadm at every boot.
From what I understand, it is the SWAP.

So, why is TrueNAS using mdadm,
since ZFS can do everything mdadm can do?

And why is this soft-RAID not defined in /etc/mdadm/mdadm.conf?
# mdadm.conf
#
# !NB! Run update-initramfs -u after updating this file.
# !NB! This will ensure that initramfs has an uptodate copy.
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# This configuration was auto-generated on Mon, 30 Aug 2021 23:23:38 +0000 by mkconf
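
As a side note, here is roughly how I looked at what got assembled, assuming the standard Debian mdadm tooling that SCALE ships (device names are just examples from my system):

# which arrays are currently running, and from which members
cat /proc/mdstat

# what mdadm would auto-detect from the superblocks on the partitions
mdadm --examine --scan

# details of one member's RAID superblock
mdadm --examine /dev/sda1

I guess the built-in "DEVICE partitions" default means arrays are assembled from whatever superblocks are found at boot, with no entry needed in mdadm.conf, but I'm not sure.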

When I installed my system with 1 NVMe + 2 SSDs of 256 GB, the soft-RAID was pointing at the 2 SSDs.
Then, later, I plugged in 2 USB drives of 1 TB for the purpose of testing importing and exporting an encrypted zpool and data from TrueNAS SCALE to my Ubuntu laptop.

At every boot (on TrueNAS SCALE) it seems to rebuild, or at least look for, the soft-RAID randomly,
and now my mdadm array is a mix of 1 SSD and 1 USB drive (my USB drive was plugged in during the last boot).
NOTE: both zpools, the SSD one and the USB one, were created via the TrueNAS web UI.
Model: ATA Samsung SSD 860 (scsi)
Disk /dev/sda: 256GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
1 65.5kB 2147MB 2147MB swap
2 2148MB 256GB 254GB zfs


Model: ATA Samsung SSD 860 (scsi)
Disk /dev/sdb: 256GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
1 65.5kB 2147MB 2147MB swap
2 2148MB 256GB 254GB zfs


Model: WD My Passport 0748 (scsi)
Disk /dev/sdc: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
1 65.5kB 2147MB 2147MB swap
2 2148MB 1000GB 998GB zfs


Model: Linux device-mapper (crypt) (dm)
Disk /dev/mapper/md127: 2147MB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number Start End Size File system Flags
1 0.00B 2147MB 2147MB linux-swap(v1)


Model: WDC WDS256G1X0C-00ENX0 (nvme)
Disk /dev/nvme0n1: 256GB
Sector size (logical/physical): 512B/512B

Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
1 20.5kB 1069kB 1049kB bios_grub
2 1069kB 538MB 537MB fat32 boot, esp
4 538MB 17.7GB 17.2GB swap
3 17.7GB 256GB 238GB zfs


Model: Linux Software RAID Array (md)
Disk /dev/md127: 2147MB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

Model: Unknown (unknown)
Disk /dev/zd0: 42.9GB
Sector size (logical/physical): 512B/16384B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
1 1049kB 106MB 105MB fat32 EFI system partition boot, esp
2 106MB 123MB 16.8MB Microsoft reserved partition msftres
3 123MB 42.4GB 42.3GB ntfs Basic data partition msftdata
4 42.4GB 42.9GB 533MB ntfs hidden, diag
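
To confirm which members the array is currently built from, this is what I run (a quick sketch; md127 is the array name shown above):

# current members of the swap array
mdadm --detail /dev/md127

# block-device tree, showing the dm-crypt swap sitting on top of md127
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT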
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,176
Easy answer: Swapping to ZFS is not reliable and introduces headache-inducing dependencies that tend to slow things down to a crawl.
Most nice ZFS features would go unused anyway, so there's little point in bothering.
 

JOduMonT

Dabbler
Joined
Jan 27, 2015
Messages
29
@Ericloewe thanks for your nice answer, which explains my 1st question very well. From that, I understand the swap is assembled dynamically at every boot? If so, how could I force it to only use my SSDs?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,176
Destroying the swap partition on devices you do not want to have backing your swap should do the trick.
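
Something along these lines should work, assuming /dev/sdc1 is the USB swap partition you want out of the array and nothing else is using it (the middleware may still recreate swap partitions on new pool members later):

# stop using the encrypted swap that sits on top of the array
swapoff /dev/mapper/md127

# drop the USB member from the running array
mdadm /dev/md127 --fail /dev/sdc1 --remove /dev/sdc1

# wipe its RAID superblock so it is not picked up again at the next boot
mdadm --zero-superblock /dev/sdc1

# optionally, delete the 2 GiB swap partition from the USB disk entirely
sgdisk -d 1 /dev/sdc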
 

JOduMonT

Dabbler
Joined
Jan 27, 2015
Messages
29
I don't know if this is still the reality and whether it applies in the context of Kubernetes on TrueNAS,
but apparently swap should be disabled for Kubernetes, especially when it becomes a cluster, because it has a hard time managing pods with SWAP enabled.

What do you think of this?
Is iXsystems aware of that?
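
For what it's worth, this is how I check it on a node (as far as I know, upstream kubelet refuses to start with swap enabled unless told otherwise):

# is any swap active on this node?
swapon --show

# upstream kubelet historically needs this flag to tolerate swap:
#   kubelet --fail-swap-on=false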
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,176
I don't know, but it seems that "alpha support" has been added. Of course, I don't know what that entails.

Is some sort of limit on physical RAM available to containers a thing? If so, that would be a practical solution and more useful than getting rid of swap. Note that swap is there mostly for weird edge cases, not for any serious use.
 

JOduMonT

Dabbler
Joined
Jan 27, 2015
Messages
29
Yes, I'm with you on not counting on the swap;
just a last point before the end :)

I reinstalled TrueNAS SCALE in a VM just to compare, and during the installation it asks if you want to create a swap on the boot device (which I did), so I really don't get why I end up with a RAID swap spread across my zpool disks as well.
 

JOduMonT

Dabbler
Joined
Jan 27, 2015
Messages
29
So I explored a little more, and even on an installation of TrueNAS SCALE without SWAP,
when I create a ZFS pool the drives are partitioned like this:
- part1: 2 GiB with code 8200, which is Linux swap (at least on Linux)
- part2: the rest, with code BF01, which is Solaris /usr & Mac ZFS

and then, the way /etc/default/mdadm is declared, a RAID gets made out of those swap partitions.
That is odd, no?
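
To check the type codes on a pool member, something like this works (sgdisk comes from the gdisk package; /dev/sda is just one of my pool disks):

# full GPT listing, with the 2 GiB swap partition and the ZFS partition
sgdisk -p /dev/sda

# per-partition detail, including the partition type GUID
sgdisk -i 1 /dev/sda   # Linux swap
sgdisk -i 2 /dev/sda   # Solaris /usr & Apple ZFS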
 

diggles

Cadet
Joined
Dec 17, 2021
Messages
1
since ZFS can do everything mdadm can do
This statement is incorrect. As I have just found out the hard way, it is difficult to add more disks to a RAIDZ (RAID 5-style) ZFS pool. With mdadm, you can add a single disk to a RAID5 to increase the capacity. In ZFS (as far as I am aware) you need to add multiple disks to increase the pool size by creating another vdev.

Not helpful to your problem, I'm aware, but it may help with understanding the different styles of disk management.
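
Roughly, the difference looks like this (device and pool names are made up):

# mdadm: grow an existing 3-disk RAID5 by a single disk
mdadm --add /dev/md0 /dev/sde
mdadm --grow /dev/md0 --raid-devices=4

# ZFS (without RAID-Z expansion): capacity is added a whole vdev at a time
zpool add tank raidz /dev/sdf /dev/sdg /dev/sdh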
 

Glowtape

Dabbler
Joined
Apr 8, 2017
Messages
45
Seems like RAID-Z expansion is finally coming, though.

Given how it's supposed to work, it'll use the full space of the added disk only if you rewrite old blocks eventually. Added space would go unused in cold storage.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,176
Given how it's supposed to work, it'll use the full space of the added disk only if you rewrite old blocks eventually.
No, that is not the case. What is the case is that old data will maintain the old parity ratio. E.g. given a large file and 1M blocks, the old data in a 6-wide RAIDZ2 would be divided across stripes of 6 chunks on-disk, two of which are parity. Add two disks [one at a time, until someone implements this option] and new writes for similar data will be divided into 8 chunks on-disk, two of which are parity.
So, in our example, old data has (at best, small blocks complicate things) 33% parity, whereas new data has 25% parity, which is better in terms of space efficiency.
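A quick back-of-the-envelope check (assuming bc is installed; the ratio is just the 2 parity chunks over the stripe width):

# old data, 6-wide RAIDZ2: 2 of every 6 on-disk chunks are parity
echo "scale=2; 2/6" | bc    # ~33% parity
# new data after expanding to 8-wide: 2 of every 8 chunks are parity
echo "scale=2; 2/8" | bc    # 25% parity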
 
Last edited:

Glowtape

Dabbler
Joined
Apr 8, 2017
Messages
45
Oh, okay. I understood it as adding a slice whose availability in used areas grows as stripes get rewritten. Checking up on this, they're going to reflow the data on expansion. I didn't know that.
 