Removing cache drives from live pool

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
Aloha TrueNAS peoples :)


Currently we have a RAID-Z1 pool of 4 HDD drives that also has 2 NVMe RAID-Z0 drives as cache.

[attached screenshot of the pool layout]


For performance reasons I'd like to split these up and have one as a read cache (L2ARC) and the other as a SLOG device to speed up writes.



However, when I last disconnected one of the cache drives the pool stopped responding, which in hindsight might not be so surprising and was more of a really stupid move by me, but I keep reading that disconnecting a read cache during operation is fine.

The pool does host some iSCSI-connected drives with virtual machines on them, so quite sensitive data.


I cannot find an option to disconnect both drives at once, so how would I go about this? Do I need to take the iSCSI service offline and make sure nothing is writing to / reading from the NAS when I do this?


Advice is most welcome :)
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
Please fill in the details of which hardware and TrueNAS version you're running.

You should be able to remove the L2ARC without trouble, but maybe someone else has some ideas about what you experienced.

The hardware details will tell whether the NVMe can be recommended as SLOG (unless it has PLP, you're no safer than running sync=disabled) - both PLP and write endurance are important to look at.
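
For reference, something along these lines from the TrueNAS shell will show the current sync setting on the pool and the NVMe health/endurance data (POOL01 and /dev/nvme0 are just guesses at your pool and device names):

# Show whether sync writes are currently being honoured on the pool
zfs get sync POOL01

# SMART data for the first NVMe drive - "Percentage Used" and the
# "Data Units Written" figures give an idea of remaining write endurance
smartctl -a /dev/nvme0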
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
For performance reasons I'd like to split these up and have one as a read cache (L2ARC) and the other as a SLOG device to speed up writes.
If you're going to run SLOG in front of a RAIDZ1 pool, you're probably not going to get what you expect.

Please read through this before deciding if you need SLOG: https://www.truenas.com/community/threads/the-path-to-success-for-block-storage.81165/

Also, if you just want faster copy speeds for large files, this isn't going to help, as the SLOG will be unable to offload the 5-10 seconds of data it holds to the pool (which is still the same speed as it is now) and will back off IOPS until it can.

Maybe you'd just be better off running a mirrored pool of the NVMe drives and having a replication job push the data to the RAIDZ1 pool for extra security.
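
As a rough sketch of that idea (FAST01, POOL01/vms-backup and the snapshot names are placeholders; in practice you'd set this up as a scheduled replication task in the UI rather than by hand):

# Snapshot the fast NVMe pool, then push a copy to the RAIDZ1 pool
zfs snapshot -r FAST01/vms@repl-1
zfs send -R FAST01/vms@repl-1 | zfs recv -F POOL01/vms-backup

# Later runs only need to send the changes since the previous snapshot
zfs snapshot -r FAST01/vms@repl-2
zfs send -R -i @repl-1 FAST01/vms@repl-2 | zfs recv -F POOL01/vms-backup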
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
The hard drives are Seagate IronWolf Pros and the cache M.2 drives are Corsair MP510s.

This isn't about safety, but speed.

We currently only get around 500-600 MB/s write speed to the pool over 10 Gbit/s.
I feel we should get way more than that, especially during sequential writes.

So if adding a faster SSD as SLOG won't do, we really just need an SSD pool then I guess.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
It depends on your bottleneck/demand.

If you're IOPS heavy, you will only be getting something like 300-600 IOPS from that pool of HDDs. Even if a SLOG can deliver 50'000 IOPS, it doesn't take many seconds for your pool disks to choke trying to catch up on that.

Let's face it, if you're expecting 10 Gbits of performance for block storage and you have a RAIDZ1 pool of 4 spinning disks behind that, you're never going to be happy. You need a bunch of mirrors to increase your pool IOPS capability and a high-performance SLOG if you're going to care about data loss.

If you don't care about data loss, just set sync=disabled and enjoy the performance of RAM-speed transfers; don't bother with a SLOG. You didn't say how much RAM you have, so there's a chance that RAM/ARC would run out before the pool could catch up with big transfers though... more RAM = good in that case.
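
If you do go down that road, sync is a ZFS property you set per dataset or zvol - something like this, where POOL01/vm-zvol is just a stand-in for whatever zvol backs your iSCSI extent:

# Disable sync writes on the zvol backing the iSCSI extent
zfs set sync=disabled POOL01/vm-zvol

# Verify
zfs get sync POOL01/vm-zvol

# Revert to the default behaviour later
zfs inherit sync POOL01/vm-zvol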
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
It depends on your bottleneck/demand.

If you're IOPS heavy, you will only be getting something like 300-600 IOPS from that pool of HDDs. Even if a SLOG can deliver 50'000 IOPS, it doesn't take many seconds for your pool disks to choke trying to catch up on that.

Let's face it, if you're expecting 10 Gbits of performance for block storage and you have a RAIDZ1 pool of 4 spinning disks behind that, you're never going to be happy. You need a bunch of mirrors to increase your pool IOPS capability and a high-performance SLOG if you're going to care about data loss.

If you don't care about data loss, just set sync=disabled and enjoy the performance of RAM-speed transfers; don't bother with a SLOG. You didn't say how much RAM you have, so there's a chance that RAM/ARC would run out before the pool could catch up with big transfers though... more RAM = good in that case.


OK, so even if the SLOG is high capacity (in this case 1 TB), it's still going to slow down despite not being filled up with unwritten data, just because the HDDs cannot handle it?

I'd set sync=disabled, but we only have 64 GB of RAM in this bad boy, so that'd run out pretty fast.
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
However, this I don't understand - when doing a dd copy to test the speed, I get around 4 GB/s writing to POOL01:

root@NF-NAS01[~]# dd if=/dev/zero of=/mnt/POOL01/ddfile bs=1024k count=20000
20000+0 records in
20000+0 records out
20971520000 bytes transferred in 4.732976 secs (4430937208 bytes/sec)


So it clearly has way faster write speeds locally in the shell - why is this not seen over SMB / iSCSI?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
So it clearly has way faster write speeds locally in the shell - why is this not seen over SMB / iSCSI?
This answer covers it pretty well:

Basically, it's due to the nature of the IO... depending on the client and the way it requests the data to be written.

Also, using dd to copy a bunch of (highly compressible) zeroes isn't a good test of anything.
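
If you want a local test that means something more, write data that can't be compressed away - roughly like this (the dataset name is just an example, and /dev/random can itself be the bottleneck, which is why the file is pre-generated outside the timed copy):

# Scratch dataset with compression off so zeroes/patterns can't cheat
zfs create -o compression=off POOL01/speedtest

# Pre-generate ~20 GB of random data (not the timed part)
dd if=/dev/random of=/mnt/POOL01/speedtest/random.bin bs=1024k count=20000

# Timed run: copy the random file; the source should mostly still be in ARC,
# so this largely measures pool write speed
dd if=/mnt/POOL01/speedtest/random.bin of=/mnt/POOL01/speedtest/copy.bin bs=1024k

# Clean up
zfs destroy POOL01/speedtest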
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
All right.

Well, the issue is still at hand: how the hell to remove the L2ARC without messing up the pool. We'll see if someone else can help.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Well, the issue is still at hand: How the hell to unmount the L2ARC without messing up the pool
So the instructions go like this:
zpool remove [-np] pool device...
Removes the specified device from the pool. This command currently
only supports removing hot spares, cache, log devices and mirrored
top-level vdevs (mirror of leaf devices); but not raidz.

So for your case, you would probably be best advised to first run zpool status -v to see the gptids of the cache devices, then use them in the command like this:

zpool remove POOL01 gptid/XXXXXXXXXXXXXX gptid/YYYYYYYYYYYYYYYYY

I have tested that with a sparsefile pool and it looks OK to me under TrueNAS SCALE (with a copy job running on the pool at the time, it isn't interrupted), but I do recall something reported about issues with removal of cache on a live pool in the current version of CORE, so you might want to look into that.
EDIT: I searched around a bit for the reference on that and can't find anything... maybe I was imagining it. In any case, you need to satisfy yourself that you're not going to lose anything important, so stop your VMs if necessary.
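
Put together, the whole dance from the shell looks roughly like this (the gptids are the placeholders from above - use the real ones from your own zpool status output; the UI's pool management page is the supported way to add the devices back afterwards, the zpool add lines are just to show the ZFS side of it):

# Find the gptids of the two cache devices
zpool status -v POOL01

# Remove both cache devices from the live pool
zpool remove POOL01 gptid/XXXXXXXXXXXXXX gptid/YYYYYYYYYYYYYYYYY

# Confirm the cache vdev is gone
zpool status -v POOL01

# Re-add one device as L2ARC (cache) and the other as SLOG (log)
zpool add POOL01 cache gptid/XXXXXXXXXXXXXX
zpool add POOL01 log gptid/YYYYYYYYYYYYYYYYY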
 
Last edited:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
[mod note: I have no idea WTF "RAID-Z0 cache drives" are so I've retitled the article]
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
However, when I last disconnected one of the cache drives the POOL stopped responding,

By "disconnected" do you mean a physical hotplug event, or choosing the option in the UI to pull the drive?

I've detached L2ARCs from a pool (in the UI) several times without incident, although they've been SAS/SATA devices.
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
[mod note: I have no idea WTF "RAID-Z0 cache drives" are so I've retitled the article]

Simply 2 M.2 drives that are running together in a RAID-Z / RAID 0, i.e. spanned together.



By "disconnected" do you mean a physical hotplug event, or choosing the option in the UI to pull the drive?

I've detached L2ARCs from a pool (in the UI) several times without incident, although they've been SAS/SATA devices.


Storage > Pool > Status > One of the L2ARC drives > Remove
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
So the instructions go like this:
zpool remove [-np] pool device...
Removes the specified device from the pool. This command currently
only supports removing hot spares, cache, log devices and mirrored
top-level vdevs (mirror of leaf devices); but not raidz.

So for your case, you would probably be best advised to first run zpool status -v to see the gptids of the cache devices, then use them in the command like this:

zpool remove POOL01 gptid/XXXXXXXXXXXXXX gptid/YYYYYYYYYYYYYYYYY

I have tested that with a sparsefile pool and it looks OK to me under TrueNAS SCALE (with a copy job running on the pool at the time, it isn't interrupted), but I do recall something reported about issues with removal of cache on a live pool in the current version of CORE, so you might want to look into that.
EDIT: I searched around a bit for the reference on that and can't find anything... maybe I was imagining it. In any case, you need to satisfy yourself that you're not going to lose anything important, so stop your VMs if necessary.


Very strange, this is essentially what I did - but via the UI.


Perhaps a one time thing. But I'll just plan for it going belly up again in a maintenance window.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Simply 2 M2 drives that are running together in a RAID-Z / RAID0 i.e spanned together.
You can't make a RAIDZ1 with two drives. RAID 0 does not exist in ZFS; the corresponding geometry is a "stripe", and it is definitely not the same as RAIDZ.
If you want help, you need to use the proper terminology to describe your setup and what you want to achieve.
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
You can't make a RAIDZ1 with two drives. RAID 0 does not exist in ZFS; the corresponding geometry is a "stripe", and it is definitely not the same as RAIDZ.
If you want help, you need to use the proper terminology to describe your setup and what you want to achieve.


Sure.

The M.2 drives are striped together to form the read cache of the RAID-Z1 pool.
TrueNAS does not give me the option to remove both at the same time, and the last time I tried removing one of them it caused some kind of error: iSCSI and SMB stopped working altogether and we had to reboot the NAS to get back into operation.

So: how do I remove these drives from the pool so I can re-add one as L2ARC and the other as SLOG?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The M.2 drives are striped together to form the read cache of the RAID-Z1 pool.

Correct. RAIDZ is only supported for data vdevs, not for support vdevs like cache and log. ZFS doesn't support RAID0 at all; striping might seem like "the same thing" but it isn't.

TrueNAS does not give me the option to remove both at the same time,

The underlying ZFS commands do not allow removal of multiple devices at the same time, as far as I recall.

and the last time I tried removing one of them it caused some kind of error: iSCSI and SMB stopped working altogether and we had to reboot the NAS to get back into operation.

That seems curious. I would think that's an error/bug/problem, because that should be a nonobjectionable change. If you can duplicate that, it would be interesting to follow up on.

So: how do I remove these drives from the pool so I can re-add one as L2ARC and the other as SLOG?

An SSD that is suitable for use as L2ARC is very unlikely to be an appropriate SLOG device. A SLOG needs power loss protection or some other mechanism to guarantee POSIX-compatible committed writes. This usually means SLC SSD or Optane or enterprise-grade-endurance SSD.

You might want to look at my "Some insights into SLOG" sticky and then there is also a very good SLOG device thread around here somewhere, but I am already late for something here, so I leave that up to someone else to point out...
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Storage > Pool > Status > One of the L2ARC drives > Remove
This is the correct and supported path. Unless TrueNAS is doing something like attempting to signal a "hotplug removal" to the device - which it shouldn't be - I don't see why a "drop L2ARC" would cause a lockup. Your running VM workloads will take a hit if they were relying on the L2ARC, especially with the underlying pool vdevs being Z1, but it shouldn't have killed anything on the TrueNAS end. I've even pulled NVMe SLOG (through the UI) without ill effect to the server, but the guest workload was certainly upset.

Perhaps a one time thing. But I'll just plan for it going belly up again in a maintenance window.
Good preventative measure for sure. If it does go down again, I'd suggest trying to get a debug capture and submitting a formal bug report.

Re: the RAIDZ/stripe nomenclature, it might seem like pedantry but there is a big difference, especially where drive removals are concerned.

The underlying ZFS commands do not allow removal of multiple devices at the same time, as far as I recall.
I'm reasonably sure you can remove all members of a cache or log vdev specifically by targeting the top-level leaf, but it's been a while since I tested that in practice.


An SLOG needs power loss protection or some other mechanism to guarantee POSIX-compatible committed writes. This usually means SLC SSD or Optane or enterprise-grade-endurance SSD.
Technically not true; it just needs to not lie about writes being committed to stable storage. But devices without PLP tend to have insufficient performance to act as SLOGs, which makes "don't bother unless it has PLP" a good shortcut.


You might want to look at my "Some insights into SLOG" sticky and then there is also a very good SLOG device thread around here somewhere, but I am already late for something here, so I leave that up to someone else to point out...
Should be in my signature, although I'm on mobile right now. Though when I turn my screen to landscape, I can see them.

Edit: https://www.truenas.com/community/threads/slog-benchmarking-and-finding-the-best-slog.63521/
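
If you want to benchmark a candidate SLOG from the shell, that thread is built around FreeBSD's diskinfo sync-write test - roughly like this, with nvd0 standing in for the device under test:

# WARNING: this is a write test and will destroy data on the target device -
# only run it against a drive that isn't part of a pool
diskinfo -wS /dev/nvd0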
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
The underlying ZFS commands do not allow removal of multiple devices at the same time, as far as I recall.

That seems curious. I would think that's an error/bug/problem, because that should be a nonobjectionable change. If you can duplicate that, it would be interesting to follow up on.


That's what I gathered too when I googled it, and I was like "Awesome - I can do this now" and bam, incident created at 14:05 on a Monday :P

I'll try again, but prepare for it going belly up. In the interest of bug hunting, are there any logging / profiling settings you'd want me to turn on before I do so, in case it is indeed a bug, that might help you track it down?

Version: TrueNAS-12.0-U1

An SSD that is suitable for use as L2ARC is very unlikely to be an appropriate SLOG device. A SLOG needs power loss protection or some other mechanism to guarantee POSIX-compatible committed writes. This usually means SLC SSD or Optane or enterprise-grade-endurance SSD.

You might want to look at my "Some insights into SLOG" sticky and then there is also a very good SLOG device thread around here somewhere, but I am already late for something here, so I leave that up to someone else to point out...

I've read that thread and I'm fully aware that it isn't the safest option, but we have dual PSUs and a UPS with diesel backup, so in the event of an actual abrupt machine shutdown, some lost data written in that moment isn't a concern or a problem.

The only servers running directly on the NAS via iSCSI now are servers that don't write a lot of data (application servers) or file servers where, worst case, we'll have to revert to the backup from the last 4 hours.
Speed is the concern here, not so much data safety.

But I appreciate the pointers :)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I've read that thread and I'm fully aware that it isn't the safest option, but we have dual PSUs and a UPS with diesel backup, so in the event of an actual abrupt machine shutdown, some lost data written in that moment isn't a concern or a problem.

Then why bother with SLOG at all? It's slowing your writes down without providing a benefit. Just turn off sync writes and be done with it if "some lost data [...] isn't a concern or a problem."
 