Removing cache drives from live pool

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
Aloha TrueNAS peoples :)


Currently we have a RAID-Z1 pool of 4 HDD drives that also has 2 NVMe RAID-Z0 drives as cache.

[attached screenshot of the pool layout]


For performance reasons I'd like to split these up and have one as a read cache (L2ARC) and the other as a SLOG device to speed up writes.



However, when I last disconnected one of the cache drives the pool stopped responding, which in hindsight might not be so surprising and was more of a really stupid move by me, but I keep reading that disconnecting a read cache during operation is fine.

The pool does host some iSCSI-connected drives with virtual machines on them, so quite sensitive data.


I cannot find an option to disconnect both drives at once, so how would I go about this? Do I need to take the iSCSI service offline and make sure nothing is writing to / reading from the NAS when I do this?


Advice is most welcome :)
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
Please fill in the details of which hardware and TrueNAS version you're running.

You should be able to remove the L2ARC without trouble, but maybe someone else has some ideas about what you experienced.

The hardware details will tell whether the NVMe can be recommended as SLOG (unless it has PLP, you're no safer than running sync=disabled) - both PLP and write endurance are important to look at.
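
For reference, something along these lines from the TrueNAS shell will show the current sync setting on the pool and the NVMe health/endurance data (POOL01 and /dev/nvme0 are just guesses at your pool and device names):

# Show whether sync writes are currently being honoured on the pool
zfs get sync POOL01

# SMART data for the first NVMe drive - "Percentage Used" and the
# "Data Units Written" figures give an idea of remaining write endurance
smartctl -a /dev/nvme0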
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
For performance reasons I'd like to split these up and have one as a read cache (L2ARC) and the other as a SLOG device to speed up writes.
If you're going to run SLOG in front of a RAIDZ1 pool, you're probably not going to get what you expect.

Please read through this before deciding if you need SLOG: https://www.truenas.com/community/threads/the-path-to-success-for-block-storage.81165/

Also, if you just want faster copy speeds for large files, this isn't going to help, as the SLOG will be unable to offload the 5-10 seconds of data it holds to the pool (which is still the same speed as it is now) and will back off IOPS until it can.

Maybe you'd just be better off running a mirrored pool of the NVMe drives and having a replication job push the data to the RAIDZ1 pool for extra security.
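
As a rough sketch of that idea (FAST01, POOL01/vms-backup and the snapshot names are placeholders; in practice you'd set this up as a scheduled replication task in the UI rather than by hand):

# Snapshot the fast NVMe pool, then push a copy to the RAIDZ1 pool
zfs snapshot -r FAST01/vms@repl-1
zfs send -R FAST01/vms@repl-1 | zfs recv -F POOL01/vms-backup

# Later runs only need to send the changes since the previous snapshot
zfs snapshot -r FAST01/vms@repl-2
zfs send -R -i @repl-1 FAST01/vms@repl-2 | zfs recv -F POOL01/vms-backup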
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
The hard drives are Seagate IronWolf Pros and the cache M.2 drives are Corsair MP510s.

This isn't about safety, but speed.

We currently only get around 500-600 MB/s write speed to the pool over 10 Gbit/s.
I feel we should get way more than that, especially during sequential writes.

So if adding a faster SSD as SLOG won't do, we really just need an SSD pool then I guess.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
It depends on your bottleneck/demand.

If you're IOPS heavy, you will only be getting something like 300-600 IOPS from that pool of HDDs. Even if a SLOG can deliver 50'000 IOPS, it doesn't take many seconds for your pool disks to choke trying to catch up on that.

Let's face it, if you're expecting 10 Gbits of performance for block storage and you have a RAIDZ1 pool of 4 spinning disks behind that, you're never going to be happy. You need a bunch of mirrors to increase your pool IOPS capability and a high-performance SLOG if you're going to care about data loss.

If you don't care about data loss, just set sync=disabled and enjoy the performance of RAM-speed transfers; don't bother with a SLOG. You didn't say how much RAM you have, so there's a chance that RAM/ARC would run out before the pool could catch up with big transfers though... more RAM = good in that case.
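
If you do go down that road, sync is a ZFS property you set per dataset or zvol - something like this, where POOL01/vm-zvol is just a stand-in for whatever zvol backs your iSCSI extent:

# Disable sync writes on the zvol backing the iSCSI extent
zfs set sync=disabled POOL01/vm-zvol

# Verify
zfs get sync POOL01/vm-zvol

# Revert to the default behaviour later
zfs inherit sync POOL01/vm-zvol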
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
It depends on your bottleneck/demand.

If you're IOPS heavy, you will only be getting something like 300-600 IOPS from that pool of HDDs. Even if a SLOG can deliver 50'000 IOPS, it doesn't take many seconds for your pool disks to choke trying to catch up on that.

Let's face it, if you're expecting 10 Gbits of performance for block storage and you have a RAIDZ1 pool of 4 spinning disks behind that, you're never going to be happy. You need a bunch of mirrors to increase your pool IOPS capability and a high-performance SLOG if you're going to care about data loss.

If you don't care about data loss, just set sync=disabled and enjoy the performance of RAM-speed transfers; don't bother with a SLOG. You didn't say how much RAM you have, so there's a chance that RAM/ARC would run out before the pool could catch up with big transfers though... more RAM = good in that case.


OK, so even if the SLOG is high capacity (in this case 1 TB), it's still going to slow down despite not being filled up with unwritten data, just because the HDDs cannot handle it?

I'd set sync=disabled, but we only have 64 GB of RAM in this bad boy, so that'd run out pretty fast.
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
However, this I don't understand - when doing a dd copy to test the speed, I get around 4 GB/s writing to POOL01:

root@NF-NAS01[~]# dd if=/dev/zero of=/mnt/POOL01/ddfile bs=1024k count=20000
20000+0 records in
20000+0 records out
20971520000 bytes transferred in 4.732976 secs (4430937208 bytes/sec)


So it clearly has way faster write speeds locally in the shell - why is this not seen over SMB / iSCSI?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
So it clearly has way faster write speeds locally in the shell - why is this not seen over SMB / iSCSI?
This answer covers it pretty well:

Basically, it's due to the nature of the IO... depending on the client and the way it requests the data to be written.

Also, using dd to copy a bunch of (highly compressible) zeroes isn't a good test of anything.
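
If you want a local test that means something more, write data that can't be compressed away - roughly like this (the dataset name is just an example, and /dev/random can itself be the bottleneck, which is why the file is pre-generated outside the timed copy):

# Scratch dataset with compression off so zeroes/patterns can't cheat
zfs create -o compression=off POOL01/speedtest

# Pre-generate ~20 GB of random data (not the timed part)
dd if=/dev/random of=/mnt/POOL01/speedtest/random.bin bs=1024k count=20000

# Timed run: copy the random file; the source should mostly still be in ARC,
# so this largely measures pool write speed
dd if=/mnt/POOL01/speedtest/random.bin of=/mnt/POOL01/speedtest/copy.bin bs=1024k

# Clean up
zfs destroy POOL01/speedtest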
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
All right.

Well, the issue is still at hand: how the hell to remove the L2ARC without messing up the pool. We'll see if someone else can help.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Well, the issue is still at hand: How the hell to unmount the L2ARC without messing up the pool
So the instructions go like this:
zpool remove [-np] pool device...
Removes the specified device from the pool. This command currently
only supports removing hot spares, cache, log devices and mirrored
top-level vdevs (mirror of leaf devices); but not raidz.

So for your case, you would probably be best advised to first run zpool status -v to see the gptids of the cache devices, then use them in the command like this:

zpool remove POOL01 gptid/XXXXXXXXXXXXXX gptid/YYYYYYYYYYYYYYYYY

I have tested that with a sparsefile pool and it looks OK to me under TrueNAS SCALE (with a copy job running on the pool at the time, it isn't interrupted), but I do recall something reported about issues with removal of cache on a live pool in the current version of CORE, so you might want to look into that.
EDIT: I searched around a bit for the reference on that and can't find anything... maybe I was imagining it. In any case, you need to satisfy yourself that you're not going to lose anything important, so stop your VMs if necessary.
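
Put together, the whole dance from the shell looks roughly like this (the gptids are the placeholders from above - use the real ones from your own zpool status output; the UI's pool management page is the supported way to add the devices back afterwards, the zpool add lines are just to show the ZFS side of it):

# Find the gptids of the two cache devices
zpool status -v POOL01

# Remove both cache devices from the live pool
zpool remove POOL01 gptid/XXXXXXXXXXXXXX gptid/YYYYYYYYYYYYYYYYY

# Confirm the cache vdev is gone
zpool status -v POOL01

# Re-add one device as L2ARC (cache) and the other as SLOG (log)
zpool add POOL01 cache gptid/XXXXXXXXXXXXXX
zpool add POOL01 log gptid/YYYYYYYYYYYYYYYYY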
 
Last edited:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
[mod note: I have no idea WTF "RAID-Z0 cache drives" are so I've retitled the article]
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
However, when I last disconnected one of the cache drives the POOL stopped responding,

By "disconnected" do you mean a physical hotplug event, or choosing the option in the UI to pull the drive?

I've detached L2ARCs from a pool (in the UI) several times without incident, although they've been SAS/SATA devices.
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
[mod note: I have no idea WTF "RAID-Z0 cache drives" are so I've retitled the article]

Simply 2 M.2 drives that are running together in a RAID-Z / RAID 0, i.e. spanned together.



By "disconnected" do you mean a physical hotplug event, or choosing the option in the UI to pull the drive?

I've detached L2ARCs from a pool (in the UI) several times without incident, although they've been SAS/SATA devices.


Storage > Pool > Status > One of the L2ARC drives > Remove
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
So the instructions go like this:
zpool remove [-np] pool device...
Removes the specified device from the pool. This command currently
only supports removing hot spares, cache, log devices and mirrored
top-level vdevs (mirror of leaf devices); but not raidz.

So for your case, you would probably be best advised to first run zpool status -v to see the gptids of the cache devices, then use them in the command like this:

zpool remove POOL01 gptid/XXXXXXXXXXXXXX gptid/YYYYYYYYYYYYYYYYY

I have tested that with a sparsefile pool and it looks OK to me under TrueNAS SCALE (with a copy job running on the pool at the time, it isn't interrupted), but I do recall something reported about issues with removal of cache on a live pool in the current version of CORE, so you might want to look into that.
EDIT: I searched around a bit for the reference on that and can't find anything... maybe I was imagining it. In any case, you need to satisfy yourself that you're not going to lose anything important, so stop your VMs if necessary.


Very strange, this is essentially what I did - but via the UI.


Perhaps a one time thing. But I'll just plan for it going belly up again in a maintenance window.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Simply 2 M2 drives that are running together in a RAID-Z / RAID0 i.e spanned together.
You can't make a RAIDZ1 with two drives. RAID 0 does not exist in ZFS; the corresponding geometry is a "stripe", and it is definitely not the same as RAIDZ.
If you want help, you need to use the proper terminology to describe your setup and what you want to achieve.
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
You can't make a RAIDZ1 with two drives. RAID 0 does not exist in ZFS; the corresponding geometry is a "stripe", and it is definitely not the same as RAIDZ.
If you want help, you need to use the proper terminology to describe your setup and what you want to achieve.


Sure.

The M.2 drives are striped together to form the read cache of the RAID-Z1 pool.
TrueNAS does not give me the option to remove both at the same time, and the last time I tried removing one of them it caused some kind of error: iSCSI and SMB stopped working altogether and we had to reboot the NAS to get back into operation.

So: how do I remove these drives from the pool so I can re-add one as L2ARC and the other as SLOG?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The M.2 drives are striped together to form the read cache of the RAID-Z1 pool.

Correct. RAIDZ is only supported for data vdevs, not for support vdevs like cache and log. ZFS doesn't support RAID0 at all; striping might seem like "the same thing" but it isn't.

TrueNAS does not give me the option to remove both at the same time,

The underlying ZFS commands do not allow removal of multiple devices at the same time, as far as I recall.

and the last time I tried removing one of them it caused some kind of error: iSCSI and SMB stopped working altogether and we had to reboot the NAS to get back into operation.

That seems curious. I would think that's an error/bug/problem, because that should be a nonobjectionable change. If you can duplicate that, it would be interesting to follow up on.

So: how do I remove these drives from the pool so I can re-add one as L2ARC and the other as SLOG?

An SSD that is suitable for use as L2ARC is very unlikely to be an appropriate SLOG device. A SLOG needs power loss protection or some other mechanism to guarantee POSIX-compatible committed writes. This usually means SLC SSD or Optane or enterprise-grade-endurance SSD.

You might want to look at my "Some insights into SLOG" sticky and then there is also a very good SLOG device thread around here somewhere, but I am already late for something here, so I leave that up to someone else to point out...
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Storage > Pool > Status > One of the L2ARC drives > Remove
This is the correct and supported path. Unless TrueNAS is doing something like attempting to signal a "hotplug removal" to the device - which it shouldn't be - I don't see why a "drop L2ARC" would cause a lockup. Your running VM workloads will take a hit if they were relying on the L2ARC, especially with the underlying pool vdevs being Z1, but it shouldn't have killed anything on the TrueNAS end. I've even pulled NVMe SLOG (through the UI) without ill effect to the server, but the guest workload was certainly upset.

Perhaps a one time thing. But I'll just plan for it going belly up again in a maintenance window.
Good preventative measure for sure. If it does go down again, I'd suggest trying to get a debug capture and submitting a formal bug report.

Re: the RAIDZ/stripe nomenclature, it might seem like pedantry but there is a big difference, especially where drive removals are concerned.

The underlying ZFS commands do not allow removal of multiple devices at the same time, as far as I recall.
I'm reasonably sure you can remove all members of a cache or log vdev specifically by targeting the top-level leaf, but it's been a while since I tested that in practice.


An SLOG needs power loss protection or some other mechanism to guarantee POSIX-compatible committed writes. This usually means SLC SSD or Optane or enterprise-grade-endurance SSD.
Technically not true; it just needs to not lie about writes being committed to stable storage. But devices without PLP tend to have insufficient performance to act as SLOGs, which makes "don't bother unless it has PLP" a good shortcut.


You might want to look at my "Some insights into SLOG" sticky and then there is also a very good SLOG device thread around here somewhere, but I am already late for something here, so I leave that up to someone else to point out...
Should be in my signature, although I'm on mobile right now. Though when I turn my screen to landscape, I can see them.

Edit: https://www.truenas.com/community/threads/slog-benchmarking-and-finding-the-best-slog.63521/
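
If you want to benchmark a candidate SLOG from the shell, that thread is built around FreeBSD's diskinfo sync-write test - roughly like this, with nvd0 standing in for the device under test:

# WARNING: this is a write test and will destroy data on the target device -
# only run it against a drive that isn't part of a pool
diskinfo -wS /dev/nvd0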
 

guemi

Dabbler
Joined
Apr 16, 2020
Messages
48
The underlying ZFS commands do not allow removal of multiple devices at the same time, as far as I recall.

That seems curious. I would think that's an error/bug/problem, because that should be a nonobjectionable change. If you can duplicate that, it would be interesting to follow up on.


That's what I gathered too when I googled it, and I was like "Awesome - I can do this now" and bam, incident created at 14:05 on a Monday :P

I'll try again, but prepare for it going belly up. In the interest of bug hunting, are there any logging / profiling settings you'd want me to turn on before I do so, in case it is indeed a bug, that might help you track it down?

Version: TrueNAS-12.0-U1

An SSD that is suitable for use as L2ARC is very unlikely to be an appropriate SLOG device. A SLOG needs power loss protection or some other mechanism to guarantee POSIX-compatible committed writes. This usually means SLC SSD or Optane or enterprise-grade-endurance SSD.

You might want to look at my "Some insights into SLOG" sticky and then there is also a very good SLOG device thread around here somewhere, but I am already late for something here, so I leave that up to someone else to point out...

I've read that thread and I'm fully aware that it isn't the safest option, but we have dual PSUs and a UPS with diesel backup, so in the event of an actual abrupt machine shutdown, some lost data written in that moment isn't a concern or a problem.

The only servers running directly on the NAS via iSCSI now are servers that don't write a lot of data (application servers) or file servers where, worst case, we'll have to revert to the backup from the last 4 hours.
Speed is the concern here, not so much data safety.

But I appreciate the pointers :)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I've read that thread and I'm fully aware that it isn't the safest option, but we have dual PSUs and a UPS with diesel backup, so in the event of an actual abrupt machine shutdown, some lost data written in that moment isn't a concern or a problem.

Then why bother with SLOG at all? It's slowing your writes down without providing a benefit. Just turn off sync writes and be done with it if "some lost data [...] isn't a concern or a problem."
 