After Bluefin upgrade, iSCSI share speed drops after 2GB copied

Bann

Dabbler
Joined
Jan 14, 2023
Messages
12
SCALE uses mdadm for swap disks. Not sure why they decided to send these emails out; as long as you have enough RAM, the swap on disk won't even be used. It seems the state of the swap partitions is sometimes checked on startup, found to be incomplete, and an email is sent out.

As long as ZFS or SMART are not reporting errors, these appear to be false alerts.
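If you want to double-check that the on-disk swap really isn't being touched, a quick sanity check from the SCALE shell (a sketch, assuming you have SSH or console access) looks like this:

# show memory and swap usage; swap "used" should stay at or near 0
free -h
# list active swap devices and how much of each is in use
swapon --show
# inspect the mdadm devices that SCALE builds its swap partitions on
cat /proc/mdstat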

Note that, while not really needed for this particular question, forum rules are to include your hardware. An easy way to do this is to put it in a signature that will be included with any post.
Actually, I found something wrong with my iSCSI share after upgrading.
The write speed drops to almost 0 after about 2 GiB of data has been sent, then it increases again.
My RAM is over 70 GiB, almost 70% free.
[screenshot: iSCSI file-copy speed graph showing the drop after ~2 GiB]

But SMB, which is on the same pool, works well.
The amount of data written before the slowdown is about the same as the swap size.
My guess is that the iSCSI share somehow uses the swap disk instead of RAM.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
My guess is that the iSCSI share somehow uses the swap disk instead of RAM.
The disk swap is only used when there is no RAM available; this should never occur. I don't see anyone's hardware, so it's difficult to say.
The swap on disk is primarily there as an emergency RAM substitute for the occasional catastrophic failure where ZFS needs all the RAM in the world to try to fix something on the pool. As long as that never happens, the disk swap should never be touched.
 

Bann

Dabbler
Joined
Jan 14, 2023
Messages
12
The disk swap is only used when there is no RAM available; this should never occur. I don't see anyone's hardware, so it's difficult to say.
The swap on disk is primarily there as an emergency RAM substitute for the occasional catastrophic failure where ZFS needs all the RAM in the world to try to fix something on the pool. As long as that never happens, the disk swap should never be touched.
Thanks for your reply.
I understand it shouldn't happen. I just found this problem and tried every way I know to solve it (like rebuilding the iSCSI share, and exporting and rebuilding the pool), but failed. (I'm new to Linux and TrueNAS.) The only 2 GiB figure I know of is the swap disk size, which is why I made that guess.
I use an H740P RAID card to build a RAID5 disk, passed through from ESXi to TrueNAS.
Before Bluefin, I used 22.02.4 Angelfish, and this iSCSI share worked well.
Maybe you could suggest some ways to solve this.
Anyway, thanks for your help!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
I use an H740P RAID card to build a RAID5 disk, passed through from ESXi to TrueNAS.
That's a terrible thing to do if you enjoy keeping your data:
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Actually, I found something wrong with my iSCSI share after upgrading.
The write speed drops to almost 0 after about 2 GiB of data has been sent, then it increases again.
My RAM is over 70 GiB, almost 70% free.
[screenshot: iSCSI file-copy speed graph showing the drop after ~2 GiB]
But SMB, which is on the same pool, works well.
The amount of data written before the slowdown is about the same as the swap size.
My guess is that the iSCSI share somehow uses the swap disk instead of RAM.

This looks like the ZFS write throttle in action. What is your SMB peak speed, and can you show a similar file-copy graph?
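If you want to check that idea while the copy is running, watching pool I/O alongside the transfer will show whether the stalls line up with data being flushed out to the disks. A rough sketch, assuming your pool is named tank (substitute your actual pool name):

# per-vdev read/write throughput, refreshed every second, while the iSCSI copy runs
zpool iostat -v tank 1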
 

Bann

Dabbler
Joined
Jan 14, 2023
Messages
12
That's a terrible thing to do if you enjoy keeping your data:
Yeah, I realized that, but it's not a big deal because I'm still trying to find the best way to use my NAS, and the disks are almost empty, so I still have a chance to switch to an HBA.
 

Bann

Dabbler
Joined
Jan 14, 2023
Messages
12
This looks like the ZFS write throttle in action. What is your SMB peak speed, and can you show a similar file-copy graph?
Here it is, same file.
[screenshot: SMB file-copy speed graph]

It's 1.07 GiB/s, which almost reaches the limit of my network.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
I use an H740P RAID card to build a RAID5 disk, passed through from ESXi to TrueNAS.
Wow. There is so much wrong here.
 

Bann

Dabbler
Joined
Jan 14, 2023
Messages
12
Wow. There is so much wrong here.
Maybe there is a misunderstanding: I actually pass the H740P through to TrueNAS, and I still have S.M.A.R.T. functionality through the megaraid command.
But yes, I know now that it's not a good idea~
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
Unfortunately, this pretty much means that all the "should" and "should not" statements above no longer reliably apply. Having set it up in such a problematic way means it could be doing all kinds of unexpected things.

Your best path to resolving these issues at this point is to re-architect your build into a known good working configuration, and to give us your full hardware spec and, because you have virtualized it, your virtualization layout.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Maybe there is a misunderstanding: I actually pass the H740P through to TrueNAS, and I still have S.M.A.R.T. functionality through the megaraid command.
But yes, I know now that it's not a good idea~

What are the specifications of the physical hardware, and of the virtual machine?

Other posters are correct in that the correct hardware for this use case is an HBA, but if you have already performed PCIe passthrough of the card to TrueNAS you have an advantage in that your physical host is likely set up to be able to just have an HBA dropped in. On the other hand, if you have created a second virtual disk on the H740p and assigned that directly as an RDM passthrough, you may not have as much flexibility.

Regarding the performance being limited - it may be a result of the different record sizes and the ability of your RAID card to handle the large amount of smaller I/O generated by the iSCSI ZVOL. With an SMB share, the incoming writes for large files will be split into a default size of 128K, which will likely be striped across all drives on the hardware RAID card. iSCSI uses a default 16K maximum recordsize, which may be causing a large amount of read-modify-write cycles and stressing the RAID card's cache. This is one of the reasons that RAID cards aren't recommended.
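If you want to compare the two block sizes on your system, you can read them straight from the shell. A sketch, assuming a pool named tank with an SMB dataset tank/smb and an iSCSI zvol tank/iscsi (substitute your own names):

# record size of the SMB dataset (128K by default)
zfs get recordsize tank/smb
# block size of the iSCSI zvol (16K by default)
zfs get volblocksize tank/iscsi

Note that volblocksize is fixed when a zvol is created and cannot be changed afterwards; picking a different value means creating a new zvol and migrating the data.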
 

Bann

Dabbler
Joined
Jan 14, 2023
Messages
12
What are the specifications of the physical hardware, and of the virtual machine?

Other posters are correct in that the correct hardware for this use case is an HBA, but if you have already performed PCIe passthrough of the card to TrueNAS you have an advantage in that your physical host is likely set up to be able to just have an HBA dropped in. On the other hand, if you have created a second virtual disk on the H740p and assigned that directly as an RDM passthrough, you may not have as much flexibility.

Regarding the performance being limited - it may be a result of the different record sizes and the ability of your RAID card to handle the large amount of smaller I/O generated by the iSCSI ZVOL. With an SMB share, the incoming writes for large files will be split into a default size of 128K, which will likely be striped across all drives on the hardware RAID card. iSCSI uses a default 16K maximum recordsize, which may be causing a large amount of read-modify-write cycles and stressing the RAID card's cache. This is one of the reasons that RAID cards aren't recommended.
Platform:
Motherboard: H12SSL-i
CPU: EPYC 7452QS (32C/64T)
RAM: 128GB (8 x Samsung 16GB 2666MHz ECC RDIMM)
Network: Intel X540-T2
SAS Expander Card: Adaptec AEC-82885T
RAID Card: Dell H740P
NVMe: Samsung 980 PRO 1TB; 2 x Samsung 970 PRO 512GB
Storage: 5 x 16TB Seagate EXOS X18 (another 5 on the way)
System:
ESXi 7.0 Update 3
For TrueNAS:
40 vCPU
80GB RAM
40 + 50GB system storage
Passthrough:
2 x NVMe controller (2 x 970 PRO 512GB)
PERC H740P Adapter (RAID5 of 5 x 16TB hard drives)
2 x 10G ports from the Intel X540-T2
I created the pool with the RAID5 disk, with one 970 PRO set as cache.
Then I created a zvol for iSCSI and a dataset for SMB on this pool.
[screenshots: pool, zvol, and dataset configuration]
 

Bann

Dabbler
Joined
Jan 14, 2023
Messages
12
Unfortunately, this pretty much means that all the "should" and "should not" statements above no longer reliably apply. Having set it up in such a problematic way means it could be doing all kinds of unexpected things.

Your best path to resolving these issues at this point is to re-architect your build into a known good working configuration, and to give us your full hardware spec and, because you have virtualized it, your virtualization layout.
Well, that may be the best way for me to solve this.
The good thing is my disks are still empty right now, so re-architecting is not a big deal.
But I still want to run TrueNAS virtualized, so I'll read the related threads first. :smile:
Thanks.
 

Bann

Dabbler
Joined
Jan 14, 2023
Messages
12
With an SMB share, the incoming writes for large files will be split into a default size of 128K, which will likely be striped across all drives on the hardware RAID card. iSCSI uses a default 16K maximum recordsize, which may be causing a large amount of read-modify-write cycles and stressing the RAID card's cache. This is one of the reasons that RAID cards aren't recommended.
So, is there a way to change the maximum recordsize for iSCSI?
 

Bann

Dabbler
Joined
Jan 14, 2023
Messages
12
When you add a new Zvol, scroll down, click on Advanced Options, you'll find the block size. :)
Thanks!
I just switched the H740P to eHBA mode, so all 5 disks can now be managed by TrueNAS.
Then I created the pool (RAIDZ with an NVMe cache) and a zvol with a 128K block size.
And the problem is still there. o_O But now the speed drops after 5.5GB.
The RAM should get the data first, then the cache, so something seems wrong in my build. :confused:
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I just switched the H740P to eHBA mode, so all 5 disks can now be managed by TrueNAS.

While it's an improvement over the RAID disks, "HBA Mode" on a RAID controller is still not quite equal to a true HBA, as it may still be using the RAID card driver, which isn't as well-tested and mature as the HBA drivers in TrueNAS.

The RAM should get the data first, then the cache, so something seems wrong in my build. :confused:

For incoming writes, while RAM does "get the data first" in a sense, it's important to understand how the "cache" devices work, or don't work, in ZFS. The vdev labeled "cache" is for read caching only - it has no involvement in the write path. The "log" vdev does have involvement, but only for "synchronous writes" - those that come in and insist upon being committed to stable storage before the server returns the "OK" up the chain.
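You can see which role each fast device actually plays from the pool layout; a sketch, assuming the pool is named tank:

# "cache" entries are L2ARC (read cache only); "logs" entries are SLOG (sync-write log only)
zpool status tank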

I wrote a fairly lengthy post that goes into some technical detail on the write throttle behavior, and I'll link it below, but the short answer is that for sustained performance, you will never be able to go faster than your actual data vdevs can write out.


Is there a particular reason you are looking into the iSCSI performance here? For the use case of storing and sharing large files, SMB or NFS are generally considered superior.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
H740p to eHBA
According to Dell's docs, eHBA mode is a hybrid RAID/HBA mode that presents configured disks as RAID and unconfigured disks as HBA. How well this works, and how well it works with ZFS, I cannot say, but the spec's wording says it's a valid and correct way to use ZFS.
The RAM should get the data first, then the cache, so something seems wrong in my build.
There is no disk write cache. As a quick summary, writes go:
async: RAM > disk
sync: RAM + (ZIL or SLOG), then from RAM to disk the normal way. The ZIL/SLOG is only read after a crash to reconstruct the write(s) that were missed.
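Whether your iSCSI writes take the async or the sync path is controlled by the zvol's sync property; a quick check, assuming the zvol is tank/iscsi:

# standard = honour sync requests from the initiator, always = force every write through the ZIL/SLOG,
# disabled = treat everything as async (faster, but unsafe across a crash or power loss)
zfs get sync tank/iscsi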

You need to do some more reading. The build appears fine; it's your understanding that is incomplete.

Be cautious about RAID5/RAIDZ1 with such large disks, though. Have a backup at the very least.
 

Bann

Dabbler
Joined
Jan 14, 2023
Messages
12
Is there a particular reason you are looking into the iSCSI performance here? For the use case of storing and sharing large files, SMB or NFS are generally considered superior.
Well, I'm trying different ways to build my NAS; iSCSI is friendly to software usage, which is why it's on my list.
So, if I want to use iSCSI with high read and write performance, should I use NVMe SSDs to build the pool?
 

Bann

Dabbler
Joined
Jan 14, 2023
Messages
12
According to Dell's docs, eHBA mode is a hybrid RAID/HBA mode that presents configured disks as RAID and unconfigured disks as HBA. How well this works, and how well it works with ZFS, I cannot say, but the spec's wording says it's a valid and correct way to use ZFS.

There is no disk write cache. As a quick summary, writes go:
async: RAM > disk
sync: RAM + (ZIL or SLOG), then from RAM to disk the normal way. The ZIL/SLOG is only read after a crash to reconstruct the write(s) that were missed.

You need to do some more reading. The build appears fine; it's your understanding that is incomplete.

Be cautious about RAID5/RAIDZ1 with such large disks, though. Have a backup at the very least.
Thanks for your explanation. I'll do some more reading about the ZFS filesystem.
 