How do I save these changes so they last through a reboot?

Itay1778

Patron
Joined
Jan 29, 2018
Messages
269
Hi, I use iSCSI with Proxmox, but it causes a lot of "errors" in TrueNAS: read: connection lost
According to what I've seen on this forum and on the Proxmox forum, it seems to be a known problem.
To deal with this, you need to edit the /etc/local/syslog-ng.conf file and add these lines:
Code:
#
# proxmox filters
# (IP_Proxmox stands for the Proxmox host's IP address as it appears in the log message)
#
filter f_cut_ctld01 { message("ctld") and message("IP_Proxmox: read: connection lost"); };

log { source(src); filter(f_cut_ctld01); flags(final); };

I added it after the message filters.
I don't know if it matters, but that's what I've seen others do.
Credit: https://forum.proxmox.com/threads/i...seconds-to-freenas-solution.21205/post-431900

That stops the errors from appearing.
My problem is that the change is not saved when I reboot TrueNAS, and I have to edit the file again every time...
How do I make these changes persist across a reboot?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
If it's not available anywhere in the UI, then you'll have to resort to a post-init script that uses sed (or similar) to insert your text on each boot.
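Something along these lines as the post-init script would do it. This is only a rough sketch: the script path, the grep marker and appending at the end of the file are my assumptions, and if the placement you used by hand matters, you'd swap in a sed command to put the lines at that exact spot.

Code:
#!/bin/sh
# Hypothetical post-init script (e.g. /root/patch-syslog-ng.sh, registered as a
# post-init task in the UI). Re-applies the Proxmox filter after every boot,
# since the middleware rebuilds syslog-ng.conf.

CONF=/etc/local/syslog-ng.conf

# only append the lines if they are not already there
if ! grep -q 'f_cut_ctld01' "$CONF"; then
    printf '%s\n' \
        'filter f_cut_ctld01 { message("ctld") and message("IP_Proxmox: read: connection lost"); };' \
        'log { source(src); filter(f_cut_ctld01); flags(final); };' >> "$CONF"
    # if the position in the file matters, use sed instead, e.g.:
    #   sed -i '' '/some marker line/r /root/proxmox-filter.conf' "$CONF"
    service syslog-ng restart
fi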

But if you're going to do that, you might want to also raise a feature request to allow it in the UI.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That stops the errors from appearing.

Stopping the errors from appearing seems a poor idea. When your car engine "low oil" light goes on, the solution isn't to just keep driving and cover up the light with black tape. The message suggests that iSCSI is frequently timing out and having to reconnect, which is a bad state of affairs. This should be addressed, especially in an iSCSI environment.

This usually happens because people are trying to do iSCSI on systems that are not dimensioned appropriately for the task. Usually people are doing weird things, such as RAIDZ instead of mirrors, or less than 64GB of RAM, or using gigabit ethernet but expecting "more", or other reasons as outlined in the article on block storage.

 

Itay1778

Patron
Joined
Jan 29, 2018
Messages
269
Stopping the errors from appearing seems a poor idea. When your car engine "low oil" light goes on, the solution isn't to just keep driving and cover up the light with black tape. The message suggests that iSCSI is frequently timing out and having to reconnect, which is a bad state of affairs. This should be addressed, especially in an iSCSI environment.

This usually happens because people are trying to do iSCSI on systems that are not dimensioned appropriately for the task. Usually people are doing weird things, such as RAIDZ instead of mirrors, or less than 64GB of RAM, or using gigabit ethernet but expecting "more", or other reasons as outlined in the article on block storage.

Yes, I understand, but it turns out that this is a known problem in Proxmox and TrueNAS for a long time.
If you do a short Google search on this error, you will see on both the Proxmox and TrueNAS forums that a lot of people have this problem, and according to what I was able to understand, it is because Proxmox checks the connection every few seconds and this causes this message to appear. All the VMs I run work perfectly fine and there doesn't seem to be a problem. Of course, if there is a possibility to really solve this problem it would be better, but there doesn't seem to be one.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
it turns out that this is a known problem in Proxmox and TrueNAS for a long time.

No, it isn't. It's a known problem caused by your misdesigning a crappy ZFS pool that doesn't perform sufficiently to keep up with the requirements of iSCSI. The article I pointed you at before, on the path to success for block storage, specifically discusses how to appropriately design a ZFS filer for iSCSI. Let's look at yours.

Pool: 3X 3TB WD Red (CMR)

Betcha this is RAIDZ1. See my article, Point #2) You need to use mirrors for performance. Then keep right on rolling down to Point #3) Plan to use lots of vdevs. And then I'm going to go out on a limb, the privilege I get for having debugged these issues for people for many years, and I'm going to make what is almost certainly an accurate guess that you are also violating Point #6) Keep the pool occupancy rate low.

Moving on.

RAM: 32GB 1866Mhz ECC RDIMM

7) It is best to have at a bare minimum 64GB RAM to do block storage.
This speaks for itself.

you will see on both the Proxmox and TrueNAS forums that a lot of people have this problem

Yes, there are a lot of people who feel that their situations are magic and that they are exempt from needing to follow the rules. They are not exempt. iSCSI is a demanding protocol and, as I said, it takes substantial resources to make it work. The specification for iSCSI itself includes a connection drop for connections that are nonresponsive for five seconds; this is the problem you're running into, and it is just part of the protocol. You need to get your system to respond appropriately.

what I was able to understand,

Well, we've now expanded your understanding, I trust.

because Proxmox checks the connection every few seconds and this causes this message to appear.

That is not what causes the messages to appear. Your NAS responding slowly, and then the Proxmox iSCSI initiator dropping the connection, is what makes the message appear.

if there is a possibility to really solve this problem it would be better but there doesn't seem to be one.

I always love handing someone the answer to their problem and then being told that there isn't an answer to their problem.
 

Itay1778

Patron
Joined
Jan 29, 2018
Messages
269
About the 3X WD Red: it's not the iSCSI pool. I have a separate pool just for that (I forgot to update my signature) with fast disks, which is also RAIDZ1, but the performance I get is above 1Gb, so that's not the problem.
And that pool is at 50 percent occupancy. Proxmox puts its own filesystem on the iSCSI share, so I only have one zvol.
As for the 32GB, I have a lot of free RAM; it doesn't use that much of it...
I'm only running 2 VMs at the moment, so there's no reason for it to be a slowness issue or anything similar that would be creating these alerts.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So, in other words, you're in that group of people who feel that their situations are magic and that they are exempt from needing to follow the rules.
 

Itay1778

Patron
Joined
Jan 29, 2018
Messages
269
So, in other words, you're in that group of people who feel that their situations are magic and that they are exempt from needing to follow the rules.
Why say it that way?
I respect everyone here. I asked a simple question, and I'm not the only one who has asked it. This is a problem with Proxmox that affects the way it connects to TrueNAS via iSCSI. This is not a problem with TrueNAS. Instead of saying that I built a crap pool and telling me that I don't know what I'm doing, you can do a short search and see that there is something in what I'm saying.
Here is a link to the Proxmox forum where everyone there has this problem. And that's since the days of FreeNAS 9.3.

When you told me in my other post that I should upgrade the TrueNAS because my hardware was not good and I had to change everything to ECC RAM, I listened to you and did it.
But now seeing your answer here insults and invalidates me and my knowledge. I would expect more from a moderator.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
I would expect more from a moderator.
What is this more that you expect, incorrect information? You should listen to a guy who has decades of experience in doing what you are attempting to do.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Why say it that way?

Because you obviously don't believe it applies to you, and I've been dealing with people like that for many years. It is a harsh but valid assessment.

you can do a short search and see that there is something in what I'm saying.

I certainly can, but speaking as the person who has probably done more work than anyone else here on these forums on this issue, I also don't NEED to do a search. Most roads lead back to me and the resources I've produced over the years.

I'm intimately familiar with the "problem"; you can refer to past discussions of Bug #1531 by searching for "bug 1531" and my handle, to find dozens of discussions. There *is* something to what you're saying. I knew about it a decade ago, and have studied it extensively. When ZFS is nonresponsive, five seconds or more, you will get iSCSI disconnects. This is most prevalent with ZFS choking on large numbers of writes, the actual thing that 1531 discusses, but also goes to other problems where ZFS is not able to service requests quickly. This is exacerbated because ZFS is a copy-on-write filesystem, which means that there are pathological cases where it becomes exceedingly difficult to respond to a stack of requests even given the five seconds that iSCSI allows.

This means that iSCSI needs to be resourced more heavily than other protocols; you need more ARC and more free disk space, and if you want high performance, you'd also want to strongly consider even MORE ARC and lots of L2ARC.

And a whole bunch of other stuff.

This is a problem with Proxmox that affects the way it connects to TrueNAS via iSCSI.

No, I don't think so. It looks like RFC3720 (more recently RFC7143) compliant NOP Timeout and NOP Interval handling. This behaviour is described in RFC7143 sec 7.14, etc.; I would agree that it is a trainwreck. But the point is that the iSCSI timeouts basically need to happen at an interval shorter than SCSI timeouts (often as low as 7 seconds), which is how I tend to explain the derivation of the 5 second timeout to newcomers.
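For what it's worth, the open-iscsi initiator that Proxmox relies on exposes exactly that NOP handling as settings in /etc/iscsi/iscsid.conf; the values below are what I believe the stock defaults to be, so check your own installation.

Code:
# /etc/iscsi/iscsid.conf (typical open-iscsi defaults)
# send a NOP-Out "ping" to the target every 5 seconds...
node.conn[0].timeo.noop_out_interval = 5
# ...and drop the connection if the target hasn't answered within 5 seconds
node.conn[0].timeo.noop_out_timeout = 5

Raising those numbers just hides a target that can't answer in time, which is the whole point here.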

Here is a link to the Proxmox forum where everyone there has this problem. And that's since the days of FreeNAS 9.3.

Yeah. Proxmox forum users set up poorly designed FreeNAS systems and then complain about it. **YAWN**. Just another day ending in 'y'. I'm pretty sure I explained this to several of them, but I'm too lazy to do the searches to prove it. It's a holiday here.

You want to prove that it's a Proxmox problem to me? Have one of the Proxmox developers go on the forum and admit it. I doubt it'll happen, because I'm pretty certain their initiator is standards compliant. That's probably the nicest thing I've had to say about Proxmox this week, but facts are facts.

But now seeing your answer here insults and invalidates me and my knowledge.

Well, I certainly do intend to invalidate your knowledge. I feel like a decade of correct knowledge to the contrary overrules your incorrect knowledge. Sorry if you feel insulted by that; my goal is to spread accurate and useful information. That includes not leaving incorrect information to fester buried in threads on this forum that other people might someday search for and run across.

If it turns out that information I provide is wrong, I am happy to be corrected and will have a nice crow sandwich for lunch. But I gotta warn you, one of the things that has made me successful in this profession is that I do not talk about stuff except where I am confident of the facts, this is why you will see me taking on stuff like iSCSI but I won't touch SMB with a ten foot pole.

I would expect more from a moderator.

Noted. However, moderators here are not paid for the role, and their primary function is to approve posts and enforce rules. From a content and opinion point of view, I am just another community member.
 

Itay1778

Patron
Joined
Jan 29, 2018
Messages
269
Because you obviously don't believe it applies to you, and I've been dealing with people like that for many years. It is a harsh but valid assessment.
Thanks for this detailed answer; I appreciate it. It's not that I don't believe you, or that I think you're wrong and I'm right. Absolutely not, and if I am really wrong then I would love to learn how to correct the mistake.
certainly can, but speaking as the person who has probably done more work than anyone else here on these forums on this issue, I also don't NEED to do a search. Most roads lead back to me and the resources I've produced over the years.

I'm intimately familiar with the "problem"; you can refer to past discussions of Bug #1531 by searching for "bug 1531" and my handle, to find dozens of discussions. There *is* something to what you're saying. I knew about it a decade ago, and have studied it extensively. When ZFS is nonresponsive, five seconds or more, you will get iSCSI disconnects. This is most prevalent with ZFS choking on large numbers of writes, the actual thing that 1531 discusses, but also goes to other problems where ZFS is not able to service requests quickly. This is exacerbated because ZFS is a copy-on-write filesystem, which means that there are pathological cases where it becomes exceedingly difficult to respond to a stack of requests even given the five seconds that iSCSI allows.

This means that iSCSI needs to be resourced more heavily than other protocols; you need more ARC and more free disk space, and if you want high performance, you'd also want to strongly consider even MORE ARC and lots of L2ARC.

I read some of the links you attached. Apart from the 64GB of RAM, I don't think I have any other problem (and I don't think that is a problem either, because I don't push a lot of traffic through the protocol at the moment). The disks' utilization is relatively low, and so is their latency. The performance of the RAIDZ1 is currently enough for me; it is more than the 1Gb connection dedicated to iSCSI.
You want to prove that it's a Proxmox problem to me? Have one of the Proxmox developers go on the forum and admit it.
If this is actually a problem with the way I configured my TrueNAS and iSCSI, which from what I understand is what you are saying, then I will try connecting it to a Windows host or even an ESXi VM and see if the message appears. According to people who have tried it, the message does not appear there. Only in connection with Proxmox.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
which from what I understand is what you are saying. [...] Only in connection with Proxmox.

This is the point that doesn't make sense. Both TrueNAS and Proxmox have been around for more than a decade. On the TrueNAS side, iSCSI moved from the istgt userland daemon to the new kernel-based ctld iSCSI some years ago, when mav was sponsored to do that work.

Now, typically, this happens when the NOP packets used to check connection state are not handled properly, or some other timeout-inducing event happens. We usually see this where users have set up iSCSI on top of a RAIDZ1 vdev, because a RAIDZ1 vdev of four disks will have about 1/2 the write IOPS and 1/4 the read IOPS of a proper mirror setup (and it only gets worse as the RAIDZ widens). This also relates to the ARC size; too small an ARC leads to problems allocating space for writes, too little free space does the same, and all of this slows down the iSCSI.
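To put rough numbers on it, assuming something like 150 random IOPS per spinning disk (purely an illustration, not a benchmark):

Code:
4 disks as one RAIDZ1 vdev : ~150 write IOPS, ~150 read IOPS  (a vdev performs like a single disk)
4 disks as two mirror vdevs: ~300 write IOPS, ~600 read IOPS  (reads can be served by every disk)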

Maybe the thing to do here is to Report a Bug, up in the top bar. I don't have an iSCSI system free right now that I can use to look into this. I'm guessing that iXsystems does. It would be best to have someone investigate why connections are resetting and address the root cause here.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Apart from the 64GB of RAM, I don't think I have any other problem (and I don't think that is a problem either, because I don't push a lot of traffic through the protocol at the moment). The disks' utilization is relatively low, and so is their latency. The performance of the RAIDZ1 is currently enough for me; it is more than the 1Gb connection dedicated to iSCSI.
A small suggestion...

If you think your pool is good enough for the demands Proxmox is putting on it, perhaps create a test pool with just a single SSD, share that out with iSCSI in place of your HDD pool (temporarily), and see if you can get the errors to happen...
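If you'd rather do that throwaway test from a shell than the UI, it would look roughly like this (device name and sizes are only examples, and the UI won't know about a pool created this way, which is fine for a quick test):

Code:
# single-SSD throwaway pool (da5 is just an example device name)
zpool create testpool da5
# a zvol to point the existing iSCSI extent at (size/volblocksize are examples)
zfs create -V 200G -o volblocksize=16K testpool/pvetest
# then temporarily repoint the extent at the new zvol in the iSCSI extent configuration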

I'm almost certain you will not be able to reproduce the errors, and that will prove it's your pool design (3-way RAIDZ1) that's killing you. Remember that RAIDZ1 means the IOPS of a single disk, which would be something like 100-300... compare that to any recent SSD with 50,000+ IOPS and you'll see that's the bottleneck we're eliminating here.

Happy to be wrong, but I've seen it discussed here so many times.
 