How do I save these changes so they last through a reboot?

Itay1778

Patron
Joined
Jan 29, 2018
Messages
269
Hi, I use iSCSI with Proxmox, but it causes a lot of "errors" in TrueNAS: read: connection lost
According to what I've seen on this forum and on the Proxmox forum, it seems to be a known problem.
To deal with this, you need to edit the /etc/local/syslog-ng.conf file and add these lines:
Code:
#
# proxmox filters
# (IP_Proxmox stands for the Proxmox host's IP address as it appears in the log message)
#
filter f_cut_ctld01 { message("ctld") and message("IP_Proxmox: read: connection lost"); };

log { source(src); filter(f_cut_ctld01); flags(final); };

I added it after the message filters.
I don't know if it matters, but that's what I've seen others do.
Credit: https://forum.proxmox.com/threads/i...seconds-to-freenas-solution.21205/post-431900

That stops the errors from appearing.
My problem is that the change is not saved when I reboot TrueNAS, and I have to edit the file again every time...
How do I make these changes persist across a reboot?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
If it's not available anywhere in the UI, then you'll have to resort to a post-init script that uses sed (or similar) to insert your text on each boot.
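Something along these lines as the post-init script would do it. This is only a rough sketch: the script path, the grep marker and appending at the end of the file are my assumptions, and if the placement you used by hand matters, you'd swap in a sed command to put the lines at that exact spot.

Code:
#!/bin/sh
# Hypothetical post-init script (e.g. /root/patch-syslog-ng.sh, registered as a
# post-init task in the UI). Re-applies the Proxmox filter after every boot,
# since the middleware rebuilds syslog-ng.conf.

CONF=/etc/local/syslog-ng.conf

# only append the lines if they are not already there
if ! grep -q 'f_cut_ctld01' "$CONF"; then
    printf '%s\n' \
        'filter f_cut_ctld01 { message("ctld") and message("IP_Proxmox: read: connection lost"); };' \
        'log { source(src); filter(f_cut_ctld01); flags(final); };' >> "$CONF"
    # if the position in the file matters, use sed instead, e.g.:
    #   sed -i '' '/some marker line/r /root/proxmox-filter.conf' "$CONF"
    service syslog-ng restart
fi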

But if you're going to do that, you might want to also raise a feature request to allow it in the UI.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That stops the errors from appearing.

Stopping the errors from appearing seems a poor idea. When your car engine "low oil" light goes on, the solution isn't to just keep driving and cover up the light with black tape. The message suggests that iSCSI is frequently timing out and having to reconnect, which is a bad state of affairs. This should be addressed, especially in an iSCSI environment.

This usually happens because people are trying to do iSCSI on systems that are not dimensioned appropriately for the task. Usually people are doing weird things, such as RAIDZ instead of mirrors, or less than 64GB of RAM, or using gigabit ethernet but expecting "more", or other reasons as outlined in the article on block storage.

 

Itay1778

Patron
Joined
Jan 29, 2018
Messages
269
Stopping the errors from appearing seems a poor idea. When your car engine "low oil" light goes on, the solution isn't to just keep driving and cover up the light with black tape. The message suggests that iSCSI is frequently timing out and having to reconnect, which is a bad state of affairs. This should be addressed, especially in an iSCSI environment.

This usually happens because people are trying to do iSCSI on systems that are not dimensioned appropriately for the task. Usually people are doing weird things, such as RAIDZ instead of mirrors, or less than 64GB of RAM, or using gigabit ethernet but expecting "more", or other reasons as outlined in the article on block storage.

Yes, I understand, but it turns out that this is a known problem in Proxmox and TrueNAS for a long time.
If you do a short Google search on this error, you will see on both the Proxmox and TrueNAS forums that a lot of people have this problem, and according to what I was able to understand, it is because Proxmox checks the connection every few seconds and this causes this message to appear. All the VMs I run work perfectly fine and there doesn't seem to be a problem. Of course, if there is a possibility to really solve this problem it would be better, but there doesn't seem to be one.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
it turns out that this is a known problem in Proxmox and TrueNAS for a long time.

No, it isn't. It's a known problem caused by your misdesigning a crappy ZFS pool that doesn't perform sufficiently to keep up with the requirements of iSCSI. The article I pointed you at before, on the path to success for block storage, specifically discusses how to appropriately design a ZFS filer for iSCSI. Let's look at yours.

Pool: 3X 3TB WD Red (CMR)

Betcha this is RAIDZ1. See my article, Point #2) You need to use mirrors for performance. Then keep right on rolling down to Point #3) Plan to use lots of vdevs. And then I'm going to go out on a limb, the privilege I get for having debugged these issues for people for many years, and I'm going to make what is almost certainly an accurate guess that you are also violating Point #6) Keep the pool occupancy rate low.

Moving on.

RAM: 32GB 1866Mhz ECC RDIMM

7) It is best to have at a bare minimum 64GB RAM to do block storage.
This speaks for itself.

you will see on both the Proxmox and TrueNAS forums that a lot of people have this problem

Yes, there are a lot of people who feel that their situations are magic and that they are exempt from needing to follow the rules. They are not exempt. iSCSI is a demanding protocol and, as I said, it takes substantial resources to make it work. The specification for iSCSI itself includes a connection drop for connections that are nonresponsive for five seconds; this is the problem you're running into, and it is just part of the protocol. You need to get your system to respond appropriately.

what I was able to understand,

Well, we've now expanded your understanding, I trust.

because Proxmox checks the connection every few seconds and this causes this message to appear.

That is not what causes the messages to appear. Your NAS responding slowly, and then the Proxmox iSCSI initiator dropping the connection, is what makes the message appear.

if there is a possibility to really solve this problem it would be better but there doesn't seem to be one.

I always love handing someone the answer to their problem and then being told that there isn't an answer to their problem.
 

Itay1778

Patron
Joined
Jan 29, 2018
Messages
269
About the 3X WD Red: it's not the iSCSI pool. I have a separate pool just for that (I forgot to update my signature) with fast disks, which is also RAIDZ1, but the performance I get is above 1Gb, so that's not the problem.
And that pool is at 50 percent occupancy. Proxmox puts its own filesystem on the iSCSI share, so I only have one zvol.
As for the 32GB, I have a lot of free RAM; it doesn't use that much of it...
I'm only running 2 VMs at the moment, so there's no reason for it to be a slowness issue or anything similar that would be creating these alerts.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So, in other words, you're in that group of people who feel that their situations are magic and that they are exempt from needing to follow the rules.
 

Itay1778

Patron
Joined
Jan 29, 2018
Messages
269
So, in other words, you're in that group of people who feel that their situations are magic and that they are exempt from needing to follow the rules.
Why say it that way?
I respect everyone here. I asked a simple question, and I'm not the only one who has asked it. This is a problem with Proxmox that affects the way it connects to TrueNAS via iSCSI. This is not a problem with TrueNAS. Instead of saying that I built a crap pool and telling me that I don't know what I'm doing, you can do a short search and see that there is something in what I'm saying.
Here is a link to the Proxmox forum where everyone there has this problem. And that's since the days of FreeNAS 9.3.

When you told me in my other post that I should upgrade the TrueNAS because my hardware was not good and I had to change everything to ECC RAM, I listened to you and did it.
But now seeing your answer here insults and invalidates me and my knowledge. I would expect more from a moderator.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
I would expect more from a moderator.
What is this more that you expect, incorrect information? You should listen to a guy who has decades of experience in doing what you are attempting to do.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Why say it that way?

Because you obviously don't believe it applies to you, and I've been dealing with people like that for many years. It is a harsh but valid assessment.

you can do a short search and see that there is something in what I'm saying.

I certainly can, but speaking as the person who has probably done more work than anyone else here on these forums on this issue, I also don't NEED to do a search. Most roads lead back to me and the resources I've produced over the years.

I'm intimately familiar with the "problem"; you can refer to past discussions of Bug #1531 by searching for "bug 1531" and my handle, to find dozens of discussions. There *is* something to what you're saying. I knew about it a decade ago, and have studied it extensively. When ZFS is nonresponsive, five seconds or more, you will get iSCSI disconnects. This is most prevalent with ZFS choking on large numbers of writes, the actual thing that 1531 discusses, but also goes to other problems where ZFS is not able to service requests quickly. This is exacerbated because ZFS is a copy-on-write filesystem, which means that there are pathological cases where it becomes exceedingly difficult to respond to a stack of requests even given the five seconds that iSCSI allows.

This means that iSCSI needs to be resourced more heavily than other protocols; you need more ARC and more free disk space, and if you want high performance, you'd also want to strongly consider even MORE ARC and lots of L2ARC.

And a whole bunch of other stuff.

This is a problem with Proxmox that affects the way it connects to TrueNAS via iSCSI.

No, I don't think so. It looks like RFC3720 (more recently RFC7143) compliant NOP Timeout and NOP Interval handling. This behaviour is described in RFC7143 sec 7.14, etc.; I would agree that it is a trainwreck. But the point is that the iSCSI timeouts basically need to happen at an interval shorter than SCSI timeouts (often as low as 7 seconds), which is how I tend to explain the derivation of the 5 second timeout to newcomers.
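For what it's worth, the open-iscsi initiator that Proxmox relies on exposes exactly that NOP handling as settings in /etc/iscsi/iscsid.conf; the values below are what I believe the stock defaults to be, so check your own installation.

Code:
# /etc/iscsi/iscsid.conf (typical open-iscsi defaults)
# send a NOP-Out "ping" to the target every 5 seconds...
node.conn[0].timeo.noop_out_interval = 5
# ...and drop the connection if the target hasn't answered within 5 seconds
node.conn[0].timeo.noop_out_timeout = 5

Raising those numbers just hides a target that can't answer in time, which is the whole point here.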

Here is a link to the Proxmox forum where everyone there has this problem. And that's since the days of FreeNAS 9.3.

Yeah. Proxmox forum users set up poorly designed FreeNAS systems and then complain about it. **YAWN**. Just another day ending in 'y'. I'm pretty sure I explained this to several of them, but I'm too lazy to do the searches to prove it. It's a holiday here.

You want to prove that it's a Proxmox problem to me? Have one of the Proxmox developers go on the forum and admit it. I doubt it'll happen, because I'm pretty certain their initiator is standards compliant. That's probably the nicest thing I've had to say about Proxmox this week, but facts are facts.

But now seeing your answer here insults and invalidates me and my knowledge.

Well, I certainly do intend to invalidate your knowledge. I feel like a decade of correct knowledge to the contrary overrules your incorrect knowledge. Sorry if you feel insulted by that; my goal is to spread accurate and useful information. That includes not leaving incorrect information to fester buried in threads on this forum that other people might someday search for and run across.

If it turns out that information I provide is wrong, I am happy to be corrected and will have a nice crow sandwich for lunch. But I gotta warn you, one of the things that has made me successful in this profession is that I do not talk about stuff except where I am confident of the facts, this is why you will see me taking on stuff like iSCSI but I won't touch SMB with a ten foot pole.

I would expect more from a moderator.

Noted. However, moderators here are not paid for the role, and their primary function is to approve posts and enforce rules. From a content and opinion point of view, I am just another community member.
 

Itay1778

Patron
Joined
Jan 29, 2018
Messages
269
Because you obviously don't believe it applies to you, and I've been dealing with people like that for many years. It is a harsh but valid assessment.
Thanks for this detailed answer; I appreciate it. It's not that I don't believe you, or that I think you're wrong and I'm right. Absolutely not, and if I am really wrong then I would love to learn how to correct the mistake.
certainly can, but speaking as the person who has probably done more work than anyone else here on these forums on this issue, I also don't NEED to do a search. Most roads lead back to me and the resources I've produced over the years.

I'm intimately familiar with the "problem"; you can refer to past discussions of Bug #1531 by searching for "bug 1531" and my handle, to find dozens of discussions. There *is* something to what you're saying. I knew about it a decade ago, and have studied it extensively. When ZFS is nonresponsive, five seconds or more, you will get iSCSI disconnects. This is most prevalent with ZFS choking on large numbers of writes, the actual thing that 1531 discusses, but also goes to other problems where ZFS is not able to service requests quickly. This is exacerbated because ZFS is a copy-on-write filesystem, which means that there are pathological cases where it becomes exceedingly difficult to respond to a stack of requests even given the five seconds that iSCSI allows.

This means that iSCSI needs to be resourced more heavily than other protocols; you need more ARC and more free disk space, and if you want high performance, you'd also want to strongly consider even MORE ARC and lots of L2ARC.

I read some of the links you attached. Apart from the 64GB of RAM, I don't think I have any other problem (and I don't think that is a problem either, because I don't push a lot of traffic through the protocol at the moment). The disks' utilization is relatively low, and so is their latency. The performance of the RAIDZ1 is currently enough for me; it is more than the 1Gb connection dedicated to iSCSI.
You want to prove that it's a Proxmox problem to me? Have one of the Proxmox developers go on the forum and admit it.
If this is actually a problem with the way I configured my TrueNAS and iSCSI, which from what I understand is what you are saying, then I will try connecting it to a Windows host or even an ESXi VM and see if the message appears. According to people who have tried it, the message does not appear there. Only in connection with Proxmox.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
which from what I understand is what you are saying. [...] Only in connection with Proxmox.

This is the point that doesn't make sense. Both TrueNAS and Proxmox have been around for more than a decade. On the TrueNAS side, iSCSI moved from the istgt userland daemon to the new kernel-based ctld iSCSI some years ago, when mav was sponsored to do that work.

Now, typically, this happens when the NOP packets used to check connection state are not handled properly, or some other timeout-inducing event happens. We usually see this where users have set up iSCSI on top of a RAIDZ1 vdev, because a RAIDZ1 vdev of four disks will have about 1/2 the write IOPS and 1/4 the read IOPS of a proper mirror setup (and it only gets worse as the RAIDZ widens). This also relates to the ARC size; too small an ARC leads to problems allocating space for writes, too little free space does the same, and all of this slows down the iSCSI.
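To put rough numbers on it, assuming something like 150 random IOPS per spinning disk (purely an illustration, not a benchmark):

Code:
4 disks as one RAIDZ1 vdev : ~150 write IOPS, ~150 read IOPS  (a vdev performs like a single disk)
4 disks as two mirror vdevs: ~300 write IOPS, ~600 read IOPS  (reads can be served by every disk)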

Maybe the thing to do here is to Report a Bug, up in the top bar. I don't have an iSCSI system free right now that I can use to look into this. I'm guessing that iXsystems does. It would be best to have someone investigate why connections are resetting and address the root cause here.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Apart from the 64GB of RAM, I don't think I have any other problem (and I don't think that is a problem either, because I don't push a lot of traffic through the protocol at the moment). The disks' utilization is relatively low, and so is their latency. The performance of the RAIDZ1 is currently enough for me; it is more than the 1Gb connection dedicated to iSCSI.
A small suggestion...

If you think your pool is good enough for the demands Proxmox is putting on it, perhaps create a test pool with just a single SSD, share that out with iSCSI in place of your HDD pool (temporarily), and see if you can get the errors to happen...
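If you'd rather do that throwaway test from a shell than the UI, it would look roughly like this (device name and sizes are only examples, and the UI won't know about a pool created this way, which is fine for a quick test):

Code:
# single-SSD throwaway pool (da5 is just an example device name)
zpool create testpool da5
# a zvol to point the existing iSCSI extent at (size/volblocksize are examples)
zfs create -V 200G -o volblocksize=16K testpool/pvetest
# then temporarily repoint the extent at the new zvol in the iSCSI extent configuration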

I'm almost certain you will not be able to reproduce the errors, and that will prove it's your pool design (3-way RAIDZ1) that's killing you. Remember that RAIDZ1 means the IOPS of a single disk, which would be something like 100-300... compare that to any recent SSD with 50,000+ IOPS and you'll see that's the bottleneck we're eliminating here.

Happy to be wrong, but I've seen it discussed here so many times.
 