TrueCommand 2 Issues

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Problem here... Bug TC-1812 is back in TrueCommand 2.1.1....
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,466
I'm not sure if it's the TC-1913 issue or something else, but importing certs doesn't seem to be working for me under 2.1.1 (and likely prior; the issue's existed for me since around 16 Nov 21, though I hadn't dug into it earlier).

TC is running under Docker in an Ubuntu 18 VM. acme.sh gets certs and deploys them to the container using the API as discussed in this thread (which involves a single API call using curl -l -g --data '{"args" : { "pem" : "$(cat fullchain.pem)", "key" : "$(cat privkey.pem)" } }' -u "username:password" -X GET http://[IP_ADDRESS]/api/ssl/cert_import, and that being the renew hook for that cert). It had been working well from June of 2020 until ~16 Nov 21. At that time, a renewed cert was issued, but installing it failed.

When I logged into the server this afternoon to try to get it working again and ran that command (using actual credentials and full paths to the key/cert), I now get an error message back from TrueCommand: {"error":"invalid character '\\n' in string literal"}#. I assume it's griping about the fact that the cert/key files contain newline characters--as will always be the case.
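The error is consistent with the PEM text reaching the JSON body with raw line breaks, which JSON does not allow inside a string literal. A minimal Python sketch of one way to build a body without raw newlines, assuming the endpoint accepts the same {"args": {"pem": ..., "key": ...}} shape quoted above. The helper name build_cert_payload is hypothetical, and the sketch only builds the request body; it does not send it.

```python
import json


def build_cert_payload(pem_text: str, key_text: str) -> str:
    """Return a JSON body with the cert and key embedded as strings.

    json.dumps escapes each line break as the two characters backslash-n,
    so the resulting body contains no raw newline inside a string literal.
    """
    return json.dumps({"args": {"pem": pem_text, "key": key_text}})


if __name__ == "__main__":
    # Placeholder contents, not a real certificate or key.
    pem = "-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----\n"
    key = "-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----\n"
    print(build_cert_payload(pem, key))
```

Whether raw newlines are actually what changed between versions is not confirmed in this thread; the sketch only shows how the escaping could be handled on the client side.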

So that's a problem. But worse is that when I paste in the cert and key files through the TrueCommand GUI, it appears to take them, and it saves them to disk (/data/truecommand/server.crt.custom and server.key.custom), but it doesn't actually serve them; it continues serving the old, expired cert. Can cert handling really be this broken?
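One way to confirm that the served cert differs from the one saved to disk is to compare fingerprints. A minimal Python sketch, assuming the file on disk lists the leaf certificate first (as a typical fullchain.pem does); the function names are hypothetical.

```python
import hashlib
import ssl

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def leaf_fingerprint(pem_text: str) -> str:
    """SHA-256 fingerprint of the first certificate in a PEM string."""
    start = pem_text.index(BEGIN)
    stop = pem_text.index(END, start) + len(END)
    # Convert only the first PEM block, so a full chain is handled too.
    der = ssl.PEM_cert_to_DER_cert(pem_text[start:stop])
    return hashlib.sha256(der).hexdigest()


def served_matches_disk(host: str, port: int, pem_path: str) -> bool:
    """Compare the cert the server presents with the first cert in a file."""
    # With no CA file given, the certificate is fetched without validation,
    # so an expired certificate can still be retrieved for comparison.
    served = ssl.get_server_certificate((host, port))
    with open(pem_path, encoding="ascii") as fh:
        on_disk = fh.read()
    return leaf_fingerprint(served) == leaf_fingerprint(on_disk)
```

For example, served_matches_disk("truecommand.example", 443, "/data/truecommand/server.crt.custom") would return False in the situation described above; the hostname and port here are placeholders.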
 

Heracles

Any reason why TC-2027 has still not been reviewed weeks after its creation? Because it is a known issue that is back, it should be easy to fix. The original was rated High, acknowledging it is a significant one...

@morganL, @aervin, any clue why that is? Any intention to get that one fixed once and for all?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
Any reason why TC-2027 has still not been reviewed weeks after its creation? Because it is a known issue that is back, it should be easy to fix. The original was rated High, acknowledging it is a significant one...

@morganL, @aervin, any clue why that is? Any intention to get that one fixed once and for all?
The team is working on TC 2.2 and is very busy... it should get reviewed prior to that. Is anyone else having the issue?
 

Heracles

Is anyone else having the issue?
Technically, everyone has the issue. As of now, no one has DNS redundancy. The percentage who end up blocked on their primary DNS like me is probably very low, but those who think they have DNS redundancy when in fact they do not are probably most users, if not all...
 

morganL

Technically, everyone has the issue. As of now, no one has DNS redundancy. The percentage who end up blocked on their primary DNS like me is probably very low, but those who think they have DNS redundancy when in fact they do not are probably most users, if not all...

Doesn't it depend on how you deploy the TC container... what approach are you using?

Can you use the host DNS?
Is there a workaround with docker compose or Kubernetes?

TrueCommand itself is not a high availability application.... the infrastructure needs to protect it.
 

Heracles

Doesn't it depend on how you deploy the TC container...

Nope. The way I deployed my container created a failure on the primary DNS (my bad) but, like basically everyone around the globe, I configured 2 DNS servers to ensure that a service as critical as DNS keeps working even when one of my DNS servers fails. As such, TrueCommand should not go down when its primary DNS does not work (the bug).

what approach are you using?

Can you use the host DNS?

The details are in TC-1812. The bug is triggered because the Docker host uses its own external IP address as a DNS server, but Docker's routing makes the DNS reply come from the internal Docker IP. As a result, the DNS reply does not come from the expected IP and is discarded for security reasons.
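To illustrate the failure described above: a stub resolver commonly accepts a UDP reply only if it comes from the address the query was sent to. A minimal Python sketch of that check, with hypothetical addresses (192.0.2.10 standing in for the host's external IP, 172.17.0.1 for the internal Docker IP). The function name is made up, and this is not TrueCommand's code.

```python
from typing import Tuple

Address = Tuple[str, int]  # (IP address, UDP port)


def reply_is_expected(queried: Address, reply_from: Address) -> bool:
    """Accept a reply only when it comes from the server that was queried."""
    return queried == reply_from


if __name__ == "__main__":
    queried = ("192.0.2.10", 53)     # hypothetical external IP of the host
    reply_from = ("172.17.0.1", 53)  # hypothetical internal Docker IP
    # The source address does not match, so the reply would be dropped.
    print(reply_is_expected(queried, reply_from))
```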

Is there a workaround with docker compose or Kubernetes?

No. The workaround I mentioned in TC-1812 is how to avoid the failure on the primary DNS so TrueCommand works again from its single leg. Should that DNS fail by itself for any reason, TrueCommand is still not able to use its DNS redundancy.
TrueCommand itself is not a high availability application.... the infrastructure needs to protect it.

I have never seen a single piece of software / operating system / appliance that was not designed to handle at least 2 DNS servers. DNS is so critical for everything that the capability to probe more than one is not considered high availability. It is standard operation.
 

danb35

error message back from TrueCommand: {"error":"invalid character '\\n' in string literal"}#.
Reported as https://jira.ixsystems.com/browse/TC-2063
But worse is that when I paste in the cert and key files through the TrueCommand GUI, it appears to take them, and it saves them to disk (/data/truecommand/server.crt.custom and server.key.custom), but it doesn't actually serve them; it continues serving the old, expired cert.
Reported as https://jira.ixsystems.com/browse/TC-2064
 

morganL

The details are in TC-1812. The bug is triggered because the Docker host uses its own external IP address as a DNS server, but Docker's routing makes the DNS reply come from the internal Docker IP. As a result, the DNS reply does not come from the expected IP and is discarded for security reasons.
So, the bug is only when the secondary DNS address is within the same docker complex?
It's really a limitation of Docker networking and the lack of IP address transparency. TrueCommand is just acting securely.
Agreed, it's annoying, but it is not necessarily the situation for most users. I'd encourage others with this case to upvote the issue.

I have never seen a single piece of software / operating system / appliance that was not designed to handle at least 2 DNS servers. DNS is so critical for everything that the capability to probe more than one is not considered high availability. It is standard operation.

Agreed.... I think it's likely that the test scenario with DNS within the Docker network is not part of the standard tests.
What is the docker host in this case?
Do other apps accept DNS responses from another IP address?
 

Heracles

So, the bug is only when the secondary DNS address is within the same docker complex?
No, the bug is that TrueCommand tries only its first DNS and never probes its secondary DNS.

It's really a limitation of Docker networking and the lack of IP address transparency.

No. What makes MY primary DNS fail is a Docker networking problem. A primary DNS may fail for many reasons.... For TrueCommand, it does not matter WHY its primary DNS failed. Whenever the primary fails, it MUST probe its secondary DNS.

Do other apps accept DNS responses from another IP address?

Yes. From inside TrueCommand's container, when TrueCommand fails to probe its secondary DNS, OpenSSL succeeds in contacting my TrueNAS server by its name. It takes a few seconds for the primary DNS request to fail and the second to succeed, but after these few seconds, OpenSSL does connect to my server by its name when TrueCommand cannot.
 

morganL

No, the bug is that TrueCommand tries only its first DNS and never probes its secondary DNS.



No. What makes MY primary DNS fail is a Docker networking problem. A primary DNS may fail for many reasons.... For TrueCommand, it does not matter WHY its primary DNS failed. Whenever the primary fails, it MUST probe its secondary DNS.



Yes. From inside TrueCommand's container, when TrueCommand fails to probe its secondary DNS, OpenSSL succeeds in contacting my TrueNAS server by its name. It takes a few seconds for the primary DNS request to fail and the second to succeed, but after these few seconds, OpenSSL does connect to my server by its name when TrueCommand cannot.
Thanks.. my misunderstanding. The first DNS was failing due to Docker networking, but the second DNS failing is due to TrueCommand.

I assume that is the case for either host-based DNS information or container-based?
 

Heracles

Thanks.. my misunderstanding.

No problem.

but the second DNS failing is due to TrueCommand,

Indeed, because TrueCommand does not even try to use it.

I assume that is the case for either host-based DNS information or container-based?

Indeed, I do not see why TrueCommand would give its second DNS a shot for a different kind of failure on the primary. I did not try to push the debugging any further than that (a primary DNS completely down; a primary DNS's IP address up but daemon down; ICMP unreachable against UDP 53; damaged DNSSEC value; ...). There are many ways a DNS may fail, and I leave it to you to choose whether to test more failure scenarios or just ensure that every DNS will be probed (there can be more than 2) at least once before giving up on a query.
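The behaviour asked for here, probing every configured DNS at least once before giving up, can be sketched in a few lines of Python. This is a hedged illustration only, not TrueCommand's resolver: resolve_with_fallback and the caller-supplied query_one function are hypothetical names, and the actual DNS query is left to the caller.

```python
from typing import Callable, Iterable, Optional


def resolve_with_fallback(
    name: str,
    nameservers: Iterable[str],
    query_one: Callable[[str, str], str],
) -> str:
    """Ask each nameserver in order and return the first answer.

    query_one(name, server) is a caller-supplied function that returns an
    address or raises OSError on any failure (timeout, refused, bad reply).
    """
    last_error: Optional[Exception] = None
    for server in nameservers:
        try:
            return query_one(name, server)
        except OSError as exc:
            # Remember why this server failed, then try the next one.
            last_error = exc
    raise OSError(f"all nameservers failed for {name!r}") from last_error
```

The point of the design is that the reason for a failure does not matter: any error on one server moves on to the next, and the query is only abandoned once every server has been tried.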
 

ralphys

Cadet
Joined
May 10, 2022
Messages
1
Howdy all,

I'd like to know if there is someone else experiencing high CPU usage while running TrueCommand in docker.

I'm running TrueCommand v2.1.1 & middleware v2.1.1-20220329 in Ubuntu Server 20.04. I have 9 TrueNAS Core systems connected to TrueCommand in Portainer for testing. I have disabled backups, logs are retained for just a month and everything else is pretty much default config.

I've noticed that every 5 minutes or so there is quite a burst in CPU and I/O on the docker host. influxd seems to tax the box for about 30 seconds each time, over and over. TrueCommand is the only container behaving like this.

Screen Shot 2022-05-10 at 11.52.38 PM.png


Screen Shot 2022-05-10 at 11.53.14 PM.png


It also seems to be getting worse over time. Anyone could certainly tell from the screenshot below when TrueCommand was added to Portainer:


Screen Shot 2022-05-11 at 12.16.55 AM.png


I understand that screenshots do not provide troubleshooting logs, but they do give an idea of what I'm experiencing with TrueCommand.

Any suggestion is highly appreciated.
 