
TrueCommand 1.3 is Available

Basil Hendroff

Neophyte Sage
Joined
Jan 4, 2014
Messages
1,253
The SMR drive alerts were much easier to do on TrueCommand. It's got a more powerful 2-factor alerting engine, and we wanted to make it available ASAP and to users on previous software releases, not just 12.0. If you only have one system, there is the option of manually checking the drives.
According to this StackExchange thread, dmidecode -t 17 can be used to determine whether or not ECC RAM is being used on a FreeBSD system. In the same way that TC 1.3 is able to alert the end-user about SMR drives, I wonder if it's useful for a future version of TC (or TrueNAS) to alert the end-user that non-ECC RAM is being used in a system? Like an SMR drive, there's nothing to prevent its use, but it's strongly discouraged. Of course, the CPU may not support ECC RAM either, but at least there's an alert that FreeNAS/TrueNAS is not being run in a recommended setting.
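For anyone who wants to try the manual check in the meantime, here is a rough Python sketch of how the dmidecode -t 17 output could be interpreted. It leans on a commonly cited heuristic that I'm assuming applies here: a module whose "Total Width" is larger than its "Data Width" (typically 72 vs 64 bits) carries ECC check bits. The function names are made up for illustration, and a positive result only suggests ECC modules are installed, not that the CPU and board are actually using ECC.

```python
import re
import subprocess

# Matches lines such as "\tTotal Width: 72 bits" or "\tData Width: 64 bits".
WIDTH_RE = re.compile(r"^\s*(Total|Data) Width:\s*(\d+) bits", re.MULTILINE)


def module_widths(dmidecode_text):
    """Return a (total_width, data_width) pair for each populated memory slot."""
    modules = []
    # dmidecode separates each reported structure with a blank line.
    for block in re.split(r"\n\s*\n", dmidecode_text):
        widths = {kind: int(bits) for kind, bits in WIDTH_RE.findall(block)}
        # Empty slots report "Unknown" widths, so they never match and are skipped.
        if "Total" in widths and "Data" in widths:
            modules.append((widths["Total"], widths["Data"]))
    return modules


def looks_like_ecc(dmidecode_text):
    """Heuristic: True if every populated module has extra check bits,
    False if any module does not, None if no populated module was found."""
    modules = module_widths(dmidecode_text)
    if not modules:
        return None
    return all(total > data for total, data in modules)


def check_this_host():
    """Run dmidecode on the local machine (needs root and dmidecode installed)."""
    result = subprocess.run(
        ["dmidecode", "-t", "17"], capture_output=True, text=True, check=True
    )
    return looks_like_ecc(result.stdout)
```

Calling check_this_host() as root on the NAS would return True, False or None as described in the docstrings. An alert built on this idea would presumably also need to handle systems where dmidecode reports nothing useful.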
 

Basil Hendroff

Neophyte Sage
Joined
Jan 4, 2014
Messages
1,253
I suspect this is a valid arrangement with multiple management terminals...

screenshot.364.png


...but this is not...

screenshot.361.png


My question is... does upgrading to a newer version of TC result in changes to the TC database so that a rollback to an earlier version of TC is no longer possible?
 

kenmoore

TrueCommand Project Lead
iXsystems
Joined
May 1, 2019
Messages
45
@Basil Hendroff : In answer to your questions (I will tag them with numbers below)

1. Multiple TC instances/versions using the same database
Answer: No, that is not recommended. I have seen a couple of instances of database corruption when I start up multiple containers using the same data directory. I think it is something related to how docker does the directory->container passthrough, but I am not sure about the exact cause.

2. Rolling a 1.3 TC container back to 1.2
Answer: This is actually possible, and I do this regularly when testing out nightly/production versions of TrueCommand locally. We are very careful to not institute breaking changes in the TC database layout, specifically so that it is backwards compatible as much as possible. I would not assume or rely on this, though: we can never predict the future, and a breaking change might be needed in some later version.

3. ECC RAM alert
Answer: This is an interesting possibility, so I went ahead and created a ticket for this request here: https://jira.ixsystems.com/browse/TC-1477
We will talk about this internally and see if this is something we can put together for you in a future version.

4. Memory Statistics
Answer: This is a very tricky topic, as you can tell by all the various ways memory usage gets reported across different platforms. Basically it comes down to the question "What are you using your system for?". If you are using it purely as a file server, then you typically want as much of your memory used at all times (preferably by the ZFS cache, with a high cache hit rate). If you are using it as a VM/application platform, or some kind of hybrid mix, then that picture gets all muddied and people have a tendency to get alarmed over memory stats which are not actually a problem.

What we have found so far with TrueCommand is that the memory statistics on the dashboard typically caused more questions for people than answers, because those memory statistics do not easily translate into actionable information (aside from the total memory size, which you referenced earlier). After working with experienced sysadmins and IT teams, we decided to shift the dashboard metrics to a layered framework based on priorities:
1. Multi-system dashboard card: Show the top-priority information for the system - things that can result in production down situations or degraded performance.
2. Single-system dashboard: Open up the next layer of the system metrics for enhanced diagnostics (Example: Is the high CPU utilization a temporary state, or has it been this way for a while?)
3. Detailed system analysis via reports (typically as the result of an alert): Basically inspect any/all of the metrics about the system surrounding the time of the alert in order to gain insight into the causes/solutions for the issue.

We found that the memory statistics were more often used as part of the system analysis or post-alert inspection rather than a good source of top-level information for system admins, so we dropped them from the dashboard cards in 1.3 to prevent them from causing confusion. I do think a case could be made to add them back into the expanded dashboard metrics (probably as another time chart, similar to the storage growth or usage charts), so I went ahead and made an improvement ticket to track this change for a future version of TrueCommand: https://jira.ixsystems.com/browse/TC-1478

Sorry for the long reply, but I hope this answered all your questions!
 

Basil Hendroff

Neophyte Sage
Joined
Jan 4, 2014
Messages
1,253
@kenmoore Thank you for your considered response. You've clarified a number of points for me and given me some insight into the design of TC 1.3. Thank you also for considering my suggestions and for raising the relevant ticket requests.

Basically it comes down to the question "What are you using your system for?". If you are using it purely as a file server, then you typically want as much of your memory used at all times (preferably by the ZFS cache, with a high cache hit rate).
Thanks for this useful tip! I'll bear this in mind.

When I first saw the colour scheme for TC 1.3, like @Patrick M. Hausen in post #3, my first thought was 'WTF!', but I now understand the clever use of subdued colours. Using brighter colours for issues makes it visually very easy to identify a server experiencing problems within a cluster of servers.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
252
According to this StackExchange thread, dmidecode -t 17 can be used to determine whether or not ECC RAM is being used on a FreeBSD system. In the same way that TC 1.3 is able to alert the end-user about SMR drives, I wonder if it's useful for a future version of TC (or TrueNAS) to alert the end-user that non-ECC RAM is being used in a system? Like an SMR drive, there's nothing to prevent its use, but it's strongly discouraged. Of course, the CPU may not support ECC RAM either, but at least there's an alert that FreeNAS/TrueNAS is not being run in a recommended setting.
An SMR drive issue could cause a VDEV and pool failure and loss of a lot of data... it could be very catastrophic and predictable. We built the logic so that we can in future flag any other bad drive issues.

An ECC RAM issue is very rare on small systems and may cause a file corruption or a system reboot. It can waste a lot of time diagnosing the issue and so we recommend and support systems with ECC. If you have this issue, let us know and make the suggestion. There are a lot of issues that we would like to detect early, but we are prioritizing based on events we see in deployed systems.
 

Basil Hendroff

Neophyte Sage
Joined
Jan 4, 2014
Messages
1,253
An ECC RAM issue is very rare on small systems and may cause a file corruption or a system reboot.
I believe you meant to say 'A non-ECC RAM issue is very rare on small systems and may cause a file corruption or a system reboot.'

It can waste a lot of time diagnosing the issue and so we recommend and support systems with ECC.
I wholeheartedly agree and herein lies the problem for the community. Many novices begin their journey with FreeNAS on h/w that isn't server-grade. While experimenting with FreeNAS, that's okay. The issues start when they begin to depend on that h/w to run FreeNAS. It's a potential showstopper for forum support. Consider the following conversation:

'FreeNAS has corrupted my file!' The conversation goes one of two ways now... 'It's possible that your use of non-ECC RAM has caused the corruption so I'm not going to waste my time diagnosing the issue any further until you address this.' or... 'I see you're using ECC RAM. Let's investigate further.'

There are a lot of issues that we would like to detect early, but we are prioritizing based on events we see in deployed systems.
The SMR drive issue caught the ZFS community by surprise and iXsystems are to be commended for making it a priority to detect SMR drives in deployed TrueNAS systems. It would indeed be surprising if deployed systems that iXsystems were involved with used anything but ECC RAM. From this perspective, early detection of non-ECC RAM use would not even get a look in. However, it's not surprising to see FreeNAS community builds on non-server grade hardware.

The ability of TC 1.3 to detect SMR drives got me thinking, from a community perspective, that the ability to warn the user that non-server grade components (such as non-ECC RAM) were in use might be an interesting consideration for TC (or TrueNAS Core) in the future.
 

Basil Hendroff

Neophyte Sage
Joined
Jan 4, 2014
Messages
1,253
I thought it might be a theme thing, and I thought I remembered a theme setting, but in a little bit of clicking around and checking the manual (which has yet to be updated, BTW) I didn't find it.
The colour palette is in the top left-hand corner.
screenshot.378.png
 

KevDog

Senior Member
Joined
Nov 26, 2016
Messages
401
This is useful. TC 1.3 picked up that I had an SMR drive on one of my servers.

Disk is nowhere to be seen on the Resources % tile on any of my servers. Also, are flat lines expected on an active system because that's what I'm observing?

View attachment 39953

On all systems with SMB and NFS active, the Clients tile shows SMB zero and NFS is nowhere to be seen.

View attachment 39954
Just scratching my head -- based on what you posted with your graphs? How did you conclude you had an SMR drive?
 

Basil Hendroff

Neophyte Sage
Joined
Jan 4, 2014
Messages
1,253
Just scratching my head -- based on what you posted with your graphs? How did you conclude you had an SMR drive?
Not from the graphs, but from an alert. The ability to detect SMR drives appears to be built into TC.
screenshot.824.png
 
