@Basil Hendroff: In answer to your questions (I have numbered them below):
1. Multiple TC instances/versions using the same database
Answer: No, that is not recommended. I have seen a couple of instances of database corruption when I start up multiple containers using the same data directory. I think it is related to how Docker does the directory->container passthrough, but I am not sure about the exact cause.
2. Rolling a 1.3 TC container back to 1.2
Answer: This is actually possible, and I do this regularly when testing out nightly/production versions of TrueCommand locally. We are very careful not to introduce breaking changes in the TC database layout, specifically so that it stays backwards compatible as much as possible. I would not assume or rely on this, though. We can never predict the future, and a breaking change might be needed in some later version.
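Since that compatibility is not guaranteed, one cautious habit is to copy the data directory before switching versions, with the container stopped. Here is a minimal sketch of that idea in Python. The function name and the paths in the usage note are hypothetical placeholders, not anything TrueCommand ships or requires.

```python
import shutil
import time
from pathlib import Path


def backup_data_dir(data_dir: str, backup_root: str) -> Path:
    """Copy data_dir into a new timestamped folder under backup_root.

    Hypothetical helper for illustration only. Run it while the
    container is stopped so the copy is not taken mid-write.
    """
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"data directory not found: {src}")

    # Timestamped name, so repeated backups do not overwrite each other.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"

    # copytree creates dest (and missing parents) and fails if dest exists.
    shutil.copytree(src, dest)
    return dest
```

For example, `backup_data_dir("/path/to/tc-data", "/path/to/backups")` (both paths are placeholders) returns the location of the new copy, which you could put back in place if a rollback does not go well.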
3. ECC RAM alert
Answer: This is an interesting possibility, so I went ahead and created a ticket for this request here:
https://jira.ixsystems.com/browse/TC-1477
We will talk about this internally and see if this is something we can put together for you in a future version.
4. Memory Statistics
Answer: This is a very tricky topic, as you can tell from all the different ways memory usage gets reported across platforms. Basically it comes down to the question "What are you using your system for?" If you are using it purely as a file server, then you typically want as much of your memory as possible in use at all times (preferably by the ZFS cache, with a high cache hit rate). If you are using it as a VM/application platform, or some kind of hybrid mix, then that picture gets muddied, and people tend to get alarmed over memory stats which are not actually a problem.
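To make "cache hit rate" concrete: it is just hits divided by total lookups. Here is a minimal Python sketch, assuming you already have cumulative hit and miss counters from your platform's cache statistics. The function names are mine for illustration, and this is not how TrueCommand computes anything.

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Return the fraction of lookups served from cache, between 0.0 and 1.0."""
    if hits < 0 or misses < 0:
        raise ValueError("counters must be non-negative")
    total = hits + misses
    if total == 0:
        return 0.0  # no lookups yet, so there is no meaningful rate
    return hits / total


def interval_hit_rate(prev: tuple[int, int], curr: tuple[int, int]) -> float:
    """Hit rate between two (hits, misses) samples of cumulative counters.

    Counters that only grow since boot hide recent behavior, so the
    difference between two samples is usually the more useful number.
    """
    return cache_hit_rate(curr[0] - prev[0], curr[1] - prev[1])
```

For instance, `cache_hit_rate(950, 50)` gives 0.95. On a pure file server, a high value here alongside "nearly all memory used" is the healthy case described above, not a warning sign.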
What we have found so far with TrueCommand is that the memory statistics on the dashboard typically raised more questions than they answered, because those statistics do not easily translate into actionable information (aside from the total memory size, which you referenced earlier). After working with experienced sysadmins and IT teams, we decided to shift the dashboard metrics to a layered framework based on priorities:
1. Multi-system dashboard card: Show the top-priority information for the system - things that can result in production-down situations or degraded performance.
2. Single-system dashboard: Open up the next layer of the system metrics for enhanced diagnostics. (Example: Is the high CPU utilization a temporary state, or has it been this way for a while?)
3. Detailed system analysis via reports (typically as the result of an alert): Basically inspect any/all of the metrics about the system surrounding the time of the alert in order to gain insight into the causes/solutions for the issue.
We found that the memory statistics were more often used as part of the system analysis or post-alert inspection than as a source of top-level information for system admins, so we dropped them from the dashboard cards in 1.3 to avoid confusion. I do think a case could be made for adding them back into the expanded dashboard metrics (probably as another time chart, similar to the storage growth or usage charts), so I went ahead and made an improvement ticket to track this change for a future version of TrueCommand:
https://jira.ixsystems.com/browse/TC-1478
Sorry for the long reply, but I hope this answered all your questions!