CPU stats not working in 12.0 U4

Status
Not open for further replies.

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I don't think that enabling SSH necessarily is a good idea. It increases the attack surface.
You don't expose your TrueNAS to any untrusted network, do you? SSH with public/private key authentication is probably the most secure remote administration technology existing. You can create a non-root user and use su as an additional step.
 

feanorian

Cadet
Joined
Dec 25, 2018
Messages
9
EDIT 2: Given some are having issues installing this, here are the steps:
  1. Unzip the archive and transfer main.7827c30e6733c013d061.js.patched to /usr/local/www/webui (this should be a BINARY transfer)
  2. Rename original file: mv main.7827c30e6733c013d061.js main.7827c30e6733c013d061.js.orig
  3. Rename patch: mv main.7827c30e6733c013d061.js.patched main.7827c30e6733c013d061.js
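The three rename steps above can be wrapped in a small POSIX sh sketch that stops before touching anything if a file is missing, and that keeps a way back. `swap_in_patch` and `revert_patch` are hypothetical helper names, not part of TrueNAS. The sketch assumes the `.patched` file has already been copied into the target directory.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers wrapping the manual rename steps.
# Assumes NAME.patched is already in DIR.

# swap_in_patch DIR NAME: keep NAME as NAME.orig, then move NAME.patched into place.
swap_in_patch() {
    dir=$1
    name=$2
    # Stop before renaming anything if a required file is missing.
    for f in "$dir/$name" "$dir/$name.patched"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    # Never overwrite an existing backup of the original bundle.
    if [ -e "$dir/$name.orig" ]; then
        echo "backup already exists: $dir/$name.orig" >&2
        return 1
    fi
    mv "$dir/$name" "$dir/$name.orig" && mv "$dir/$name.patched" "$dir/$name"
}

# revert_patch DIR NAME: restore the original bundle (the patched copy is overwritten).
revert_patch() {
    dir=$1
    name=$2
    if [ ! -f "$dir/$name.orig" ]; then
        echo "no backup found: $dir/$name.orig" >&2
        return 1
    fi
    mv "$dir/$name.orig" "$dir/$name"
}
```

Usage, as root: `swap_in_patch /usr/local/www/webui main.7827c30e6733c013d061.js`. Running `revert_patch` with the same arguments undoes the swap.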

Thank you for this. Note for everybody else: depending on how you uploaded the file, you may need to fix permissions by running chown root:wheel main.7827c30e6733c013d061.js and chmod 644 main.7827c30e6733c013d061.js with root privileges.
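The permission fix can also be verified after it is applied. Below is a minimal sketch in the same POSIX sh. `fix_bundle_perms` is a hypothetical helper. The owner defaults to root:wheel as suggested above, and that chown only succeeds when run as root.

```shell
#!/bin/sh
# fix_bundle_perms FILE [OWNER]: hypothetical helper that applies the ownership
# and mode from the post, then confirms the mode really is 644.
fix_bundle_perms() {
    file=$1
    owner=${2:-root:wheel}   # default from the post; needs root to apply
    chown "$owner" "$file" || return 1
    chmod 644 "$file" || return 1
    # find -perm with a plain octal mode matches only that exact mode.
    [ -n "$(find "$file" -perm 644)" ]
}
```

Usage: `fix_bundle_perms /usr/local/www/webui/main.7827c30e6733c013d061.js`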
 

Robertr

Dabbler
Joined
Sep 22, 2017
Messages
31
I uploaded the file via SSH from my Mac, renamed the files and fixed the permissions, but it's still not working, even after rebooting.
No biggie, but since the server is mostly idle, I often use the CPU stats on the dashboard to quickly see if something's going on.
Stats are working on the report page.

EDIT: Downloaded it straight to the folder with wget and unzipped it; permissions are correct... still not working.
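When the patch seems to have no effect, one thing worth ruling out is that the file on disk is not the patched one. This sketch makes two assumptions: you have unzipped a second, untouched copy of the patched file into a scratch directory (the `/tmp` path below is hypothetical), and the `.orig` backup from step 2 exists. `check_swap` is a hypothetical helper name.

```shell
#!/bin/sh
# check_swap INSTALLED BACKUP FRESH: hypothetical helper.
# INSTALLED is the bundle in the webui directory, BACKUP the .orig copy,
# FRESH a patched file extracted again from the archive.
check_swap() {
    installed=$1
    backup=$2
    fresh=$3
    # cmp -s prints nothing and exits 0 only when the files are byte-identical.
    if cmp -s "$installed" "$backup"; then
        echo "installed bundle is still the original" >&2
        return 1
    fi
    if ! cmp -s "$installed" "$fresh"; then
        echo "installed bundle differs from the patched copy" >&2
        return 1
    fi
    echo "patched bundle is in place"
}
```

Usage: `check_swap /usr/local/www/webui/main.7827c30e6733c013d061.js /usr/local/www/webui/main.7827c30e6733c013d061.js.orig /tmp/main.7827c30e6733c013d061.js.patched`. If this passes, the file on disk is not the problem.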

adrianwi

Guru
Joined
Oct 15, 2013
Messages
1,231
My understanding is that one bug fix caused a secondary unexpected impact.
Most testing is done at the API level and the storage protocol level...we automate as much as possible. These bugs cause real problems.
UI tests for "visible" issues are more difficult to automate, and we rely on the community. These bugs are annoying.
Unusual hardware issues are very dependent on the user community. These bugs impact most users very rarely.
So it didn't get past "testing" as a whole; it got past "automated testing" and was caught in "community testing".
Thanks to the community, we now have a temporary fix, and a permanent fix is in 12.0-U5.
The more serious issue of python-middleware crashes (which occur very rarely) is also fixed in 12.0-U5.
That was another issue where automated testing did not identify the problem; only the community did.

Thanks for the detailed reply. I understand that most of the testing must be automated, but surely there's a small degree of user testing?

Does the login page load?
Can you login?
Does the dashboard load?
etc...
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I feel like this is what I was experiencing during a disk replacement operation, that ended in the loss of a pool. For details, refer to the thread Pool restoration journey TrueNAS 12. The python middleware crashes were reproducible. While I can't attribute the pool loss directly to these crashes, I'll be feeling very vulnerable replacing a disk until 12.0-U5 is released. I do have one more SMR disk to be swapped out. Is there an ETA for this release?
12.0-U4 is expected to have fewer problems (mitigates the issues)
However, 12.0-U5 seems to have resolved the primary issue.
There is a special test image available for testing: 12.0-U4 + bug fix. We appreciate the brave souls who want to test it.
12.0-U5 is scheduled for August.
I'd expect the SMR drives were the cause of the pool loss... that is why we reacted to them.
The middleware might crash during the event due to the issues caused by SMR, but that should not impact the basic ZFS functions.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Thanks for the detailed reply. I understand that most of the testing must be automated, but surely there's a small degree of user testing?
Of course, but testing with all combinations of live traffic is very time consuming. As indicated, I think this was created by another bug fix and so didn't get the full testing. Even so, it's very difficult to catch these without a major expansion of the test window. It's better for the community that we get 12.0-U4 out and fix the issue with a patch (available) and in 12.0-U5.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
It's better for the community that we get 12.0-U4 out and fix the issue with a patch (available) and in 12.0-U5.
Would you consider pushing a U4.1 out the door with just that single fix? That would definitely make the product look better if shown to team leaders etc.
 

revengineer

Contributor
Joined
Oct 27, 2019
Messages
193
IMHO we should encourage people to learn how to use Putty and WinSCP or the Windows Powershell or mac OS terminal equivalents exclusively.
I do use WinSCP for transfers, and it does have a pull-down menu for transfer settings (Default, ASCII, Binary). Perhaps it is unnecessary, or ignored, for some protocols. I never tested this, so maybe it is just a leftover habit from the good old times. :smile:
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I do use WinSCP for transfers and it does have a pull down menu for transfer settings (Default, ASCII, Binary)
WinSCP to my knowledge also supports FTP, which would require the setting. A copy via SSH is always binary.
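Whatever the client and transfer mode, one way to confirm that a copy left the bytes untouched is to compare SHA-256 digests on both ends. `sha256_of` is a hypothetical helper that picks whichever tool the system has. On Windows, PowerShell's `Get-FileHash -Algorithm SHA256` gives the same digest for comparison.

```shell
#!/bin/sh
# sha256_of FILE: print only the hex SHA-256 digest, using whichever tool exists:
# sha256sum (Linux), sha256 (FreeBSD, e.g. TrueNAS CORE), shasum (macOS).
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        # Reading stdin keeps the filename (and any escaping of it) out of the output.
        sha256sum < "$1" | cut -d ' ' -f 1
    elif command -v sha256 >/dev/null 2>&1; then
        sha256 -q "$1"
    elif command -v shasum >/dev/null 2>&1; then
        shasum -a 256 < "$1" | cut -d ' ' -f 1
    else
        echo "no SHA-256 tool found" >&2
        return 1
    fi
}
```

Run it on the source machine and on the NAS. If the two digests differ, the transfer altered the file.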
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Would you consider pushing a U4.1 out the door with just that single fix? That would definitely make the product look better if shown to team leaders etc.

Hi Patrick, for the two bugs discussed, we have made available patches or versions with the fixes.
Our general policy is that we would do a U4.1 release if these were bugs that impacted data reliability or system availability in a significant way. Neither bug meets those criteria. So we opted for just making the patched versions available.

As of now, the version which fixes the middleware crashing bug (the major one) has not been validated sufficiently in the wild, so there is still some risk that it either doesn't fix the major issue or causes another one. For anyone who has seen this issue, we would like you to test with it.

You might ask why we have that policy. Each release or sub-release requires about 2 weeks of intensive testing and consumes resources from 10 to 20 people plus a large automated lab. That effort detracts from further bug fixes (e.g. U5 has 150 tickets) and from feature development and testing (e.g. TrueNAS SCALE 21.06 and 21.08). Doing a U4.1 would have pushed SCALE 21.06 out by 2 weeks. So our goal is to maximize productivity and grow our business, while keeping community data safe.

If possible, we would bring the 12.0-U5 date forward... 45 issues are complete, but another 100 remain. Code freeze is mid-July, and then the 2-week test cycle starts.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Hi Patrick, for the two bugs discussed, we have made available patches or versions with the fixes.
I didn't know that. How do I install these fixes the official way? All I know is this forum thread with a member kindly supplying a fixed .js file for the dashboard bug.

Thanks!
Patrick
 

revengineer

Contributor
Joined
Oct 27, 2019
Messages
193
How do I install these fixes the official way? All I know is this forum thread with a member kindly supplying a fixed .js file for the dashboard bug.
Agreed, official patches in these situations would be helpful. The code posted on GitHub and linked to my Jira ticket is not immediately useful because it needs to be compiled somehow, and nobody knows how to do this. I asked about this in another thread and all I heard was crickets.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Of course, but testing with all combinations of live traffic is very time consuming. As indicated, I think this was created by another bug fix and so didn't get the full testing. Even so, it's very difficult to catch these without a major expansion of the test window. It's better for the community that we get 12.0-U4 out and fix the issue with a patch (available) and in 12.0-U5.
Thanks for the explanations… but it feels strange to think that testing is so "automated" that it could let slip a bug which would have been caught by any human tester looking at the front page of the user interface. Not exactly hidden or hard to spot…

On the last post, I'd like to disagree: better to wait and ship a product which has been checked by actually using it than to push a release to meet an arbitrary schedule. And such a visually annoying bug should be fixed as soon as possible; as @Patrick M. Hausen pointed out, it affects how TrueNAS looks, and thus the prospects of attracting further users/customers. SCALE 21.06 can wait, especially if it follows the same scheme, and the moment to jump on SCALE 21.x, confident that it is a mature platform, is when SCALE 22 ships.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I didn't know that. How do I install these fixes the official way? All I know is this forum thread with a member kindly supplying a fixed .js file for the dashboard bug.

The official way is 12.0-U5 (not the answer you want... but it's the only safe answer).
The github with changes is here: https://github.com/truenas/webui/pull/5536

@revengineer seems to have posted a process that works. Does it not work? We don't recommend people do this on any production systems, but we appreciate people testing these on dev/test systems.

As indicated, it's not the major bug we would fix with a U4.1. The middleware crashing bug is higher priority. There are many other small issues.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
@morganL Why are you minifying the JS at all? Is it that performance critical? Shouldn't enabling compression at the web server level be enough? Not minifying would make manual patching way easier ...
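One rough way to weigh this question is to measure how much gzip already shrinks a bundle. Run on a minified and an unminified build of the same code (if both are at hand), it shows what minification adds on top of compression. `compare_sizes` is a hypothetical helper. `gzip -9` only approximates whatever compression the web server would actually be configured to use.

```shell
#!/bin/sh
# compare_sizes FILE...: print raw and gzip -9 compressed byte counts per file.
compare_sizes() {
    for f in "$@"; do
        # tr strips the padding that BSD wc adds to its count.
        raw=$(wc -c < "$f" | tr -d ' ')
        gz=$(gzip -9 -c "$f" | wc -c | tr -d ' ')
        printf '%s raw=%s gzip=%s\n' "$f" "$raw" "$gz"
    done
}
```

Usage: `compare_sizes /usr/local/www/webui/main.7827c30e6733c013d061.js`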
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Thanks for the explanations… but it feels strange to think that testing is so "automated" that it could let slip a bug which would have been caught by any human tester looking at the front page of the user interface. Not exactly hidden or hard to spot…

On the last post, I'd like to disagree: better to wait and ship a product which has been checked by actually using it than to push a release to meet an arbitrary schedule. And such a visually annoying bug should be fixed as soon as possible; as @Patrick M. Hausen pointed out, it affects how TrueNAS looks, and thus the prospects of attracting further users/customers. SCALE 21.06 can wait, especially if it follows the same scheme, and the moment to jump on SCALE 21.x, confident that it is a mature platform, is when SCALE 22 ships.

Hello @Etorix , We're always looking for volunteers to be human testers....
However, you are mistaken that it wasn't tested. It was tested, but the bug was introduced after it was tested.

We agree visual bugs are annoying, but it's a pretty good situation if this is perceived as the worst bug of 12.0-U4 and there is a mechanism to fix it temporarily and it will be fixed in 12.0-U5. No data is at risk. For people who are happy to wait, they should wait for 12.0-U5. However, 12.0-U4 had over 100 other improvements, so most people will benefit from it.

You may not be using SCALE... the thousands that are using SCALE are very keen to get to 21.06 BETA next week. The only way we get to quality is by getting through each of the stages and keeping to a reasonable schedule for the community using it. The bug fixes and new features going into SCALE 21.06 are much more significant.

Rest assured, we do take data safety and access very seriously. However, for other issues, we have to be pragmatic and efficient. We have to keep our paying customers happy to pay for the development and test team. Progress and quality are important.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
@morganL Why are you minifying the JS at all? Is it that performance critical? Shouldn't enabling compression at the web server level be enough? Not minifying would make manual patching way easier ...

It's an interesting question for @Kris Moore.
I would say that simplifying manual patching is not a priority... it is only used for development and emergencies.
 

gunnahafta

Dabbler
Joined
Nov 5, 2018
Messages
32
Thanks. Confirmed the patched file from @revengineer works, without a reboot.

Normally this wouldn't be a huge deal, but I've been having CPU temperature issues and just overhauled the cooling, so I really needed easy access to those stats.
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
I feel like this is what I was experiencing during a disk replacement operation, that ended in the loss of a pool. For details, refer to the thread Pool restoration journey TrueNAS 12. The python middleware crashes were reproducible. While I can't attribute the pool loss directly to these crashes, I'll be feeling very vulnerable replacing a disk until 12.0-U5 is released. I do have one more SMR disk to be swapped out. Is there an ETA for this release?
STH (ServeTheHome) did extensive testing of SMR and ZFS. The fact that rebuilds took forever should have given anyone pause. Given the extra stress put on the drives during a ZFS rebuild, AND the fact that SMR drives then totally thrash themselves during a rebuild because they have to go back and rewrite overlapping tracks once the rebuild starts, SMR drives should, IMO, be detected and HIGHLY warned against in the UI, if not blacklisted (I know, it's a user choice thing). I would highly suggest you do NOT install any more SMR drives. I would actually replace the drives with CMR drives. If your pool didn't have a backup, this is the perfect time to make one.
 