multi_report.sh version for Core and Scale 3.0

joeschmuck · Jan 21, 2023

NickF said:
I have a fairly complex setup for my system. I just ran an output.

Yes, it looks very complex. If you don't mind sharing your email address with me (privately, not openly here) so I can see all the data in the correct format along with the drive data, I should be able to fix whatever is going on. Your setup intrigues me. What you would need to do is run the script with the -dump email switches; a question will pop up asking if you are sure, and you just answer it. An email will be generated to you and me. As I have said before, I do not share email addresses with anyone. Without all the data, it would take a long time to guess how to fix it all. If you want to see what would be sent to me beforehand, run -dump; this is the data I would be getting, nothing more, nothing less. I will know your email address as the one used in the TO: line of the email, and mine will be there as well. I can email you with any future correspondence. If you prefer not, just let me know and we can do a PM here in the forums.

NickF said:
I'm also not entirely sure what the "Wear Level counter" indicator means.

It means that when you drop below a 10% wear level value (90% of your SSD life has been used up), you get a warning message.
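A minimal sketch of that rule, assuming the wear level is a whole-number percentage of remaining life. The function name wear_level_status and the variable WEAR_WARN_PERCENT are hypothetical and are not taken from multi_report.sh; the actual script may compute and compare the value differently.

```shell
#!/usr/bin/env bash
# Hypothetical threshold name; 10 matches the "below 10%" rule described above.
WEAR_WARN_PERCENT=10

# Print WARNING when the remaining-life percentage is below the threshold,
# otherwise OK. Input is assumed to be an integer (100 = new, 0 = worn out).
wear_level_status() {
  local remaining="$1"
  if [ "$remaining" -lt "$WEAR_WARN_PERCENT" ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

wear_level_status 95   # healthy drive
wear_level_status 8    # 92% of the rated life used
```

With this sketch a drive at exactly 10% remaining still reports OK, because the warning only starts below the threshold.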

awasb · Jan 22, 2023

joeschmuck said:
Thanks for the heads up. I just checked the script, and of course I didn't check for any fragmentation above 5%. The Hex math got me yet again. We now convert it to decimal (base 10) and things are working again, I hope for everyone. I've updated the file, please download it again. Version is now 2.0.1 and includes the fix. Hope this works for you. Please let us know.

Works as intended! Thanks for the fix!

joeschmuck · Jan 22, 2023

awasb said:
Works as intended! Thanks for the fix!

Thanks for the feedback. I have two more problems to fix related to Wear Level, so another update will be coming out shortly. Hopefully I don't break something in the process.

joeschmuck · Jan 22, 2023

Version 2.0.2 fixes the two issues found since v2.0 was released yesterday.

I hope nothing else pops up but if it does, just let me know.

NickF · Jan 22, 2023

joeschmuck said:
Version 2.0.2 fixes the two issues found since v2.0 was released yesterday.

I hope nothing else pops up but if it does, just let me know.

Thanks Joe,
Confirmed on my end. The issues I reported yesterday are fixed.

awasb · Jan 22, 2023

Ditto!

da_da · Jan 22, 2023

Does the download/attachment need to be corrected? I cannot seem to get the latest version.

joeschmuck · Jan 22, 2023

da_da said:
Does the download/attachment need to be corrected? I cannot seem to get the latest version.

What is wrong with this copy (4 posts above)? Please be specific because I downloaded it and it seems correct.

Deeda · Jan 23, 2023

joeschmuck said:
Version 2.0.2 fixes the two issues found since v2.0 was released yesterday.

I hope nothing else pops up but if it does, just let me know.

Thanks for the update!

I just updated one of my TrueNAS servers that has the issue of incorrectly reporting the "Last Test Age". Unfortunately, that error still seems to persist. For example, as you can see from the attachments, it tells me that drive da0 has a last test age of 610, but the -dump output shows that da0 has completed short and long tests.

joeschmuck · Jan 23, 2023

@Deeda Sent you a PM.

GrimmReaperNL · Jan 27, 2023

Hello joeschmuck, thanks for the script. It was the only one I could find that would run on Scale. I have a question though.
In the 'ZPool/ZFS Status Report Summary' it lists my boot-pool and my only other pool, 'TrueNAS', and it says I've never run a scrub on 'TrueNAS'.
I do have scrubs set up for that pool, though. I can't even add a scrub task for the boot-pool.
Is something wrong, or could this be because a scrub hasn't been run since I only just started using the script?

joeschmuck · Jan 27, 2023

GrimmReaperNL said:
Thanks for the script. It was the only one I could find that would run on Scale.

Thanks for that. I'm working on a rewrite right now. It should be functionally identical, but I hope it runs more reliably. BASH is not my friend.

GrimmReaperNL said:
and it says I've never run a scrub on 'TrueNAS'

The way to verify it's not the script is to run the command zpool status

The line(s) you are looking for specifically will look like this: (look for "scan:")
scan: scrub repaired 0B in 06:14:53 with 0 errors on Sun Jan 1 06:14:53 2023

The very end tells you the last time it was scrubbed.
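As a small illustration of reading that line, here is a sketch that pulls the timestamp off the end of a "scan:" line. The sample line is the one shown above; the helper name last_scan_date is hypothetical, and this is not necessarily how multi_report.sh parses it. On a live system the line would come from the zpool status output instead of a fixed string.

```shell
#!/usr/bin/env bash
# Sample line copied from the example above.
scan_line='scan: scrub repaired 0B in 06:14:53 with 0 errors on Sun Jan 1 06:14:53 2023'

# Print everything after the last " on ", which is the completion timestamp.
# Uses bash parameter expansion, so no external tools are needed.
last_scan_date() {
  printf '%s\n' "${1##* on }"
}

last_scan_date "$scan_line"
```

For the sample line this prints the trailing date, Sun Jan 1 06:14:53 2023.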

If you think the script is wrong, please send me a private message (Conversation) and we can troubleshoot the issue. While I don't think the script is at fault, it could be. There seem to be all sorts of oddities.

NugentS · Jan 27, 2023

@GrimmReaperNL In System Settings/Boot - Stats/Settings (top right) you can set a scrub interval for the boot pool

GrimmReaperNL · Jan 27, 2023

joeschmuck said:
The way to verify it's not the script is to run the command zpool status

The line(s) you are looking for specifically will look like this: (look for "scan:")
scan: scrub repaired 0B in 06:14:53 with 0 errors on Sun Jan 1 06:14:53 2023

The very end tells you the last time it was scrubbed.

For the 'TrueNAS' pool the line looks like:
scan: resilvered 5.51T in 1 days 01:35:46 with 0 errors on Thu Jan 19 21:32:31 2023
So I guess I'll wait for my next scheduled scrub and see if that changes anything.

Boot-pool reads like your example.

Thanks for the quick reply!

joeschmuck · Jan 27, 2023

GrimmReaperNL said:
So I guess I'll wait for my next scheduled scrub and see if that changes anything

I sent you a PM. If we can share some information, I can see whether it's a problem and, if so, start on fixing it.

NickF · Jan 27, 2023

Hi joe,
Just a thought as you continue to improve this script, it would be neat if you could somehow query the statistical information that IX pulls into the graphs and display that same information somewhere in the report. Like CPU used percentage on average for the day, ARC hit rates for the day, disk busy for the day, etc.

I’ll mess around and see where that stuff is stored later and see if I could help

joeschmuck · Jan 27, 2023

NickF said:
Hi joe,
Just a thought as you continue to improve this script, it would be neat if you could somehow query the statistical information that IX pulls into the graphs and display that same information somewhere in the report. Like CPU used percentage on average for the day, ARC hit rates for the day, disk busy for the day, etc.

I’ll mess around and see where that stuff is stored later and see if I could help

Not trying to be mean, but nope. The main focus of this script is hard drive awareness. A separate script would be the best way to poll the other requested data. I've been asked to include UPS data. I'm not sure if I will, even though it's an easy thing to add. Here is my argument: what will a person learn from the UPS data? It's not like the script will magically run and send an email when the system is on battery, and TrueNAS already does this. I don't mind the suggestions, but there is a point where I'd like to remain. If it gets too big, someone is going to need to pay me a salary.

joeschmuck · Jan 27, 2023

GrimmReaperNL said:
For the 'TrueNAS' pool the line looks like:
scan: resilvered 5.51T in 1 days 01:35:46 with 0 errors on Thu Jan 19 21:32:31 2023
So I guess I'll wait for my next scheduled scrub and see if that changes anything.

Boot-pool reads like your example.

Thanks for the quick reply!

Sent you an update, but I'm not convinced the current script is wrong. Your pool was last 'resilvered', not 'scrubbed', and those are not really the same thing. I did add 'resilvering' as an option now, but normal indications will return after the next scrub.
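To make the scrub versus resilver distinction concrete, here is a rough sketch that classifies a "scan:" line by its wording. Both sample lines are copied from this thread; the helper name scan_kind is hypothetical and this is not the code in multi_report.sh. Other "scan:" wordings (for example a scrub still in progress) simply fall through to "unknown" here.

```shell
#!/usr/bin/env bash
# scan_kind is a hypothetical helper name, not a function from multi_report.sh.
# It reports which operation a zpool status "scan:" line describes.
scan_kind() {
  case "$1" in
    *"scan: scrub repaired"*) echo "scrub" ;;
    *"scan: resilvered"*)     echo "resilver" ;;
    *)                        echo "unknown" ;;
  esac
}

# Both sample lines are copied from this thread.
scan_kind 'scan: scrub repaired 0B in 06:14:53 with 0 errors on Sun Jan 1 06:14:53 2023'
scan_kind 'scan: resilvered 5.51T in 1 days 01:35:46 with 0 errors on Thu Jan 19 21:32:31 2023'
```

A report that only looks for the scrub wording would show no scrub date for the second line, which matches what was seen on the 'TrueNAS' pool.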

joeschmuck · Feb 1, 2023

Here is Multi-Report v2.0.6. I've made some minor adjustments for a few people so their reports work better; the bulk of the people will not notice a difference. I also reduced data collection when the -dump parameter is used, and started collecting 'zpool' status data in case someone has a script issue with Zpool data for me to troubleshoot.

Right now I'm looking for some additional data for my script simulator so I can take into account various drives (HDD, SSD, NVMe). The HDDs I'm looking for in particular right now are those with Helium and those that are SCSI. These add a level of complexity to decoding. But I still need SSDs and NVMe drives. I have some data but not nearly enough. I can feed hundreds of drive data files into the simulator now and generate charts and reports, but that is in version 2.1, which is under heavy development. And I'm attempting to make the accuracy of the data 100% perfect, if possible.

If you have sent me any -dump email data in the past week, you do not need to resend it. I have what I need and I thank you for the data, it has really helped me. Hopefully I will see more data flow in over the next week.

Anyway, if you find anything wrong, please say something.

-Joe

awasb · Feb 1, 2023

Works fine over here. Thank you, Joe!
