Citadel - Build Plan and Log

ctag

Patron
Joined
Jun 16, 2017
Messages
225
I can't seem to find an offline backup solution I actually like, so I'm just rsyncing to an ext4 partition on the drive and leaving it at that.

Code:
$ rsync -avz bns-citadel.local.csb.sh:/mnt/main-pool/data/ /mnt/tmp/data/
 

ctag

More boot pool notices.

2020-08-08_07-00.png
 

ctag

A couple hours after getting that email, I tried to access a jail on the server, and found it was unresponsive.

As far as I could tell the server was totally locked up. No SSH, no ACPI shutdown, and no EMG shutdown when I yanked the battery backup (just to check). I wound up hard-resetting the box, and then it would not boot. I swapped the USB ports that the boot thumbdrives were in, and then it booted.

Since there was a checksum error on da7p2 again, I re-ran the `zpool clear` from earlier.

I bought another "SanDisk Cruzer Fit CZ33 32GB USB 2.0 Low-Profile Flash Drive" to replace this one that's errored out twice now.
 

ctag

Replacement USB boot drive arrived today.

Noticed this when reading through the manual:
1597195056786.png

Good to know.

Since this is the first time I've done this, here's the process used:
  1. Shut down the box.
  2. Added the new usb disk. (good thing there's an extra USB port!)
  3. Booted box back up.
  4. Box won't boot.
  5. Reboot box.
  6. Still won't boot.
  7. Removed the new drive. Now box boots. (todo: find a monitor and check this someday)
  8. Hot plug the new drive and it shows up.
  9. Can't switch to the new drive because it's smaller.
  10. Give up to go whine about it online.
2020-08-12_18-31.png

2020-08-12_18-38.png

So I put the old, failing disk back into the mirror.

I super duper wish that I had somehow forced the original boot mirror to use 25 gigs instead of the whole ~30, so that this kind of discrepancy wouldn't be a show stopper.
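The size problem in step 9 comes down to a simple comparison. Here is a rough sketch, assuming ZFS refuses any replacement that is smaller than the existing mirror member. The helper name `can_replace` and the byte counts are made up for illustration and were not read from these drives.

```shell
#!/bin/sh
# Hypothetical helper: can a disk of NEW bytes replace a mirror member of OLD bytes?
# Assumption: a replacement must be at least as large as the member it replaces.
can_replace() {
    old_bytes=$1
    new_bytes=$2
    if [ "$new_bytes" -ge "$old_bytes" ]; then
        echo "ok"
    else
        echo "too small by $(( old_bytes - new_bytes )) bytes"
    fi
}

# Made-up sizes for two "32GB" sticks. On FreeBSD, `diskinfo -v <device>`
# reports the real media size in bytes.
can_replace 30752636928 30750000000
```

Even a few megabytes of difference between two nominally identical sticks is enough to fail the check, which is why leaving some slack when first creating the mirror would have helped.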
 

ctag

The obvious solution here is to back up the config and reinstall FreeNAS on the new mirror disks. And I'll do that, but first I noticed this in the manual:
"To mirror the operating system device, move to additional devices and press spacebar to select them also. If all of the selected devices are larger than 64 GiB and none are connected through USB, a 16 GiB swap partition is also created."
That sounds pretty swell to me. On top of that there are posts I've made pretty far back in this thread complaining about USB boot drives, so it's definitely time to move on.

I bought some drives to play with:
1597356388488.png

1597356427992.png

1597356448091.png

Given the price similarities I'm pretty sure they're all the same white label guts in different shells, but it will still be kinda fun to compare them. Hopefully by the time they arrive I can figure out how to do a mirror pool boot install with custom partition sizes so I can shrink it and not run into the too-few-sectors-no-bueno situation again.

I thought it would be cute to try using one of these drives as an SSD cache, but all of the wisdom I can find online points to maxing out RAM before even considering trying that. So while I have the case open I'm going to replace the 24GB with 48GB of RAM. I think my motherboard would technically support 96GB of memory, but I can't find it for any sort of price I think I'd be willing to pay.
2020-08-13_15-40.png
 

ctag

While digging around looking for documentation, I discovered that there was a new BIOS upgrade available for my T7500 server. There was a "Linux" download, so I thought to myself "how hard can this be?" and decided to dig in. I found a Dell support page that describes upgrading in a Linux-only environment, and from my reading of it the Linux .bin upgrade seemingly isn't applicable to this server at all because it isn't UEFI... Instead they recommend making a live FreeDOS USB stick and launching the Windows .exe upgrade from there. OK... I dorked around with the FreeDOS stuff until finally it both booted and had the .exe upgrade file available, only to find that the executable returns "not compatible with DOS environments" and exits. Fantastic job there, Dell.

I wound up making a Hiren's Boot USB drive (via MoeUSB) and ran the upgrade from there, which seemed to work well enough.
IMG_20200814_123536.jpg


I also removed these weird SATA passthrough risers from the box. They kept the SATA cable's clip from being effective, and seemed designed for right-angle connectors anyway. One of them had a strange temperature probe thingy on it, which isn't configurable or visible from the BIOS and causes an alert on boot if disconnected. There's disappointingly little information about the probe or alert message online. I yanked it anyway.

Oh and here's a link to a shoddy operator's manual for my T7500.
 

ctag

Last night, after dorking around, I put the side of the server back on. For a couple months now I've been running it open to keep temperatures down, but I took the chance to rearrange the hard disks to try and improve air flow.

I woke up to this email:
New alerts:
* Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44!/44!).

When I tried to see what happened... it wasn't captured in the reporting graphs...
1597499680019.png

Sigh. I guess the spike up to 44 just missed the sampling point?

Luckily it was captured in the syslog:
Aug 15 01:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44!/44!)
Aug 15 01:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44!/44!)
Aug 15 02:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44/44)
Aug 15 02:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44/44)
Aug 15 03:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44/44)
Aug 15 03:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44/44)
Aug 15 04:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44/44)
Aug 15 04:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44/44)
Aug 15 05:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 46 Celsius reached critical limit of 44 Celsius (Min/Max 44/46!)
Aug 15 05:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 46 Celsius reached critical limit of 44 Celsius (Min/Max 44/46!)
Aug 15 06:20:14 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 46 Celsius reached critical limit of 44 Celsius (Min/Max 44/46)
Aug 15 06:20:14 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 46 Celsius reached critical limit of 44 Celsius (Min/Max 44/46)
Aug 15 07:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44/46)
Aug 15 07:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44/46)
Aug 15 08:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44/46)
Aug 15 08:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44/46)

So... Not likely a spike, but instead a sustained temperature... That isn't visible in the graphs (which show a maximum temperature of 42.3C).
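To make the sustained reading easier to see, the syslog lines can be boiled down to one time/temperature pair per sample. This is a minimal sketch that assumes the exact smartd line format pasted above. `smartd_temps` is a made-up helper name, not something that ships with FreeNAS.

```shell
#!/bin/sh
# Hypothetical helper: reduce smartd temperature alerts to "HH:MM:SS temp" pairs.
# Assumes the syslog format shown in this post (timestamp in field 3, the
# reading right after the word "Temperature"). Adjacent duplicates are collapsed.
smartd_temps() {
    awk '/smartd/ && /Temperature/ {
        for (i = 1; i <= NF; i++)
            if ($i == "Temperature") print $3, $(i + 1)
    }' | uniq
}

# Example using three of the lines from the log above.
smartd_temps <<'EOF'
Aug 15 01:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44!/44!)
Aug 15 01:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 44!/44!)
Aug 15 05:20:13 bns-citadel smartd[4100]: Device: /dev/da0 [SAT], Temperature 46 Celsius reached critical limit of 44 Celsius (Min/Max 44/46!)
EOF
# prints:
# 01:20:13 44
# 05:20:13 46
```

Fed the full log, this gives one line per hour from 01:20 through 08:20, all at 44 or 46, which is hard to square with a graph that tops out at 42.3C.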

I'm pretty frustrated with the Reporting tab in general too. Having each disk temperature on a separate graph makes comparisons less intuitive and more clunky, and some of the graphs refuse to populate at all. And my old friend, the scrolling javascript bug, is back:

Keep an eye on the scrollbar. Using only the mouse scroll wheel to navigate:
 

Attachment: 1597499623293.png

ctag

I've switched from the failing CyberPower UPS to a Tripp Lite "SMART1500LCDT" on this machine. I couldn't find the exact model on the drop down list, so I searched around and found that there was a bug report open for it (associated forum thread).

I went ahead and selected "SMART1500LCD" as the closest neighbor, and it appears to work fine. The system shuts down when expected after a power outage.
 

ctag

Boy howdy, it's been a busy couple of days. Bear with me here.

First of all, maybe things aren't so dandy on the UPS front:
NOTIFICATION: 'LOWBATT'

UPS: 'ups'

Statistics recovered:

1) Battery charge (percent)
battery.charge: 100

2) Battery runtime (seconds)
battery.runtime: 3331
You read that right: a LOWBATT signal was sent while (as far as I can tell) there was no outage and the battery was at 100% charge... And the server also didn't shut down like it is supposed to on LOWBATT.

My new 48GB of RAM arrived! It's 6x 8GB sticks, and after completing the built-in memtest I found myself wondering "Gee, since I have this second CPU riser arriving soon.. And it has 6 additional RAM slots on it.. I wonder if I can combine my existing 24GB with this 48GB to achieve 72GB!" So I turned to the trusty (very) dusty manual for insight. On page 16 we see a review of supported RAM types, and then a breakdown chart of what RAM configurations are officially supported for both single and dual CPU systems:
1597806721814.png

As you can see, none of the configurations deal with 6 8-gig plus 6 4-gig sticks like I want. But further down in the article there are Memory Population Rules:
Dual CPU configurations (6 DIMM slots on MB plus 6 DIMM slots on Riser)
If configuration contains DIMMs of all the same size, populate in the following order: MB_DIMM1, Riser_DIMM1, MB_DIMM2, Riser_DIMM2, MB_DIMM3, Riser_DIMM3, MB_DIMM4, Riser_DIMM4, MB_DIMM5, Riser_DIMM5, MB_DIMM6, Riser_DIMM6.
If configuration contains DIMMs of mixed sizes, populate the larger DIMMs in the dual-processor riser.
So it looks like I'm going to be placing all 6 4-gig DIMMs back with the original CPU, and all of the new 8-gig DIMMs on the riser card. I'm glad I checked, because otherwise I was going to do a split arrangement with 36GB next to each CPU.
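The arithmetic behind the two layouts, as a quick sketch. `bank_gb` is a made-up helper, and the DIMM counts and sizes are the ones described above.

```shell
#!/bin/sh
# Hypothetical helper: capacity in GB of a bank holding COUNT DIMMs of SIZE GB each.
bank_gb() {
    echo $(( $1 * $2 ))
}

mb=$(bank_gb 6 4)       # six 4 GB DIMMs next to the original CPU
riser=$(bank_gb 6 8)    # six 8 GB DIMMs on the riser, per the mixed-size rule
echo "motherboard ${mb} GB + riser ${riser} GB = $(( mb + riser )) GB"

# The split arrangement considered first: three DIMMs of each size per CPU.
echo "split: $(( $(bank_gb 3 4) + $(bank_gb 3 8) )) GB per CPU"
```

Both layouts total 72GB. The difference is only where the larger DIMMs sit, and the manual's rule puts them all on the riser.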

Moving on, in addition to having an SSD boot mirror it seemed neat to try making a cheapo SSD pool to run jails. This was solely brought on by the advent of DRAM-less sub $25 SSDs on the market. If this works, I'll have some speedy storage for my jails, and then nightly backups of them to the WD Red zpool. All of those tester SSDs I ordered seemed about the same, but I like the Crucial BX500 ones best, because I can pry the plastic shell open and retrieve the PCB, which helps cut down on overheating (which they do during a simple badblocks run).

I had a doozy of a time trying to migrate my iocage jails from main-pool to the new SSD jail-pool. I wound up following these instructions to export each jail to a .zip file, but then had to use this workaround to unpack and `zfs recv` the jails because `iocage import` would consume all of the RAM+swap and crash the system. I also found these instructions using zfs send, and suspect they may be better for me, but did not use them this time.
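For reference, the zfs send route would look roughly like the sketch below. It was not used for this migration and is untested here. It assumes iocage keeps each jail under `<pool>/iocage/jails/<jail>`, that the destination pool is literally named `jail-pool`, and that the jail is stopped first. `jail_send_cmds` is a made-up helper that only prints the commands so they can be reviewed before running anything.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the zfs commands instead of running them.
# Assumptions: dataset layout <pool>/iocage/jails/<jail>, the parent dataset
# already exists on the destination pool, and the jail is stopped.
jail_send_cmds() {
    jail=$1
    src=$2
    dst=$3
    ds="iocage/jails/$jail"
    # Recursive snapshot so child datasets (such as the jail root) are included.
    echo "zfs snapshot -r $src/$ds@migrate"
    # -R sends the snapshot with its descendants and properties.
    echo "zfs send -R $src/$ds@migrate | zfs recv -F $dst/$ds"
}

jail_send_cmds bns-xwing main-pool jail-pool
```

Because the data streams straight from one pool to the other, there is no intermediate .zip to unpack, which is where `iocage import` ran out of memory.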

Now that the jails are migrated and running, I can see a noticeable speedup in page load and reactions from Nextcloud. Awesome!
 

ctag

Installing the second CPU had a small hiccup. Initially the system failed to boot. It turns out the little plastic clip for the riser card gets worn out, and doesn't press the riser into its motherboard socket well enough anymore. Putting the case side back on pushes on it and lets the system boot, but of course now the case side has to stay on.

It looks like I can use all of the RAM!
1598106824040.png

1598106960114.png
 

ctag

Just upgraded to 12.0.

Very happy to see the dashboard page now utilizes screen space much more than 11.3 did. Looks great.

Logged back in and got a critical alert with some trace from python after unlocking the main pool. Came here to post it but now it has vanished from the alert feed..

I did receive an email with the error though, so here it is:

* Failed to check for alert VolumeStatus:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/alert.py", line 706, in __run_source
    alerts = (await alert_source.check()) or []
  File "/usr/local/lib/python3.8/site-packages/middlewared/alert/source/volume_status.py", line 31, in check
    for vdev in await self.middleware.call("pool.flatten_topology", pool["topology"]):
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1238, in call
    return await self._call(
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1206, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1110, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.8/site-packages/middlewared/utils/io_thread_pool_executor.py", line 25, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/pool.py", line 438, in flatten_topology
    d = deque(sum(topology.values(), []))
AttributeError: 'NoneType' object has no attribute 'values'

It is mentioned in this thread, but I didn't see the discussion go anywhere.

And one of my jails is failing to start after the upgrade. I get an error about devfs_ruleset not existing from iocage. Changing the ruleset from 5 to 4 in the jail edit UI allows it to start, but I'm not familiar enough with devfs to know what that did.

Upgrading jails to 12.0 to match the base system. Saw this line scroll past:

1609262382079.png


So 12.0 was EOL before TrueNAS 12.0 shipped? There's something unsettling about that dynamic where by the time IX is able to get a release out to the NAS users (which shouldn't be ultra-fast or anything, and they release plenty often enough for me!) it is already obsolete and not receiving security patches.

Upgrading jail failed.

1609263952073.png


Appears to be the same as: https://www.truenas.com/community/threads/freenas-truenas-upgrade.89655/
But I'm not sure I want to go editing iocage's files to get around it..
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

ctag

Thanks danb35!

I got confused looking at the dashboard
1609264074448.png


Instead of the shell
1609264097514.png



`iocage upgrade bns-xwing -r 12.2-RELEASE` still does not work though. AFAIK it's not urgent to have the jails match the base OS so I think I'm going to leave it until iocage is working again.
 

ctag

Trying to migrate a service from a jail to a bhyve VM so that its newer version can have Docker. Part of that migration is moving files, but I cannot seem to figure out what is going on with the VM disk; it isn't mounted anywhere I've checked.

Went to the documentation to look for clues, but found this:
1609952904545.png


Is that an indication that VMs will be leaving the community version of TrueNAS?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
"Is that an indication that VMs will be leaving the community version of TrueNAS?"
You're confusing the old TrueNAS 11 documentation with the TrueNAS 12 CORE documentation... TrueNAS and FreeNAS are only the same product from 12.0 and the older TrueNAS (11 and before) was the Enterprise product (Now TrueNAS 12.0 Enterprise), which carries that limitation.

Check the FreeNAS 11.3 documentation instead. (the TrueNAS 12.0 CORE documentation is sadly not great to work with and the products have enough in common that you should be able to figure it out).

"I cannot seem to figure out what is going on with the VM disk; it isn't mounted anywhere I've checked."
Since the guest VM disk is a ZVOL (block storage), TrueNAS can't see it and won't mount it.

You can connect from the VM to the host over NFS or SMB to copy files. The internal bridge for the VM should operate at 10Gbit, so should not be a bottleneck.
 

ctag

Thanks sretalla.

I've noticed that I'm logged out of the web UI much more quickly since upgrading to 12.0. Previously I would only have to log in after restarting my computer, but now I'll have to log back in after about an hour of inactivity. I'm interested to know if anyone else is seeing similar.

This was an issue in 11.3 as well, but was resolved.
 

ctag

Had to install Ubuntu 16.04 in Bhyve for some reason. Needed to follow the directions here to get it to work:

 

ctag

An update for 12.0-U2 is available. The release warning is still present though.
2021-02-12_09-26.png

I thought that was just a temporary bug?

And I've noticed the web UI has a dangerous habit of theming buttons to appear as though they are disabled outright. Either disable the button or don't, the 'primary' color setting already allows the designers to be opinionated and highlight the default choice.
2021-02-12_09-27.png


Here's another button that is not disabled:
2021-02-12_09-45.png


And an actually disabled button for reference:
2021-02-12_09-44.png


On a cursory glance I can't even find a disabled secondary button. It seems the web UI prefers to hide the button entirely if it is not actionable, which I don't believe is a bad rule in itself but it does lend to the maze-run that attempting common tasks can become.
 

ctag

Preemptively upgrading to mysql80 for my nextcloud jail. Following directions here: https://www.benhup.com/freebsd/mysql8-install-upgrade-update/

Bless whoever put that guide together, because databases totally suck to work with. Even with all of that information, I had errors from pkg about the jail being old, and then innodb errors that kept the service from starting. Someone filed a bug report that was summarily dismissed, but at least it led to a workaround for this totally-a-bug.
 

ctag

Went through and brought the rest of the jails to 12.2-RELEASE. I remember this being a huge pain, but it went OK this time. Part of that was mysql/mariadb handling upgrades in a comparatively sane manner, which I don't believe I've fully appreciated previously. The one system still using postgres requires acrobatics with full backups and imports to keep your data across upgrades.

 