Citadel - Build Plan and Log

ctag

Patron
Joined
Jun 16, 2017
Messages
225
Upgraded to 12.0-U4 and got this notification:

Space usage for pool "freenas-boot" is 82%. Optimal pool performance requires used space remain below 80%.

And made a thread to ask for help growing the pool by a specific amount.


In other news, I am sometimes a big kid who can do things on my own.

I recently decided to decommission my desktop computer. I've had it since 2013 and haven't really been using it recently due to a combination of switching jobs, adopting two murder-mitten kittens, and moving across town. In fact, I plugged it in at the new house solely to do the weekly Archlinux updates and nothing else. So while my laptop is currently filling all of the personal computing needs I have, I've found that I'm pretty sentimentally attached to that specific installation of Archlinux, and want to keep it around until I'm ready to build another desktop.

It was a bit of a struggle, but I eventually wrangled the OS onto a bhyve VM! Now I can keep the OS rolling through updates without keeping the space-heater that was my old desktop turned on. Or having to power it up just for updates, which is too much bother.

Most of the migration was very straightforward with the help of the archwiki. But removing the LUKS encryption and setting up a bootloader for bhyve was a real pain. Eventually I was able to figure out removing the encryption hook from mkinitcpio.conf, and then had to do the GRUB hack for booting to work.
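For my own future reference, the encryption-removal side boiled down to roughly this (a sketch from memory; /dev/vda is just what the disk shows up as inside my bhyve VM, and the GRUB hack itself is a separate step):

Code:
# /etc/mkinitcpio.conf: drop the 'encrypt' hook from the HOOKS=(...) line,
# then rebuild the initramfs
mkinitcpio -P

# /etc/default/grub: remove the cryptdevice=... kernel parameter, then
# reinstall GRUB on the VM disk and regenerate its config
grub-install --target=i386-pc /dev/vda
grub-mkconfig -o /boot/grub/grub.cfg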

Code:
$ head /var/log/pacman.log
[2013-12-15 03:23] [PACMAN] Running 'pacman -r /mnt -Sy --cachedir=/mnt/var/cache/pacman/pkg base'
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225
I was enjoying some vacation today when my phone dinged.
1649466860350.png


I hotspotted my phone and logged into the web UI to check it out. The disk error was accompanied by a temperature complaint:
Device: /dev/da1 [SAT], Temperature 44 Celsius reached critical limit of 44 Celsius (Min/Max 37/44!).

OK, well I've been ignoring those for years on the advice of some forum post. No problem if it finally caught up to me.

And then the Web UI went down.
1649467554227.png


And my phone dinged again.
1649467257130.png


So I'm no longer on vacation.
okay-its-happening.gif


There's a cold-storage backup at a relative's house 200 miles away. I'd really, really rather not use it. Especially since it doesn't have my jails on it, which is an oversight I now see should be fixed.

I make it home and log in over SSH. Each time I successfully log in, the connection closes after a couple of seconds. I try shutting down, and the system boots me off and then does not shut down.
2022-04-08_18-29.png


So I hold the power button until it's off, and climb back in the car to go buy a hard drive before the only store closes:
IMG_20220408_193107.jpg


I got one of each, since I don't know if shucking is still a valid strategy.
 
Last edited:

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
I don't know if shucking is still a valid strategy.
It is... usually saves you a bunch and the disks are usually good (just a gamble on what they decided to put in there as it can change without notice).
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225
Thanks sretalla.

Got home from the store and powered things back on. Truenas came back, recognized all of the disks, and automatically resilvered... Once it was done I cleared a checksum error and manually scrubbed. Everything looks good.

2022-04-08_20-06.png


1649511541635.png


1649511649905.png


I'm not sure what's suspect here: the cable, the controller, or Truenas bugging out on me.
It's pretty scary to see Truenas cascade a non-failure like that.
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225
1649626827462.png


200.gif


... That's a different disk. TrueNAS is really wigging out on me here. I'm not home; should I power it down until I get there?
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225
OK, using 3-letter serial number abbreviations to recap:
  • On 04/08 drives 9LC and then R9C faulted and 9LC was automatically removed.
  • On 04/10 drive G3C was faulted and removed.
There's some interesting stuff in the SMART data. All of it attached.

Drive 9LC:
SMART Error Log Version: 1
ATA Error Count: 3
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 3 occurred at disk power-on lifetime: 36267 hours (1511 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 41 00 00 00 00 00 Error: ICRC, ABRT at LBA = 0x00000000 = 0

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 20 00 02 00 00 40 00 30d+05:32:58.776 READ FPDMA QUEUED
60 01 00 01 00 00 40 00 30d+05:32:58.776 READ FPDMA QUEUED
60 01 00 00 00 00 40 00 30d+05:32:58.775 READ FPDMA QUEUED
60 01 00 01 00 00 40 00 30d+05:32:58.775 READ FPDMA QUEUED
60 01 00 00 00 00 40 00 30d+05:32:58.746 READ FPDMA QUEUED
The other two errors appeared to happen at the same time, and were similar.

Drive R9C:
SMART Error Log Version: 1
No Errors Logged

Drive G3C:
SMART Error Log Version: 1
ATA Error Count: 67 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 67 occurred at disk power-on lifetime: 36202 hours (1508 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 41 00 00 00 00 00 Error: ICRC, ABRT at LBA = 0x00000000 = 0

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 80 03 40 40 00 1d+22:24:16.078 READ FPDMA QUEUED
2f 00 01 10 00 00 00 00 1d+22:24:16.070 READ LOG EXT
2f 00 01 10 00 00 00 00 1d+22:24:16.070 READ LOG EXT
60 00 00 80 03 40 40 00 1d+22:24:16.062 READ FPDMA QUEUED
ec 00 00 00 00 00 00 00 1d+22:24:16.062 IDENTIFY DEVICE

Error 66 occurred at disk power-on lifetime: 36202 hours (1508 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 41 00 00 00 00 00 Error: ICRC, ABRT at LBA = 0x00000000 = 0

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 80 03 40 40 00 1d+22:24:16.070 READ FPDMA QUEUED
ec 00 00 00 00 00 00 00 1d+22:24:16.062 IDENTIFY DEVICE
2f 00 01 10 00 00 00 00 1d+22:24:16.061 READ LOG EXT
2f 00 01 10 00 00 00 00 1d+22:24:16.061 READ LOG EXT
60 00 00 80 03 40 40 00 1d+22:24:16.043 READ FPDMA QUEUED
... Oh boy.

The disk issues appear loosely correlated with max temperature:
[berocs@bns-kharselim smart]$ grep "Temperature" *
2022-04-12-da0.txt:194 Temperature_Celsius 0x0002 216 216 000 Old_age Always - 30 (Min/Max 18/46)
2022-04-12-da1.txt:194 Temperature_Celsius 0x0002 224 224 000 Old_age Always - 29 (Min/Max 18/45)
2022-04-12-da2.txt:194 Temperature_Celsius 0x0002 224 224 000 Old_age Always - 29 (Min/Max 18/41)
2022-04-12-da3.txt:194 Temperature_Celsius 0x0002 224 224 000 Old_age Always - 29 (Min/Max 18/40)
2022-04-12-da4.txt:194 Temperature_Celsius 0x0002 224 224 000 Old_age Always - 29 (Min/Max 18/41)
2022-04-12-da5.txt:194 Temperature_Celsius 0x0002 224 224 000 Old_age Always - 29 (Min/Max 18/41)

da0 (9LC) and da1 (G3C) both saw 45C or higher, and both had this fault issue and logged SMART errors. da4 (R9C) is an outlier here, having no errors and a lower max temp.
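For context, the per-disk dumps above (and attached below) were collected with plain smartctl, something along these lines:

Code:
# dump full SMART data for each pool disk into dated text files
for d in 0 1 2 3 4 5; do
    smartctl -a /dev/da$d > 2022-04-12-da$d.txt
done
grep "Temperature" *.txt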

The ICRC, ABRT errors can apparently be caused by cable problems. I'm dubious that this is the issue, though, since it happened to three separate disks, back to back to back.

If this isn't imminent drive failure, my thoughts are:
  • Possibly the low-profile SATA cables being kinked by the side of the case pushing on them near the connector. That would explain at least 9LC and G3C, both of which are located where that happens.
  • Possibly temperature related, since I had begun running the system with the side panel back in place. I've removed the side panel again now.
I took the drives out of the case and swapped the SATA cables. I'm hoping that if it's a pinch or connection issue, I'll have staved it off by re-seating all of the connectors.
IMG_20220412_090608.jpg

IMG_20220412_092109.jpg


And now it's back in the closet with the side panel removed.
 

Attachments

  • 2022-04-12-da5.txt
    6.2 KB · Views: 206
  • 2022-04-12-da2.txt
    6.2 KB · Views: 316
  • 2022-04-12-da3.txt
    6.2 KB · Views: 188
  • 2022-04-12-da4.txt
    6.2 KB · Views: 188
  • 2022-04-12-da1.txt
    11 KB · Views: 167
  • 2022-04-12-da0.txt
    9.1 KB · Views: 180
Last edited:

ctag

Patron
Joined
Jun 16, 2017
Messages
225
In preparing to switch to Truenas Scale, I'm attempting to migrate my main pool from GELI to native ZFS encryption. It is not going great.

After reading around a bit I decided to give replication tasks a chance, and copied the main pool onto a backup pool made out of spare disks lying around. I set encryption in the UI and selected "recursive", only to find that apparently this leaves every dataset under the main pool dataset unencrypted on the receiving end.

Now that I've rebuilt the main pool and need to copy the data back, it looks like I'll need to manually make a replication task for /each/ dataset? Otherwise sub-datasets won't be encrypted?

I feel like I'm missing something here... This new native encryption isn't making a lot of sense.
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225
Well, we made it. Citadel is now on Truenas Scale 22.02.1!

For the pool migration, I wound up taking the arduous manual route of making a replication task for each sub-dataset and manually setting the encryption. I still feel icky from it, but it looks like all of the data made it back onto the main pool.
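For the record, I believe the command-line equivalent of each of those per-dataset tasks is something like this (untested by me in exactly this form; pool and dataset names are placeholders, and the key bit is that -x encryption keeps the send stream's encryption=off property from overriding inheritance from the new, encrypted parent):

Code:
# snapshot the source dataset on the old (GELI-backed) pool
zfs snapshot oldpool/somedataset@migrate
# receive it under the encrypted root of the rebuilt pool; -x encryption
# lets the received dataset inherit the parent's native ZFS encryption
zfs send oldpool/somedataset@migrate | zfs receive -x encryption newpool/somedataset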

Working with the K8s / Docker setup has been a double-edged sword. On one hand, the services that I've set back up so far are performing great, and run noticeably faster than before. On the other hand, working with containers has reinvigorated my hatred of everything hip and new. I don't want to learn how traefik uses doohickies to splurm the traffic and k8s ingress classes jarwhiz the header packets for docker pods to paddywash. Why does every jode with a keyboard want to invent a new realm of abstraction in terminology?

297.png
 
Last edited:

ctag

Patron
Joined
Jun 16, 2017
Messages
225

Recovering Web-UI Ports​

This is just so I have a reference next time I FUBAR things.

What went wrong​

A few days ago it finally happened: I locked myself out of my Truenas box. I've occasionally worried about this exact issue happening, but have always waved it off as something that Truenas would be programmed to prevent, or something I wouldn't be enough of an idiot to walk into.

What I goofed was pretty straightforward: I gave a traefik app (pod? container? whatever...) ports 9443 and 9080 to route traeffic [sic] for me. Then I tried to follow a guide and have traefik also route the Truenas web UI, which involves changing the Truenas web UI ports. Without thinking, I set the Truenas ports to 9080 and 9443 (which were numbers stuck in my head for the obvious reason) and hit "send it".

Soon thereafter I noticed my mistake, but it was too late. The web UI was returning "404 page not found" and everything was broken.

How to fix it​

There is a hacky solution: those web UI ports are recorded in a sqlite database. If you can access that database file and change them, the system will come back (without having to wipe your entire config).

Get to this file: `/data/freenas-v1.db`
I created a backup first: `cp /data/freenas-v1.db /data/freenas-v1.db.bak`
And then `scp`ed it to my box and opened it with a sqlite browser.

In the database browser, find the 'system_settings' table and then edit your 'stg_guiport' and 'stg_guihttpsport' to a better value.
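If you have the sqlite3 CLI handy instead of a GUI browser, the same edit on the copied-out file is roughly this (80/443 are the stock defaults; adjust to taste):

Code:
# check the current (broken) values
sqlite3 freenas-v1.db "SELECT stg_guiport, stg_guihttpsport FROM system_settings;"
# put them back to the defaults
sqlite3 freenas-v1.db "UPDATE system_settings SET stg_guiport = 80, stg_guihttpsport = 443;"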
2022-05-27_10-18.png


Copy the newly edited file back in place, with the same ownership and permissions, cross your fingers, and reboot the computer. Well, press the power button and then cross your fingers, but you get my point.
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225

Setting up Paperless app with Tika+Gotenberg document conversion​


Truenas Scale comes with a sizeable community app repo, in the form of TrueCharts. I'm still dubious of the ephemeral-feeling storage on containers, but decided to check them out anyway, and installed Paperless-ng(x) from the Available apps list.

To my surprise, this instance of Paperless didn't support converting odt/docx documents into PDFs like my VM one did. Let's fix that.

Note: In my case I had to create a bridge network adapter and switch the apps to it in order for app-to-app networking to function. There's probably a better solution, YMMV.

Setting up Paperless​

In the paperless config documentation we can see that we want to be using a specific gotenberg and tika on ports 3000 and 9998:

services:
  webserver:
    environment:
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998

  gotenberg:
    image: gotenberg/gotenberg:7.4
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-routes=true"

  tika:
    image: ghcr.io/paperless-ngx/tika:latest
    restart: unless-stopped

We'll install those containers in a moment. To begin, edit the Paperless app to add those additional environment variables:

2022-05-27_11-43.png


In the side bar scroll down to "Configure Image Environment" and add the following. Remember to insert your Truenas's IP address, and look closely at the ports.

2022-05-27_11-44.png


We're using node port 9997 for gotenberg because node ports below 9000 are disallowed when creating a custom Docker app.
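For copy-paste convenience, the three variables from the screenshot boil down to this (substitute your own Truenas IP; 9997 and 9998 are the node ports we'll assign to the custom apps below):

Code:
PAPERLESS_TIKA_ENABLED=1
PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://<truenas-ip>:9997
PAPERLESS_TIKA_ENDPOINT=http://<truenas-ip>:9998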
Click save and let your app restart.

Installing Tika​


Now we're going to create a custom "app" by clicking the "Launch Docker Image" button in the top right of the apps page:

2022-05-27_11-49.png


In the creation panel, fill out the following:

  • Docker Image -> Image Repository: ghcr.io/paperless-ngx/tika
  • Image tag: latest
  • Port forwarding -> Add: 9998 for both Container and Node
That's it! Hit Save and let the app build.

Installing Gotenberg​


Same deal. Hit "Launch Docker Image" to create another custom image, and fill out:

  • Docker Image -> Image Repository: gotenberg/gotenberg
  • Image Tag -> 7.4
  • Configure Container Command -> Add -> `gotenberg`
  • Configure Container Args -> Add -> `--chromium-disable-routes=true`
  • Port Forwarding -> Add:
    • Container Port -> 3000
    • Node Port -> 9997
Hit save again and let the app build.



... That should be it! Go to Paperless and try converting a document or two!

2022-05-27_11-57.png
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225

Accessing apps over network from Truenas host​


One of the first obstacles I encountered when trying to migrate my old jail and VM services to apps in Truenas Scale was the network gap between my Truenas host and the apps. Likely for security reasons (?), the apps could not ping/ssh/reach the host over my LAN, or vice versa.

There's some talk on the forum about adding a bridge interface and binding it to the existing network interface. Apparently this only works for some admins when using Truenas's boot-time CLI (/etc/netcli) to add the bridge. In my case, there was no option to add the bridge in my netcli menu. Instead, I created the bridge node in the UI and then added the enpxxx interface to it. It took several tries before the change 'clicked' within the trial timeout and I could save the setting for real. Afterwards I had to go back and select that new 'br0' interface in each app as the network connection.

Now I can reach the host from the app! This shouldn't be necessary during regular operations, but was definitely useful when migrating files and databases into the containers.

2022-05-27_16-35.png
 
Last edited:

ctag

Patron
Joined
Jun 16, 2017
Messages
225
Took the networking closet offline yesterday to install a new electrical outlet. Happened to be down juuuust long enough for my ISP to rotate the IP address, breaking everything.

With my old DNS provider, DDNS was just a curl call with a key. I've wasted a couple hours now and cannot seem to get Cloudflare DDNS working. Despite being shoehorned into using it by the credentialing UI, there's no Cloudflare DDNS provider in Truenas, and seeing the discussion here (dating back to 2016...) there likely never will be. Other options I tried include a cron job on the host machine and a script on a VM. Neither worked.
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225

Setting up Dynamic DNS with Cloudflare and ddclient app​

There's an app in TrueCharts for ddclient, and it supports DDNS for Cloudflare. Only one small issue: it doesn't work. Let's fix that.

While installing the app, or when editing it after installing, go to "Security and Permissions" and enable "Show Advanced Security Settings." In the additional options that appear, un-select "ReadOnly Root Filesystem".

2022-05-28_15-49.png


Aaaaand, you're done! The ddclient should start functioning normally now.

Editing ddclient.conf​

To configure ddclient inside the container, open a shell from the Truenas UI.

Once inside, use the 'vi' editor to change the settings in '/config/ddclient.conf'.

There is no vim or nano. And no package manager that I could find.
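For reference, my Cloudflare stanza in '/config/ddclient.conf' looks roughly like the following. Domain, email, and key are placeholders, and the exact option names vary a bit between ddclient versions, so check the docs for the version inside the container:

Code:
daemon=300
use=web
protocol=cloudflare, \
zone=example.com, \
login=you@example.com, \
password=your-cloudflare-api-key \
home.example.com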

Getting log files from the container​

In Truenas's UI I found myself attempting to troubleshoot ddclient by looking at the logs, but every time I opened them the window was blank. Ugh.

Luckily, the Download button in the logs screen will return actual logs in an actual file. Good enough.

2022-05-28_15-58.png


The errors I was seeing on a stock ddclient app installation:

2022-05-28T17:59:40.960582166Z [s6-init] making user provided files available at /var/run/s6/etc...exited 0.
2022-05-28T17:59:41.013695087Z [s6-init] ensuring user provided files have correct perms...exited 0.
2022-05-28T17:59:41.015785326Z [fix-attrs.d] applying ownership & permissions fixes...
2022-05-28T17:59:41.016943555Z [fix-attrs.d] done.
2022-05-28T17:59:41.018232387Z [cont-init.d] executing container initialization scripts...
2022-05-28T17:59:41.020387746Z [cont-init.d] 01-envfile: executing...
2022-05-28T17:59:41.029087490Z [cont-init.d] 01-envfile: exited 0.
2022-05-28T17:59:41.031276193Z [cont-init.d] 01-migrations: executing...
2022-05-28T17:59:41.032792162Z [migrations] started
2022-05-28T17:59:41.032838400Z [migrations] no migrations found
2022-05-28T17:59:41.033529225Z [cont-init.d] 01-migrations: exited 0.
2022-05-28T17:59:41.035682617Z [cont-init.d] 02-tamper-check: executing...
2022-05-28T17:59:41.048095239Z [cont-init.d] 02-tamper-check: exited 0.
2022-05-28T17:59:41.050028621Z [cont-init.d] 10-adduser: executing...
2022-05-28T17:59:41.056267685Z groupmod: /etc/group.265: Read-only file system
2022-05-28T17:59:41.056300359Z groupmod: cannot lock /etc/group; try again later.
2022-05-28T17:59:41.058666369Z usermod: /etc/passwd.266: Read-only file system
2022-05-28T17:59:41.058700580Z usermod: cannot lock /etc/passwd; try again later.
2022-05-28T17:59:41.058974730Z
2022-05-28T17:59:41.059003861Z -------------------------------------
2022-05-28T17:59:41.059019961Z _ ()
2022-05-28T17:59:41.059034836Z | | ___ _ __
2022-05-28T17:59:41.059049590Z | | / __| | | / \
2022-05-28T17:59:41.059064400Z | | \__ \ | | | () |
2022-05-28T17:59:41.059089773Z |_| |___/ |_| \__/
2022-05-28T17:59:41.059100774Z
2022-05-28T17:59:41.059110468Z
2022-05-28T17:59:41.059120067Z Brought to you by linuxserver.io
2022-05-28T17:59:41.059129945Z -------------------------------------
2022-05-28T17:59:41.059146111Z
2022-05-28T17:59:41.059156684Z To support LSIO projects visit:
2022-05-28T17:59:41.059166801Z https://www.linuxserver.io/donate/
2022-05-28T17:59:41.059176560Z -------------------------------------
2022-05-28T17:59:41.059186285Z GID/UID
2022-05-28T17:59:41.059195914Z -------------------------------------
2022-05-28T17:59:41.062093044Z
2022-05-28T17:59:41.062126282Z User uid: 911
2022-05-28T17:59:41.062143142Z User gid: 1001
2022-05-28T17:59:41.062158385Z -------------------------------------
2022-05-28T17:59:41.062173273Z
2022-05-28T17:59:41.065073037Z chown: changing ownership of '/app': Read-only file system
2022-05-28T17:59:41.067738525Z chown: changing ownership of '/defaults': Read-only file system
2022-05-28T17:59:41.068773420Z [cont-init.d] 10-adduser: exited 1.
2022-05-28T17:59:41.071007406Z [cont-init.d] 30-config: executing...
2022-05-28T17:59:41.077154989Z mkdir: cannot create directory ‘/var/cache/ddclient’: Read-only file system
2022-05-28T17:59:41.078625897Z cp: cannot create regular file '/ddclient.conf': Read-only file system
2022-05-28T17:59:41.080312576Z chown: cannot access '/var/cache/ddclient': No such file or directory
2022-05-28T17:59:41.080346864Z chown: cannot access '/ddclient.conf': No such file or directory
2022-05-28T17:59:41.083079682Z chmod: cannot access '/ddclient.conf': No such file or directory
2022-05-28T17:59:41.084149176Z [cont-init.d] 30-config: exited 1.
2022-05-28T17:59:41.086277721Z [cont-init.d] 90-custom-folders: executing...
2022-05-28T17:59:41.095000074Z [cont-init.d] 90-custom-folders: exited 0.
2022-05-28T17:59:41.097134661Z [cont-init.d] 99-custom-files: executing...
2022-05-28T17:59:41.109091768Z [custom-init] no custom files found exiting...
2022-05-28T17:59:41.109711951Z [cont-init.d] 99-custom-files: exited 0.
2022-05-28T17:59:41.111212080Z [cont-init.d] done.
2022-05-28T17:59:41.112803883Z [services.d] starting services
2022-05-28T17:59:41.122127504Z [services.d] done.
2022-05-28T17:59:41.124176963Z Setting up watches.
2022-05-28T17:59:41.124210863Z Watches established.
2022-05-28T17:59:41.298334835Z stat() on closed filehandle FD at /usr/bin/ddclient line 1167.
2022-05-28T17:59:41.298368870Z Use of uninitialized value $mode in bitwise and (&) at /usr/bin/ddclient line 1168.
2022-05-28T17:59:41.298399099Z readline() on closed filehandle FD at /usr/bin/ddclient line 1180.
2022-05-28T17:59:41.298995479Z stat() on closed filehandle FD at /usr/bin/ddclient line 1167.
2022-05-28T17:59:41.299028870Z Use of uninitialized value $mode in bitwise and (&) at /usr/bin/ddclient line 1168.
2022-05-28T17:59:41.299046482Z readline() on closed filehandle FD at /usr/bin/ddclient line 1180.
2022-05-28T17:59:41.299679968Z WARNING: file /ddclient.conf: Cannot open file '/ddclient.conf'. (No such file or directory)
2022-05-28T17:59:41.299714958Z WARNING: file /ddclient.conf: Cannot open file '/ddclient.conf'. (No such file or directory)
2022-05-28T17:59:42.282919060Z stat() on closed filehandle FD at /usr/bin/ddclient line 1167.
2022-05-28T17:59:42.282989530Z Use of uninitialized value $mode in bitwise and (&) at /usr/bin/ddclient line 1168.
2022-05-28T17:59:42.283002510Z readline() on closed filehandle FD at /usr/bin/ddclient line 1180.
2022-05-28T17:59:42.283482544Z stat() on closed filehandle FD at /usr/bin/ddclient line 1167.
2022-05-28T17:59:42.283512337Z Use of uninitialized value $mode in bitwise and (&) at /usr/bin/ddclient line 1168.
2022-05-28T17:59:42.283537758Z readline() on closed filehandle FD at /usr/bin/ddclient line 1180.
2022-05-28T17:59:42.284171520Z WARNING: file /ddclient.conf: Cannot open file '/ddclient.conf'. (No such file or directory)
2022-05-28T17:59:42.284198854Z WARNING: file /ddclient.conf: Cannot open file '/ddclient.conf'. (No such file or directory)
2022-05-28T17:59:43.283821618Z stat() on closed filehandle FD at /usr/bin/ddclient line 1167.
2022-05-28T17:59:43.283874082Z Use of uninitialized value $mode in bitwise and (&) at /usr/bin/ddclient line 1168.
2022-05-28T17:59:43.283886159Z readline() on closed filehandle FD at /usr/bin/ddclient line 1180.
2022-05-28T17:59:43.284400813Z stat() on closed filehandle FD at /usr/bin/ddclient line 1167.
2022-05-28T17:59:43.284477007Z Use of uninitialized value $mode in bitwise and (&) at /usr/bin/ddclient line 1168.
2022-05-28T17:59:43.284495819Z readline() on closed filehandle FD at /usr/bin/ddclient line 1180.
2022-05-28T17:59:43.285049916Z WARNING: file /ddclient.conf: Cannot open file '/ddclient.conf'. (No such file or directory)
2022-05-28T17:59:43.285099851Z WARNING: file /ddclient.conf: Cannot open file '/ddclient.conf'. (No such file or directory)
2022-05-28T17:59:44.285341435Z stat() on closed filehandle FD at /usr/bin/ddclient line 1167.
2022-05-28T17:59:44.285388366Z Use of uninitialized value $mode in bitwise and (&) at /usr/bin/ddclient line 1168.
2022-05-28T17:59:44.285429479Z readline() on closed filehandle FD at /usr/bin/ddclient line 1180.
2022-05-28T17:59:44.285940680Z stat() on closed filehandle FD at /usr/bin/ddclient line 1167.
2022-05-28T17:59:44.285978513Z Use of uninitialized value $mode in bitwise and (&) at /usr/bin/ddclient line 1168.
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225

Adding a new pool and dummy benchmarks​

Before I realized that I might want to build a second machine to house VMs and keep this one for storage, I had already completed the RAM and CPU upgrades. I'm also going to be on a tighter budget for a while, so I'm interested in putting the VMs/apps on their own pool and keeping the storage one just for backups. I'm not sure if there's a practical use, but it will at least satisfy the part of my brain that wants some division there.

So I bought four "real" SSDs, though consumer grade -- 500GB Samsung 870 EVO -- and put them on the second SAS card that was used for the cheap-o SSD test a while back. I know that things are more complicated than this, but I wanted to get a rough idea of the performance tradeoffs with respect to HDD vs SSD and ZFS layouts, so I copied a fio script from Google and ran it.

Script is from https://cloud.google.com/compute/docs/disks/benchmarking-pd-performance
TEST_DIR=$1

echo "Write throughput: "
fio --name=write_throughput --directory=$TEST_DIR --numjobs=8 \
--size=10G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio \
--direct=1 --verify=0 --bs=1M --iodepth=64 --rw=write \
--group_reporting=1

echo "Write IOPS: "
fio --name=write_iops --directory=$TEST_DIR --size=10G \
--time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 \
--verify=0 --bs=4K --iodepth=64 --rw=randwrite --group_reporting=1

echo "Read throughput: "
fio --name=read_throughput --directory=$TEST_DIR --numjobs=8 \
--size=10G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio \
--direct=1 --verify=0 --bs=1M --iodepth=64 --rw=read \
--group_reporting=1

echo "Read IOPS: "
fio --name=read_iops --directory=$TEST_DIR --size=10G \
--time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 \
--verify=0 --bs=4K --iodepth=64 --rw=randread --group_reporting=1

sudo rm $TEST_DIR/write* $TEST_DIR/read*
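I saved that as a script and pointed it at a scratch dataset on whichever pool was under test, e.g. (the script name and dataset path here are just examples):

Code:
bash fio_bench.sh /mnt/ssdpool/fio-test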

I ran the script 3 times for each configuration and then averaged the results, just for giggles.

2022-06-01_16-22.png


The full set of results is attached.

I hadn't considered how big a role encryption played in performance for me. Other than that, the ZFS pool layouts appeared to have less impact than the raw improvement of having SSDs over HDDs. I wound up picking RAIDZ1 as a compromise between double mirrors and striping.
 

Attachments

  • benchmark.ods
    30.6 KB · Views: 172
  • fio_zfs_benchmarks.zip
    49.7 KB · Views: 182

ctag

Patron
Joined
Jun 16, 2017
Messages
225
One downside of having too many alerts for "app upgrades" and disk temperature is that I didn't notice when my storage pool went degraded a few days ago.
2022-08-19_12-55.png


The disk appears to be present and fine. I tried to 'online' it from the GUI, which didn't work and sent me this email:
2022-08-19_13-03.png


This feels like the kind of issue that a reboot might fix, but (1) rebooting to fix a transient error in a NAS is a little absurd, and (2) I'm not going to do that because I don't know if the system will come back up.

Concerningly, both the disk in the alert, sde, and sdh appear to be offline:
2022-08-19_13-11.png
 
Last edited:

ctag

Patron
Joined
Jun 16, 2017
Messages
225

Installing plugins (and themes?) in tt-rss on SCALE​


A little while ago I noticed that part of migrating my tt-rss instance to SCALE was the loss of my beloved favicon plugin.

I must have blinked, because it looks like there's now an official plugin for that too. But it isn't bundled by default, and still needs to be installed.

When attempting to install the official plugin via the web UI, this error is returned:

ttrss_no_workdir.PNG


"PI_ERR_NO_WORKDIR"

A Google search returns literally nothing, which is rare. I hunted around for a while, expecting that the TrueCharts app was built wrong and hadn't set up the plugin storage correctly. But the paths are all OK; instead, the folder that plugins are installed to, '/app/plugins.local/', is owned by root, and there's a permission mismatch.

To fix​

  1. Go to the app in SCALE. Click on "shell"
  2. Open a shell to tt-rss, not postgres
  3. Navigate to the app directory "cd /app"
  4. Fix the permissions "chown -R www-data:www-data /app/plugins.local"
  5. That's it! Go install a plugin.
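If you'd rather not click through the app shell, the same fix should also work from a Truenas host shell via docker (finding container IDs this way is covered in a later post; the grep pattern is just whatever your app is named):

Code:
# find the tt-rss app container, then fix ownership of the plugin dir
docker ps | grep ttrss
docker exec -u root <container-id> chown -R www-data:www-data /app/plugins.local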
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225

Kubernetes gateway prevents traefik proxy to VMs and external IPs​

Another day, another goof up. A while back I was trying to troubleshoot something and saw new settings in the Kubernetes UI, and filled in my LAN's gateway...

2022-11-15_11-41_1.png


2022-11-15_11-53.png


Later, I noticed that about half of my services were no longer accessible by their cloudflare domain URLs. Unfortunately there was some other stuff going on in Scaleland by that time and I failed to notice any pattern to the outage until I'd wasted a good bit of time. Eventually I did figure out that the only services that were unreachable were on IPs different from my host Truenas machine. Sound familiar?

I hadn't forgotten the kubernetes gateway setting, but the UI "appeared" to disallow removing it, so I assumed it was a new required value. And I couldn't fathom that it was at fault for this issue either. I even tried changing it to be the same IP as the host, as if that'd help. After burning a full day resetting settings, re-creating bridge nodes, and generally pulling my hair out, I finally figured out that you can click the "--" null value at the top of the "Route v4 interfaces" pulldown, then enter an empty string for the gateway IP, and the values can then be unset and saved.

Suddenly everything started working again! VMs could be accessed from URLs and the internet was once more a magical playground and not hell on earth.

The UI feature that tricked me:
2022-11-15_11-59.png


The not-as-obvious-as-I-would-like empty value:
2022-11-15_11-59_1.png
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225

Fixing cron on Truenas Official Nextcloud app​

This may only apply to users who upgraded from an earlier version of Truenas's Nextcloud app to the version which includes a cron container/pod/thingy. That's my setup; I'm not sure if users installing Nextcloud fresh see this issue.

Cron error​

The error message I've been seeing looks like this:
2022-11-15_12-44.png


"Last background job execution ran 5 months ago. Something seems wrong. Check the background job settings."

This is caused by the cron container attempting to reach Nextcloud via an IP address that wasn't set up on my instance:

2022-11-15_12-43_2.png

In the above screenshot, "172.16.36.238" is the Nextcloud cron container, and "192.168.13.221:9009" is the Truenas host IP and the Nextcloud container's external port. Nextcloud isn't expecting cron to reach it via that 192.168 IP address.

Solution​

This is a pretty easy fix. Go to your Truenas UI Applications page and open a shell in the main Nextcloud container:
2022-11-16_00-06.png

2022-11-16_00-07.png


2022-11-16_00-07_1.png


Once in the shell, edit `config.php` to include the Nextcloud IP.
2022-11-15_12-47.png
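I believe the config.php edit above amounts to adding that 192.168 address to the 'trusted_domains' array. The occ equivalent, run from the Truenas host the same way as the maintenance-mode command in a later post, would be roughly this (index 2 is just the next free slot in my array):

Code:
docker exec -u www-data [container-id] /var/www/html/occ config:system:set trusted_domains 2 --value="192.168.13.221:9009"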


That's it! Cron should work now.
2022-11-16_00-24.png
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225

Migrating from the Official Nextcloud app to Truecharts Nextcloud app​

During my half frantic search for a solution to the traefik outage a few days ago, one of the folks over at Truecharts' support channel said external-service apps were buggy, not great, and not really supported. They encouraged me to migrate my stuff to not use it anymore.

Unfortunately that's not really an option for most of the things I have using external-service currently, but I did see Nextcloud is now in Truecharts' list, and migrating over to it would allow me to use the built-in Ingress settings.

I don't know what the correct answer is for this kind of invasive management; I wound up using `docker exec` and `docker cp` to get it done. This took several tries: since there's some sort of heartbeat detection going on with the Truecharts app, once it detected an issue it would just flag and reboot incessantly, and I would have to delete the whole app and re-install it blank to try again. So I had to refine the process until I could pull off the migration without leaving the new container with any HTTP status errors along the way.

The actual migration process is straightforward. I largely followed the directions here: https://docs.nextcloud.com/server/latest/admin_manual/maintenance/migrating.html

Finding the containers​

This migration is carried out from a shell on the Truenas Scale host. One of the first tasks I found myself needing to tackle was IDing these semi-randomly generated docker containers from the shell.

To list all running docker containers, use `docker ps`
Then filter down to the app you want by searching for the name: `docker ps | grep nc-csb`

1668807823448.png


The above screenshot is showing a Truecharts Nextcloud app; the Official one has fewer containers involved. Scan the right-hand names until you find one without "extra junk" like postgres, redis, or nginx in the title, and then use the ID code on the far left with the docker commands to access it.

Place both apps into maintenance mode​

Before we begin, shell into each app and enter maintenance mode. First, find the ID with the above `docker ps` command, and then enter maintenance mode with `docker exec -u www-data [container-id] /var/www/html/occ maintenance:mode --on`

Or, pull a full shell:

1668808941822.png


Backup official app data​

Once you've found the ID for the official app, run `docker cp [container-id]:/var/www/html/data /path/to/local/dir`

Run that command for these directories in the container:
  • /var/www/html/data
  • /var/www/html/custom_apps
  • /var/www/html/themes
  • /var/www/html/config

`custom_apps` is missing from the Nextcloud documentation, but I'm pretty sure it needs to be copied over to the new app.

You won't be restoring the config directory directly, so it can be omitted if you want to shell into the official app container later and pull values from the config.
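In practice that's just four docker cp calls, something like the following (container ID from `docker ps`; the destination directory is made up for the example):

Code:
CID=<official-app-container-id>   # from `docker ps`
mkdir -p /mnt/storage/nc-migration
# copy each directory out of the container into the staging area
for d in data custom_apps themes config; do
    docker cp "$CID:/var/www/html/$d" /mnt/storage/nc-migration/
done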

Backup official app database​

Next, find the postgres container for the official app and shell into it with `docker exec -it -u root [container-id] /bin/bash`

Then create an SQL dump: `PGPASSWORD="password-from-config-php" pg_dump nextcloud -x -O -U oc_user -f nextcloud.bak`

1668808547956.png


I had to add the -x and -O flags, which strip ownership and permissions from the dump, because my database user on the official app didn't match the default user on the truecharts app and I couldn't work out how to modify the users/roles.

Next, copy the file back to the Truenas host with `docker cp [container-id]:/nextcloud.bak /path/to/local/dir`

1668808700817.png


You should now have all of the files needed for the migration. Here is where I would stop the official app and leave it off and in maintenance mode while we finish the migration.

Restore database to truecharts app​

Find the truecharts postgres container ID and restart it with `docker restart [container-id]`; otherwise it can still be locked up from the nextcloud instance.

Then copy in the SQL backup file, open a shell with `docker exec -it -u root [container-id] /bin/bash` and run:

  • PGPASSWORD="password" psql -U nextcloud -d template1 -c "DROP DATABASE \"nextcloud\";"
  • PGPASSWORD="password" psql -U nextcloud -d template1 -c "CREATE DATABASE \"nextcloud\";"
  • PGPASSWORD="password" psql -U nextcloud -d nextcloud -f nextcloud.bak

1668809309051.png


Restore data to truecharts app​

Exit out of the postgres container, and find the ID for the truecharts nextcloud container. Then copy in the backed up custom_apps, data, and themes directories. I had to copy them into [container-id]:/var/www/html/data since it was a larger volume or whatever. Then I shelled into the container and moved the folders to the correct places. Don't forget dotfiles inside the /var/www/html/data/data folder!
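Concretely, the restore side of that looked something like the following (container ID and staging path are examples; the mv/cp steps are run inside the container after the copy):

Code:
NEWCID=<truecharts-nextcloud-container-id>   # from `docker ps`
# stage everything inside the large data volume first
for d in custom_apps data themes; do
    docker cp "/mnt/storage/nc-migration/$d" "$NEWCID:/var/www/html/data/"
done
docker exec -it -u root "$NEWCID" /bin/bash
# then, inside the container:
#   mv /var/www/html/data/custom_apps /var/www/html/data/themes /var/www/html/
#   cp -a /var/www/html/data/data/. /var/www/html/data/   # merge contents, keeps dotfiles
#   rm -r /var/www/html/data/data
#   (fix ownership back to www-data if anything ends up owned by root)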

Copy over the passwordsalt, secret, and instanceid values from the official app's /var/www/html/config/config.php file to the truecharts' config file.

Exit maintenance mode, set up the new Ingress rules, and you should be done!
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225
Upgraded to TrueNAS-SCALE-22.12.0 and got this notification:

1673196261440.png

Rsync task "/mnt/storage/backups/vbasftp" will not be executed because it uses a locked dataset.

OK, let's take a look:
1673196350060.png


Selecting "vbasftp":
1673196508212.png


Clicking "unlock" leads to a prompt to upload a key file:
1673196579426.png


Why? Was the key lost during the upgrade? I have the big backup file downloaded during the Bluefin upgrade, but I'm not sure how/if I could use it to repair these datasets. I feel let down by the docs again; there's a pattern I've noticed in both Truenas and Truecharts where the documentation will explain "clicking the unlock button will unlock the thing" in a tautology that... isn't very helpful.

Reading through the known issues for Bluefin doesn't bring up anything that sounds like "dataset locked after upgrade".
 