SCALE 21.08-BETA.2 Boot Loop

NYSundevil

Dabbler
Joined
Nov 22, 2020
Messages
13
New to TrueNAS: I ran CORE for about 9 months, then moved to SCALE when the BETA was released because FreeBSD didn't support transcoding on 10th-gen Intel. No major issues with SCALE until I logged in yesterday and noticed my main pool (Fusion Pool) was flagged as degraded. I knew one of my 3.5" drives had some bent pins, so I shut everything down this morning and removed it for an RMA. The server hasn't fully booted since. Originally I would get the "Make sure the TrueNAS system is powered on and connected to the network" message; now it fails to get that far.

I have a monitor and keyboard connected so I can watch the boot process loop. I currently have only a single SSD SCALE boot drive, but additional CORE boot drives are still installed. I know I need to clean that mess up. I can't go back to CORE because I updated ZFS.

Watching the shutdown and boot cycle, either one was taking longer than usual to complete, over 20 minutes, with no issues starting or stopping the middleware.
Currently in the boot/shutdown cycle I see a few errors:
Failed mounting or unmounting /var/db/system.
Failed mounting or unmounting /var/lib/systemd/coredump.
There are other very similar failures both shutting down and starting up.

The main issue I see during boot is it continues to cycle with
[OK] Stopped Getty on tty1
[OK] Started Getty on tty1
[OK] Stopped Serial Getty on ttyS0
[OK] Started Serial Getty on ttyS0

It does this a few times, then tries to start some other processes before repeating the Getty cycle. Eventually it hangs with the screen flashing text too fast to read, never fully booting.

I can interrupt this from the keyboard with Ctrl-Alt-Delete. When I boot I can choose which version of SCALE to run, or advanced options. From the advanced options I can get to the Console menu with choices 1-11, but it never gives me the IP for the GUI. The IP is statically assigned via my router.

This is my first post. I don't do anything complicated; the box is basically overkill for Plex and a family file server. I was backing up the data to my old QNAP box but haven't set that up again since moving to SCALE. I don't mind losing my settings but would rather not lose the last 4 months of data changes if I can avoid it.

Any help is greatly appreciated, please let me know if additional information is needed.

Neil
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
I'd suggest removing everything except the boot drive you want to use.... get that stable before adding the ZFS data drives.

What was the layout of the fusion pool.... was it a stripe?
 

NYSundevil

Thanks for the quick reply, that makes sense, will give it a try this evening.
I believe I set up 2 fusion pools in a mirror.
2x Seagate Exos X16 & 1 NVMe in a fusion pool, mirrored by a 2nd set configured identically.
In the GUI those 6 drives appear as a single pool.
 

NYSundevil

With only the boot drive and the NVMe drives attached, it boots quickly and gives me the boot menu for TrueNAS SCALE. It lets me select the version of SCALE and whether I want advanced options for each, but continues very quickly unless I move the cursor. When it defaults to 21.08-BETA.2 it leaves me at the TrueNAS prompt, but no IP is listed for the GUI; it is blank. I tried selecting the recovery option, and it is currently hung at:
[11.443934] input: HDA Intel PCH HDMI/DP, pcm=10 as /devices/pci0000:00/0000:00:1f.3/sound/card/input18
 

morganL

The good thing is that the issue seems unrelated to the ZFS pool.
The bad news is I have no idea what is causing this issue. Anyone else have a clue yet?
Can you load an earlier version of SCALE or the latest version of CORE? If one of these is successful, it's likely to be a software issue. If not, it's likely to be a hardware or motherboard/BIOS issue.
 

NYSundevil

Thank you for the guidance; my boot disk must have been corrupted. I was able to install 21.08 V2 on a different drive and boot to the GUI. My pool is degraded, but I should still be able to import it and then resilver once I replace the bad drive?
 

morganL

It is amazing how often there are weird hardware faults... and how hard they are to diagnose.
 

NYSundevil

The issues do seem to be related to my main Fusion Pool. I was able to import the simple mirrored SSD pool, but the main pool isn't recognized as available to import. The one drive is removed, but the other 5 are attached. At the drive level, none are associated with a pool. I can import the individual drives, which I am not sure I should try, but not the pool. The system still fails to boot from the original SCALE boot drive, which I believe is related to the Fusion Pool issue.

What should my next troubleshooting steps be? Do I try to import individual disks, or are there commands to run from the shell to try to access the Fusion Pool?
 

morganL

A Fusion pool is described as "Stripe + Mirror"... can you be more explicit about how it is set up?
 

NYSundevil

2x Seagate Exos X16 & 1 NVMe in a fusion stripe pool, mirrored by a 2nd set configured identically.
In the GUI those 6 drives appear as a single pool.

I tried to import the pool from the shell, forcing it
zpool import FusionPool -f -m

It imports with errors. I can't see the pool in the GUI but the drives are assigned to the pool
1634339053001.png



1634339094202.png

This is where it stands
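
Editor's note: a more cautious variant of the forced import above is to import the pool read-only under an alternate root, so nothing is written to the degraded pool while data is copied off. The following is only a sketch, assuming the pool name FusionPool from this thread. The function name import_readonly and the /mnt alternate root are illustrative assumptions, and a pool that is already imported would have to be exported first.

```shell
#!/bin/sh
# Sketch only: import a pool read-only so no writes reach a degraded pool.
#   -f               force import of a pool that was not cleanly exported
#   -o readonly=on   import without allowing any writes
#   -R /mnt          alternate root for mountpoints (assumed location)
# import_readonly is a hypothetical helper name, not a TrueNAS command.
import_readonly() {
    pool="$1"
    zpool import -f -o readonly=on -R /mnt "$pool" && zpool status -v "$pool"
}

# Example: import_readonly FusionPool
```

If the import succeeds, the datasets should be mounted below /mnt and can be copied to another system before any repair is attempted.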
 

morganL

Normally a Fusion pool would be a mirror vdev + one or more Z1/Z2 vdevs.
Mirroring pools within the same system is very unusual. How did you do it?
 

NYSundevil

Sorry if I am using incorrect terminology; everything I set up was done under the CORE GUI, and I doubt I did anything unique. I must have created 2 identical vdevs in a fusion format, then used them to create a mirrored pool. I only did it once, when I set up the server a year ago. Each vdev is 2 14 TB drives and one 512 GB NVMe. I believed that, being mirrored, the pool should still be accessible with a single drive failing.

The pool appears to import but isn't available via the GUI. Obviously it has errors and corruption. I have a backup from a couple of months ago, when I migrated to SCALE, but I would prefer to get a more recent copy of the data before re-creating the pool if it won't resilver properly. Do I have any options for getting data off this pool?
 

morganL

The terminology does not sound right... The CORE GUI might have let you set up a pool that was not a Fusion pool. A fusion pool has a separate vdev for the flash and then a vdev for the HDDs. Each vdev has its own protection... mirror for the flash, but the HDDs can be mirror or RAIDZ.
 

NYSundevil

I'm sure I have it mixed up. This is the current zpool status: I have 6 drives in 3 mirrors in what I called FusionPool. Only one drive is missing, so the pool is in a degraded state but should be functioning. Should it not? It appears to import successfully but doesn't appear in the GUI, although the drives are labeled as associated with FusionPool until I restart the system.
1634508029705.png
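
Editor's note: since the status above survives only as a screenshot, a text version is easier to share and to scan. The sketch below filters `zpool status` output down to the entries whose state is not ONLINE. The function name unhealthy_devices is hypothetical, and the filter assumes the usual NAME/STATE column layout of `zpool status`.

```shell
#!/bin/sh
# Sketch only: read `zpool status` text on stdin and print every pool, vdev,
# or disk whose STATE column is a known non-ONLINE state.
# unhealthy_devices is a hypothetical helper name.
unhealthy_devices() {
    awk '
        # The config table starts after the NAME/STATE header row.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" line ends the config table.
        /^errors:/ { in_config = 0 }
        in_config && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ { print $1, $2 }
    '
}

# Example: zpool status FusionPool | unhealthy_devices
```

With one disk missing from a mirror, this would be expected to list the pool, the affected mirror vdev, and the missing disk.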
 

morganL

That looks OK.
Mirror-2 is the special vdev; it is a mirror and is healthy.
Mirror-0 is a data vdev with a mirrored pair of HDDs; it is healthy.
Mirror-1 is a data vdev with mirrored HDDs; one is unhealthy.

You would normally add another drive or replace the failed drive in Mirror-1. That would start a resilver process and get Mirror-1 back to a healthy state. Until then, there is a danger of corrupting files on the remaining drive of Mirror-1.
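
Editor's note: at the ZFS level, the replacement described above corresponds to `zpool replace`, after which `zpool status` reports resilver progress. On TrueNAS the usual route is the disk replace action in the web UI, which also prepares the new disk, so treat the following as a sketch of the underlying step rather than the recommended procedure. The function name replace_failed and the device names in the example are placeholders.

```shell
#!/bin/sh
# Sketch only: swap a failed mirror member for a new disk, then show status.
# replace_failed is a hypothetical helper name.
#   $1 = pool name
#   $2 = failed device as shown by `zpool status` (name or numeric GUID)
#   $3 = path of the new disk
replace_failed() {
    pool="$1"
    old="$2"
    new="$3"
    zpool replace "$pool" "$old" "$new" && zpool status "$pool"
}

# Example with placeholder identifiers:
#   replace_failed FusionPool <old-device-or-guid> /dev/disk/by-id/<new-disk>
```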
 

NYSundevil

I understand the risk and am waiting for the replacement drive, but I don't understand why the pool is visible in the shell but not in the GUI. The disks appear assigned to the pool, but the pool itself doesn't appear. Originally SCALE wouldn't boot because this pool was corrupted, so I reinstalled SCALE and could import the VMpool but not FusionPool. I was able to force the import of FusionPool via the shell into the new instance, but it still isn't recognized in the GUI. I think I updated the ZFS version of both pools, so I can't go back to CORE. Do you think I might have more luck importing into the nightly version of SCALE?
1634515234134.png


1634515360410.png
 

morganL

It's possible that the nightly would help, but you are not playing with a full deck of cards, or in this case drives.
When you import via CLI, it bypasses the middleware and hence may not appear in the WebUI.
So the issue is why it is not importing via the WebUI... probably because of the failed drive and perhaps some corrupted data.
If the pool can be fixed via the CLI, then the system can be rebooted and the pool reimported via the WebUI.
Removing the failed drive would be sensible, but it is much safer if you have a replacement drive to work with.
The nightly will graduate to the next release on 10/26, about a week from now.
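
Editor's note: because a CLI import bypasses the middleware, the pool generally has to be released at the ZFS level before the WebUI can import and track it. A minimal sketch of that step, assuming the pool name from this thread; release_pool is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: export a pool that was imported from the shell so that the
# web UI import can pick it up afterwards.
# release_pool is a hypothetical helper name.
release_pool() {
    pool="$1"
    # `zpool list -H -o name` prints one imported pool name per line.
    if zpool list -H -o name | grep -qxF -- "$pool"; then
        zpool export "$pool"
    else
        echo "$pool is not imported at the ZFS level"
    fi
}

# Example: release_pool FusionPool
```

After the export, the pool can be imported again through the storage import option in the WebUI.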
 