Failed to boot NAS after upgrade to 12.0-U4

Rabinovitch

Dabbler
Joined
Apr 3, 2021
Messages
43
After a routine upgrade from 12.0-U3 to 12.0-U4, our NAS fails to boot:
Code:
ERROR: cannot open /boot/lua/loader.lua: too many open files.

Need urgent help please!
 

Attachments

  • 12.0-U4_boot_failure.JPG

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Please describe your hardware setup and the boot drives...even Sherlock Holmes needs some clues.
 

Rabinovitch

Dabbler
Joined
Apr 3, 2021
Messages
43
Mirrored ZFS pool on WDC SN730 512 GB NVMe as boot device, Lenovo P620 as controller, SAS 9405W-16e as HBA, AIC J4078-01 as disk shelf. :cool:
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Mirrored ZFS pool as boot device, Lenovo P620 as controller, SAS 9405W-16e as HBA, AIC J4078-01 as disk shelf. :cool:

Can you describe the "mirrored ZFS pool" used as the boot device? It should be OK, but it's important to understand the devices and their potential failure modes. I have not heard of anyone else reporting this issue.
 

Rabinovitch

Dabbler
Joined
Apr 3, 2021
Messages
43
Sorry for misleading you. There was originally a single WDC SN730 512 GB NVMe as the system disk. The picture attached to the first post shows what was on the screen after I gave the command to reboot the updated system.
 

Rabinovitch

Dabbler
Joined
Apr 3, 2021
Messages
43
Not sure if it is related to our upgrade issue, but we also see this:
Code:
gmultipath status -s
multipath/disk6 DEGRADED da88 (ACTIVE)
multipath/disk7 DEGRADED da87 (ACTIVE)
multipath/disk5 DEGRADED da86 (ACTIVE)
multipath/disk4 DEGRADED da85 (ACTIVE)
multipath/disk17 DEGRADED da84 (ACTIVE)
multipath/disk19 DEGRADED da83 (ACTIVE)
multipath/disk18 DEGRADED da82 (ACTIVE)
multipath/disk43 DEGRADED da81 (ACTIVE)
multipath/disk16 DEGRADED da80 (ACTIVE)
multipath/disk3 DEGRADED da79 (ACTIVE)
multipath/disk22 DEGRADED da78 (ACTIVE)
multipath/disk44 DEGRADED da60 (ACTIVE)
multipath/disk51 DEGRADED da47 (ACTIVE)
multipath/disk52 DEGRADED da46 (ACTIVE)
multipath/disk50 DEGRADED da45 (ACTIVE)
multipath/disk46 DEGRADED da44 (ACTIVE)
multipath/disk49 DEGRADED da43 (ACTIVE)
multipath/disk48 DEGRADED da42 (ACTIVE)
multipath/disk47 DEGRADED da41 (ACTIVE)
multipath/disk45 DEGRADED da40 (ACTIVE)
multipath/disk41 DEGRADED da39 (ACTIVE)
multipath/disk36 DEGRADED da38 (ACTIVE)
multipath/disk42 DEGRADED da37 (ACTIVE)
multipath/disk40 DEGRADED da36 (ACTIVE)
multipath/disk39 DEGRADED da35 (ACTIVE)
multipath/disk35 DEGRADED da34 (ACTIVE)
multipath/disk38 DEGRADED da33 (ACTIVE)
multipath/disk34 DEGRADED da32 (ACTIVE)
multipath/disk31 DEGRADED da31 (ACTIVE)
multipath/disk37 DEGRADED da30 (ACTIVE)
multipath/disk33 DEGRADED da29 (ACTIVE)
multipath/disk32 DEGRADED da28 (ACTIVE)
multipath/disk26 DEGRADED da27 (ACTIVE)
multipath/disk30 DEGRADED da26 (ACTIVE)
multipath/disk29 DEGRADED da25 (ACTIVE)
multipath/disk28 DEGRADED da24 (ACTIVE)
multipath/disk27 DEGRADED da23 (ACTIVE)
multipath/disk25 DEGRADED da22 (ACTIVE)
multipath/disk24 DEGRADED da21 (ACTIVE)
multipath/disk23 DEGRADED da20 (ACTIVE)
multipath/disk21 DEGRADED da19 (ACTIVE)
multipath/disk20 DEGRADED da18 (ACTIVE)
multipath/disk15 DEGRADED da17 (ACTIVE)
multipath/disk14 DEGRADED da16 (ACTIVE)
multipath/disk13 DEGRADED da15 (ACTIVE)
multipath/disk12 DEGRADED da14 (ACTIVE)
multipath/disk11 DEGRADED da13 (ACTIVE)
multipath/disk10 DEGRADED da12 (ACTIVE)
multipath/disk9 DEGRADED da11 (ACTIVE)
multipath/disk8 DEGRADED da10 (ACTIVE)
multipath/disk2 DEGRADED da9 (ACTIVE)
multipath/disk1 DEGRADED da8 (ACTIVE)
We also get 52 critical alerts like this one:
Multipath multipath/disk29 connection is not optimal. Please check disk cables.

So far we have not been able to determine what caused these errors.
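With a listing this long, a quick tally can make the pattern easier to see. The following is a minimal Python sketch, not a TrueNAS tool: it assumes every path appears on its own line in the same four-column layout as the output above, and the function names `parse_gmultipath_status` and `summarize` are made up for illustration. It counts devices per state and lists the multipath devices that have only one path.

```python
from collections import Counter


def parse_gmultipath_status(text):
    """Parse lines shaped like 'multipath/disk6 DEGRADED da88 (ACTIVE)'.

    Assumption: one path per line, four whitespace-separated columns,
    as in the output posted above. Other lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4 or not parts[0].startswith("multipath/"):
            continue
        name, state, provider, mode = parts
        rows.append((name, state, provider, mode.strip("()")))
    return rows


def summarize(rows):
    """Return (lines per state, sorted names of devices with a single path)."""
    states = Counter(state for _, state, _, _ in rows)
    paths_per_device = Counter(name for name, _, _, _ in rows)
    single_path = sorted(n for n, c in paths_per_device.items() if c == 1)
    return states, single_path


# First three lines of the output posted above, used as sample input.
SAMPLE = """\
multipath/disk6 DEGRADED da88 (ACTIVE)
multipath/disk7 DEGRADED da87 (ACTIVE)
multipath/disk5 DEGRADED da86 (ACTIVE)
"""

rows = parse_gmultipath_status(SAMPLE)
states, single_path = summarize(rows)
print(dict(states))
print(single_path)
```

Feeding it the full listing should show whether every device is down to a single path, which would point at something shared (cabling, an expander, or the HBA) rather than at individual disks.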
 

Rabinovitch

Dabbler
Joined
Apr 3, 2021
Messages
43
OK, we can boot with the disk enclosure powered off. I believe some limit on the maximum number of open files needs to be increased. But how, and which variables should be passed to the loader for this?
 

Rabinovitch

Dabbler
Joined
Apr 3, 2021
Messages
43
I wonder what this is (it can be seen during boot when the disk enclosure is on):
1623252293778.png
 
Joined
Jun 2, 2019
Messages
591

Rabinovitch

Dabbler
Joined
Apr 3, 2021
Messages
43
OK, then loader.lua and its "too many open files" problem is our main concern for now.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Is the JBOD wired with two connections from the SAS controller? Is it a wide-port or a multipath configuration? Any chance that was changed?

It looks like it is ignoring the boot SSD and trying to boot from the pool. Is the boot SSD operational?
(If you remove it, is the boot sequence the same?)
 

Rabinovitch

Dabbler
Joined
Apr 3, 2021
Messages
43
A single SAS HBA is wired to a single disk shelf with 4 cables. I'm not sure about the details of the logical configuration, but nobody touched anything before the upgrade and reboot.
The boot SSD is certainly operational, since we can boot normally when the JBOD is absent during loader initialization.
I'm not sure about removing the boot SSD; we can try it tomorrow. Is there any way to modify the boot sequence, at least to exclude unnecessary boot options?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
No need to remove the boot SSD... I just wanted to confirm that it was functional.

So, the only thing that is "unusual" is that you have a JBOD with 4 SAS cables?
Multipathing was working... now it is not.
Is it possible something has failed within the JBOD?

I'd suggest a manual rollback to 12.0-U3 via USB, unless someone else has a better idea.
 

Rabinovitch

Dabbler
Joined
Apr 3, 2021
Messages
43
Let's leave multipathing aside for now. The main problem is that TN can't boot while the JBOD is powered on.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Let's leave multipathing aside for now. The main problem is that TN can't boot while the JBOD is powered on.
We're also trying to work out what is different about your system that doesn't get tested well... and that no one else has.
Nothing you have identified is unusual in process or hardware status.
If it can boot with 12.0-U3 and not with 12.0-U4, then there is a bug.
The unusual (less common) thing about your config is multipath... it's not widely used. Has anyone else tested multipath with 12.0-U4?
 

Rabinovitch

Dabbler
Joined
Apr 3, 2021
Messages
43
We will try U4 in several hours. I think it will boot successfully. It has already booted on this version (but, I repeat, only when the disk shelf stays powered off until the bootloader has initialized).
 

freqlabs

iXsystems
iXsystems
Joined
Jul 18, 2019
Messages
50
OK, we can boot with the disk enclosure powered off. I believe some limit on the maximum number of open files needs to be increased. But how, and which variables should be passed to the loader for this?

The limit is a compile-time constant. Nothing in the boot loader changed between U3 and U4. Have there been any changes to the hardware configuration or firmware settings?
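To illustrate why a compile-time limit cannot be raised from the loader prompt, here is a toy Python model of a fixed-size open-file table. It is only a sketch and not the loader's actual code: the value of `MAX_OPEN`, the class `FixedFileTable`, and the idea that each visible disk holds one slot during boot are all assumptions made for the example.

```python
import errno
import os

# Hypothetical value standing in for the loader's compile-time constant.
MAX_OPEN = 64


class FixedFileTable:
    """Toy open-file table whose size is fixed when the object is built."""

    def __init__(self, size=MAX_OPEN):
        self.slots = [None] * size

    def open(self, name):
        # Hand out the first free slot; fail with EMFILE when none is left.
        for fd, slot in enumerate(self.slots):
            if slot is None:
                self.slots[fd] = name
                return fd
        raise OSError(errno.EMFILE, os.strerror(errno.EMFILE), name)

    def close(self, fd):
        self.slots[fd] = None


def try_boot(disk_count, table_size=MAX_OPEN):
    """Return True if loader.lua can still be opened after probing disks.

    Assumption for this sketch: every visible disk keeps one slot open.
    """
    table = FixedFileTable(table_size)
    try:
        for i in range(disk_count):
            table.open(f"disk{i}")
        table.open("/boot/lua/loader.lua")
        return True
    except OSError as exc:
        if exc.errno == errno.EMFILE:
            return False
        raise


print(try_boot(1))    # enclosure off: only the boot disk is visible
print(try_boot(64))   # enclosure on: the table is full before loader.lua
```

In this model the only ways to boot are to present fewer devices while the loader runs, which matches the observation that booting works with the enclosure powered off, or to rebuild the program with a larger table.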
 