SOLVED: Working LACP bond disappeared after reboot, unable to re-add it due to SQLite error

sammael

Explorer
Joined
May 15, 2017
Messages
76
Hi,

I've bonded my 2 NICs into an LACP bond, bond0, then added that to br0, to which all VMs connect. It had been working for the last few days. I needed to reboot today, and after the reboot I had no network and the switch showed both LAGG members as down. With a monitor and keyboard connected directly to the machine, I see no bond under network, only 1 free interface and the 2nd interface as a member of br0. If I remove 1 port from the LAGG group on the switch, I can connect via the web UI.

When I try to recreate the bond, I get this error:
(sqlite3.IntegrityError) UNIQUE constraint failed: network_lagginterfacemembers.lagg_physnic [SQL: INSERT INTO network_lagginterfacemembers (lagg_ordernum, lagg_physnic, lagg_interfacegroup_id) VALUES (?, ?, ?)] [parameters: (0, 'enp5s0', 3)] (Background on this error at: https://sqlalche.me/e/14/gkpj)
with a wall of text below:
Code:
Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlite3.IntegrityError: UNIQUE constraint failed: network_lagginterfacemembers.lagg_physnic

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 201, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1342, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/service/crud_service.py", line 169, in create
    return await self.middleware._call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1342, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/service/crud_service.py", line 194, in nf
    rv = await func(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 44, in nf
    res = await f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 177, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/network.py", line 802, in do_create
    lagports_ids += await self.__set_lag_ports(lag_id, data['lag_ports'])
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/network.py", line 1149, in __set_lag_ports
    await self.middleware.call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1399, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1342, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 177, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/datastore/write.py", line 62, in insert
    result = await self.middleware.call(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1399, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1353, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1251, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/datastore/connection.py", line 106, in execute_write
    result = self.connection.execute(sql, binds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1365, in execute
    return self._exec_driver_sql(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1669, in _exec_driver_sql
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
    util.raise_(
  File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: network_lagginterfacemembers.lagg_physnic
[SQL: INSERT INTO network_lagginterfacemembers (lagg_ordernum, lagg_physnic, lagg_interfacegroup_id) VALUES (?, ?, ?)]
[parameters: (0, 'enp5s0', 3)]
(Background on this error at: https://sqlalche.me/e/14/gkpj)


Any idea what went wrong and how I can fix the error?
Edit: I should've added that this is on the latest Cobia, 23.10.1.
 

sammael

I was able to delete the bond from the database manually with https://sqlitebrowser.org.
I then imported the modified DB into TrueNAS, and after a reboot I was able to create the bond again (this time using the command line). After another reboot I was back to square one, with no bond and the bridge with 1 port as its member.
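For anyone hitting the same UNIQUE constraint error: it means "network_lagginterfacemembers" already contains a row with lagg_physnic = 'enp5s0', presumably left over from the bond that vanished, so adding enp5s0 to a new group collides with it. Below is a minimal Python sketch (standard library sqlite3 only) of finding and deleting such leftover rows in a downloaded copy of "freenas-v1.db" instead of clicking through sqlitebrowser. The table and column names come from the error message in the first post; using SQLite's implicit rowid and the helper names are assumptions for illustration, so only ever run it against a copy.

```python
import sqlite3

# Table and column names as they appear in the IntegrityError in the first post.
MEMBERS_TABLE = "network_lagginterfacemembers"


def _placeholders(values):
    """Build '?, ?, ?' for a parameterised IN (...) clause."""
    return ", ".join("?" for _ in values)


def find_member_rows(conn, nics):
    """Return (rowid, lagg_interfacegroup_id, lagg_physnic) for rows already claiming any NIC in nics."""
    nics = list(nics)
    sql = (
        f"SELECT rowid, lagg_interfacegroup_id, lagg_physnic FROM {MEMBERS_TABLE} "
        f"WHERE lagg_physnic IN ({_placeholders(nics)}) ORDER BY rowid"
    )
    return conn.execute(sql, nics).fetchall()


def delete_member_rows(conn, nics):
    """Delete leftover member rows for the given NICs in one transaction; return the number removed."""
    nics = list(nics)
    sql = f"DELETE FROM {MEMBERS_TABLE} WHERE lagg_physnic IN ({_placeholders(nics)})"
    with conn:  # commits on success, rolls back if the DELETE raises
        cur = conn.execute(sql, nics)
    return cur.rowcount
```

For example, conn = sqlite3.connect("freenas-v1.db") followed by find_member_rows(conn, ["enp5s0", "enp6s0"]) shows which lagg_interfacegroup_id still claims each NIC before you delete anything. This only touches the member rows; the LAGG definition itself lives in another table whose name isn't shown in this thread.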

This is how it goes:

At the start there are enp5s0, enp6s0 and br0 (with 1 member, enp6s0, and an alias set to a static IP).
I create bond0 with 1 member port, enp5s0.
Apply, persist.
I add bond0 as a member of br0 and remove enp6s0 as a member of br0.
Apply, persist.
I add enp6s0 as a member port of bond0.
Apply, persist, exit.
On the managed switch, I select the relevant 2 ports and enable them as members of the LAGG I created on the switch. They show up as active LAGG ports within seconds.
At this point I regain connectivity with a working LACP bond.
Reboot.

After the reboot I have enp5s0, enp6s0 and br0 (with 1 member, enp6s0, and an alias set to a static IP) again,
and I'm back to the behaviour from the original post.
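To check what actually came up after a reboot without relying on the web UI, the Linux kernel exposes bridge ports as entries under /sys/class/net/<bridge>/brif/ and bond members in the file /sys/class/net/<bond>/bonding/slaves. Here is a small Python sketch reading both; the helper names are made up for illustration, and the sysfs argument only exists so it can be pointed at a test directory. Going by the description above, on the broken box bridge_ports("br0") would presumably list just enp6s0 and bond_slaves("bond0") would come back empty, whereas a healthy setup shows the bond inside the bridge and both NICs inside the bond.

```python
from pathlib import Path

SYSFS_NET = "/sys/class/net"  # standard Linux location for per-interface sysfs entries


def bridge_ports(bridge, sysfs=SYSFS_NET):
    """Interfaces currently attached to a Linux bridge, read from <sysfs>/<bridge>/brif/."""
    brif = Path(sysfs) / bridge / "brif"
    if not brif.is_dir():
        return []  # no such bridge (or not a bridge)
    return sorted(p.name for p in brif.iterdir())


def bond_slaves(bond, sysfs=SYSFS_NET):
    """Member NICs of a Linux bond, read from <sysfs>/<bond>/bonding/slaves."""
    slaves = Path(sysfs) / bond / "bonding" / "slaves"
    if not slaves.is_file():
        return []  # no such bond (or not a bond)
    return sorted(slaves.read_text().split())
```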

I'm gonna try deleting all network data from the db and setting up from scratch.

Edit: when it's broken, it lists 1 interface twice. Ignore the eno1234 interfaces; they're SFP+ or whatever, just holes at the back of the mobo, and I don't even have the fibre connectors.
[Screenshot: 1704307452016.png]
 

sammael

Well, not exactly solved, but I have a workaround: I now have a working LACP bond that survives a reboot.

I managed to edit the database, delete all references to bridges, LAGG interfaces and LAGG members, and get to a state where, after a reboot, I only had the 2 interfaces.

Now the weirdness begins:
I created bond0 with 2 member ports. Rebooted. bond0 still exists with 2 member ports.
I created br0 with member bond0. Rebooted. No bond anywhere, and br0 has 1 member interface: basically back to the original post.

The workaround was to do what I did above, BUT give the bond and bridge a number other than 0. I went with br1 and bond1, and that works and survives a reboot. Yay.
 

Michael Aos

Dabbler
Joined
Dec 16, 2023
Messages
10
Would you be able to provide a more detailed (along the lines of a HOWTO) description of what you did to resolve this issue?
 

sammael

Disclaimer: I usually only know what I'm doing 50-60% of the time, and the rest I just wing. If you bork your TrueNAS, I don't even speak English.

The 1st step is getting the broken TrueNAS web UI working; how you do that is up to you. For me it was enough to remove 1 port from the LAGG on the switch, and then DHCP gave it a link address. Once you're in the web UI, go to System Settings / General, press the Manage Configuration button and download your config DB. Unpack it into a directory and keep the files that were inside together; you will need to repack them and upload them back to TrueNAS later.
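If you'd rather not do the unpacking by hand, here's a minimal Python sketch using the standard tarfile module. It assumes the downloaded config is a tar archive (possibly compressed, which tarfile.open detects on its own), as it was in this case; the function name and paths are just examples.

```python
import tarfile
from pathlib import Path


def unpack_config(archive, dest):
    """Extract the downloaded config archive into dest and return the names of the files it contained."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:  # default mode "r:*" auto-detects gzip/bz2/xz or plain tar
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # reject absolute paths and ".." entries where supported
        else:
            tar.extractall(dest)  # older Pythons without extraction filters
    return names
```

Something like unpack_config("downloaded-config.tar", "config") (hypothetical file name) leaves freenas-v1.db inside the config directory, ready to open in sqlitebrowser.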

Now onto the part where we manually butcher a database: always a fun time! Download DB Browser for SQLite from https://sqlitebrowser.org, open the broken "freenas-v1.db" in it and go to the Browse Data tab.
[Screenshot: 1704841741993.png]


In the Table dropdown below it, go through all the tables that start with "network_" and delete each and every row for bridges, link aggregations, bridge members, link aggregation members and interface configurations, which should leave every "network_" table empty except one. Some deletions will fail with an error; just move on to the next table and come back to those later (I think it's a foreign key thing: a row can't be deleted while rows in another table still point at it, e.g. a LAGG that still has members defined, or some such).
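The same cleanup can be scripted with Python's built-in sqlite3 module instead of the GUI. This is a minimal sketch, not a tested TrueNAS tool: network_table_counts() lists every table starting with "network_" and how many rows it has, so you can decide what to empty, and empty_tables() then clears only the tables you name, in the order you give (members before the things they belong to). Apart from network_lagginterfacemembers and network_interface_link_address, this thread doesn't name the tables, so the list is yours to fill in from the listing. It switches foreign key enforcement on, so a wrong order fails with an error and nothing is deleted, rather than leaving orphaned rows behind.

```python
import sqlite3

# Per the post, this is the one network_ table that should still have rows afterwards.
KEEP = {"network_interface_link_address"}


def network_table_counts(conn):
    """Map every table whose name starts with 'network_' to its current row count."""
    names = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        if name.startswith("network_")
    ]
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0] for name in names}


def empty_tables(conn, tables):
    """Delete every row from each named table, in the given order, as one transaction."""
    # Enforce foreign keys so deleting a parent before its children fails loudly
    # instead of silently leaving orphaned member rows behind.
    conn.execute("PRAGMA foreign_keys = ON")
    with conn:  # commits if all DELETEs succeed, rolls back everything otherwise
        for name in tables:
            if name in KEEP:
                raise ValueError(f"refusing to empty {name}")
            conn.execute(f'DELETE FROM "{name}"')
```

Run print(network_table_counts(conn)) first, then something like empty_tables(conn, ["network_lagginterfacemembers", ...]) with the rest of the names you picked from that listing.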

At the end of this step, the only table starting with "network_" that should still have anything in it is "network_interface_link_address", and it should only have a single record for each of your interfaces. For example, here's what the broken database I'm screenshotting for this post has (off-topic rant, but curse forever whoever came up with the nonsense names based on PCI lanes or whatever rubbish. My cards are eth0 and eth1 and no one will tell me different *shouts and shakes arm at cloud*):
[Screenshot: 1704842216134.png]

Clearly the bottom 2 rows are absolute utter rubbish and you would need to delete those (and the bridge too, per the instructions above, innit).
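To spot those rows without eyeballing them, here's a hedged Python sketch that flags any network_interface_link_address row whose interface isn't one of the physical NICs you name, plus any repeat of a NIC (the broken config listed one interface twice). The exact column layout isn't shown in this thread, so the column name "interface" is an assumption you can override, and keeping the earliest row of a duplicated NIC is an arbitrary choice; check the real rows in the Browse Data tab before deleting anything.

```python
import sqlite3


def extra_link_rows(conn, physical, column="interface"):
    """Return (rowid, name) for network_interface_link_address rows to review for deletion.

    A row is flagged if its interface is not in `physical`, or if it repeats a NIC
    that already appeared in an earlier row (the earliest row per NIC is kept).
    `column` is an ASSUMED column name; pass the real one if it differs.
    """
    cols = [row[1] for row in conn.execute("PRAGMA table_info(network_interface_link_address)")]
    if column not in cols:
        # SQLite would otherwise treat an unknown "quoted" name as a string literal.
        raise ValueError(f"no column {column!r}; columns are {cols}")
    seen = set()
    extras = []
    sql = f'SELECT rowid, "{column}" FROM network_interface_link_address ORDER BY rowid'
    for rowid, name in conn.execute(sql).fetchall():
        if name in physical and name not in seen:
            seen.add(name)  # first row for a real NIC: keep it
        else:
            extras.append((rowid, name))  # unknown interface (e.g. a bridge) or a duplicate
    return extras
```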

Save the DB and pack the contents of the directory you unpacked into (just the contents, not the directory itself) into a TAR archive. I used 7-Zip under Win11. Upload the file to your TrueNAS via the same Manage Configuration button you used to download it. Your TrueNAS will reboot without any network connection, with only the physical interfaces visible.
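The "contents, not the directory itself" part presumably matters because the files need to sit at the top level of the archive (freenas-v1.db rather than somefolder/freenas-v1.db). If you'd rather not use 7-Zip, a minimal Python sketch with the standard tarfile module does the same; the paths are placeholders and it writes a plain, uncompressed tar.

```python
import tarfile
from pathlib import Path


def repack_config(src_dir, archive):
    """Pack the files inside src_dir (not src_dir itself) into an uncompressed tar at archive."""
    src = Path(src_dir)
    archive = Path(archive)
    with tarfile.open(archive, "w") as tar:  # "w" writes a plain tar with no compression
        for entry in sorted(src.iterdir()):
            if entry.resolve() == archive.resolve():
                continue  # don't pack the archive into itself if it was created inside src_dir
            tar.add(entry, arcname=entry.name)  # top-level name, e.g. "freenas-v1.db"
    return archive
```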

I'm not sure about this step, but I did it twice from the CLI and it worked. Perhaps it could be done from the GUI, but as that's what broke it in the first place, I wasn't takin' any chances.

Either press 1 in the TrueNAS console menu or, if you're already in a shell, run "cli_console" and press 1. I ain't gonna go into great detail on the network config, since presumably you know what you want, but KEEP IN MIND: if you create a bridge, DO NOT CALL IT br0; similarly, if you create a LAGG interface, DO NOT CALL IT bond0. I tried that and it created the same broken config I had at the start. Then I did the same but named them br1 and bond1, and it works on 2 TrueNAS SCALE systems.

Hope any of that helps. Good luck! It was extremely annoying to deal with on both my systems.
 

Michael Aos

Thank you!
Worked like a charm.
 

sadoveca

Cadet
Joined
Jan 10, 2024
Messages
1
Worked for me as well.
Managed to clean up the broken entries and create an LACP bond that survives a reboot.

cheerio
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776

sammael

@RhubarbBread I was just moving one of the NASes to another room, and after connecting 2 new network cables (and changing the LAGG port numbers on the switch) and powering up, all is well: br1 with member bond1, which has the 2 physical interfaces. But I agree that waiting for the update is safest; I think I read it should be out very soon. I'm just mentioning this in case the LAGG is essential for your setup (I just like to see the graphs go above 1GBps occasionally :)). With my usage, 1 cable would be plenty.
 