iSCSI Multipath Networking in SCALE

r34lj4k3

Cadet
Joined
Jan 24, 2019
Messages
6
Hello,

I've just migrated from CORE over to SCALE (super excited for Linux over FreeBSD btw) and upon first boot, I noticed my iSCSI networking was broken.

Hardware is a Dell R720xd with 26 SSDs, 384 GB ECC DDR3-1333, 2x E5-2643 v2, a PERC flashed to IT mode, and a Dell rNDC with dual 1 Gb Ethernet and dual 10 Gb SFP+.

Previous config under CORE was working perfectly:

10 Gb SFP+ port 1 was on VLAN 101 all the way through the switch to the NICs on the server; vmkping from the ESXi host and a regular ping from the TrueNAS server both succeeded.
10 Gb SFP+ port 2 was on VLAN 102, same story.

After the upgrade, even after re-making the VLAN interfaces and assigning ports, I can only ever get one of them to ping at a time, though either one can be the active one. I am fairly certain this is a networking issue or a bug, as there is no outbound traffic (except for the failed pings sent from TrueNAS) and no inbound traffic on whichever interface isn't working. That path is listed as DEAD in the iSCSI paths view on VMware ESXi 8.
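
For anyone else chasing the same symptom, these are the sort of Linux-side checks involved (a sketch; per the MAC addresses in the ifconfig output below, vlan102 rides on eno2 in my case):

ip -d link show vlan102     # confirm the 802.1Q ID and parent device
ip addr show vlan102        # confirm the address assignment
tcpdump -ni eno2 vlan 102   # watch the parent NIC for any tagged frames arriving at all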

Output of ifconfig -a:
root@Stronghold[~]# ifconfig -a
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
ether c8:1f:66:ec:e9:33 txqueuelen 1000 (Ethernet)
RX packets 20036112 bytes 21176045187 (19.7 GiB)
RX errors 0 dropped 116181 overruns 0 frame 0
TX packets 20409168 bytes 26716749643 (24.8 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 143 memory 0xd5000000-d57fffff

eno2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
ether c8:1f:66:ec:e9:35 txqueuelen 1000 (Ethernet)
RX packets 197953 bytes 21273790 (20.2 MiB)
RX errors 0 dropped 116173 overruns 0 frame 0
TX packets 246 bytes 18435 (18.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 170 memory 0xd6000000-d67fffff

eno3: flags=4098<BROADCAST,MULTICAST> mtu 1500
ether c8:1f:66:ec:e9:37 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 170 memory 0xd7000000-d77fffff

eno4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.249 netmask 255.255.255.0 broadcast 192.168.1.255
ether c8:1f:66:ec:e9:39 txqueuelen 1000 (Ethernet)
RX packets 203629 bytes 21705551 (20.7 MiB)
RX errors 0 dropped 116163 overruns 0 frame 0
TX packets 10580 bytes 9433136 (8.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 191 memory 0xd8000000-d87fffff

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 6044 bytes 1572156 (1.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6044 bytes 1572156 (1.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vlan101: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.101.100 netmask 255.255.255.0 broadcast 192.168.101.255
ether c8:1f:66:ec:e9:33 txqueuelen 1000 (Ethernet)
RX packets 7296752 bytes 20066216937 (18.6 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 8661678 bytes 25778147747 (24.0 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vlan102: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.102.100 netmask 255.255.255.0 broadcast 192.168.102.255
ether c8:1f:66:ec:e9:35 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 37 bytes 8165 (7.9 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Networking settings page: [screenshot]


From the esxi side (hostname Bastion):
[root@Bastion:~] vmkping -I vmk1 192.168.101.100
PING 192.168.101.100 (192.168.101.100): 56 data bytes
64 bytes from 192.168.101.100: icmp_seq=0 ttl=64 time=0.146 ms
64 bytes from 192.168.101.100: icmp_seq=1 ttl=64 time=0.179 ms
64 bytes from 192.168.101.100: icmp_seq=2 ttl=64 time=0.140 ms

--- 192.168.101.100 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.140/0.155/0.179 ms

[root@Bastion:~] vmkping -I vmk2 192.168.102.100
PING 192.168.102.100 (192.168.102.100): 56 data bytes
sendto() failed (Host is down)
[root@Bastion:~]




The Round Robin config set from before: [screenshot]
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
super excited for Linux over FreeBSD btw

You're super excited to be running the thing that iXsystems said they were not prioritizing on SCALE, while recommending that everybody who needs iSCSI keep using CORE?

Okay then....?
 

r34lj4k3

Cadet
Joined
Jan 24, 2019
Messages
6
You're super excited to be running the thing that iXsystems said they were not prioritizing on SCALE, while recommending that everybody who needs iSCSI keep using CORE?

Okay then....?

Hi jgreco, big fan.

That being said, I did not see that as part of the migration guide. Can you point me in that direction?

The excitement stems from my hatred of FreeBSD and a love of Linux.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Hi jgreco, big fan.

That being said, I did not see that as part of the migration guide. Can you point me in that direction?

The excitement stems from my hatred of FreeBSD and a love of Linux.

I haven't seen any "migration guide." I'm just working off what has been said. It was made very clear that there are a lot of issues with SCALE that are not going to be resolved in the short term, especially including things like the sucky Linux memory management (half the memory for ARC) and a variety of performance issues. My understanding is that iXsystems is focused on making certain subsystems such as Kubernetes, containers, and scale-out features work well. They are not particularly interested in investing time to fix use cases already addressed by CORE, such as iSCSI, where iXsystems actually invested significant time and effort in creating a very high performance kernel iSCSI subsystem. We already know the Linux iSCSI stuff kinda sucks.

Why do you hate FreeBSD, and why do you even care? It's an appliance. Your interactions with it should be through the GUI.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Have you tried removing your NMP claimrules for TrueNAS and letting it revert to the default behavior? I don't have a SCALE iSCSI setup on hand right at the moment, but I'll see if I can spin one up.
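
Roughly, from the ESXi shell, that would be something like this (a sketch; the vendor/model strings and PSP options shown are the common community rule for TrueNAS and must match whatever you originally added):

esxcli storage nmp satp rule list | grep -i truenas
esxcli storage nmp satp rule remove -s VMW_SATP_ALUA -P VMW_PSP_RR -O iops=1 -V "TrueNAS" -M "iSCSI Disk"

followed by a reboot or a rescan of the storage adapters.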

After the upgrade, even after re-making the VLAN interfaces and assigning ports, I can only ever get one of them to ping at a time, though either one can be the active one. I am fairly certain this is a networking issue or a bug, as there is no outbound traffic (except for the failed pings sent from TrueNAS) and no inbound traffic on whichever interface isn't working. That path is listed as DEAD in the iSCSI paths view on VMware ESXi 8.

I'd also suggest removing the VLAN tags from the TrueNAS SCALE machine and setting the switch-side ports to edge-type, "native VLAN" - the Cisco IOS equivalent here would be akin to switchport access vlan 101 and switchport access vlan 102 in the interface configuration.
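
As a sketch, the switch side would look something like this in IOS (interface names are placeholders for whatever your two storage-facing ports actually are):

interface TenGigabitEthernet1/0/1
 switchport mode access
 switchport access vlan 101
!
interface TenGigabitEthernet1/0/2
 switchport mode access
 switchport access vlan 102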

Is it possible to revert to a CORE boot environment to restore full functionality?
Can you try booting a new install of CORE? The upgrade is normally one-way, but the pool itself should import on either.
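
(The GUI import is the supported route, but underneath it's essentially the standard ZFS operation, roughly:

zpool import          # list pools visible to this host
zpool import tank     # import one by name; "tank" is a placeholder

just don't zpool upgrade the pool if you might want to move back.)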
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
The excitement stems from my hatred of FreeBSD and a love of Linux.
Kinda' funny, 'cause for me it's the other way around. I friggin' dislike Linux and its tendency to reinvent the wheel a million times, often arguably making it worse. I mean, just look at the messy introduction history of systemd and PulseAudio, to name a couple. That said, Linux does tend to have better desktop HW/SW support, which is why I use it for some of my workstations, but it's FreeBSD all the way for all my server machines.

Also, I'm currently running an experimental SCALE setup with a dummy pool (no real data) and maybe 4 TrueCharts apps, and the system constantly hovers around 8-11% CPU usage.... It's literally doing NOTHING and no one is using it. Meanwhile, my CORE box, which is actually running production shares and a bunch of jails with production services, mostly idles at 0-2%.... Never gonna use SCALE for my production NAS due to this.
 

r34lj4k3

Cadet
Joined
Jan 24, 2019
Messages
6
I haven't seen any "migration guide." I'm just working off what has been said. It was made very clear that there are a lot of issues with SCALE that are not going to be resolved in the short term, especially including things like the sucky Linux memory management (half the memory for ARC) and a variety of performance issues. My understanding is that iXsystems is focused on making certain subsystems such as Kubernetes, containers, and scale-out features work well. They are not particularly interested in investing time to fix use cases already addressed by CORE, such as iSCSI, where iXsystems actually invested significant time and effort in creating a very high performance kernel iSCSI subsystem. We already know the Linux iSCSI stuff kinda sucks.

Why do you hate FreeBSD, and why do you even care? It's an appliance. Your interactions with it should be through the GUI.
Here's the migration guide:


I think we've all needed to hit the CLI for various things; last time it was to pull more SMART information via the command line. I just find it very annoying that a lot of the standard Linux commands don't work, plus the lack of a BASH environment, which is where I spend most of my time in Linux.
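
For the SMART case it was just the usual smartmontools invocations (device names are illustrative; drives show up as daN on CORE and sdN on SCALE):

smartctl --scan        # enumerate the devices smartctl can see
smartctl -a /dev/sda   # full SMART dump for a single drive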

Is it possible to revert to a CORE boot environment to restore full functionality?
Can you try booting a new install of CORE? The upgrade is normally one-way, but the pool itself should import on either.
I ended up booting from the TrueNAS CORE OS still loaded on my NAS, and everything came back right away. I had not upgraded the pool after booting into the SCALE OS, just in case. I did see that it was supposedly a one-way transition, but after loading back into CORE it even picked up the previously broken iSCSI connection automagically. Guess I won't be on Linux for a while :(

Also, I'm currently running an experimental SCALE setup with a dummy pool (no real data) and maybe 4 TrueCharts apps, and the system constantly hovers around 8-11% CPU usage.... It's literally doing NOTHING and no one is using it. Meanwhile, my CORE box, which is actually running production shares and a bunch of jails with production services, mostly idles at 0-2%.... Never gonna use SCALE for my production NAS due to this.
I didn't have this experience with CPU usage; maybe it's service- or hardware-dependent?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Here's the migration guide:

So perhaps I'm a bit confused. You expected the migration guide to give you reasons not to migrate? There are already conspicuous warnings that SCALE is not an upgrade from CORE; in much the same way users are expected to understand the difference between a truck and an SUV when purchasing a vehicle, you are expected to understand that these are different things.

I think we've all needed to hit the CLI for various things; last time it was to pull more SMART information via the command line. I just find it very annoying that a lot of the standard Linux commands don't work, plus the lack of a BASH environment, which is where I spend most of my time in Linux.

I feel the same way about the idiotic ZSH environment thrust upon us in older versions of TrueNAS. I mostly hate on BASH because an entire generation was raised that cannot tell the difference between BASH and their rear ends; Bourne shell is NOT the same thing as BASH and those of us who write true Bourne scripting would appreciate it if the BASHies would take their damn BASHisms and choke to death on them. Do NOT frickin' shebang /bin/sh if you are writing in BASH! (just a quick venting there, heh) In any case, a "lack of the standard Linux commands" is likely related to your PATH; try defining some dotfiles, especially if you are using a non-root administrative account, due to the issues I pointed out in
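
Something along these lines in the account's ~/.profile is the usual fix (the PATH value below is just the conventional default, not anything I've verified against a specific SCALE release):

# ~/.profile -- make sure the sbin directories get searched for a non-root admin login
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
export PATH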

 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
I feel the same way about the idiotic ZSH environment thrust upon us in older versions of TrueNAS. I mostly hate on BASH because an entire generation was raised that cannot tell the difference between BASH and their rear ends; Bourne shell is NOT the same thing as BASH and those of us who write true Bourne scripting would appreciate it if the BASHies would take their damn BASHisms and choke to death on them. Do NOT frickin' shebang /bin/sh if you are writing in BASH! (just a quick venting there, heh)
I'm 100% with you on this one. Linux changing the default shell to bash/zsh and symlinking /bin/sh to it really is 100% to blame for this. Hence why, in the FreeBSD world, we call these things Linuxisms/Bashisms. Don't even get me started on systemd and the whole pile of software with hard dependencies on it, making it impossible to port to other POSIX systems.
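
For anyone who hasn't been bitten yet, a minimal illustration of the complaint (the first form is a BASHism and dies under a strict POSIX /bin/sh such as FreeBSD's sh or Debian's dash):

#!/bin/sh
# BASHism: [[ ]] is not POSIX and fails under ash/dash-style shells
if [[ "$1" == foo* ]]; then echo match; fi
# Portable Bourne equivalent:
case "$1" in foo*) echo match ;; esac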
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
I didn't have this experience with CPU usage; maybe it's service- or hardware-dependent?
I don't think there's anything in my hardware that would cause it (it's listed in my signature as the primary system). It's also just a simple VM with NO DATA and only 4 TrueCharts apps installed that weren't even configured yet (just the default deployment). I wanted to test the apps, but didn't bother once I saw the CPU (2 cores) constantly hovering around 5-13%.... on a VM that essentially has no data, no real users, and not even a configured app? What could the k3s process possibly be doing? My CORE VM (also 2 cores) sees no such issue, and it's an actual production server with TBs of data, real users, and a transmission jail seeding 20-30 torrents, and it's barely using 2%, if even that.
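
If anyone wants to chase where that idle burn actually goes, roughly this from the SCALE shell should show it (a sketch; assuming the k3s binary, which embeds its own kubectl, is on the PATH):

top -b -n 1 -o %CPU | head -20   # one snapshot, sorted by CPU usage
k3s kubectl get pods -A          # what the embedded Kubernetes is actually running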
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Hey OP!

I didn't run into this problem because when I migrated from CORE to SCALE, I also migrated my VMs from VMware to SCALE. That being said, have you tried making your management network non-native? In other words, VLAN 101 and VLAN 102 are tagged interfaces; can you also change your management network (on eno4) from native/untagged to tagged and see if that resolves the problem? I've seen weirder things happen....

I don't see anything "wrong" with your config (except that you only have a single uplink and not a LAG, but that has nothing to do with your problem :P).
 