I found this in my mail client. I can't go through it right now, so I'm pasting it here; hope this helps:
Code:
Begin forwarded message:
From: Vinícius Ferrão <vinicius@ferrao.eti.br>
Date: February 16, 2015 at 10:24:02 PM GMT-2
Cc: <xs-devel@lists.xenserver.org>
To: Tobias Kreidl <Tobias.Kreidl@nau.edu>
Subject: Re: [xs-devel] Switchless shared iSCSI Network with /30 links
Reply-To: <xs-devel@lists.xenserver.org>
Tobias,
I've done more testing of SR creation through XenCenter.
The whole process definitely happens only on the pool master, even when the specified networks are not among the host's networks; in that case the connection is attempted over the default route.
I requested a new Software iSCSI SR with the following address:
192.168.10.1,192.168.11.1,192.168.20.1,192.168.21.1
The target IQN is discovered without any problem, since the iSCSI storage reports the LUNs on every interface.
But when it comes to discovering the LUN, the process is done only on the pool master. During discovery I left a watch command running "netstat -an | grep 192.168" on both XS hosts, and the results are:
On the pool master there is a lot going on:
tcp 0 0 10.7.0.101:443 192.168.172.1:58053 ESTABLISHED
tcp 0 0 192.168.11.2:53266 192.168.11.2:36365 ESTABLISHED
tcp 0 0 192.168.10.2:57930 192.168.10.2:36365 ESTABLISHED
tcp 0 0 10.7.0.101:443 192.168.172.1:58058 ESTABLISHED
tcp 0 0 192.168.10.2:55886 192.168.10.1:3260 TIME_WAIT
tcp 0 0 192.168.11.2:36365 192.168.11.2:53266 ESTABLISHED
tcp 0 0 192.168.10.2:36365 192.168.10.2:57930 ESTABLISHED
tcp 0 48 10.7.0.101:22 192.168.172.1:58064 ESTABLISHED
tcp 0 0 192.168.10.2:55870 192.168.10.1:3260 TIME_WAIT
tcp 0 0 192.168.11.2:52654 192.168.11.1:3260 TIME_WAIT
tcp 0 0 10.7.0.101:443 192.168.172.1:58055 ESTABLISHED
tcp 0 0 10.7.0.101:443 192.168.172.1:58054 ESTABLISHED
tcp 0 0 192.168.11.2:52656 192.168.11.1:3260 TIME_WAIT
tcp 0 0 192.168.10.2:55887 192.168.10.1:3260 TIME_WAIT
udp 0 0 192.168.10.2:123 0.0.0.0:*
udp 0 0 192.168.11.2:123 0.0.0.0:*
During the discovery, note the 192.168.21.1 address being probed through the default route:
tcp 0 0 10.7.0.101:443 192.168.172.1:58053 ESTABLISHED
tcp 0 0 192.168.11.2:52686 192.168.11.1:3260 TIME_WAIT
tcp 0 0 192.168.11.2:53266 192.168.11.2:36365 ESTABLISHED
tcp 0 0 192.168.10.2:57930 192.168.10.2:36365 ESTABLISHED
tcp 0 0 10.7.0.101:443 192.168.172.1:58058 ESTABLISHED
tcp 0 0 192.168.10.2:55886 192.168.10.1:3260 TIME_WAIT
tcp 0 0 192.168.10.2:55900 192.168.10.1:3260 TIME_WAIT
tcp 0 0 192.168.11.2:52684 192.168.11.1:3260 TIME_WAIT
tcp 0 0 192.168.10.2:55892 192.168.10.1:3260 TIME_WAIT
tcp 0 0 192.168.11.2:52679 192.168.11.1:3260 TIME_WAIT
tcp 0 0 192.168.11.2:36365 192.168.11.2:53266 ESTABLISHED
tcp 0 0 192.168.10.2:36365 192.168.10.2:57930 ESTABLISHED
tcp 0 0 192.168.10.2:55893 192.168.10.1:3260 TIME_WAIT
tcp 0 48 10.7.0.101:22 192.168.172.1:58064 ESTABLISHED
******************
tcp 0 1 10.7.0.101:56481 192.168.21.1:3260 SYN_SENT
******************
tcp 0 0 10.7.0.101:443 192.168.172.1:58055 ESTABLISHED
tcp 0 0 10.7.0.101:443 192.168.172.1:58054 ESTABLISHED
tcp 0 0 192.168.10.2:55887 192.168.10.1:3260 TIME_WAIT
udp 0 0 192.168.10.2:123 0.0.0.0:*
udp 0 0 192.168.11.2:123 0.0.0.0:*
Even addresses that do not exist in the pool are tried (since the IQN discovery process reports them):
tcp 0 0 10.7.0.101:443 192.168.172.1:58053 ESTABLISHED
tcp 0 0 192.168.11.2:52686 192.168.11.1:3260 TIME_WAIT
tcp 0 0 192.168.11.2:53266 192.168.11.2:36365 ESTABLISHED
tcp 0 0 192.168.10.2:57930 192.168.10.2:36365 ESTABLISHED
tcp 0 0 10.7.0.101:443 192.168.172.1:58058 ESTABLISHED
tcp 0 0 192.168.10.2:55886 192.168.10.1:3260 TIME_WAIT
tcp 0 0 192.168.10.2:55900 192.168.10.1:3260 TIME_WAIT
tcp 0 0 192.168.11.2:52684 192.168.11.1:3260 TIME_WAIT
tcp 0 0 192.168.10.2:55892 192.168.10.1:3260 TIME_WAIT
tcp 0 0 192.168.11.2:52679 192.168.11.1:3260 TIME_WAIT
tcp 0 0 192.168.11.2:36365 192.168.11.2:53266 ESTABLISHED
******************
tcp 0 1 10.7.0.101:52787 192.168.30.1:3260 SYN_SENT
******************
tcp 0 0 192.168.10.2:36365 192.168.10.2:57930 ESTABLISHED
tcp 0 0 192.168.10.2:55893 192.168.10.1:3260 TIME_WAIT
tcp 0 48 10.7.0.101:22 192.168.172.1:58064 ESTABLISHED
tcp 0 0 10.7.0.101:443 192.168.172.1:58055 ESTABLISHED
tcp 0 0 10.7.0.101:443 192.168.172.1:58054 ESTABLISHED
tcp 0 0 192.168.10.2:55887 192.168.10.1:3260 TIME_WAIT
udp 0 0 192.168.10.2:123 0.0.0.0:*
udp 0 0 192.168.11.2:123 0.0.0.0:*
In case you're wondering, 192.168.30.1 is reported because we will add another XS host to the pool in the future. The point here is that XenCenter should not look up this network, since it isn't specified during the Software iSCSI creation. In my understanding it should only look up the specified addresses.
And to finish the tests: the second XS host remains unchanged during the entire SR creation phase:
[root@xenserver2 ~]# netstat -an | grep -i 192.168
tcp 0 48 10.7.0.102:22 192.168.172.1:58046 ESTABLISHED
udp 0 0 192.168.21.2:123 0.0.0.0:*
udp 0 0 192.168.20.2:123 0.0.0.0:*
As you said, perhaps propagating the initial connection through each host will solve the problem without modifying the XenCenter interface. A more sophisticated approach would be to query the network list over XAPI, retrieve the PIFs associated with those networks, and then check the IP addresses in the SR creation request and match them with the equivalent PIFs.
For example, here is the output of xe network-param-list to retrieve the associated PIFs, and of xe pif-param-list with the UUIDs of the retrieved PIFs:
[root@xenserver1 ~]# xe network-param-list uuid=4dc936d7-e806-c69a-a292-78e0ed2d5faa
uuid ( RO) : 4dc936d7-e806-c69a-a292-78e0ed2d5faa
name-label ( RW): iSCSI #0
name-description ( RW):
VIF-uuids (SRO):
PIF-uuids (SRO): 3baa8436-feca-cd04-3173-5002d9ffc66b; 362d63cb-647a-465f-6b81-1b59cd3b1f5d
MTU ( RW): 9000
bridge ( RO): xenbr2
other-config (MRW): automatic: true
blobs ( RO):
tags (SRW):
default-locking-mode ( RW): unlocked
[root@xenserver1 ~]# xe pif-param-list uuid=3baa8436-feca-cd04-3173-5002d9ffc66b
uuid ( RO) : 3baa8436-feca-cd04-3173-5002d9ffc66b
device ( RO): eth2
MAC ( RO): 40:f2:e9:f3:5c:64
physical ( RO): true
managed ( RO): true
currently-attached ( RO): true
MTU ( RO): 9000
VLAN ( RO): -1
bond-master-of ( RO):
bond-slave-of ( RO): <not in database>
tunnel-access-PIF-of ( RO):
tunnel-transport-PIF-of ( RO):
management ( RO): false
network-uuid ( RO): 4dc936d7-e806-c69a-a292-78e0ed2d5faa
network-name-label ( RO): iSCSI #0
host-uuid ( RO): c2ce0cf6-4bdf-4ee7-b387-b8eb47db4f7e
host-name-label ( RO): xenserver2.local.iq.ufrj.br
IP-configuration-mode ( RO): Static
IP ( RO): 192.168.20.2
netmask ( RO): 255.255.255.252
gateway ( RO):
IPv6-configuration-mode ( RO): None
IPv6 ( RO):
IPv6-gateway ( RO):
primary-address-type ( RO): IPv4
DNS ( RO):
properties (MRO): gro: on
io_read_kbs ( RO): 0.000
io_write_kbs ( RO): 0.000
carrier ( RO): true
vendor-id ( RO): 8086
vendor-name ( RO): Intel Corporation
device-id ( RO): 1521
device-name ( RO): I350 Gigabit Network Connection
speed ( RO): 1000 Mbit/s
duplex ( RO): full
disallow-unplug ( RW): true
pci-bus-path ( RO): 0000:06:00.2
other-config (MRW): management_purpose: iSCSI MPIO #0
[root@xenserver1 ~]# xe pif-param-list uuid=362d63cb-647a-465f-6b81-1b59cd3b1f5d
uuid ( RO) : 362d63cb-647a-465f-6b81-1b59cd3b1f5d
device ( RO): eth2
MAC ( RO): 40:f2:e9:f3:5a:1c
physical ( RO): true
managed ( RO): true
currently-attached ( RO): true
MTU ( RO): 9000
VLAN ( RO): -1
bond-master-of ( RO):
bond-slave-of ( RO): <not in database>
tunnel-access-PIF-of ( RO):
tunnel-transport-PIF-of ( RO):
management ( RO): false
network-uuid ( RO): 4dc936d7-e806-c69a-a292-78e0ed2d5faa
network-name-label ( RO): iSCSI #0
host-uuid ( RO): c8927969-b7bf-4a0f-9725-035550381d9c
host-name-label ( RO): xenserver1.local.iq.ufrj.br
IP-configuration-mode ( RO): Static
IP ( RO): 192.168.10.2
netmask ( RO): 255.255.255.252
gateway ( RO):
IPv6-configuration-mode ( RO): None
IPv6 ( RO):
IPv6-gateway ( RO):
primary-address-type ( RO): IPv4
DNS ( RO):
properties (MRO): gro: on
io_read_kbs ( RO): 0.000
io_write_kbs ( RO): 0.000
carrier ( RO): true
vendor-id ( RO): 8086
vendor-name ( RO): Intel Corporation
device-id ( RO): 1521
device-name ( RO): I350 Gigabit Network Connection
speed ( RO): 1000 Mbit/s
duplex ( RO): full
disallow-unplug ( RW): true
pci-bus-path ( RO): 0000:06:00.2
other-config (MRW): management_purpose: iSCSI MPIO #0
So as you can see, we have the IP addresses and the network masks, which is sufficient for an efficient and precise SR creation algorithm.
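As a rough illustration of the matching step (a sketch only; the network name and the chosen fields come from my example above, not the exact queries a patched XenCenter would run), the per-host information could be gathered like this:
#List the PIF UUIDs attached to the storage network
xe network-list name-label="iSCSI #0" params=PIF-uuids
#For each PIF on that network, show only the fields needed for the match
xe pif-list network-name-label="iSCSI #0" params=host-name-label,IP,netmask
#A requested target such as 192.168.10.1 would then be handed to the host whose PIF IP/netmask places it in the same /30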
I think this clarifies everything we have discussed so far. A patch to XenCenter appears to be really viable.
Many thanks,
Vinicius.
On Feb 16, 2015, at 9:02 PM, Tobias Kreidl <Tobias.Kreidl@nau.edu> wrote:
Yes, correct. The MD36xx looks like two separate controller units, each acting as a separate device. For a standalone server or some storage devices, you're right that you need a separate subnet for each connection.
Because a single connection doesn't see the broadcast from the other connectors at the time of the initial connection from one host, the pool is unaware of the others. The "repair" operation causes this to be conducted from each host and, as I guessed, then fixes the SR connectivity for the entire pool. I am speculating that a request for a new connection of this type via XenCenter could perhaps be handled by propagating that initial connection process to the rest of the hosts in the pool and then rescanning the SR.
-=Tobias
On 2/16/2015 2:20 PM, Vinícius Ferrão wrote:
Hi Tobias,
As far as I know, the Dell MD36xx is dual-controller storage, so it's basically "two computers". Combine that with what I know about TCP/IP networking and there's a golden rule: only one IP address on a given subnet per computer. So I can't use the same network in my case. That's why I use point-to-point links too.
Since I have only one machine acting as the storage server (a Supermicro machine with 32 GB of RAM running FreeNAS 9.3), I do need four separate networks. FreeNAS is FreeBSD based, and BSD enforces this rule by default: I can't have two IP addresses in the same subnet on the same machine, even if I configure them manually.
If this weren't bad TCP networking practice, I would be able to configure the SR via XenCenter, since it would "find" the same network on the other hosts without a problem.
As for XenCenter, it really appears to be just a missing function in the software. If I were a good programmer I would try to implement it, but this is beyond my knowledge... :(
Thanks once again,
Vinícius.
On Feb 16, 2015, at 6:53 PM, Tobias Kreidl <Tobias.Kreidl@nau.edu> wrote:
There is no reason to necessarily use four separate subnets. You could have 192.168.10.1 and 20.1 on one iSCSI controller and 192.168.10.2 and 20.2 on the other controller (on, for example, a standard MD36xx). If however it is something like an MD32xx, which recommends four separate subnets, then of course use four. One should follow the vendor's specifications.
I will respond in more detail later when I get a chance, Vinícius, but after a quick read, I agree with your findings from your other email. It would appear that the initial connection is indeed a missing function in XenCenter, and perhaps all that is needed to make this work "out of the box". As for wildcard discovery, it's generally preferred as it allows the host to figure out the path on its own. The only reason I could see for not using wildcard discovery would be if there were any chance of overlaps and ambiguous connections.
Regards,
--Tobias
On 2/16/2015 1:38 PM, Vinícius Ferrão wrote:
Hello Dave,
In total I have four paths, two on each host. I've made a drawing using ASCII characters; I hope this clarifies the architecture:
+----------------+
| iSCSI Storage |
+--|1||2||3||4|--+
| | | |
| | | \--------------\ iSCSI Multipathed /30
| | \--------------\ | Network with two links
| | | |
+--|1||2|--------+ +--|1||2|--------+
| XenServer Host | | XenServer Host |
+----------------+ +----------------+
The Storage has four gigabit ethernet adapters with the following IP addresses:
192.168.10.1/30
192.168.11.1/30
192.168.20.1/30
192.168.21.1/30
On the first XenServer there are two interfaces with those IP's:
192.168.10.2/30
192.168.11.2/30
And on the second host the equivalent configuration:
192.168.20.2/30
192.168.21.2/30
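(For reference, each /30 is a point-to-point subnet with exactly two usable addresses: 192.168.10.0/30, for example, consists of the network address .0, the usable pair .1 and .2, and the broadcast .3, which is exactly what a storage-to-host link needs.)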
That's why I'm using multipath: I need MPIO to aggregate the two network interfaces on each host.
To illustrate a little more, here's a screenshot of the network configuration made in XenCenter:
https://www.dropbox.com/s/olfuxrh8d8nyheu/Screenshot%202015-02-16%2018.38.43.png?dl=0
Cheers,
Vinícius.
On Feb 15, 2015, at 12:22 PM, Dave Scott <Dave.Scott@citrix.com> wrote:
Hi,
On 15 Feb 2015, at 05:44, Vinícius Ferrão <vinicius@ferrao.eti.br> wrote:
Hello Tobias,
Thanks for the reply, even if it was to discourage me :)
I really can’t understand why it isn’t a good idea. VMware does it, recommends it, and even sells a solution using Dell hardware with two machines and shared storage over 10GbE links: point-to-point networks, dual controllers on the storage side, and two different networks for iSCSI. They sell it at least here in Brazil. What I’m trying to do is achieve the same topology with XenServer and commodity hardware.
Why I want to do this? Because I do believe that XenServer is a better product than VMware vSphere.
Perhaps we disagree on some points due to different markets. For example, I’ve used Fibre Channel only once in my life and, believe it or not, it was a switchless configuration. And it worked… It wasn’t for a VM workload, but for a grid cluster running TORQUE/OpenPBS. I do have pictures of the topology, but I need to search for them. I wasn’t the one who designed that solution, but I understood it was a huge money saver, because FC switches are extremely expensive.
Leaving all those “political” points aside, let’s talk technically :)
I’ve managed to achieve what I want. A lot of PBD hacking in the CLI was necessary, but I was comfortable with that. The following steps were needed to reproduce this: https://www.dropbox.com/s/laigs26ic49omqv/Screenshot%202015-02-15%2003.02.05.png?dl=0
I’m really excited because it worked, so let me explain what I did.
First I started creating the SR via XenCenter; it failed as expected. To bypass this I just created the SR via XenCenter with the two storage IPs of the pool master. The process went OK, but the SR started out broken, since the second XS host cannot see those IP addresses.
Using the xe sr-list and xe sr-param-list commands, I got the UUID of the SR created by XenCenter and the referenced PBDs. From what I know about XS, a PBD is the physical connection from an individual XS host to the shared storage, so it isn’t related to the other hosts; it belongs exclusively to the host in question. With that understood, the idea is to destroy the PBD created on the second XS host and recreate it with the proper settings and IP addresses, roughly as sketched below.
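In rough terms, locating and removing the stale PBD would look something like this (a sketch only; the SR name-label and the <...> UUIDs are placeholders, not the actual values from my pool):
#Find the SR UUID and its PBDs, noting which PBD belongs to the second host
xe sr-list name-label="iSCSI Storage" params=uuid
xe pbd-list sr-uuid=<sr-uuid> params=uuid,host-uuid,currently-attached
#Unplug and destroy the broken PBD on the second host before recreating it
xe pbd-unplug uuid=<pbd-uuid-of-second-host>
xe pbd-destroy uuid=<pbd-uuid-of-second-host>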
That’s when things got interesting. To create the new PBD, I need iSCSI working on that host, and consequently the multipath connection. So I did the process in the terminal with the following commands:
#Discovery of iSCSI Service
iscsiadm -m discovery -t sendtargets -p 192.168.20.1
iscsiadm -m discovery -t sendtargets -p 192.168.21.1
#Login in the iSCSI Targets
iscsiadm -m node --targetname iqn.2015-01.br.ufrj.iq.local.istgt:iscsi0target1 -p 192.168.20.1:3260 --login
iscsiadm -m node --targetname iqn.2015-01.br.ufrj.iq.local.istgt:iscsi0target1 -p 192.168.21.1:3260 --login
#Enable Multipath
multipath
multipath -ll
After all that I was able to create the PBD with the new settings. The tricky part was passing the required “device-config” values to the xe pbd-create command. I wasn’t aware of the method, but after four tries I understood the process and typed the following command:
#Create the appropriate PBD
xe pbd-create sr-uuid=4e8351f6-ef83-815d-b40d-1e332b47863f host-uuid=c2ce0cf6-4bdf-4ee7-b387-b8eb47db4f7e device-config:multiSession="192.168.20.1,3260,iqn.2015-01.br.ufrj.iq.local.istgt:iscsi0target1|192.168.21.1,3260,iqn.2015-01.br.ufrj.iq.local.istgt:iscsi0target1|" device-config:target=192.168.20.1,192.168.21.1 device-config:multihomelist=192.168.10.1:3260,192.168.30.1:3260,192.168.20.1:3260,192.168.21.1:3260,192.168.11.1:3260,192.168.31.1:3260 device-config:targetIQN=* device-config:SCSIid=1FreeBSD_iSCSI_Disk_002590946b34000 device-config:port=3260
I’d like to understand your configuration better— have you effectively enabled 2 storage paths for the SR, knowing that each host will only be able to use one path? Are your PBDs identical or are there differences?
Thanks,
Dave
Basically everything necessary is in there: the IPs, the name, the multipath link, the LUN values, everything. After this, I just plugged the PBD via the CLI:
#Plug the PBD
xe pbd-plug uuid=029fbbf0-9f06-c1ee-ef83-a8b147d51342
And BOOM! It worked. As you can see in the screenshot.
To prove that everything is working fine, I created a new VM, installed FreeBSD 10 on it, ran pkg install xe-guest-utilities, and requested a XenMotion operation. Oh boy, it worked:
Vinicius-Ferraos-MacBook-Pro:~ ferrao$ ping 10.7.0.248
PING 10.7.0.248 (10.7.0.248): 56 data bytes
64 bytes from 10.7.0.248: icmp_seq=0 ttl=63 time=14.185 ms
64 bytes from 10.7.0.248: icmp_seq=1 ttl=63 time=12.771 ms
64 bytes from 10.7.0.248: icmp_seq=2 ttl=63 time=16.773 ms
64 bytes from 10.7.0.248: icmp_seq=3 ttl=63 time=13.328 ms
64 bytes from 10.7.0.248: icmp_seq=4 ttl=63 time=13.108 ms
64 bytes from 10.7.0.248: icmp_seq=5 ttl=63 time=14.775 ms
64 bytes from 10.7.0.248: icmp_seq=6 ttl=63 time=317.368 ms
64 bytes from 10.7.0.248: icmp_seq=7 ttl=63 time=14.362 ms
64 bytes from 10.7.0.248: icmp_seq=8 ttl=63 time=12.574 ms
64 bytes from 10.7.0.248: icmp_seq=9 ttl=63 time=16.565 ms
64 bytes from 10.7.0.248: icmp_seq=10 ttl=63 time=12.273 ms
64 bytes from 10.7.0.248: icmp_seq=11 ttl=63 time=18.787 ms
64 bytes from 10.7.0.248: icmp_seq=12 ttl=63 time=14.131 ms
64 bytes from 10.7.0.248: icmp_seq=13 ttl=63 time=11.731 ms
64 bytes from 10.7.0.248: icmp_seq=14 ttl=63 time=14.585 ms
64 bytes from 10.7.0.248: icmp_seq=15 ttl=63 time=12.206 ms
64 bytes from 10.7.0.248: icmp_seq=16 ttl=63 time=13.873 ms
64 bytes from 10.7.0.248: icmp_seq=17 ttl=63 time=13.692 ms
64 bytes from 10.7.0.248: icmp_seq=18 ttl=63 time=62.291 ms
64 bytes from 10.7.0.248: icmp_seq=19 ttl=63 time=11.968 ms
Request timeout for icmp_seq 20
Request timeout for icmp_seq 21
Request timeout for icmp_seq 22
Request timeout for icmp_seq 23
Request timeout for icmp_seq 24
Request timeout for icmp_seq 25
64 bytes from 10.7.0.248: icmp_seq=26 ttl=63 time=14.007 ms
64 bytes from 10.7.0.248: icmp_seq=27 ttl=63 time=17.851 ms
64 bytes from 10.7.0.248: icmp_seq=28 ttl=63 time=13.637 ms
64 bytes from 10.7.0.248: icmp_seq=29 ttl=63 time=12.817 ms
64 bytes from 10.7.0.248: icmp_seq=30 ttl=63 time=13.096 ms
64 bytes from 10.7.0.248: icmp_seq=31 ttl=63 time=13.136 ms
64 bytes from 10.7.0.248: icmp_seq=32 ttl=63 time=16.278 ms
64 bytes from 10.7.0.248: icmp_seq=33 ttl=63 time=12.930 ms
64 bytes from 10.7.0.248: icmp_seq=34 ttl=63 time=11.506 ms
64 bytes from 10.7.0.248: icmp_seq=35 ttl=63 time=16.592 ms
64 bytes from 10.7.0.248: icmp_seq=36 ttl=63 time=13.329 ms
^C
--- 10.7.0.248 ping statistics ---
37 packets transmitted, 31 packets received, 16.2% packet loss
round-trip min/avg/max/stddev = 11.506/25.372/317.368/54.017 ms
Pics: https://www.dropbox.com/s/auhv542t3ew3agq/Screenshot%202015-02-15%2003.38.45.png?dl=0
I think this is sufficient for now to prove that the concept works and that XenServer can create this kind of topology; it’s just not available in the GUI. If XenCenter supports this kind of SR in the next patches, it will be a breakthrough.
What I ask for now is feedback, if possible.
Thanks in advance,
Vinícius.
PS: Sorry for any English mistakes. It’s almost 4 AM here, and I was really anxious to report this to the list.
On Feb 14, 2015, at 8:14 PM, Tobias Kreidl <Tobias.Kreidl@nau.edu> wrote:
Frankly speaking, it doesn't seem like a good idea to try to squeeze something out of a technology in a way that is not built in or future-safe. Would you want to try to connect a Fibre Channel storage device without a Fibre Channel switch (a very similar situation)? There are other alternatives: for example, connect the iSCSI storage to another, XenServer-independent server and then serve storage from it via NFS, or use NFS in the first place and not make use of iSCSI at all. NFS has come a long way and, in many cases, can be equivalent in I/O performance to iSCSI.
The intercommunication and the ability to share SRs within a pool are significant considerations and cannot be ignored as part of the storage support within XenServer. XenServer provides a number of options for storage connectivity to cover a wide range of needs, preferences and costs. Often, trying to save money in the wrong area results in more headaches later on.
That all being said, it is possible to bypass the SR mechanism altogether (obviously not a supported configuration) and connect at least Linux VMs directly to iSCSI storage using open-iscsi, but I have not explored this without still involving a network switch. As you indicated, it's also not clear whether XenServer will remain 100% open-iscsi compatible in the future. It also takes away some of the things an SR can offer, like VM cloning, storage migration, etc.
Finally, I am not sure I see how the lack of a switch improves performance. Network switches are much more capable and much faster at routing packets than any storage device or host. The amount of overhead they add is tiny.
Regards,
-=Tobias
On 2/14/2015 2:26 PM, Vinícius Ferrão wrote:
Hello guys,
I was talking with Tim Mackey on Twitter about the viability of using an iSCSI SR without a switch. The main goals are to reduce the TCO of the solution, gain some extra performance by removing the overhead that a managed switch puts on the network, and reduce the network's complexity.
At first glance I thought I would only achieve this kind of setup with a distributed switch, so the DVSC should be an option. But as Tim said, it’s only an interface for controlling some policies.
I’ve looked for help on Server Fault and the FreeNAS forums, since I’m running FreeNAS as the iSCSI service. The discussion is going on at these links:
http://serverfault.com/questions/667267/xenserver-6-5-iscsi-mpio-with-point-to-point-links-without-a-switch
https://forums.freenas.org/index.php?threads/iscsi-mpio-without-a-switch-30-links.27579/
It appears that XenServer cannot support switchless iSCSI storage, because XS needs the same network block on all the XS hosts to create shared storage between them. So the idea of point-to-point networks doesn’t apply.
If I understood correctly, the SR is created with the appropriate PBDs. Perhaps one way to solve this issue is to ignore XenCenter during SR creation and to create an SR with the proper PBDs from all the XS hosts. But I was unable to test this due to my lack of knowledge of creating and manipulating SRs via the CLI.
Anyway, I’m opening the discussion here to get some feedback from the developers: to see whether this method of building the iSCSI network is viable, since as Tim said XS is not 100% open-iscsi upstream, and whether it could be integrated into the next versions of XenServer via the XenCenter GUI. I think this is a very common architecture and VMware is doing it on small pools! So let’s do it as well and, of course, surpass them. :)
Thanks in advance,
Vinicius.