Alerts about multipaths

Status
Not open for further replies.

saurav

Contributor
Joined
Jul 29, 2012
Messages
139
Since this morning, I am seeing this in /var/log/messages

Code:
alert.py: [system.alert:156] Alert module '<multipaths_status.MultipathAlert object at 0x80a754550>' failed: xmlParseEntityRef: no name, line 503, column 83


It started exactly at 1:01 AM today and repeats every 5 minutes, but sometimes at 2 - 3 minutes also.

Any idea what this is about? I do not have any network settings in FreeNAS like VLAN, static routes, aggregation, etc.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Can you give the forum rules a read and provide some of that stuff? We've got an "i have an error" and that means nothing without far more context.

Thanks.
 

saurav

Contributor
Joined
Jul 29, 2012
Messages
139
Sorry, I think I was too eager to know what's going on. Google and forum search turned up nothing.

The FreeNAS mini is connected to a simple 8-port switch which is connected to the router. All pool/datasets are healthy, replication seems to be working too, and I can ssh into the box just fine.

smartctl (ada2 is the SATA DOM)

/dev/ada0: http://pastebin.com/7qtEuD18
/dev/ada1: http://pastebin.com/suh6D0Zj
/dev/ada3: http://pastebin.com/JZuGwE6D
/dev/ada4: http://pastebin.com/RVeacB2w

ifconfig : http://pastebin.com/025DpGUu

pciconf -lv : http://pastebin.com/4geSiiqX

dmesg
Code:
arp: 192.168.1.50 moved from 02:e0:8a:00:0d:0a to d0:50:99:26:50:82 on epair6b
arp: 192.168.1.50 moved from d0:50:99:26:50:82 to 02:e0:8a:00:0d:0a on epair6b
arp: 192.168.1.50 moved from 02:e0:8a:00:0d:0a to d0:50:99:26:50:82 on epair6b
arp: 192.168.1.50 moved from 02:e0:8a:00:0d:0a to d0:50:99:26:50:82 on epair6b
arp: 192.168.1.50 moved from 02:04:e7:00:0c:0a to d0:50:99:26:50:82 on epair5b
arp: 192.168.1.50 moved from 02:34:18:00:0b:0a to d0:50:99:26:50:82 on epair4b
arp: 192.168.1.50 moved from 02:d3:12:00:0a:0a to d0:50:99:26:50:82 on epair3b
arp: 192.168.1.50 moved from 02:d1:2e:00:0e:0a to d0:50:99:26:50:82 on epair7b
arp: 192.168.1.50 moved from 02:99:13:00:09:0a to d0:50:99:26:50:82 on epair2b
arp: 192.168.1.50 moved from d0:50:99:26:50:82 to 02:e0:8a:00:0d:0a on epair6b
arp: 192.168.1.50 moved from d0:50:99:26:50:82 to 02:34:18:00:0b:0a on epair4b
arp: 192.168.1.50 moved from 02:e0:8a:00:0d:0a to d0:50:99:26:50:82 on epair6b
arp: 192.168.1.50 moved from 02:34:18:00:0b:0a to d0:50:99:26:50:82 on epair4b
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Can you post the debug file from your Mini?
 

saurav

Contributor
Joined
Jul 29, 2012
Messages
139
Attaching the tgz file here. There are lots of files in it. If you want something specific amongst them, please post back.
 

Attachments

  • debug-freenas-primary-20141107125112.tgz
    500.9 KB · Views: 201

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I'm really tired, so I could be out to lunch.. but here's what I'm seeing that seems really messed up:
arp: 192.168.1.50 moved from d0:50:99:26:50:82 to 02:e0:8a:00:0d:0a on epair6b
arp: 192.168.1.50 moved from 02:e0:8a:00:0d:0a to d0:50:99:26:50:82 on epair6b
arp: 192.168.1.50 moved from d0:50:99:26:50:82 to 02:e0:8a:00:0d:0a on epair6b
arp: 192.168.1.50 moved from 02:e0:8a:00:0d:0a to d0:50:99:26:50:82 on epair6b

That's an indicator that you have IP conflicts on your network. Plain and simple. So that needs to be fixed.
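As a rough illustration of how to pull the competing MAC addresses out of those log lines, here is a minimal Python sketch. The function names `macs_claiming` and `is_locally_administered` are made up for this example, and the regular expression assumes the exact "moved from ... to ... on ..." wording shown in the dmesg output above.

```python
import re
from collections import defaultdict

# Matches lines such as:
# arp: 192.168.1.50 moved from d0:50:99:26:50:82 to 02:e0:8a:00:0d:0a on epair6b
ARP_MOVE = re.compile(
    r"arp: (?P<ip>\S+) moved from (?P<old>[0-9a-f:]{17}) "
    r"to (?P<new>[0-9a-f:]{17}) on (?P<iface>\S+)"
)


def macs_claiming(log_text):
    """Map each IP address to the set of MAC addresses seen claiming it."""
    seen = defaultdict(set)
    for m in ARP_MOVE.finditer(log_text):
        seen[m["ip"]].update((m["old"], m["new"]))
    return dict(seen)


def is_locally_administered(mac):
    """True if the 'locally administered' bit (0x02) is set in the first octet."""
    return bool(int(mac.split(":")[0], 16) & 0x02)


sample = """\
arp: 192.168.1.50 moved from d0:50:99:26:50:82 to 02:e0:8a:00:0d:0a on epair6b
arp: 192.168.1.50 moved from 02:e0:8a:00:0d:0a to d0:50:99:26:50:82 on epair6b
arp: 192.168.1.50 moved from 02:34:18:00:0b:0a to d0:50:99:26:50:82 on epair4b
"""

for ip, macs in macs_claiming(sample).items():
    for mac in sorted(macs):
        kind = "locally administered" if is_locally_administered(mac) else "vendor-assigned"
        print(ip, mac, kind)
```

Any IP that maps to more than one MAC is a candidate conflict. Addresses with the locally administered bit set (the ones starting with `02:` here) are usually software-generated, which is worth keeping in mind when hunting for the second device.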
+--------------------------------------------------------------------------------+
+ /etc/resolv.conf +
+--------------------------------------------------------------------------------+
nameserver 192.168.239.245



+--------------------------------------------------------------------------------+
+ Interfaces +
+--------------------------------------------------------------------------------+
igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
ether d0:50:99:26:50:82
inet 192.168.1.50 netmask 0xffff0000 broadcast 192.168.255.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active

This seems weird. Your name server is a very odd setup. Your netmask is also very odd for being a small network. I can't imagine why you think your network is so big to need a 255.255.0.0 netmask. Now the settings don't appear to be "invalid" but they are definitely what I'd call "far out of a typical network setup".

I'm seeing weird settings (and ones that are clearly broken such as the conflicting IP address). The conclusion I'm drawing is that you really don't understand basic networking very well or you have the most overly complex network that anyone could ever have, coupled with some noob mistakes. I'm not sure which. Normally I'd think the former just because of past experience with other users.

At this point I can't point you to the problem because nobody knows how your network should be set up except for you. So trying to determine which settings are correct and which aren't isn't possible for the forum.

Multipath is only used for iSCSI or SAS that I can think of. Unfortunately I don't think it's SAS because the FreeNAS Mini has no SAS and you don't appear to be using iSCSI so I'm actually pretty confused.
 

saurav

Contributor
Joined
Jul 29, 2012
Messages
139
If those arp transitions are due to IP conflicts, I have no idea how to prevent them. 192.168.1.50 is the Mini's IP and it already has a DHCP reservation in my router. My DHCP reservation list is too small to hide configuration mistakes (and I have triple-checked it). Perhaps I should also set up ARP-MAC binding (it's an option in my router).

I'm not a networking expert, but what is so weird about my setup (other than the DNS/Gateway not being 192.168.0.1)? /16 is a valid netmask for the 192.168 netblock, right? And I have my jails on 192.168.2.*, hence the 16-bit netmask.
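To make the netmask arithmetic concrete, here is a small sketch using Python's standard `ipaddress` module. The jail address `192.168.2.10` is a made-up example inside the 192.168.2.* range; the other values come from the ifconfig and resolv.conf output quoted above.

```python
import ipaddress

# ifconfig prints the mask in hex; 0xffff0000 is 255.255.0.0, i.e. a /16.
mask = ipaddress.IPv4Address(0xFFFF0000)
iface = ipaddress.ip_interface(f"192.168.1.50/{mask}")
wide = iface.network
print(mask, wide, wide.num_addresses)

# The same host address with a more typical home netmask, for comparison.
narrow = ipaddress.ip_interface("192.168.1.50/24").network

# 192.168.2.10 is a hypothetical jail address; 192.168.239.245 is the nameserver.
for addr in ("192.168.2.10", "192.168.239.245"):
    ip = ipaddress.ip_address(addr)
    print(addr, "in /16:", ip in wide, "in /24:", ip in narrow)
```

With the /16, the jails and the nameserver are all on the same subnet as the Mini. With a /24 they would not be, so that traffic would have to be routed instead.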

I'm seeing weird settings (and ones that are clearly broken such as the conflicting IP address).
If you have the time, please mention some of the other weirdness and broken things. And btw, I'm not trying to refute you point by point. I'm only trying to get all possible inputs since *you* took the time to go over *my* logs and setup.

Thanks,
Saurav.

ps: FWIW, the strange DNS (actually the router login) is because of an old security advisory: if your browser holds the router's login cookies and doesn't prevent cross-domain requests properly, a malicious page can alter your router's settings. Because most home routers are at 192.168.0.1, it advised using a non-conventional router IP.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Normally you don't assign a subnet mask for 65000+ IPs if you are in a home environment. It just doesn't make sense to do that for a bunch of reasons.

As for your router settings, if you can't even trust your router to not do cross domain requests you should NOT be even using it. That's a noob error and they are a bag of fail if that can happen.

The weirdness (at least the stuff I could find) I already mentioned. At that point I stopped because somewhere, someone is being overzealous for whatever reason.

Not to bash your whole network topology, but when you start doing non-standard stuff you increase the chance of problems. Many people don't have a deep understanding of what happens when you go with a /16 netmask and how that can be potentially crippling on a network that is improperly configured.

Other than that, I really don't know what to say. It's easy to help someone that has a fairly typical network and fairly typical settings. But as soon as you start going overly complex and trying to account for various theoretical scenarios the entire responsibility falls on that person to make it all work out. Sorry, but that's all the advice I can give.
 

saurav

Contributor
Joined
Jul 29, 2012
Messages
139
Normally you don't assign a subnet mask for 65000+ IPs if you are in a home environment. It just doesn't make sense to do that for a bunch of reasons.
Yes, normally you don't. You don't need that many IPs. I certainly don't. But it shouldn't hurt, unless there are bugs somewhere. The only reason I can think of why a /16 doesn't make sense is that home-grade routers might not be robust enough to handle it. In my case, it has just been that way for a long time, and I have never had a single networked device alert me about an IP conflict, ever. I would like to pursue this more, but not as part of this thread.

As for your router settings, if you can't even trust your router to not do cross domain requests you should NOT be even using it. That's a noob error and they are a bag of fail if that can happen.
The router is not responsible for this. The browser is. Or, they were, in the past. Like I said earlier, this is an old setting.

Not to bash your whole network topology, but when you start doing non-standard stuff you increase the chance of problems. Many people don't have a deep understanding of what happens when you go with a /16 netmask and how that can be potentially crippling on a network that is improperly configured.
Ok, time for full disclosure, I guess. I have another set of router/dialer, etc.. I keep switching between two different ISPs (because they both suck at times). The other router has all standard settings, DNS/Gateway/router IP at 192.168.1.1 or something. And this arp hopping happens in that setup too. Which makes me think that this arp hopping is not because of the non-standard gateway or netmask. And I switched to the current _weird_ configuration a couple of weeks back, and these multipath errors started happening only this morning (my time).

Other than that, I really don't know what to say. It's easy to help someone that has a fairly typical network and fairly typical settings. But as soon as you start going overly complex and trying to account for various theoretical scenarios the entire responsibility falls on that person to make it all work out. Sorry, but that's all the advice I can give.
Fair enough (more than that, actually)! I really appreciate the time you took to spend on my problem. Of course it's my problem and the onus is on me to fix it. If there is a slight bit of contention here, all I'm claiming is that my network setup is not overly complex or weird. It's a little different, that's all :)

Thanks!
Saurav.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Yes, normally you don't. You don't need that many IPs. I certainly don't. But it shouldn't hurt, unless there are bugs somewhere. The only reason I can think of why a /16 doesn't make sense is that home-grade routers might not be robust enough to handle it. In my case, it has just been that way for a long time, and I have never had a single networked device alert me about an IP conflict, ever. I would like to pursue this more, but not as part of this thread.

Yeah, the ARP thing is *definitely* a conflict. You have the MAC addresses for the two conflicting devices, so it should be easy to find out what is going wrong. The IP conflict definitely is a different problem from the /16, so don't confuse the two in the same paragraph. ;)


Ok, time for full disclosure, I guess. I have another set of router/dialer, etc.. I keep switching between two different ISPs (because they both suck at times). The other router has all standard settings, DNS/Gateway/router IP at 192.168.1.1 or something. And this arp hopping happens in that setup too. Which makes me think that this arp hopping is not because of the non-standard gateway or netmask. And I switched to the current _weird_ configuration a couple of weeks back, and these multipath errors started happening only this morning (my time).

No, the ARP hopping is definitely because of two conflicting devices. That's why you see the MAC address hop back and forth between the two. ;)

Fair enough (more than that, actually)! I really appreciate the time you took to spend on my problem. Of course it's my problem and the onus is on me to fix it. If there is a slight bit of contention here, all I'm claiming is that my network setup is not overly complex or weird. It's a little different, that's all :)

Not to be a whiny bitch (I am one, but not because of this thread ;) ) but I'd simplify down your network to something that is more typical as a first step. Then see what is or isn't going on. ;)
 

saurav

Contributor
Joined
Jul 29, 2012
Messages
139
Yeah, the ARP thing is *definitely* a conflict. You have the MAC addresses for the two conflicting devices, so it should be easy to find out what is going wrong. The IP conflict definitely is a different problem from the /16, so don't confuse the two in the same paragraph.
I already know what those two devices are. The IP is that of the FreeNAS box. One of the MAC addresses is the FreeNAS box. The other is one of the jails. In every single case, over a month's time. These come as security alerts almost every day. I checked all of those mails today. It's just that I don't know what to do about it, since the FreeNAS IP is reserved in DHCP and the IP of the jail is static and outside the range of the DHCP pool.

Maybe I'll try to bring even the jails within the DHCP IP pool and reserve addresses for them. Unless that involves re-creating the jails...

That still leaves me with the multipath alerts. They are still happening. I would try a restart, but the smart long tests are running right now...
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I already know what those two devices are. The IP is that of the FreeNAS box. One of the MAC addresses is the FreeNAS box. The other is one of the jails. In every single case, over a month's time. These come as security alerts almost every day. I checked all of those mails today. It's just that I don't know what to do about it, since the FreeNAS IP is reserved in DHCP and the IP of the jail is static and outside the range of the DHCP pool.

That's creepy. Any chance you are running 9.3 alpha? Or do you have a jail's IP set to match the FreeNAS box?

Edit: I'm foolish today.. I know you are on 9.2.1.8.

Maybe I'll try to bring even the jails within the DHCP IP pool and reserve addresses for them. Unless that involves re-creating the jails...

Nope, you don't have to recreate them. Just change the setting.

I've sent your error message for multipath to the lead developer of FreeNAS. His first response was "this isn't a TrueNAS box?" It's not, I can validate that via the debug. Right now I'm just waiting for his response, but it probably won't be until later today.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Provide the output of:

sysctl -b kern.geom.confxml
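
If you want to look for the bad spot yourself first, here is a rough sketch, not the actual alert code. It assumes you have saved that output to a file and read it into a string. The libxml2 message "xmlParseEntityRef: no name" usually means a bare `&` that is not followed by an entity name, so the sample below uses one. Python's built-in parser (expat) words the error differently and may count columns differently, so treat the reported position as approximate. The function name `find_xml_error` and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_xml_error(xml_text):
    """Return (line, column, offending_line) for the first parse error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, col = err.position  # line is 1-based as reported by expat
        return line, col, xml_text.splitlines()[line - 1]
    return None


# Made-up stand-in for the GEOM XML, with an unescaped '&' in a description.
sample = (
    "<mesh>\n"
    "  <provider><descr>Vendor R&D disk</descr></provider>\n"
    "</mesh>"
)

problem = find_xml_error(sample)
if problem is None:
    print("XML parsed cleanly")
else:
    line, col, text = problem
    print(f"parse error near line {line}, column {col}: {text.strip()}")
```

Whatever is printed for the real output should point at the same neighbourhood as the "line 503, column 83" in the alert.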
 

saurav

Contributor
Joined
Jul 29, 2012
Messages
139