Can't bind to secondary DC

Status
Not open for further replies.

echelon5

Explorer
Joined
Apr 20, 2016
Messages
79
I'm having a problem binding FreeNAS to a secondary DC and I need some pointers on how to troubleshoot this. I'm fairly new to Active Directory and I think it has something to do with DNS.

Problem: FreeNAS binds to primary domain controller, but doesn't bind to secondary domain controller, when PDC is offline.

Machine Setup: it's pretty simple on both FreeNAS and the Domain Controllers. I set them up mostly following the FreeNAS and Microsoft documentation.
- FreeNAS-11.1-U4 on fnas1 and FreeNAS-11.1-U5 on fnas2
- both fnas1 and fnas2 work correctly when dc1 is online, but neither binds when dc1 is offline.
- DCs are running Windows 2016 with the same antivirus and the exact same settings

Network Setup:
- 3 locations: onsite, offsite and Azure, all connected to each other through IPsec
- dc (dc1) and freenas 1 (fnas1) are onsite, freenas 2 (fnas2) offsite, secondary dc (dc2) is an Azure VM.
- configured zones for each location in AD DNS
- dc1 = 192.168.0.6 - Windows 2016 Essentials
- dc2 = 10.0.0.6 - Windows 2016 Datacenter

Other details:
- Windows machines seem to work ok with dc2
- DNS on both DCs points to the other DC
- replication and DNS between the DCs seem to work fine
- I set this up a few months ago, on 11.1-U2 I think, and I'm pretty sure it worked correctly at the time

Error 1
Using nameserver1: IP address of DC1 and nameserver2: IP address of DC2

Something related to SRV records, similar to another thread:

Jul 22 14:47:08 fnas2 uwsgi: [common.freenasldap:1147] FreeNAS_ActiveDirectory_Base.get_SRV_records: looking up SRV records for _ldap._tcp.dc._msdcs.adsub.contoso.com
Jul 22 14:47:13 fnas2 uwsgi: [common.freenasldap:1166] FreeNAS_ActiveDirectory_Base.get_SRV_records: no SRV records for _ldap._tcp.dc._msdcs.adsub.contoso.com found: The DNS operation timed out after 5.003402233123779 seconds
Jul 22 14:47:13 fnas2 uwsgi: [common.freenasldap:12] Traceback (most recent call last):
Jul 22 14:47:13 fnas2 uwsgi: [common.freenasldap:12] File "./freenasUI/common/freenasldap.py", line 1156, in get_SRV_records
Jul 22 14:47:13 fnas2 uwsgi: [common.freenasldap:12] answers = r.query(host, 'SRV')
Jul 22 14:47:13 fnas2 uwsgi: [common.freenasldap:12] File "/usr/local/lib/python3.6/site-packages/dns/resolver.py", line 949, in query
Jul 22 14:47:13 fnas2 uwsgi: [common.freenasldap:12] timeout = self._compute_timeout(start)
Jul 22 14:47:13 fnas2 uwsgi: [common.freenasldap:12] File "/usr/local/lib/python3.6/site-packages/dns/resolver.py", line 858, in _compute_timeout
Jul 22 14:47:13 fnas2 uwsgi: [common.freenasldap:12] raise Timeout(timeout=duration)
Jul 22 14:47:13 fnas2 uwsgi: [common.freenasldap:12] dns.exception.Timeout: The DNS operation timed out after 5.003402233123779 seconds
Jul 22 14:47:13 fnas2 uwsgi: [directoryservice.form:537] Exception: type = <class 'freenasUI.common.freenasldap.FreeNAS_ActiveDirectory_Exception'>


I ran host -t:

host -t srv _ldap._tcp.dc._msdcs.adsub.contoso.com

_ldap._tcp.dc._msdcs.adsub.contoso.com has SRV record 0 100 389 dc2.adsub.contoso.com.
_ldap._tcp.dc._msdcs.adsub.contoso.com has SRV record 0 100 389 dc1.adsub.contoso.com.


and

nslookup -type=SRV _ldap._tcp.dc._msdcs.adsub.contoso.com
Server: 10.0.0.6
Address: 10.0.0.6#53

_ldap._tcp.dc._msdcs.adsub.contoso.com service = 0 100 389 dc1.adsub.contoso.com.
_ldap._tcp.dc._msdcs.adsub.contoso.com service = 0 100 389 dc2.adsub.contoso.com.
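
Both host and nslookup go through the system resolver, which falls back to the second nameserver when the first does not answer, so a successful lookup does not show which server replied in time. To see how each nameserver behaves on its own, each one can be queried directly. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it is not how FreeNAS performs the lookup (the traceback shows it calling r.query from the dns.resolver module).

```python
import socket
import struct


def build_srv_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for the SRV records of `name`."""
    # Header: ID, flags (recursion desired), 1 question, no other sections.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 33 = SRV, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 33, 1)


def probe_nameserver(server: str, name: str, timeout: float = 2.0, port: int = 53):
    """Send one SRV query over UDP to `server`.

    Returns the DNS response code (0 = NOERROR, 3 = NXDOMAIN, ...),
    or None if nothing usable came back within `timeout` seconds.
    """
    query = build_srv_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            data, _ = sock.recvfrom(4096)
        except OSError:  # includes timeouts and unreachable-network errors
            return None
    # A valid reply has at least a 12-byte header and echoes our query ID.
    if len(data) < 12 or data[:2] != query[:2]:
        return None
    flags = struct.unpack("!H", data[2:4])[0]
    return flags & 0x000F  # low four bits of the flags field are the RCODE


# Example (IP addresses taken from the post; results depend on the network):
# for server in ("192.168.0.6", "10.0.0.6"):
#     print(server, probe_nameserver(server, "_ldap._tcp.dc._msdcs.adsub.contoso.com"))
```

A None result for one address and 0 for the other would point at that single nameserver rather than at the SRV records themselves.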


Error 2

Using nameserver1: IP address of DC2 and nameserver2: IP address of DC1

It connects to dc2:
Jul 22 13:43:26 fnas2 uwsgi: [common.freenasldap:287] FreeNAS_LDAP_Directory.open: initialized
Jul 22 13:43:26 fnas2 uwsgi: [common.freenasldap:331] FreeNAS_LDAP_Directory.open: trying to bind
Jul 22 13:43:26 fnas2 uwsgi: [common.freenasldap:232] FreeNAS_LDAP_Directory.open: (authenticated bind) trying to bind to dc2.adsub.contoso.com:389
Jul 22 13:43:27 fnas2 uwsgi: [common.freenasldap:333] FreeNAS_LDAP_Directory.open: binded
Jul 22 13:43:27 fnas2 uwsgi: [common.freenasldap:347] FreeNAS_LDAP_Directory.open: connection open
Jul 22 13:43:27 fnas2 uwsgi: [common.freenasldap:349] FreeNAS_LDAP_Directory.open: leave


But then it tries to switch to dc1:
Jul 22 13:43:28 fnas2 uwsgi: [common.freenasldap:277] FreeNAS_LDAP_Directory.open: enter
Jul 22 13:43:28 fnas2 uwsgi: [common.freenasldap:284] FreeNAS_LDAP_Directory.open: uri = ldap://dc1.adsub.contoso.com:389
Jul 22 13:43:28 fnas2 uwsgi: [common.freenasldap:287] FreeNAS_LDAP_Directory.open: initialized
Jul 22 13:43:28 fnas2 uwsgi: [common.freenasldap:331] FreeNAS_LDAP_Directory.open: trying to bind
Jul 22 13:43:28 fnas2 uwsgi: [common.freenasldap:232] FreeNAS_LDAP_Directory.open: (authenticated bind) trying to bind to dc1.adsub.contoso.com:389
Jul 22 13:43:37 fnas2 sshd[65632]: pam_winbind(sshd): valid_user: wbcGetpwnam gave WBC_ERR_WINBIND_NOT_AVAILABLE


Then, it shows an error (since dc1 is offline):
Jul 22 13:43:38 fnas2 uwsgi: [common.freenasldap:194] FreeNAS_LDAP_Directory[ERROR]: {'desc': "Can't contact LDAP server", 'errno': 60, 'info': 'Operation timed out'}
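
Both SRV records shown earlier carry priority 0 and weight 100, so DNS alone gives a client no reason to prefer dc2 over dc1. A quick way to confirm which DC is actually reachable on the LDAP port is a plain TCP connection test. This is a minimal sketch, assuming unencrypted LDAP on port 389 as in the logs; the function name is hypothetical and it only tests TCP reachability, not an LDAP bind.

```python
import socket


def ldap_port_reachable(host: str, port: int = 389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolvable
        return False


# Example (host names taken from the logs; results depend on the network):
# for dc in ("dc1.adsub.contoso.com", "dc2.adsub.contoso.com"):
#     print(dc, ldap_port_reachable(dc))
```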

Error 3:

I've deleted offsite from AD Sites and Services. With nameserver1 set to DC1, it shows the same error as Error 1
With nameserver1 set to DC2, it binds to dc2, then it looks for onsite:

Jul 22 15:25:47 fnas2 uwsgi: [common.freenasldap:1147] FreeNAS_ActiveDirectory_Base.get_SRV_records: looking up SRV records for _ldap._tcp.onsite._sites.dc._msdcs.adsub.contoso.com
Jul 22 15:25:47 fnas2 uwsgi: [common.freenasldap:1147] FreeNAS_ActiveDirectory_Base.get_SRV_records: looking up SRV records for _kerberos._tcp.onsite._sites.adsub.contoso.com
Jul 22 15:25:47 fnas2 uwsgi: [common.freenasldap:1147] FreeNAS_ActiveDirectory_Base.get_SRV_records: looking up SRV records for _kpasswd._tcp.adsub.contoso.com


then it throws an error since dc1 is offline:

Jul 22 15:25:58 fnas2 uwsgi: [common.freenasldap:335] FreeNAS_LDAP_Directory.open: could not bind to dc1.adsub.contoso.com:389 ({'desc': "Can't contact LDAP server", 'errno': 60, 'info': 'Operation timed out'})

I've recreated the site in AD, and it goes the same route: binds to dc2, then looks for offsite:

Jul 22 15:36:44 fnas2 uwsgi: [common.freenasldap:1147] FreeNAS_ActiveDirectory_Base.get_SRV_records: looking up SRV records for _ldap._tcp.offsite._sites.dc._msdcs.adsub.contoso.com
Jul 22 15:36:44 fnas2 uwsgi: [common.freenasldap:1166] FreeNAS_ActiveDirectory_Base.get_SRV_records: no SRV records for _ldap._tcp.offsite._sites.dc._msdcs.adsub.contoso.com found: None of DNS query names exist: _ldap._tcp.offsite._sites.dc._msdcs.adsub.contoso.com., _ldap._tcp.offsite._sites.dc._msdcs.adsub.contoso.com.adsub.contoso.com.
Jul 22 15:36:44 fnas2 uwsgi: [common.freenasldap:12] Traceback (most recent call last):
Jul 22 15:36:44 fnas2 uwsgi: [common.freenasldap:12] File "./freenasUI/common/freenasldap.py", line 1156, in get_SRV_records
Jul 22 15:36:44 fnas2 uwsgi: [common.freenasldap:12] answers = r.query(host, 'SRV')
Jul 22 15:36:44 fnas2 uwsgi: [common.freenasldap:12] File "/usr/local/lib/python3.6/site-packages/dns/resolver.py", line 1051, in query
Jul 22 15:36:44 fnas2 uwsgi: [common.freenasldap:12] raise NXDOMAIN(qnames=qnames_to_try, responses=nxdomain_responses)
Jul 22 15:36:44 fnas2 uwsgi: [common.freenasldap:12] dns.resolver.NXDOMAIN: None of DNS query names exist: _ldap._tcp.offsite._sites.dc._msdcs.adsub.contoso.com., _ldap._tcp.offsite._sites.dc._msdcs.adsub.contoso.com.adsub.contoso.com.
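
The lookups in the logs above follow a fixed naming pattern built from the domain and the AD site name. The sketch below reproduces only the names that appear in these logs, so each one can be checked by hand with host -t srv; the function name is hypothetical, and the list is not claimed to be everything FreeNAS queries.

```python
def ad_srv_names(domain: str, site: str = "") -> list:
    """Build the SRV record names seen in the logs for a domain and optional site."""
    names = [f"_ldap._tcp.dc._msdcs.{domain}"]  # domain-wide DC lookup
    if site:
        # Site-specific lookups, as logged for the "onsite" and "offsite" sites.
        names.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
        names.append(f"_kerberos._tcp.{site}._sites.{domain}")
    names.append(f"_kpasswd._tcp.{domain}")
    return names


# Example: print the names to check for the "offsite" site.
# for name in ad_srv_names("adsub.contoso.com", "offsite"):
#     print(name)
```

The NXDOMAIN in the last log means that no record exists under the site-specific name for offsite, whereas the domain-wide name resolved to both DCs in the earlier host output.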


Other things I've tried:
- disabled Windows Firewall on DC2 (although it's the same firewall policy as on DC1)
- manually setting Domain Controller and Global Catalogue Server to DC2 in Directory Service -> Active Directory -> Advanced - same error as in Error 2
 
Joined
Dec 29, 2014
Messages
1,135
There are two pieces to being able to bind to the DC. The first (and what you are investigating above) is that the right stuff is in DNS (SRV records) to direct traffic to the right hosts. The second and simpler is having a DNS server to respond to your requests. What are the DNS server settings for your FreeNAS? In the scenario you describe, I would say that it should be the IP for DC1 first and DC2 second. If you only have a single DNS server, you won't be able to access anything if that server is down.

EDIT: Sorry, you did say that is the order you are using for DNS servers. What is doing the VPN connection to Azure? Is it configured to clear the DF (do not fragment) bit? Windows loves to set DF for no particularly good reason (IMHO). The DNS query to dc2 could be a large enough TCP packet that DF could break it.
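
To double-check the nameserver order the system is actually using, the resolver configuration can be read directly. This is a minimal sketch, assuming the usual resolv.conf format with one "nameserver <address>" entry per line; the function name is hypothetical.

```python
def parse_nameservers(resolv_conf_text: str) -> list:
    """Return the nameserver addresses from resolv.conf text, in listed order."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        # Drop comments, which start with '#' or ';' in resolv.conf.
        line = raw_line.split("#", 1)[0].split(";", 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Example (path is the conventional location; adjust if yours differs):
# with open("/etc/resolv.conf") as f:
#     print(parse_nameservers(f.read()))
```

The resolver tries the servers in the order listed, so the first entry is the one that has to time out before the second is consulted.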
 

echelon5

Explorer
Joined
Apr 20, 2016
Messages
79
Both DCs are running DNS servers and they are replicating through ADI. I usually set nameserver1=DC1 (primary DC), but I've noticed the errors are different when I switch to DC2 as NS1. Also, it works properly with nameserver1=DC2 as long as DC1 is online.

The VPN is set up through pfSense, but it's the same setting on both sites and traffic is getting through to DC1.
  • dc1-pfsense1-Azure
  • dc1-pfsense1-offsite
  • dc2-pfsense2-Azure
I've just enabled "Clear invalid DF bits instead of dropping the packets" in pfSense and rebooted, but it doesn't help. The errors are the same as above (Error 1 for NS1=DC1 and Error 2 for NS1=DC2).
 
Joined
Dec 29, 2014
Messages
1,135
You aren't trying to use LDAPS, are you? Are there any options selected in the MS DNS server that restrict who can make TCP connections to the DNS server? I suspect (but am not positive) that FreeNAS is going to use TCP for those queries, so that could be a problem.

My other suggestion is to check the really simple stuff. With all DCs up and happy, run ipconfig /registerdns from an elevated command prompt on each DC. After giving that sufficient time to replicate (10-15 minutes), restart the Netlogon service on each DC and give that time to replicate as well. Then try again.
 

echelon5

Explorer
Joined
Apr 20, 2016
Messages
79
I'm pretty sure I'm not using LDAPS and don't have restrictions on the DNS server. It does reply to queries and Windows PCs are able to update GPOs through it.

I tried several restarts of both FreeNAS and DCs and that didn't do anything.

It seems to work now though. I hope it was some Windows Server glitch. I've updated both DCs with the latest security update (2018-07), left them like that since yesterday and now it seems to work. I've made so many changes I've no idea what worked in the end.

Thanks for taking your time with this.
 