By design, Sipxcom always uses its built-in DNS server for querying SRV records; the Sipxcom unmanaged DNS server option, when enabled, allows an unmanaged DNS server to be queried for phone registrations. In testing the Sipxcom HA solution in release 16.12, difficulties were encountered using the managed DNS tools of Sipxcom. The document at http://wiki.sipxcom.org/display/sipXcom/DNS+Management provides a good overview of the Sipxcom managed DNS capabilities and how they should work. Significant effort was invested to make the Sipxcom DNS management tools work with the high availability solution, but the HA solution performed best in the lab when Sipxcom used a DNS server that was completely separate from Sipxcom, i.e. when DNS services on each of the Sipxcom servers were disabled.
The first part of this document assumes that Sipxcom uses a separate DNS server for all SRV record processing. To do this, careful attention must be paid to how the Sipxcom primary and secondary servers are built, and the DNS service must be prevented from starting at boot. The second part of this document describes how the HA solution works using the managed DNS tools available in Sipxcom.
Configuring Sipxcom for High Availability using Standalone DNS servers
Build the primary Sipxcom server using the 10.20.2.35 IP address as the primary DNS address in the Ethernet network settings; otherwise, Sipxcom will override any manual entries in /etc/resolv.conf whenever the primary Sipxcom server is restarted. When the fresh Sipxcom system is started, check the primary External DNS setting to confirm that it points to the standalone DNS server built in the previous step; otherwise, /etc/resolv.conf will need to be updated to point to the 10.20.2.35 standalone DNS server after every reboot. Start all necessary Sipxcom services except Sipxbridge, which is not supported in the HA configuration.
NAT Traversal Settings
Go to the System -> NAT Traversal -> Settings menu and disable the Enable NAT Traversal and Server Behind NAT options. The Sipxcom high availability solution is only certified to use unmanaged gateways such as SBCs and ISDN/SIP gateways; Sipxbridge does not work properly in this environment.
In a high availability configuration, the Sipxcom active phone registrations page does not indicate which voice server each phone is registered to, and the DNS server in Sipxcom by default spreads the phone registrations across all 3 servers. For phase 2 testing, where replication bandwidth to a remote server is measured, the best approach is to have all phones register to pbx3. In the lab setup, the _sip._tcp, _sip._udp, and _sips._tcp DNS SRV records are weighted so that phones register to pbx3 first, then pbx2, and then the pbx primary proxy. Run dig queries for the SRV records from a client machine pointed to the 10.20.2.35 DNS server to confirm that pbx3 appears first in the response. Also weight the SRV records for the resource records (rr) and test them.
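The ordering described above can be sketched as a small helper that renders SRV lines for a zone file. This is only an illustration, not the lab's actual zone data: the fully qualified host names under lvtest.com, port 5060, and the specific priority and weight values are assumptions. In an SRV record, strict ordering is expressed through the priority field (lower values are tried first), while the weight field only distributes load among records that share the same priority.

```python
from typing import List, Tuple

# Assumed preference order: pbx3 first, then pbx2, then the primary (pbx).
# Host names and numeric values are illustrative, not copied from the lab zone.
PREFERRED_ORDER = ["pbx3.lvtest.com.", "pbx2.lvtest.com.", "pbx.lvtest.com."]

SrvRecord = Tuple[str, int, int, int, str]


def build_srv_records(service: str, port: int = 5060) -> List[SrvRecord]:
    """Return (owner, priority, weight, port, target) tuples in preference order."""
    records = []
    for position, target in enumerate(PREFERRED_ORDER):
        priority = 10 * (position + 1)  # lower priority value is tried first
        records.append((f"{service}.lvtest.com.", priority, 10, port, target))
    return records


def render(records: List[SrvRecord]) -> str:
    """Render the records as BIND-style zone file lines."""
    return "\n".join(
        f"{owner} IN SRV {prio} {weight} {port} {target}"
        for owner, prio, weight, port, target in records
    )


if __name__ == "__main__":
    print(render(build_srv_records("_sip._tcp")))
```

The same helper could be called for _sip._udp and, with a different port argument, for _sips._tcp.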
Validate that the registry rr IN records in the /var/named/dnsfile directory are correctly configured and that the DNS A records have the correct IP address for each pbx. For phase 1 testing, the pbx3 IP address is 10.20.2.33, and the phase 2 pbx3 IP address is 10.10.17.10.
Go into the System -> Servers -> Core menu, and turn off the DNS service on the primary Sipxcom proxy.
SSH to the primary server, and perform the following:
- Issue the chkconfig --list named command; by default, the Sipxcom installation turns on the DNS service regardless of the core server settings. Issue chkconfig named off to prevent the DNS service from starting when the primary server restarts.
- Confirm that the DNS service is not running with service named status. If the service is running, stop it by issuing a service named stop command.
- Check the /etc/resolv.conf file to ascertain that it points to the standalone DNS server defined in the previous step.
- Reboot the primary server and validate that the DNS server is indeed stopped and that Sipxcom has not rewritten the /etc/resolv.conf file to point to a different external DNS server.
- Run a dig command on one of the SRV records and validate the response.
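The checks in the list above can be partly automated. The sketch below parses the text of /etc/resolv.conf and one line of chkconfig --list named output; the function names are hypothetical, and the sample chkconfig line follows the usual SysV layout rather than output captured from the lab.

```python
from typing import List

STANDALONE_DNS = "10.20.2.35"  # standalone DNS server address used in the lab


def nameservers(resolv_conf_text: str) -> List[str]:
    """Extract nameserver addresses, in order, from resolv.conf content."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def points_to_standalone_dns(resolv_conf_text: str) -> bool:
    """True when the first nameserver is the standalone DNS server."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and servers[0] == STANDALONE_DNS


def enabled_runlevels(chkconfig_line: str) -> List[int]:
    """Return the runlevels marked 'on' in a line such as
    'named  0:off 1:off 2:on 3:on 4:on 5:on 6:off'."""
    levels = []
    for field in chkconfig_line.split()[1:]:
        level, _, state = field.partition(":")
        if state == "on":
            levels.append(int(level))
    return levels


if __name__ == "__main__":
    sample = "search lvtest.com\nnameserver 10.20.2.35\n"
    print("resolv.conf OK:", points_to_standalone_dns(sample))
    print("named enabled in runlevels:",
          enabled_runlevels("named  0:off 1:off 2:on 3:on 4:on 5:on 6:off"))
```

After chkconfig named off has been applied, enabled_runlevels should return an empty list for the named line.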
Add Server and Role
After the ISO is installed, sipxecs-setup is automatically invoked. Point the primary server to 10.10.17.10 and provision the administration ID (in this case 6 for pbx3) in the setup script. After the script completes, go back to the System -> Servers menu; the status field should change from uninitialized to configured. Repeat this process for each secondary server or arbiter assigned to the system.
Go into the System -> Servers -> Telephony section and turn on the SIP Proxy and SIP Registrar services for each secondary server added to the HA system.
Add Secondary Servers to Global Databases
Go to System -> Databases and add the secondary servers to the list of global databases; it will take 60-90 seconds for each database to be added and correctly synchronized. If you see multiple errors or have difficulty getting a server added to the list of global databases, try upgrading the computing platforms used for the servers.
In the lab setup, phones are manually provisioned with IP address, TFTP, SNTP, and DNS service addresses - a lab phone group was defined on the voice server and assigned to the test phones. The settings are as follows:
Build the standalone DNS servers 10.20.3.25 and 10.10.17.10 as per the previous section. Adjust the weights of the tcp, udp, and rr service records on the 10.20.2.35 DNS server so that phones on the 10.20.2.x subnet register to the pbx2 server while phones on the 10.10.17.x subnet register to the pbx3 server (tcp records illustrated below).
Build Primary Sipxcom Server
Build the primary Sipxcom server with a valid upstream DNS forwarder address (e.g. 220.127.116.11). Once the primary Sipxcom server has been built, turn on all services except for Sipxbridge and DHCP (in the lab, phones were statically provisioned). Sipxcom builds the following DNS settings in /etc/named.conf, /etc/resolv.conf, and /var/named/default.view.lvtest.com.zone.
NAT Traversal Settings
Configure the NAT traversal settings exactly as in the previous section on standalone DNS servers.
- Go to System -> Core Services and turn on DNS on pbx2 and pbx3.
- Go to System -> Telephony Services and turn on the SIP Proxy and SIP Registrar services on pbx2 and pbx3.
- Go to System -> Services and push the server profiles, which replicates the Mongo database and DNS information.
- Go to Diagnostics -> Job Status and confirm that all replication completed successfully.
Check DNS Configuration and Configure Failover
- The /etc/resolv.conf file on each system should have the IP address of the server as the first nameserver, followed by the other 2 nameservers.
- The /etc/named.conf file should point to the upstream DNS server defined at initial Sipxcom installation (e.g. 18.104.22.168)
- The zone file is named default.view.lvtest.com.zone and is located in the /var/named directory.
- By default, the SRV records in the zone file are configured to deliver services equally across all three servers, i.e. if the HA system had 90 registered phones, each of the three servers would have 30 registrations.
- The A records for each system are defined at the end of the zone file.
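The equal split mentioned in the list above follows from SRV selection being proportional to weight when all records share one priority. A minimal sketch of that arithmetic is shown below; the weight value of 10 is an assumption rather than a value read from the lab zone file, and real phones will only approximate the expected figure.

```python
from typing import Dict


def expected_registrations(total_phones: int, weights: Dict[str, int]) -> Dict[str, float]:
    """Expected registrations per server when all SRV records share one
    priority and targets are chosen in proportion to their weight."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one server needs a positive weight")
    return {
        server: total_phones * weight / total_weight
        for server, weight in weights.items()
    }


if __name__ == "__main__":
    equal_weights = {"pbx": 10, "pbx2": 10, "pbx3": 10}  # assumed weights
    print(expected_registrations(90, equal_weights))
```

Changing a single weight in the dictionary shows how the expected share shifts between servers without editing the zone file first.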
Defining Regions for DNS Failover
The System -> Regions and System -> DNS -> Record View features within Sipxcom create separate DNS zone files for each subnetwork. The following architecture and registration rules will be used to build DNS regions and failover rules within Sipxcom.
The first step is to define two regions within Sipxcom: one called Main1020 with an IP address range of 10.20.2.x/24 and another called Local1010 with an IP address range of 10.10.17.x/24. The System -> DNS -> Record View menu maps each region to its failover plan.
- pbx1020 with a fail-over plan of pbx2failover that applies to the Main1020 region
- pbx1010 with a fail-over plan of pbx3failover that applies to the Local1010 region
The Sipxcom DNS tools build the following DNS configuration in the /etc/named.conf file: DNS queries from the 10.20.2.x subnetwork use the pbx1020 zone file, which always returns pbx2 SRV records, while DNS queries from the 10.10.17.x subnetwork use the pbx1010 zone file, which always returns pbx3 SRV records.
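The subnet-to-zone mapping described above can be modelled in a few lines. This is a sketch of the selection logic only, not the named.conf content that Sipxcom generates; the table layout and the function name are hypothetical, while the region names, subnets, and view names come from the lab description.

```python
import ipaddress
from typing import Optional, Tuple

# Region name -> (client subnet, view/zone name, server whose SRV records are returned)
REGIONS = {
    "Main1020": (ipaddress.ip_network("10.20.2.0/24"), "pbx1020", "pbx2"),
    "Local1010": (ipaddress.ip_network("10.10.17.0/24"), "pbx1010", "pbx3"),
}


def select_view(client_ip: str) -> Optional[Tuple[str, str, str]]:
    """Return (region, view, server) for the DNS client address,
    or None when the address falls outside every defined region."""
    address = ipaddress.ip_address(client_ip)
    for region, (subnet, view, server) in REGIONS.items():
        if address in subnet:
            return region, view, server
    return None


if __name__ == "__main__":
    for ip in ("10.20.2.50", "10.10.17.99", "192.168.1.1"):
        print(ip, "->", select_view(ip))
```

A client outside both subnets returns None here; how Sipxcom answers such a client is not covered by this sketch.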
Double-Check Lab Configuration with Standalone DNS Servers
- DNS service is turned off
- /etc/resolv.conf file is pointed to the unmanaged DNS server at 10.20.2.35
- The SRV records on the unmanaged DNS service point to pbx3 first, then pbx2, and then pbx; verify this with a dig SRV _sip._tcp.lvtest.com command.
- Double-check that all Sipxcom processes are running with service sipxecs status.
- Using Wireshark, double-check that phones are registering to pbx3.
- Place internal and external calls on system to validate that everything is working properly.
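The SRV check in the list above can be scripted by parsing the short form of the dig answer (dig +short SRV _sip._tcp.lvtest.com), where each line reads priority, weight, port, target. The sample text below is hypothetical, not captured lab output. Because a DNS server may return answers in any order, the sketch sorts by priority instead of trusting the line order; ranking equal-priority records by weight is a display convenience, since clients pick among them probabilistically.

```python
from typing import List, Tuple

SrvAnswer = Tuple[int, int, int, str]

# Hypothetical `dig +short SRV` output used for illustration only.
SAMPLE = """\
30 10 5060 pbx.lvtest.com.
10 10 5060 pbx3.lvtest.com.
20 10 5060 pbx2.lvtest.com.
"""


def parse_dig_short(output: str) -> List[SrvAnswer]:
    """Parse lines of the form '<priority> <weight> <port> <target>'."""
    records = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 4:
            priority, weight, port = (int(value) for value in fields[:3])
            records.append((priority, weight, port, fields[3]))
    return records


def preference_order(records: List[SrvAnswer]) -> List[str]:
    """Targets sorted by priority (lowest first), then weight (highest first)."""
    ordered = sorted(records, key=lambda rec: (rec[0], -rec[1]))
    return [rec[3] for rec in ordered]


if __name__ == "__main__":
    print(preference_order(parse_dig_short(SAMPLE)))
```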
Double-Check Lab Configuration with Sipxcom DNS Servers
- Use the Excel Import capability of Sipxcom to pre-populate a large number of users with the same SIP password.
- Go into the Registration (UAC) section of Sipxcom and pull down the Add Batch menu
- Provision the first user name, the expiry field (shortened here from 3600 to 300 seconds), the number of user registrations to create, the SIP password, and the IP address of the Registrar that the users should register to. In this case, all phones are registered to the 10.10.17.10 Sipx proxy.
- Hit the Add symbolic link, double-check the status field to confirm that the users connected correctly, and click on the trace symbolic link to confirm that the users are registering to the correct proxy. Go to the Sipxcom Diagnostics -> Registrations page to validate that the users have registered correctly to Sipxcom.
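Shortening the expiry field raises the steady-state registration load, which matters when reading the bandwidth results. The rough estimate below assumes every simulated user refreshes exactly once per expiry interval; real phones often refresh earlier, so treat the result as a lower bound. The function name is hypothetical.

```python
def register_rate(users: int, expiry_seconds: int) -> float:
    """Average REGISTER refreshes per second, assuming one refresh
    per user per expiry interval."""
    if expiry_seconds <= 0:
        raise ValueError("expiry_seconds must be positive")
    return users / expiry_seconds


if __name__ == "__main__":
    for expiry in (3600, 300):
        rate = register_rate(500, expiry)
        print(f"500 users, expiry {expiry} s: {rate:.2f} REGISTER/s")
```

Under this assumption, moving from 3600 to 300 seconds multiplies the refresh rate by twelve for the same number of users.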
When the Delete All symbolic link in Siptester is selected, the tool instructs Sipxcom to un-register each line. The Sipxcom Active Registrations page may still show active or expired registrations; SSH to the Sipxcom primary server and use the following procedure to clear out all active registrations.
Preliminary Phase 1 Test Results
In scenario 1, when phones are registered to pbx3 from the SIP tester on the separate 10.20.2.x subnetwork, 10-40 kbps of bandwidth is generated by replication traffic and phone registrations every few seconds. When 100 or 500 phones register to pbx3 at once, 1-3 megabits per second of traffic is generated for several seconds, consisting of phone registrations and state replication from pbx3 to pbx2 and pbx1. In the attached graph, there are two peaks in each test: the first peak represents the phones registering to pbx3, and the second peak represents the traffic to de-register the phones. The Sipxtester tool can delete all phone registrations simultaneously.
In scenario 2, when phones are registered to pbx3 from the SIP tester on the same 10.10.17.x subnetwork, 5-10 kbps of bandwidth is generated in replication traffic to the primary and secondary servers in the 10.20.2.x subnetwork. When 100 or 500 phones register to pbx3 at once, approximately 1 megabit per second of bandwidth (roughly 100 kilobytes (KB) per second) is generated for several seconds, destined for the primary and secondary servers on the 10.20.2.x subnetwork; this is replication traffic only and not user registrations.
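For sizing a WAN link from these figures, the bit-to-byte conversion can be written out explicitly. In decimal units 1 Mbit/s corresponds to 125 KB/s, so the 100 KB figure above reads as a rounded value. The burst duration used in the example is an assumption, since the text only says "several seconds".

```python
def mbps_to_kilobytes_per_second(megabits_per_second: float) -> float:
    """Convert Mbit/s to KB/s using decimal units (1 Mbit = 1,000,000 bits)."""
    return megabits_per_second * 1_000_000 / 8 / 1_000


def burst_kilobytes(megabits_per_second: float, seconds: float) -> float:
    """Total data moved by a burst at a constant rate, in kilobytes."""
    return mbps_to_kilobytes_per_second(megabits_per_second) * seconds


if __name__ == "__main__":
    print(mbps_to_kilobytes_per_second(1))  # rate for the scenario 2 burst
    print(burst_kilobytes(1, 4))            # assumed 4-second burst
```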