How to maintain high availability in wholesale telecom.

The Problem

When we built the Peeredge Wholesale platform, we operated under the guidance that we needed to be as redundant as possible inside our infrastructure. We built that principle into every aspect of the platform, from load balancing to routing to the storage and processing of call detail records (CDRs). Unfortunately, the wholesale telecom market still runs on standard IPv4 addresses. Unless we do some fancy software-defined networking, which SIP is typically allergic to, we are left relying on things like SIP OPTIONS pings to determine whether an endpoint is dead. The unfortunate part of this methodology is that many switches, traditional or not, either do not respond properly to OPTIONS pings or simply aren't configured to. And even if a carrier load balances its traffic, the load balancer then becomes the single point of failure (SPOF).

This is why we turn up redundant load balancers for each of our Peeredge Wholesale clients. Unlike traditional SIP load balancers, our load balancers are clustered to provide shared intelligence like capacity control and authentication.

However, that brings me back to the crux of maintaining high availability within wholesale telecom. Even with redundant load balancers, carriers that do not use intelligent switching will continue to send new call requests to potentially dead load balancers. This usually leads to scenarios where the person making the phone call hangs up, because they hear dead air for as long as it takes their switch or SBC to either roll the call to another vendor or fail it back.

The Solution

This is where DNS load balancing and failover come in. As many of you know, DNS is what translates friendly names like "myswitch.mycarrier.com" into an ordered list of IP addresses to send requests to. Many people don't realize that DNS has a spec that covers SIP traffic: SRV records (RFC 2782), which RFC 3263 applies to locating SIP servers. The benefits of using DNS to route phone calls are numerous:

1. You can change your underlying IP addresses without affecting how your carriers provision you. This gives you the ability to move or add IP addresses without any involvement from your carrier relationships. The power is yours.

2. If set up properly, a single DNS name can load balance traffic across multiple endpoints using fairly intelligent weighting, without relying on your customers to have equipment that distributes traffic evenly. Simply put, you get free inbound and outbound load balancing.

3. If set up properly, a single DNS name can fail over between IP addresses transparently. Furthermore, with proper backend integration between your switching layer and DNS, systems can automatically remove failing IPs from the list of IPs to try. This means more happy customers.
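
To make the third point a little more concrete, here is a minimal Python sketch of the "remove failing IPs" idea. It is not how the Peeredge platform does it. SrvRecord and publishable_records are hypothetical names, the hostnames are placeholders, and I am assuming that some health checker of your own supplies the set of healthy targets and that your DNS provider has an API you can push the filtered list to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    # Hypothetical container for the fields of one SRV record.
    priority: int
    weight: int
    port: int
    target: str


def publishable_records(records, healthy_targets):
    """Return only the records whose target passed its last health check.

    Assumption: if every target looks unhealthy, we keep publishing the
    full set, because the health checker itself may be what broke and an
    empty DNS answer would take everything down.
    """
    alive = [r for r in records if r.target in healthy_targets]
    return alive if alive else list(records)


# Placeholder data: two primaries and one backup.
records = [
    SrvRecord(10, 50, 5060, "sbc1.carrier.com"),
    SrvRecord(10, 50, 5060, "sbc2.carrier.com"),
    SrvRecord(20, 0, 5060, "backupsbc.carrier.com"),
]
```

Whether to publish everything or nothing when all checks fail is a judgment call. I chose to keep publishing here, but your own failure modes may argue for the opposite.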

How To

The Peeredge Wholesale platform supports inbound and outbound DNS routing natively, even at volumes typically only seen in short-duration scenarios. However, to take full advantage of it, you need to set up proper DNS records for your switching. Many of our clients ask us to set this up for them under our Peeredge.io domain. That is perfectly fine, but many others want to use their existing domain so they can control the integration themselves. For the purposes of this post, I will assume that you want to use a domain you already own to handle your DNS setup.

Let me preface this further by saying that every DNS provider is different, and some do not let customers enter DNS SRV records manually. In this scenario, you may wish to move the DNS for your domain to a more flexible vendor like Amazon Route 53 or Rackspace Cloud DNS.

So let's see how this works.

For the purposes of this example, I will assume that you have two primary IP addresses, plus a backup, that you want to consolidate into a single DNS name, and that you want to fail over and transparently load balance between them. Here is what the SRV records look like.

_sip._udp.carrier.com 60 IN SRV 10 50 5060 sbc1.carrier.com

_sip._udp.carrier.com 60 IN SRV 10 50 5060 sbc2.carrier.com

_sip._udp.carrier.com 60 IN SRV 20 0 5060 backupsbc.carrier.com

Here is what the A records look like:

sbc1.carrier.com = 192.168.0.10

sbc2.carrier.com = 192.168.0.20

backupsbc.carrier.com = 192.168.0.30

Note: the primary SBCs and backup SBC can be located in different data centers or geographic areas.
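
To show how the two record types chain together, here is a small Python sketch that looks up the SRV name first and then resolves each target through its A record. It uses in-memory tables that mirror the example records rather than querying real DNS, and resolve_sip_endpoints is a hypothetical helper name, so treat it as an illustration of the lookup order and nothing more.

```python
# In-memory stand-ins for the example zone. A real client would query DNS.
SRV_RECORDS = {
    "_sip._udp.carrier.com": [
        # (priority, weight, port, target)
        (10, 50, 5060, "sbc1.carrier.com"),
        (10, 50, 5060, "sbc2.carrier.com"),
        (20, 0, 5060, "backupsbc.carrier.com"),
    ],
}

A_RECORDS = {
    "sbc1.carrier.com": "192.168.0.10",
    "sbc2.carrier.com": "192.168.0.20",
    "backupsbc.carrier.com": "192.168.0.30",
}


def resolve_sip_endpoints(domain, transport="udp"):
    """Return (priority, weight, ip, port) tuples for a SIP domain.

    Step 1: build the SRV name and fetch its records.
    Step 2: translate each SRV target into an IP via its A record.
    """
    srv_name = f"_sip._{transport}.{domain}"
    endpoints = []
    # Sorting the tuples puts the lowest priority number first.
    for priority, weight, port, target in sorted(SRV_RECORDS.get(srv_name, [])):
        endpoints.append((priority, weight, A_RECORDS[target], port))
    return endpoints
```

Calling resolve_sip_endpoints("carrier.com") yields the two primary SBCs first, followed by the backup.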

While A records are simple (they just translate a DNS name to an IP address), the SRV records can be more complex.

In this scenario, we are using the root domain as the endpoint (i.e., carrier.com), but you could be more specific if you wanted to be.

60 = the TTL (time to live)

10 = the priority (records with the lowest number are tried first, and higher numbers are used only when those fail; yes, counterintuitive)

50 = the weight (the relative share of traffic each SBC receives among records with the same priority)

5060 = the SIP UDP Port of the SBC

sbc1.carrier.com = the DNS name of the SBC
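
If it helps to see the fields lined up, here is a short Python sketch that splits one of these record lines into named fields. The parse_srv_line helper and the SrvRecord type are hypothetical names of my own, and the parser assumes the exact whitespace-separated layout used in this post, not full zone-file syntax.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    name: str      # e.g. _sip._udp.carrier.com
    ttl: int       # time to live, in seconds
    priority: int  # lowest number is tried first
    weight: int    # relative share within one priority level
    port: int      # SIP port of the SBC
    target: str    # DNS name of the SBC


def parse_srv_line(line):
    """Parse 'name ttl IN SRV priority weight port target' into a SrvRecord.

    Raises ValueError if the line has the wrong number of fields or is
    not an IN SRV record.
    """
    name, ttl, rr_class, rr_type, priority, weight, port, target = line.split()
    if rr_class != "IN" or rr_type != "SRV":
        raise ValueError(f"not an IN SRV record: {line!r}")
    return SrvRecord(name, int(ttl), int(priority), int(weight), int(port), target)


record = parse_srv_line("_sip._udp.carrier.com 60 IN SRV 10 50 5060 sbc1.carrier.com")
```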

Notice that our backup SBC has a priority of 20 and a weight of 0. This means it is only chosen if both primary SBCs fail, because its higher priority number gives it lower precedence, and its weight of 0 marks it as an "all-else-fails" option.
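
Here is a rough Python sketch of how a client might turn those records into an ordered list of targets to try. It is a simplification of the selection procedure in RFC 2782, not a faithful implementation: within a priority level it does a weighted random draw, and a weight of 0 is only picked once nothing with a positive weight is left at that level. The order_targets name is hypothetical, and real switches and SBCs may behave differently.

```python
import random
from itertools import groupby


def order_targets(records, rng=random):
    """Order (priority, weight, port, target) records into a try-list.

    Lower priority numbers come first. Within one priority level the
    order is a weighted random shuffle, so heavier records tend to be
    tried earlier.
    """
    ordered = []
    by_priority = sorted(records, key=lambda r: r[0])
    for _, group in groupby(by_priority, key=lambda r: r[0]):
        pool = list(group)
        while pool:
            weights = [r[1] for r in pool]
            if sum(weights) == 0:
                # Only zero-weight records remain: pick uniformly.
                pick = rng.choice(pool)
            else:
                pick = rng.choices(pool, weights=weights, k=1)[0]
            pool.remove(pick)
            ordered.append((pick[3], pick[2]))  # (target, port)
    return ordered


records = [
    (10, 50, 5060, "sbc1.carrier.com"),
    (10, 50, 5060, "sbc2.carrier.com"),
    (20, 0, 5060, "backupsbc.carrier.com"),
]
```

Calling order_targets(records) repeatedly puts sbc1 and sbc2 first in roughly equal proportion, and the backup always comes last.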

Photo by peepo/iStock / Getty Images

Conclusion

With the proper amount of attention to your DNS setup, the requisite redundancy in your SBCs, and carriers that accept DNS naming in their switching, you can build a high-availability wholesale solution that will allow you to sleep peacefully at night.