It’s not DNS: Ensuring high availability in a hybrid cloud environment


Our customers have multi-faceted requirements around DNS forwarding, especially if they have multiple VPCs that connect to multiple on-prem locations. As we discussed in an earlier blog post, we recommend that customers use a hub-and-spoke model, which helps get around the reverse routing challenges caused by the use of the Google DNS proxy range.

But in some configurations, this approach can introduce a single point of failure (SPOF) within the hub network, and if there are connectivity issues within your deployment, it could cause an outage in all your VPC networks. In this post, we’ll discuss some redundancy mechanisms you can employ to ensure that Cloud DNS is always available to handle your DNS requests.

Figure 1.1 - A non-redundant hub and spoke DNS architecture

Adding redundancy to the hub-and-spoke model

If you need a redundant hub-and-spoke model, consider a design where the DNS-forwarding VPC network spans multiple Google Cloud regions, and where each region has a separate path (via Interconnect or other means) to the on-prem network. In Figure 1.2 below, VPC Network H spans us-west1 and us-east1, and each region has a dedicated Interconnect to the customer’s on-prem network. The other VPC networks are then peered with the hub network.

Figure 1.2 - A highly available hub-and-spoke architecture

This scenario provides highly available DNS capabilities, allowing the VPC to send queries out over either interconnect path, and allowing responses to return via either interconnect path. The outbound request always leaves Google Cloud via the interconnect location nearest to where the request originated (unless that path has failed, in which case it uses the other interconnect path).

Note that while Cloud DNS always routes the request to on-prem through the interconnect closest to the originating region, the path that responses take from the on-prem network back to Google Cloud depends on your WAN routing. If equal-cost routing is in place, you may see asymmetric routing, where a response returns over a different path than the one its request took. In some cases this adds resolution latency.
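
The outbound selection described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: "nearest" is reduced to "same region", and the `Interconnect` class, the `pick_egress` function, and the path names are hypothetical. In a real deployment this choice is made by dynamic routing, not by application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interconnect:
    name: str      # hypothetical label for the path
    region: str    # Google Cloud region the path attaches to
    healthy: bool = True


def pick_egress(origin_region, interconnects):
    """Prefer a healthy path in the originating region, else any healthy path."""
    healthy = [ic for ic in interconnects if ic.healthy]
    if not healthy:
        raise RuntimeError("no healthy interconnect path to on-prem")
    for ic in healthy:
        if ic.region == origin_region:
            return ic
    # Failover: the local path is down, so use the other region's path.
    return healthy[0]


paths = [Interconnect("ic-west", "us-west1"), Interconnect("ic-east", "us-east1")]
print(pick_egress("us-west1", paths).name)   # ic-west

west_down = [Interconnect("ic-west", "us-west1", healthy=False), paths[1]]
print(pick_egress("us-west1", west_down).name)   # ic-east
```

The model covers only the outbound direction. The return path is decided by the WAN, which is why responses can arrive over the other interconnect.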

Alternative DNS setups

A highly available hub-and-spoke model isn’t an option for all companies, though. 

Some organizations’ IP address space consists of a mixture of address blocks across many locations. This often happens to companies as a result of a merger or acquisition, which can make it difficult to set up a clean geo-based DNS. Let’s look at a different DNS setup and how customers may have to adapt for failures of the DNS stack.

To understand the problem, consider the case of a Google Cloud customer that was managing U.S. East Coast DNS resolvers for East Coast-based VPCs, and U.S. West Coast resolvers for West Coast-based VPCs, in order to reduce latency for DNS queries. The challenge arose when it came time to build out redundancy. Specifically, the customer wanted a third set of resolvers to act as a backup for both the East Coast and West Coast resolvers if either set failed.

Unfortunately, a setup like Figure 1.3 could cause issues in a failure scenario.

Figure 1.3 - Multiple Hub and Spokes With a Single Set Of Backup DNS Resolvers

In this setup, a failure of the West Coast DNS resolvers would cause queries to be forwarded to the backup resolvers running in the central US, with the source IP addresses of these DNS requests falling within Google Cloud’s DNS proxy server address range. But because there are two VPCs, the WAN sees two different routes back to that range, and it would typically route the responses via the closest link advertising the Google Cloud DNS proxy IP range. In this case, that would be the East Coast interconnect. And because the East Coast interconnect is connected to a different VPC than the one that originated the request, the Google Cloud DNS proxies would drop the response, since the Virtual Network ID (VNID) of the return packets would differ from the VNID of the originating VPC. The problem lies with the routing and subnet advertisements, not with the DNS layer itself.
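
To make the failure mode concrete, here is a small simulation of the return path. It is a sketch, not a description of Google’s implementation: the `Link` class, the link and VPC names, and the WAN cost values are all hypothetical, and the VNID check is reduced to comparing VPC names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str       # hypothetical interconnect name
    vpc: str        # VPC that this interconnect attaches to
    wan_cost: int   # assumed WAN cost from the backup resolvers to this link


def return_link(links):
    """Both links advertise the same proxy range, so the WAN picks the cheapest."""
    return min(links, key=lambda link: link.wan_cost)


def response_accepted(origin_vpc, links):
    """A response is accepted only if it re-enters through the originating VPC."""
    return return_link(links).vpc == origin_vpc


links = [
    Link("ic-west", "vpc-west", wan_cost=20),
    Link("ic-east", "vpc-east", wan_cost=10),  # closer to the central backup site
]

print(return_link(links).name)                # ic-east
print(response_accepted("vpc-west", links))   # False: the response is dropped
print(response_accepted("vpc-east", links))   # True
```

In this model the DNS servers and the query itself are never at fault. The outcome depends only on which link the WAN prefers for the shared proxy range.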

So the question becomes: how do you support network topologies with multiple VPCs and multiple sets of DNS resolvers while still providing highly available DNS resolution on-premises?

One approach is to proxy the DNS requests, as shown in Figure 1.4 below. By forwarding all DNS requests to a proxy set up within the VPC (or even within a specific subnet, depending on your desired granularity), the requests reach on-prem with VPC-specific source IP addresses, which makes it easy for the on-prem infrastructure to send its responses back to the correct VPC. This also simplifies on-prem firewall configuration, because you no longer need to open the firewalls to Google’s DNS proxy IP range. And since you can specify multiple IP addresses for DNS forwarding, you can run multiple proxy VMs for additional redundancy and further bolster your availability.
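
As a rough illustration of what such a proxy VM does, the sketch below relays raw DNS queries over UDP and fails over between upstream resolvers. It is a minimal example under stated assumptions, not a production forwarder: the function names are hypothetical, the resolver addresses in the usage comment are placeholders from a documentation range, and TCP, query-ID checks, and concurrency are left out. In practice you would more likely run an established DNS server in forwarding mode.

```python
import socket


def forward_query(query, upstreams, timeout=2.0):
    """Relay a raw DNS query to the first upstream resolver that answers.

    The outbound socket is opened on the proxy VM, so the on-prem resolver
    sees the VM's VPC-specific address as the source of the query.
    """
    for host, port in upstreams:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(query, (host, port))
                reply, _ = sock.recvfrom(4096)
                return reply
            except OSError:
                # Timeout or unreachable resolver: try the next one.
                continue
    raise RuntimeError("no upstream resolver answered")


def serve(listen_addr, upstreams):
    """Accept queries on listen_addr and answer with the upstream reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server:
        server.bind(listen_addr)
        while True:
            query, client = server.recvfrom(4096)
            try:
                server.sendto(forward_query(query, upstreams), client)
            except RuntimeError:
                pass  # Drop the query; the client will retry.


# Example invocation (placeholder addresses, binding port 53 needs privileges):
# serve(("0.0.0.0", 53), [("192.0.2.10", 53), ("192.0.2.20", 53)])
```

Running the same proxy on several VMs and listing each VM’s address as a forwarding target gives the additional proxy redundancy described above.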

Figure 1.4 - Insertion of Proxy VM For HA DNS Configuration

Highly available DNS: the devil is in the details

DNS is a critical capability for any enterprise, but setting up highly available DNS architectures can be complex. It’s easy to build a highly redundant DNS stack that can handle many failure scenarios, but overlook the underlying routing until something fails and DNS queries are unable to resolve. When designing a DNS architecture for a hybrid environment, be sure to take a deep look at your underlying infrastructure, and think through how failure scenarios will impact DNS query resolution. 

