No Swimming for 30 Minutes After Azure DNS Changes

The Windows Azure team recently published Windows Azure Domain Name System Improvements.  If you only saw this on Twitter, or just took a cursory glance at the announcement, you probably thought, “Good.  Azure’s DNS will be faster,” and moved on to something else.  It’s important to note, however, that the announcement also includes some gotchas.

DNS Entry Propagation

Although DNS entry propagation delays are not new, you’ll need to keep the following issues in mind regarding Azure’s DNS infrastructure:

  1. New DNS entries may take up to 2 minutes to propagate through all of Azure’s global DNS system.
  2. Systems that cache responses to DNS queries (such as web browsers and the Windows DNS Client) typically cache those responses for about 15 minutes.

So, What’s the Problem?

These issues can add up to significant pain when you are launching a new Azure application, service, etc.  The end users of the application or service may receive errors due to these timing issues, and first impressions are often lasting impressions.

The problem caused by the first item above is the more obvious one: when you create a new entry in Azure’s DNS, you should wait at least two minutes before expecting DNS queries to receive the right answer.  Any DNS query made inside this two-minute window may encounter an “entry not found,” “DNS Not Found” or similar error.  When this error occurs, it can be compounded by the second problem…

Most systems that query DNS (e.g., Windows’ DNS Client) also cache the response for a period of time, typically 15 minutes.  If a system queries DNS inside the two-minute window and caches the resulting error, subsequent queries for that entry will typically be answered from the cache, and so will return the error.  If you try to use the name in your web browser, for example, it may take as much as 15 minutes until you stop getting the cached error and actually get the right answer.

You can manually flush the local DNS cache on Windows systems using ipconfig /flushdns, but be aware that this method does not impact other caching systems along the DNS resolution route.
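If you find yourself flushing the cache repeatedly while testing, the command can be wrapped in a small script.  The following is only a sketch: the function name flush_dns_cache and its system parameter are made up for illustration, and the only real command involved is ipconfig /flushdns.

```python
import platform
import subprocess

# The command named above; it only exists on Windows.
FLUSH_COMMAND = ["ipconfig", "/flushdns"]


def flush_dns_cache(system: str = "") -> bool:
    """Flush the local DNS cache on Windows; return True if the command ran.

    `system` is a hypothetical override for the detected platform name,
    included so the guard can be exercised on other operating systems.
    """
    system = system or platform.system()
    if system != "Windows":
        # Do nothing elsewhere rather than guess at another OS's command.
        return False
    result = subprocess.run(FLUSH_COMMAND, capture_output=True, text=True)
    return result.returncode == 0
```

As noted above, this clears only the cache on the machine where it runs; caches elsewhere along the resolution route are unaffected.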

Theoretically, an additional 15 minutes could be incurred if the route for DNS resolution changes between queries, but this scenario is very rare.
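To make the arithmetic concrete, here is a rough back-of-the-envelope sketch using the figures quoted above.  The function name and the idea of counting "cache periods" are illustrative simplifications, not anything taken from the Azure announcement.

```python
# Figures quoted in the text above; all values are in minutes.
PROPAGATION_MINUTES = 2   # time for a new entry to reach all of Azure's DNS
CACHE_MINUTES = 15        # typical lifetime of a cached DNS response


def worst_case_wait(cache_periods: int = 1) -> int:
    """Rough upper bound, in minutes, before a cached error clears.

    Assumes (as a simplification) that the error is cached at the very
    end of the propagation window and then held for `cache_periods`
    full caching periods, one after another.
    """
    return PROPAGATION_MINUTES + cache_periods * CACHE_MINUTES


typical = worst_case_wait(1)            # 2 + 15
with_route_change = worst_case_wait(2)  # 2 + 15 + 15

print(typical, with_route_change)  # prints "17 32"
```

Under these assumptions the usual worst case is about 17 minutes, and the rare route-change case pushes it to roughly half an hour.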

Risk Reduction

The best practice for reducing the risk of these errors occurring – and the resulting negative impressions – is to get your DNS house in order in advance.  Simply registering the appropriate names (DNS entries) well in advance of publishing the application or system should eliminate these issues.  Early registration also gives your team an opportunity to verify and test the name resolutions ahead of time.
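One way to verify name resolution ahead of time is to poll the name from a test machine until it resolves.  The sketch below uses Python’s standard socket module; the function names, the 30-minute timeout and the one-minute polling interval are assumptions chosen for illustration.  Keep in mind that it goes through the operating system’s resolver, so it reflects only what this one machine (and its caches) can see.

```python
import socket
import time


def resolves(hostname: str) -> bool:
    """Return True if the local resolver can currently resolve hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def wait_until_resolves(hostname, timeout_s=1800, interval_s=60,
                        check=resolves, sleep=time.sleep):
    """Poll until hostname resolves or timeout_s has been slept away.

    `check` and `sleep` are injectable so the loop can be tried out
    without real DNS queries or real waiting.
    """
    waited = 0
    while True:
        if check(hostname):
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

For example, wait_until_resolves("myapp.example.com") would check once a minute for up to 30 minutes; the hostname here is a placeholder, not a real service.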

In a situation in which we were forced to “live on the edge” and quick-publish software, we would wait at least 30 minutes before announcing that the software was ready to access.