
Troubleshooting App Service failed VNET integration and outbound connectivity issues

Overview:

A significant part of a website’s functionality often involves outbound connectivity to dependencies such as databases and APIs. Azure App Service has default outbound connectivity to the public Internet through its pool of outbound IPs, and it can integrate with a VNET to reach a private network, including on-premises resources.

Two options for VNET integration in multi-tenant App Service currently exist:

  • Point-to-site gateway required VNET integration
  • Regional VNET integration

I will not discuss these options in great detail here, but instead focus on how to troubleshoot general outbound connectivity issues to both public and private endpoints.

Troubleshooting:

Follow these steps to effectively isolate and troubleshoot a network connectivity problem in an App Service:

  1. Validate hostname resolution.
    • From the Web App’s console, execute the command NAMERESOLVER against the target endpoint’s hostname and verify that it resolves to the expected IP.
      • Ex: (screenshot of NAMERESOLVER output using the default DNS server, not preserved)
    • The target indicated after Server: is the default DNS server which is used for the lookup, and the output is the result of the resolution. In this example the default DNS server successfully resolves the hostname.
    • In most private connectivity scenarios, a private DNS server will be used for hostname resolution. If the output says Server: Default, then the private DNS server is not being used as the default and must be properly configured.
    • There are two ways to configure a custom DNS server on a Web App: set the App Setting WEBSITE_DNS_SERVER to the IP address of the custom DNS server, or integrate with a VNET and define a custom DNS server on the VNET.
    • In the following example my app attempts to resolve the same hostname against a custom DNS Server and fails because the DNS server itself is unreachable (because it is a fake IP).
      • Ex: (screenshot of NAMERESOLVER failing against an unreachable custom DNS server, not preserved)
    • Your Web App must be able to reach its custom DNS server on port 53 to resolve DNS. If you attempt to TCPPING the custom DNS server on port 53 and it is unreachable, then it is not currently being used for DNS resolution and connectivity against those hostnames will fail.
      • Ex: (screenshot of TCPPING against the custom DNS server on port 53, not preserved)
    • Now that we have tested and confirmed that our DNS resolution is working properly, we can move on to testing raw network connectivity.
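When the Web App’s console is not at hand, the resolution check in step 1 can be approximated from application code. The following is a minimal Python sketch under stated assumptions, not a replacement for NAMERESOLVER: it queries whatever resolver the runtime is already configured to use and does not report which DNS server answered. The helper names resolve_ipv4 and resolves_to are hypothetical and introduced here only for illustration.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses for hostname.

    Uses the resolver the runtime is configured with (an assumption:
    this sketch cannot tell you which DNS server produced the answer).
    An empty list means resolution failed.
    """
    try:
        infos = socket.getaddrinfo(
            hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def resolves_to(hostname, expected_ip):
    """True if hostname resolves to expected_ip (hypothetical helper)."""
    return expected_ip in resolve_ipv4(hostname)
```

A call such as resolves_to("mydb.example.internal", "10.0.0.4") would then answer the question posed in step 1; both the hostname and the IP here are placeholders, not values from this article.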
  2. Test connectivity against the endpoint.
    • From the Web App’s console, execute the command TCPPING against the target endpoint on the specific port that the service runs on (1433 for SQL, 443 for HTTPS, etc.).
    • A successful result will show the time taken to receive a response from the target endpoint:
      • Ex: (screenshot of a successful TCPPING result, not preserved)
    • A failure will show an error message in the output, either a timeout, server reset connection, or similar problem. It is important to observe and consider the exact error message seen when running this test, and research in what scenarios this error message occurs.
      • Ex: (screenshot of a failed TCPPING result, not preserved)
    • It is important to understand that the output here is not the final answer on connectivity. Always test your application and observe the specific error it encounters when trying to reach an endpoint. For example, there are many different reasons connectivity against a SQL database could fail that are not specifically TCP failures.
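The TCP test in step 2 can also be sketched in application code, which is handy when you want the check to run from the same process as the failing app. This is a rough Python approximation of what TCPPING does (one TCP connection attempt, timed), not the tool’s actual implementation. The function name tcp_ping, its return shape, and the 5-second default timeout are assumptions made for this example.

```python
import socket
import time


def tcp_ping(host, port, timeout=5.0):
    """Attempt one TCP connection and return (ok, detail).

    detail carries the elapsed time on success, or a short description
    of the failure so the exact error can be researched, as step 2 advises.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.monotonic() - start) * 1000
            return True, f"connected in {elapsed_ms:.0f} ms"
    except socket.timeout:
        return False, "timed out"
    except ConnectionRefusedError:
        return False, "connection refused"
    except socket.gaierror as exc:
        return False, f"DNS resolution failed: {exc}"
    except OSError as exc:
        # Any other socket-level failure, e.g. network unreachable.
        return False, f"connection error: {exc}"
```

As with TCPPING, a successful connect only proves that the port is reachable. It says nothing about authentication, TLS, or application-level failures.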
  3. Pause a moment and consider what the data suggests about the problem. Ask yourself these questions:
    1. Are we expecting this connectivity to occur across the public Internet or the private VNET?
    2. Is this the only endpoint that is failing? Are there other endpoints on the public Internet or within the private VNET where we are able to successfully establish a connection? If other private endpoints are reachable, then we can conclude that the VNET integration itself is successful, but this specific endpoint is unreachable for another reason.
    3. Are other apps connecting successfully, or are all failing with this identical setup?
    • Understanding this will help us to isolate where changes must be made to resolve the issue.
    • If the issue is isolated to a single endpoint within a VNET and other endpoints work successfully, focus investigation on any firewall or route that could be impacting that specific target.
    • If all connections are failing with the same error message, this indicates a problem with the VNET integration itself, or other factors in the overall networking infrastructure.
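The reasoning in step 3 can be written down as a small decision function. The sketch below is only an illustration of that logic in Python. The function name triage, the shape of its input, and the wording of its return values are all hypothetical, and the two branches for "fewer than two endpoints" and "differing errors" are my own assumption about how to handle cases the steps above do not spell out.

```python
def triage(results):
    """Suggest where to look next, given per-endpoint test results.

    results maps an endpoint label to (ok, error), where error is None
    on success and an error string otherwise.
    """
    if len(results) < 2:
        # Assumption: one data point cannot isolate the problem.
        return "inconclusive: test more endpoints before drawing conclusions"
    failures = {name: err for name, (ok, err) in results.items() if not ok}
    if not failures:
        return "no failures observed"
    if len(failures) < len(results):
        # Some endpoints work, so the integration itself is functioning.
        return ("endpoint-specific: check firewalls and routes for "
                + ", ".join(sorted(failures)))
    if len(set(failures.values())) == 1:
        return ("all failing with the same error: suspect the VNET "
                "integration or the overall networking infrastructure")
    # Assumption: differing errors are treated as separate problems.
    return "all failing with differing errors: investigate each endpoint"
```

For example, feeding it one failing SQL endpoint and one working API endpoint inside the same VNET returns the endpoint-specific suggestion, which matches the guidance to focus on firewalls and routes for that target.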
  4. If the VNET integration appears to have failed, your mitigation depends on what type of VNET integration you are using.
    • Point-to-site gateway required VNET integration:
      • Try syncing certificates under the App Service Plan > Networking blade in the Azure portal.
      • Disconnect, delete the AppServiceCert seen in the VPN Gateway under P2S Settings, and then reconnect to regenerate the integration cert. Note that this may cause interruption of any other integrated apps. Take these steps at your own risk and understand the consequences.
      • Reset the VPN Gateway. This will take ~15 minutes to complete and can interrupt connectivity of other services using the gateway.
    • Regional VNET integration:
      • Try scaling the App Service up or down to move to a new set of worker machines where VNET integration will re-occur, and test again. This is a good test to determine if something has failed with the initial VNET integration process.
      • Disconnect and reconnect to the same subnet.
      • Disconnect, delete the previous integration subnet, create a new subnet with a different name, integrate on that subnet, and test again. This will force the integration process to occur from scratch.
  5. If the VNET integration itself is not the source of the problem, no amount of troubleshooting there will help to fix the problem. Focus investigation on the target endpoint itself and any firewall that could be interfering with the connection.

Conclusion:

The most important step to solving a network connectivity problem is to understand and isolate the issue. Is the failure with the client’s VNET integration itself, or on the server side of the endpoint?

Read more about App Service networking commands here: https://blogs.msdn.microsoft.com/waws/2017/07/24/networking-related-commands-for-azure-app-services/

3 replies on “Troubleshooting App Service failed VNET integration and outbound connectivity issues”

After two days of trying to troubleshoot Azure Devops and its tools indicating SNAT exhaustion issues, I read this article and it saved my life! Thank you! It was DNS problem!
