vCHS – Implementing Microsoft WSFC and SQL 2012 AlwaysOn Availability Groups in Public Clouds

Amazon Web Services recently released a white paper describing how to deploy a highly available SQL 2012 instance in the AWS cloud. It is quite interesting reading, as AWS is not usually associated with running traditional enterprise applications that are not designed for failure. This is mainly because AWS does not offer infrastructure-level high availability, which is common in clouds based on vSphere, where vSphere HA is available by default.

Microsoft SQL Server 2012 AlwaysOn Availability Groups are built on top of Microsoft Windows Server Failover Clustering (WSFC) but do not require any shared block storage or storage replication, and are therefore good candidates for deployment into public clouds that do not offer such highly available infrastructure. While reading the white paper I wondered what the experience would be, and what would differ, when deploying such a setup into a vCloud-type public cloud. As I am currently testing VMware vCloud Hybrid Service (vCHS), I deployed a SQL 2012 cluster into it just for fun.

Business Drivers

Before I describe the exercise, let’s talk about the reasons for doing this. We want to deploy into a public cloud an application that requires a SQL database with higher availability than the cloud provider offers. vCloud clouds with vSphere High Availability usually provide 99.9% availability, but that might not be enough if higher availability is required or if we want geo-protection. vSphere HA also does not help if we want to do rolling patch updates of the underlying Microsoft OS (every Patch Tuesday).
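To put the 99.9% figure into perspective, here is a rough back-of-the-envelope sketch. It assumes that the two zones fail independently and that one surviving replica is enough to keep the database available; it ignores failover time and any shared dependencies, so treat the numbers as an optimistic illustration rather than a promise. The function names are my own.

```python
HOURS_PER_YEAR = 365 * 24  # ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Expected yearly downtime for an availability given as a fraction."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(availability: float, replicas: int) -> float:
    """Availability of N replicas when one surviving replica is enough.

    Assumes independent failures, which real clouds only approximate.
    """
    return 1.0 - (1.0 - availability) ** replicas


single = 0.999  # the 99.9% figure quoted for a single vCloud zone
pair = combined_availability(single, 2)

print(f"single zone: {downtime_hours_per_year(single):.2f} hours/year")
print(f"two zones:   {downtime_hours_per_year(pair) * 3600:.1f} seconds/year")
```

Under these assumptions a single zone is down for several hours a year, while two independent zones are down together for well under a minute. Real failover takes time, so the practical gain is smaller.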

We can deploy multiple WSFC cluster nodes with SQL DB running on each one into multiple availability zones or multiple regions of one vCloud provider or even to clouds provided by different vCloud providers.

Architecture

In short, SQL AlwaysOn Availability Groups are logical containers of databases and the unit of failover. Each set of availability databases is hosted by an availability replica; there is always one read-write primary replica and one or more read-only secondary replicas ready for failover. The primary replica sends transaction logs to the secondary replicas, so there is no need for shared storage. AlwaysOn Availability Groups run on top of a WSFC cluster, which takes care of resource monitoring, quorum, and failover of the resources in case of node failure.

I closely followed the design from the whitepaper mentioned above. Using two cloud availability zones connected by VPN (provided by the Org VDC Edge Gateways), I deployed a two-node cluster with the nodes in different subnets. As all the nodes need to be in the same Active Directory domain, I also set up a replicated domain controller/DNS server in each cloud. As we have an even number of cluster nodes, it is also necessary to provide a file share witness. For simplicity I used a share hosted on one of the domain controllers, but this might not be a good strategy for production deployments: failure of the availability zone where both a node and the witness file share are running would render the cluster unusable, because the surviving node would not be able to establish a majority and the failover would have to be forced manually. In production we would deploy the witness file share or a third node into a third availability zone (such as an on-premises data center).
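The witness placement problem can be illustrated with a small vote-counting sketch. This is a simplified model of majority quorum, not how WSFC implements it internally; the node and zone names are hypothetical.

```python
from typing import Dict


def has_quorum(votes_alive: int, votes_total: int) -> bool:
    """A partition keeps the cluster running only with a strict majority."""
    return votes_alive > votes_total // 2


def survives_zone_loss(placement: Dict[str, str], failed_zone: str) -> bool:
    """Check whether the voters outside the failed zone still form a majority."""
    alive = sum(1 for zone in placement.values() if zone != failed_zone)
    return has_quorum(alive, len(placement))


# My test setup: the witness share lives in the same zone as one of the nodes.
test_setup = {"node-a": "zone-a", "node-b": "zone-b", "witness": "zone-a"}

# Production-style setup: the witness sits in a third location.
production = {"node-a": "zone-a", "node-b": "zone-b", "witness": "zone-c"}

for name, placement in (("test", test_setup), ("production", production)):
    for zone in sorted(set(placement.values())):
        ok = survives_zone_loss(placement, zone)
        print(f"{name}: losing {zone} -> quorum kept: {ok}")
```

With the witness co-located in zone-a, losing that zone leaves one vote out of three, which is why the failover would have to be forced manually. With the witness in a third location, any single zone can fail and two votes remain.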

I have not included any application logic; all my testing was done from a simple on-premises test application connected through ODBC.

[Figure: vCHS WSFC SQL AlwaysOn]

Deployment Steps

Here are the high-level deployment steps (more detail on the WSFC and SQL configuration is in the Amazon whitepaper).

  1. Create OrgVDC networks in both clouds.
    [Screenshot: OrgVDC networks]
  2. On the Edge Gateways, establish a Virtual Private Network between the clouds.
    [Screenshot: VPN]
  3. On the Edge Gateways, create firewall rules in both clouds to allow communication between all networks.
    [Screenshot: Firewall Rules]
    As can be seen from the screenshot, I did not go very granular, for the sake of simplicity and out of laziness.
  4. Deploy VMs (Windows 2012) for the domain controllers and create the domain. Also create a DNAT rule on the Edge Gateways to make the domain controllers reachable from the internet (RDP: TCP port 3389) so they can act as jump servers. Optionally create an SNAT rule to be able to reach the internet (for updates, downloads, etc.).
    [Screenshot: Domain Controller vApp]
    The screenshot below also shows the DNAT rule for the SQL Availability Group listener (port 5023).
    [Screenshot: NAT Rules]
    I used the IP addresses 10.9.2.101 and 10.10.2.101 for the Windows Node VMs, 10.9.2.102 and 10.10.2.102 for the WSFC cluster nodes, and 10.9.2.103 and 10.10.2.103 for the availability group listener.
  5. Create the (witness) file share on the first domain controller.
  6. Deploy the cluster nodes, add them to the domain, and create the cluster with Node and File Share Majority quorum.
    [Screenshot: WSFC]
  7. Install SQL 2012 on both nodes and enable AlwaysOn Availability Groups in SQL Server Configuration Manager.
    [Screenshot: SQL AlwaysOn Availability]
  8. On one of the nodes, use SQL Server Management Studio to create a test SQL database in the full recovery model.
  9. Still in SQL Server Management Studio, create the Availability Group. For the initial database synchronization (full backup and restore) we need another share (Replica), which I also created on the first domain controller. We also set up the listener.
    [Screenshot: Availability Group Configuration]
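Because the listener needs one IP address in each subnet, it can help to sanity-check the address plan from step 4 before creating it. The sketch below uses Python's standard ipaddress module. The /24 prefixes and the cloud-1/cloud-2 labels are my assumptions, as the steps above only list host addresses.

```python
import ipaddress

# Assumed subnets: the /24 prefix length and the labels are illustrative only.
SUBNETS = {
    "cloud-1": ipaddress.ip_network("10.9.2.0/24"),
    "cloud-2": ipaddress.ip_network("10.10.2.0/24"),
}

# Listener addresses taken from step 4.
LISTENER_IPS = ["10.9.2.103", "10.10.2.103"]


def subnet_of(address: str) -> str:
    """Return the label of the subnet that contains the given address."""
    ip = ipaddress.ip_address(address)
    for name, network in SUBNETS.items():
        if ip in network:
            return name
    raise ValueError(f"{address} is outside the known subnets")


def covers_every_subnet(addresses) -> bool:
    """True when the addresses include at least one IP in each subnet."""
    return {subnet_of(a) for a in addresses} == set(SUBNETS)


print(covers_every_subnet(LISTENER_IPS))
```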

As mentioned in step 4, I created DNAT rules for the listener IP addresses in each cloud and thus published the database to the internet. On the client PC I installed the SQL ODBC 11 driver, which supports AlwaysOn Availability Groups, and set up the database connection. As my client was not in the same domain, I had to create DNS records for the listener IPs and connect with SQL authentication. As the listener IPs are in different subnets, multi-subnet failover must be enabled in the driver; the driver then tries to open connections to all listener IP addresses in parallel, which results in much faster failover. In my tests (I tried node failures and a split-brain scenario) failover took about 10 seconds.

[Screenshot: Multi Subnet Failover]

[Screenshot: ODBC configuration]
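For clients that connect with a connection string instead of a configured DSN, the same settings can be expressed with the MultiSubnetFailover keyword. Below is a minimal sketch that only builds the string. The driver name "ODBC Driver 11 for SQL Server" is my assumption about the installed driver, and the host name, database, and credentials are placeholders. Values containing characters such as ";" would need escaping, which this sketch does not handle.

```python
def build_connection_string(listener: str, port: int, database: str,
                            user: str, password: str) -> str:
    """Build an ODBC connection string for an availability group listener."""
    parts = {
        # Assumed driver name; adjust to whatever is installed on the client.
        "Driver": "{ODBC Driver 11 for SQL Server}",
        "Server": f"tcp:{listener},{port}",
        "Database": database,
        # SQL authentication, because the client is outside the domain.
        "UID": user,
        "PWD": password,
        # Ask the driver to try all listener IP addresses in parallel.
        "MultiSubnetFailover": "Yes",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Placeholder values; 5023 is the listener port from the DNAT rule in step 4.
conn_str = build_connection_string(
    "aglistener.example.com", 5023, "TestDB", "test_user", "change-me"
)
print(conn_str)
```

The resulting string could then be handed to an ODBC client library, for example pyodbc.connect(conn_str), if pyodbc is installed.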

That is it. I recommend reading the Microsoft whitepaper for more information about SQL high availability.

In the Amazon whitepaper, the cloud infrastructure creation and configuration steps (networks, NAT instances, security groups, IP assignments) were done with (fairly complex) CloudFormation templates. I am not aware of any similar tool for vCloud that would use the vCloud API to capture a snapshot of the Org VDC infrastructure that could be reused to quickly redeploy to another Org VDC, but it did not take me much time to create it through the vCloud GUI.
