Dynamically Scaling Windows Azure - Sample Application


Grzegorz Gogolowicz, Trent Swanson

Introduction

A key benefit of cloud computing is the ability to add and remove capacity as and when it is required. This is often referred to as elastic scale. Being able to add and remove this capacity can dramatically reduce the total cost of ownership for certain types of applications; indeed, for some applications cloud computing is the only economically feasible solution to the problem.

The Windows Azure Platform supports the concept of elastic scale through a pricing model that is based on hourly compute increments. By changing the service configuration, either by editing it in the portal or through the Management API, customers can adjust the amount of capacity they are running on the fly.

A commonly asked question is how to automate the scaling of a Windows Azure application. That is, how can developers build systems that adjust scale based on factors such as the time of day or the load that the application is receiving? The goal of the “Dynamic Scaling Windows Azure” sample is to show one way to implement an automated solution to handle elastic scaling in Azure. Scaling can be as simple as running X servers during weekdays and Y servers during weekends, or it can be very complex and based on several different rules. By automating the process, less time needs to be spent checking logs and performance data manually, and there is no need for an administrator to intervene in order to add and remove capacity. An automated approach also keeps the instance count at a minimum while still coping with the current load: it allows the application to run with a smaller amount of ‘headroom’ capacity and therefore at a lower cost. The sample scales based on three different variables, but the framework has been designed to be flexible in terms of future requirements.

In this paper we will discuss the concept of dealing with variable load through rule based scaling and then explain in detail the architecture, implementation and use of the provided sample code. We finish by discussing potential extensions to the framework to support other scenarios.

Types of Variable Load

The load placed on most applications is quite stochastic; however, at a higher level most applications will display some broad trends in load. Cloud computing is particularly well suited to the following types of load pattern.

Dealing with Variable Load

Dealing with variable load in a Windows Azure application will take two broad forms:

Maintaining Headroom
In systems that have highly volatile load across short time periods it will generally be necessary to run additional capacity at all times. Because it takes a period of time for a Windows Azure instance to start up, and because instances are charged by the hour, any load variance, up and then down, that occurs within each hour is best handled by running enough instances to handle the hourly peak load. We shall refer to this capacity as headroom.
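As a rough illustration of sizing headroom, the following sketch computes how many instances are needed to absorb an hourly peak. The function name, the load figures and the per-instance capacity are all hypothetical values chosen for the example; a real application would have to measure its own capacity per instance.

```python
import math


def instances_for_peak(peak_requests_per_sec, capacity_per_instance, minimum=1):
    """Return the instance count needed to absorb the hourly peak load."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(peak_requests_per_sec / capacity_per_instance)
    return max(minimum, needed)


# Load samples (requests/second) observed within one hour; the peak, not the
# average, decides how much headroom has to be running.
samples = [40, 95, 180, 120]
print(instances_for_peak(max(samples), capacity_per_instance=50))
```

Sizing on the peak rather than the average reflects the point made above: instances cannot be started quickly enough to follow variance within the hour.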

Adding and removing instances
When load exceeds the available headroom it will be necessary to add instances to meet demand. In Windows Azure this is achieved by changing the instance count in the service configuration. Increasing the instance count will cause Windows Azure to start new instances; decreasing the instance count will in turn cause it to shut instances down.

Windows Azure supports the concept of Small, Medium, Large and Extra Large instances. These have 1, 2, 4 and 8 processor cores respectively along with correspondingly increased amounts of RAM and local storage. While some applications may benefit from the use of larger sized Azure instances to improve throughput they are less useful for achieving elastic scale due to the fact that a change in VM size currently requires a full redeploy of the service package.

Rule Based Scaling

Decisions around when to scale and by how much will be driven by some set of rules. Whether these are implicit and codified in some system or simply stored in the head of a systems engineer they are rules nonetheless. In this section we discuss those rules and the metrics that drive them.

Rules for scaling a Windows Azure application will typically take one of two forms. The first is a rule that adds or removes capacity at a given time, for example: run 20 instances between 8am and 10am each day and 2 instances at all other times. The second form is a rule that responds to metrics, and this form warrants more discussion.
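The time-based form of rule can be sketched in a few lines. This is a minimal illustration of the "20 instances between 8am and 10am, 2 otherwise" example and not code from the sample; the function and parameter names are assumptions made for the sketch.

```python
from datetime import datetime, time


def scheduled_instances(now, peak_start=time(8, 0), peak_end=time(10, 0),
                        peak_count=20, off_peak_count=2):
    """Return the instance count a purely time-based rule asks for."""
    # The window is half-open: 08:00 is inside it, 10:00 is not.
    if peak_start <= now.time() < peak_end:
        return peak_count
    return off_peak_count


print(scheduled_instances(datetime(2010, 3, 1, 9, 30)))  # inside the window
print(scheduled_instances(datetime(2010, 3, 1, 14, 0)))  # outside the window
```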

Types of Load Metrics

A metric based rule involves monitoring aspects of the system that can change over time and making decisions to scale up or down on that basis.  The following sets out some of the broad categories of metrics that may be considered.

Gathering Metrics

Metrics will need to be collected from a number of different sources. Some metrics will require post-processing or analysis such as smoothing or derivation. Many metrics will be collected through the Windows Azure diagnostics mechanism (http://msdn.microsoft.com/en-us/library/ee758705.aspx), which supports the collection of metrics such as performance counters, logs and crash reports.
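Smoothing is one example of the post-processing mentioned above. The sketch below applies a simple moving average so that a single spike does not trigger a scaling decision. It is only one of many possible smoothing techniques, and the class name and window size are assumptions made for the illustration.

```python
from collections import deque


class MovingAverage:
    """Smooth a noisy metric by averaging the most recent samples."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        # A bounded deque silently discards the oldest sample when full.
        self._samples = deque(maxlen=window)

    def add(self, value):
        """Record a new sample and return the current smoothed value."""
        self._samples.append(value)
        return sum(self._samples) / len(self._samples)


raw = [10, 50, 10, 50]
average = MovingAverage(window=2)
print([average.add(value) for value in raw])
```

A larger window reacts more slowly to genuine changes in load, so the window size is itself something that needs tuning.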

When using diagnostics for the purposes of auto-scaling it is important to remember the performance impact that is incurred in collecting each data point. While it may be appealing to collect large amounts of performance data from each instance in order to guide elastic scaling, each data point carries a cost of collection. It is beyond the scope of this article to discuss this aspect in more detail, but additional information can be found here: http://technet.microsoft.com/en-us/library/cc938553.aspx

Some desirable metrics remain difficult to collect. For example it is not yet possible to collect billing information via an API call and therefore collecting a metric of total spend month to date will have to be done through monitoring the number of instances and using this to derive a cost value.
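A derived cost metric of this kind might be approximated as follows. The sketch assumes that the monitoring data has been reduced to a list of (hours running, instance count) observations and that partial hours are charged as full hours. The hourly rate used here is a made-up figure, not an actual Windows Azure price.

```python
import math


def estimate_spend(observations, hourly_rate):
    """Estimate spend from (hours_running, instance_count) observations.

    Partial hours are rounded up, on the assumption that compute is
    charged in whole-hour increments.
    """
    instance_hours = sum(math.ceil(hours) * count for hours, count in observations)
    return instance_hours * hourly_rate


# Hypothetical month-to-date history: 10 instances for 24 hours,
# 20 instances for 8 hours and 4 instances for half an hour.
month_to_date = [(24, 10), (8, 20), (0.5, 4)]
print(estimate_spend(month_to_date, hourly_rate=0.25))
```

The result is only an estimate: it ignores storage, bandwidth and any other charges that appear on the bill.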

Evaluating Business Rules

As data is collected it will be subjected to the evaluation of business rules to determine a proposed course of action. These rules may have various levels of complexity. In many cases it will be simple enough to codify rules; for example when a metric crosses a given threshold.

Examples of rules that might be evaluated include the following:

“Run 10 instances minimum and 20 instances maximum”
“If the response time average over 30 seconds exceeds 1000ms then add additional instances”
“If the response time average over 30 seconds drops under 500ms then remove instances”

“Follow these rules until monthly spend exceeds $4000 at which point run the minimum instance count only and the administrator shall be sent an Instant Message via Office Communications Server”
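The example rules above could be codified along the following lines. This is a sketch under stated assumptions: the function name, the step size of one instance and the returned notification flag are illustrative choices, and the thresholds are simply the figures quoted in the examples.

```python
def propose_instance_count(current, avg_response_ms, month_spend,
                           minimum=10, maximum=20, budget=4000, step=1):
    """Return (proposed_count, notify_admin) for one rule evaluation."""
    # Budget rule: once the monthly budget is exceeded, fall back to the
    # minimum and signal that the administrator should be notified.
    if month_spend > budget:
        return minimum, True

    proposed = current
    if avg_response_ms > 1000:
        proposed = current + step   # too slow: add capacity
    elif avg_response_ms < 500:
        proposed = current - step   # plenty of slack: remove capacity

    # Min/max rule: never leave the allowed range.
    return max(minimum, min(maximum, proposed)), False


print(propose_instance_count(current=12, avg_response_ms=1200, month_spend=1000))
print(propose_instance_count(current=15, avg_response_ms=1200, month_spend=4500))
```

The gap between the 500ms and 1000ms thresholds acts as a dead band, which helps to stop the instance count from oscillating when the response time hovers around a single threshold.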

In some cases it may be useful to implement a more sophisticated rules engine; for example, there may be a need to chain rules. One suitable approach may be to use the Windows Workflow Foundation rules engine. The article at http://msdn.microsoft.com/en-us/library/dd349784.aspx provides a good primer on using the WF rules engine for rule processing in .NET applications.

Taking Action

Taking action to scale Windows Azure is exposed through the use of the Management API. Through a call to this API it is possible to change the instance count in the service configuration and in doing so to change the number of running instances.

When adding and removing instances it is important to remember that the Management API is an asynchronous API. This means that once a change has been requested, an application will need to poll the service to determine if and when that change has taken effect. Applications will need to ensure that rules are cognizant of the time delay in adding more capacity, and that subsequent evaluation of rules while new instances are being started does not result in significant excess capacity being added.
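One way to respect this asynchronous behaviour is to request a change and then poll until it completes, evaluating no further rules in the meantime. In the sketch below, request_change and get_status are hypothetical callables standing in for Management API calls. The status strings "InProgress", "Succeeded" and "Failed" are assumptions about what the status call returns, and the polling interval and timeout are arbitrary.

```python
import time


def apply_change_and_wait(request_change, get_status, poll_seconds=30,
                          timeout_seconds=900, sleep=time.sleep):
    """Request a scaling change, then poll until it finishes or times out."""
    request_change()
    waited = 0
    while waited < timeout_seconds:
        status = get_status()
        if status != "InProgress":
            return status          # assumed to be "Succeeded" or "Failed"
        sleep(poll_seconds)
        waited += poll_seconds
    return "TimedOut"


# Stubbed demonstration: the change completes on the third poll. The sleep
# function is replaced so that the example runs instantly.
statuses = iter(["InProgress", "InProgress", "Succeeded"])
result = apply_change_and_wait(lambda: None, lambda: next(statuses),
                               sleep=lambda seconds: None)
print(result)
```

Because the call blocks until the change has completed, rules cannot be re-evaluated while instances are still starting, which avoids stacking several scale-up requests on top of one another.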

Other actions may include messaging other systems or administrators. For example a rule may specify that an email is sent to an administrator any time the total cost per hour exceeds $100.

Auto Scaling Sample Application

The sample as shipped demonstrates an implementation of a simple automatic elastic scaling engine for Windows Azure Compute instances. It is based around two simple business rules. It assumes that the business needs to be able to set a minimum as well as a maximum number of instances to be running at any given time. It also assumes that these threshold values may differ between weekdays and weekends. The default configuration sets the limits for weekends lower than for weekdays, the sort of load pattern you might expect of an application targeting business users. This can be changed quite easily to target workloads that are weighted towards the weekends, a fantasy football application for example.

Subject to the thresholds set by day of week, the application bases its scaling decisions on the collection of metrics. Specifically, it tracks the current length of the processing queue and the current number of requests per second to the web service that delivers up the application. If the queue length continues to grow over a specific time period, a new worker role instance is started. Correspondingly, if the number of requests per second crosses a threshold, a new web role instance is started.
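The two metric checks described above can be sketched as follows. This is an illustration of the idea and not the sample's actual code; the function names, the number of periods and the threshold value are assumptions.

```python
def queue_keeps_growing(lengths, periods=3):
    """Return True if the queue length rose in each of the last `periods` intervals."""
    if len(lengths) < periods + 1:
        return False               # not enough history to judge a trend
    recent = lengths[-(periods + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def requests_exceed(requests_per_sec, threshold):
    """Return True if the web role's request rate has crossed the threshold."""
    return requests_per_sec > threshold


print(queue_keeps_growing([5, 8, 12, 20]))   # sustained growth
print(queue_keeps_growing([5, 8, 7, 20]))    # growth was interrupted
print(requests_exceed(3.5, threshold=3))
```

Requiring growth across several consecutive samples, rather than reacting to a single long queue reading, makes the worker role rule less sensitive to short bursts.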

Design Overview

Architecture

The Azure Load Scaling sample consists of a number of components.

Loyalty Management is a Windows Azure service containing a web role and a worker role. In this sample, it represents a service that needs dynamic scaling to handle the load being put on it. The web role exposes a WCF service that adds items to a queue. The worker role picks items off the queue and processes them. The sample application will scale the instance count for the two roles based on factors such as the number of messages in the queue and the day of the week.

Load Client is a client application (WPF) that calls the WCF service hosted by the Loyalty Management web role. It can be configured to call the Loyalty Management service up to 4 times per second. The Load Client is a simple test harness designed to simulate various levels of load on the application.

Scaling Engine is the part of the sample that is responsible for enforcing the scaling rules. It is built to be hostable both on-premise and in the cloud. For the sample, it has been built as a console application running on-premise. It gathers metrics, such as the number of queued work items and the number of web requests per second, from the Loyalty Management application. Using these metrics, it determines whether the application needs to scale the number of Azure instances up or down. If a need to scale is determined, it calls the Azure Service Management API to start this action. While the sample locates this logic in a client application, it would also be possible to locate it in a Windows Azure worker role, because the scaling engine stores all its data in Azure storage.

Health Monitoring is a web application that serves two purposes. It serves up a Silverlight client to the user and then exposes the metrics gathered by the Scaling Engine to this Silverlight client via a WCF service. It can be hosted either on-premise or as an Azure application. The Silverlight dashboard displays metric data, pending scaling actions and the number of instances currently running to the user in an easily understandable way.

Solution Structure

The sample consists of 3 Visual Studio solutions, each with a certain area of responsibility.

  1. The solution named LOBApp contains the projects that represent the line of business application. It contains an Azure service with one web role and one worker role. The web role exposes a web service with a single method that places a message on a queue. The worker role then takes one message off the queue and simulates some processing occurring.
  2. The ScalingEngine solution contains two projects: a console application called ScalingEngineClient and a Windows Presentation Foundation (WPF) application called LoadClient.
     a. The ScalingEngineClient is responsible for most of the work in this sample; it houses the core auto-scaling logic. It continuously monitors the queue length, the requests per second performance counter and the current instance count. It takes these metrics and saves them to 2 tables in Azure table storage. It also uses them to determine, based on very simple business rules in code, whether any of the roles needs to scale up or down. If any scaling changes are needed, it uses the Azure Service Management API to initiate the change.
     b. The LoadClient is a simple WPF application responsible for simulating load. It allows the user to determine the amount of load to simulate and then starts calling the LoyaltyService the desired number of times per second. It also lets the user track the current queue length.
  3. The HealthMonitoring solution includes an ASP.NET web application as well as a Windows Azure web role, making it possible to host it both on-premise and in Azure. The solution also contains a Silverlight application that is hosted within the web role. The HealthMonitoring service makes it possible to watch the current state of the solution in close to real time. The Silverlight client uses a simple UI that allows an end user to monitor the current queue length and instance count as well as any scaling changes, both those currently in progress and those previously executed.

Implementation Details

As discussed above the primary logic for the sample application is contained in the ScalingEngine. A number of useful visualization components are also contained in the HealthMonitoring application. This section will discuss the detailed implementation of both of these components of the sample.

The ScalingEngine solution contains the auto-scaling functionality: a LoadClient and the ScalingEngineClient. The LoadClient has a simple purpose: to generate load. It allows the user to input a rate of calls to the LoyaltyService per minute and then executes calls at that rate. It is limited to 240 calls per minute in the current sample. The tool also displays the approximate queue length after each call.

The ScalingEngine is considerably more complicated, even though it may seem simple because it is hosted inside a console application in the sample. It is, as mentioned before, responsible for tracking the current load on the application and then scaling the application up and down accordingly. To enable this it uses three lists of configurable objects.

First, it uses a list of so-called MetricProvider objects. The sole responsibility of a MetricProvider is to collect metric data from the Azure-based application. As soon as new data is obtained, it is evaluated to determine if there is a need to change the instance count. This is done by another list of custom objects called ScalingLogicProviders; think of these as the encapsulation of a scaling rule. The metric data collected by the MetricProvider is passed to each of the defined ScalingLogicProviders, one at a time. Each of these applies its logic to that data to determine if there is a need to initiate a scaling change.

If this is the case, the requested scaling change is passed to the last list of objects. These are called TimeLogicProviders and have two responsibilities. First, they verify that any scaling change initiated by the ScalingLogicProviders will keep the instance count within the configured values for the current time. If they accept the requested scaling change, the scaling engine makes a call to the Azure Service Management API to initiate the change. Second, they initiate any scaling change needed to stay within the configured maximum and minimum values for the current time, for example shutting down instances once the weekend has arrived.
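The flow through the three kinds of provider can be pictured with the following sketch. The provider names follow the description above, but the class and method signatures, the thresholds and the instance limits are assumptions made for illustration and do not reproduce the sample's .NET implementation. A single clamping step stands in for both TimeLogicProvider responsibilities.

```python
from datetime import datetime


class QueueLengthMetricProvider:
    """Stand-in MetricProvider that replays canned queue lengths."""

    def __init__(self, readings):
        self._readings = iter(readings)

    def collect(self):
        return {"queue_length": next(self._readings)}


class QueueScalingLogicProvider:
    """Stand-in ScalingLogicProvider: turns a metric into an instance delta."""

    def __init__(self, high=100, low=10):
        self.high, self.low = high, low

    def evaluate(self, metrics):
        length = metrics["queue_length"]
        if length > self.high:
            return 1
        if length < self.low:
            return -1
        return 0


class WeekdayTimeLogicProvider:
    """Stand-in TimeLogicProvider: keeps the count within (min, max) for the day."""

    def __init__(self, weekday_limits=(2, 10), weekend_limits=(1, 4)):
        self.weekday_limits = weekday_limits
        self.weekend_limits = weekend_limits

    def constrain(self, proposed, now):
        # Monday is 0, so 5 and 6 are Saturday and Sunday.
        low, high = self.weekend_limits if now.weekday() >= 5 else self.weekday_limits
        return max(low, min(high, proposed))


def run_cycle(current, metric_providers, logic_providers, time_providers, now):
    """Run one evaluation cycle and return the target instance count."""
    target = current
    for metric_provider in metric_providers:
        metrics = metric_provider.collect()
        for logic_provider in logic_providers:
            target += logic_provider.evaluate(metrics)
    for time_provider in time_providers:
        target = time_provider.constrain(target, now)
    return target


monday = datetime(2010, 3, 1, 9, 0)
saturday = datetime(2010, 3, 6, 9, 0)
logic = [QueueScalingLogicProvider()]
limits = [WeekdayTimeLogicProvider()]
print(run_cycle(5, [QueueLengthMetricProvider([500])], logic, limits, monday))
print(run_cycle(6, [QueueLengthMetricProvider([50])], logic, limits, saturday))
```

In the second call no metric rule fires, yet the target still drops because the weekend limits are lower. This mirrors the second TimeLogicProvider responsibility of bringing the instance count back within range when the configured window changes.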

Calls made to the Azure Service Management API are highly privileged operations: they affect both the performance of an application and the bill received at the end of the month. As such this service needs to be secured appropriately. Calls to the Management API are secured using an X509 certificate registered with the Azure account. This is covered in the Installation section of this article.

The Health Monitoring web and Silverlight application has two responsibilities: it serves up the Silverlight application to the client, and it hosts a WCF web service called LogService.svc. The web service is used by the Silverlight application to get the necessary information from Azure Storage; it could be used by other clients if needed. The Silverlight application takes the data retrieved from the web service and displays it in a visual dashboard using lists and graphs. The shipped UI incorporates a very strict and simple design; the goal was to provide a UI that can be easily modified. The Silverlight application is built using the Model-View-ViewModel pattern, which is regarded as a best practice UI pattern because it creates a clean separation between the UI and the UI logic in Silverlight applications. You can find more on the M-V-VM pattern here: http://msdn.microsoft.com/en-us/magazine/dd458800.aspx. Another benefit of the M-V-VM pattern is that it makes the UI completely “Blendable”, that is, easily modified using Microsoft Expression Blend. All the UI controls that are used in the interface are standard Silverlight controls, although a couple of them have been templated to suit the situation. The only control that is not a standard Silverlight control is the graph control. As Silverlight has no native graph support, the graphing support comes from the Silverlight Toolkit from Codeplex.

Future Considerations

As this is only a sample, there are several areas in this solution that could be extended or modified to suit a business. It would, for example, be very useful to include the billing aspects of the scaling in the logic. Unfortunately, at the time of writing, there is no API available to retrieve this information, although it may be possible to ‘estimate’ costs based on monitoring the number of instances running.

The sample also lacks any form of notification scheme. Having the ability to notify someone when, for example, the configured limit has been reached or an unforeseen problem arises would be very useful. It also lacks any scheme for identifying and mitigating denial-of-service (DoS) attacks.

It would also be a good idea to extend the Silverlight-based health monitoring application to include the metrics for the requests per second performance counter, as this is part of the scaling engine’s decision to scale in or out.

The scaling engine was built so that it can be hosted anywhere. Although running it in Windows Azure, specifically as a worker role, isn’t shown in the sample, it is a definite possibility. Doing so removes the dependency on a local machine; if that machine were to experience issues, the scaling of the solution would grind to a halt. The main drawback of this approach is that the solution will need an extra instance running in Azure just to host the scaling client. For a big solution that constantly runs a lot of instances, adding one extra is not a problem. But if the solution runs on a few instances, adding an extra one for the scaling client would mean a substantial increase in cost, although there are various techniques that can be used to allow a worker role to run a variety of different workloads.

Installation

Setup Requirements

  1. Visual Studio 2008 SP1
  2. Silverlight 3 Tools for Visual Studio
  3. Windows Azure Tools for Visual Studio
  4. Windows Azure account

How to get it to run

Setting up Windows Azure Services

  1. Log in to your Windows Azure account at http://windows.azure.com.
  2. Select the Project you will create the new services under.
  3. Select New Service and then choose to create a Storage Account.

     a. Set the label and description of the new service to AzureScalingStore and click Next.
     b. Set the public name to azurescalingstore followed by some numbers. This has to be globally unique; change the numbers and click Check Availability until you find an available public name.
     c. Choose to create a new affinity group, select a region close to you and give it the name AzureScalingStore. An affinity group will ensure that the different services you create are hosted together.
     d. Click Create.

  4. Copy the table end point and primary access keys. These will be used later when configuring the applications.

  5. Create a new Azure service, selecting a Hosted Service. This hosting service will host the LoyaltyManagement Azure application.
  6. Give the new service the label and description AzureScalingHosting and click Next.
  7. Set the public service name to azurescalinghosting followed by some numbers. Click Check Availability until you find an available public name.
  8. Set the affinity group to the existing affinity group AzureScalingStore.
  9. Click Create.
  10. Create a new Azure service, selecting a Hosted Service. This hosting service will host the Health Monitor website that contains the dashboard.
  11. Give the new service the label and description AzureScalingMon and click Next.
  12. Set the public service name to azurescalingmon followed by some numbers. Click Check Availability until you find an available public name.
  13. Click Create.
  14. You should now have three Azure services listed on the Summary screen.

Generating Azure API Certificate

  1. We need to generate an API certificate and register it with Windows Azure. To do that we will use makecert.exe, which is included with the Windows SDK. Open a Visual Studio 2008 Command Prompt.

  2. Enter the following to generate your certificate. The private key will be registered on your computer and the public key will be written to c:\public.cer.

makecert -r -pe -a sha1 -n "CN=Windows Azure API Authentication Certificate" -ss My -len 2048 -sp "Microsoft Enhanced RSA and AES Cryptographic Provider" -sy 24 c:\public.cer

  3. Select the Account tab on the Windows Azure website and upload the public key to Azure.

  4. Copy down the certificate thumbprint. We will use this later when configuring the applications.

Configure and Deploy Loyalty Management Azure Application

  1. Open the LOBApp.sln solution found inside the LoyaltyManagement directory.
  2. Inside ServiceConfiguration.cscfg, LoyaltyManagementProcessing’s app.config and LoyaltyManagementService’s web.config, replace the QueueConnectionString and DiagnosticsConnectionString values with the following (find the public account name and primary access key in the AzureScalingStore service configured earlier):

DefaultEndpointsProtocol=https;AccountName=PUBLIC_ACCOUNT_NAME;AccountKey=PRIMARY_ACCESS_KEY

  3. Build the LOBApp solution, then right-click the Azure LOBApp project and select Publish…

  4. Visual Studio will launch Internet Explorer. Browse to the AzureScalingHosting service and click Deploy…
  5. On the Production Deployment page click the browse buttons to select the application package (cspkg) and configuration settings (cscfg) files. They can be found in the LoyaltyManagement\bin\Debug\Publish directory. Enter Loyalty 1 as the deployment name.

  6. Click Deploy.
  7. It will take a couple of minutes for Azure to complete deploying the application. Once it has finished, click Start to start the deployed application in Azure, and wait for the services to finish starting.

Configure and Deploy Health Monitor Azure Application

  1. Open the HealthMonitoring.sln solution found inside the HealthMonitoring directory.
  2. Inside ServiceConfiguration.cscfg and MonitoringWebRole’s web.config, replace the AccountName setting with the public name of the Azure storage account created earlier, the AccountSharedKey setting with the storage account’s access key and the ServiceName setting with the public name of the LoyaltyManagement hosting service created earlier.

  3. Open the ServiceReferences.ClientConfig file and update the end point address to point towards the AzureScalingMon address that was configured earlier:

http://PUBLIC_ACCOUNT_NAME.cloudapp.net/LogService.svc

  4. Deploy the HealthMonitoring application to Azure, repeating the earlier steps for the Loyalty Management application but selecting the health monitoring cspkg and cscfg.

Configure Load and Scaling Engine Client Applications

  1. Open the ScalingEngine.sln solution found inside the ScalingEngine directory.
  2. Open LoadClient’s app.config and replace the QueueStorageConnectionString setting with the following (find the public account name and primary access key in the AzureScalingStore service configured earlier):

DefaultEndpointsProtocol=https;AccountName=PUBLIC_ACCOUNT_NAME;AccountKey=PRIMARY_ACCESS_KEY

  3. Update the endpoint address to point towards the AzureScalingHosting address that was configured earlier:

http://PUBLIC_ACCOUNT_NAME.cloudapp.net/LoyaltyService.svc

  4. Open ScalingEngineClient’s app.config and replace the TableStorageConnectionString and storageConnectionString (found inside the PerformanceCounterMetricProvider) settings with the same value just used above for the QueueStorageConnectionString.
  5. Replace the deploymentId setting (found inside the PerformanceCounterMetricProvider) with the Deployment ID on the AzureScalingHosting summary page.

  6. Replace the SubscriptionId setting with your Windows Azure Subscription ID. This can be found on your Azure Account page.

  7. Replace the CertificateThumbprint setting with the thumbprint from the certificate generated earlier. You can find the thumbprint by going to your Azure Account page and selecting Manage API Certificates.

  8. Done!

Running the Azure Load Scaling Applications

  1. Launch the Scaling Engine Client and let it run in the background.

  2. Launch the Load Client.
  3. Enter the number of messages you want to add to the queue per minute and click Start.

  4. Launch Internet Explorer and browse to the Health Monitoring website deployed in Windows Azure.

  5. The Health Monitoring website tracks the number of items on the queue and displays when the Scaling Engine Client actions an increase or decrease in the number of role instances.

Additional Notes

The sample code uses a couple of external open source libraries, which have been included in the solution. They are however included as compiled assemblies. If there is a need to debug these parts of the solution for some reason, the source code needs to be downloaded and included in the solution.

The first external library is called Microsoft.Samples.WindowsAzure.ServiceManagement.dll and is available at http://code.msdn.microsoft.com/windowsazuresamples. It includes functionality that makes it easy to talk to the Service Management API.

The second external assembly is called System.Windows.Controls.DataVisualization.Toolkit and contains the charting parts used by the Health Monitoring UI. It is part of the Silverlight Toolkit, which can be found at http://silverlight.codeplex.com/.