Dynamically Scaling Windows Azure - Sample Application
Grzegorz Gogolowicz, Trent Swanson
A key benefit of cloud computing is the ability to add and remove capacity as and when it is required. This is often referred to as elastic scale. Being able to add and remove this capacity can dramatically reduce the total cost of ownership for certain types of applications; indeed, for some applications cloud computing is the only economically feasible solution to the problem.
The Windows Azure Platform supports the concept of elastic scale through a pricing model that is based on hourly compute increments. By changing the service configuration, either by editing it in the portal or through the Management API, customers are able to adjust the amount of capacity they are running on the fly.
A question that is commonly asked is how to automate the scaling of a Windows Azure application. That is, how can developers build systems that are able to adjust scale based on factors such as the time of day or the load that the application is receiving? The goal of the “Dynamic Scaling Windows Azure” sample is to show one way to implement an automated solution to handle elastic scaling in Azure. Scaling can be as simple as running X instances during weekdays and Y instances during weekends, or can be based on several more complex rules. By automating the process, less time needs to be spent manually checking logs and performance data, and there is no need for an administrator to intervene in order to add and remove capacity. An automated approach also has the benefit of keeping the instance count at a minimum while still being able to cope with the current load: it allows the application to run with less ‘headroom’ capacity and therefore at a lower cost. The sample provided below scales based on three different variables, but the framework has been designed to be flexible in terms of future requirements.
In this paper we will discuss the concept of dealing with variable load through rule-based scaling and then explain in detail the architecture, implementation and use of the provided sample code. We finish by discussing potential extensions to the framework to support other scenarios.
The load placed on most applications is quite stochastic; however, at a higher level most applications will display some broad trends in load. Cloud computing is particularly well suited to the following types of load pattern.
Dealing with variable load in a Windows Azure application will take two broad forms:
In systems that have highly volatile load across short time periods it will generally be necessary to run additional capacity at all times. Because it takes time for a Windows Azure instance to start up, and because instances are charged by the hour, any load variance, up and then down, that occurs within each hour is best handled by running enough instances to handle the hourly peak load. We shall refer to this capacity as headroom.
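As a rough illustration (in Python, since the sizing logic is language-agnostic; the capacity figures are hypothetical), headroom-based sizing amounts to provisioning for the hourly peak plus a safety margin:

```python
import math

def instances_needed(peak_hourly_load_rps, per_instance_capacity_rps,
                     headroom_fraction=0.25):
    """Size for the hourly peak plus a safety margin ('headroom'),
    since instances are billed by the hour and take time to start."""
    required = peak_hourly_load_rps / per_instance_capacity_rps
    return math.ceil(required * (1 + headroom_fraction))
```

For example, a peak of 800 requests per second at 100 requests per second per instance, with 25% headroom, calls for 10 instances.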
Adding and removing instances
When load exceeds the available headroom it will be necessary to add instances to meet demand. In Windows Azure this is achieved by changing the instance count in the service configuration. Increasing the instance count causes Windows Azure to start new instances; decreasing it causes instances to be shut down.
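The instance count lives in the Instances element of the service configuration file. As a sketch (Python used here purely for illustration; the sample itself drives this through the .NET Service Management wrapper), the change is just an attribute edit on that XML before it is submitted back to the Management API:

```python
import xml.etree.ElementTree as ET

# Namespace of the Windows Azure service configuration schema.
NS = "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"

def set_instance_count(config_xml, role_name, count):
    """Return the service configuration with the <Instances count="..."/>
    value updated for the named role."""
    root = ET.fromstring(config_xml)
    for role in root.findall("{%s}Role" % NS):
        if role.get("name") == role_name:
            role.find("{%s}Instances" % NS).set("count", str(count))
    return ET.tostring(root, encoding="unicode")
```

The modified configuration is then passed to the Management API's change-configuration operation, which applies it to the running deployment.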
Windows Azure supports Small, Medium, Large and Extra Large instance sizes. These have 1, 2, 4 and 8 processor cores respectively, along with correspondingly increased amounts of RAM and local storage. While some applications may benefit from larger instance sizes to improve throughput, they are less useful for achieving elastic scale because a change in VM size currently requires a full redeploy of the service package.
Decisions around when to scale and by how much will be driven by some set of rules. Whether these are implicit and codified in some system or simply stored in the head of a systems engineer they are rules nonetheless. In this section we discuss those rules and the metrics that drive them.
Rules for scaling a Windows Azure application will typically take one of two forms. The first is a rule that adds or removes capacity at a given time; for example: run 20 instances between 8am and 10am each day and 2 instances at all other times. The second form comprises rules that respond to metrics, and this form warrants more discussion.
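A time-based rule of the first form can be sketched as follows (Python for illustration; the hours and counts come from the example above):

```python
from datetime import datetime

def target_instances(now, peak_count=20, offpeak_count=2):
    """Run peak_count instances between 8am and 10am each day,
    and offpeak_count at all other times."""
    return peak_count if 8 <= now.hour < 10 else offpeak_count
```

A real engine would evaluate this on a timer and submit a configuration change whenever the target differs from the running count.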
A metric based rule involves monitoring aspects of the system that can change over time and making decisions to scale up or down on that basis. The following sets out some of the broad categories of metrics that may be considered.
Metrics will need to be collected from a number of different sources, and some will require post-processing or analysis such as smoothing or derivation. Many metrics will be collected through the Windows Azure diagnostics mechanism (http://msdn.microsoft.com/en-us/library/ee758705.aspx), which supports the collection of performance counters, logs and crash reports.
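Smoothing is typically a moving average over a short window, so that a single noisy sample does not trigger a scaling action. A minimal sketch (Python for illustration; the window size is a hypothetical choice):

```python
from collections import deque

class SmoothedMetric:
    """Keep a rolling window of samples and expose the average,
    so one spiky data point cannot trigger a scaling action."""
    def __init__(self, window=6):
        self._samples = deque(maxlen=window)

    def add(self, value):
        self._samples.append(value)

    @property
    def average(self):
        if not self._samples:
            return 0.0
        return sum(self._samples) / len(self._samples)
```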
When using diagnostics for the purposes of auto-scaling it is important to remember the performance impact incurred in actually collecting each data point. While it may be appealing to collect large amounts of performance data from each instance in order to guide elastic scaling, each data point carries a cost of collection. It is beyond the scope of this article to discuss this aspect in more detail, but additional information can be found here: http://technet.microsoft.com/en-us/library/cc938553.aspx
Some desirable metrics remain difficult to collect. For example, it is not yet possible to retrieve billing information via an API call, so a metric for total month-to-date spend has to be derived by monitoring the number of running instances and using this to compute a cost value.
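Such a derived spend metric can be sketched as follows (Python for illustration; the hourly rate is a hypothetical figure, not a published price, and accuracy depends on sampling the instance count at least once per billed hour):

```python
def month_to_date_cost(hourly_instance_counts, rate_per_instance_hour):
    """Approximate compute spend derived from one observed instance-count
    sample per hour, multiplied by an assumed hourly rate."""
    return sum(hourly_instance_counts) * rate_per_instance_hour
```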
As data is collected it will be subjected to the evaluation of business rules to determine a proposed course of action. These rules may have various levels of complexity. In many cases it will be simple enough to codify rules; for example when a metric crosses a given threshold.
Examples of rules that might be evaluated include the following:
“Run 10 instances minimum and 20 instances maximum”
“If the response time average over 30 seconds exceeds 1000ms then add additional instances”
“If the response time average over 30 seconds drops under 500ms then remove instances”
“Follow these rules until monthly spend exceeds $4000 at which point run the minimum instance count only and the administrator shall be sent an Instant Message via Office Communications Server”
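The first three example rules combine into a single evaluation step, sketched here in Python (the thresholds are taken from the examples above; the notification rule is omitted):

```python
def evaluate(current_count, avg_response_ms, minimum=10, maximum=20):
    """Apply the example rules: add an instance when the 30-second
    average response time exceeds 1000 ms, remove one when it drops
    under 500 ms, and always stay within [minimum, maximum]."""
    proposed = current_count
    if avg_response_ms > 1000:
        proposed += 1
    elif avg_response_ms < 500:
        proposed -= 1
    return max(minimum, min(maximum, proposed))
```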
In some cases it may be useful to implement a more sophisticated rules engine; for example, there may be a need to chain rules. One suitable approach is the Windows Workflow Foundation rules engine. This article (http://msdn.microsoft.com/en-us/library/dd349784.aspx) provides a good primer on using the WF rules engine for rule processing in .NET applications.
Scaling actions in Windows Azure are taken through the Management API. A call to this API can change the instance count in the service configuration and, in doing so, change the number of running instances.
When adding and removing instances it is important to remember that the Management API is asynchronous. Once a change has been requested, an application will need to poll the service to determine if and when that change has taken effect. Rules must be cognizant of the time delay in adding capacity, so that subsequent rule evaluations do not add significant excess capacity while new instances are still starting.
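One way to guard against this is a cooldown period after each change, during which further scale requests are suppressed. A sketch (Python for illustration; the cooldown length is a hypothetical choice and an injectable clock is used so the logic can be tested):

```python
import time

class ScaleGuard:
    """Suppress further scaling requests while a previous change is
    still taking effect, so rule evaluations that run during instance
    startup do not stack up excess capacity."""
    def __init__(self, cooldown_seconds=600, clock=time.monotonic):
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._last_change = None

    def may_scale(self):
        if self._last_change is None:
            return True
        return self._clock() - self._last_change >= self._cooldown

    def record_change(self):
        self._last_change = self._clock()
```

A more refined variant would end the cooldown early once polling shows the new instances have reached the Ready state.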
Other actions may include messaging other systems or administrators. For example a rule may specify that an email is sent to an administrator any time the total cost per hour exceeds $100.
The sample as shipped demonstrates an implementation of a simple automatic elastic scaling engine for Windows Azure Compute instances. It is based around two simple business rules. It assumes that the business needs to set a minimum as well as a maximum number of instances to be running at any given time, and that these threshold values may differ between weekdays and weekends. The default configuration sets the limits for weekends lower than for weekdays, the sort of load pattern you might expect for an application targeting business users. This can easily be changed to target workloads weighted towards the weekends, a fantasy football application for example. Subject to the thresholds set by day of week, the application bases its scaling decisions on collected metrics. Specifically, it tracks the current length of the processing queue and the current number of requests per second to the webservice that delivers the application. If the queue length continues to grow over a specific time period, a new worker role instance is started. Correspondingly, if the number of requests per second crosses a threshold, a new web role instance is started.
The Azure Load Scaling sample consists of a number of components.
Loyalty Management is a Windows Azure service containing a web role and a worker role. In this sample it represents a service that needs dynamic scaling to handle the load placed on it. The web role exposes a WCF service that adds items to a queue; the worker role picks items off the queue and processes them. The sample application scales the instance count for the two roles based on factors such as the number of messages in the queue and the day of the week.
Load Client is a client application (WPF) that calls the WCF service hosted by the Loyalty Management web role. It can be configured to call the Loyalty Management service up to 4 times per second. The Load Client is a simple test harness designed to simulate various levels of load on the application.
Scaling Engine is the part of the sample responsible for enforcing the scaling rules. It is built to be hostable both on-premise and in the cloud; for the sample it has been built as a console application running on-premise. It gathers metrics, such as the number of queued work items and the number of web requests per second, from the Loyalty Management application. Using these metrics, it determines whether the application needs to scale the number of Azure instances up or down. If a need to scale is determined, it calls the Azure Service Management API to initiate the change. While the sample locates this logic in a client application, it would also be possible to host it in a Windows Azure worker role, since the scaling engine stores all its data in Azure storage.
Health Monitoring is a web application that serves two purposes. It serves up a Silverlight client to the user and then exposes the metrics gathered by the Scaling Engine to this Silverlight client via a WCF service. It can be hosted either on-premise or as an Azure application. The Silverlight dashboard displays metric data, pending scaling actions and the number of instances currently running to the user in an easily understandable way.
The sample consists of 3 Visual Studio solutions, each with a certain area of responsibility.
As discussed above the primary logic for the sample application is contained in the ScalingEngine. A number of useful visualization components are also contained in the HealthMonitoring application. This section will discuss the detailed implementation of both of these components of the sample.
The ScalingEngine solution contains the auto-scaling functionality: a LoadClient and the ScalingEngineClient. The LoadClient has a simple purpose: to generate load. It allows the user to input a per-minute rate of calls to the LoyaltyService and then executes calls at that rate; it is limited to 240 calls per minute in the current sample. The tool also displays the approximate queue length after each call.
The ScalingEngine is considerably more complicated, even if it might seem simple because it is hosted inside a console application in the sample. It is, as mentioned before, responsible for tracking the current load on the application and then scaling the application up and down accordingly. To enable this it uses several lists of configurable objects. First, it uses a list of so-called MetricProvider objects, whose sole responsibility is to collect metric data from the Azure-based application. As soon as new data is obtained, it is evaluated to determine whether the instance count needs to change. This is done by a second list of custom objects called ScalingLogicProviders; think of these as the encapsulation of a scaling rule. The metric data collected by the MetricProviders is passed to each of the defined ScalingLogicProviders in turn, and each applies its logic to decide whether to initiate a scaling change. If so, the requested change is passed to the last list of objects, the TimeLogicProviders, which have two responsibilities. First, they verify that any scaling change initiated by the ScalingLogicProviders will keep the instance count within the configured values for the current time; if they accept the requested change, the scaling engine calls the Azure Service Management API to initiate it. Second, the TimeLogicProviders initiate any scaling change needed to stay within the configured maximum and minimum values for the current time, for example shutting down instances once the weekend has arrived.
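The provider pipeline described above can be reduced to the following sketch. This is an illustrative Python rendering of the idea, not the sample's C# code: the class names mirror the sample's concepts, but the implementations here are hypothetical.

```python
class QueueLengthMetricProvider:
    """Stand-in for a MetricProvider; the real one would read the
    Azure queue length rather than a canned list of readings."""
    def __init__(self, readings):
        self._readings = iter(readings)

    def collect(self):
        return next(self._readings)


class GrowthScalingLogicProvider:
    """Encapsulates one scaling rule: propose one extra instance
    whenever the metric is still growing."""
    def __init__(self):
        self._previous = None

    def propose(self, value, current_count):
        grew = self._previous is not None and value > self._previous
        self._previous = value
        return current_count + 1 if grew else current_count


class TimeLogicProvider:
    """Clamps any proposed change to the limits configured for the
    current time (e.g. lower limits at the weekend)."""
    def __init__(self, minimum, maximum):
        self._min, self._max = minimum, maximum

    def approve(self, proposed_count):
        return max(self._min, min(self._max, proposed_count))


def run_once(metric, logic_providers, time_provider, current_count):
    """One pass of the engine: collect, evaluate, clamp. A real engine
    would follow an approved change with a Management API call."""
    value = metric.collect()
    for logic in logic_providers:
        current_count = logic.propose(value, current_count)
    return time_provider.approve(current_count)
```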
Calls made to the Azure Service Management API are highly privileged operations: they affect both the performance of an application and the bill received at the end of the month. As such, this service needs to be secured appropriately. Calls to the Management API are secured using an X.509 certificate registered with the Azure account; this is covered in the Installation section of this article.
The Health Monitoring web and Silverlight application has two responsibilities: it serves up the Silverlight application to the client, and it hosts a WCF webservice called LogService.svc. The webservice is used by the Silverlight application to get the necessary information from Azure Storage; it could be used by other clients if needed. The Silverlight application takes the data retrieved from the webservice and displays it in a visual dashboard using lists and graphs. The shipped UI incorporates a very strict and simple design; the goal was to provide a UI that can be easily modified. The Silverlight application is built using the Model-View-ViewModel pattern, which is regarded as a best-practice UI pattern that creates a clean separation between the UI and the UI logic in Silverlight applications. You can find more on the M-V-VM pattern here: http://msdn.microsoft.com/en-us/magazine/dd458800.aspx. Another benefit of the M-V-VM pattern is that it makes the UI completely “Blendable”, that is, easily modified using Microsoft Expression Blend. All the UI controls used in the interface are standard Silverlight controls, although a couple of them have been templated to suit the situation. The only exception is the graph control: as Silverlight has no native graphing support, it comes from the Silverlight Toolkit from Codeplex.
As this is only a sample, there are several areas in this solution that could be extended or modified to suit a business. It would, for example, be useful to include the billing aspects of scaling in the logic. Unfortunately, at the time of writing there is no API available to retrieve this information, although it may be possible to estimate costs by monitoring the number of instances running.
The sample also lacks any form of notification scheme. The ability to notify someone when, for example, a configured limit has been reached or an unforeseen problem arises would be very useful. It also lacks any form of DoS attack identification and mitigation.
It would also be a good idea to extend the Silverlight-based health monitoring application to include the metrics for the requests-per-second performance counter, as this is part of the scaling engine's decision to scale in or out.
The scaling engine was built so that it can be hosted anywhere. Although the sample does not show it running in Windows Azure, specifically as a worker role, that is a definite possibility. Doing so removes the dependency on a local machine: if that machine were to experience issues, scaling of the solution would grind to a halt. The main drawback is that the solution then needs an extra instance running in Azure just to host the scaling engine. For a large solution that constantly runs many instances, adding one more is not a problem, but for a solution that runs on only a few, an extra instance for the scaling engine would mean a substantial increase in cost. There are, however, various techniques that allow a single worker role to run a variety of different workloads.
The following makecert command creates the self-signed certificate used to authenticate calls to the Management API, placing it in the personal certificate store and exporting the public key to c:\public.cer:
makecert -r -pe -a sha1 -n "CN=Windows Azure API Authentication Certificate" -ss My -len 2048 -sp "Microsoft Enhanced RSA and AES Cryptographic Provider" -sy 24 c:\public.cer
The sample code uses a couple of external open-source libraries, which have been included in the solution as compiled assemblies. If there is a need to debug these parts of the solution, the source code must be downloaded and included in the solution.
The first external library is called Microsoft.Samples.WindowsAzure.ServiceManagement.dll and is available at http://code.msdn.microsoft.com/windowsazuresamples. It provides an easy way to talk to the Service Management API.
The second external assembly is called System.Windows.Controls.DataVisualization.Toolkit and contains the charting components used by the Health Monitoring UI. It is part of the Silverlight Toolkit, which can be found at http://silverlight.codeplex.com/.