Over the past year we've amassed a large repository of cloud benchmarks and metrics. Today, we are making most of that data available via web services. This data includes the following:
We are releasing this data in hopes of improving transparency and making the comparison of cloud services easier. There are many ways that this data might be used. In this post, we'll go through a few examples to get you started and let you take it from there.
Our web services API provides both RESTful (HTTP query request and JSON or XML response) and SOAP interfaces. The API documentation and SOAP WSDLs are published here: http://api.cloudharmony.com/api
The sections below are separated into individual examples. This is not intended to be comprehensive documentation for the web services, but rather a starting point and a reference for using them. More comprehensive technical documentation is provided for each web service on our website.
In this first example we'll use the getClouds web service to look up all available public clouds. The table at the top of the web service documentation describes the data structure that is used by this web service call.
Due to the large amount of data that can be returned, our web services paginate results (similar to getting multiple pages of results for a web search). The maximum number of results a request to this web service will return is 10. You may set a limit lower than 10 using the ws-limit request parameter, but not greater than 10. The example request URIs above return only the first 10 results (as determined by the limit response value). At the time of this writing there were 37 total records (as determined by the count response value). To return the remaining 27 results, utilize the following URIs:
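As a rough sketch of the pagination arithmetic described above, the Python snippet below works out which offsets are needed to walk through all 37 records at 10 per page. The base address is an assumption inferred from the WSDL address given in this post, and the helper names are our own; check the API documentation before relying on either.

```python
from urllib.parse import urlencode

# Assumed endpoint, inferred from the WSDL address in this post.
BASE_URI = "http://api.cloudharmony.com/getClouds"
MAX_LIMIT = 10  # the largest page size the post says is allowed


def page_offsets(count, limit=MAX_LIMIT):
    """Return the ws-offset value for every page needed to cover `count` records."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    limit = min(limit, MAX_LIMIT)  # limits above 10 are not permitted
    return list(range(0, count, limit))


def page_uris(count, limit=MAX_LIMIT):
    """Build one request URI per page using the ws-limit and ws-offset parameters."""
    limit = min(limit, MAX_LIMIT)
    return [
        f"{BASE_URI}?{urlencode({'ws-limit': limit, 'ws-offset': offset})}"
        for offset in page_offsets(count, limit)
    ]
```

Calling page_offsets(37) gives one offset for the first page plus three more for the remaining 27 records. In practice you would read count and limit from the first response rather than hard-coding them.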
In this post we'll only be showing use of the RESTful API interface. A SOAP interface is also provided. The base API documentation page includes links to WSDLs you may use to import and utilize the SOAP interface (some IDEs let you import WSDLs). The parameters and response structure for SOAP requests are very similar, but not identical to the REST interface (XML names may differ slightly from HTTP request and response names). The WSDL for the getClouds web service is available here: http://api.cloudharmony.com/getClouds/wsdl
In this example we'll use the same getClouds web service again, but we'll add a few constraints so that only clouds with server, storage and CDN services are returned. Unless otherwise stated, from this point forward the same pagination rules apply (see Example #1 for more details). Additionally, only JSON URIs will be shown (to use XML responses, simply add the parameter ws-format=xml to the URI).
In the example URI above, we used ws-constraint parameters to filter the results. Constraints can be applied to specific attributes defined by the data structure for a given web service. The data structure is documented in a table at the top of the web services documentation page. In this example, we used 3 such attributes: hasServers, hasStorage, and hasContentDelivery. Because these are boolean type attributes, we assigned constraint values 1, signifying that only clouds where those attributes are TRUE should be returned.
The API also supports more complex constraint parameters. This example uses the simplest form of constraints: testing for equality and joining the 3 constraints with an AND connective. Constraints can also check that an attribute is less than or greater than a desired value, and can be joined with an OR connective when multiple constraints are specified. We'll go into these types of constraints in a later example.
In this example, only 5 public clouds are returned instead of the 37 returned in the previous example, signifying that only 5 public clouds offer all three of cloud server, storage and CDN services.
Each data structure has an attribute that is the unique identifier. This attribute is called the primary key. The getClouds web service can be used to retrieve a specific public cloud if you know the primary key for the cloud. In this example, we'll use this feature to retrieve the AWS (Amazon Web Services) public cloud.
The response is almost identical to the previous 2 requests with the exception that the base response value is not an array. When a web service is invoked for a specific cloud using the primary key as we've done here, the response will always be a single data structure value. This is in contrast to the previous 2 requests that returned multiple clouds using an array as the base data structure.
In this example we'll use the getCloudServerServices web service to return the cloud server/IaaS service for AWS (Amazon Web Services) - EC2. We know AWS has such a service because the boolean hasServers attribute was true in the previous examples. The API documentation shows that the CH_CloudServer data structure contains an attribute named cloud that references the cloud that the service belongs to. In order for this web service to return only the cloud server service belonging to AWS, we'll just need to add a single ws-constraint for this attribute.
Even though only a single result for EC2 was returned, the base response data structure is an array. This will always be the case when invoking the API for a data structure without a primary key (as we did in Example 4 above), because such a request could always match multiple results.
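Because the base response value can be either a single structure (primary key lookups) or an array (everything else), client code may find it convenient to normalize both shapes. The following is a minimal sketch of one way to do that in Python; the helper names are our own and are not part of the API.

```python
import json


def as_list(payload):
    """Return a decoded response as a list, whatever its base type was."""
    if payload is None:
        return []
    if isinstance(payload, list):
        return payload      # array response: already a list of structures
    return [payload]        # primary key lookup: wrap the single structure


def decode(body):
    """Decode a JSON response body and normalize it to a list."""
    return as_list(json.loads(body))
```

With this in place, the same loop can process the result of a primary key lookup and the result of a constrained query.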
Suppose you are looking to deploy a Windows server in the cloud. Because the CH_CloudServer data structure has an attribute operatingSystemsSupported that defines which operating systems are supported by that service, we can use it in conjunction with a ws-constraint request parameter to filter the results accordingly. In our previous use of constraints, we used the default equality operator. In this example, we'll need to change the operator to a substring search. This is because the operatingSystemsSupported attribute is an array which may contain multiple values representing all of the operating systems supported by the service (e.g. Linux, Windows). By using the substring operator, the request will search for services where the operatingSystemsSupported attribute contains Windows. The substring operator is the numeric value 32 (operators are numeric to support multiple operators using bitmasks). The operators supported and their corresponding values are shown in the API documentation.
At the time of this writing, this request returned 14 services that support the Windows operating system.
Example 6: Find other cloud services
In addition to the getCloudServerServices web service discussed in Examples 4-5, the following additional web services are provided: getCloudDatabaseServices, getCloudMessagingServices, getCloudPlatformServices, getCloudStorageServices and getCDNs. The usage for each of these is identical. In the example below we'll use them for various lookups.
We are still in the process of populating vendor profiles for different cloud services. Currently only basic information is provided by the web services. In the future, the data structures for these services will be expanded to include many additional details such as pricing, SLAs, features, technical details, etc.
Up to this point we've been using web services to look up which clouds and cloud services are available. The remaining examples will involve retrieving benchmarking-related data. To get started, we'll first need to determine which benchmarks are available using the getBenchmarks web service. This web service provides access to information about the benchmarks we conduct. Unlike the previous web services, getBenchmarks does not support the ws-constraint parameters to filter results. This is always the case when the top section of the API documentation page does not show a data structure table. Instead of ws-constraint filters, getBenchmarks supports 4 request parameters (these are shown in the right column of the API documentation table):
Benchmarks are assigned to 1 or more categories. The getBenchmarkCategories web service may be used to obtain all of the available benchmark categories (see Example 8 below).
In this example, we'll retrieve all benchmarks (or at least the first 10 due to pagination).
The response from this web service is an array of benchmarks, each containing the following values:
Aggregate benchmarks are a special type of benchmark: they aren't benchmarks themselves, but rather compilations of multiple benchmark result metrics. This compilation is used in conjunction with a baseline configuration to produce a more comprehensive benchmark metric related to some facet of performance. A more detailed description of aggregate benchmarks and baselines is available on our blog. CCU is one such aggregate benchmark discussed here.
Every benchmark is assigned to one or more categories. The getBenchmarkCategories web service returns a list of all possible benchmark category names. This web service is very simple. It does not use any parameters or pagination.
In this example we'll use the same getBenchmarks web service to retrieve only server benchmarks in the category System: CPU (we discovered this category previously using the getBenchmarkCategories web service). To accomplish this, we'll use the serverOnly and category request parameters.
Before attempting to analyze benchmark results, it may be helpful to first determine what benchmark results data is available, including which clouds and server configurations have been benchmarked. Generally, we conduct cloud server benchmarking 3-4 times each year. Every benchmark test run has a unique testId. The typical format of a testId is MMYY-[SEQ]. For example, the test 0410-1 was conducted in April 2010. To determine what tests have been run within clouds, the getServerCloudsBenchmarked web service may be used. This web service uses the following parameters:
The return value is an array of cloud server services and the corresponding testing information for those services including testIds and testing dates.
The response from this web service is an array of services and information about the benchmark tests that have been conducted within those services.
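The MMYY-[SEQ] testId format lends itself to simple parsing on the client side. Here is a small Python sketch that does so; it assumes the sequence part is always numeric and that two-digit years belong to the 2000s, neither of which is guaranteed by the API documentation. The function and type names are our own.

```python
import re
from typing import NamedTuple, Optional

# Two digits for the month, two for the year, a dash, then a numeric sequence.
TEST_ID_PATTERN = re.compile(r"(\d{2})(\d{2})-(\d+)")


class BenchmarkRun(NamedTuple):
    year: int
    month: int
    sequence: int


def parse_test_id(test_id: str) -> Optional[BenchmarkRun]:
    """Split a testId such as '0410-1' into year, month and sequence.

    Returns None for identifiers that do not follow the typical format.
    """
    match = TEST_ID_PATTERN.fullmatch(test_id)
    if match is None:
        return None
    month, year, sequence = (int(group) for group in match.groups())
    if not 1 <= month <= 12:
        return None
    # Assumption: two-digit years refer to 2000 onward.
    return BenchmarkRun(year=2000 + year, month=month, sequence=sequence)
```

Since the post only describes this as the typical format, the function returns None rather than raising when an identifier looks different.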
In this example, we'll use the same getServerCloudsBenchmarked web service to determine when testing has occurred only in the AWS and GoGrid clouds on or after June 2010. To do so, we'll use the serviceId and start parameters to filter the results. The serviceId parameter can be either the ID of the specific server service or the ID of a cloud.
When it comes down to retrieving cloud server benchmark metrics we'll use the getServerBenchmarkResults web service. This web service requires 2 parameters:
Additionally, the following parameters may optionally be provided:
As you can see, requests using this web service can be quite complex if desired. In this example, we'll keep it simple by using only the benchmarkId and serviceId parameters. The Geekbench benchmark produces a metric that rates CPU and memory performance.
The response is an array of benchmark result metrics each consisting of the following values:
In this example, we'll use the dataCenter and lastBenchmarksOnly parameters to return all of the CCU benchmark results for Amazon EC2's APAC region (this region is located in Singapore - hence the dataCenter parameter is set to the ISO 3166 country code SG). Unlike the previous example where multiple test results were returned, in this example because lastBenchmarkOnly is TRUE, the web service will only return a single benchmark value (the values, testDates, testIds and resultsUrls values will not be included in the response). CCU is an aggregate benchmark consisting of many underlying CPU performance related benchmarks as discussed here.
Before proceeding any further with getServerBenchmarkResults examples, we'll demonstrate how to find out what server configurations are available for a given cloud service. This is useful because the getServerBenchmarkResults web service supports a serverId parameter that can be used to filter benchmark results using a specific server identifier. For example, you may want to compare benchmark results between EC2 m2.4xlarge and GoGrid 16GB cloud servers only.
The getCloudServerConfigurations web service allows you to look up cloud server configurations. This web service uses a data structure containing various details about cloud servers, including CPU, memory, and storage specifications, pricing and more (review the API documentation for full details). Because this web service is based on a data structure, we'll be able to use ws-constraint parameters to filter the results. In this example, we'll use 2 constraints (cloud and dataCenter) to filter the results so that only Amazon EC2 APAC region servers are returned.
Now that we've been able to obtain the identifiers of cloud servers using the getCloudServerConfigurations web service, we can go back to the getServerBenchmarkResults web service and compare cloud servers using those IDs and the serverId parameter. In this example, we'll compare storage IO performance between 4GB Rackspace Cloud and GoGrid cloud servers (gg-4gb and rs-4gb) using the aggregate IOP benchmark. IOP is an aggregate storage IO benchmark based on 7 IO-related benchmarks as documented here. This benchmark is NOT the same as IOPS. To retrieve the IOP benchmark results for only Rackspace and GoGrid 4GB cloud servers, we'll set the serverId parameter to gg-4gb|rs-4gb (multiple IDs can be specified, each separated by a pipe character).
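When building such requests programmatically, the pipe-separated form is easy to generate from a list of IDs. The sketch below shows one way to do it in Python. The build_query helper is our own, and leaving the pipe character unescaped is an assumption made to match the form shown in this post.

```python
from urllib.parse import urlencode


def join_ids(ids):
    """Join several identifiers into the pipe-separated form the API accepts."""
    return "|".join(ids)


def build_query(**params):
    """Encode request parameters; list or tuple values are pipe-joined first."""
    flat = {
        name: join_ids(value) if isinstance(value, (list, tuple)) else value
        for name, value in params.items()
    }
    # Assumption: the pipe is kept literal, as in the post, rather than
    # percent-encoded as %7C.
    return urlencode(flat, safe="|")
```

For example, build_query(serverId=["gg-4gb", "rs-4gb"]) produces the query string serverId=gg-4gb|rs-4gb, which can then be appended to the web service URI along with the benchmarkId parameter.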
In this example, we'll use the getCloudServerConfigurations web service to look up US-based cloud services offering cloud servers with at least 2GB memory and costing $0.10/hr or less. This will involve use of 4 filtering constraints: dataCenter, memory, priceHourly and priceCurrency. In order to apply these constraints, we'll first need to determine what operators should be used.
The dataCenter attribute value is either [state/province], [country] (US or Canada only) OR [country]. Thus, we'll want the dataCenter attribute to "end with" "US". According to the API documentation, the "ends with" operator is 16.
The memory attribute is a numeric value representing the number of gigabytes included with a cloud server. We'll want this attribute to be equal to or greater than 2. The operator for "equal to" is 1. The operator for "greater than" is 2. Thus, an "equal to or greater than" operator is 1+2=3 (bitmask addition).
The priceHourly attribute is also numeric representing the price of the server per hour. We'll want this attribute to be equal to or less than 0.10. The operator for "equal to" is 1, and the operator for "less than" is 4. Thus, an "equal to or less than" operator is 1+4=5.
The priceCurrency attribute is a string representing the currency code for pricing defined in the server configuration (USD = US dollar). Thus we want this attribute to be equal to "USD". Equality is the default operator, so we do not need to provide an operator value for this constraint.
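To make the operator arithmetic concrete, here is a small Python sketch that models the operator values quoted in this post as bit flags and evaluates the four constraints against a record locally. This is only an illustration of the bitmask idea, not how the web service is implemented. The member names, the matching rules for array attributes and the sample record are all our own assumptions.

```python
from enum import IntFlag


class Op(IntFlag):
    """Operator values quoted in this post; the member names are our own."""
    EQUAL = 1
    GREATER = 2
    LESS = 4
    ENDS_WITH = 16
    SUBSTRING = 32


def matches(value, op, target):
    """Local approximation of how a single constraint might be evaluated."""
    op = Op(op)
    # Assumption: for array attributes, any one element may satisfy the test.
    if isinstance(value, (list, tuple)):
        return any(matches(item, op, target) for item in value)
    if Op.SUBSTRING in op and str(target) in str(value):
        return True
    if Op.ENDS_WITH in op and str(value).endswith(str(target)):
        return True
    if Op.EQUAL in op and value == target:
        return True
    if Op.GREATER in op and value > target:
        return True
    if Op.LESS in op and value < target:
        return True
    return False


def satisfies(record, constraints):
    """Join (attribute, operator, value) constraints with an AND connective."""
    return all(matches(record[attr], op, val) for attr, op, val in constraints)


# Made-up record for illustration only; this is not real API output.
SAMPLE_SERVER = {
    "dataCenter": "CA, US",
    "memory": 4,
    "priceHourly": 0.08,
    "priceCurrency": "USD",
}

# The four constraints described above, with their combined operators.
CONSTRAINTS = [
    ("dataCenter", Op.ENDS_WITH, "US"),          # operator 16
    ("memory", Op.EQUAL | Op.GREATER, 2),        # operator 1+2=3
    ("priceHourly", Op.EQUAL | Op.LESS, 0.10),   # operator 1+4=5
    ("priceCurrency", Op.EQUAL, "USD"),          # default operator
]
```

Because each operator occupies its own bit, combining two of them is the same as adding their values, which is why "equal to or greater than" comes out as 3 and "equal to or less than" as 5.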
At the time of this writing, only gigenet cloud offers a cloud server with these specifications.
In addition to system benchmarks, we also continually collect networking benchmark metrics. These include both throughput and latency metrics within clouds, between clouds, and from clouds to consumers (i.e. residential Internet connections such as DSL and cable to various cloud services).
Suppose you are evaluating cloud services and decide to use GoGrid's cloud servers. Your business and customers are in California, so you opt to use GoGrid's US West data center. For added protection against a large scale failure, you decide to use an external storage service for backups (instead of GoGrid's own storage service). You've narrowed your storage choices down to either Amazon S3, Zetta or Google's Storage for Developers. You'd like to know which of these storage services will provide the fastest uplink throughput from your cloud servers at GoGrid in order to ensure that backups can be uploaded as quickly as possible. The getNetworkBenchmarkResults web service provides access to this sort of data. This web service uses the following parameters:
As you can see, the getNetworkBenchmarkResults web service supports a complex array of parameters. For the purposes of this example, we'll only be using a few parameters:
The API documentation states that the results will be an array of hashes each with the following possible values:
Because we are testing throughput from one cloud service to another, the results only include value, serviceId, dataCenter, numTests, earliestTest and latestTest. In the following examples we'll see when the other response values are used. The results for this example at the time of this writing are:
These results signify that AWS S3 US West region storage will generally provide the fastest uplink throughput from GoGrid US West cloud servers and may be the best service to use for backups (subject to other decision making criteria like price and support).
In the previous example we obtained network performance results based on our intercloud network testing. These tests are run periodically throughout the day to test throughput and latency between and within cloud services and Internet data centers. We also host a browser-based cloud speedtest to track throughput and latency between cloud services and primarily consumer-based high-speed Internet connections such as DSL and cable. Users of the cloud speedtest select one or more cloud services to test, a test file size (1-5MB for download tests or 0.5-2.5MB for upload tests) and a test to perform (uplink, downlink or latency). The speedtest then uploads/downloads the test file to/from the selected cloud services and displays the latency and/or throughput results. We use MaxMind's GeoIP databases to track where the user is (city, state, country), the name of their ISP, and their connection speed using their IP address. This is a generally reliable method for obtaining this data with accuracy of about 99.8%. In addition to allowing Internet users to run this test for free, we also pay about 1000 users per month to run the test using Amazon's Mechanical Turk. All of these results are stored in our database and accessible through the getNetworkBenchmarkResults web service.
In this example, we want to find the CDN (Content Delivery Network) with the lowest latency in Europe. We used the getRegions web service to discover that the region code for Europe is eu. CloudHarmony currently collects network benchmark metrics for about a dozen different CDNs. However, in this example, we've narrowed our CDN choices down to four: AWS CloudFront, MaxCDN, Edgecast or Akamai (resold by VPS.net). The request will be fairly simple, using the serviceId, testId, endpoint_region and metric parameters. The serviceId parameter supports multiple IDs, each separated by a pipe character, so we will use that to specify the IDs of each of these 4 CDNs.
At the time of this writing, the results from these benchmarks were:
So in this example, the clear winner was Akamai by a margin of about 35%. However, latency is not bad for any of these CDNs.
In this example, we'll use the endpoint_isp and endpoint_state parameters to view performance of the Internap and AWS CloudFront CDNs in California, grouped by ISP. The endpoint_isp parameter can either be the name (or partial name) of an ISP such as Verizon, or a wildcard character * to indicate that all ISPs should be returned in the results. In this example, we'll use the wildcard option so the results are grouped by ISP. We will also use the minNumTests parameter so that only results with at least 5 tests completed are returned. The order=asc parameter is also used signifying that the slowest ISPs will show first in the results.
The response includes the average downlink throughput value, name of the isp, and the region identifier (us_west_pacific for all results in this example). Because more than 10 results are returned, we'll have to use the ws-offset=10 parameter to view the second page of results, ws-offset=20 for the third page and so on.
In this example, we'll determine which out of a handful of CDNs provides the best overall downlink throughput in the APAC region. We used the getRegions web service to discover that the region code for APAC is asia_apac. In this example, we'll evaluate Akamai, Edgecast, CloudFront, Microsoft Azure CDN and Limelight (resold by Rackspace Cloud).
So in this example it appears that Microsoft's Azure CDN service provides almost 20% better downlink throughput in APAC countries with almost 500 tests recorded.
In this example, we'll use the endpoint_city parameter to determine which of a handful of cloud service providers has the best downlink throughput in New York City. We will evaluate the following cloud server providers: AWS EC2 (US East region), GoGrid (US East region), Storm on Demand, Speedyrails (Quebec, CA), VoxCLOUD (New York) and Rackspace Cloud Servers (Texas data center). Because we are dealing with multiple services and multiple data centers, the serviceId and dataCenter parameters need to correspond with the IDs of all of these services and their data center locations. The web service will ignore data centers that are not valid for a given service (e.g. only Speedyrails has a data center in Quebec and only Voxel has a data center in New York).
So, with a limited number of test results (fewer than 100 results should not be considered reliable), AWS EC2 US East, Speedyrails and VoxCLOUD New York appear to provide the fastest downlink throughput to New York City (primarily consumer) Internet connections.
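If you process these results in code, the 100-test rule of thumb can be applied as a simple filter on the numTests response value. The Python sketch below shows the idea. The sample rows are invented placeholders, not real benchmark results, and the function name is our own.

```python
def reliable_results(results, min_tests=100):
    """Keep results backed by at least `min_tests` tests, fastest first."""
    return sorted(
        (row for row in results if row.get("numTests", 0) >= min_tests),
        key=lambda row: row["value"],
        reverse=True,  # higher throughput values sort first
    )


# Invented placeholder rows for illustration only; not real benchmark data.
SAMPLE_RESULTS = [
    {"serviceId": "service-a", "value": 3.2, "numTests": 140},
    {"serviceId": "service-b", "value": 4.1, "numTests": 12},
    {"serviceId": "service-c", "value": 2.7, "numTests": 230},
]
```

Here service-b has the highest value but is dropped because it is backed by too few tests. For latency metrics, where lower is better, you would sort in ascending order instead.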
For now, we are offering free access to these web services for up to 10 requests per rolling 24-hour period. After 10 requests, you will receive a 503: Service Unavailable HTTP response. This is a beta service, and usage and terms are subject to change. If you would like an increased quota or professional support, please contact us. We'd of course also appreciate feedback and bug reports (send to info [at] cloudharmony.com).
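To avoid running into the 503 response unexpectedly, a client can keep its own count of recent requests. The following Python sketch tracks request times in a rolling window. It is only a local estimate: how the service itself measures the 24-hour period is not documented here, and the class name is our own.

```python
import time
from collections import deque


class RollingQuota:
    """Client-side estimate of a requests-per-rolling-window quota."""

    def __init__(self, max_requests=10, window_seconds=24 * 60 * 60, clock=time.time):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock          # injectable so the class can be tested
        self._stamps = deque()       # times of requests still inside the window

    def try_acquire(self):
        """Record a request and return True, or return False if over quota."""
        now = self._clock()
        # Forget requests that have aged out of the rolling window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) >= self._max:
            return False
        self._stamps.append(now)
        return True
```

Call try_acquire() before each request and skip or delay the request when it returns False. A 503 response from the service should still be handled, since the local count may drift from the server's.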