Over the years we've spent a good amount of time testing and thinking about how to compare cloud services. Some services, like content delivery networks (CDN), managed DNS and object storage are relatively easy because they have few deployment options and similar features between providers.
Comparing cloud compute or servers is a different story entirely. Because of the diverse deployment options and dissimilar features of different services, formulating relevant and fair comparisons is challenging to say the least. In fact, we've come to the conclusion that there is no perfect way to do it. This isn't to say that you can't - but if you do, or if you are handed a third party comparison to look over, there are some things you should keep in mind - and watch out for (we've seen some poorly constructed comparisons).
The purpose of this post is to highlight some of these considerations. To do so, I'll present actual comparisons from testing we've done recently on Amazon EC2, DigitalOcean, Google Cloud Platform, Microsoft Azure, Rackspace and SoftLayer. I won't declare any grand triumphs of one service over another (this is something we try to avoid anyway). Instead, I'll just present the numbers as we observed them with some cursory commentary, and let you draw your own conclusions. I'll also discuss value and some non-performance related factors you might want to consider.
This post is essentially a summary of a report we've published in tandem. The report covers the same topics, but in more detail. You can download the report for free.
Apples and Oranges
Before you start running tests, you first need to pick some compute instances from different services you are considering. Ideally your choices will be based on your actual requirements and workloads.
Some comparisons I've seen get off to a bad start here - picking instances that are essentially apples and oranges. For example, I still see studies that compare older Amazon EC2 m1 instances (which have been replaced with newer classes) to the latest and greatest of other services. I've also seen comparisons where instances have dissimilar CPU cores (CPU benchmark metrics are often proportional to cores). Watch out for these types of studies, because often the conclusions they come to are inaccurate and irrelevant.
Before comparing compute services, you should have an idea about the type of workloads you'd like to compare. Different workloads have different performance characteristics. For our comparisons, we based testing on two workloads - web and database servers.
For the web server workload, our focus was CPU, disk read and external network performance. Because web servers usually don't store mission critical data, we picked instances with generally faster local drives (SSD where possible). We looked for instances with a ratio of 1-2X CPU cores-to-memory (e.g. 1 core/2GB memory) - sufficient for many web server workloads.
For the database server workload, our primary focus was CPU, disk read and write, memory and internal network performance. Because database servers are typically a critical component in an application stack, we chose off instance storage instead of local storage because of its better resilience. If a host system fails, off instance storage volumes can usually be quickly restored on a new compute instance. We looked for instances with a ratio of 2-4X CPU cores-to-memory for this workload.
We picked three instance sizes for each workload - small, medium and large. In order to provide an apples-to-apples comparison, we chose from current instance classes with each service and matched the number of CPU cores precisely.
Based on the criteria above - the tables below show the instances we picked for each service and workload. More details on our reasoning for these selections are covered in the report.
CPU cores-to-memory ratios are often where it is nearly impossible to match services exactly. Our primary consideration was to match CPU cores - since this is what affects CPU benchmark metrics the most.
Web Server Comparisons
On July 1 2014, Amazon announced T2 instances with burstable CPU capacity. This instance class offers 1 to 2 CPU cores and provides bursting using a predictable, credit based method. CPU bursting is nothing new, but the T2 implementation with a predictable, credit based burst model is, and it offers good value for workloads that fall within its 10-20% bursting allowance. Because workloads with temporary bursting are common, we have included the t2.medium instance in the small web server CPU performance and value analysis (in addition to the c3.large instance type).
|Compute Service||Small Web Server||Medium Web Server||Large Web Server|
|Amazon EC2||c3.large + t2.medium||c3.xlarge||c3.2xlarge|
|DigitalOcean||4 GB / 2 Cores||8 GB / 4 Cores||16 GB / 8 Cores|
|Google Compute Engine||n1-highcpu-2||n1-highcpu-4||n1-highcpu-8|
|Microsoft Azure||Medium (A2)||Large (A3)||Extra Large (A4)|
|Rackspace||Performance 1 2GB||Performance 1 4GB||Performance 1 8GB|
|SoftLayer||2 GB / 2 Cores||4 GB / 4 Cores||8 GB / 8 Cores|
Database Server Comparisons
|Compute Service||Small Database Server||Medium Database Server||Large Database Server|
|DigitalOcean||8 GB / 4 Cores||16 GB / 8 Cores||48 GB / 16 Cores|
|Google Compute Engine||n1-standard-4||n1-standard-8||n1-standard-16|
|Microsoft Azure||Large (A3)||Extra Large (A4)||A9¹|
|Rackspace||Performance 2 15GB||Performance 2 30GB||Performance 2 60GB|
|SoftLayer||8 GB / 4 Cores||16 GB / 8 Cores||32 GB / 16 Cores|
1 The A9 instance is part of Azure's new compute intensive instance class. It is based on new hardware and a higher CPU cores-to-memory ratio (7X) than the other Azure instances included in the comparisons.
Once you've picked services and compute instances to compare, and workloads to focus on - the next step is to choose benchmarks that are relevant to those workloads. This is another area where I've seen some bad comparisons - often arriving at broad conclusions based on simplistic or irrelevant benchmarks.
The benchmarks we chose are SPEC CPU 2006 for CPU performance, fio for disk IO, STREAM for memory, and our own benchmarks (based on standard Linux tools) for internal and external network testing. More details about these benchmarks, and the runtime configurations we used are provided in the report.
The graphs below show some of the results from our testing. Complete results are available in the full report.
With the exception of Microsoft Azure (and occasionally SoftLayer), all of the services we tested use current generation Sandy Bridge or Ivy Bridge processors, resulting in similar CPU metrics.
SPEC CPU 2006 is an industry standard benchmark for measuring CPU performance using 29 different CPU test workloads. A higher number in these graphs represents better CPU performance. SPEC CPU 2006 has two components - integer and floating point. Only integer results are included because floating point results were similar. Complete results are available in the report.
Due to SPEC rules governing test repeatability (which cannot be guaranteed in virtualized, multi-tenant cloud environments), the results below should be taken as estimates.
In the web server tests Amazon EC2 edged out other services - likely due to slightly faster hardware - 2.8 GHz Ivy Bridge. This was particularly apparent for the t2.medium instance in peak/burst operating mode in the small web server comparisons. The report lists all CPU architectures we observed for each service. Azure predictably performed worse due to its use of older AMD 4171 processors.
In the database server tests Rackspace Performance 2 instances were slightly faster except for the Large Database Server where Azure's new A9 compute intensive instance performed best.
One factor many comparisons overlook is performance variability. Cloud services are usually virtualized and multi-tenant and these factors can introduce performance variability due to sharing of resources.
To capture CPU performance variability, we ran SPEC CPU multiple times on multiple instances from each service and measured the relative standard deviation from the resulting metrics. In these graphs a higher value represents greater variability.
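As a rough illustration of the variability metric described above, here is a minimal sketch of a relative standard deviation calculation. The scores are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, since the post does not say which was used.

```python
from statistics import mean, stdev


def relative_std_dev(samples):
    """Return the relative standard deviation of the samples, in percent."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    avg = mean(samples)
    if avg == 0:
        raise ValueError("mean is zero; relative standard deviation is undefined")
    # Sample standard deviation divided by the mean (an assumed convention).
    return stdev(samples) / abs(avg) * 100.0


# Hypothetical benchmark scores from repeated runs on several instances.
scores = [50.0, 52.0, 48.0, 51.0, 49.0]
print(f"RSD: {relative_std_dev(scores):.2f}%")
```

Because the metric is normalized by the mean, it can be compared across services whose absolute scores differ.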
For web server tests CPU variability was highest for SoftLayer and Rackspace. For SoftLayer the cause of this was their use of different CPU models, some older and some newer (we observed 6 different architectures on SoftLayer from 2009 X3470 to 2013 Ivy Bridge). For Rackspace and DigitalOcean the CPU architecture was consistent, so the reason may be resource contention and/or use of hyperthreading and floating cores.
For database server tests SoftLayer variability was again the highest due to changing CPU types across instances. Rackspace Performance 2 instances were less variable than Performance 1 instances.
Our disk performance tests were conducted using 6 block sizes (4k, 16k, 32k, 64k, 128k and 1024k) and asynchronous IO using an optimal queue depth for each service, workload and block size. For the web server comparisons our testing used 100% read workloads. For database server comparisons we used mixed read + write workloads (80% write and 20% read). More details about the fio test settings are provided in the report.
At the time of testing, Google, Microsoft and SoftLayer did not offer local SSD storage (Google recently announced external SSD storage). Because of this, web server disk performance for these services was notably slower compared to that of services with a local SSD storage option.
In these graphs, a higher value signifies faster performance. Our fio test results include 12 sets of tests per instance (6 block sizes with both random and sequential IO). For brevity, I've provided a sampling of these results in this post. The complete results are available in the report.
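The 12-test matrix described above (6 block sizes, each with random and sequential IO) can be sketched as a set of fio command lines. This is an illustration only: the queue depth, file path, size and runtime below are placeholder assumptions, not the settings used in our testing (those are in the report), and `libaio` is assumed as the asynchronous IO engine on Linux.

```python
from itertools import product

BLOCK_SIZES = ["4k", "16k", "32k", "64k", "128k", "1024k"]
PATTERNS = ["random", "sequential"]


def fio_args(workload, pattern, block_size, iodepth=32, filename="/tmp/fio-test"):
    """Build one fio command line; iodepth and filename are placeholders."""
    if workload == "web":
        # Web server tests: 100% read.
        rw = "randread" if pattern == "random" else "read"
        mix = []
    elif workload == "database":
        # Database tests: mixed IO, 20% read and 80% write.
        rw = "randrw" if pattern == "random" else "rw"
        mix = ["--rwmixread=20"]
    else:
        raise ValueError(f"unknown workload: {workload}")
    return [
        "fio",
        f"--name={workload}-{pattern}-{block_size}",
        f"--filename={filename}",
        f"--rw={rw}",
        f"--bs={block_size}",
        "--ioengine=libaio",  # asynchronous IO on Linux (assumed engine)
        f"--iodepth={iodepth}",
        "--direct=1",  # bypass the page cache
        "--size=1g",
        "--runtime=60",
        "--time_based",
        *mix,
    ]


def build_matrix(workload):
    """Return all 12 command lines (2 patterns x 6 block sizes)."""
    return [fio_args(workload, p, bs) for p, bs in product(PATTERNS, BLOCK_SIZES)]


for args in build_matrix("web"):
    print(" ".join(args))
```

In practice the queue depth would be tuned per service, workload and block size rather than fixed at one value.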
For the large web server random read tests, Amazon EC2 and Rackspace performed best. Although DigitalOcean also uses local SSD storage, its performance was consistently slower than that of other local SSD based services.
For the database server disk tests we used Amazon EC2 standard and EBS optimized instances with General Purpose (SSD) and provisioned IOPS (PIOPS) EBS volumes. For Rackspace we used both SSD and SATA off instance storage. DigitalOcean was excluded from the database server tests because they do not have an off instance storage option. For the small database server the Amazon EBS PIOPS volumes were provisioned for 1500 IOPS, and the Rackspace database servers used SATA storage volumes.
For the medium database server the Amazon EBS PIOPS volumes were provisioned for 3000 IOPS. The Rackspace database servers used SSD storage volumes.
For the large database server the Amazon EBS PIOPS volumes were provisioned for 4000 IOPS. The Rackspace database servers used SATA storage volumes.
Disk IO is a performance characteristic where variability can be very high for cloud environments. This is often due to the same physical disk drives being shared by multiple users. For off instance storage variability can be magnified due to network effects.
To measure disk performance consistency, we captured relative standard deviation of IOPS from many test iterations on multiple instances. In these graphs a higher value signifies greater variability. These graphs provide a subset of the analysis included in the report.
For the web server tests, DigitalOcean consistently demonstrated the highest variability of all services.
For all three database server test iterations Amazon EBS PIOPS volumes were the least variable, and Rackspace SATA volumes had the highest variability.
The medium Rackspace database servers used SSD external volumes, which were less variable than SATA volumes. IO for Amazon EBS PIOPS volumes was very consistent. Microsoft Azure consistency improved with this instance type - perhaps because it is a larger type.
Rackspace SATA external volumes demonstrated high variability in the large database server tests.
Memory performance is usually dependent on CPU architecture, with newer CPU models usually having faster memory. Unlike CPU and disk, memory is often not shared and thus provides consistent performance.
We ran memory benchmarks using the STREAM benchmark. STREAM uses multiple memory tests to measure memory bandwidth. In these graphs a higher value signifies faster memory bandwidth.
Memory performance results were similar for all database sizes with the exception of SoftLayer (due to use of multiple CPU types), so I've only included results for the medium and large database instances. Amazon EC2 and Google were consistently the fastest for memory tests.
The Microsoft Azure A9 instance performed well in the large database server group due to its use of newer Intel E5-2670 processors (as opposed to the old AMD 4171 processors used on other Azure instances).
External Network Performance
Web servers typically respond to requests from users over the Internet. The speed at which web servers respond is dependent on the distance between the user and web server, and the Internet connectivity provided by the service.
Measuring external network performance is complex due to a large number of variables (there are tens of thousands of ISPs and infinite routing possibilities). We host a cloud speedtest that lets users run network tests to measure their connectivity to different cloud services. Using a Geo IP database, we summarize the results of these tests regionally. These summaries are the basis for the data provided in the graphs below. In these graphs we provide mean latency between cloud services and users located in different geographical regions. A lower value signifies better latency - a shorter logical path between users and the service.
Compute services often allow users to provision instances in data centers located in different regions. These results are based on the optimal service region for each service and continent.
|Region||Amazon EC2||DigitalOcean||Google Compute Engine||Microsoft Azure||Rackspace|
|North America||us-east-1||NY2||us-central1||South Central US||ORD|
|South America||sa-east-1||NY2||us-central1||South Central US||DFW|
SoftLayer is excluded from these tests because at the time of writing, we did not have it available on the cloud speedtest.
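The regional summary described above can be sketched as follows: group latency samples by service, service region and user continent, take the mean of each group, and keep the service region with the lowest mean for each continent. The sample values are hypothetical and the function name is illustrative; this is not the code behind the cloud speedtest.

```python
from collections import defaultdict
from statistics import mean


def best_region_latency(samples):
    """Pick the lowest mean latency service region per (service, continent).

    samples: iterable of (service, service_region, user_continent, latency_ms).
    Returns {(service, user_continent): (service_region, mean_latency_ms)}.
    """
    grouped = defaultdict(list)
    for service, region, continent, latency_ms in samples:
        grouped[(service, continent, region)].append(latency_ms)

    best = {}
    for (service, continent, region), values in grouped.items():
        avg = mean(values)
        key = (service, continent)
        if key not in best or avg < best[key][1]:
            best[key] = (region, avg)
    return best


# Hypothetical latency samples (ms); user continents would come from Geo IP.
samples = [
    ("Amazon EC2", "us-east-1", "North America", 40.0),
    ("Amazon EC2", "us-east-1", "North America", 60.0),
    ("Amazon EC2", "sa-east-1", "North America", 150.0),
    ("Amazon EC2", "us-east-1", "South America", 140.0),
    ("Amazon EC2", "sa-east-1", "South America", 30.0),
    ("Amazon EC2", "sa-east-1", "South America", 50.0),
]
for key, (region, latency) in best_region_latency(samples).items():
    print(key, region, f"{latency:.1f} ms")
```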
Network connectivity in Asia and South America is generally slower than North America and Europe. Because of this, every service performed worse in those regions.
Internal Network Performance
Database servers often interact with other servers located in the same data center over the internal network. For example, a web server might query a database server to display dynamic content on a website. Because of this, we chose to measure internal network performance for the database workload. To do so, we used a custom test that uses ping and http to measure latency and throughput performance within the internal network of each service.
The data below is based on interaction between each of the three types of web and database server instances - small, medium and large. To maximize performance, we provisioned instances using the following optimal network settings for each service:
- Amazon EC2: VPC + placement group
- DigitalOcean: Private IP addresses in NY2 region
- Google: Use of the most recent asia-east1 data center
- Microsoft Azure: Affinity group and virtual private network
- Rackspace: Private IP addresses
- SoftLayer: Use of fastest network option - 1 Gb/s
For latency tests a lower value signifies better performance, while for throughput tests a higher value is better.
Use of Amazon EC2 placement groups and VPC provided significantly higher throughput compared to other services - nearly 9 Gb/s. SoftLayer and DigitalOcean networks are likely capped at 1 Gb/s.
For some services, uplink and downlink network throughput performance was asymmetrical. Rackspace for example appears to cap the uplink (outbound traffic) but not the downlink (inbound traffic) - even on the internal network interface. The caps listed on the Rackspace website range from 200 Mb/s to 1,600 Mb/s for Performance 1 compute instances.
Network latency for all services was less than 0.7ms. Amazon EC2 and SoftLayer had the best latency at between 0.10 and 0.15ms.
When picking a service - you may want to look for the best combination of performance and price - or in other words, the best value.
In this section, I estimate value for each service using CPU performance and hourly costs for each service and instance type. Because services sometimes offer different prices based on term commitments, I included the following term options (where applicable):
- On Demand: hourly with no commitment - the most common purchasing method
- 1 Month: purchase with a 1 month commitment
- 1 Year: purchasing with a 1 year commitment
- 3 Years: purchasing with a 3 year commitment
Value was carried over for services not offering a particular term option. All prices are normalized to an hourly rate. If a term option has a setup fee, that fee was added to the hourly cost by dividing it by the total number of hours in the term.
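The setup fee normalization described above amounts to a small calculation. Here is a minimal sketch; the prices are made up for illustration, and a year is assumed to be 8,760 hours.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored (assumption)


def normalized_hourly_cost(hourly_rate, setup_fee=0.0, term_hours=None):
    """Spread any setup fee across the term and add it to the hourly rate."""
    if not setup_fee:
        return hourly_rate
    if not term_hours:
        raise ValueError("term_hours is required when there is a setup fee")
    return hourly_rate + setup_fee / term_hours


# Hypothetical 1 year term: $438 setup fee plus $0.05 per hour.
cost = normalized_hourly_cost(0.05, setup_fee=438.0, term_hours=HOURS_PER_YEAR)
print(f"${cost:.4f} per hour")
```

On demand pricing has no setup fee, so the function simply returns the hourly rate in that case.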
The benchmark metric used for this value analysis is SPEC CPU 2006. The value metric provided is a ratio of SPEC CPU 2006 to the hourly cost for each service and instance type. More detail and data is available in the report.
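As a sketch of the value metric just described, the ratio is simply the benchmark score divided by the normalized hourly cost. The scores and prices below are invented for illustration and do not correspond to any service in this post.

```python
def value_metric(spec_score, hourly_cost):
    """Return benchmark score per dollar-hour; higher means better value."""
    if hourly_cost <= 0:
        raise ValueError("hourly_cost must be positive")
    return spec_score / hourly_cost


# Hypothetical (score, hourly cost) pairs for two instance types.
candidates = {
    "instance-a": (60.0, 0.10),
    "instance-b": (45.0, 0.06),
}
ranked = sorted(candidates, key=lambda name: value_metric(*candidates[name]), reverse=True)
for name in ranked:
    print(name, round(value_metric(*candidates[name]), 1))
```

Note that the slower instance ranks higher here because its price is proportionally lower, which is why value and raw performance can tell different stories.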
In peak/burst operating mode the t2.medium provides the best value by a significant margin in the small web server category. Keep in mind that this level of performance is achievable for between 10-20% of the total operational time. The baseline (non-burst) value is substantially lower (about 5X).
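To see why burst performance is only available for part of the time, here is a simplified token-bucket sketch of a credit based burst model. It is not Amazon's actual accounting: the earn rate, bucket size and the "one credit equals one minute of a full core" rule are all assumptions chosen for illustration.

```python
def simulate_burst(demand, earn_per_hour, max_credits, start_credits=0.0):
    """Return the CPU fraction delivered in each hour.

    demand: requested CPU per hour as a fraction of one full core (0.0-1.0).
    One credit is modeled as one minute of a full core (an assumption).
    """
    credits = start_credits
    delivered = []
    for want in demand:
        # Credits earned this hour are added first, capped at the bucket size.
        credits = min(max_credits, credits + earn_per_hour)
        spent = min(want * 60.0, credits)
        credits -= spent
        delivered.append(spent / 60.0)
    return delivered


# Hypothetical parameters: 6 credits/hour sustains 10% of a core indefinitely.
idle_then_busy = [0.0] * 10 + [1.0, 1.0, 1.0]
print(simulate_burst(idle_then_busy, earn_per_hour=6, max_credits=144))
```

In this toy model, ten idle hours bank enough credits for one hour at full speed, after which delivered CPU falls back toward the 10% baseline set by the earn rate.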
Google's new monthly discounts offer a good value without requiring any upfront setup fees (if you keep the instance live for a month you automatically get a discount). DigitalOcean provides good value for on-demand instances. Amazon EC2 is generally the best value for 1 or 3 year terms. Microsoft Azure value is low due to use of older CPU hardware resulting in poor performance on CPU benchmarks.
DigitalOcean value for database server instances was best for on-demand pricing, while Amazon EC2 was best for 1 and 3 year terms.
Although Microsoft Azure performed well in the large database server comparisons, its value was low due to its higher price tag (partially attributable to its 7X CPU cores-to-memory ratio).
When choosing a compute service, there are other things you should consider unrelated to performance or price. Depending on your workloads, these may be of greater or lesser importance.
Service reliability is likely important to any organization. Quantifying reliability is difficult because there are many possible points of failure. We maintain compute instances with most compute services and use them to track outages and measure availability. Although maintaining a single compute instance with each service isn't a foolproof method for measuring availability, we do manage to capture many service related outages this way. The table below shows our availability data for each service over the past 30 days. These metrics are based on mean availability for each service across all service regions. Our cloud status page provides realtime status and lets you view availability and outages over different periods.
|Service||30 Day Availability||Outages||Downtime|
|Google Compute Engine||99.9862%||10||17.85 mins|
|Microsoft Azure||99.9922%||23||25.5 mins|
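As a sketch of how a mean availability figure like those above can be derived from downtime, the calculation below divides total downtime by the total monitored time across instances. The instance count of three for Google is an assumption (one per data center in the locations table later in this post); with it, the calculation happens to reproduce the Google figure, though the actual methodology may differ.

```python
MINUTES_PER_DAY = 24 * 60


def mean_availability(total_downtime_min, instances, days=30):
    """Mean availability (percent) across equally weighted monitored instances."""
    if instances <= 0 or days <= 0:
        raise ValueError("instances and days must be positive")
    monitored_min = instances * days * MINUTES_PER_DAY
    return (1.0 - total_downtime_min / monitored_min) * 100.0


# 17.85 minutes of downtime over 30 days, assuming 3 monitored instances.
print(f"{mean_availability(17.85, instances=3):.4f}%")
```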
Most compute services impose limits on the number of instances you can have active at a given time. These limits may affect you if you experience rapid growth, or have elastic workloads with high usage requirements during peak times. Although you can typically request an increase to quotas, the default limits can convey the scale at which a service operates. Services operating at a larger scale may be better able to support rapid growth and elastic workloads.
Each service typically has a different procedure for obtaining quota increases. Our experience with these has been mixed across services. With Amazon and Google, for example, we have often obtained increases within hours, while with some others responses have been slower and capacity more limited.
These tables list both the quota policies and how they would affect provisioning of compute instances of different sizes.
|Amazon EC2||20 instances per region for most compute instance types|
|DigitalOcean||10 compute instances (Droplets)|
|Google Compute Engine||24 CPU cores per region|
|Microsoft Azure||20 CPU cores|
|Rackspace||128 GB memory|
|SoftLayer||20 compute instances per day, additional instances reviewed manually for approval|
|Instance Size||Amazon EC2||DigitalOcean||Google Compute Engine||Microsoft Azure||Rackspace||SoftLayer|
|2 GB / 1 Core||160 (20 per region)||10||72 (24 per region)||20||64||20 daily¹|
|4 GB / 2 Core||160 (20 per region)||10||36 (12 per region)||10||32||20 daily|
|8 GB / 4 Core||160 (20 per region)||10||18 (6 per region)||5||16||20 daily|
|16 GB / 8 Core||160 (20 per region)||10||9 (3 per region)||2||8||20 daily|
|32 GB / 16 Core||160 (20 per region)||10||3 (1 per region)||1||4||20 daily|
1 Certain SoftLayer regions have more cloud capacity than others. We have experienced provisioning requests in some regions being cancelled due to lack of capacity, while in others they are successful.
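The way a quota policy translates into a maximum instance count can be sketched as follows. The quota limits come from the policy table above; the region counts (8 for Amazon EC2, 3 for Google) are inferred from the totals in the table and should be treated as assumptions. SoftLayer is left out because its limit is per day rather than a fixed cap.

```python
# kind: what the quota counts; limit: default limit per region.
QUOTAS = {
    "Amazon EC2": {"kind": "instances", "limit": 20, "regions": 8},
    "DigitalOcean": {"kind": "instances", "limit": 10, "regions": 1},
    "Google Compute Engine": {"kind": "cores", "limit": 24, "regions": 3},
    "Microsoft Azure": {"kind": "cores", "limit": 20, "regions": 1},
    "Rackspace": {"kind": "memory_gb", "limit": 128, "regions": 1},
}


def max_instances(cores, memory_gb, quota):
    """Return how many instances of the given size fit within a quota."""
    kind, limit = quota["kind"], quota["limit"]
    if kind == "instances":
        per_region = limit
    elif kind == "cores":
        per_region = limit // cores
    elif kind == "memory_gb":
        per_region = limit // memory_gb
    else:
        raise ValueError(f"unknown quota kind: {kind}")
    return per_region * quota["regions"]


# Instance sizes from the table, as (CPU cores, memory in GB).
for cores, memory_gb in [(1, 2), (2, 4), (4, 8), (8, 16), (16, 32)]:
    row = {name: max_instances(cores, memory_gb, q) for name, q in QUOTAS.items()}
    print(f"{memory_gb} GB / {cores} Core:", row)
```

Core and memory based quotas shrink quickly as instance size grows, while instance count quotas stay flat regardless of size.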
Compute services often offer different storage options. These options may include:
- Local Volumes: Generally faster than off instance because they are directly connected to the host. They are often less resilient, however, because if the host system fails, the data is lost until it can be restored.
- Off Instance Volumes: More resilient and fault tolerant than local storage - if a host system fails, off instance volumes can often be quickly restored on a new instance.
- Drive Types: Sometimes services offer different storage tiers based on drive type - SSD for better performance, and SATA for larger capacity.
- Multiple Volumes: Does the service let you mount multiple storage volumes to a single compute instance, or is the disk capacity fixed? Usually local storage based services have a fixed drive capacity based on instance size.
- Snapshots: Does the service let you take snapshots or backups of disk volumes?
- Provisioned IO: Does the service let you provision a dedicated amount of IO? Provisioned IO provides improved performance consistency.
This table shows support of each of these storage options by each service in this post:
|Compute Service||Local||External||Drive Types||Multiple Volumes||Snapshots||Provisioned IO|
|Amazon EC2||Yes||Yes||SATA; SSD||Yes||Yes||Yes|
|Google Compute Engine||Yes (beta)||Yes||Unknown¹||Yes||Yes||No|
1 Google recently announced a preview of SSD based off instance storage
Another consideration for compute services is the networking capabilities. The following are a few such considerations:
- IPv6: Does the service support IPv6?
- Multiple IPv4: Does the service let you assign multiple IPv4 addresses to a single compute instance?
- Private IP: Does the service provide private/internal IP addressing wherein compute instances can communicate without crossing public links and without incurring network usage charges?
- Load Balancing: Does the service allow you to load balance multiple servers?
- Health Checks: When Load Balancing is used - can the instances be monitored and automatically taken out of circulation if they become unresponsive?
This table shows support of each of these networking capabilities by each service in this post:
|Compute Service||IPv6 Support||Multiple IPv4||Private IP||Load Balancing||Health Checks|
|DigitalOcean||No||No||Partial (3 of 6 regions)||No||No|
|Google Compute Engine||No||No||Yes||Yes||Yes|
1 IPv6 supported when used with elastic load balancer (ELB) only
2 Must request with support ticket with Rackspace and provide valid justification
Data Center Locations
If your users are dispersed globally - or located primarily in a specific geographical region, you may want to consider where a service's data centers are located. The table below lists the number of data centers for each service on 5 continents.
|Compute Service||North America||South America||Europe||Asia||Australia|
|Google Compute Engine||1||0||1||1||0|
Security is another concern many have with the cloud. A service's security capabilities may be an important consideration. Although operating systems usually support security features and software based firewalls, it is often better to deal with security separately, outside of the operating system entirely. Some common security features include:
- Firewall: Does the service provide an external firewall to filter network traffic before it reaches your compute instance? When used, a compute instance's public IP address sits on the firewall, which in turn forwards permitted network traffic to the instance.
- VPN: Does the service support secure connectivity between compute instances and an external network? VPN allows you to connect to your public cloud compute instances securely using private IP addressing.
- VPC: Does the service support physically or logically isolated networks? This means your compute instances can communicate without the ability of other users to snoop on your traffic.
- PCI DSS Compliance: Has the service been audited and certified as compliant with the Payment Card Industry Data Security Standard (PCI DSS)? This is often a requirement if you intend to capture and/or store credit card numbers on your servers.
This table lists support for these security features by each compute service in this post:
|Compute Service||Firewall||VPN||VPC||PCI DSS|
|Google Compute Engine||Yes||No||Yes||No|
1 Requires additional subscription to Brocade Vyatta vRouter starting at $160 per month for each compute instance
If you are using a cloud compute service, you likely use other types of cloud services. When evaluating compute services you should consider the provider's ability to fulfill other hosting requirements you may have. Some other types of cloud services include:
- Object Storage: Lets you read and write files on a common platform concurrently accessible by multiple clients.
- Content Delivery Network (CDN): A distributed network of servers you can use to more efficiently host content. When users request content from a CDN, they are routed to the closest server which reduces delivery time and offloads traffic from your servers.
- Managed DNS: Converts textual hostnames to numeric IP addresses. Managed DNS services host global networks of servers to respond to queries more efficiently.
- Database-as-a-Service (DbaaS): Most application stacks include a database server. Some services provide managed database hosting that deals with many common complexities - setup, configuration, replication, backups, patches, upgrades, etc.
- Platform-as-a-Service (PaaS): PaaS is a full stack for deploying an application including web servers, database, and other supporting software.
This table lists support for each of these other cloud service types by each provider included in this post:
|Compute Service||Object Storage||CDN||DNS||DbaaS||PaaS|
1 Rackspace resells Akamai CDN using a subset of 219 Akamai POPs
2 SoftLayer resells Edgecast CDN
Comparing compute services is a challenging task. I've covered a lot of ground in this post including how to properly choose instances from different services, picking relevant benchmarks, some actual comparisons of services, estimating value, and other considerations. The biggest takeaway I'd hope for is a better understanding of how to compare compute services accurately, and how to identify comparisons that are of questionable quality.
It should also be noted that because compute services are frequently updated, the validity of the benchmark metrics in this post is time limited.
If you'd like to know more, the full report download contains 120 pages of graphs, tables and additional commentary.
Correction 7/16/2014: Post and report have been updated to reflect availability of local storage, reduced pricing, and a compute region in South America for Microsoft Azure.