NOTE: This post has been updated since its original writing. The original CPU performance metrics did not accurately depict performance on multi-core servers. The updated post uses an improved method of calculating CPU performance that applies more weight to multi-core aware benchmarks (see benchmarks description below for more info).
Over the past couple of months we’ve spent some time benchmarking about 150 different cloud server configurations with 20 different vendors. This included all 8 AWS EC2 instance types (m1.small – m2.4xlarge) in all 4 regions (32 servers total for EC2). The benchmark suite we ran includes about 100 different benchmarks, from synthetic benchmarks measuring raw CPU performance such as Unixbench and Geekbench to higher level application benchmarks such as mysql-bench, pgbench, tpcc-mysql and blog bench. This post is the first in a series highlighting the results of these benchmarks. In it, we’ll focus purely on raw CPU performance. Future posts will focus on other aspects of performance such as disk IO and application specific performance metrics.
We believe choosing a cloud provider should be based on a variety of factors including performance, price, support, reliability/uptime, scalability, network performance and features. We’ve previously written a few posts regarding network performance and continue to compile network performance and uptime statistics for most of the major cloud providers. With all of the hype surrounding the cloud, our goal is to provide objective information and analysis to enable educated decisions pertaining to the adoption of, and migration to, cloud services.
All benchmarked cloud servers were configured almost identically in terms of OS and software: CentOS 5.4 64-bit (or 32-bit in the case of EC2 m1.small and c1.medium and IBM’s Development Cloud, where 64-bit is not supported).
Most IaaS/server clouds are based on hypervisor/virtualization technology and run in multi-tenant environments (multiple virtual servers running on a single physical host). Different hypervisors support different methods of CPU allocation/sharing including fixed/weighted, burstable, and others. Because of this, it is difficult to compare CPU performance in different clouds. Vendors often use different terminology to define cloud server CPUs including ECU (EC2), VPU (vCloud), GHz (KVM), CPUs, Cores, and more. Many provide an approximation of how that terminology relates to physical resources (e.g. 1 ECU = 1.0-1.2 GHz 2007 Xeon), but this is generally not sufficient for an objective comparison of providers.
Amazon’s EC2, in addition to being one of the oldest and most mature cloud server platforms, also provides clearly defined CPU tiers across its 8 different instance sizes. These are defined in terms of ECUs (EC2 Compute Units), where 1 ECU is the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. Their instance sizes include the following:
- Small/m1.small (32-bit) = 1 ECU
- Large/m1.large = 4 ECUs
- High-CPU Medium/c1.medium (32-bit) = 5 ECUs
- High-Memory Extra Large/m2.xlarge = 6.5 ECUs
- Extra Large/m1.xlarge = 8 ECUs
- High-Memory Double Extra Large/m2.2xlarge = 13 ECUs
- High-CPU Extra Large/c1.xlarge = 20 ECUs
- High-Memory Quadruple Extra Large/m2.4xlarge = 26 ECUs
With a few exceptions, most of our CPU benchmarks showed clear upward scaling (although not always proportional) from m1.small (1 ECU) up to m2.4xlarge (26 ECUs). Because of these factors, we feel that the ECU provides a good, standard, understandable metric for comparing cloud servers not only within EC2, but also across other IaaS clouds. However, although our metric is based on the ECU, there will be some subtle differences (as described below), so we will refer to this new metric as a CCU (CloudHarmony Compute Unit).
To calculate the CCU metric, we selected 19 CPU benchmarks that showed clear upward scaling from smaller to larger EC2 instances. We use the average of the highest scores in all 4 EC2 regions (generally from the m2.4xlarge 26 ECU instance) to produce a 100% baseline for each of these 19 benchmarks. Each instance is then assigned a relative score as a ratio of that instance’s score to the highest average score (<= 100). The relative scores for all 19 benchmarks are then aggregated to produce a CPU comparison score (CCS) for each instance. In calculating the CCS, some benchmarks are weighted higher than others. For example, the Geekbench and Unixbench results are weighted 200 points each for the baseline, while each of the opstone and john-the-ripper tests is weighted 33 points (the remaining benchmarks are all weighted 100). We then use these results to create a CCU evaluation table where the left column is the number of CCUs and the right column is the average CCS corresponding to that CCU value. This table is populated with 5 rows, one for each of the EC2 instance sizes m1.small, m1.large, m2.xlarge, m2.2xlarge and m2.4xlarge, using an average of the CCS for those instances in all 4 regions. Once the comparison table is populated, we use the same algorithm to compute CCS values for every cloud server benchmarked. To translate from CCS to CCU, we find the row(s) whose CCS in the right column most closely matches the CCS of a given cloud server, and then read the CCU value from the left column (if the CCS falls between 2 rows, a proportional CCU value is calculated).
Example CCU Evaluation Table
Example CCU Calculation
Eval_CCS = 1524
Enter table from top and find first row where Eval_CCS >= Column_2 (Row 2)
Since Eval_CCS resides between rows 1 and 2, find the proportional midpoint between ECUs:
CCU = 13 + ((1524 – 1405)/(1574 – 1405))*(26 – 13)
CCU = 13 + 9.15
CCU = 22.15
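As a rough illustration, the lookup in the example above can be sketched in Python. This is a sketch under stated assumptions, not the code used to produce the results in this post: the function name `ccs_to_ccu` and the table layout are hypothetical, only the two table rows that appear in the worked example (13 CCUs at a CCS of 1405 and 26 CCUs at a CCS of 1574) are included, and raising an error for a CCS outside the table is a placeholder, since the handling of such values is not described here.

```python
from typing import List, Tuple

# Rows are (CCU, average CCS). Only the two rows used in the worked example
# are known from the text; the real evaluation table has 5 rows.
EXAMPLE_TABLE: List[Tuple[float, float]] = [
    (26.0, 1574.0),
    (13.0, 1405.0),
]


def ccs_to_ccu(ccs: float, table: List[Tuple[float, float]]) -> float:
    """Linearly interpolate a CCU value from a CCS (hypothetical helper).

    Assumes every row has a distinct CCS value.
    """
    # Walk the table from the highest CCS row to the lowest.
    rows = sorted(table, key=lambda row: row[1], reverse=True)
    for (ccu_hi, ccs_hi), (ccu_lo, ccs_lo) in zip(rows, rows[1:]):
        if ccs_lo <= ccs <= ccs_hi:
            # Fraction of the way from the lower row to the upper row.
            fraction = (ccs - ccs_lo) / (ccs_hi - ccs_lo)
            return ccu_lo + fraction * (ccu_hi - ccu_lo)
    # Assumption: values outside the table are not handled in this sketch.
    raise ValueError("CCS is outside the range covered by the table")


print(round(ccs_to_ccu(1524, EXAMPLE_TABLE), 2))  # 22.15
```

The printed value matches the hand calculation: 13 + (119 / 169) * 13 is approximately 22.15.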
During EC2 benchmarking we observed the CPU architecture reported for each instance type:
- Small (m1.small) – US East Region Only: AMD Opteron 2218 2.6 GHz
- Small, Large, Extra Large (m1.small, m1.large, m1.xlarge): Xeon E5430 “Harpertown” 2.66 GHz
- High-CPU Medium and Extra Large (c1.medium, c1.xlarge): Xeon E5410 “Harpertown” 2.33 GHz
- High-Memory Extra Large, 2 Extra Large, 4 Extra Large (m2.xlarge, m2.2xlarge, m2.4xlarge): Xeon X5550 “Nehalem” 2.66 GHz
We also noted the following EC2 ECU discrepancies with most of the 19 CPU performance benchmarks performed:
- The m1.large (4 ECU) always outperformed the High-CPU c1.medium (5 ECU) instance. This might be attributed to the m1.large being 64-bit vs 32-bit for the c1.medium
- Even the lowest High-Memory instance (m2.xlarge – 6.5 ECU) outperforms the larger m1.xlarge (8 ECU) and c1.xlarge (20 ECU) instances in many cases. This is most likely due to the newer and faster “Nehalem” CPUs used by the High-Memory instances
- The performance increase between m2.2xlarge (13 ECU) and m2.4xlarge (26 ECU) was minimal (15-20% higher based on CCU)
Because of these discrepancies, we used only the m1.small, m1.large, m2.xlarge, m2.2xlarge and m2.4xlarge instance sizes to create the CCU comparison table used to calculate CCUs for other cloud servers.
The following are the 19 benchmarks we use to compute the CCU comparison metrics (benchmarks prefixed with ** are multi-core aware):
- **c-ray [weight=100]: This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image.
- **crafty [weight=100]: Crafty is a popular open-source chess engine that can be used to benchmark your CPU speed and is part of the SPEC2000 benchmark. The benchmark itself is very basic. It analyzes pre-determined chess game positions, calculates the number of “nodes” (moves) per second until a certain “depth” is reached, and displays the total NPS as well as the average NPS.
- dcraw [weight=100]: This test times how long it takes to convert several high-resolution RAW NEF image files to PPM image format using dcraw.
- espeak [weight=100]: This test times how long it takes the eSpeak speech synthesizer to read Project Gutenberg’s The Outline of Science and output to a WAV file.
- **geekbench [weight=200]: Geekbench provides a comprehensive set of benchmarks engineered to quickly and accurately measure processor and memory performance. Designed to make benchmarks easy to run and easy to understand, Geekbench takes the guesswork out of producing robust and reliable benchmark results.
- **graphics-magick [weight=100]: This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests to stress the system’s CPU.
- **hmmer [weight=100]: This test searches through the Pfam database of profile hidden Markov models. The search finds the domain structure of Drosophila Sevenless protein.
- john-the-ripper-blowfish [weight=33]: This is a benchmark of John The Ripper, which is a password cracker.
- john-the-ripper-des [weight=33]: This is a benchmark of John The Ripper, which is a password cracker.
- john-the-ripper-md5 [weight=33]: This is a benchmark of John The Ripper, which is a password cracker.
- mafft [weight=100]: This test performs an alignment of 100 pyruvate decarboxylase sequences.
- nero2d [weight=100]: This is a test of Nero2D, which is a two-dimensional TM/TE solver for Open FMM. Open FMM is a free collection of electromagnetic software for scattering at very large objects. This test profile times how long it takes to solve one of the included 2D examples.
- **openssl [weight=100]: This test measures the RSA 4096-bit performance of OpenSSL.
- opstone-svd [weight=33]: CPU Singular Value Decomposition test.
- opstone-svsp [weight=33]: CPU Sparse-Vector Scalar Product test.
- opstone-vsp [weight=33]: CPU Vector Scalar test
- sudokut [weight=100]: This is a test of Sudokut, which is a Sudoku puzzle solver written in Tcl. This test measures how long it takes to solve 100 Sudoku puzzles.
- tscp [weight=100]: CPU performance benchmark based on TSCP (Tom Kerrigan’s Simple Chess Program).
- **unixbench [weight=200]: UnixBench provides a basic indicator of the performance of a Unix-like system. Multiple tests are used to test various aspects of the system’s performance. These test results are then compared to the scores from a baseline system to produce an index value, which is generally easier to handle than the raw scores. The entire set of index values is then combined to make an overall index for the system. The parallel results are used when multiple CPUs exist for a cloud server.
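The weights listed above can be combined into a CCS roughly as follows. This is a minimal sketch, not the exact calculation behind the published numbers: the function name `compute_ccs` is hypothetical, every score is assumed to be "higher is better" (timed benchmarks would first need converting, and how that is done is not described here), and the baseline and sample scores in the usage lines are made-up placeholders.

```python
# Benchmark weights exactly as listed above (19 benchmarks).
WEIGHTS = {
    "c-ray": 100, "crafty": 100, "dcraw": 100, "espeak": 100,
    "geekbench": 200, "graphics-magick": 100, "hmmer": 100,
    "john-the-ripper-blowfish": 33, "john-the-ripper-des": 33,
    "john-the-ripper-md5": 33, "mafft": 100, "nero2d": 100,
    "openssl": 100, "opstone-svd": 33, "opstone-svsp": 33,
    "opstone-vsp": 33, "sudokut": 100, "tscp": 100, "unixbench": 200,
}


def compute_ccs(scores, baselines, weights=WEIGHTS):
    """Sum each benchmark's weight scaled by score / baseline.

    `scores` and `baselines` map benchmark name -> score, where the baseline
    is the 100% reference value for that benchmark. Assumes higher scores
    are better (an assumption of this sketch).
    """
    total = 0.0
    for name, weight in weights.items():
        total += weight * (scores[name] / baselines[name])
    return total


# Hypothetical data: a server scoring half the baseline on every benchmark.
baselines = {name: 1000.0 for name in WEIGHTS}
scores = {name: 500.0 for name in WEIGHTS}
print(sum(WEIGHTS.values()))            # 1698, the CCS of a server at 100% everywhere
print(compute_ccs(scores, baselines))   # 849.0
```

With these weights, a server that exactly matches the baseline on all 19 benchmarks would have a CCS of 1698.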
We credit the Phoronix Test Suite for making it easier to run many of these benchmarks.
The following results are broken down by cloud server vendor. If the vendor uses multiple data centers, multiple tables are displayed, one for each data center. A total of 140 different cloud server configurations are included in this post. Each table shows our server identifier, the CPU architecture our benchmark server was placed on, the amount of memory for the server (in GB), the hourly price, raw Geekbench results linked to the full results page, raw Unixbench results linked to the full results page, and finally, the CCU score for that server instance.
EC2 is one of the oldest, most widely used, and mature cloud server platforms. EC2 currently supports 4 regions (US East, US West, EU West and APAC) and 10 availability zones. Each region consists of 2 or more availability zones each of which is basically a different physical data center in close proximity to the other availability zone in that region. EC2 uses the EC2 Compute Unit (ECU) term to describe CPU resources for each instance size where one ECU provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.
Although the CCU metric is based on EC2’s ECU, the comparison table used to compute CCUs is based on only 5 instance sizes (m1.small, m1.large, m2.xlarge, m2.2xlarge and m2.4xlarge) and average scores from all 4 regions. Because of this, an EC2 instance’s CCU will not be precisely equal to its ECU allocation.
EC2 offers multiple pricing options including straight hourly, reserved (upfront reserve fee in exchange for a lower hourly rate), and spot (bid pricing). The pricing shown in these tables is for straight hourly pricing.
|Amazon Web Services (AWS) [US East]|
|Server||CPU||Memory (GB)||Price (USD)||Geekbench||Unixbench||CCU|
|m2.4xlarge [26 ECUs]||Xeon X5550||68.4||2.4/hr||5877||1511||27.25|
|m2.2xlarge [13 ECUs]||Xeon X5550||34.2||1.2/hr||5163||1332||14.89|
|c1.xlarge [20 ECUs]||Xeon E5410||7||0.68/hr||5118||780||8.78|
|m2.xlarge [6.5 ECUs]||Xeon X5550||17.1||0.5/hr||4049||932.1||7.05|
|m1.xlarge [8 ECUs]||Xeon E5430||15||0.68/hr||4256||938.6||5.15|
|m1.large [4 ECUs]||Xeon E5430||7.5||0.34/hr||3092||663.4||4.08|
|c1.medium [5 ECUs]||Xeon E5410||1.7||0.17/hr||2635||776.2||3.43|
|m1.small [1 ECU]||Opteron 2218||1.7||0.085/hr||1726||179.7||0.92|
|Amazon Web Services (AWS) [US West]|
|Server||CPU||Memory (GB)||Price (USD)||Geekbench||Unixbench||CCU|
|m2.4xlarge [26 ECUs]||Xeon X5550||68.4||2.68/hr||6109||1520||27.45|
|m2.2xlarge [13 ECUs]||Xeon X5550||34.2||1.34/hr||5475||1329.2||15.9|
|c1.xlarge [20 ECUs]||Xeon E5410||7||0.76/hr||4693||785||8.21|
|m2.xlarge [6.5 ECUs]||Xeon X5550||17.1||0.57/hr||3883||945.4||6.85|
|m1.xlarge [8 ECUs]||Xeon E5430||15||0.76/hr||4185||916.8||5.14|
|m1.large [4 ECUs]||Xeon E5430||7.5||0.38/hr||3026||643.4||3.95|
|c1.medium [5 ECUs]||Xeon E5410||1.7||0.19/hr||2962||715.5||3.45|
|m1.small [1 ECU]||Xeon E5430||1.7||0.095/hr||1312||277.2||1.04|
|Amazon Web Services (AWS) [EU West]|
|Server||CPU||Memory (GB)||Price (USD)||Geekbench||Unixbench||CCU|
|m2.4xlarge [26 ECUs]||Xeon X5550||68.4||2.68/hr||6188||1489.7||27.45|
|m2.2xlarge [13 ECUs]||Xeon X5550||34.2||1.34/hr||5368||1337.6||15.19|
|c1.xlarge [20 ECUs]||Xeon E5410||7||0.76/hr||4926||787||8.55|
|m2.xlarge [6.5 ECUs]||Xeon X5550||17.1||0.57/hr||3836||945.9||6.79|
|m1.xlarge [8 ECUs]||Xeon E5430||15||0.76/hr||4147||934.7||5.14|
|m1.large [4 ECUs]||Xeon E5430||7.5||0.38/hr||3160||644||4.12|
|c1.medium [5 ECUs]||Xeon E5410||1.7||0.19/hr||2710||730.5||3.43|
|m1.small [1 ECU]||Xeon E5430||1.7||0.095/hr||1288||277.3||1.02|
|Amazon Web Services (AWS) [APAC]|
|Server||CPU||Memory (GB)||Price (USD)||Geekbench||Unixbench||CCU|
|m2.4xlarge [26 ECUs]||Xeon X5550||68.4||2.68/hr||3472||1465.8||12.46|
|m2.2xlarge [13 ECUs]||Xeon X5550||34.2||1.34/hr||3357||1228.4||9.62|
|m2.xlarge [6.5 ECUs]||Xeon X5550||17.1||0.57/hr||3469||821.2||6.27|
|c1.xlarge [20 ECUs]||Xeon E5410||7||0.76/hr||3042||906.1||5.66|
|m1.xlarge [8 ECUs]||Xeon E5430||15||0.76/hr||3271||867.4||4.65|
|m1.large [4 ECUs]||Xeon E5430||7.5||0.38/hr||2594||598.9||3.92|
|c1.medium [5 ECUs]||Xeon E5410||1.7||0.19/hr||2796||415.8||3.23|
|m1.small [1 ECU]||Xeon E5430||1.7||0.095/hr||1522||183.5||1.02|
Rackspace Cloud servers showed a very flat CPU performance variation between server instance sizes. All instances we benchmarked in both data centers utilized homogenous Opteron 2374 “Shanghai” 2.2 GHz hardware. Rackspace states that their CPU provides a minimum allocation based on instance size with bursting allowed for all instance sizes.
We tested servers in both their Dallas and newer Chicago data centers. However, Rackspace does not allow users to choose which data center to deploy servers to. Their use of multiple data centers appears to deal more with capacity issues than with offering user choice. When you create an account, that account is assigned to a specific data center, and from that point forward you will only have the option to deploy to that assigned data center.
|Rackspace Cloud [Dallas]|
|Rackspace Cloud [Chicago]|
Storm on Demand was launched a few months ago by Liquid Web. They provide 2 types of cloud servers. The first is a traditional 2-48GB cloud server running on multi-tenant hosts. The second, called “Bare Metal”, allows you to select specific dedicated hardware (CPU, SATA or SAS disks, memory) to deploy your server on. Bare Metal servers are still virtualized, but do not share the underlying hardware with any other server instances.
Of all the IaaS vendors we reviewed, Storm offers by far the most diverse heterogeneous infrastructure. This approach pays off big in terms of performance, with 10 servers that scored 20+ CCUs. The only mismatched hardware we discovered was the 16GB cloud server running on Opteron 2350 hardware, which performed poorly compared with the smaller 2, 4 and 8 GB servers. However, Storm has informed us that their 16GB Opteron 2350 hardware is being upgraded to Opteron 2378, which should improve performance on future benchmarks.
Storm’s 48GB cloud server was the top performer out of all of our benchmarked servers with 42.87 CCUs and a Geekbench score of 13020! This is most likely due to the very new and extremely fast Xeon X5650 “Westmere” hardware it runs on. The Intel i5 CPUs also performed very well with our CPU benchmarks and provide an excellent performance to price ratio (26.6 CCUs for $0.171/hr)!
|Storm Cloud [MI, US]|
|Server||CPU||Memory (GB)||Price (USD)||Geekbench||Unixbench||CCU|
|Cloud: 48gb||Xeon X5650||45.9||1.37/hr||13020||4448.5||42.87|
|Bare Metal: x3440-8gb||Xeon X3440||8||0.274/hr||7597||2760.1||27.39|
|Bare Metal: e5506x2-4gb||Xeon E5506||8||0.274/hr||7742||2906.7||27.24|
|Bare Metal: e5506x2-4gb||Xeon E5506||8||0.322/hr||7704||2880.1||27.13|
|Bare Metal: e5506x2-8gb||Xeon E5506||8||0.391/hr||7753||2837.5||26.72|
|Bare Metal: i5-750-2gb||Core i5 750||2||0.171/hr||6417||2464.8||26.6|
|Bare Metal: i5-750-4gb||Core i5 750||4||0.206/hr||6413||2450.1||26.47|
|Bare Metal: e5506x2-8gb||Xeon E5506||8||0.48/hr||7686||2811.5||26.51|
|Cloud: 32gb||Opteron 2378||30.4||0.69/hr||8326||2373.3||26.4|
|Cloud: 8gb||Xeon X3440||7||0.27/hr||6162||2284.7||21.41|
|Cloud: 4gb||Core i5 750||3.5||0.14/hr||4555||1465.1||9.33|
|Bare Metal: amd2350x2-32gb||Opteron 2350||32||0.713/hr||6531||1803.6||7.31|
|Cloud: 2gb||Core 2 Q9400||1.7||0.07/hr||3062||629.7||5.01|
|Cloud: 16gb||Opteron 2350||15.2||0.34/hr||4034||1214||4.73|
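The performance to price ratio mentioned for Storm can be worked out directly from the table rows. The snippet below is only an illustrative calculation over four rows copied from the table above. The "CCUs per dollar-hour" figure and the `ccu_per_dollar` helper are our own hypothetical shorthand for this example, not a metric defined elsewhere in this post.

```python
# (server, hourly price in USD, CCU) copied from the Storm table above.
STORM_ROWS = [
    ("Cloud: 48gb", 1.37, 42.87),
    ("Bare Metal: x3440-8gb", 0.274, 27.39),
    ("Bare Metal: i5-750-2gb", 0.171, 26.6),
    ("Cloud: 8gb", 0.27, 21.41),
]


def ccu_per_dollar(price_per_hour: float, ccu: float) -> float:
    """CCUs obtained per dollar of hourly spend (hypothetical helper)."""
    return ccu / price_per_hour


# Rank the sample rows from best to worst value.
ranked = sorted(
    STORM_ROWS,
    key=lambda row: ccu_per_dollar(row[1], row[2]),
    reverse=True,
)
for server, price, ccu in ranked:
    print(f"{server}: {ccu_per_dollar(price, ccu):.1f} CCUs per $/hr")
```

Among these four rows the i5-750-2gb server ranks first at roughly 155.6 CCUs per dollar-hour, while the 48GB cloud server, despite having the highest CCU overall, ranks last at roughly 31.3.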
GoGrid’s servers showed a decent linear performance increase with larger sized instances. The largest 8GB instance was deployed on an E5450 2.99 GHz host (compared with the E5520 2.27 GHz hosts for the other instances) which performed significantly better than the smaller sized instances.
|GoGrid [CA, US]|
Voxel maintains 3 cloud data centers in the US (New York), EU (Amsterdam), and Asia (Singapore). All of our cloud servers were deployed on homogeneous hardware: Xeon L5520 “Nehalem” 2.26 GHz. They appear to use more of a fixed CPU allocation because there was a notable increase in performance on larger instance sizes.
|Voxel [NY, US]|
NewServers is fairly unique in that their “bare metal” cloud servers actually run on physical hosts. There is no hypervisor layer between the server and the underlying hardware. When you deploy a server, the OS image is written directly to the physical disk(s). Their “Fast” server performed very well and is one of the better values at $0.53/hr for 26.41 CCUs.
|NewServers [FL, US]|
While Linode doesn’t market itself as a “cloud”, we included it in the benchmarks because it is a good and very popular service that provides many of the features common to the cloud, including auto-provisioning and disk imaging. All instance sizes we benchmarked deployed on homogeneous Xeon L5520 2.26 GHz hardware. The servers show a very flat CPU performance variation between instance sizes. We only benchmarked servers in their Atlanta data center. Linode also maintains data centers in Fremont CA, Dallas TX, Newark NJ, and London UK.
|Linode VPS Hosting [Atlanta]|
All of our SoftLayer benchmark servers deployed onto either Xeon X3460 2.8 GHz or E5520 2.27 GHz hardware. SoftLayer is another provider where CPU performance was very flat. Disk I/O was also painfully slow, but that is a topic for another post.
Terremark is one of the first VMWare vCloud providers. Deployment of servers in vCloud allows the user to select both desired memory and VPUs (the vCloud term for CPUs). All of our benchmark servers deployed on homogenous Opteron 8389 2.91 GHz hardware. CPU performance varied from benchmark to benchmark. Unixbench’s parallel benchmark did show a notable increase in performance from 1 to 8 VPUs. However, other benchmarks did not show much increase leading to a flat CCU metric across all instance sizes and VPU combinations we benchmarked.
|Terremark vCloud Express [FL, US]|
OpSource is another VMWare based cloud. OpSource allows you to configure cloud servers with 1-4 “CPUs”. CPU performance was very flat between instances of varying sizes, even showing a decrease in performance from smaller to larger sized instances. All servers deployed to identical Xeon X7460 2.66 GHz hardware.
|OpSource Cloud [VA, US]|
Speedyrails is a VPS provider based out of Quebec, Canada. All benchmark servers deployed on homogeneous hardware: Xeon E5520 2.27 GHz.
|Speedyrails [QC, CA]|
Zerigo is a VPS and Cloud Server vendor based out of Denver, CO. All benchmark servers deployed on identical hardware, Opteron 2374 2.20 GHz.
|Zerigo [CO, US]|
While ReliaCloud showed a notable increase in performance with larger sized instances, the overall CPU performance was not great. Hardware was also homogeneous between different instance sizes.
|ReliaCloud Cloud Services [MN, US]|
IBM’s Development & Test Cloud is a free cloud service intended for only development and testing. Only 3 32-bit instance sizes are supported: small, medium and large. Instances are time limited to about 1 week with the option to extend. The large instance performed very well overall.
|IBM Development & Test Cloud [NY, US]|
BlueLock is another VMWare vCloud provider. As with the other VMWare providers, they appear to use a homogenous hardware environment for all instance sizes. However, unlike most other homogenous platforms, BlueLock’s instances showed a notable increase in performance on larger sized (more CPUs) instances. Strangely, the 4CPU/8GB instance outperformed the 8CPU/16GB instance.
|BlueLock [IN, US]|
Cloud Central is a new cloud server provider based out of Australia. All benchmark instances deployed to homogeneous AMD Opteron 2.20 GHz hosts. Prices shown are in Australian dollars.
|Cloud Central [AU]|
RimuHosting is a VPS provider based out of New Zealand. They maintain data centers in Australia, New Zealand, London, and Texas. We benchmarked 2GB instances in their Auckland NZ and Dallas TX data centers.
|RimuHosting [TX, US]|
ElasticHosts is a UK-based cloud provider. They currently maintain 2 data centers in the UK and a new data center in Dallas, TX. We only benchmarked their London Peer1 data center. ElasticHosts runs on the Linux KVM hypervisor. This hypervisor is unique in that it allows you to select a specific MHz metric when deploying cloud servers. We benchmarked 2 GHz – 20 GHz cloud servers. Although the hardware environment appears to be homogeneous, the benchmarks showed a clear increase in performance on larger sized instances.
Flexiscale is a UK based cloud server provider that has been around for a few years. They were recently acquired and renamed to Flexiant. They are currently in beta release of their Flexiscale 2.0 cloud server platform. These results were from testing 2.0 platform servers. Flexiant adopted a point-based subscription model for purchasing cloud servers. See their website for more details.
This is our first attempt at defining a standard CPU performance metric for comparing servers in multiple clouds. We acknowledge that it is not perfect and hope to make improvements over time. Please comment on this post if you have any suggestions on how we might improve our methods. We intend to continually run these benchmarks (every couple of months) to improve the quality and quantity of our data available as well as to check for upgrades and improvements made by the providers.
One take-away point we observed is that heterogeneous hardware environments (where host hardware is configured with faster CPUs for larger sized instances) appear to be more conducive to true cloud CPU scaling. Of the 20 server clouds we benchmarked, only 4 appear to be providing such an environment: EC2, Storm on Demand, GoGrid and NewServers.