Tuesday, March 2, 2010

Cloud Server Performance Benchmarking

We are in the process of developing a benchmark suite to run in the cloud and use as a basis for comparing different IaaS (cloud server) vendors. There are lots of uses for cloud servers, be it web, application or database servers; scientific computing; video encoding; etc. In establishing our benchmark suite we'd like to be as comprehensive as possible in order to provide decent coverage of most computational needs. Our current list of benchmarks includes the following:

All benchmarks will be run on similar CentOS 64-bit server instances. Are there any benchmarks you'd like to see that are not in the list? We'd appreciate any comments, suggestions or feedback.

Saturday, February 27, 2010

Cloud Storage Showdown Part 2 - What is the best storage for your cloud server?

In the previous post we discussed cloud storage for consumers. Probably a more common use case for cloud storage is to enable backups and/or extended storage from cloud servers and platforms. Over the past month, we have used our network of 25 global servers running in various public clouds to measure bandwidth throughput and latency to and from various cloud storage services, including Microsoft's Azure Blob Storage, Amazon's Simple Storage Service (S3), SoftLayer's CloudLayer Storage, Nirvanix Storage Delivery Network, Rackspace Cloud Files, and Box.net. We conducted these bandwidth tests by reading and writing a 3MB test file (10MB for storage services in the same cloud, such as EC2 to S3) to/from each storage service at random times twice daily. The results are separated by cloud server provider. Each table displays all of the storage services tested from within that cloud, ordered by fastest downlink. We also tested storage in Microsoft's Azure platform.
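The post does not publish its measurement code. As a rough sketch of how a per-provider table ordered by fastest downlink could be built from raw timings, the Python below averages each storage service's read (downlink) and write (uplink) rates. The sample record layout and the decimal definition of a megabit are assumptions for illustration, not the method actually used.

```python
from collections import defaultdict
from statistics import mean


def throughput_mbps(num_bytes, seconds):
    """Transfer rate in megabits per second (assumes 1 Mb = 10**6 bits)."""
    return num_bytes * 8 / seconds / 1e6


def downlink_ranking(samples):
    """Build one cloud provider's results table from raw timings.

    samples: iterable of (service, direction, num_bytes, seconds), where
    direction is "down" (read from storage) or "up" (write to storage).
    This record layout is hypothetical.
    Returns [(service, mean_down_mbps, mean_up_mbps), ...], fastest downlink
    first. A direction with no samples is reported as 0.0.
    """
    rates = defaultdict(lambda: {"down": [], "up": []})
    for service, direction, num_bytes, seconds in samples:
        rates[service][direction].append(throughput_mbps(num_bytes, seconds))

    rows = []
    for service, by_direction in rates.items():
        down = mean(by_direction["down"]) if by_direction["down"] else 0.0
        up = mean(by_direction["up"]) if by_direction["up"] else 0.0
        rows.append((service, down, up))
    # Order by downlink throughput, fastest first, as in the results tables.
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

For example, `throughput_mbps(3_000_000, 2.0)` returns 12.0: a 3MB read that takes 2 seconds works out to 12 Mb/s.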

The purpose of this test wasn't to measure the maximum throughput capacity between server and storage service, but rather to provide a comparison between different storage services. The 3MB test file is not large enough to determine maximum capacity (for same-cloud storage services the throughput will be more accurate because of the larger 10MB file size). Actual throughput for larger files will most likely be higher than the throughput calculations displayed here.
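To see why a small test file understates capacity, a simplified model helps: assume each transfer pays a fixed overhead (connection setup, request latency and TCP ramp-up lumped together) on top of the time the bytes spend on the wire. The overhead and link speed below are hypothetical values chosen for illustration, not measurements from these tests.

```python
def observed_mbps(file_mb, link_mbps, overhead_s):
    """Apparent throughput of one transfer under a fixed-overhead model.

    file_mb: file size in megabytes (1 MB taken as 8 Mb)
    link_mbps: true capacity of the path in Mb/s (hypothetical)
    overhead_s: fixed per-transfer cost in seconds (hypothetical)
    """
    file_mbit = file_mb * 8
    elapsed_s = overhead_s + file_mbit / link_mbps
    return file_mbit / elapsed_s


# Hypothetical 100 Mb/s path with 0.5 s of fixed overhead per transfer.
for size_mb in (3, 10):
    print(size_mb, "MB ->", round(observed_mbps(size_mb, 100, 0.5), 1), "Mb/s")
# 3 MB -> 32.4 Mb/s
# 10 MB -> 61.5 Mb/s
```

As the file grows, the fixed overhead matters less and the observed rate moves toward the link capacity, which is consistent with the larger 10MB same-cloud tests being the more accurate ones.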

Bandwidth and storage pricing are other factors to consider when selecting a storage service. For example, Amazon does not charge for bandwidth to/from its S3 storage service and EC2 instances running in the same region. However, there are good reasons to use an external storage service for backups. If, for example, you use EC2 and store your backups using S3 in the same region, and that region happens to go down entirely for an extended period (an unlikely scenario of course), you will be without any means of recovering your data until the region is brought back online. For added fault tolerance, you may decide to keep backups in a separate Amazon S3 region (and pay the bandwidth costs) or even in a separate cloud like Microsoft's Azure.

UPDATE: The values displayed in the results tables for S3 EU West are incorrect. The bucket those values were calculated from was actually in the US East region. Updated values are displayed above the tables.

Tests were performed using a small Azure instance.

South Central US (TX)
North Central US (IL)
North Europe (Amsterdam)
Southeast Asia (Singapore)

Tests were performed using an m1.small instance in all regions.

US East Region
EU West Region
US West Region

All instances are Linode 360s.

Newark, NJ
Atlanta, GA
Dallas, TX
Rackspace Cloud Files is run out of Dallas as well, and Linode provides GigE uplinks with all servers. This explains the very high downlink throughput result.
London, UK
Fremont, CA

Our VoxCloud servers are the smallest (2GB) model.

New York
Singapore
Microsoft Azure and Voxel most likely run out of data centers in very close proximity.
Amsterdam, NL

GoGrid (CA, US)
We run a 512MB instance with GoGrid. Box.net tends to perform very well against US west coast servers.

Our Rackspace node also runs on a 512MB instance. Throughput to and from Rackspace's own Cloud Files storage service was very good.



ReliaCloud (MN, US)



Dallas, TX
Auckland, NZ

Sunday, February 14, 2010

Cloud Storage Showdown Part 1 - Cloud to Consumer

This is a follow-up to our previous post, Cloud Speed Test Results, where we analyzed the results from our custom cloud speedtest as they pertained to cloud servers and content delivery networks. In this post, we'll use the same test results to analyze the bandwidth performance of 4 cloud storage services: Amazon's Simple Storage Service (S3) (all 3 regions), Rackspace's Cloud Files, Nirvanix's Storage Delivery Network (SDN), and Microsoft's Azure Blob Storage. To date, about 1350 unique users in 115 countries have run the speedtest. This speedtest tracks the amount of time required to download a 1MB test file from these 6 storage endpoints (the 4 services, counting each S3 region separately), and records the transfer rate for each test run. We use MaxMind's geoip database to track where the user is running the test from (country + state/province for US/Canada). The order of the tests is always random, and we enforce a time limit such that users with dial-up or slow connections are excluded. We also limit individual users to running the speedtest only once per day. 90% of the speedtests were run by users on residential high-speed Internet connections, so this post is focused on Cloud to Consumer bandwidth performance. In the next post (Part 2), we'll focus on Cloud to Cloud or Intracloud storage bandwidth performance.
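The run rules described here (random test order, a time limit that drops slow connections, one run per user per day) can be sketched as follows. The real speedtest ran in the browser via Flash, so this Python version is only an illustration: the 60-second limit, the injected `download` callable and the in-memory `last_run` store are all hypothetical.

```python
import random
from datetime import date

TEST_FILE_BYTES = 1_000_000  # "1MB" test file; decimal megabyte assumed
TIME_LIMIT_S = 60.0          # hypothetical cutoff; the real limit is not stated


def run_speedtest(user_id, services, download, last_run, today=None, rng=random):
    """Run one speedtest session for a user.

    services: names of the storage endpoints to test
    download: callable(service) -> seconds taken to fetch the test file
    last_run: dict mapping user_id -> date of that user's last completed run
    Returns {service: Mb/s}, or None if the run is rejected.
    """
    today = today or date.today()
    if last_run.get(user_id) == today:
        return None  # enforce one run per user per day

    order = list(services)
    rng.shuffle(order)  # test order is always random

    results, elapsed = {}, 0.0
    for service in order:
        seconds = download(service)
        elapsed += seconds
        if elapsed > TIME_LIMIT_S:
            return None  # too slow (e.g. dial-up): discard the whole run
        results[service] = TEST_FILE_BYTES * 8 / seconds / 1e6

    last_run[user_id] = today  # only completed runs count toward the daily limit
    return results
```

Shuffling the order keeps any one service from always being measured first (or last) in a session, so warm-up or fatigue effects on the user's connection are spread evenly.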

As you'll see in the results below, consumer cloud storage performance is highly dependent on the consumer's geographical location. Although cloud vendors may tell you it isn't important where their services are run from, this is not entirely true. In the previous post we separated the results into very general geographical regions (US and non-US). Since this post is more focused on consumer cloud storage usage, and because bandwidth performance differs greatly by geographical origin, we have broken these results into 8 geographical regions.

A few admin notes regarding the speedtest results:
  • The results are based on download performance. Consumers will generally not achieve this same throughput for uploads due to ISP uplink caps.
  • Nirvanix's SDN is unique in that it is geographically optimized like a content delivery network (users connect to the fastest/closest of 4 storage nodes). We used a 4-node SDN (US West, US East, Germany and Japan) in conducting the Nirvanix tests.
  • Rackspace Cloud Files is not directly browser accessible and thus not compatible with our speedtest. The results shown below are based on performance for Rackspace Cloud Servers, which to our knowledge are run out of the same data centers. Also, based on other bandwidth tests we have conducted, there is a very close correlation between Rackspace Cloud Files and Cloud Servers bandwidth performance.
  • Results for Azure storage are based on Azure Platform performance. These results should be most closely correlated with the South Central US storage region. Now that Azure is out of CTP, we will be including the other regions in future tests: US North Central, North Europe and Southeast Asia.
Global Results
The global results (all tests performed) show a fairly close spread between the 6 storage services with the exception of S3 US West which seems to be more geographically sensitive.
US Results
In the US, tests were run from all 50 states. S3 US East, Azure Storage and Rackspace Cloud Files were all pretty close in the top 3 spots. Surprisingly, Amazon's S3 region in Europe performed significantly better than the new S3 US West region in the US.
US West Results
US West (AK, AZ, CA, HI, ID, NM, NV, OR, UT, WA) is where Amazon's new S3 US West region really shone, outperforming other US services by 20-40%.
US Central Results
In US Central (CO, IA, IL, IN, KS, LA, MI, MN, MO, MS, ND, NE, OH, OK, SD, TX, WI, WY), Rackspace's Cloud Files service was the clear top performer followed by Azure (7% slower) and S3 US East (13% slower).
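The post reports gaps like "7% slower" without spelling out the formula. A natural reading, assumed here, is the shortfall relative to the leader's throughput; the sketch also shows the alternative "leader is X% faster" figure, which comes out slightly larger for the same pair. The inputs below are arbitrary illustrative numbers, not measured results.

```python
def percent_slower(leader_mbps, other_mbps):
    """Shortfall of `other` relative to the leader's rate (assumed definition)."""
    return (leader_mbps - other_mbps) / leader_mbps * 100


def percent_faster(leader_mbps, other_mbps):
    """How much faster the leader is, relative to `other` (alternative definition)."""
    return (leader_mbps / other_mbps - 1) * 100


# Illustrative inputs only: a service at 93 Mb/s against a leader at 100 Mb/s.
print(round(percent_slower(100, 93), 2))  # 7.0
print(round(percent_faster(100, 93), 2))  # 7.53
```

The two definitions diverge more as the gap widens, so it matters which one a comparison uses.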
US East Results
In US East (AL, AR, CT, DC, DE, FL, GA, KY, MA, MD, ME, NC, NH, NJ, NY, PA, RI, SC, TN, VA, VT, WV), Amazon's S3 US East region was the top performer. Surprisingly, S3 EU West was close behind. Azure followed at a distant third (15% slower).
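Grouping results into these regions comes down to a lookup from the user's state (from the geoip data) to a region, followed by a per-service average. The sketch below copies the three state lists from this post; the `(state, service, mbps)` record format is hypothetical. Montana (MT) does not appear in any of the three lists, so the sketch leaves it unassigned.

```python
from collections import defaultdict
from statistics import mean

# State lists as given in the US West, US Central and US East sections.
US_REGIONS = {
    "US West": {"AK", "AZ", "CA", "HI", "ID", "NM", "NV", "OR", "UT", "WA"},
    "US Central": {"CO", "IA", "IL", "IN", "KS", "LA", "MI", "MN", "MO",
                   "MS", "ND", "NE", "OH", "OK", "SD", "TX", "WI", "WY"},
    "US East": {"AL", "AR", "CT", "DC", "DE", "FL", "GA", "KY", "MA", "MD", "ME",
                "NC", "NH", "NJ", "NY", "PA", "RI", "SC", "TN", "VA", "VT", "WV"},
}


def region_for(state):
    """Map a two-letter state code to a region, or None if it is not listed."""
    for region, states in US_REGIONS.items():
        if state in states:
            return region
    return None


def mean_by_region(results):
    """results: iterable of (state, service, mbps) records (hypothetical format).
    Returns {region: {service: mean Mb/s}}; unmapped states are skipped."""
    buckets = defaultdict(lambda: defaultdict(list))
    for state, service, mbps in results:
        region = region_for(state)
        if region is not None:
            buckets[region][service].append(mbps)
    return {
        region: {service: mean(rates) for service, rates in by_service.items()}
        for region, by_service in buckets.items()
    }
```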
Non US Results
Outside of the US is where Nirvanix's storage outperformed other services. Azure also performed well followed by S3 US East.
Europe Results
In Europe, Nirvanix's SDN did extremely well, outperforming second-place S3 EU West by 20%. Azure performed comparably to S3 EU West. Rackspace Cloud Files was a bit further behind. S3 US West did not perform well in Europe.
Asia Results
In Asia, Azure and Nirvanix were about equal in performance. Rackspace Cloud Files and S3 US West followed at about 15% slower. Amazon will be deploying a new AWS Asia region later this year.
Summary
Bandwidth is but one factor for consumers to consider when choosing a cloud storage service. Most of these storage services are not natively mountable (i.e. you can't browse them using your operating system's file browser), so third party tools like JungleDisk (S3 or Rackspace Cloud Files only), CloudBerry (S3 or Nirvanix) or Windows Explorer Virtual Network Drive (S3, Azure, Nirvanix) are necessary in order to use them. The decision on which storage service to use really depends on a variety of factors including geography, bandwidth, third party tools available and bandwidth/storage costs.

Wednesday, February 10, 2010

Cloud Speed Test Results

Over the past 2 months we've used Amazon's Mechanical Turk to pay users with high-speed, mostly residential Internet connections to run our cloud speedtest. To date, about 1100 unique users in 115 different countries have run the speedtest. This speedtest tracks the amount of time required to download a 1MB test file from 26 different cloud services (servers and content delivery networks), and records the transfer rate for each test and service run. We use MaxMind's geoip database to track where the user is downloading from (country + state/province for US/Canada). The order of the tests is always random, and we enforce a time limit such that users with dial-up or slow connections are excluded. We also limit individual users to running the speedtest only once per day. To date, about 1300 tests have been performed. About 60% of the tests were performed from US-based connections.

The purpose of these tests is to compare bandwidth speed from various cloud services to high-speed Internet connections. Businesses can use this data to determine which cloud services will provide the best throughput for their customer base.

US Speedtests to Cloud Servers
We got a fairly diverse group of users to run the speedtest from US-based connections covering all 50 states. Voxel, IBM and EC2 (US East) were the top 3 in this category. IBM's service is still in beta and not intended for production use. The bottom 3 are located in Europe, explaining their slower bandwidth.
Non-US Speedtests to Cloud Servers
Most of the non-US speedtests were run by Canadian, European and Indian users. Linode's London data center performed very well in this category followed by EC2 (EU West) and Flexiscale.
Global Speedtests to Cloud Servers
This table shows the aggregate results from all speedtests (both US and non-US). The top 3 are the same as in the US-based results: IBM, Voxel, and EC2 (US East).

US Speedtests to CDNs
The only major CDN we were unable to test is Akamai. This is because, unlike other CDNs, Akamai is still strongly opposed to a paygo model and the new realities of CDN pricing, and we are unable to commit to a $200/Mo, 50GB plan ($4/GB!). Update: VPS.NET has announced they will be making Akamai CDN available on a zero-commit paygo plan. We will add Akamai to the speedtest as soon as we are able to get set up with an account. More info is available here: http://bit.ly/98IQkC

As with our previous Pingdom-based tests, CacheFly performed the best in all of our tests. Edgecast also performed very well. Edgecast is available with a zero-commit paygo plan through GoGrid, which makes it a very good option from a price/features/performance perspective. Surprisingly, the #2 CDN (marketshare-wise), Limelight (via RackspaceCloud CDN), performed very poorly in all of our tests.
Non-US Speedtests to CDNs
Outside of the US, both CacheFly and Edgecast performed very well. Amazon's CloudFront came in 3rd, about 18% slower than CacheFly.
Global Speedtests to CDNs
Globally, CacheFly and Edgecast were neck and neck, followed by CloudFront at about 15% slower.
Summary
Bandwidth is certainly not the only consideration when choosing a cloud service vendor. However, it is one factor we believe businesses should consider in combination with quality of support, pricing, features and reliability. The combination of these factors should allow users to make a fairly objective decision about which vendor will provide the best overall service for their business.

On a side note, if you intend to use cloud services for internal-facing applications (e.g. virtual private clouds), bandwidth throughput to your business locations should be a very significant factor in your decision making process.

Our speedtest is available publicly in beta release at http://cloudharmony.com/speedtest

Friday, January 15, 2010

RE: BitSource - Rackspace Cloud Servers versus Amazon EC2: Performance Analysis

The BitSource was recently hired by Encoding.com to conduct a performance comparison between EC2 and Rackspace Cloud. The full details of their analysis are available here. Here is a summary of their findings:

On CPU Performance
"On average, Cloud Servers was more than twice as fast as Amazon EC2 at compiling the Linux kernel across all instance sizes."

On Disk I/O
"Disk I/O results show that Cloud Servers consistently have much better write and random write performance than EC2 across most sizes."

They used a combination of IOZone and Linux kernel compiling to conduct their analysis. Here is our opinion on this (also commented at the bottom of their article):


Our Comments
We've also compared EC2 and Rackspace performance using geekbench, specjvm, hdparm, mysqlbench and unixbench with one example result set:

Rackspace 4GB Instance:
Geekbench: 2841
hdparm buffered disk reads: 165 MB/sec
SPECjvm Composite result: 51.99
mysqlbench: 1330 wallclock seconds
unixbench: 777

EC2 m1.large instance (w/ EBS local storage):
Geekbench: 3113
hdparm buffered disk reads: 65 MB/sec
SPECjvm Composite result: 36.51
mysqlbench: 1327 wallclock seconds
unixbench: 663

The disk I/O results are a bit better with Rackspace based on low-level measurements, but at a higher application level like mysqlbench they are very similar. CPU/memory I/O performance is also a mixed bag, with the EC2 instance performing better with Geekbench and Rackspace performing better with unixbench and SPECjvm.
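Putting the single result set above side by side makes the mixed picture easy to see. The numbers are copied from the lists above; treating one run per benchmark as representative is a simplification, and the margin shown is simply the larger value divided by the smaller one (for mysqlbench, where lower wallclock time is better, that is how much longer the slower run took).

```python
# (benchmark, Rackspace 4GB, EC2 m1.large, higher_is_better), copied from above
RESULTS = [
    ("Geekbench", 2841, 3113, True),
    ("hdparm buffered disk reads (MB/sec)", 165, 65, True),
    ("SPECjvm composite", 51.99, 36.51, True),
    ("mysqlbench (wallclock seconds)", 1330, 1327, False),
    ("unixbench", 777, 663, True),
]


def winner(rackspace, ec2, higher_is_better):
    """Name the better instance for one benchmark."""
    if rackspace == ec2:
        return "tie"
    rackspace_better = rackspace > ec2 if higher_is_better else rackspace < ec2
    return "Rackspace" if rackspace_better else "EC2"


def advantage_pct(a, b):
    """Margin of the better result over the worse one, in percent.
    For higher-is-better scores and lower-is-better times alike,
    this is larger / smaller - 1."""
    return (max(a, b) / min(a, b) - 1) * 100


for name, rackspace, ec2, higher_is_better in RESULTS:
    print(f"{name}: {winner(rackspace, ec2, higher_is_better)} "
          f"by {advantage_pct(rackspace, ec2):.1f}%")
```

EC2 comes out ahead on Geekbench (by about 9.6%) and, by only about 0.2%, on mysqlbench, while Rackspace leads on hdparm, SPECjvm and unixbench.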

However, I think it is a bit of a stretch to put Rackspace Cloud on the same playing field as EC2. EC2 is a much more mature platform with many more features, like multiple data centers, instance-independent storage (EBS), auto scaling and monitoring, load balancing, VPC, and more. Rackspace Cloud is basically just VPS with on-demand pricing. Of course they provide free support and an excellent CDN offering based on Limelight (much better than CloudFront), but their IaaS offering leaves a lot to be desired.

Also, on pricing, EC2 is much better on the high end, particularly with reserved instances (e.g. an 8GB m1.large reserved instance is $0.17/hr over a 3 year period, while an 8GB Rackspace instance is roughly 3x as expensive at $0.48/hr). We've also done some public bandwidth testing, and all 3 EC2 regions provided generally faster downlink throughput than the Rackspace Cloud Dallas data center.
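The "3x" figure follows directly from the two hourly rates quoted here. The sketch treats $0.17/hr as the effective reserved-instance rate over the 3 year term, as stated in the post, and assumes about 730 hours per month for the monthly figures.

```python
HOURS_PER_MONTH = 24 * 365 / 12  # about 730 hours; an approximation


def monthly_cost(hourly_rate):
    """Cost of running one instance around the clock for an average month."""
    return hourly_rate * HOURS_PER_MONTH


EC2_M1_LARGE_RESERVED = 0.17  # $/hr, effective rate over a 3 year term (from the post)
RACKSPACE_8GB = 0.48          # $/hr (from the post)

ratio = RACKSPACE_8GB / EC2_M1_LARGE_RESERVED
print(f"Rackspace costs {ratio:.2f}x as much")  # Rackspace costs 2.82x as much
print(f"EC2: ${monthly_cost(EC2_M1_LARGE_RESERVED):.2f}/month, "
      f"Rackspace: ${monthly_cost(RACKSPACE_8GB):.2f}/month")
```

That works out to roughly $124 versus $350 per month for an instance that runs continuously.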

I think Rackspace is off to a good start with their Slicehost acquired cloud. My hat really goes off to their marketing team too.

Here are links to the geekbench tests we ran:

EC2 m1.large - score: 3113

4GB Rackspace Cloud Server - score: 2841

Sunday, January 10, 2010

Cloud Speed Test


We are working on a cloud speed test. This speedtest works by using Flash to download a 1MB test file from, and upload 1/2 MB to, various cloud services. It then measures the amount of time it takes to complete those operations and provides a transfer rate for each.
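The speedtest itself is written in Flash, but its core measurement (time a 1MB download and a 1/2 MB upload, then convert each to a rate) is easy to sketch in Python. The `download`/`upload` callables and the injectable `clock` are hypothetical hooks so the timing logic can be shown without a network; decimal megabytes are assumed.

```python
import time

DOWNLOAD_BYTES = 1_000_000  # 1MB test file (decimal MB assumed)
UPLOAD_BYTES = 500_000      # 1/2 MB upload


def measure_rate(transfer, num_bytes, clock=time.perf_counter):
    """Time one transfer() call and return its rate in Mb/s."""
    start = clock()
    transfer()
    elapsed = clock() - start
    return num_bytes * 8 / elapsed / 1e6


def measure_service(download, upload, clock=time.perf_counter):
    """Return (downlink Mb/s, uplink Mb/s) for one cloud service."""
    down = measure_rate(download, DOWNLOAD_BYTES, clock)
    up = measure_rate(upload, UPLOAD_BYTES, clock)
    return down, up
```

In a real run, `download` might wrap an HTTP GET of the test file. `time.perf_counter` is the default clock because it is monotonic and high-resolution, which suits short interval timing.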

What we are hoping to accomplish with this is both to allow users to measure their bandwidth performance against various cloud services and to aggregate this data in order to provide a more accurate analysis of the bandwidth performance those cloud services provide. In the previous blog post we summarized bandwidth performance using Pingdom measurements taken over a period of a couple of months. The speedtest will provide us with a much larger and more diverse test population. The end result we are looking for is to allow users to view overall, time-based and geographically targeted bandwidth performance measurements for public clouds and services.

The speedtest is still very much in beta form. It does not currently allow you to filter the cloud services you'd like to test. However, it is mostly functional and we'd appreciate anyone trying it out and providing feedback. If you allow the test to run all the way through, it will download about 40 MB and upload 10 MB to cloud services and provide you with transfer rates for each based on your Internet connection.


Saturday, January 9, 2010

Bandwidth Disparity in the Cloud

Over the past 2 months we've been using Pingdom to monitor downlink throughput for various cloud services. We placed a 5 MB test file in each of these cloud services and then configured Pingdom to pull that file every 15 minutes. Pingdom maintains 15 globally placed servers including 1 US West (Los Angeles), 2 US East (NY, VA), 1 US Central (Chicago), 5 US South and SW (Dallas, Houston, Atlanta), 1 Canada (Montreal), and 4 Europe (London, Stockholm, Frankfurt, Amsterdam). Every 15 minutes it pulls that test file from each of the public clouds using a round-robin monitor selection. Although I agree this is not the best method for calculating downlink throughput, it was affordable and easy to set up, and I believe it can at least provide some comparative value at a high level.
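The Pingdom setup amounts to a fixed schedule: every 15 minutes each test file is fetched once, by the next monitor in a rotation, and the fetch time of the 5 MB file becomes a downlink rate. The sketch below assumes each service's check rotates through the monitors independently; Pingdom's actual selection logic is not described here, and the monitor and service names used in testing it are just labels.

```python
from itertools import cycle

FILE_BYTES = 5_000_000  # 5 MB test file (decimal MB assumed)


def schedule_checks(monitors, services, intervals, every_min=15):
    """Yield (minute, service, monitor) for each check.

    Every `every_min` minutes, each service's file is pulled by the next
    monitor in that service's own round-robin rotation (an assumption).
    """
    rotations = {service: cycle(monitors) for service in services}
    for i in range(intervals):
        for service in services:
            yield i * every_min, service, next(rotations[service])


def downlink_mbps(fetch_seconds, num_bytes=FILE_BYTES):
    """Convert one fetch time into a downlink rate in Mb/s."""
    return num_bytes * 8 / fetch_seconds / 1e6
```

Because each check lands on a different monitor, averaging many checks mixes results from all locations, which is why the figures are only a high-level comparison rather than a per-location measurement.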

The results of these tests have shown a good amount of bandwidth disparity between the different cloud providers.

Cloud Servers (IaaS):
The greatest disparity in these tests was between cloud server providers. We set up the smallest instance possible with each of the public clouds (e.g. EC2's m1.small, Linode 360, etc.), running Linux and Apache with a 5MB static test file. The results show average downlink speeds ranging from 4.45 Mb/s with Terremark's vCloud service to 23.42 Mb/s with Amazon EC2 US East (which is coincidentally higher than 3 of the CDNs we also tested).
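To put that spread in perspective, the two averages quoted here can be turned into the time each provider needs to serve the 5MB test file, taking 5MB as 40 megabits (a decimal-unit assumption):

```python
def seconds_to_serve(file_mb, rate_mbps):
    """Time to transfer file_mb megabytes at rate_mbps megabits per second."""
    return file_mb * 8 / rate_mbps


for provider, rate in (("Terremark vCloud", 4.45), ("Amazon EC2 US East", 23.42)):
    print(f"{provider}: {seconds_to_serve(5, rate):.2f} s")
# Terremark vCloud: 8.99 s
# Amazon EC2 US East: 1.71 s
```

That is roughly a 5x difference (23.42 / 4.45 ≈ 5.26) for the same file.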

Service/Rate (Mb/s)


Content Delivery Networks (CDN):
If you happen to not know what a CDN is, it is basically a service that allows you to deliver Internet content to your users faster than you could by hosting it on your own servers. CDNs enable this by maintaining many "edge" servers located throughout the world, each with a copy of your files. They then use some DNS magic such that when a user in the US requests a file it is pulled from a US server, while a user in Europe gets it from a Europe-based server. The end result is that the user gets the file from the server closest and/or fastest to them. This also offloads a lot of static content bandwidth from your servers.
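The "DNS magic" boils down to answering each user's lookup with the edge server expected to be fastest for them. Real CDNs do this inside their DNS infrastructure (or with anycast) using their own network measurements; the sketch below only models the selection step, with a hypothetical latency table and hypothetical edge names standing in for those measurements.

```python
def pick_edge(user_region, edges, latency_ms):
    """Choose the edge server to hand back for a user's DNS lookup.

    edges: list of edge names (hypothetical)
    latency_ms: dict mapping (user_region, edge) -> measured round-trip time
    Returns the lowest-latency edge, or None if no measurement covers the user.
    """
    candidates = [edge for edge in edges if (user_region, edge) in latency_ms]
    if not candidates:
        return None
    return min(candidates, key=lambda edge: latency_ms[(user_region, edge)])


EDGES = ["us-east", "eu-west"]
LATENCY_MS = {  # hypothetical measurements
    ("US", "us-east"): 20, ("US", "eu-west"): 90,
    ("EU", "us-east"): 95, ("EU", "eu-west"): 15,
}
print(pick_edge("US", EDGES, LATENCY_MS))  # us-east
print(pick_edge("EU", EDGES, LATENCY_MS))  # eu-west
```

A production CDN would also weigh edge load and health, but picking the lowest-latency edge captures why a US user and a European user end up on different servers.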

To test these CDNs, we set up accounts with each and uploaded a 5MB test file to them. The only exceptions are CacheFly, which provides a 5MB test file for free, and Akamai, for which we were able to find an existing 5MB test file (thanks DNC).

The big surprise for us here was that Akamai, the "big dog" amongst CDNs, was not the top performer. Other than that, the remaining 3 in the top 4 were as we had expected: Akamai, Limelight and EdgeCast.

Service/Rate (Mb/s)


Cloud Platforms and Services (PaaS/SaaS):
Platforms and SaaS offer a higher level of cloud computing capabilities. Force.com, for example, allows you to create databases and forms very quickly with their web-based management platform, and Google Sites allows you to create a website within a few minutes. If your business software requirements can fit the "mold" of a PaaS or SaaS, they are oftentimes a great choice to facilitate a very quick time-to-market.

We set up accounts with each of these providers and posted a 5MB test file in each for testing. Force.com basically maintains its own in-house CDN for static content delivery, which explains its significantly better performance.

Service/Rate (Mb/s)

Next Steps
We recognize that the biggest problem with this analysis is the limited scope of testing from using only Pingdom. We are developing a "Cloud Speedtest" that we will pay hundreds of users with normal residential and commercial Internet connections to run. This speedtest will record both uplink and downlink throughput to various cloud services. We will aggregate that data with Pingdom, our own testing, and some other sources to produce a much better analysis of many cloud services.