Today Amazon introduced Elastic MapReduce, a service that offers the Hadoop framework on demand. This adds a new layer or dimension to the definition of cloud computing: Framework as a Service (FaaS). With Amazon Elastic MapReduce (EMR), one can process large data sets on clusters of servers. The value proposition of EMR is that businesses, researchers, analysts and developers can process large amounts of data without worrying about infrastructure setup (for example, the security settings for communication between EC2 nodes) or about tuning and managing the MapReduce framework. EMR is based on the popular open source Apache Hadoop MapReduce framework.
If you already have an AWS account, it takes only a few steps in the AWS Console to get up and running with EMR. See the exhibit below for a high-level component overview of how EMR works.

In your implementation on the Hadoop MapReduce framework, the core components are a mapper, which implements Hadoop’s “Mapper” interface, and a reducer, which implements Hadoop’s “Reducer” interface. These two components should reside in Amazon’s Simple Storage Service (S3), as shown in the exhibit above. The data set to be analyzed must (for now) be made available on S3; its location is specified as the input in the AWS Console, and the destination for the results of the aggregator (or reducer) is specified as the output (which can also be on S3). After this setup, all that is left is to select the number of instances (EC2 nodes) on which you want your MapReduce job to run, and to stop the instances when the job is done. EMR also exposes other interfaces for managing MapReduce jobs, such as a Ruby-based command line interface and a Web Services API.
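To make the mapper and reducer roles concrete, here is a minimal sketch of a word count job in plain Python. It is a local simulation only and does not use Hadoop's "Mapper" and "Reducer" interfaces or anything from EMR; the function names mapper, reducer and run_job are made up for this illustration.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce step: aggregate all the counts emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Stand-in for the framework: map, group values by key, then reduce.

    On a real cluster the framework performs the grouping (the shuffle)
    across nodes; here it is a dictionary in a single process.
    """
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_job(sample))
```

In a real job the framework, not your code, distributes the input across the nodes and groups the mapper output by key before calling the reducer, which is why only those two components need to be written.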
The software inventory used by EMR is as follows:
1. Debian GNU/Linux v5.0
2. Hadoop v0.18.3 (with patches)
3. Java v1.6
4. Perl v5.10
5. Ruby v1.8
6. Python v2.5
7. PHP v5.2
8. R v2.7
The introduction of Amazon’s Elastic MapReduce (EMR) is likely to affect the sales of Cloudera, which offers services such as installation, training and configuration of Hadoop on Amazon EC2 infrastructure.

The pricing of EMR is shown in the table above (image courtesy: Amazon). Note, however, that pricing is per instance hour, so a job that uses 100 instances for 20 minutes is charged for 100 instance hours. More details on pricing are available here.
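The instance-hour arithmetic can be sketched as follows. This assumes that each instance's running time is rounded up to the next whole hour, which is what the 100-instance example implies; the function name billed_instance_hours is hypothetical, and no hourly rates are included because they depend on the instance type.

```python
import math


def billed_instance_hours(instances, minutes):
    """Return the instance hours charged for a job.

    Assumption: every instance is billed for whole hours, with any
    partial hour rounded up.
    """
    hours_per_instance = math.ceil(minutes / 60)
    return instances * hours_per_instance


if __name__ == "__main__":
    # The example from the text: 100 instances running for 20 minutes.
    print(billed_instance_hours(100, 20))
```

Under this assumption, a job that runs for 61 minutes costs twice as much as one that finishes in 60, so it can pay to size the cluster so that the job ends just inside an hour boundary.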