Friday, October 7, 2022

Amazon EMR on EKS gets up to 19% performance boost running on AWS Graviton3 Processors vs. Graviton2

Amazon EMR on EKS is a deployment option that enables you to run Spark workloads on Amazon Elastic Kubernetes Service (Amazon EKS) easily. It allows you to innovate faster with the latest Apache Spark on Kubernetes architecture while benefiting from the performance-optimized Spark runtime powered by Amazon EMR. This deployment option elects Amazon EKS as its underlying compute to orchestrate containerized Spark applications with better price performance.

AWS continually innovates to provide choice and better price-performance for our customers, and the third-generation Graviton processor is the next step in the journey. Amazon EMR on EKS now supports Amazon Elastic Compute Cloud (Amazon EC2) C7g, the latest AWS Graviton3 instance family. On a single EKS cluster, we measured EMR runtime for Apache Spark performance by comparing C7g with C6g families across selected instance sizes of 4XL, 8XL, and 12XL. We're excited to observe a performance gain of up to 19% over the sixth-generation C6g Graviton2 instances, which leads to a cost reduction of up to 15%.

In this post, we discuss the performance test results that we observed while running the same EMR Spark runtime on different Graviton-based EC2 instance types.

For some use cases, such as the benchmark test, running a data pipeline that requires a mix of CPU types for granular-level cost efficiency, or migrating an existing application from Intel to Graviton-based instances, we usually spin up different clusters that host separate types of processors, such as x86_64 vs. arm64. However, Amazon EMR on EKS has made it easier. In this post, we also provide guidance on running Spark with multiple CPU architectures in a common EKS cluster, so that we can save significant time and effort on setting up a separate cluster to isolate the workloads.

Infrastructure innovation

AWS Graviton3 is the latest generation of AWS-designed Arm-based processors, and C7g is the first Graviton3 instance in AWS. The C family is designed for compute-intensive workloads, including batch processing, distributed analytics, data transformations, log analysis, and more. Additionally, C7g instances are the first in the cloud to feature DDR5 memory, which provides 50% higher memory bandwidth compared to DDR4 memory, to enable high-speed access to data in memory. All these innovations are well-suited for big data workloads, especially the in-memory processing framework Apache Spark.

The following table summarizes the technical specifications for the tested instance types:

Instance Name | vCPUs | Memory (GiB) | EBS-Optimized Bandwidth (Gbps) | Network Bandwidth (Gbps) | On-Demand Hourly Rate
c6g.4xlarge | 16 | 32 | 4.75 | Up to 10 | $0.544
c7g.4xlarge | 16 | 32 | Up to 10 | Up to 15 | $0.58
c6g.8xlarge | 32 | 64 | 9 | 12 | $1.088
c7g.8xlarge | 32 | 64 | 10 | 15 | $1.16
c6g.12xlarge | 48 | 96 | 13.5 | 20 | $1.632
c7g.12xlarge | 48 | 96 | 15 | 22.5 | $1.74

These instances are all built on the AWS Nitro System, a collection of AWS-designed hardware and software innovations. The Nitro System offloads the CPU virtualization, storage, and networking functions to dedicated hardware and software, delivering performance that is nearly indistinguishable from bare metal. In particular, C7g instances include support for Elastic Fabric Adapter (EFA), which becomes the standard in this instance family. It allows our applications to communicate directly with network interface cards, providing lower and more consistent latency. Additionally, these are all Amazon EBS-optimized instances, and C7g provides higher dedicated bandwidth for EBS volumes, which can result in better I/O performance, contributing to quicker read/write operations in Spark.

Performance test results

To quantify performance, we ran TPC-DS benchmark queries for Spark at a 3 TB scale. These queries are derived from TPC-DS standard SQL scripts, and the test results are not comparable to other published TPC-DS benchmark results. Apart from the benchmark standards, a single Amazon EMR 6.6 Spark runtime (compatible with Apache Spark version 3.2.0) was used as the data processing engine across six different managed node groups on an EKS cluster: C6g_4, C7g_4, C6g_8, C7g_8, C6g_12, and C7g_12. These groups are named after instance type to distinguish the underlying compute resources. Each group can automatically scale between 1 and 30 nodes within its corresponding instance type. By architecting the EKS cluster in this way, we can run and compare our experiments in parallel, each of which is hosted in a single node group, that is, an isolated compute environment on a common EKS cluster. It also makes it possible to run an application with multiple CPU architectures on a single cluster. Check out the sample EKS cluster configuration and benchmark job examples for more details.

We measure the Graviton performance and cost improvements using two calculations: total query runtime and geometric mean of the total runtime. The following table shows the results for equivalent sized C6g and C7g instances and the same Spark configurations.

Benchmark Attributes | 12 XL | 8 XL | 4 XL
Task parallelism (spark.executor.cores * spark.executor.instances) | 188 cores (4*47) | 188 cores (4*47) | 188 cores (4*47)
spark.executor.memory | 6 GB | 6 GB | 6 GB
Number of EC2 instances | 5 | 7 | 16
EBS volume | 4 * 128 GB io1 disk | 4 * 128 GB io1 disk | 4 * 128 GB io1 disk
Provisioned IOPS per volume | 6400 | 6400 | 6400
Total query runtime on C6g (sec) | 2099 | 2098 | 2042
Total query runtime on C7g (sec) | 1728 | 1738 | 1660
Total runtime improvement with C7g | 18% | 17% | 19%
Geometric mean query time on C6g (sec) | 9.74 | 9.88 | 9.77
Geometric mean query time on C7g (sec) | 8.40 | 8.32 | 8.08
Geometric mean improvement with C7g | 13.8% | 15.8% | 17.3%
EMR on EKS memory usage cost on C6g (per run) | $0.28 | $0.28 | $0.28
EMR on EKS vCPU usage cost on C6g (per run) | $1.26 | $1.25 | $1.24
Total cost per benchmark run on C6g (EC2 + EKS cluster + EMR price) | $6.36 | $6.02 | $6.52
EMR on EKS memory usage cost on C7g (per run) | $0.23 | $0.23 | $0.22
EMR on EKS vCPU usage cost on C7g (per run) | $1.04 | $1.03 | $0.99
Total cost per benchmark run on C7g (EC2 + EKS cluster + EMR price) | $5.49 | $5.23 | $5.54
Estimated cost reduction with C7g | 13.7% | 13.2% | 15%
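
For readers unfamiliar with the geometric mean rows, the following is a minimal sketch of how such a value can be computed from per-query runtimes. It is not the benchmark's actual tooling: the `geomean` helper and the sample query times are hypothetical and only illustrate the formula exp(mean(ln(t))).

```shell
#!/bin/sh
# Hypothetical helper (not part of the benchmark tooling): geometric mean of
# per-query runtimes, computed as exp(mean(ln(t_i))).
geomean() {
  printf '%s\n' "$@" | awk '{ s += log($1); n++ } END { printf "%.2f\n", exp(s / n) }'
}

# Made-up per-query times in seconds, for illustration only.
geomean 2 8
geomean 1 10 100
```

Unlike the total runtime, the geometric mean is not dominated by a handful of long-running queries, which is one reason the two metrics in the table improve by different percentages.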

The total number of cores and memory are identical across all benchmarked instances, and four provisioned IOPS SSD disks were attached to each EBS-optimized instance for optimal disk I/O performance. To allow for comparison, these configurations were intentionally chosen to match the settings in other EMR on EKS benchmarks. Check out the previous benchmark blog post, Amazon EMR on Amazon EKS provides up to 61% lower costs and up to 68% performance improvement for Spark workloads, for C5 instances based on x86_64 Intel CPU.

The table indicates C7g instances have consistent performance improvement compared to equivalent C6g Graviton2 instances. Our test results showed 17–19% improvement in total query runtime for selected instance sizes, and 13.8–17.3% improvement in geometric mean. On cost, we observed 13.2–15% cost reduction on C7g performance tests compared to C6g while running the 104 TPC-DS benchmark queries.
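
The improvement percentages can be recomputed from the runtime and cost rows of the table. The snippet below is a small sketch of that arithmetic; the `pct_reduction` helper is a hypothetical name introduced here, and the inputs are the rounded values published in the table.

```shell
#!/bin/sh
# Hypothetical helper: percentage reduction going from a baseline value to a new value.
pct_reduction() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

# Total query runtime in seconds, C6g vs. C7g.
pct_reduction 2099 1728   # 12 XL
pct_reduction 2098 1738   # 8 XL
pct_reduction 2042 1660   # 4 XL

# Total cost per benchmark run in USD, C6g vs. C7g.
pct_reduction 6.36 5.49   # 12 XL
pct_reduction 6.52 5.54   # 4 XL
```

This prints 17.7, 17.2, and 18.7 for runtime, which round to the 18%, 17%, and 19% in the table, and 13.7 and 15.0 for cost. Recomputing the 8 XL cost from the rounded dollar figures (6.02 and 5.23) gives 13.1% rather than the published 13.2%, presumably because the published figure was derived from unrounded costs.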

Data shuffle in a Spark workload

Generally, big data frameworks schedule computation tasks for different nodes in parallel to achieve optimal performance. To proceed with its computation, a node must have the results of computations from upstream. This requires moving intermediate data from multiple servers to the nodes where the data is required, which is termed shuffling data. In many Spark workloads, data shuffle is an inevitable operation, so it plays an important role in performance assessments. This operation may involve a high rate of disk I/O and network data transmission, and can burn a significant amount of CPU cycles.

If your workload is I/O bound or bottlenecked by current data shuffle performance, one recommendation is to benchmark on improved hardware. Overall, C7g offers better EBS and network bandwidth compared to equivalent C6g instance types, which may help you optimize performance. Therefore, in the same benchmark test, we captured the following additional information, which is broken down into per-instance-type network/IO improvements.


Based on the TPC-DS query test result, this graph illustrates the percentage increases of data shuffle operations in four categories: maximum disk read and write, and maximum network received and transmitted. In comparison to C6g instances, the disk read performance improved between 25–45%, while the disk write performance increase was 34–47%. On the network throughput comparison, we observed an increase of 21–36%.

Run an Amazon EMR on EKS job with multiple CPU architectures

If you're evaluating migrating to Graviton instances for Amazon EMR on EKS workloads, we recommend testing the Spark workloads based on your real-world use cases. If you need to run workloads across multiple processor architectures, for example to test the performance for Intel and Arm CPUs, follow the walkthrough in this section to get started with some concrete ideas.

Build a single multi-arch Docker image

To build a single multi-arch Docker image (x86_64 and arm64), complete the following steps:

  1. Get the Docker Buildx CLI extension. Docker Buildx is a CLI plugin that extends the Docker command to support the multi-architecture feature. Upgrade to the latest Docker Desktop or manually download the CLI binary. For more details, check out Working with Buildx.
  2. Validate the version after the installation:
    docker buildx version

  3. Create a new builder that gives access to the new multi-architecture features (you only have to perform this task once):
    docker buildx create --name mybuilder --use

  4. Log in to your own Amazon ECR registry:
    # Assumes AWS_REGION is already set to the Region of your ECR registry
    ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
    ECR_URL=$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
    aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_URL

  5. Get the EMR Spark base image from AWS:
    # Assumes SRC_ECR_URL is already set to the EMR on EKS base image registry; log in before pulling
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $SRC_ECR_URL
    docker pull $SRC_ECR_URL/spark/emr-6.6.0:latest

  6. Build and push a custom Docker image.

In this case, we build a single Spark benchmark utility Docker image on top of Amazon EMR 6.6. It supports both Intel and Arm processor architectures:

  • linux/amd64 – x86_64 (also known as AMD64 or Intel 64)
  • linux/arm64 – Arm

docker buildx build \
--platform linux/amd64,linux/arm64 \
-t $ECR_URL/eks-spark-benchmark:emr6.6 \
-f docker/benchmark-util/Dockerfile \
--build-arg SPARK_BASE_IMAGE=$SRC_ECR_URL/spark/emr-6.6.0:latest \
--push .
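
To relate the two platform strings to what a machine or node reports, the following sketch maps the output of `uname -m` to the corresponding Docker platform. The `docker_platform` helper is hypothetical and not part of Docker or the benchmark repository. It relies on the convention that Linux on Arm (including Graviton) reports `aarch64` and Intel or AMD 64-bit hosts report `x86_64`.

```shell
#!/bin/sh
# Hypothetical helper: translate a machine architecture name (as printed by
# `uname -m`) into the Docker platform string used with --platform.
docker_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)             echo "unknown"; return 1 ;;
  esac
}

# Platform of the machine running this script.
docker_platform "$(uname -m)" || true

# After the push, the per-platform manifests of the image can be listed with:
#   docker buildx imagetools inspect $ECR_URL/eks-spark-benchmark:emr6.6
```

With a multi-arch image, the container runtime on each node pulls the variant that matches the node's architecture, so the same image tag can be used for both Graviton and Intel node groups.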

Submit Amazon EMR on EKS jobs with and without Graviton

For our first example, we submit a benchmark job to the Graviton3 node group that spins up c7g.4xlarge instances.

The following is not a complete script. Check out the full version of the example on GitHub.

aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name emr66-c7-4xl \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.6.0-latest \
--job-driver '{
    "sparkSubmitJobDriver": {
    "entryPoint": "local:///usr/lib/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar",
    "sparkSubmitParameters": "........"}}' \
--configuration-overrides '{
"applicationConfiguration": [{
    "classification": "spark-defaults",
    "properties": {
        "spark.kubernetes.container.image": "'$ECR_URL'/eks-spark-benchmark:emr6.6",
        "": "C7g_4"

In the following example, we run the same job on non-Graviton C5 instances with Intel 64 CPU. The full version of the script is available on GitHub.

aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name emr66-c5-4xl \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.6.0-latest \
--job-driver '{
    "sparkSubmitJobDriver": {
    "entryPoint": "local:///usr/lib/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar",
    "sparkSubmitParameters": "........"}}' \
--configuration-overrides '{
"applicationConfiguration": [{
    "classification": "spark-defaults",
    "properties": {
        "spark.kubernetes.container.image": "'$ECR_URL'/eks-spark-benchmark:emr6.6",
        "": "C5_4"

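The two truncated scripts differ only in the job name and in the node group value at the end of the properties. One common way to pin Spark pods to a node group is a Kubernetes node selector. The sketch below assumes that approach and is not taken from the original scripts: the `spark_defaults_props` helper is hypothetical, the registry value is a placeholder, and it combines Spark's `spark.kubernetes.node.selector.[labelKey]` setting with the `eks.amazonaws.com/nodegroup` label that EKS applies to managed node groups.

```shell
#!/bin/sh
# Hypothetical helper: print a spark-defaults "properties" object that pins the
# job to one EKS managed node group through a node selector (an assumption,
# not necessarily what the full scripts on GitHub do).
spark_defaults_props() {
  image="$1"
  nodegroup="$2"
  cat <<EOF
{
  "spark.kubernetes.container.image": "$image",
  "spark.kubernetes.node.selector.eks.amazonaws.com/nodegroup": "$nodegroup"
}
EOF
}

# Placeholder registry for illustration; replace with your own ECR_URL.
ECR_URL="${ECR_URL:-123456789012.dkr.ecr.us-east-1.amazonaws.com}"

spark_defaults_props "$ECR_URL/eks-spark-benchmark:emr6.6" "C7g_4"   # Graviton3 node group
spark_defaults_props "$ECR_URL/eks-spark-benchmark:emr6.6" "C5_4"    # Intel node group
```

A selector on the well-known `kubernetes.io/arch` label (values `arm64` or `amd64`) is another option when the goal is to pick an architecture rather than a specific node group.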

In May 2022, the Graviton3 instance family was made available to Amazon EMR on EKS. After running the performance-optimized EMR Spark runtime on the selected latest Arm-based Graviton3 instances, we observed up to 19% performance increase and up to 15% cost savings compared to C6g Graviton2 instances. Because Amazon EMR on EKS offers 100% API compatibility with open-source Apache Spark, you can quickly step into the evaluation process with no application changes.

If you're wondering how much performance gain you can achieve with your use case, try out the benchmark solution or the EMR on EKS Workshop. You can also contact your AWS Solutions Architects, who can be of assistance alongside your innovation journey.

About the author

Melody Yang is a Senior Big Data Solutions Architect for Amazon EMR at AWS. She is an experienced analytics leader working with AWS customers to provide best practice guidance and technical advice in order to assist their success in data transformation. Her areas of interest are open-source frameworks and automation, data engineering, and DataOps.


