Amazon EMR Serverless introduces Shuffle-optimized disks delivering improved performance for I/O-intensive workloads
[Amazon EMR Serverless](https://aws.amazon.com/emr/serverless/) is a serverless option in Amazon EMR that makes it simple for data engineers and data scientists to run open-source big data analytics frameworks without configuring, managing, or scaling clusters or servers. An EMR Serverless application uses workers to execute workloads, and users can configure the ephemeral storage of each worker to match the workload's needs. Today, we are excited to introduce Shuffle-optimized disks on Amazon EMR Serverless, which offer increased storage capacity (up to 2 TB) and higher IOPS, delivering better performance for I/O-intensive Spark and Hive workloads.
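Because disk size and disk type are set per worker, one way to picture the feature is as an application-level worker configuration. The sketch below only builds a request dictionary and never calls AWS. It assumes the CreateApplication field names `initialCapacity`, `workerConfiguration`, `disk`, and `diskType`, the value `"SHUFFLE_OPTIMIZED"`, and the release label `emr-7.1.0`; the helper `build_application_request` and the 2000 GB reading of "2 TB" are hypothetical. Verify all of these against the EMR Serverless API reference before use.

```python
# Hypothetical helper: builds a CreateApplication-style request body for
# EMR Serverless whose pre-initialized executors use Shuffle-optimized disks.
# ASSUMPTION: the field names and the "SHUFFLE_OPTIMIZED" value mirror the
# EMR Serverless API; nothing here contacts AWS.

# "Up to 2 TB", interpreted here as 2000 GB (assumption).
MAX_SHUFFLE_OPTIMIZED_GB = 2000


def build_application_request(name: str, disk_gb: int, worker_count: int = 2) -> dict:
    """Return a request dict for a Spark application with larger per-worker disks."""
    if not 0 < disk_gb <= MAX_SHUFFLE_OPTIMIZED_GB:
        raise ValueError(
            f"disk_gb must be in (0, {MAX_SHUFFLE_OPTIMIZED_GB}], got {disk_gb}"
        )
    # Per-worker resources; the disk is ephemeral storage used for shuffle data.
    worker_config = {
        "cpu": "4 vCPU",
        "memory": "16 GB",
        "disk": f"{disk_gb} GB",
        "diskType": "SHUFFLE_OPTIMIZED",
    }
    return {
        "name": name,
        "releaseLabel": "emr-7.1.0",
        "type": "SPARK",
        "initialCapacity": {
            "EXECUTOR": {
                "workerCount": worker_count,
                "workerConfiguration": worker_config,
            },
        },
    }


request = build_application_request("shuffle-heavy-app", disk_gb=500)
print(request["initialCapacity"]["EXECUTOR"]["workerConfiguration"])
```

If the field names check out, a dictionary like this could be passed as keyword arguments to the `create_application` method of a boto3 `emr-serverless` client, provided the installed SDK version recognizes the disk type field.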
Shuffle is a fundamental step in Apache Spark and Apache Hive jobs. It consists of I/O-intensive operations that redistribute or reorganize data across workers for parallel computation during operations such as joins, aggregations, and transformations. Complex workloads that shuffle large datasets need sufficient disk capacity and I/O performance for shuffle to run efficiently. Shuffle-optimized disks offer up to 2 TB of storage capacity and higher baseline IOPS, enabling you to efficiently run shuffle-heavy and I/O-intensive Spark and Hive workloads.
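For a single shuffle-heavy Spark job, the disk settings could instead be supplied as job-level Spark properties. The sketch below assembles a `sparkSubmitParameters` string. The property names `spark.emr-serverless.driver.disk`, `spark.emr-serverless.executor.disk`, their `.type` variants, and the value `shuffle_optimized` are assumptions to confirm in the EMR Serverless User Guide, and the helper name `shuffle_optimized_submit_params` is hypothetical.

```python
# Hypothetical helper: assembles the sparkSubmitParameters string for one
# EMR Serverless job run that requests Shuffle-optimized disks.
# ASSUMPTION: the property names and the "shuffle_optimized" value are
# taken to match the EMR Serverless User Guide; verify before use.


def shuffle_optimized_submit_params(executor_disk_gb: int, driver_disk_gb: int = 20) -> str:
    """Return --conf flags that size and type the driver and executor disks."""
    props = {
        "spark.emr-serverless.driver.disk": f"{driver_disk_gb}G",
        "spark.emr-serverless.driver.disk.type": "shuffle_optimized",
        "spark.emr-serverless.executor.disk": f"{executor_disk_gb}G",
        "spark.emr-serverless.executor.disk.type": "shuffle_optimized",
    }
    # One --conf flag per property, in dictionary insertion order.
    return " ".join(f"--conf {key}={value}" for key, value in props.items())


params = shuffle_optimized_submit_params(executor_disk_gb=500)
print(params)
```

The resulting string would go into the `sparkSubmitParameters` field of the `sparkSubmit` job driver when starting a job run. Setting the disk type per job, rather than per application, keeps the larger disks scoped to the workloads that actually shuffle heavily.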
Shuffle-optimized disks are generally available on EMR release 7.1.0 and higher in all AWS [Regions](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/endpoints-quotas.html) where EMR Serverless is available, excluding the AWS GovCloud (US) and China Regions. For more information on Shuffle-optimized disks, visit the EMR Serverless [User Guide](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs-shuffle-optimized-disks.html). For pricing information, visit the [EMR Serverless pricing page](https://aws.amazon.com/emr/pricing/).