AWS Glue 4.0 now supports Streaming ETL
[AWS Glue](/glue/) now supports Streaming ETL in version 4.0, a new version of AWS Glue that accelerates data integration workloads in AWS. AWS Glue 4.0 upgrades the data integration engines to [Apache Spark 3.3.0](https://spark.apache.org/releases/spark-release-3-3-0.html) and [Python 3.10](https://docs.python.org/3/whatsnew/3.10.html).
AWS Glue streaming ETL jobs continuously consume data from streaming sources, clean and transform the data in-flight, and make it available for analysis in seconds. This release includes an optimized [state-management store](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#state-store) to build efficient streaming solutions across micro-batches. This makes it easier to remove duplicates in a stream and to perform stream-based aggregations. You can also add a new column that indicates when a corresponding record was received by the stream for better data observability. This version also supports IAM authentication for Amazon Managed Streaming for Apache Kafka Serverless.
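To make the idea of state carried across micro-batches concrete, here is a minimal, library-free Python sketch. It is a conceptual illustration only, not the AWS Glue or Spark API: the `MicroBatchState` class, the `event_id` and `user` fields, and the `received_at` column name are all hypothetical. It shows how remembering previously seen keys lets a later batch drop duplicates, keep a running aggregate, and stamp each record with the time it was received.

```python
from datetime import datetime, timezone


class MicroBatchState:
    """Toy stand-in for a streaming state store (hypothetical, not a Glue/Spark class)."""

    def __init__(self):
        self.seen_ids = set()  # keys already emitted, used for de-duplication
        self.counts = {}       # running aggregate: events per user

    def process(self, batch, received_at):
        """Return the non-duplicate records of one micro-batch, stamped with receive time."""
        fresh = []
        for record in batch:
            if record["event_id"] in self.seen_ids:
                continue  # duplicate of a record from this or an earlier batch
            self.seen_ids.add(record["event_id"])
            user = record["user"]
            self.counts[user] = self.counts.get(user, 0) + 1
            # Add a column recording when the stream received the record.
            fresh.append({**record, "received_at": received_at})
        return fresh


state = MicroBatchState()
t1 = datetime(2023, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
t2 = datetime(2023, 1, 1, 0, 0, 10, tzinfo=timezone.utc)

batch_1 = [{"event_id": 1, "user": "a"}, {"event_id": 2, "user": "b"}]
batch_2 = [{"event_id": 2, "user": "b"}, {"event_id": 3, "user": "a"}]  # event 2 repeats

out_1 = state.process(batch_1, t1)
out_2 = state.process(batch_2, t2)
```

In an actual streaming job the engine manages this state, typically with bounds on how long keys are retained; the sketch keeps everything in memory purely to show the mechanism.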
AWS Glue 4.0 Streaming ETL is now available in the same [AWS Regions](/about-aws/global-infrastructure/regional-product-services/) as AWS Glue, except for the China and GovCloud Regions.
To learn more, read about Streaming ETL jobs in our [documentation](https://docs.aws.amazon.com/glue/latest/dg/add-job-streaming.html).