Amazon Athena expands Apache Hudi support
[Amazon Athena](/athena/) has updated its integration with Apache Hudi to support new features and the latest community release, 0.8.0. Hudi is an open-source data management framework that simplifies incremental data processing in Amazon S3 data lakes. With the updated integration, you can use Athena to query Hudi 0.8.0 tables managed via Amazon EMR, Apache Spark, Apache Hive, or other compatible services. It also adds support for snapshot queries and for reading bootstrapped tables.
Apache Hudi provides record-level data processing that can help you simplify development of Change Data Capture (CDC) pipelines, comply with GDPR-driven updates and deletes, and better manage streaming data from sensors or devices that require data insertion and event updates. The 0.8.0 release makes it even easier for you to migrate large Parquet tables to Hudi without copying data so you can query and analyze them via Athena. Furthermore, with Athena’s new support for snapshot queries, you can now have near real-time views of your streaming table updates.
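As a rough illustration of how a snapshot query differs from a read-optimized one on the Athena side, the sketch below builds the `CREATE EXTERNAL TABLE` statement for each case. It assumes the table is registered manually, that snapshot queries target a merge-on-read table, and that the input format class names match those listed in the Athena Hudi documentation; the helper `hudi_table_ddl`, the table name, the columns, and the S3 path are all hypothetical.

```python
# Input format class names as given in the Athena Hudi documentation
# (assumption: verify against the docs for your Athena engine version).
READ_OPTIMIZED_INPUT_FORMAT = "org.apache.hudi.hadoop.HoodieParquetInputFormat"
SNAPSHOT_INPUT_FORMAT = (
    "org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat"
)


def hudi_table_ddl(table, columns, location, snapshot=False):
    """Build Athena DDL for a Hudi table (hypothetical helper).

    columns  -- list of (name, type) pairs
    snapshot -- True selects the real-time input format used for
                snapshot queries; False selects the read-optimized one
    """
    input_format = SNAPSHOT_INPUT_FORMAT if snapshot else READ_OPTIMIZED_INPUT_FORMAT
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE `{table}` (\n  {cols}\n)\n"
        "ROW FORMAT SERDE "
        "'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
        f"STORED AS INPUTFORMAT '{input_format}'\n"
        "OUTPUTFORMAT "
        "'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{location}'"
    )


# Hypothetical sensor table stored under a placeholder S3 prefix.
example_columns = [
    ("_hoodie_commit_time", "string"),
    ("device_id", "string"),
    ("reading", "double"),
]
snapshot_ddl = hudi_table_ddl(
    "sensor_events_rt",
    example_columns,
    "s3://example-bucket/hudi/sensor_events/",
    snapshot=True,
)
print(snapshot_ddl)
```

The resulting statement could be run in the Athena console or submitted through an SDK; only the `INPUTFORMAT` line changes between the two query types in this sketch.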
To learn more about Athena's integration with Hudi, see [Using Athena to Query Apache Hudi Dataset](https://docs.aws.amazon.com/athena/latest/ug/querying-hudi.html) and the [Querying an Apache Hudi Dataset with Amazon Athena](https://aws.amazon.com/blogs/big-data/part-1-query-an-apache-hudi-dataset-in-an-amazon-s3-data-lake-with-amazon-athena-part-1-read-optimized-queries/) blog series.