AWS Glue Crawlers now supports Apache Hudi Tables
[AWS Glue Crawlers](https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html) now supports Apache Hudi tables, allowing customers to query data in Apache Hudi tables directly from AWS analytics services like [Amazon Athena](https://aws.amazon.com/athena/). Apache Hudi is an open-source table format that brings database and data warehouse capabilities to the data lake. Apache Hudi helps data engineers manage continuously evolving data sets while maintaining query performance.
Previously, Amazon Athena users who wanted to query Apache Hudi tables had to manually create a table in the [Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html) and keep its partitions updated so that query results stayed current. With today’s launch, users can automatically register Apache Hudi tables in the Glue Data Catalog by running a Glue Crawler. Glue Crawlers support both partitioned and non-partitioned Copy on Write (CoW) and Merge on Read (MoR) Hudi tables. Users can then query the cataloged Hudi tables from various [analytics services](https://aws.amazon.com/big-data/datalakes-and-analytics/) and apply Lake Formation fine-grained permissions. With Glue Crawlers, users can also migrate data from other Hudi catalogs to the Glue Data Catalog.
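As an illustration of querying a crawled Hudi table from Amazon Athena, the sketch below builds the arguments for the boto3 Athena `start_query_execution` call. The helper names, the database and table names, and the S3 results location are hypothetical placeholders, not values from this announcement; the code assumes boto3 is installed and AWS credentials are configured.

```python
import re
from typing import Any, Dict

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_athena_query_request(
    database: str, table: str, output_location: str, limit: int = 10
) -> Dict[str, Any]:
    """Build keyword arguments for athena.start_query_execution.

    All argument values are illustrative; the table is assumed to have been
    registered in the Glue Data Catalog by a crawler run.
    """
    # Reject anything that is not a plain identifier to avoid SQL injection.
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return {
        "QueryString": f'SELECT * FROM "{table}" LIMIT {limit}',
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request: Dict[str, Any]) -> str:
    """Submit the query and return its execution ID (requires AWS access)."""
    import boto3  # imported lazily so the builder works without boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]
```

Calling `run_query(build_athena_query_request("my_database", "my_hudi_table", "s3://my-results-bucket/athena/"))` would submit the query; results are then fetched separately once the execution completes.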
To get started, users create a Glue Crawler, provide one or more Amazon S3 paths to Hudi tables, and run the crawler on demand or on a schedule. On each run, the Glue Crawler extracts schema and partition information and updates the Glue Data Catalog with schema changes, partition changes, and the latest Hudi metadata file location.
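The steps above can be sketched with boto3, using the `HudiTargets` field of the Glue `create_crawler` API. This is a minimal sketch, not a complete setup: the helper names, crawler name, IAM role ARN, database name, and S3 path are hypothetical placeholders, and the code assumes boto3 is installed and the role has access to the S3 paths.

```python
from typing import Any, Dict, List, Optional


def build_hudi_crawler_request(
    name: str,
    role_arn: str,
    database_name: str,
    s3_paths: List[str],
    schedule: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for glue.create_crawler with Hudi targets."""
    if not s3_paths:
        raise ValueError("at least one S3 path to a Hudi table is required")
    request: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database_name,
        # Each path points at the S3 location of one or more Hudi tables.
        "Targets": {"HudiTargets": [{"Paths": list(s3_paths)}]},
    }
    if schedule is not None:
        # Cron expression, e.g. "cron(0 12 * * ? *)" for a daily run.
        request["Schedule"] = schedule
    return request


def create_and_start_crawler(request: Dict[str, Any]) -> None:
    """Create the crawler and start one run (requires AWS access)."""
    import boto3  # imported lazily so the builder works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
```

For example, `create_and_start_crawler(build_hudi_crawler_request("hudi-crawler", "arn:aws:iam::123456789012:role/MyGlueRole", "my_database", ["s3://my-bucket/hudi/"]))` would register the crawler and trigger a first run; passing `schedule` instead lets the crawler run periodically.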
AWS Glue Crawler’s support for Hudi tables is available in all commercial regions where AWS Glue is available; see the [AWS Region Table](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/). To learn more, visit the AWS Glue Crawler [documentation](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-crawling.html).