October 1st, 2024

Amazon Data Firehose delivers data streams into Apache Iceberg format tables in Amazon S3

[Amazon Data Firehose](https://aws.amazon.com/kinesis/data-firehose/) (Firehose) can now deliver data streams into Apache Iceberg tables in Amazon S3. Firehose lets customers acquire, transform, and deliver data streams into Amazon S3, Amazon Redshift, Amazon OpenSearch Service, Splunk, Snowflake, and other destinations for analytics. With this new feature, Firehose integrates with Apache Iceberg, so customers can deliver data streams directly into Apache Iceberg tables in their Amazon S3 data lake.

Firehose can acquire data streams from Amazon Kinesis Data Streams, Amazon MSK, or the Direct PUT API. It is also integrated to receive streams from AWS services, including AWS WAF web ACL logs, Amazon CloudWatch Logs, Amazon VPC Flow Logs, AWS IoT, Amazon SNS, Amazon API Gateway access logs, and [many others](https://docs.aws.amazon.com/firehose/latest/dev/create-name.html). Customers can stream data from any of these sources directly into Apache Iceberg tables in Amazon S3 without building multi-step processing pipelines. Because Firehose is serverless, customers set up a stream simply by configuring its source and destination properties, and pay based on the bytes processed.

The new feature also lets customers route records in a data stream to different Apache Iceberg tables based on the content of each incoming record, using routing rules defined as JSON expressions. Customers can additionally specify whether an incoming record should be applied as a row-level update or delete operation in the destination Apache Iceberg table, which automates processing for data correction and right-to-be-forgotten scenarios.

To get started, visit the Amazon Data Firehose [documentation](https://docs.aws.amazon.com/firehose/latest/dev/apache-iceberg-destination.html), [pricing](https://aws.amazon.com/kinesis/data-firehose/pricing/), and [console](https://console.aws.amazon.com/firehose/home).
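To make content-based routing and row-level operations more concrete, here is a minimal local sketch in Python. It does not call Firehose or any AWS API; in Firehose the routing rules are configured on the stream itself. The functions `route` and `apply_stream`, the record fields `table`, `op`, and `id`, and the sample data are all hypothetical. A plain dict keyed by `id` stands in for an Iceberg table with a unique key, which simplifies real Iceberg semantics: here an insert and an update both overwrite the row for that key.

```python
import json
from collections import defaultdict

# Hypothetical routing rule: read the target table and the operation from
# fields inside each record. The field names "table" and "op" are assumptions.
def route(record: dict) -> tuple[str, str]:
    table = record.get("table", "default_events")
    op = record.get("op", "insert")
    if op not in ("insert", "update", "delete"):
        raise ValueError(f"unsupported operation: {op}")
    return table, op


def apply_stream(raw_records, key_field="id"):
    """Apply JSON records to in-memory stand-ins for their target tables."""
    tables = defaultdict(dict)  # table name -> {unique key -> row}
    for raw in raw_records:
        record = json.loads(raw)
        table, op = route(record)
        key = record[key_field]
        # Drop the routing metadata so only row data is stored.
        row = {k: v for k, v in record.items() if k not in ("table", "op")}
        if op == "delete":
            # Row-level delete, e.g. for a right-to-be-forgotten request.
            tables[table].pop(key, None)
        else:
            # Simplification: insert and update both write the row for this key.
            tables[table][key] = row
    return dict(tables)


SAMPLE_STREAM = [
    '{"table": "orders", "op": "insert", "id": 1, "status": "placed"}',
    '{"table": "orders", "op": "insert", "id": 2, "status": "placed"}',
    '{"table": "customers", "op": "insert", "id": 7, "email": "a@example.com"}',
    '{"table": "orders", "op": "update", "id": 1, "status": "shipped"}',
    '{"table": "customers", "op": "delete", "id": 7}',
]

print(apply_stream(SAMPLE_STREAM))
```

In this sample, the update corrects order 1 in place, and the delete removes customer 7 entirely, leaving the `customers` table empty. Both steps are driven only by fields carried in the incoming records.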