Amazon Bedrock introduces Priority and Flex inference service tiers
Today, Amazon Bedrock introduces two new inference service tiers to optimize costs and performance for different AI workloads. The new **Flex** tier offers cost-effective pricing for non-time-critical applications such as model evaluations and content summarization, while the **Priority** tier provides premium performance and preferential processing for mission-critical applications. For most models that support the Priority tier, customers can realize up to 25% better output tokens per second (OTPS) latency compared to the Standard tier. These join the existing **Standard** tier for everyday AI applications with reliable performance.

These service tiers address key challenges that organizations face when deploying AI at scale. The Flex tier is designed for non-interactive workloads that can tolerate longer latencies, making it ideal for model evaluations, content summarization, labeling and annotation, and multistep agentic workflows. It is priced at a discount relative to the Standard tier. During periods of high demand, Flex requests receive lower priority than Standard requests.

The Priority tier is an ideal fit for mission-critical applications, real-time end-user interactions, and interactive experiences where consistent, fast responses are essential. During periods of high demand, Priority requests are processed ahead of requests in the other service tiers, at a premium price.

These new service tiers are available today for a range of leading foundation models, including OpenAI (gpt-oss-20b, gpt-oss-120b), DeepSeek (DeepSeek V3.1), Qwen3 (Coder-480B-A35B-Instruct, Coder-30B-A3B-Instruct, 32B dense, Qwen3-235B-A22B-2507), and Amazon Nova (Nova Pro and Nova Premier). With these options, Amazon Bedrock gives customers greater control over balancing cost efficiency with performance requirements, so they can scale AI workloads economically while ensuring optimal user experiences for their most critical applications.
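The tier guidance above can be sketched as a small routing helper. This is an illustrative sketch only, not the Bedrock API: the `Workload` class, the `choose_tier` and `generation_seconds` functions, and the tier label strings are hypothetical names invented for this example, and the 100 tokens-per-second baseline is a made-up figure. It also assumes that "up to 25% better OTPS" can be read as up to 25% more output tokens per second. Consult the documentation for the actual request parameter and accepted values.

```python
from dataclasses import dataclass

# Hypothetical labels for this sketch; they are NOT confirmed API values.
FLEX = "flex"
STANDARD = "standard"
PRIORITY = "priority"


@dataclass
class Workload:
    name: str
    mission_critical: bool   # business-critical path?
    interactive: bool        # is an end user waiting in real time?
    tolerates_delay: bool    # can the job accept longer latencies?


def choose_tier(w: Workload) -> str:
    """Map a workload to a tier following the announcement's guidance."""
    # Priority: mission-critical or real-time, interactive experiences.
    if w.mission_critical or w.interactive:
        return PRIORITY
    # Flex: non-interactive work that can tolerate longer latencies.
    if w.tolerates_delay:
        return FLEX
    # Standard: everyday applications.
    return STANDARD


def generation_seconds(output_tokens: int, otps: float) -> float:
    """Time to emit output_tokens at a given output-tokens-per-second rate."""
    return output_tokens / otps


if __name__ == "__main__":
    jobs = [
        Workload("nightly-model-eval", False, False, True),
        Workload("support-chat", False, True, False),
        Workload("internal-report", False, False, False),
    ]
    for job in jobs:
        print(job.name, "->", choose_tier(job))

    # Hypothetical baseline of 100 OTPS; best case assumed 25% higher.
    baseline = 100.0
    print(generation_seconds(1000, baseline))         # 10.0
    print(generation_seconds(1000, baseline * 1.25))  # 8.0
```

Under these assumptions, a 1,000-token response that takes 10 seconds at the hypothetical baseline would take 8 seconds in the best case. Because the announcement says "up to" and "most models", treat this as an upper bound rather than an expected result.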
For more information about the AWS Regions where Amazon Bedrock Priority and Flex inference service tiers are available, see the [AWS Regions](https://docs.aws.amazon.com/bedrock/latest/userguide/service-tiers-inference.html) table. Learn more about service tiers in our [News Blog](https://aws.amazon.com/blogs/aws/new-amazon-bedrock-service-tiers-help-you-match-ai-workload-performance-with-cost) and [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/service-tiers-inference.html).