Amazon SageMaker AI announces new observability capability for inference endpoints
Amazon SageMaker AI's new detailed observability capability helps customers operate production generative AI inference workloads with confidence. It provides visibility into token performance, GPU health, inference component placement, and autoscaling behavior. It removes the manual work of searching Amazon CloudWatch for per-endpoint metrics, correlating latency spikes with GPU saturation or KV cache exhaustion, and diagnosing why scaling operations are slow. The capability tracks inference performance metrics in real time, including Time to First Token (TTFT), inter-token latency, queue depth, and tokens per second. It surfaces these metrics alongside infrastructure health so customers can identify and resolve issues in minutes rather than hours.

The new pre-built SageMaker AI Insights dashboard in Amazon CloudWatch brings token latency, GPU utilization, inference component copy counts, scaling events, and cold start breakdowns together in a single view. The underlying metrics are OpenTelemetry-native and published automatically, so no instrumentation is required. With this view, teams can quickly diagnose TTFT degradation, verify Availability Zone compliance, and tune autoscaling policies. Customers who have standardized on observability tools such as Grafana can connect directly through the regional PromQL endpoint and import a pre-configured dashboard template. Together, these features help customers resolve operational issues on their own and get the most performance out of their AI investments.

SageMaker AI inference observability is available in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), US West (N. California), Canada (Central), South America (São Paulo), Europe (Ireland), Europe (Frankfurt), Europe (London), Europe (Stockholm), Europe (Zurich), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Asia Pacific (Seoul), and Asia Pacific (Jakarta).

To learn more, visit the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch-detailed-observability.html) and the [Amazon SageMaker AI](https://aws.amazon.com/sagemaker/ai/?trk=df5b8c0e-2f56-49fa-a37d-691eba3e8954&sc%5Fchannel=ps&trk=c5e59169-d233-4eef-b6e2-e3e751bfb3f2&ef%5Fid=:G:s&s%5Fkwcid=AL!4422!3!!e!!o!!amazon%20sagemaker%20ai!487441853!1151190521794397&msclkid=9745f631dad319a20f8e9c02eb1bda85) webpage.
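To make the token-level metrics concrete, here is a minimal Python sketch of how Time to First Token, inter-token latency, and tokens per second are commonly derived from the arrival times of streamed tokens. It is not SageMaker AI's implementation, which publishes these metrics automatically. The `RequestTiming` record is hypothetical, and the exact formulas are assumptions chosen for illustration: inter-token latency as the mean gap between consecutive tokens, and throughput measured from request start to the last token.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTiming:
    """Hypothetical record of one streaming request (times in seconds)."""
    start: float               # when the request was sent
    token_times: list[float]   # arrival time of each generated token, in order


def ttft(r: RequestTiming) -> float:
    # Time to First Token: delay between sending the request and the first token.
    return r.token_times[0] - r.start


def inter_token_latency(r: RequestTiming) -> float:
    # Mean gap between consecutive tokens (assumes at least two tokens).
    gaps = [b - a for a, b in zip(r.token_times, r.token_times[1:])]
    return mean(gaps)


def tokens_per_second(r: RequestTiming) -> float:
    # End-to-end output throughput (an assumed definition):
    # tokens generated divided by total elapsed time.
    return len(r.token_times) / (r.token_times[-1] - r.start)


if __name__ == "__main__":
    r = RequestTiming(start=0.0, token_times=[0.5, 0.6, 0.7, 0.8, 1.0])
    print(f"TTFT={ttft(r):.3f}s ITL={inter_token_latency(r):.3f}s TPS={tokens_per_second(r):.1f}")
```

In this example, a TTFT regression points at prefill or queueing delays, while a rising inter-token latency with a steady TTFT is more typical of decode-phase pressure. These are the kinds of patterns the dashboard is meant to help teams correlate with GPU and KV cache health.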