AWS Media: cost-efficient streaming Bedrock AI for fan engagement

Published on June 2, 2026

Bedrock Cost and Agent Workflows



Executive Summary


Two AWS Media reference architectures illustrate different operational priorities: cost-optimized streaming infrastructure and an agentic AI companion for sports fans. One focuses on FinOps controls across compute, storage, networking, observability, and workflow orchestration. The other details a multi-agent, text-to-SQL system built on Amazon Bedrock and Amazon Athena over Apache Iceberg tables, with real-time ingestion and safety controls using Amazon Bedrock Guardrails.



Key Industry Developments


  • FinOps patterns for streaming workloads on AWS
  • Bedrock Streaming applies a multi-axis cost strategy that includes Amazon EC2 Spot Instances for Kubernetes worker nodes, AWS Graviton processors, API call rationalization, and storage lifecycle management.
  • Production Kubernetes worker nodes run on EC2 Spot Instances with automatic failover to on-demand instances when Spot capacity is limited.
  • The platform uses instance reservations for managed services including Amazon RDS, Amazon ElastiCache, and Amazon OpenSearch Service, and uses reserved capacity for Amazon DynamoDB.
  • Storage, registry, and data-retention cost controls
  • Amazon S3 is used as the foundation for hosting a video library, with S3 storage classes, automatic transition policies, Intelligent-Tiering, and lifecycle management.
  • Amazon ECR cost growth from accumulated Docker images is addressed by enabling automatic purging of old images.
  • DynamoDB time to live (TTL) is used to automatically expire obsolete data, reducing storage and backup costs.
  • Network, data-transfer, and API-cost optimization techniques
  • VPC endpoints for Amazon S3 and Amazon DynamoDB keep that traffic off Network Address Translation (NAT) gateways, eliminating NAT gateway transit costs.
  • Video-on-demand processing is redesigned so each request stays within its entry Availability Zone, eliminating inter–Availability Zone transfer costs.
  • S3 API costs are reduced by adding a cache and optimizing API calls, including removing redundant existence checks and implementing 404 handling.
  • Observability and workflow cost engineering
  • CloudWatch metrics are exported to Prometheus, and the set of collected metrics is rationalized, reducing CloudWatch API call costs by a factor of 2.5.
  • Amazon CloudWatch Logs Infrequent Access is used for logs requiring extended retention to reduce ingestion and storage costs.
  • AWS Step Functions Express Workflows are selected over Standard Workflows to reduce costs for a workflow project.
  • Amazon SQS message handling is optimized via batch processing, including batch deletion that reduces deletion operation costs by 66%.
  • Agentic AI architecture patterns for fan-facing applications
  • Bundesliga’s Captain is an agentic AI companion embedded in the official Bundesliga app.
  • The chat-based service uses a text-to-SQL workflow: an LLM converts questions into queries executed by Amazon Athena against statistical data stored in Amazon S3 Tables.
  • Questions are classified by type and complexity using Amazon Nova 2 Lite, then routed to Amazon Nova Pro or Claude Sonnet 4 in Amazon Bedrock.
  • The research service is implemented as an autonomous agentic loop using the Strands Agents SDK with Amazon Bedrock and is designed to run on Amazon Bedrock AgentCore.
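
The classify-then-route pattern described for Captain can be sketched in a few lines of Python. This is a minimal illustration, not Bundesliga's implementation: the keyword-based `classify` function stands in for the LLM classifier, the tier labels are hypothetical strings rather than Amazon Bedrock model IDs, and the mapping of simple questions to the lighter model is an assumption, since the source does not say which complexity level goes to which model.

```python
from dataclasses import dataclass

# Hypothetical tier labels for illustration; these are NOT Bedrock model IDs.
LIGHT_MODEL = "nova-pro"
HEAVY_MODEL = "claude-sonnet-4"


@dataclass(frozen=True)
class Classification:
    question_type: str  # "stats" or "video"
    complexity: str     # "simple" or "complex"


def classify(question: str) -> Classification:
    """Keyword stand-in for the LLM classifier (an assumption, not the real logic)."""
    text = question.lower()
    video_words = ("video", "clip", "highlight")
    complex_markers = ("compare", "trend", "versus", "over the last")
    question_type = "video" if any(w in text for w in video_words) else "stats"
    complexity = "complex" if any(m in text for m in complex_markers) else "simple"
    return Classification(question_type, complexity)


def route(question: str) -> str:
    """Pick a model tier from the classification result."""
    result = classify(question)
    # Assumed policy: only complex questions go to the heavier model.
    return HEAVY_MODEL if result.complexity == "complex" else LIGHT_MODEL
```

Separating classification from routing keeps the routing policy a small, testable function, so the cost trade-off between tiers can be adjusted without touching the classifier.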


Real-World Use Cases


  • Autoscaling streaming infrastructure with Spot capacity and faster instance startup
  • Bedrock Streaming runs Kubernetes worker nodes 100% on EC2 Spot Instances and uses automatic failover to on-demand instances during limited Spot capacity.
  • Custom Amazon Machine Images (AMIs) are used to reduce startup time for new instances, and the deployment uses more than 10 instance types across three Availability Zones.
  • End-to-end cost controls across storage, queues, and databases
  • Video library storage on Amazon S3 uses storage classes, Intelligent-Tiering, and lifecycle management to manage long-lived content.
  • DynamoDB TTL automatically expires obsolete records, reducing storage and backup costs. Capacity mode is evaluated per table, and some tables are migrated from on-demand to provisioned mode after an observation period.
  • Amazon SQS batch processing lowers request costs, including a 66% reduction in deletion operation costs through batch deletion.
  • Reducing network and API overhead in high-volume request paths
  • VPC endpoints for S3 and DynamoDB remove NAT Gateway transit charges for traffic that would otherwise traverse NAT.
  • Video-on-demand processing keeps requests within the entry Availability Zone to avoid inter–Availability Zone transfer charges.
  • An image download API reduces S3 API costs by adding caching and optimizing request patterns (including removing redundant existence checks and implementing 404 handling).
  • Multi-agent fan assistant with governed text-to-SQL and real-time ingestion
  • Captain uses a multi-agent architecture that includes a Router Agent, Stats Agent, and Video Agent to handle different question types, including video-related requests.
  • The data foundation uses Amazon S3 Tables in Apache Iceberg format, with Amazon Athena as the serverless query engine.
  • Real-time ingestion uses Amazon Managed Streaming for Apache Kafka (Amazon MSK), AWS Lambda, and Amazon Data Firehose, with metadata registered in AWS Glue Data Catalog.
  • Amazon Bedrock Guardrails are used for input filtering (including blocking prompt injection attempts, inappropriate content, and off-topic requests) and for output grounding checks.
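
The SQS batch deletion saving mentioned above comes from request counting: SQS bills per request, and a batch action carrying up to 10 messages counts as one request. The sketch below, with hypothetical helper names, groups receipt handles into entry lists shaped for the `Entries` parameter of `DeleteMessageBatch` and computes the fraction of delete requests saved. The source does not state the batch size Bedrock Streaming achieves, so the numbers in the usage note are illustrative only.

```python
import math
from typing import Dict, Iterator, List

SQS_MAX_BATCH = 10  # SQS batch actions accept at most 10 entries per request


def delete_batches(receipt_handles: List[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield entry lists shaped for the Entries parameter of DeleteMessageBatch."""
    for start in range(0, len(receipt_handles), SQS_MAX_BATCH):
        chunk = receipt_handles[start:start + SQS_MAX_BATCH]
        # Id only needs to be unique within a single batch request.
        yield [{"Id": str(i), "ReceiptHandle": h} for i, h in enumerate(chunk)]


def request_reduction(message_count: int, batch_size: int) -> float:
    """Fraction of delete requests saved versus one request per message."""
    if message_count == 0:
        return 0.0
    batched_requests = math.ceil(message_count / batch_size)
    return 1 - batched_requests / message_count
```

With boto3, each yielded list could be passed as `Entries` to the SQS client's `delete_message_batch` call. As a rough check, `request_reduction(300, 3)` evaluates to about 0.667, so a reduction near 66% is what an average of three messages per delete request would produce; full batches of 10 would save 90%.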


Why It Matters


  • Cost outcomes depend on engineering choices across multiple layers
  • The Bedrock Streaming example ties cost control to concrete mechanisms: Spot capacity with failover, reservations for managed services, DynamoDB capacity planning, and lifecycle-based storage management.
  • Network architecture choices (VPC endpoints, Availability Zone–localized processing) are treated as direct levers for reducing NAT and inter–Availability Zone transfer costs.
  • Operational telemetry can be a measurable cost driver
  • Exporting CloudWatch metrics to Prometheus and aligning collected metrics with what is used in Grafana reduces CloudWatch API call costs by a factor of 2.5, showing that metric selection and collection workflows affect spend.
  • Log retention strategy (CloudWatch Logs Infrequent Access) is used as a specific control for extended-retention requirements.
  • Agentic AI systems benefit from explicit routing, structured querying, and guardrails
  • Captain’s workflow combines classification (Amazon Nova 2 Lite), model routing (Amazon Nova Pro or Claude Sonnet 4), and text-to-SQL execution (Athena over S3 Tables) to produce data-driven responses.
  • Guardrails are applied both before generation (input filtering) and after generation (output grounding checks), providing a defined safety and quality-control workflow for a chat-based assistant.
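
The placement of checks before and after generation can be illustrated with a small wrapper. This sketch does not call Amazon Bedrock Guardrails: the deny list, the number-matching grounding check, and every function name here are hypothetical stand-ins chosen only to show where each check sits relative to the model call.

```python
import re
from typing import Callable, List, Tuple

REFUSAL = "Sorry, I can only help with football questions."
FALLBACK = "I could not verify that answer against the data."

BLOCKED_PHRASES = ("ignore previous instructions",)  # hypothetical deny list

# A generator returns the answer text plus the query rows it was based on.
Generator = Callable[[str], Tuple[str, List[tuple]]]


def input_allowed(question: str) -> bool:
    """Pre-generation check: reject questions containing a blocked phrase."""
    text = question.lower()
    return not any(phrase in text for phrase in BLOCKED_PHRASES)


def is_grounded(answer: str, rows: List[tuple]) -> bool:
    """Post-generation check: every number in the answer must appear in the rows."""
    known = {str(value) for row in rows for value in row}
    return all(number in known for number in re.findall(r"\d+", answer))


def guarded_answer(question: str, generate: Generator) -> str:
    """Run the input check, generate, then run the grounding check."""
    if not input_allowed(question):
        return REFUSAL  # the model is never invoked for blocked input
    answer, rows = generate(question)
    return answer if is_grounded(answer, rows) else FALLBACK
```

Rejecting input before generation avoids paying for a model call on blocked requests, while the grounding check catches answers whose figures do not match the query results.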


Sources


  • https://aws.amazon.com/blogs/media/how-bedrock-streaming-optimizes-its-aws-costs/
  • https://aws.amazon.com/blogs/media/how-bundesliga-built-captain-an-ai-agent-for-fans-using-amazon-bedrock/