January 13, 2026 — DigitalOcean’s Inference Cloud Platform is delivering 2X production inference throughput for Character.ai, a leading AI entertainment platform. Character.ai operates one of the most demanding production inference workloads in the market, handling over a billion queries per day, through a tightly integrated software and hardware collaboration with AMD.
Character.ai utilizes both proprietary and open-source models to power its high-volume, high-concurrency, latency-sensitive applications. By migrating these workloads to DigitalOcean’s inference cloud platform, Character.ai has achieved significantly higher request throughput while adhering to rigorous latency targets. Compared to standard, non-optimized GPU infrastructure, this transition has reduced the cost per token by 50% and substantially expanded usable capacity for their end users.
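To make the economics concrete, the following sketch uses hypothetical budget and unit-cost figures (none are from the announcement) to show why a 50% reduction in cost per token doubles the token volume a fixed infrastructure budget can serve:

```python
# Illustrative sketch with hypothetical numbers: how a 50% cut in cost per
# token expands usable capacity at a fixed infrastructure budget.

def tokens_served(budget_usd: float, cost_per_million_tokens: float) -> float:
    """Return how many million tokens a budget covers at a given unit cost."""
    return budget_usd / cost_per_million_tokens

budget = 100_000.0                     # hypothetical monthly spend, USD
baseline_cost = 0.50                   # hypothetical $/1M tokens, non-optimized GPUs
optimized_cost = baseline_cost * 0.5   # the reported 50% reduction

baseline_capacity = tokens_served(budget, baseline_cost)
optimized_capacity = tokens_served(budget, optimized_cost)

print(baseline_capacity)   # 200000.0 (million tokens)
print(optimized_capacity)  # 400000.0 (capacity doubles at the same spend)
```

Halving the unit cost is equivalent to doubling serviceable capacity at constant spend, which is how a per-token price improvement translates into "substantially expanded usable capacity."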
According to David Brinker, Senior Vice President of Partnerships at Character.ai, the results exceeded expectations. He said that DigitalOcean delivered reliable performance that unlocked higher sustained throughput and improved economics, which directly supports the growth of their platform.
This performance milestone builds on DigitalOcean’s momentum with large-scale AI customers like Character.ai, supporting platform expansion and richer multimodal experiences.
DigitalOcean worked closely with Character.ai and AMD to deploy AMD Instinct™ GPUs optimized specifically for inference workloads. DigitalOcean’s platform integrates hardware-aware scheduling and optimized inference runtimes to extract higher sustained performance per node.
AMD has invested in ROCm™, its open, end-to-end AI software stack. The teams optimized ROCm with vLLM and AITER (AMD’s inference-focused runtime and optimization framework for transformer workloads), along with deployment configurations for Character.ai’s workloads on DigitalOcean AMD Instinct™ MI300X and MI325X GPUs, contributing to the throughput improvement.
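A deployment of this kind might be configured roughly as below. This is a minimal sketch, not the actual configuration: the model name and all parameter values are hypothetical, and the AITER toggle shown is an assumption based on recent vLLM ROCm builds (the exact switch varies by version).

```shell
# Hypothetical vLLM serving configuration on a ROCm / AMD Instinct node.
# Model name and all values are illustrative, not from the announcement.

# Assumption: recent vLLM ROCm builds gate AITER kernels behind this variable.
export VLLM_ROCM_USE_AITER=1

vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 8 \        # shard the model across 8 GPUs in the node
  --max-num-seqs 256 \              # cap concurrent sequences per engine
  --max-model-len 8192 \            # context length budget per request
  --gpu-memory-utilization 0.90     # leave headroom for activation spikes
```

Tuning knobs like tensor parallelism, batch concurrency, and memory utilization are the kind of "deployment configurations" the paragraph above refers to; the right values depend on the model size and latency targets.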
According to Vamsi Boppana, Senior Vice President of Artificial Intelligence at AMD, the collaboration with DigitalOcean helped Character.ai unlock higher sustained inference throughput and improved efficiency. He added that by combining AMD Instinct™ GPUs, the open ROCm™ software stack, and platform-level optimization, DigitalOcean’s Inference Cloud is delivering a scalable, cost-effective foundation for running large-scale, latency-sensitive AI workloads in production. Together, they are accelerating the builders who are defining the next generation of AI applications.
In collaboration with Character.ai, DigitalOcean engineers tuned distributed inference configurations to balance latency, throughput, and concurrency. In some production scenarios, these optimizations increased throughput by 2X under the same latency constraints, directly improving the total cost of ownership.
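The latency/throughput/concurrency balance described above can be sketched with Little's law (concurrency = throughput × latency), a standard queueing identity. The numbers below are hypothetical and only illustrate the relationship: holding latency fixed, a 2X throughput gain requires sustaining 2X the in-flight requests.

```python
# Minimal sketch (hypothetical numbers) of the latency/throughput/concurrency
# trade-off via Little's law: concurrency = throughput * latency.

def required_concurrency(throughput_rps: float, latency_s: float) -> float:
    """In-flight requests needed to sustain a throughput at a given latency."""
    return throughput_rps * latency_s

latency_budget = 0.5          # hypothetical latency target, seconds
baseline_rps = 1_000.0        # hypothetical baseline throughput, requests/sec
tuned_rps = 2 * baseline_rps  # the reported 2X throughput at the same latency

print(required_concurrency(baseline_rps, latency_budget))  # 500.0
print(required_concurrency(tuned_rps, latency_budget))     # 1000.0
```

This is why the tuning work targets all three quantities together: doubling throughput under the same latency constraint is only possible if the serving stack can hold twice the concurrent load without queueing delays pushing latency past the target.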
DigitalOcean’s Inference Cloud is designed to operate AI applications in production. The platform delivers a unified hardware and software stack in which orchestration and system-level tuning work together to provide cost efficiency, observability, and operational simplicity for production AI workloads at scale.
Paddy Srinivasan, Chief Executive Officer of DigitalOcean, stated that Character.ai runs one of the most demanding real-time inference workloads in the market. He added that this work shows what happens when advanced hardware meets a platform designed specifically for production inference, making large-scale AI applications easier and more economical to run.
The Character.ai deployment reflects a broader shift in how AI infrastructure is built and evaluated. As inference workloads scale, customers are prioritizing predictable performance, operational simplicity, and cost efficiency over raw hardware specifications.
For additional information on the specific testing methodologies, hardware configurations, and performance benchmarks used to achieve these results, see the technical deep-dive here.
DigitalOcean is an inference cloud platform that helps AI and Digital Native Businesses build, run, and scale intelligent applications with speed, simplicity, and predictable economics. The platform combines production-ready GPU infrastructure, a full-stack cloud, model-first inference workflows, and an agentic experience layer to reduce operational complexity and accelerate time to production. More than 640,000 customers trust DigitalOcean to deliver the cloud and AI infrastructure they need to build and grow.
Source: DigitalOcean
