MLCommons Releases MLPerf Inference v5.1 Benchmark Results, Showcasing AI Innovation

September 9, 2025 — Leads & Copy — MLCommons® has released the results of its MLPerf® Inference v5.1 benchmark suite, highlighting advancements in AI capabilities, models, and hardware/software systems.

The MLPerf Inference suite measures how quickly systems can run AI models across a variety of workloads. The open-source suite benchmarks system performance in a standardized, reproducible manner, fostering innovation and improvements in performance and energy efficiency.

This round drew a record 27 participants, with systems featuring five newly available processors and enhanced AI software frameworks. The v5.1 suite also adds three new benchmarks.

According to Scott Wasson, Director of Product Management at MLCommons, Inference v5.1 includes new benchmark tests such as DeepSeek-R1, as well as interactive scenarios for the LLM-based tests. Results showed substantial performance gains over prior rounds.

The Llama 2 70B benchmark remained popular, drawing 24 submitters. Some systems improved performance by 50% over the v5.0 round, and one heterogeneous system used software to load-balance inference across its accelerators.

Version 5.1 expands the interactive scenario, which tests performance under tighter latency constraints. Three new benchmarks were introduced: DeepSeek-R1, Llama 3.1 8B, and Whisper Large V3.

DeepSeek-R1 is the first reasoning model in the suite; reasoning models tackle complex tasks by breaking problems into intermediate steps. Miro Hodak, MLPerf Inference working group co-chair, noted the importance of understanding how reasoning models perform.

Llama 3.1 8B, a smaller LLM, replaces the older GPT-J model in the text-summarization test. Whisper Large V3 is a high-accuracy speech-recognition model. Frank Han, also an MLPerf Inference working group co-chair, highlighted the need to benchmark workloads beyond large language models.

Participating organizations included AMD, ASUSTeK, Azure, Broadcom, Cisco, CoreWeave, Dell, GATEOverflow, GigaComputing, Google, Hewlett Packard Enterprise, Intel, KRAI, Lambda, Lenovo, MangoBoost, MiTAC, Nebius, NVIDIA, Oracle, Quanta Cloud Technology, Red Hat, Supermicro, TheStage AI, the University of Florida, Vultr, and individual submitter Amitash Nanda.

The five newly tested accelerators were the AMD Instinct MI355X, the Intel Arc Pro B60 48GB Turbo, the NVIDIA GB300, the NVIDIA RTX 4000 Ada-PCIe-20GB, and the NVIDIA RTX Pro 6000 Blackwell Server Edition.

David Kanter, head of MLPerf at MLCommons, emphasized the mission to provide trustworthy performance data.

New submitters this round were MiTAC, Nebius, Amitash Nanda, TheStage AI, the University of Florida, and Vultr. Lenovo and GATEOverflow also submitted power measurements.

For the full MLPerf Inference v5.1 results, visit the Datacenter and Edge benchmark results pages.

MLCommons is an open engineering consortium focused on AI benchmarking, supported by over 125 members.

For MLCommons information, visit MLCommons.org or email participation@mlcommons.org.

Press Inquiries: contact press@mlcommons.org

Source: MLCommons
