
Johannesburg, SA – RED HAT SUMMIT — Red Hat, the world’s leading provider of open source solutions, and Google Cloud today announced an expanded collaboration to advance AI for enterprise applications by uniting Red Hat’s open source technologies with Google Cloud’s purpose-built infrastructure and Google’s family of open models, Gemma.
Together, the companies will advance enterprise-grade use cases for scaling AI by:
- Launching the llm-d open source project with Google as a founding contributor
- Enabling support for vLLM on Google Cloud TPUs and GPU-based virtual machines (VMs) to enhance AI inference
- Delivering Day 0 support for vLLM on Gemma 3 model distributions
- Supporting Red Hat AI Inference Server on Google Cloud
- Propelling agentic AI with Red Hat as a community contributor for Google’s Agent2Agent (A2A) protocol
Bolstering AI inference with vLLM
Demonstrating its commitment to Day 0 readiness, Red Hat is now an early tester for Google’s family of open models, Gemma, starting with Gemma 3, and is delivering immediate support for vLLM. vLLM is an open source inference server that speeds the output of generative AI (gen AI) applications. As the leading commercial contributor to vLLM, Red Hat is driving a more cost-efficient and responsive platform for gen AI applications.
Additionally, Google Cloud TPUs, the high-performance AI accelerators powering Google’s AI portfolio, are now fully enabled on vLLM.
This integration empowers developers to maximise resources while achieving the performance and efficiency crucial for fast and accurate inference.
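For a sense of what this support looks like in practice, the sketch below serves a Gemma model through vLLM’s offline Python API. The model identifier and sampling settings are illustrative assumptions, not details taken from the announcement.

```python
# Minimal sketch: running inference on a Gemma model with vLLM's Python API.
# The model name ("google/gemma-3-4b-it") and sampling values are assumptions
# used for illustration only.
from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-3-4b-it")          # load the model into vLLM
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarise the benefits of open source AI."], params)
for output in outputs:
    print(output.outputs[0].text)                # generated completion text
```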
As AI shifts from research to real-world deployment, many organisations are confronting the complexities of a diverse AI ecosystem and the need to move toward more distributed compute strategies.
To address this, Red Hat has launched the llm-d open source project, with Google as a founding contributor. Building on the momentum of the vLLM community, this initiative pioneers a new era of gen AI inference. The goal is to enable greater scalability across heterogeneous resources, optimise costs and enhance workload efficiency – all while fostering continued innovation.
Driving enterprise AI with community-powered innovation
Bringing the latest upstream community advancements to the enterprise, Red Hat AI Inference Server is now available on Google Cloud.
As Red Hat’s enterprise distribution of vLLM, Red Hat AI Inference Server helps enterprises optimise model inference across their entire hybrid cloud environment.
By leveraging the robust and trusted infrastructure of Google Cloud, enterprises can deploy production-ready gen AI models that are both highly responsive and cost-efficient at scale.
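Because Red Hat AI Inference Server is built on vLLM, applications would typically reach a deployment through the OpenAI-compatible endpoint that vLLM-based servers expose. The sketch below assumes a hypothetical endpoint URL and model name; it is illustrative only, not a documented deployment recipe.

```python
# Hedged sketch: querying a vLLM-based inference server (such as Red Hat AI
# Inference Server) via its OpenAI-compatible API. The endpoint URL and model
# name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://inference.example.internal:8000/v1",  # placeholder endpoint
    api_key="EMPTY",  # vLLM-style servers typically accept a placeholder key
)

response = client.chat.completions.create(
    model="google/gemma-3-4b-it",  # whichever model the server was started with
    messages=[{"role": "user", "content": "Give three uses for agentic AI."}],
)
print(response.choices[0].message.content)
```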
Underscoring their joint commitment to open AI, Red Hat is also now contributing to Google’s Agent2Agent (A2A) protocol – an application-level protocol facilitating more seamless communication between end-users or agents across diverse platforms and cloud environments.
By actively participating in the A2A ecosystem, Red Hat aims to help users unlock new avenues for rapid innovation, ensuring AI workflows remain dynamic and highly effective through the power of agentic AI.
Supporting Quotes
Brian Stevens, senior vice president and chief technology officer – AI, Red Hat, said: “With this extended collaboration, Red Hat and Google Cloud are committed to driving groundbreaking AI innovations with our combined expertise and platforms.
Bringing the power of vLLM and Red Hat open source technologies to Google Cloud and Google’s Gemma equips developers with the resources they need to build more accurate, high-performing AI solutions, powered by optimised inference capabilities.”
Mark Lohmeyer, vice president and general manager, AI and Computing Infrastructure, Google Cloud, added: “The deepening of our collaboration with Red Hat is driven by our shared commitment to foster open innovation and bring the full potential of AI to our customers.
As we enter a new age of AI inference, together we are paving the way for organisations to more effectively scale AI inference and enable agentic AI with the necessary cost-efficiency and high performance.”