The introductory blog introduced the concept of GPU-as-a-Service (GPUaaS). This blog classifies GPUaaS providers and describes the factors driving GPU demand.
Key GPUaaS Provider Classification:
GPUs are offered either as bare metal (one or more physical machines) or as virtual machines (VMs), consumed via APIs. The bare metal or VM instance may optionally have a Container-as-a-Service (CaaS) layer on top, managed by Kubernetes.
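To make the CaaS layer concrete, a Kubernetes workload typically requests a GPU through the device-plugin resource name. The sketch below is illustrative only, assuming a cluster with the NVIDIA device plugin installed; the pod name and container image are placeholders, not part of any specific provider's offering.

```yaml
# Hypothetical pod spec requesting one GPU from a CaaS-layer Kubernetes cluster.
# Assumes the NVIDIA device plugin is installed, so "nvidia.com/gpu" is a
# schedulable resource on GPU nodes.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload-example      # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # example CUDA base image
      command: ["nvidia-smi"]     # lists the GPUs visible to the container, then exits
      resources:
        limits:
          nvidia.com/gpu: 1       # request a single GPU
```

Bare metal and VM offerings skip this layer; the tenant receives the machine (or API-provisioned instance) directly and manages drivers and scheduling themselves.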
We have classified key GPUaaS players to help end users choose the right provider:
- Hyperscalers such as Google, AWS, Azure, and Oracle, along with newer entrants such as Lambda Labs, CoreWeave, and DigitalOcean, who provide bare metal, VM, or CaaS solutions, often packaged with their own PaaS or SaaS layers such as LLM models, PyTorch, and LLMOps/MLOps/RunOps tooling.
- Traditional telcos and data centers with large commitments to buying GPUs to build “Sovereign AI Clouds” in their host countries (discussed later in this blog). If they are based on Nvidia and their GPU commitment is sufficiently large, these telcos and data centers are called “Nvidia Cloud Partners” (NCPs). There are numerous NCPs consuming GPUs worth billions or tens of billions of dollars. Their primary use case is LLM training, and for this reason they tend to favor bare metal instances.
- Small, regional, or edge data centers with smaller commitments and a focus on use cases beyond LLM training. They tend to offer bare metal and VM instances, optionally with a CaaS layer.
- Startups, particularly those that started with a cryptocurrency use case or have significant capital deployed in GPU acquisition. These players may also choose to build “industry clouds”. Their offerings include bare metal and VM instances, but often go further with PaaS or SaaS layers. Where there is a vertical industry orientation, these PaaS or SaaS layers tend to be industry specific, e.g., fintech, life sciences, and more.
What is driving GPU demand?
The primary use case behind the massive growth in GPUs is LLM training. Massive, internet-scale data sets across languages are driving an insatiable demand to lock up the latest GPUs for training state-of-the-art models. A 2023 Goldman Sachs report predicted that training would drive most of Nvidia’s revenues in 2024 and 2025.
However, contrary to Goldman Sachs’ opinion, we expect inference use cases to show up much earlier and drive the next wave of GPU growth, because there is no choice: models must be deployed in end-user applications to generate ROI on the initial LLM investment. We also expect inferencing to run at a scale roughly an order of magnitude greater than training (5x to 40x). We anticipate that the inference use case will be fragmented across Nvidia, AMD, Qualcomm, and emerging startups like Groq and Tenstorrent (a topic for another blog). Over time, we also expect model fine-tuning and Small Language Models (SLMs) to drive additional GPU growth.
Another source of continuing GPU demand is “Sovereign AI”. Sovereign AI, as defined by Michael Dell in a recent blog, is a nation’s capability to produce artificial intelligence using its own infrastructure and data. We expect national governments and public sectors to embrace this idea; most will be reluctant to use hyperscaler AI clouds for their AI initiatives.
In summary, we expect a step change in how the market works: newer entrants, including telcos and data center companies, will shape the industry as they spot a unique window of opportunity to win local AI cloud business away from hyperscalers. GPU demand and the number of GPUaaS providers will grow significantly in the next few years, and new use cases will further contribute to this trend.
The next blog will cover additional GPUaaS and NCP topics, both business and technical.
About us: Aarna.ml is an NVIDIA- and venture-backed startup building software that helps GPU-as-a-Service companies build hyperscaler-grade cloud services with multi-tenancy and isolation.