NVIDIA accelerated computing can significantly speed up many different types of workloads. In this blog, I will explain how the same NVIDIA GPU computing infrastructure (down to fractional GPUs) can be shared across different workloads, such as RAN (Radio Access Network) and AI/ML workloads, in a fully automated manner. This is the foundational requirement for AI-RAN, a technology being embraced widely by the telecommunications industry to fuse AI and RAN on a common infrastructure as the next step toward AI-native 5G and 6G networks. I will also walk through a practical use case that was demonstrated to a Tier-1 telco.
First, some background before diving into the details. The infrastructure requirements of a given workload type (e.g., RAN or AI/ML) vary dynamically, so workloads cannot be statically assigned to resources. This is compounded by the fact that RAN utilization varies widely, averaging only 20-30%; the unused cycles can be dynamically allocated to other workloads. The challenges in sharing the same GPU pool across multiple workloads can be summarized as follows:
- Infrastructure requirements differ between RAN/5G and AI workloads
- Networking dependencies, such as switch reconfiguration and IP/MAC address reassignment
- Full isolation at the infrastructure level to meet security and performance SLAs between workloads
- Multi-Instance GPU (MIG) sizing: fixed partitions vs. dynamic reconfiguration of MIG (see the inventory sketch after this list)
- Additional workflows that may be required, such as workload migration and scaling
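To make the isolation and MIG-sizing bullets concrete: before any placement or re-partitioning decision, the orchestrator needs to know how each GPU is currently partitioned and utilized. Below is a minimal inventory sketch, assuming the NVIDIA `nvidia-ml-py` (pynvml) bindings are installed; it only reads state and changes nothing:

```python
# Minimal sketch: inventory GPUs and their MIG instances with pynvml
# (pip install nvidia-ml-py). Read-only; error handling trimmed for brevity.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    gpu = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(gpu)
    try:
        mig_mode, _pending = pynvml.nvmlDeviceGetMigMode(gpu)
    except pynvml.NVMLError:
        mig_mode = pynvml.NVML_DEVICE_MIG_DISABLE  # GPU does not support MIG
    print(f"GPU {i}: {name}, MIG enabled: {mig_mode == pynvml.NVML_DEVICE_MIG_ENABLE}")

    if mig_mode != pynvml.NVML_DEVICE_MIG_ENABLE:
        continue
    # MIG device slots may be sparse, so probe each index individually.
    for slot in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
        try:
            mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, slot)
        except pynvml.NVMLError:
            continue  # no MIG device in this slot
        mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
        print(f"  MIG {slot}: {mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB in use")
pynvml.nvmlShutdown()
```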
This means there is a need for an intelligent management entity that is capable of orchestrating both infrastructure and different types of workloads, and of switching workloads dynamically. This is accomplished using AMCOP (Aarna Networks Multicluster Orchestration Platform), Aarna's orchestration platform for infrastructure, workloads, and applications.
The end-to-end scenario works as follows:
- Create tenants for the different workloads, RAN and AI; there may be multiple AI tenants if multiple user AI jobs are scheduled dynamically
- Allocate required resources (servers or GPUs/fractional GPUs) for each tenant
- Create network and storage isolation between the workloads
- Provide an observability dashboard for the admin to monitor the GPU utilization & other KPIs
- Deploy the RAN components (DU, CU, and NVIDIA AI Aerial) with Day-0 configuration from the RAN tenant
- Deploy AI workloads (such as an NVIDIA AI Enterprise serverless API or NIM microservice) from AI tenant(s)
- Monitor RAN traffic metrics
- If the RAN traffic load drops below a threshold, consolidate the RAN workload onto fewer servers/GPUs/fractional GPUs
- Deploy (or scale out) the AI workload (e.g., an LLM inference workload) after establishing isolation
- If the RAN traffic load exceeds the threshold, spin down (or scale in) the AI workload and subsequently bring the RAN workload back up; the control-loop sketch after this list illustrates the switching logic
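The last three steps amount to a threshold-driven control loop. Below is an illustrative sketch of that loop; the thresholds, the polling interval, and all of the helper functions are hypothetical placeholders for the orchestrator's actual APIs and policies, not AMCOP's real interface:

```python
# Illustrative RAN/AI switching loop. Thresholds and helpers are hypothetical
# placeholders, not AMCOP's real API; replace the stubs with orchestrator calls.
import random
import time

SCALE_IN_THRESHOLD = 0.30   # RAN load below this: free capacity for AI
SCALE_OUT_THRESHOLD = 0.70  # RAN load above this: reclaim capacity for RAN
POLL_INTERVAL_SECONDS = 30

def get_ran_load() -> float:
    """Stub: averaged RAN traffic load (e.g., PRB utilization), 0.0-1.0."""
    return random.random()

def consolidate_ran_workload(): print("packing RAN onto fewer GPUs/MIG slices")
def isolate_freed_resources():  print("applying network/storage isolation")
def scale_out_ai_workload():    print("deploying or scaling out the AI workload")
def scale_in_ai_workload():     print("spinning down or scaling in the AI workload")
def expand_ran_workload():      print("returning capacity to the RAN workload")

def reconcile() -> None:
    ran_load = get_ran_load()
    if ran_load < SCALE_IN_THRESHOLD:
        # RAN is quiet: consolidate it, isolate the freed capacity, hand it to AI.
        consolidate_ran_workload()
        isolate_freed_resources()
        scale_out_ai_workload()
    elif ran_load > SCALE_OUT_THRESHOLD:
        # RAN is busy: the AI workload comes down first, then RAN reclaims capacity.
        scale_in_ai_workload()
        expand_ran_workload()

if __name__ == "__main__":
    while True:
        reconcile()
        time.sleep(POLL_INTERVAL_SECONDS)
```

Using two thresholds rather than one provides hysteresis, so the system does not flap between RAN and AI placement when the load hovers near a single cut-off.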
A demo showcasing a subset of this functionality on a single NVIDIA GH200 Grace Hopper Superchip is described below. The single GPU is divided into fractional GPUs using a 3+4 MIG configuration, and the resulting MIG instances are allocated to the different workloads (see the partitioning sketch below).
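For reference, a 3+4 split like the one in the demo can be created with the standard `nvidia-smi mig` workflow, sketched below. The profile names (`3g.48gb`, `4g.48gb`) are assumptions for a GH200-class GPU; valid names and IDs vary by GPU and driver, so list them with `nvidia-smi mig -lgip` first:

```python
# Sketch: partition GPU 0 into a 3g + 4g MIG split via nvidia-smi (needs root).
# Profile names (3g.48gb, 4g.48gb) are assumptions for a GH200-class GPU;
# verify the valid profiles for your GPU/driver with `nvidia-smi mig -lgip`.
import subprocess

def run(cmd: str) -> None:
    print(f"$ {cmd}")
    subprocess.run(cmd.split(), check=True)

run("nvidia-smi -i 0 -mig 1")    # enable MIG mode (may require a GPU reset)
run("nvidia-smi mig -lgip")      # list the GPU instance profiles available
# Create the 3g and 4g GPU instances, plus default compute instances (-C).
run("nvidia-smi mig -i 0 -cgi 3g.48gb,4g.48gb -C")
run("nvidia-smi -L")             # verify: the new MIG devices are listed
```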
The following functionality can be seen in the demo as part of the end-to-end flow:
- Open the dashboard and show the RAN KPIs on the orchestrator GUI, along with the GPU and MIG metrics.
- Show all the RAN KPIs and GPU/MIG metrics over a past window (hours or days).
- Show the updated RAN and GPU/MIG utilizations, plus the AI metrics.
- Initiate AI load/performance testing, then show the AI metrics and GPU/MIG utilizations on the dashboard.
- Query the RAG model (from a UE) through a custom GUI and show the response; a minimal client sketch follows this list.
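The final step is a thin client in front of the inference endpoint. Below is a minimal query sketch, assuming the RAG pipeline is fronted by a NIM microservice's OpenAI-compatible chat completions API; the endpoint URL and model name are placeholders:

```python
# Minimal client sketch for the RAG query step. NIM microservices expose an
# OpenAI-compatible API; the endpoint URL and model name here are placeholders.
import json
import urllib.request

NIM_URL = "http://nim-service.example.local:8000/v1/chat/completions"

payload = {
    "model": "meta/llama-3.1-8b-instruct",  # placeholder model name
    "messages": [{"role": "user", "content": "Summarize the current cell KPIs."}],
    "max_tokens": 256,
}
req = urllib.request.Request(
    NIM_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
print(reply["choices"][0]["message"]["content"])
```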
The functional block diagram of the demo configuration, based on the AMCOP solution, is shown below.
Next Steps:
Over the next few years, we predict every RAN site will run on NVIDIA GPU-accelerated infrastructure. Contact us for help getting started with sharing NVIDIA GPU compute resources within your infrastructure. Aarna.ml’s AI-Cloud Management Software (also known as AMCOP) orchestrates and manages GPU-accelerated environments, including support for NVIDIA AI Enterprise software and NVIDIA NIM microservices. Working closely with NVIDIA, we have deep expertise with the NVIDIA Grace Hopper platform, as well as NVIDIA Triton Inference Server and NVIDIA NeMo software.