Building a comprehensive GPUaaS solution requires addressing key challenges such as unified multi-tenancy, support for diverse workloads, and maximizing GPU utilization. Today’s AI cloud infrastructures are fragmented across compute, networking, storage, and PaaS, making seamless tenant onboarding and resource isolation complex.
Additionally, providers must evolve beyond static bare-metal allocations to dynamic offerings such as Job Submission and Model Serving, so that infrastructure can be repurposed efficiently as customer needs change. With AI workloads ranging from LLM training to real-time inference and retrieval-augmented generation (RAG), intelligent GPU orchestration is essential to optimize resource allocation and scalability.
The aarna.ml on-demand multi-tenancy Reference Architecture (RA) provides a holistic blueprint to address these challenges, enabling self-service, automated multi-tenant GPU management. It offers:
Unified Multi-Tenancy Management – Automated tenant onboarding with the right isolation strategy for each tenant.
Flexible Service Offerings – Support for IaaS and PaaS, including bare-metal, VMs, Kubernetes, model serving, and job scheduling.
Enhanced Resource Utilization – Smart GPU orchestration for dynamic scaling and efficient workload allocation.
E2E Orchestration – Scalable, efficient AI cloud infrastructure, platform, and applications managed end to end.
Download the RA now to explore how aarna.ml enables scalable, efficient, and intelligent AI cloud infrastructure.
Click here to view Terms of Service
For more details, please write to us at info@aarna.ml
Schedule a demo for a tailored walkthrough.