Why GPU PaaS Is Incomplete Without Infrastructure Orchestration and Tenant Isolation

GPU PaaS alone isn’t enough — secure multi-tenancy starts with bare metal, network, and storage control.

Introduction

GPU Platform-as-a-Service (PaaS) is gaining popularity as a way to simplify AI workload execution — offering users a friendly interface to submit training, fine-tuning, and inference jobs. But under the hood, many GPU PaaS solutions lack deep integration with infrastructure orchestration, making them inadequate for secure, scalable multi-tenancy.

If you’re a Neocloud, a sovereign GPU cloud, or an enterprise private GPU cloud with strict compliance requirements, you are probably looking at offering job scheduling or Model-as-a-Service to your tenants/users. An easy approach is a global Kubernetes cluster shared across multiple tenants. The problem with this approach is poor security: the underlying OS kernel, CPU, GPU, network, and storage resources are shared by all users without any isolation. Case in point: in September 2024, Wiz discovered a critical GPU container and Kubernetes vulnerability that affected over 35% of environments. Kubernetes namespace or vCluster isolation alone is therefore not safe.

You need to provision bare metal, configure network and fabric isolation, allocate high-performance storage, and enforce tenant-level security boundaries — all automated, dynamic, and policy-driven.

In short: PaaS is not enough. True GPUaaS begins with infrastructure orchestration.

The Pitfall of PaaS-Only GPU Platforms

Many AI platforms stop at providing:

  • A web UI for job submission
  • A catalog of AI/ML frameworks or models
  • Basic GPU scheduling on Kubernetes  

What they don’t offer:

  • Control over how GPU nodes are provisioned (bare metal vs. VM)
  • Enforcement of north-south and east-west isolation per tenant
  • Configuration and management of InfiniBand, RoCE, or Spectrum-X fabrics
  • Lifecycle management and isolation of external parallel storage such as DDN, VAST, or WEKA
  • Per-tenant quotas, observability, RBAC, and policy governance

Without these, your GPU PaaS is just a thin UI on top of a complex, insecure, and hard-to-scale backend.
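To make the gap concrete, here is a minimal sketch — with entirely hypothetical record and function names, not any real platform API — of the extra per-tenant state a full-stack orchestrator has to own beyond the Kubernetes namespace that a scheduler-only PaaS sees: a dedicated node pool, a fabric segment, and a storage quota.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical per-tenant record. A PaaS-only platform typically manages just
# the namespace; a full-stack orchestrator must also own the layers below it.
@dataclass
class TenantSpec:
    name: str
    namespace: str                                   # what a PaaS-only platform manages
    dedicated_nodes: List[str] = field(default_factory=list)  # bare-metal / VM pool
    fabric_segment: Optional[str] = None             # e.g. a VXLAN VNI or fabric partition
    storage_quota_tb: float = 0.0                    # quota on external parallel storage

def isolation_gaps(t: TenantSpec) -> List[str]:
    """Return the infrastructure layers still left unisolated for this tenant."""
    gaps = []
    if not t.dedicated_nodes:
        gaps.append("compute: no dedicated node pool (shared kernel/GPU)")
    if t.fabric_segment is None:
        gaps.append("network: no per-tenant fabric segment")
    if t.storage_quota_tb <= 0:
        gaps.append("storage: no quota on shared parallel storage")
    return gaps

# A tenant as a PaaS-only platform sees it: every lower layer is still shared.
paas_only = TenantSpec(name="acme", namespace="acme-ns")
print(isolation_gaps(paas_only))  # all three layers are exposed
```

The point of the sketch: a namespace alone leaves every check failing, and only infrastructure-level provisioning clears the list.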

What Full-Stack Orchestration Looks Like

To build a robust AI cloud platform — whether sovereign, Neocloud, or enterprise — the orchestration layer must go deeper.

Here’s what needs to be provisioned and isolated before a PaaS layer can even work properly:

  • Compute: bare-metal provisioning, VM instantiation, GPU passthrough, MIG slicing
  • Network: VXLAN/VRF overlays per tenant, BGP EVPN control plane, north-south and intra-tenant isolation
  • Storage: mounting of external parallel storage such as DDN, VAST, or WEKA via tenant-controlled flows, storage quotas, and file-system access with north-south network isolation
  • Security: namespace separation, runtime isolation, audit logging
  • Observability: per-tenant logs, GPU usage, billing hooks

Only when all of these layers are orchestrated and managed can a PaaS deliver real value.
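The ordering matters: compute, network, storage, and security must be in place before the PaaS layer schedules a single job. A minimal sketch of such a pipeline — all step names and return values are illustrative assumptions, not a real orchestrator API — might run the layers in dependency order:

```python
# Hypothetical orchestration pipeline: each step provisions one layer for a
# tenant and returns a record of what was configured.
def provision_compute(tenant):
    # Bare-metal or VM pool dedicated to this tenant
    return {"nodes": [f"{tenant}-gpu-0", f"{tenant}-gpu-1"]}

def provision_network(tenant):
    # Per-tenant VXLAN/VRF overlay segment
    return {"vni": abs(hash(tenant)) % 16_000_000, "vrf": f"vrf-{tenant}"}

def provision_storage(tenant):
    # Mount on external parallel storage plus a quota
    return {"mount": f"/mnt/{tenant}", "quota_tb": 10}

def apply_security(tenant):
    # Namespace separation and audit logging
    return {"namespace": f"{tenant}-ns", "audit": True}

def onboard_tenant(tenant: str) -> dict:
    """Provision the layers in dependency order; PaaS starts only after all four."""
    state = {}
    for layer, step in [("compute", provision_compute),
                        ("network", provision_network),
                        ("storage", provision_storage),
                        ("security", apply_security)]:
        state[layer] = step(tenant)
    return state

print(onboard_tenant("acme"))
```

A real pipeline would of course be policy-driven and idempotent, with rollback on partial failure; the sketch only fixes the layer ordering the section describes.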

How aarna.ml GPU CMS Solves This Problem

aarna.ml GPU CMS is built from the ground up to be infrastructure-aware and multi-tenant-native. It includes all the PaaS features you would expect, but goes beyond PaaS to offer:

  • BMaaS and VMaaS orchestration: Automated provisioning of GPU bare metal or VM pools for different tenants.
  • Tenant-level network isolation: Support for VXLAN, VRF, and fabric segmentation across InfiniBand, Ethernet, and Spectrum-X.
  • Storage orchestration: Seamless integration with DDN, VAST, WEKA with mount point creation and tenant quota enforcement.
  • Full-stack observability: Usage stats, logs, and billing metrics per tenant, per GPU, per model.

All of this is wrapped with a PaaS layer that supports Ray, SLURM, KAI, Run:AI, and more, giving users flexibility while keeping cloud providers in control of their infrastructure and policies.
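As one illustration of what per-tenant billing hooks reduce to, raw GPU usage records can be aggregated by tenant and priced. The record fields and the flat rate below are made-up assumptions for the sketch, not the product's metering format:

```python
from collections import defaultdict

# Hypothetical raw usage records as a metering agent might emit them:
# (tenant, gpu_id, gpu_hours)
records = [
    ("acme",   "gpu-0", 3.5),
    ("acme",   "gpu-1", 1.0),
    ("globex", "gpu-2", 2.25),
]

def bill_per_tenant(records, rate_per_gpu_hour=2.0):
    """Aggregate GPU-hours per tenant and apply a flat hourly rate."""
    usage = defaultdict(float)
    for tenant, _gpu, hours in records:
        usage[tenant] += hours
    return {t: round(h * rate_per_gpu_hour, 2) for t, h in usage.items()}

print(bill_per_tenant(records))  # → {'acme': 9.0, 'globex': 4.5}
```

The same aggregation keyed by (tenant, gpu_id) or by model would give the per-GPU and per-model breakdowns mentioned above.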

Why This Matters for AI Cloud Providers

If you're offering GPUaaS or PaaS without infrastructure orchestration:

  • You're exposing tenants to noisy neighbors or shared vulnerabilities
  • You're missing critical capabilities like multi-region scaling or LLM isolation
  • You’ll be unable to meet compliance, governance, and SemiAnalysis ClusterMAX-grade maturity requirements

With aarna.ml GPU CMS, you deliver not just a PaaS, but a complete, secure, and sovereign-ready GPU cloud platform.

Conclusion

GPU PaaS needs a complete IaaS stack beneath it — it’s not just a model-serving interface!

To deliver scalable, secure, multi-tenant AI services, your GPU PaaS stack must be expanded into a full GPU cloud management software stack that includes automated provisioning of compute, network, and storage, along with tenant-aware policy and observability controls.

Only then is your GPU PaaS truly production-grade.

Only then are you ready for sovereign, enterprise, and commercial AI cloud success.

To see a live demo or for a free trial, contact aarna.ml