
Amar Kapadia

IaaS vs. PaaS vs. SaaS: NCP Productization Options for a GPU-as-a-Service AI Cloud Offering

Are you a data center provider, telco, NVIDIA Cloud Partner (NCP), or startup that has decided to offer a GPU-as-a-Service (GPUaaS) AI cloud? You need to decide quickly what your offering is going to look like. With multiple technical options, the ultimate decision depends on your customer requirements, the competition you face, and your desired differentiation in an increasingly commoditized market.

The first-level decision is whether your offering will be Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), or Software-as-a-Service (SaaS). Of course, these are not mutually exclusive; you may choose to offer a combination. Let’s dig into the details.

IaaS

IaaS largely means offering compute instances with GPUs to end users. This is probably the most common offering today. The sizing of these instances will vary based on the GPU capability, vCPU count, memory and storage sizing, and network throughput. Even with IaaS, there are some sub-options:

  • BMaaS, or Bare-Metal-as-a-Service: a server such as an NVIDIA HGX or MGX system offered as a service with a simple operating system. The benefit for users is that they can get the instance on demand through a self-service mechanism, take full control of the bare metal server, and release it when done, without incurring any CAPEX.
  • VMs: If your customers need instances smaller than a single bare metal server (e.g. for inferencing), you will need to turn to virtualization. With virtual machines, you can offer fractional servers. With VMware’s cost increases and OpenStack increasingly becoming a legacy technology, your choice is realistically limited to Kubernetes with VM-grade isolation (see Kata Containers).
  • Clustered instances: If your customers are interested in model training, then they will need multiple GPUs clustered into a single instance. For example, multiple HGX servers will have to be clustered together and offered as a single instance to your customers.
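As a concrete sketch of the VM sub-option above, here is how a Kata-isolated, fractional GPU instance might be expressed as a Kubernetes Pod manifest, built as a plain Python dict for illustration. The runtime class name `kata-qemu`, the container image, and the sizing are assumptions, not part of any specific offering:

```python
# Sketch: a VM-isolated, fractional-server GPU instance expressed as a
# Kubernetes Pod manifest (modeled as a plain dict for illustration).
# Assumes a Kata Containers runtime class named "kata-qemu" is installed
# and the NVIDIA device plugin exposes GPUs as "nvidia.com/gpu".

def gpu_instance(name: str, gpus: int, cpus: str, memory: str) -> dict:
    """Return a Pod manifest for a Kata-isolated GPU instance."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "runtimeClassName": "kata-qemu",  # VM-level isolation per tenant
            "containers": [{
                "name": "workload",
                "image": "nvcr.io/nvidia/pytorch:24.01-py3",  # example image
                "resources": {
                    "limits": {
                        "nvidia.com/gpu": str(gpus),
                        "cpu": cpus,
                        "memory": memory,
                    }
                },
            }],
        },
    }

# A tenant instance carved out of a larger server: 1 GPU, 8 vCPUs, 64 GiB.
manifest = gpu_instance("tenant-a-inference", gpus=1, cpus="8", memory="64Gi")
```

The same function could back a self-service API: each tenant request maps to one generated manifest, with the runtime class providing the isolation boundary.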

Of course with IaaS, you will encounter challenges like multi-tenancy and isolation, self-service APIs, and on-demand billing that will need to be solved to be able to offer a complete solution to customers.

PaaS

With PaaS, the complexities of the underlying infrastructure are hidden and the offering is a higher-level abstraction. Options include a GPU-based Kubernetes cluster optimized to run NVIDIA NIM, LLMOps/MLOps pipelines, fine-tuning-as-a-Service, vector-database-as-a-Service, and GPU spot instance creation (to sell excess unused capacity), among other services. A move from IaaS to PaaS instantly creates more value around your offering but requires additional technical sophistication and instrumentation.

SaaS

The next level of sophistication is to offer managed software directly to users in the form of SaaS. This could include LLM-as-a-Service (similar to what OpenAI and the hyperscalers provide), RAG-as-a-Service, and more. This layer adds even more value than IaaS or PaaS.

To compete, you will need to move up the value chain, leaving the low-level “boring” infrastructure orchestration and management to Aarna.ml so that you can focus on building your differentiation. The Aarna Multi Cluster Orchestration Platform (AMCOP) orchestrates and manages low-level infrastructure to achieve network isolation, InfiniBand isolation, GPU/CPU configuration, OS and Kubernetes orchestration, storage configuration, and more. Once the initial orchestration is complete, AMCOP monitors and manages the infrastructure as well. If you would like to slash your time-to-market and build a differentiated and sustainable GPUaaS, please get in touch with us for an initial 3-day architecture and strategy assessment.

Pavan Samudrala

Introducing AMCOP v4.0.1: Revolutionizing Private 5G Edge Orchestration

We're excited to announce the release of AMCOP 4.0.1, the latest version of our Private 5G Edge Orchestrator and the first release based on Nephio/Kubernetes. The Nephio project, initiated by Google and backed by the Linux Foundation, is dedicated to providing carrier-grade, user-friendly, open, Kubernetes-based cloud-native intent automation.

The project aims to offer common automation templates for seamless deployment and management. Designed to simplify the complexities of enterprise edge and private 5G networks, AMCOP 4.0.1 introduces powerful enhancements, including Bare Metal Provisioning and OAI (OpenAirInterface) End-to-End Orchestration, that redefine the landscape of edge computing and network automation.


AMCOP Private 5G Edge Orchestrator serves as a comprehensive platform for orchestration, lifecycle management, real-time policy enforcement, and closed-loop automation of 5G network services and edge computing applications. By enabling zero-touch orchestration of edge infrastructure, applications, and network services at scale, AMCOP provides organizations with a unified management experience through a single pane of glass.

In AMCOP 4.0.1, the Orchestrator functionality is further enhanced with the introduction of Bare Metal Provisioning. This feature streamlines the process of setting up and managing infrastructure resources, with options including VM fleet management on platforms like VMware as well as Kubernetes cluster creation. With Bare Metal Provisioning, organizations can effortlessly deploy and manage their private 5G infrastructure, ensuring optimal performance and reliability.

Additionally, AMCOP 4.0.1 introduces OAI End-to-End Orchestration, enabling seamless integration with OpenAirInterface (OAI) technologies for end-to-end management of 5G network services. This integration facilitates the deployment and operation of various types of workloads: Kubernetes workloads, VMs integrated as Kubernetes objects using KubeVirt, and VMs on hypervisors like ESXi. With OAI End-to-End Orchestration, organizations can create a seamless connection across data resources and processes, driving efficiency and innovation in their edge computing environments.
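To make the "VMs as Kubernetes objects" idea concrete, here is an illustrative sketch of a VM expressed as a KubeVirt resource, again modeled as a plain Python dict. The field names follow the `kubevirt.io/v1` `VirtualMachine` schema; the image and sizing are assumed values for illustration:

```python
# Sketch: a VM managed as a Kubernetes object via KubeVirt.
# Field names follow the kubevirt.io/v1 VirtualMachine schema; the
# container-disk image and sizing below are illustrative assumptions.

def virtual_machine(name: str, cores: int, memory: str, image: str) -> dict:
    """Return a KubeVirt VirtualMachine manifest as a dict."""
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": name},
        "spec": {
            "running": True,  # start the VM as soon as it is created
            "template": {
                "spec": {
                    "domain": {
                        "cpu": {"cores": cores},
                        "resources": {"requests": {"memory": memory}},
                        "devices": {
                            "disks": [
                                {"name": "rootdisk", "disk": {"bus": "virtio"}}
                            ]
                        },
                    },
                    "volumes": [
                        {"name": "rootdisk", "containerDisk": {"image": image}}
                    ],
                }
            },
        },
    }

vm = virtual_machine("edge-vm-1", cores=4, memory="8Gi",
                     image="quay.io/containerdisks/fedora:latest")
```

Because the VM is just another Kubernetes object, the same orchestration pipeline that deploys containerized network functions can also manage VM-based ones.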

Furthermore, AMCOP 4.0.1 continues to excel in lifecycle management, configuration management, KPI monitoring, and service assurance capabilities. The platform ensures seamless orchestration of containerized workloads across diverse clusters, simplifying deployment, updating, scaling, and monitoring of applications. With a centralized interface providing a "single pane of glass" view, administrators can efficiently manage the entire system, simplifying monitoring, troubleshooting, and management tasks.

In conclusion, AMCOP 4.0.1 represents a significant leap forward in private 5G edge orchestration, empowering organizations to unlock the full potential of their edge computing environments. With enhanced capabilities for Bare Metal Provisioning and OAI End-to-End Orchestration, AMCOP continues to lead the way in revolutionizing network automation and edge computing.

Join us in embracing the future of enterprise edge and private 5G networks with AMCOP 4.0.1. Experience it through our User Experience Kit.

Sandeep Sharma

Unlocking the Potential of Nephio R2 at Nephio India Meetup

At the recent Nephio India Meetup held in Bangalore, Sandeep Sharma, Principal Architect at Aarna.ml, shared exciting advancements in Nephio R2. This update crucially supports multi-vendor orchestration across diverse 5G network components.

Challenges and Solutions

Managing networks across multiple vendors can be complex, demanding integrated strategies for various network functions. Nephio R2 introduces an (experimental) topology controller and enhanced automation capabilities to simplify the deployment, configuration, and monitoring of complex network setups. The topology controller lets network architects define high-level intents for network configurations, making operations across heterogeneous environments more straightforward.

Watch Sandeep’s full discussion on the transformative capabilities of Nephio R2.

Benefits and Practical Applications

Nephio R2 is particularly beneficial in environments requiring specific configurations for network functions such as the User Plane Function (UPF), allowing for more efficient network management and reduced operational costs. Nephio R2 marks a significant step towards more adaptive, resilient, and efficient network management frameworks, supporting the rapidly evolving demands of modern telecommunications.

Engage with Us

Explore how Nephio R2 can optimize your network operations. For a deeper understanding or to discuss how Aarna.ml can assist in your digital transformation journey, contact us.

Sriram Rupanagunta

Insights from KubeCon + CloudNativeCon Europe 2024

I recently had the opportunity to be a part of the KubeCon + CloudNativeCon event in Paris. This event, held annually, serves as a crucial forum for the exchange of ideas and advancements in cloud computing—a field that continues to redefine the boundaries of digital infrastructure and services.

KubeCon + CloudNativeCon brings together industry leaders and innovators to discuss the latest in Kubernetes and cloud-native technologies. These discussions are not just technical; they address the practical challenges and opportunities facing businesses today. The event offers insights into optimizing cloud infrastructure, enhancing application deployment, and ultimately, driving business agility and innovation.

At this event, we had the opportunity to collaborate with Red Hat and present at their partner booth. In this collaboration, Aarna.ml addressed the complexities of cloud-native environments through our orchestration solutions. Our focus was on demonstrating the integration of Aarna's orchestration product AMCOP with Red Hat's OpenShift, showing how our technologies simplify the deployment and configuration of infrastructure as well as cloud-native applications. The presentation emphasized how easily businesses can harness cloud capabilities using OpenShift.

The key takeaway from our presentation, and indeed the event as a whole, is the critical role of cloud-native technologies in modern digital strategies. Here’s what those who engage with our presentation can expect to gain:

  • Simplified infrastructure and cloud-native network and application management: insights into overcoming the complexities of deploying infrastructure and cloud-native environments, enabling more efficient and flexible management of digital infrastructure, with GitOps baked into the solution.
  • Practical Applications: Examples of how businesses can deploy and manage applications more effectively with the right tools and strategies.
  • Future-proofing Strategies: Understanding the trajectory of cloud technologies to ensure that your business remains competitive and agile in a rapidly evolving digital landscape.

As we navigate the complexities and possibilities of the cloud, it’s clear that collaboration, innovation, and a focus on practical solutions are key to unlocking its full potential.

For businesses looking to thrive in the digital era, embracing these technologies is not optional—it’s essential. I invite those interested in driving their digital strategies forward to explore our presentation and discover how Aarna.ml can support your journey. For a deeper understanding, please view the complete demonstration video and review the presentation materials.

For any inquiries or further discussions on this topic, please don't hesitate to reach out. We are here to assist and look forward to connecting with you. Contact us here.

Vikas Kumar

Automating Cloud Infrastructure and Network Functions with Nephio

At the recent IEEE Workshop on Network Automation, I had the opportunity to share insights on the advancements in automating cloud infrastructure and network functions using Nephio. This blog post aims to encapsulate the essence of that presentation, delving into the transformative potential of Nephio in the telecommunications industry.

The talk covered three themes:

  • Recent trends in telco network automation: the move to cloud native
  • Telco cloud moving to IaaS
  • Scale

Telecommunications networks are evolving rapidly, driven by the increasing demand for faster, more reliable connectivity and the emergence of technologies like 5G and edge computing. Telco network automation plays a pivotal role in this evolution, enabling operators to streamline operations, enhance efficiency, and deliver superior services to end-users.

Challenges in Traditional Approaches: 

Traditionally, telcos have relied on manual configurations and management of network infrastructure, leading to inefficiencies, human errors, and slow response times. The complexity of modern networks exacerbates these challenges, necessitating a paradigm shift towards automation to meet the demands of today's digital landscape.

Enter Nephio: 

Nephio emerges as a game-changer in the realm of telco network automation, offering a comprehensive platform equipped with advanced capabilities to automate cloud infrastructure and network functions seamlessly. Nephio empowers operators to achieve unparalleled levels of agility, scalability, and performance in their networks.

In this talk, we did a deep dive into Nephio concepts and discussed the following in detail:

  • Config Injection
  • Package Specialization
  • Condition Choreography
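As an illustration of the config injection and package specialization concepts listed above, here is a minimal sketch in Python: a base package, modeled as a dict of KRM resources, is cloned per target cluster, and cluster-specific values are injected. Real Nephio operates on kpt packages and `PackageVariant` resources; the `N3_INTERFACE` field and the helper below are hypothetical:

```python
# Illustrative sketch of Nephio-style package specialization: a base
# package is cloned per target cluster and cluster-specific values are
# injected (config injection). This models the idea only; real Nephio
# works with kpt packages and PackageVariant resources, not raw dicts.

import copy

# Base package: one UPF Deployment with placeholder (unspecialized) values.
BASE_PACKAGE = {
    "Deployment/upf": {"replicas": 1, "env": {"N3_INTERFACE": ""}},
}

def specialize(base: dict, cluster_config: dict) -> dict:
    """Clone the base package and inject cluster-specific configuration."""
    pkg = copy.deepcopy(base)  # never mutate the shared base package
    upf = pkg["Deployment/upf"]
    upf["env"]["N3_INTERFACE"] = cluster_config["n3_if"]
    upf["replicas"] = cluster_config.get("replicas", 1)
    return pkg

# Specialize the same base for one edge cluster.
edge1 = specialize(BASE_PACKAGE, {"n3_if": "eth1", "replicas": 2})
```

Condition choreography would then sequence such specializations across controllers, with each controller acting only once its preconditions (expressed as conditions on the package) are satisfied.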

We then discussed industry-relevant use cases, such as orchestrating bare-metal servers and deploying different kinds of workloads on top of them.

Finally, we discussed next steps and how the power of AI can be combined with Nephio.

The Kubernetes-based Nephio framework is well suited to GenAI-driven human-machine interaction because of its declarative intent model. AI can be used for:

  • Prompts to declare intent (instead of YAML files)
  • Prompts to interact with logs/metrics (instead of looking at dashboards)
  • Prompts to get solutions to system anomalies 
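The first bullet, prompts as intent declarations, can be sketched with a toy translator that maps a natural-language request to a structured intent object. A real system would use an LLM for this step; the naive regex below and the intent schema are purely illustrative assumptions:

```python
# Toy sketch: translate a natural-language prompt into a structured
# deployment intent (instead of hand-writing YAML). A real system would
# use an LLM here; the regex parsing is deliberately naive and the
# intent schema is a made-up example.

import re

def prompt_to_intent(prompt: str) -> dict:
    """Map 'Deploy <workload> on <N> clusters' to an intent dict."""
    m = re.search(r"deploy (\w+) on (\d+) clusters?", prompt.lower())
    if not m:
        raise ValueError("unrecognized intent")
    return {
        "intent": "deploy",
        "workload": m.group(1),
        "clusters": int(m.group(2)),
    }

intent = prompt_to_intent("Deploy UPF on 3 clusters")
```

The resulting intent object is what a Nephio-style controller would reconcile, which is exactly why a declarative framework pairs naturally with GenAI front ends.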

In conclusion, the presentation on "Automating Cloud Infrastructure and Network Functions with Nephio" at the IEEE Workshop on Network Automation underscored the significance of embracing innovative technologies like Nephio to navigate the complexities of modern telecommunications networks effectively. As the industry continues to evolve, Nephio stands at the forefront of driving digital transformation and empowering telcos to thrive in an increasingly competitive landscape.

Amar Kapadia

Server Architecture Changing After Six Decades?

At the Partner Reception at GTC last week, Jensen Huang stated that computer architecture is changing for the first time since 1964 with the advent of accelerated computing.

I tend to agree. After 60 years, server architecture is changing from retrieval-based to generative for the first time. The diagram below captures my thinking, which is centered around the Human Machine Interface (HMI).

Human Machine Interface

From the mid-1960s to the mid-1980s, the HMI was the CLI, and the “servers” were minicomputers and mainframes built by companies such as IBM, Digital Equipment Corporation, and HP using highly proprietary architectures. The focal point was the CPU. From the mid-1980s to the mid-2020s, the HMI of choice has been a GUI (largely based on Xerox PARC research) or REST APIs, which led to client-server computing and its variations such as the current front-end↔back-end split. This era has been dominated by industry standard servers with a CPU focus. The winner has been the x86 CPU and its ecosystem. Networking, memory, I/O, storage, and datacenters have undergone a tremendous renaissance during this era.


Moving forward, the interface will be GenAI. It’s no longer going to be highly structured ways of interfacing with computers to retrieve information, but rather human-like communication based on dynamic generation of responses. Both input and output will be based on GenAI. After all, when we talk to humans, we don’t provide inputs through point-and-click screens and view outputs through dashboards. This era will be dominated by accelerated computing, where the winner will be the GPU and its ecosystem. This doesn’t mean the CPU disappears. In fact, the CPU will always be needed; it will just take a back seat.

In my mind, there are three key tenets to this new architecture:

  • CPU, memory, I/O, networking, storage, and the datacenter all have to cater to the GPU and will change in fundamental ways
  • Utilization has to be 100% given the cost of GPUs; utilization has not been a concern so far
  • Use of a completely greenfield technology stack

This new world creates massive new opportunities for us (Aarna):

  1. The infra needs to be orchestrated and managed in new ways. In the NVIDIA context that could take the shape of DGX-as-a-service and MGX-as-a-service.
  2. Workload orchestration and management will take a front-row seat given the utilization concern. Sophisticated techniques are required such as bringing in secondary workloads on the same GPU cluster when the primary workload is easing up. The GPU owner may need to sell off GPU capacity to aggregators as a “spot instance” during periods of underutilization. 
  3. Given our use of the Kubernetes-Nephio framework, greenfield is music to our ears. We don’t have to worry about VMs or bare metal instances based on old operating systems.
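The utilization-driven decision in point 2 can be sketched as a simple policy function: when the primary workload eases up, admit secondary workloads onto idle GPUs, and offer capacity to a spot aggregator when utilization drops below a threshold. The threshold and return fields are illustrative assumptions, not a description of any shipping scheduler:

```python
# Sketch of the utilization-driven scheduling decision described above:
# idle GPUs admit secondary workloads, and capacity is offered to a spot
# aggregator when primary utilization falls below a threshold. The 50%
# threshold and the decision fields are illustrative assumptions.

def capacity_decision(total_gpus: int, primary_busy: int,
                      spot_threshold: float = 0.5) -> dict:
    """Decide what to do with GPU capacity not used by the primary workload."""
    idle = total_gpus - primary_busy
    utilization = primary_busy / total_gpus
    return {
        "admit_secondary": idle > 0,                 # backfill idle GPUs
        "offer_as_spot": utilization < spot_threshold,  # sell excess capacity
        "idle_gpus": idle,
    }

# Primary training job has eased off: only 4 of 16 GPUs busy.
decision = capacity_decision(total_gpus=16, primary_busy=4)
```

A real implementation would run this kind of policy continuously against live telemetry, with preemption rules protecting the primary workload when demand returns.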

I’d love to hear your thoughts on these topics. Do reach out to me for a discussion.