Aarna’s Role in Enabling Sovereign GPUaaS Providers in India
Amar Kapadia
September 9, 2024
The Government of India (India AI) issued a document titled “Inviting Applications for Empanelment of Agencies for providing AI services on Cloud.” This document invites in-country GPUaaS providers to bid for sovereign opportunities. It is a detailed and thoughtful document and will no doubt spur innovation at all levels of the AI/ML stack within India.
If you are responding to this invitation or plan to, we would like to congratulate you! However, some of the requirements in sections 6.7 “Admin Portal”, 6.8 “Service Provisioning”, 6.9 “Operational Management”, and 6.12 “SLA Management” are complicated. They essentially require a GPU Cloud Management Software layer, and this software needs to be up and running by t0 + 6 months.
Let’s explore your options, since this is the classic “make” vs. “buy” situation. Here are the pros and cons of each.
“Make” Option
Full control of the software with ability to differentiate and customize (it may actually not be possible to differentiate at the IaaS layer, so the differentiation argument might be questionable)
Requires very strong in-house development skills, especially given the tight development timelines
Keeping up with ongoing feature requirements will become challenging in the long term
“Buy” Option
Get access to a purpose-built 3rd party product
Save cost (since a 3rd party product will typically be less expensive than in-house development)
Focus precious development resources on AI/ML rather than Infra
Customization will be possible, but might be more difficult than in-house software
If you are going for the “make” option, the rest of this blog is moot. However, if you want to explore the “buy” option, we can help you with the requirements below[1].
Admin portal available within 6 months of LOI
Dynamically manage 1,000+ GPUs
6.7 “Admin Portal”
User registration/account creation
Service catalog and prices
Capacity dashboard
Utilization monitoring
Incident management
Service health dashboard
Ability to customize dashboard for the subsidy workflow
6.8 “Service Provisioning”
Online, on-demand instances that can be scaled up/down
Management portal
Public internet access with VPN
Support for BMaaS and VMs
MTTR SLAs and recovery
User notifications
Data destruction (so it cannot be forensically recovered)
6.9 “Operational Management”
Patch management
OS images with latest security patches
Root cause analysis and timely repairs
System usage
6.12 “SLA Management”
SLA measurement and MTTR improvement to meet incident management SLA (99.95% or higher)
Service availability measurement
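To get a feel for how tight a 99.95% target is, here is a minimal sketch of the arithmetic. It assumes the figure is read as an availability percentage over a 30-day measurement window; the window length and the function names are illustrative assumptions, not taken from the India AI document.

```python
# Assumption: the SLA is treated as uptime over a fixed window (30 days here).
# Function names are hypothetical and only illustrate the arithmetic.

def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the window at the given SLA target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def measured_availability(downtime_minutes: float, period_days: int = 30) -> float:
    """Availability (in percent) achieved for the window, given total downtime."""
    total_minutes = period_days * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.95)
    print(f"Downtime budget at 99.95% over 30 days: {budget:.1f} minutes")
    print(f"Availability after 45 minutes of downtime: {measured_availability(45):.3f}%")
```

Under these assumptions the monthly downtime budget is roughly 21.6 minutes, so a single 45-minute incident would already put the month below target. That is why SLA measurement and MTTR improvement appear together in this section.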
Finally, to our knowledge, we are the only GPU Cloud Management Software company in the market. If this blog sounds interesting, learn more: