Sandeep Sharma

What’s New in EMCO 22.03?

The Edge Multi-Cluster Orchestrator (EMCO) open source project, part of the Linux Foundation Networking umbrella, is a software framework for intent-based deployment of cloud-native applications to a set of Kubernetes clusters spanning enterprise data centers, multiple cloud service providers, and numerous edge locations. It can be leveraged for Private 5G, O-RAN, and multi-access edge computing (MEC) applications. EMCO has significant industry momentum from companies like Intel, Equinix, Nokia, and Aarna.ml.

A major benefit of EMCO is extensibility via controllers that perform specific operations. Multiple controllers can be onboarded based on different use cases. Here’s a sample:

  • Cluster Manager - Registers clusters on behalf of cluster owners and enables users to onboard target Kubernetes clusters to the platform (see the sketch after this list).
  • Network Manager - If secondary interfaces are required for orchestrating services and applications through EMCO, this controller creates and manages those secondary networks, such as exposing existing physical/provider networks into Kubernetes.
  • Distributed Cloud Manager - Presents a single logical cloud from multiple edges. It is used for stitching together the clusters onboarded to the platform.
  • Application Config Manager - Enables distribution of application/CNF configuration across edges and clouds.
  • Cert Distribution Manager - Enrolls CA certificates using tenant-specific parent CAs and distributes them across tenant-specified Kubernetes clusters.
  • Distributed Application Manager - Orchestrates complex applications (or network services) with the help of various placement controllers, and works with various action controllers to enable secure communication among microservices.
    - Hardware Platform Aware Controller - Enables selection of Kubernetes clusters based on the hardware requirements of microservices.
    - 5GFF EDS Placement Controller - Enables selection of Kubernetes clusters based on the latency requirements of application microservices, UE capabilities, and 5G network requirements.
    - Generic Action Controller - Allows customization of the Kubernetes resources of applications; examples include CPU/memory limits based on the destination cluster type.
    - Secure Mesh Controller - Auto-configures the service mesh (Istio/Envoy) of multiple clusters to enable secure L7 connectivity among microservices in various clusters. It can also configure ingress/egress proxies to allow external L7 connectivity to/from microservices.
    - Secure WAN Controller - Automates firewall and NAT policies of cloud-native OpenWRT gateways to enable L3/L4 connectivity among microservices and with external entities.
    - Temporal Controller - Provides a way for third parties to develop workflows that execute as part of a complex application (or network service) lifecycle.
    - SFC Controller - Allows automation of service function chaining across multiple CNFs.
  • Resource Synchronizer & Status Monitoring - Manages instantiation of resources to clusters, using various plugins to address clusters from different providers.
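
As a concrete illustration of how these controllers are driven, here is a minimal sketch that registers a cluster provider and a target cluster with the Cluster Manager over EMCO's v2 REST API. The host, port, and names are assumptions for illustration; consult the EMCO docs for the exact service endpoints.

    # Sketch: onboard a target cluster via EMCO's Cluster Manager (clm) API.
    # Host, port, and file names are illustrative assumptions.
    import json
    import requests

    CLM = "http://emco-host:30461/v2"  # assumed clm service endpoint

    # 1. Create a cluster provider (the owner of the clusters).
    requests.post(f"{CLM}/cluster-providers",
                  json={"metadata": {"name": "edge-provider"}})

    # 2. Register a cluster under the provider; the kubeconfig is uploaded
    #    as a multipart file alongside the cluster metadata.
    with open("edge01.kubeconfig", "rb") as kubeconfig:
        requests.post(
            f"{CLM}/cluster-providers/edge-provider/clusters",
            files={
                "metadata": (None,
                             json.dumps({"metadata": {"name": "edge01"}}),
                             "application/json"),
                "file": ("kubeconfig", kubeconfig),
            },
        )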

EMCO 22.03 Highlights

The EMCO 22.03 release brings several improvements, including:

  • EMCO GitOps Integration

Resources (services and Kubernetes objects) created via EMCO are now GitOps-enabled and deployed to target clusters. Additional controllers push all resources to a Git repository in a specific directory structure, enabling any pull request against these resources to be reconciled by agents like Flux v2. This allows for a complete continuous deployment (CD) cycle.

  • Modify instantiated logical cloud

There are two kinds of logical clouds: standard and admin. An admin logical cloud gives the user cluster-wide authorization. In a standard logical cloud, the user has to specify a namespace, resource quotas, and permissions defining the kinds of Kubernetes resources that user can access. Until this release, once a logical cloud was created it could not be modified; in this version, an instantiated logical cloud can be modified, for example as sketched below.
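
A rough sketch of what such a modification could look like against EMCO's Distributed Cloud Manager (DCM) REST API; the paths, port, and payload fields follow the general shape of EMCO's v2 API but should be treated as illustrative.

    # Sketch: modify an already-instantiated standard logical cloud via the
    # Distributed Cloud Manager (dcm). Host, port, and fields are illustrative.
    import requests

    DCM = "http://emco-host:30477/v2"  # assumed dcm service endpoint
    url = f"{DCM}/projects/demo/logical-clouds/tenant1-lc"

    update = {
        "metadata": {"name": "tenant1-lc"},
        "spec": {
            "namespace": "tenant1-ns",        # namespace for this tenant
            "user": {"user-name": "tenant1",  # user whose permissions apply
                     "type": "certificate"},
        },
    }
    resp = requests.put(url, json=update)  # update in place (new in 22.03)
    print(resp.status_code)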

  • Enhanced Status Querying/Notifications

In EMCO, a monitoring agent provides the status of resources (e.g., Kubernetes objects) deployed in the target cluster, along with the orchestrated applications. Newly added are subscriptions for notifications and enhancements to the status API itself.

  • New Features Introduced on the Web-Based UI
    - RBAC or Role-Based Access Control on the GUI - A way of granting users granular access to Kubernetes-based API resources. It implements a security design that restricts access to Kubernetes resources based on the role of the user.
    - Standard Logical Cloud - Until now, only the admin logical cloud was supported on the GUI; now the standard logical cloud has also been integrated.
    - Service Discovery on the GUI - Now integrated is a specific sub-controller underneath the Distributed Traffic Controller, called the Istio traffic subcontroller. When orchestrating applications across multiple clusters, service discovery helps applications in one cluster reach applications in others. EMCO creates all the Istio resources needed to make service discovery possible when deploying applications across multiple clusters.
  • Temporal Workflow Engine

This is a new controller added to EMCO that orchestrates and manages Temporal workflows based on use cases.

UI Enhancements: The EMCO 22.03 demo seen here presents a subset of the features of EMCO 22.03 and focuses on the enhancements in the UI. Here you can see how to log in as an admin, onboard the controller, and create a user along with a tenant. EMCO is shown orchestrating two apps, client and server, across two Kubernetes clusters.

Want to learn more about EMCO? We encourage you to explore the EMCO Wiki, the EMCO repos on GitLab, join the EMCO mailing list, and attend calls. Send any project-related questions to Louis Illuzzi: lilluzzi [at] linuxfoundation [dot] org.

Aarna

Service Orchestration using BPMN with Domain Orchestration using CDS

BPMN

Business Process Model and Notation (BPMN) is a process modeling standard owned by the Object Management Group (OMG), a standards development organization. BPMN has become the de facto standard for understanding business process diagrams. The stakeholders of BPMN are the people who design, manage, and realize business processes. BPMN diagrams are then translated into software process components. BPMN gives businesses the capability to understand their internal business procedures using a graphical notation, and it uses an XML schema-based specification.

Camunda

Camunda is an open-source, Java-based BPMN engine, compliant with the BPMN 2.0 specification, that provides intelligent workflows. Aarna.ml uses the Camunda Modeler, a UI for designing the workflows.

There are different Camunda components:

●     An open-source BPMN engine compliant with the BPMN 2.0 specification

●     A Java-based framework providing intelligent workflows

●     The Camunda Modeler UI for designing workflows

●     A Java-based process engine library responsible for executing BPMN 2.0 processes, with:

  • Case Management Model and Notation (CMMN 1.1) - a standard for modeling cases, while BPMN is a standard for modeling processes
  • Decision Model and Notation (DMN 1.1) decisions - used to configure any rules required for a given use case, or to store configuration for a given use case
  • Default support for JavaScript- and Groovy-based tasks

●     A relational database for persistence, storing all state-related information

The Camunda Modeler is a UI for designing workflows with drag-and-drop notations.

                                Fig 1 - Camunda Modeler

Camunda Process Engine Architecture has multiple components:

  1. Modeler - The Camunda UI or desktop application where we design the processes. This falls under the design phase.
  2. Task List - An out-of-the-box web application tightly integrated with Camunda. When we deploy any process as a user task, it can be seen in the Task List.
  3. Cockpit - A web application used for process monitoring; e.g., if we have deployed a process and it is still running, we can check its status through the Cockpit.
  4. REST API - Used to interface with external components. Its goal is to provide access to the engine interfaces (see the sketch after this list).
  5. Custom application - Using this, we can build our own applications and, through the REST API, access all the engine interfaces.
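
To make the REST API layer concrete, the short sketch below starts a process instance and reads its ID using Camunda's standard engine-rest API (the start path is standard Camunda; the host, port, and process key are assumptions).

    # Sketch: start a BPMN process instance through Camunda's REST API.
    # Host, port, and the process key "order-flow" are illustrative.
    import requests

    ENGINE = "http://camunda-host:8080/engine-rest"

    resp = requests.post(
        f"{ENGINE}/process-definition/key/order-flow/start",
        json={"variables": {"customerId": {"value": "C-42", "type": "String"}}},
    )
    instance = resp.json()
    print("started process instance:", instance["id"])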

                                          Fig 2 - Camunda Process Engine Architecture

The engine interfaces connect to the database, which stores all the process states.

ONAP CDS Definitions and Concepts

CDS (Controller Design Studio) is a framework used to automate the resolution of resources for instantiation and for any configuration provisioning operation. There are several concepts in CDS:

  1. Data Dictionary - A list of parameters that need to be resolved during runtime (see the sketch after this list).
  2. Resolution - Provides the value of a configuration parameter during runtime. The source of these values can be request input, a default, a REST fetch, SQL, and many more.
  3. Configuration Provisioning - Used for complex interaction with southbound APIs. CDS provides workflows (in ONAP directed graph format) and scripts (Kotlin, Python).
  4. Modeling - Used for defining data dictionaries, resolution mechanisms, workflows, and scripts that are part of provisioning. CDS uses TOSCA and JSON for modeling, and all the model information is stored as a CBA (CDS Blueprint Archive).
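
To ground these concepts, here is a rough sketch of the shape of a data dictionary entry, written as a Python dict: a parameter name plus the sources CDS may use to resolve it at runtime. The field names follow the general shape of CDS resource dictionaries, but the details are illustrative.

    # Sketch: the general shape of a CDS data dictionary entry as a Python
    # dict. Field names and the REST source details are illustrative.
    vnf_ip_entry = {
        "name": "vnf-ip",
        "tags": "vnf-ip",
        "property": {"description": "Management IP of the VNF",
                     "type": "string"},
        "sources": {
            "input": {"type": "source-input"},      # resolve from request input
            "default": {"type": "source-default"},  # or fall back to a default
            "rest": {                               # or fetch over REST at runtime
                "type": "source-rest",
                "properties": {"url-path": "/inventory/vnf/{vnf-id}",
                               "path": "$.ip-address"},
            },
        },
    }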

Camunda and CDS Interaction Diagram (Synchronous)

In this synchronous process, the actor calls the workflow using the REST API or through the UI. This calls CBA workflow1, which is deployed in CDS; thus at runtime we trigger the CBA. The CBA returns a value to the Camunda workflow. We then call CBA workflow2, which returns its response. In the Camunda workflow we can consolidate all the responses received from the CBAs and return them to the consumer in whatever format is required. Each process has a unique ID, and each response carries the response ID and the relevant response data. A sketch of the synchronous call follows the figure below.

                            Fig 3 - Camunda and CDS Interaction Synchronous
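
A minimal sketch of what a Camunda service task might do when invoking a CBA synchronously: POST to the CDS blueprint processor's execution-service endpoint with mode set to "sync" and read back the payload. Host, credentials, and blueprint identifiers are assumptions.

    # Sketch: invoke a CBA workflow synchronously from a Camunda service task.
    # Host, credentials, and blueprint identifiers are illustrative.
    import uuid
    import requests

    CDS = "http://cds-blueprints-processor:8080/api/v1/execution-service/process"

    request_id = str(uuid.uuid4())  # unique process ID carried in every response
    body = {
        "commonHeader": {"originatorId": "camunda",
                         "requestId": request_id,
                         "subRequestId": "wf1-step1"},
        "actionIdentifiers": {"blueprintName": "workflow1",
                              "blueprintVersion": "1.0.0",
                              "actionName": "resource-assignment",
                              "mode": "sync"},  # block until the CBA completes
        "payload": {"resource-assignment-request": {"vnf-id": "vnf-001"}},
    }
    resp = requests.post(CDS, json=body, auth=("user", "pass")).json()
    print(resp["commonHeader"]["requestId"], resp.get("payload"))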

Camunda and CDS Interaction Diagram (Asynchronous)

The consumer calls a workflow that is defined as asynchronous. It immediately returns a response to the user with the process UUID, and in the background it calls the CBAs configured in the workflow and stores their responses in the Camunda workflow cache; at the end, they are stored in the relational database. If the user has exposed a REST API for receiving the result, we can send the response back through it. Alternatively, users can poll the standard Camunda REST API to get the status of the workflow and retrieve the response data, roughly as sketched after the figure below. If we have CBAs that are long-running, we use asynchronous interaction.

                            Fig 4 - Camunda and CDS Interaction Asynchronous
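
With this asynchronous pattern, a consumer holding the process UUID can poll Camunda's standard REST API for completion and then fetch the stored response, roughly as below (host and variable names are assumptions).

    # Sketch: poll an asynchronous Camunda workflow by its process UUID, then
    # read the consolidated response variable. Names are illustrative.
    import time
    import requests

    ENGINE = "http://camunda-host:8080/engine-rest"

    def wait_for_result(process_id, variable="cbaResponse", interval=5):
        while True:
            # The history endpoint reports state even after the instance ends.
            state = requests.get(
                f"{ENGINE}/history/process-instance/{process_id}").json()["state"]
            if state == "COMPLETED":
                break
            time.sleep(interval)  # long-running CBAs: keep polling
        # Fetch the response data stored by the workflow.
        return requests.get(
            f"{ENGINE}/history/variable-instance",
            params={"processInstanceId": process_id,
                    "variableName": variable}).json()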

For more information -

https://docs.camunda.org/manual/latest/

https://camunda.com/best-practices/invoking-services-from-the-process/

https://docs.camunda.org/manual/latest/reference/bpmn20/

Below is the link to our webinar on the same topic, which includes a demo as well -

https://www.youtube.com/watch?v=tSMan-0ENy0&t=1099s

Aarna

What’s new in AMCOP 2.3.0 and 2.4.0?

The Aarna Multi-Cluster Orchestration Platform, or AMCOP, is an open-source platform for orchestration, lifecycle management, and automation of network services consisting of CNFs, PNFs, and VNFs, and of MEC applications consisting of cloud-native applications (CNAs). AMCOP can be used for use cases like 5GC orchestration, MEC orchestration, O-RAN SMO, and many more.

The new features introduced in AMCOP 2.3.0 are given below:

  • Kubernetes object management (create/modify) using the GAC controller
  • Multi-cluster orchestration support using a service mesh
  • High availability support
  • NWDAF support in AMCOP
  • Open Policy Agent (OPA) support in AMCOP
  • PNF device management support using AMCOP

Kubernetes object management using the Generic Action Controller (GAC) helps in Day 0 and Day N management of Kubernetes objects. During Day 0 configuration, suppose the user wants to deploy an application across multiple clusters, but with an environment variable that differs across the target clusters; GAC lets the user specify these details so the application can be deployed everywhere with the right per-cluster value. It also helps in Day N configuration: if the application is already running and the user wants to edit a value in its ConfigMap, she can do that from the single-pane dashboard of AMCOP, avoiding manual modification across hundreds of clusters individually. The sketch below shows the manual step this replaces.
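
For a sense of what GAC automates, here is a sketch of the equivalent manual Day N step it replaces: patching the same ConfigMap on each target cluster, one kubeconfig context at a time. Context, namespace, and object names are illustrative.

    # Sketch: the manual Day N work that GAC removes - patching the same
    # ConfigMap across many clusters. Names and contexts are illustrative.
    from kubernetes import client, config

    clusters = ["edge01", "edge02", "edge03"]  # kubeconfig contexts, one per cluster

    for ctx in clusters:
        config.load_kube_config(context=ctx)   # point the client at this cluster
        core = client.CoreV1Api()
        core.patch_namespaced_config_map(
            name="app-config",
            namespace="demo",
            body={"data": {"ENV_NAME": f"value-for-{ctx}"}},  # per-cluster value
        )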

Multi-cluster orchestration support using a service mesh is achieved through the Distributed Traffic Controller (DTC). If an application running on one target cluster wants to discover and talk to an application running on a different target cluster, discovery takes place first and then network traffic steering. Service entries help the application running on one target cluster discover the application running on another; this is taken care of by the DTC in AMCOP.

As part of high availability support, AMCOP now supports multi-master, multi-worker deployments. If a master node or worker node goes down, the workload is automatically redistributed across the remaining worker nodes, and the master takes care of rescheduling.

AMCOP now has NWDAF support, which introduces the analytics part of a 5G network. If any network function requires analytics information, it connects to the NRF; NWDAF is also registered with the NRF. Hence the NRF fetches the analytics function from NWDAF and provides it to the NF requesting the information. With NWDAF, AMCOP can execute closed-loop automation: for example, if CPU usage is predicted to be high, AMCOP can trigger a horizontal scale-out. Thus an appropriate action can be triggered by an incident, roughly as sketched after the figure below.

Fig 1: NWDAF in AMCOP
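
As a rough sketch of this closed loop (assuming a predicted CPU metric is already available from NWDAF), a policy step could scale a network function out when the prediction crosses a threshold. The deployment and namespace names are illustrative.

    # Sketch: threshold-triggered horizontal scale-out, the kind of action a
    # closed loop can take on an NWDAF prediction. Names are illustrative.
    from kubernetes import client, config

    def scale_out_if_needed(predicted_cpu, threshold=0.8):
        if predicted_cpu <= threshold:
            return  # prediction within bounds; nothing to do
        config.load_kube_config(context="edge01")
        apps = client.AppsV1Api()
        scale = apps.read_namespaced_deployment_scale("amf", "5gc")
        scale.spec.replicas += 1  # add one replica of the AMF
        apps.patch_namespaced_deployment_scale("amf", "5gc", scale)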

The Open Policy Agent (OPA) is a lightweight and powerful CNCF policy engine project that is included in AMCOP. Using it, the admin can push policies so that the actions taken in closed-loop automation can be altered by user-defined policies. The policy engine executes the policy to alter the behavior of the closed loop.

AMCOP can be used to orchestrate both PNF- and CNF-based applications (and VNFs through KubeVirt). It has a component called CDS (Controller Design Studio) and a Camunda-based workflow engine that help with Day 0 and Day N configuration of PNF devices. Day 0 involves powering up devices, starting workflows, and discovering devices; once the Day 0 configuration is successfully pushed using CDS, Day N configuration can also be executed on PNF devices using AMCOP.

AMCOP 2.4.0 was recently announced; the enhancements are as follows:

●     RBAC support (Early access)

●     Log4j vulnerability fixes for AMCOP components

●     Upgrade support from a prior AMCOP release

●     Target cluster monitoring agent (Early access)

●     Performance improvements

Currently, two roles are possible in AMCOP: admin and tenant. The admin role is a superset of the tenant role. Default admin credentials are created during deployment of AMCOP; using these credentials, the admin can log in, change the password, and then start creating tenants. Tenants are one level below admins in terms of privileges: the admin can see all the tenants, but one tenant cannot see another tenant created by the admin. For example, each business unit in an organization can be given a separate tenant with access only to the applications that tenant should have access to.

This latest version of AMCOP addresses the vulnerability in Log4j version 2.8 arising from Log4Shell. Log4Shell is a remote code execution vulnerability through which remote attackers can take control of any device on the internet if the device is running an application based on a vulnerable Log4j. Some components of AMCOP use the Log4j library; in version 2.4.0 of AMCOP, this vulnerability has been fixed to make sure AMCOP is safe and secure.

The current version of AMCOP provides one-click upgrade support from the prior release, avoiding a reinstallation of AMCOP and the associated database migration. Any production server running AMCOP 2.3.0 can now be upgraded to AMCOP 2.4.0 without any data loss.

AMCOP 2.4.0 has a target cluster monitoring agent. When a user creates a logical cloud, a monitoring agent gets deployed across the onboarded target clusters that are part of the logical cloud. In earlier versions, AMCOP had no information about the exact state of the applications running in a target cluster; with this monitoring agent, AMCOP knows the real state of an application or composite application running in the target cluster, and the AMCOP dashboard now displays the live status of applications.

AMCOP 2.4.0 provides faster deployment and faster upgrades; thus the performance of AMCOP has greatly improved.

Try AMCOP for free at aarna.ml/amcop.

Aarna

Aarna.ml MWC Barcelona 2022 Demos

Aarna.ml is an open-source software company that enables zero-touch management of 5G networks and edge computing applications. Our flagship product is AMCOP, the Aarna.ml Multi-Cluster Orchestration Platform, an open-source orchestration, lifecycle management, and closed-loop automation platform for cloud-native network services and edge computing applications. It consists of:

  • Linux Foundation Edge Multi-Cluster Orchestrator (EMCO) for intent-based orchestration and Day-0 configuration
  • Linux Foundation Open Network Automation Platform (ONAP) CDS component for Day 1 & 2 configuration and LCM; for the O-RAN SMO use case, this component is supplemented with OpenDaylight MDSAL
  • Kafka, CNCF Open Policy Agent (OPA), along with select ONAP DCAE microservices for analytics and closed-loop automation
  • Numerous CNCF and related projects (Istio, Prometheus, …)
  • Proprietary value adds such as 5G network slicing, O-RAN NONRTRIC, NWDAF

At the recently concluded MWC in Barcelona, Aarna.ml had the following live demonstrations:

1. Joint Demo with AWS and RedHat at AWS Experience Booth in MWC Barcelona

In this demo, we showed how the Aarna.ml Multi-Cluster Orchestration Platform (AMCOP) was installed on a Red Hat OpenShift Service on AWS (ROSA) cluster and how AMCOP deployed a 5G core network service on another Kubernetes cluster running on an AWS EC2 instance. ROSA provides a way to accelerate application development by leveraging familiar OpenShift APIs and tools for deployments on AWS. AMCOP is one such application that can be installed on ROSA with a Kubernetes Operator. AMCOP in turn deployed cloud-native network services and edge computing applications to other ROSA, AWS EKS, or Kubernetes-on-EC2 clusters.

Read the Press Release to know more - https://bit.ly/3t8r3ww

See the recorded demonstration - https://youtu.be/gqF6oasNyeM

2. Joint Demo with Quanta Cloud Technology at the QCT Booth in MWC Barcelona

In this demo, the scale-out of the QCT 5G Core across two Kubernetes clusters using AMCOP was demonstrated. The demo focused on the following:

  • Installation of AMCOP on Kubernetes cluster #1
  • Creation and registration of Kubernetes clusters #2 & #3 onto AMCOP
  • Onboarding of QCT 5GC CNFs onto AMCOP
  • Creation of a 5GC Network Service with onboarded CNFs
  • Orchestration of the 5GC network service onto target cluster #2 by specifying placement intents
  • Orchestration of Prometheus which scrapes CPU load information from cluster #2
  • Egress of CPU load data from Prometheus to AMCOP
  • Automatic scale-out of AMF network function into cluster #3 triggered by AMCOP when the CPU load exceeds the threshold

See the recorded demonstration - https://youtu.be/8p82GErOkm8

3. Free5GC Core with Multiple Anchor UPF Orchestration via AMCOP at the TelcoDR Cloudcity Booth in MWC Barcelona

In this demonstration, we showed orchestration of the Free5GC core using AMCOP across multiple edge clusters with Uplink Classifier (ULCL) mode enabled, and then tested it with UERANSIM (a simulator for the end device).

See the recorded demonstration - https://www.youtube.com/watch?v=qXI5P1m592g&list=PLyQ7hs1Psze4zaBPdRUgY5zL6U_PQgB2N&index=1

4. Multidomain orchestration using Terraform & ONAP CDS at the TelcoDR Cloudcity Booth in MWC Barcelona

In this session, we showed how EMCO can be integrated with other open-source projects (Terraform, the Camunda workflow engine, and ONAP CDS) to perform multidomain orchestration of cloud and edge services.

See the recorded demonstration - https://www.youtube.com/watch?v=gO_liMAxuRs&list=PLyQ7hs1Psze4zaBPdRUgY5zL6U_PQgB2N&index=2

Aarna.ml also joined the Oracle for Startups team at Mobile World Congress in the 4YFN hall. Read the blog to know more - https://blogs.oracle.com/startup/post/startups-mobile-world-congress

Aarna

Join Aarna.ml at the Linux Foundation Networking Developer & Testing Forum - Jan 2022

The Linux Foundation Networking Developer & Testing Forum is being held January 10-13, 2022. In this event, various LFN project technical communities will demonstrate and present their progress; discuss project architecture, direction, and integration points; and explore possibilities for further innovation through the open source networking stack (register and explore the schedule). Aarna.ml is participating in four discussions. Below is a brief description of these sessions:

1. ONAP: CDS Error Handling in Production deployments

Time - 11th Jan, 2022, 19:30 - 20:00 IST

Speakers - Vivekanandan Muthukrishnan and Kavitha P.

Description - Handling various error scenarios in a real-life production deployment of CDS.

Slides & Recording

2. EMCO: Orchestration of Magma

Time - 13th Jan, 2022, 18:30 - 19:00 IST

Speakers - Yogendra Pal and Rajendra Prasad Mishra

Description - In this session, we will show how Magma core (Access Gateway & Magma controller/orchestrator) can be deployed on a Kubernetes cluster.

Slides & Recording

3. EMCO: Service Upgrade/Update using GUI

Time - 13th Jan, 2022, 19:00 - 19:30 IST

Speakers - Sandeep Sharma and Vikas Kumar

Description - In this session, we show how a network service consisting of multiple cloud-native functions can be updated or upgraded using the EMCO GUI (which can also be done using the REST API).

Slides & Recording

4. EMCO: Multidomain Orchestration using Terraform & ONAP CDS

Time - 13th Jan, 2022, 19:30 - 20:00 IST

Speakers - Vivekanandan Muthukrishnan and Oleg Berzin

Description - In this session, we will show how EMCO can be integrated with other open-source projects (Terraform, the Camunda workflow engine, and ONAP CDS) to perform multidomain orchestration of cloud-native functions.

Slides & Recording

Sriram Rupanagunta

NWDAF Rel 17 Explained - Architecture, Features and Use Cases

The 5G System is expected to be AI-capable for optimal allocation and usage of network resources. The analytics functionality of the 5G system is separated from the other core functionalities to ensure better modularisation and reach. The Network Data Analytics Function, or NWDAF, with its 3GPP-compliant interfaces, provides data analytics in the 5G Core. NWDAF Rel 15 did not see much adoption because data was unavailable and the 3GPP specifications for NWDAF were not fully defined. At present, with 5G deployments kicking off and 3GPP standardizing all the necessary specifications, a complete implementation of NWDAF is possible. The architecture of NWDAF is defined in 3GPP TS 23.288, and the detailed specification, with APIs, etc., is defined in 3GPP TS 29.520.

Release 17 specifies two separate NWDAF functions:

  • Analytics Logical Function (AnLF)
  • Model Training Logical Function (MTLF)

NWDAF is responsible for the data collection and storage required for inference, though it can use other functions to achieve this. A typical NWDAF use case consists of one or more machine learning models. Building a machine learning model is an iterative process: data scientists experiment with different models and different data sets, and even after deployment a model requires constant monitoring and retraining. A typical use case consists of many ML models, with overlapping data fed into them.


Fig 1: NWDAF Architecture

NWDAF is different from other NFs in the 5G Core because of:

  1. The requirement for retraining - When we deploy an ordinary NF, we don't expect its behaviour to change. An ML model is different: it is tightly coupled to the data it was trained on, so if data patterns change from trial runs to actual environments, the model might behave differently. This is called data drift.
  2. The requirement for historical data - NFs just need the current state of the machine, but an ML system analyses historical data to derive future values.

NWDAF is an API layer that provides a standard interface for other network elements to obtain analytics. The consumer can be an NF, OAM, or AF, and consumers can subscribe to analytics through NWDAF; a sketch of such a subscription follows this paragraph. Each NWDAF is identified by an analytics ID and an area of interest, and a single NWDAF can also serve multiple analytics IDs. Managing the huge data sets behind different NWDAF ML models requires ensuring there is no duplication of effort in collecting and storing the data. The area of interest is the geographical area the NF belongs to; since the UE is mobile, it can move from one area of interest to another.
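
The sketch below shows roughly what such a subscription looks like on the wire: a consumer POSTs to the NWDAF's events subscription service. The path follows the Nnwdaf_EventsSubscription service of TS 29.520, but the host and the exact payload fields should be treated as illustrative.

    # Sketch: an NF consumer subscribing to NWDAF analytics notifications.
    # Host, callback URI, and payload details are illustrative.
    import requests

    NWDAF = "http://nwdaf.5gc.local/nnwdaf-eventssubscription/v1"

    subscription = {
        "eventSubscriptions": [{
            "event": "NF_LOAD",  # analytics ID being requested
        }],
        "notificationURI": "http://consumer.5gc.local/notify",  # callback
    }
    resp = requests.post(f"{NWDAF}/subscriptions", json=subscription)
    print("subscription created at:", resp.headers.get("Location"))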

Features of NWDAF -

1. Aggregation
Fig 2: Aggregation supported by NWDAF


There are different types of Aggregation that NWDAF can do -

●     Aggregation based on Area of Interest -

Each NWDAF can have its own area of interest. In some cases the analytics consumer might require a larger area of interest; in the example above, the consumer requires three areas of interest. An NWDAF can act as an aggregator by collecting data from the other NWDAFs associated with those areas of interest and sending a single aggregated result to the consumer.

●     Aggregation based on Analytics -

A use case can be made up of other use cases. In the example above, the NWDAF with AID 3 is composed of the NWDAF with AID 1 and the NWDAF with AID 2 by means of some combining logic. This kind of aggregation is called analytics aggregation.

2. Analytics Subscription Transfer

One NWDAF can transfer subscriptions to another NWDAF. For example, suppose an analytics consumer is getting analytics data for a UE through an NWDAF associated with a particular area of interest. If the UE moves to another area of interest, the NWDAF associated with the new area of interest will continue sending analytics data to the consumer, because the first NWDAF transfers the consumer's subscription to the second NWDAF. This also comes in handy when an NWDAF undergoes graceful shutdown or performs load balancing.

MLOps - the complete picture

MLOps comprises practices for deploying and maintaining machine learning models in production networks. The word “MLOps” is a compound of “machine learning” and “DevOps”. It includes the following components, which are also the prerequisites for building an NWDAF platform -

●     Configuration Module

●     Data Collection Module on the Core/Edge (should be as per 3GPP standards)

●     Data Collection Long Term Module

●     Data Verification Module

●     Machine Resource Management

●     Feature Extension Module

●     Analysis Tools

●     Process Management Tools

●     Data Serving Module (should be as per 3GPP standardisation)

●     Monitoring Module

●     ML Code


Fig 3: MLOps Introduction

The following is the group of standard functions defined by 3GPP to support data analytics in 5G network deployments -

●     NWDAF AnLF - Analytics Logical Function

●     NWDAF MTLF - Model Training Logical Function

●     DCCF - Data Collection Coordination (& Delivery) Function

●     ADRF - Analytics Data Repository Function

●     MFAF - Messaging Framework Adaptor Function


Fig 4 : Complete Loop

The NF/OAM/AF acting as the analytics consumer requests analytics from the NWDAF, either directly or through the DCCF. NWDAF is divided into two functions, AnLF and MTLF. The Analytics Logical Function (AnLF) receives the analytics request and sends the response back to the consumer. The AnLF requires the model endpoints, which are provided by the Model Training Logical Function (MTLF); the MTLF trains and deploys the model inference microservice.

The AnLF also needs the historical data the model microservice requires for prediction, so it requests it from the DCCF (Data Collection Coordination and Delivery Function), the central point for managing all data requests. If another NF has already requested the same set of data and it is available, the DCCF sends it directly to the NWDAF; otherwise, the DCCF initiates a data transfer from the data provider. The transfer itself takes place between the MFAF (Messaging Framework Adaptor Function) and the ADRF (Analytics Data Repository Function), which stores the required historical data. The DCCF then passes the data to the NWDAF's AnLF, which requests a prediction from the model microservice, constructs the response in 3GPP format, and returns the prediction to the analytics consumer.

Data Collection -

The data collection for analytics by NWDAF happens at three levels -

●     For feature engineering, analysis, and offline training, data is collected over the long term and can be stored in a data lake/data warehouse.

●     The data required for online training (which is managed by the MTLF) can be collected in the ADRF.

●     The data required by the AnLF for model inference may come from the ADRF/NF/OAM. This data is shorter term, like a few hours of data.

Model Serving & MTLF -

To understand the MTLF, we need to know what model it is serving. Models basically contain code or trained parameters, but for applications to use them we need to wrap them in a microservice, so that the analytics result is available to the application as an endpoint. Different frameworks are available for this, such as TF Serving (TensorFlow model serving), the TorchServe framework, Triton Inference Server (NVIDIA's framework), and Acumos AI. For example, once a model is behind TF Serving, querying it is a plain HTTP call, as sketched below.
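
Once a saved model is deployed behind TF Serving, the analytics result becomes a plain HTTP endpoint. The predict path below is TF Serving's standard REST API; the host, model name, and features are assumptions.

    # Sketch: querying a model served by TF Serving's REST API.
    # Host, model name, and input features are illustrative.
    import requests

    url = "http://model-server:8501/v1/models/cpu_forecast:predict"
    features = [[0.61, 0.72, 0.68, 0.75]]  # e.g., recent CPU-load samples

    resp = requests.post(url, json={"instances": features})
    print(resp.json()["predictions"])      # forecast values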

The following are the input formats that the MTLF accepts -

●     ML code - online training

●     Saved models - include code and trained parameters; this is the most popular way to share pretrained models

●     Container images

Model Monitoring and Feedback -

An ML model's performance may decay over time, which can negatively impact the performance of the system, for example by over-allocating resources or affecting the user experience. So continuous self-monitoring and retraining are required; a minimal sketch follows. Retraining with newer data can be managed by the MTLF or outside the edge/core. Redesigning the ML model may be required in certain cases. The MTLF needs to send a trigger to the model management layer when retraining within the MTLF is no longer effective.
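
A minimal sketch of the kind of self-monitoring this implies: compare the distribution of live input data against the training data and flag drift when they diverge. The two-sample Kolmogorov-Smirnov test used here is one common choice, not something the 3GPP specifications mandate.

    # Sketch: flag data drift by comparing live inputs against the training
    # distribution with a two-sample KS test. The threshold is illustrative.
    import numpy as np
    from scipy.stats import ks_2samp

    def drift_detected(train_samples, live_samples, p_threshold=0.01):
        # A small p-value means the live data is unlikely to come from the
        # distribution the model was trained on -> trigger retraining.
        statistic, p_value = ks_2samp(train_samples, live_samples)
        return p_value < p_threshold

    rng = np.random.default_rng(0)
    train = rng.normal(0.5, 0.1, 5000)  # input feature during training
    live = rng.normal(0.7, 0.1, 500)    # shifted live traffic
    print(drift_detected(train, live))  # True -> request retraining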

ML Pipeline - NWDAF Interaction


Fig 5 : ML Pipeline

The part in the cloud is the ML pipeline. To connect the cloud with the components at the edge, three interfaces are important (the ones drawn in blue). The first is the Model Deploy interface, required to push the model from the cloud layer to the MTLF. The second is the Model Feedback interface, which the MTLF uses to send feedback to the upper layer. The third is the Pull Data interface, required to send the data for ML training, which will be stored in the data warehouse/data lake.

The data in the data warehouse should be easily accessible for experiments by data analysts, hence it is present in the cloud. ETL/ELT is the step where data is extracted, transformed, and loaded into storage; in some cases, when the endpoint is a data lake, the data is loaded without any transformation. Data collection from the source is done in batches, though streaming of data is getting more traction nowadays, and NWDAF is designed in such a way that even streaming can be used. Which data is to be uploaded, or which data is required for an ML experiment, is a complex subject: uploading all data can lead to unnecessary use of bandwidth, and there are data protection regulations in the different geographical areas where the edge is located. Hence this component should be designed very carefully.

Distributed System Platform -

Fig 6 : Distributed System Platform

The components of NWDAF are treated by the underlying platform in a similar way to network functions, and they are installed in a distributed manner. The NWDAF should have minimal dependency on the underlying platform of the ecosystem.

NWDAF SDK -

The platform provides an SDK/framework for developing the AnLF and MTLF, which reduces coding effort for MLOps engineers. The SDK framework should hide the complexity of the 3GPP standard from the developer. The SDK should also support advanced NWDAF features like aggregation and subscription transfer.

3GPP Release 17 Timeline -

  • Q3 2021 - Architecture freeze expected

  • Q2 2022 - Stage 3 freeze expected, with detailed definitions and APIs

  • Q3 2022 - Protocol code freeze expected