2 of 10

Four Key Areas in AI for Networking

1- Applications/AI Use Cases in Networking: The new functionality that is made available using AI

2a. AI Models (Domain Specific): The AI capabilities specific to networking and the domain

2b. AI Models (Generic): The AI capabilities, such as prediction, content generation, anomaly detection, etc.

3- Data and AI Infrastructure (computing elements) (Sharing, Governance, Processing): How data is collected and stored, and the resources used for processing, running, and training the models

4- Network Infrastructure (Open Source Projects + Vendor Solutions) + Domain Data Sets: The network itself, which provides the data and acts on the learnings from the above layers

3 of 10

Networking Use case group analysis - LFN Survey results

Network operations and optimization rank higher, although there is wide interest in all 6 categories

4 of 10

Focus Areas of LFN - Survey results

Q: What do you think are the 2-3 things existing LFN projects should focus on in order to accelerate adoption of AI in networking?

5 of 10

The keys for unleashing the power of AI for Networking

High Quality Structured Data - Avoiding “information islands” that cannot be interpreted
AI Trustworthiness - To enable full automation and take humans out of the equation
Economical marginal cost - The cost for any single organization to build models is too high
Supportable Research Models - Resources must be pooled to become cost-effective
Contextual Data Sets - Coming from all layers - Application, Security, OSS/BSS, etc.
Community Unity and Standards - To avoid the limited “field of view” of a single-vendor solution

Open Source Collaboration is the only way to address these challenges

6 of 10

Telco Data Anonymization

The Anuket/Thoth project

7 of 10

The challenge of PII in Telco data sets

Good AI models require high quality network data

Training Telco AI models has to be performed on actual Telco data

Raw Telco data sets contain Personally Identifiable Information (PII)

Names (Systems, Domain, Individuals, Organizations, Places, etc.)
Addresses (IP and MAC)
Telco Fields - IMSI, IMEI, MSIN, MSISDN, MCC+MNC
Location Data (GPS, Cell-ID, Count, etc.)
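
As an illustration of how identifier fields such as the ones listed above might be masked before a data set is shared, here is a minimal Python sketch of keyed pseudonymization. It is not taken from the Anuket/Thoth project: the function name `pseudonymize`, the `SECRET_KEY` value, and the sample record are all hypothetical. Note that keyed hashing is pseudonymization rather than full anonymization, since anyone holding the key can recompute the tokens.

```python
import hashlib
import hmac

# Hypothetical secret for this sketch; a real deployment would generate it
# randomly and keep it out of the shared data set.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY, length: int = 16) -> str:
    """Return a stable keyed-hash token for an identifier.

    The same input and key always give the same token, so joins across
    records still work, but the original value is not exposed.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Sample record using test/documentation values only (assumed field names).
record = {
    "imsi": "001010123456789",
    "ip": "192.0.2.10",
    "mac": "00:00:5e:00:53:01",
}

# Replace every identifier field with its token.
masked = {field: pseudonymize(value) for field, value in record.items()}
print(masked)
```

Truncating the digest to 16 hex characters keeps tokens short at the cost of a higher collision probability; the right length depends on how many distinct identifiers the data set holds.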

8 of 10

What does the Anuket/Thoth project do?

Agree - On what constitutes the ‘sensitive’ data, and on the problem set (the questions we would want to answer).
Try - Available tools (libraries) and techniques (implementations) on the available datasets.
Find - The gaps in datasets, tools, and techniques.
Fill - Those gaps, considering the problem set.
Publish - The results.

9 of 10

What are the techniques we are trying?

Natural Language Processing

NLP techniques for logs.
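
For log lines, a simple starting point is rule-based pattern matching, which NLP techniques such as named-entity recognition would then extend to free-text names and places. The sketch below is only that rule-based baseline, not an NLP model and not the project's implementation; the `PATTERNS` table, the `scrub` function, and the sample log line are assumptions made for illustration.

```python
import re

# Ordered (label, pattern) pairs. The patterns are deliberately loose:
# a 15-digit number is treated as an IMSI, although an IMEI has the same
# length, and the IPv4 pattern does not validate octet ranges.
PATTERNS = [
    ("MAC", re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")),
    ("IPV4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("IMSI", re.compile(r"\b\d{15}\b")),
]


def scrub(line: str) -> str:
    """Replace every matched identifier in a log line with a <LABEL> placeholder."""
    for label, pattern in PATTERNS:
        line = pattern.sub(f"<{label}>", line)
    return line


sample = "attach imsi=001010123456789 ue_ip=192.0.2.10 mac=00:00:5e:00:53:01 ok"
print(scrub(sample))
# prints: attach imsi=<IMSI> ue_ip=<IPV4> mac=<MAC> ok
```

Placeholders keep the log structure readable for downstream parsing, but unlike tokens they remove the ability to correlate events that belong to the same subscriber or device.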

Classic Techniques

K-Anonymity, L-Diversity, T-Closeness, Differential Privacy
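
To make the classic techniques more concrete, the following Python sketch measures the k-anonymity and l-diversity of a small table and adds Laplace noise to a count, which is the basic mechanism behind differential privacy. It is a minimal illustration under assumptions: the column names, the sample rows, and the function names are hypothetical, the input is assumed to be non-empty, and T-closeness is not covered.

```python
import random
from collections import defaultdict


def group_by_quasi_identifiers(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group: every row is hidden among at least k rows."""
    groups = group_by_quasi_identifiers(rows, quasi_identifiers)
    return min(len(group) for group in groups.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any group."""
    groups = group_by_quasi_identifiers(rows, quasi_identifiers)
    return min(len({row[sensitive] for row in group}) for group in groups.values())


def laplace_count(true_count, epsilon, rng=random):
    """Return a count with Laplace noise of scale 1/epsilon.

    A count changes by at most 1 when one record is added or removed,
    so its sensitivity is 1. The difference of two exponential samples
    with rate epsilon is Laplace-distributed with scale 1/epsilon.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical records: cell and hour are quasi-identifiers, service is sensitive.
rows = [
    {"cell": "A", "hour": 9, "service": "voice"},
    {"cell": "A", "hour": 9, "service": "video"},
    {"cell": "B", "hour": 9, "service": "voice"},
    {"cell": "B", "hour": 9, "service": "voice"},
]

print(k_anonymity(rows, ["cell", "hour"]))             # 2
print(l_diversity(rows, ["cell", "hour"], "service"))  # 1
print(laplace_count(len(rows), epsilon=1.0))           # noisy value near 4
```

In this sample the table is 2-anonymous but only 1-diverse: everyone in cell B at hour 9 uses the same service, so group membership alone reveals the sensitive value.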

GANs

Synthetic data generation as a perfect anonymization solution.

Autoencoders

Unsupervised techniques for anonymization

10 of 10

Questions the Anuket/Thoth project is answering

Can we build a tool that takes in the dataset and anonymizes it automatically using the best technique?

With no manual intervention

Is there a single technique that is applicable to all kinds of sensitive information?

What kind of sensitive information is well suited for each of the techniques?

Mapping of a type of sensitive information to a technique.

Do we have datasets that contain all of the sensitive information?

Well used
Freely available
Significant size
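
The idea of mapping each type of sensitive information to a technique, and applying it without manual intervention, could be sketched as a small dispatch table in Python. Everything here is a placeholder chosen for illustration: the type names, the three transformation functions, and the mapping itself are assumptions, not findings or code from the Anuket/Thoth project.

```python
def suppress(value):
    """Remove the value entirely."""
    return None


def truncate_ipv4(value):
    """Generalize an IPv4 address by zeroing its last octet."""
    octets = value.split(".")
    return ".".join(octets[:3] + ["0"])


def coarsen_coordinate(value):
    """Generalize a coordinate by rounding it to one decimal place."""
    return round(float(value), 1)


# Hypothetical mapping from sensitive-information type to technique.
TECHNIQUE_FOR_TYPE = {
    "name": suppress,
    "ipv4": truncate_ipv4,
    "coordinate": coarsen_coordinate,
}


def anonymize_record(record, field_types):
    """Apply the mapped technique to each typed field; pass other fields through."""
    result = {}
    for field, value in record.items():
        kind = field_types.get(field)
        if kind is None:
            result[field] = value  # field is not marked as sensitive
        else:
            result[field] = TECHNIQUE_FOR_TYPE[kind](value)
    return result


record = {"subscriber": "Alice", "ue_ip": "192.0.2.77", "lat": "52.5200", "bytes": 1024}
field_types = {"subscriber": "name", "ue_ip": "ipv4", "lat": "coordinate"}
print(anonymize_record(record, field_types))
```

In this sketch the field types are supplied by hand; detecting them automatically is the harder part of the question and is left open here.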