xLLM: New Generation of Large Language Models for Enterprises

Vincent Granville, PhD
Chief AI Scientist
GenAItechLab.com
vincentg@mltechniques.com
May 29, 2024

No-GPU Full Context Secure Multi-LLM

with Real-Time Fine-Tuning & Explainable AI

Agenda

  1. Why xLLM? What is it?
  2. xLLM Innovative Features
  3. xLLM Architecture and Evaluation
  4. xLLM for Clustering, Data Synthetization, Predictive Analytics
  5. What is coming up next?
  6. References

Part 1
Why xLLM? What is it?

Extreme LLM (xLLM) in a Nutshell

  • Mixture of experts
      • Specialized sub-LLMs, and/or sub-LLMs reserved for authorized users
      • LLM router to manage the sub-LLMs
      • User selects sub-LLM, agent, and hyperparameters
      • Each sub-LLM built with its own taxonomy and knowledge graph
  • No neural network, no training
      • Thus, low cost, easy to fine-tune in real-time
      • Self-tuned based on favorite hyperparameters, and customizable
      • No GPU, no latency, exhaustive concise results, local implementation
  • Concise results
      • Multiple sections displayed to user: links, related content, x-embeddings
      • Output with a relevancy score attached to each item in each section; the user is offered choices for deeper or alternate queries
      • Great for search, professional users, and experts. Not just a “prompt box”: many options in the UI, like a mini browser
  • Case studies
      • Corporate corpus with augmented sources (content + taxonomies)
      • Wolfram corpus: 15 sub-LLMs, 500 sub-categories per sub-LLM
      • Publisher, 4000 titles: clustering, predicting article performance

xLLM's mission is to enable enterprises to build their own LLMs that fit their purpose with precision, faster and cheaper, with security, and open to integration with any other LLM.

Prompt Results – Card Format (web API)

Prompt Results – Listing Format (1)

Prompt Results – Listing Format (2)

Prompt Results – Text Format

  • Text entities retrieved from corpus via context chunking / indexation
      • Blended with images, datasets, URLs and so on (multimodal)
      • Knowledge graph elements included: categories, tags, related content, agents
  • Generating multi-language output (prose) with GenAI
      • Coming soon, different from simple text retrieval
      • Turn output into an English summary; accented characters already implemented for other languages
      • Pre-made customizable synthetic answers (template answers)
      • Blending AI with classic ML: integrating external tools with a large list of pre-made, customizable template sentences to display in prompt results (see the sketch after this list)
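
To make the template-answer idea concrete, below is a minimal sketch of filling pre-made, customizable template sentences with entities retrieved from the corpus. The template strings, field names, and retrieved values are hypothetical, not the actual xLLM templates.

# Minimal sketch: pre-made template sentences filled with retrieved entities.
# Templates and field names are illustrative assumptions.

TEMPLATES = {
    "definition": "According to {source}, {term} refers to {summary}.",
    "related":    "Related content for {term}: {links}.",
}

def render_answer(kind, fields):
    """Fill a pre-made template sentence with entities retrieved from the corpus."""
    return TEMPLATES[kind].format(**fields)

print(render_answer("definition", {
    "source": "corpus section 3.2",   # hypothetical retrieved text entity
    "term": "x-embeddings",
    "summary": "variable-length embeddings stored as sparse nested hashes",
}))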

xLLM Integration with other APIs and LLMs

      • Leverage and blend capabilities of multiple LLMs (GPT, Perplexity, Mistral, etc.)
      • Use external GenAI tools or libraries to turn output into nice, fluid English text
      • CodeValet API for code generation
      • Wolfram/Mathematica API to solve math problems

Part 2
xLLM Innovative Features

Backend Features

  • Smart crawling to retrieve embedded structure
      • Breadcrumbs (enterprise corpus), concept associations (related links)
      • Metadata, tags, taxonomy (category graph)
      • Augmented with user prompts
      • Augmented with PDFs (TOC, index, glossaries, synonyms, titles)
  • X-embeddings (see the sketch after this list)
      • Variable-length embeddings stored as sparse nested hashes
      • Multi-token: “data~science” on top of single tokens “data” and “science”
      • Contextual token: “data^science”, both words in same paragraph but not adjacent
      • PMI (pointwise mutual information) instead of dot product / cosine distance
      • Parametric weights attached to tokens (no loss function to optimize)
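
As a minimal sketch, assuming toy tokens and counts, x-embeddings can be pictured as sparse nested hashes keyed by multi-tokens ("data~science") and contextual tokens ("data^science"), scored with a PMI-style association instead of cosine similarity; the generic PMI below stands in for xLLM's customized variant.

# Sparse nested hash: token -> {related token -> co-occurrence count}.
# Token names, counts, and the exact PMI formula are illustrative assumptions.
from collections import defaultdict
from math import log

embeddings = defaultdict(dict)
token_count = defaultdict(int)

def add_pair(t1, t2, n=1):
    """Register a co-occurrence between two (multi-)tokens."""
    token_count[t1] += n
    token_count[t2] += n
    embeddings[t1][t2] = embeddings[t1].get(t2, 0) + n

add_pair("data~science", "machine~learning", 12)   # adjacent multi-tokens
add_pair("data^science", "statistics", 5)          # same paragraph, not adjacent

def pmi(t1, t2):
    """Pointwise mutual information between two tokens (generic form)."""
    total = sum(token_count.values())
    joint = embeddings.get(t1, {}).get(t2, 0)
    if joint == 0:
        return float("-inf")
    return log(joint * total / (token_count[t1] * token_count[t2]))

print(round(pmi("data~science", "machine~learning"), 3))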

Retrieved Taxonomy: Wolfram Example

Retrieved Context: Enterprise Example

Backend Features (Cont.)

  • Home-made libraries
      • Issues with Python libraries (singularize, autocorrect, “Feller” changed to “seller”)
      • Minimize stemming and text transforms; keep plural if found in corpus
      • Important: accented characters, separators (punctuation), capital letters
      • Ad-hoc lists: home-made stopwords, do-not-singularize, do-not-autocorrect
  • Backend tables (specific to each sub-LLM)
      • X-embeddings not the most important table; taxonomy more important
      • Compression mechanism: sorted n-grams (see the sketch after this list)
      • Backend parameters
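
A minimal sketch of the sorted n-gram compression idea, assuming hypothetical helper names and tokens: word-order variants of a multi-token collapse to one sorted key, so a single backend entry covers all of them.

# Sorted n-grams as a compression mechanism (illustrative sketch).

def sorted_ngram(multi_token, sep="~"):
    """Return the canonical (sorted) form of a multi-token."""
    return sep.join(sorted(multi_token.split(sep)))

counts = {}
for raw in ["science~data", "data~science", "data~science"]:
    key = sorted_ngram(raw)                 # all three collapse to "data~science"
    counts[key] = counts.get(key, 0) + 1

print(counts)   # {'data~science': 3}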

Backend Features (Cont.)

  • Chunking & Indexing
      • Chunks called text entities: webpage, subsection (PDF), or JSON entity
      • Indexed for fast retrieval of full content, and for easy content linking (see the sketch after this list)
      • Chunks of variable length
  • NLP
      • Python with workarounds + home-made code
      • Weighted graph tokens: multi-tokens found in the context/taxonomy elements
      • Customized pointwise mutual information (PMI), instead of cosine similarity
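
A minimal sketch of chunking and indexation, assuming simplified entity IDs, fields, and whitespace tokenization: each chunk (text entity) is stored under an ID for fast full-content retrieval, and an inverted index (an illustrative choice, not necessarily the xLLM structure) maps tokens to entity IDs.

# Chunk (text entity) storage plus a token -> entity ID index (sketch).
from collections import defaultdict

text_entities = {}                      # entity ID -> full chunk (variable length)
inverted_index = defaultdict(set)       # token -> set of entity IDs

def index_chunk(entity_id, content, tags=()):
    text_entities[entity_id] = {"content": content, "tags": list(tags)}
    for token in content.lower().split():
        inverted_index[token].add(entity_id)

index_chunk("web_001", "Knowledge graph elements: categories tags related content")
index_chunk("pdf_2.3", "Taxonomy augmentation with external sources", tags=["external"])

# Retrieval: all entities containing a prompt token, then full content by ID.
hits = inverted_index["taxonomy"]
print([text_entities[eid]["content"] for eid in hits])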

Backend Features (Cont.)

  • Augmentation
      • Easy integration of external sources, tested on corporate corpus
      • External content flagged via tags or other context elements
      • User told if piece of output is internal or external
      • Taxonomy augmentation
  • Agents
      • Assigned post-crawling to text entities via clustering, for easy matching with prompt
      • Different from standard implementations (bottom up rather than top down)
  • Content Deduping

Frontend Features

  • User Interface
      • Many options, not just a search box (see previous slide)
      • User can choose agents, sub-LLM, or fine-tuning in real time
      • End-user debugging with catch-all parameter set
  • Relevancy scores (see the scoring sketch after this list)
      • Goal: too many results match the user prompt; which ones should be displayed?
      • Graph tokens and multi-tokens with 2+ words boost the score
      • Text entities sharing 2+ multi-tokens with the prompt get a higher score
      • Rare multi-tokens get an extra boost
      • Longer text entities get an extra boost
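
A toy scoring sketch following the boosting rules above; the weights, the rarity measure, and the length normalization are assumptions, not the actual relevancy algorithm summarized later in the deck.

# Toy relevancy score: multi-token, rarity, and length boosts (assumed weights).
from math import log

def relevancy(entity_tokens, prompt_tokens, corpus_freq, entity_length):
    shared = set(entity_tokens) & set(prompt_tokens)
    score = 0.0
    for token in shared:
        words = token.count("~") + 1
        boost = 2.0 if words >= 2 else 1.0                  # multi-tokens boost score
        rarity = 1.0 / log(1 + corpus_freq.get(token, 1))   # rare tokens boost score
        score += boost * rarity
    if sum(t.count("~") + 1 >= 2 for t in shared) >= 2:     # 2+ shared multi-tokens
        score *= 1.5
    return score * log(1 + entity_length)                   # longer entities boosted

freq = {"data~science": 3, "python": 250}
print(relevancy(["data~science", "python"], ["data~science", "python"], freq, 400))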

Relevancy scores

Frontend Features (Cont.)

  • Distillation (see the sketch after this list)
      • If multi-tokens A~B~C and A~B have the same count, show results from A~B~C, not A~B
  • Acronyms and synonyms
      • If A and B are synonyms, A is in the prompt but not in the corpus, and B is in the corpus, map A to B in the prompt to retrieve B in the corpus (goal: exhaustivity)
  • Self-tuning: most popular front-end parameters used to build default parameters
  • Prompt cleanup with stopwords list different from backend list
  • Disambiguation (coming soon)
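
A minimal sketch of the distillation rule, assuming toy multi-tokens and counts: when a longer multi-token has the same count as one of its sub-multi-tokens, only the longer one is kept.

# Distillation: drop A~B when A~B~C has the same count (illustrative data).
counts = {"machine~learning": 7, "machine~learning~model": 7, "data~science": 12}

def distill(counts):
    keep = dict(counts)
    for small, n_small in counts.items():
        for big, n_big in counts.items():
            # subset test on word sets (ignores adjacency/order; a simplification)
            is_sub = big != small and set(small.split("~")) < set(big.split("~"))
            if is_sub and n_big == n_small:
                keep.pop(small, None)      # A~B absorbed by A~B~C
    return keep

print(distill(counts))   # 'machine~learning' dropped, 'machine~learning~model' kept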

Distillation

Part 3
xLLM Architecture and Evaluation

Backend: Overview

Frontend: Overview

Path from Prompt to Results

Path from Crawl to Backend Tables

Details: Indexation

Detail: Relevancy Algorithm

Detail: Sorted N-Grams

Database: Nested Hashes (like JSON)

Evaluation

  • User-based (automated)
      • Collect favorite hyperparameters chosen by users
      • Use smart grid search to set default hyperparameters based on user favorites
      • Fine-tune on one or a few sub-LLMs (akin to LoRA) before full optimization on (say) 200 sub-LLMs. All sub-LLMs may be fine-tuned in parallel.
  • Taxonomy-based (automated)
      • Pretend that the taxonomy backend table comes from external sources
      • Assign categories to webpages based on this “external” taxonomy
      • For each webpage, compare the externally assigned category to the native category (see the sketch below)
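
A minimal sketch of this taxonomy-based evaluation, assuming toy categories, token sets, and a simple overlap rule for category assignment:

# Assign a category from the taxonomy table (treated as "external"), then compare
# against the native category. Data and matching rule are simplified assumptions.
taxonomy = {"calculus": {"derivative", "integral", "limit"},
            "statistics": {"variance", "estimator", "sample"}}

pages = [  # (page tokens, native category from the crawl)
    ({"integral", "limit", "proof"}, "calculus"),
    ({"sample", "variance", "test"}, "statistics"),
    ({"derivative", "estimator"}, "statistics"),
]

def assign_category(tokens):
    """Pick the taxonomy category with the largest token overlap."""
    return max(taxonomy, key=lambda cat: len(tokens & taxonomy[cat]))

agreement = sum(assign_category(tokens) == native for tokens, native in pages) / len(pages)
print(f"category agreement: {agreement:.0%}")   # proxy for evaluation quality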

  • Evaluation challenges
      • We are dealing with unsupervised learning: there is no perfect output except for trivial cases
      • Quality depends on user (professional users and laymen have different criteria)
      • How do you measure exhaustivity, depth, and recency?
      • Value of output versus grammatical capabilities
      • How do you integrate the xLLM relevancy scores attached to each item when evaluating output quality? No other LLM returns these scores

Taxonomy-Based Evaluation

Part 4
xLLM for Clustering, Data Synthetization, Predictive Analytics

Interlude – Adaptive Loss Function (ALF)

  • Adaptive loss function converging to model evaluation metric
      • Boosts quality as measured by the model evaluation metric, and reduces gradient descent failures

xLLM for Data Synthetization (with ALF)

NoGAN Tabular Data Synthetization

  • Real data: 2 concentric circles
  • Synthetic data (NoGAN synthesizer): blue dots. Synthetization constrained to keep the loss above some threshold
  • As the loss function gets more granular, the synthesized data gets more similar to the real data (the training set)

xLLM for Predictions

  • Case study – media industry
      • Predicting article performance (pageviews) based on title keywords and category
      • 4000 articles; pageview is normalized and time-adjusted
  • Evaluation and Loss function (identical)
      • Based on comparing predicted with observed quantiles, using 5 quantiles (see code; a toy version follows this list)
      • Good proxy for the Kolmogorov-Smirnov distance
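
A toy version of a 5-quantile comparison between observed and predicted (normalized, time-adjusted) pageviews; the quantile levels, the averaging, and the simulated data are assumptions, and the author's actual code is on GitHub (see References).

# Quantile-based loss: mean gap between observed and predicted quantiles.
import numpy as np

def quantile_loss(observed, predicted, levels=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Mean absolute gap between observed and predicted quantiles (5 quantiles)."""
    q_obs = np.quantile(observed, levels)
    q_pred = np.quantile(predicted, levels)
    return float(np.mean(np.abs(q_obs - q_pred)))   # small gap ~ small KS distance

rng = np.random.default_rng(0)
observed = rng.lognormal(mean=0.0, sigma=1.0, size=4000)   # pageview-like distribution
predicted = observed * rng.normal(1.0, 0.1, size=4000)     # hypothetical model output
print(round(quantile_loss(observed, predicted), 4))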

xLLM for Predictions – Model

xLLM for Predictions – Category Encoding

  • Create new codes sequentially as you browse the training set.
  • Aggregate codes with few observations into bundles.
  • Create two key-value mappings. Ex:
      • Category_to_Code['Blog', 'William'] = 5
      • Code_to_Category[5] = ['Blog', 'William']
  • Replace the categorical features by the newly created feature, “Code”.
  • Number of codes ≤ number of observations (see the sketch below).
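
A minimal sketch of this sequential encoding, assuming hypothetical column values; rare-code bundling is omitted here.

# Sequential category encoding with the two key-value mappings described above.
category_to_code, code_to_category = {}, {}

def encode(category_tuple):
    """Return the code for a (content type, author) pair, creating it if new."""
    if category_tuple not in category_to_code:
        code = len(category_to_code)              # sequential code creation, from 0
        category_to_code[category_tuple] = code
        code_to_category[code] = category_tuple
    return category_to_code[category_tuple]

training_rows = [("Blog", "William"), ("Article", "Jane"), ("Blog", "William")]
codes = [encode(row) for row in training_rows]    # replaces the categorical features
print(codes, code_to_category)                    # number of codes <= number of rows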

xLLM for Predictions – Results

  • Observed vs predicted normalized pageview count

xLLM for Clustering

  • Case study – media industry
      • Identifying patterns / clusters in popular articles based on title keywords
      • 4000 articles; pageview is normalized and time-adjusted
  • Methodology
      • Group multi-tokens into clusters based on a similarity metric, with hierarchical clustering and k-medoids
      • Let S(t) be the set of articles containing the multi-token t in the title
      • For each multi-token group G, the list L(G) of articles belonging to G is the union of the sets S(t) over all multi-tokens t in G

xLLM for Clustering (Cont.)

  • Similarity between two multi-tokens t1, t2

  • Remarks
      • Multi-token clusters are non-overlapping, but article clusters may overlap
      • Sklearn clustering methods require a distance matrix as input; the matrix (derived from the similarity metric) is huge but extremely sparse.
      • In my implementation, s(t1, t2) is computed and stored only if it is strictly positive. Using connected components for clustering, it is far more efficient than Sklearn (see the sketch below).
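
A minimal sketch of connected-components clustering over the sparse similarity structure, assuming toy article sets and a Jaccard-style similarity standing in for the deck's (unstated) s(t1, t2):

# Store only strictly positive similarities, then cluster multi-tokens by connected
# components; article clusters are the unions L(G). Data and similarity are assumptions.
from collections import defaultdict

S = {  # multi-token -> set of article IDs whose title contains it (toy data)
    "machine~learning": {1, 2, 3}, "deep~learning": {2, 3}, "data~viz": {7, 8}, "chart": {8},
}

edges = defaultdict(set)
tokens = list(S)
for i, t1 in enumerate(tokens):
    for t2 in tokens[i + 1:]:
        sim = len(S[t1] & S[t2]) / len(S[t1] | S[t2])   # Jaccard-style stand-in
        if sim > 0:                                     # keep only positive entries
            edges[t1].add(t2)
            edges[t2].add(t1)

def clusters():
    """Connected components = non-overlapping multi-token clusters."""
    seen, out = set(), []
    for t in tokens:
        if t in seen:
            continue
        stack, comp = [t], set()
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                comp.add(u)
                stack.extend(edges[u] - seen)
        out.append(comp)
    return out

for group in clusters():                # article cluster L(G) = union of S(t), t in G
    print(group, set().union(*(S[t] for t in group)))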

xLLM for Clustering – Sample Structure

xLLM for Clustering – Sample Cluster

  • Cluster of popular articles linked to multi-token cluster with 3 elements, including one contextual multi-token: “Machine^vs” (pv stands for normalized pageview)

Interlude – Fast Nearest Neighbor Search

  • Red dots: prompt-derived embeddings
  • Blue dots: backend table embeddings
  • Over time, arrows link red dots to their nearest blue dots
  • Alternative to vector search

xLLM for Next Token Prediction

  • Next token prediction: the mother of all LLMs
  • Here: predict next DNA sub-sequence to generate synthetic genomic data
  • Alphabet has 4 letters
  • Left: Scatterplot comparing observed vs synthetic ECDFs

Part 5
References

References

  • New book: “Building Disruptive AI & LLM Technology from Scratch”
  • First book: “State of the Art in GenAI & LLMs – Creative Projects, with Solutions”
      • Project 2.4 – Adaptive loss function
      • Project 7.2 – Main part, includes smart crawling and x-embeddings
      • Project 8.1 – Fast approximate nearest neighbor search
      • Project 8.2 – Evaluation using taxonomy
      • Project 8.3 – xLLM for clustering and predictions
  • GitHub: code, data: https://github.com/VincentGranville/Large-Language-Models
  • AI Research and book access: https://mltechniques.com/resources/

Thank you!
