1 of 63

Lead Smarter: Stay Ahead of AI Risks

Vincent Granville, PhD
Chief AI Architect
BondingAI.io
vincent@BondingAI.io

August 27, 2025

Large Language Models: security and related issues. What are they? How to address them?

MLtechniques.com – xLLM, by Vincent Granville

1

2 of 63

Agenda


  1. Risks specific to Enterprise LLMs
  2. A better UI for LLMs
  3. Elements of our architecture
  4. Our team

3 of 63

Risks Specific to Enterprise LLMs (1/3)

  • Calls to external APIs (e.g., OpenAI): risk of data leakage
  • Reliance on black-box systems (transformers, DNNs)
    • Hard to train, requires billions of tokens, GPU farms, lots of electricity
    • Hard to distill
    • Yet training to predict the next token is outdated
    • Charging by token incentivizes vendors to use more tokens
    • Analogy: pay-per-click monetization in 2010
  • Algorithmic bias
  • Prompt injection
  • Costly mistakes or hallucinations
  • No relevancy scores displayed to user
  • Not scoring input sources


4 of 63

Risks Specific to Enterprise LLMs (2/3)

  • Not all users should have access to the full corpus
    • Solution: sub-LLMs and even chunks with restricted access
  • Failure to provide precise references to source
    • Heavily re-worded responses magnify the problem and cause hallucinations
  • Most secure solutions are on-premises with full control by client
  • Faulty evaluation metrics
    • LLM as a judge: circular loop
    • Do not capture exhaustivity and other qualities
    • Use of synthetic prompts
    • Evaluation is user-dependent


5 of 63

Risks Specific to Enterprise LLMs (3/3)

  • Issues with Python libraries
    • Autocorrect, stopwords, stemmers (global vs. local)
    • Inability to sample outside the observation range
  • Poor QA
    • No action taken after consistently poor user ratings of responses
  • Failure to connect parts of the corpus to your LLM
    • Execution without consulting with the right people
  • Hard to debug and fine-tune
    • Much easier with our architecture


6 of 63

A Better UI for LLMs (1/2)

  • User can choose:
    • sub-LLMs,
    • Categories,
    • Tags
  • Search options:
    • Exact or broad match
    • Negative keywords
    • Search by recency
  • Offer real-time fine-tuning and re-training, with intuitive parameters for
    • Distillation, stemmer / un-stemmer, embeddings, relevancy scores, …


7 of 63

A Better UI for LLMs (2/2)

  • Lightning-fast testing, training, debugging, and in-memory LLMs, all from the UI
    • Nested hashes (JSON-like)
    • Variable-length embeddings instead of vector DB and dot products
    • In addition to standard response, offer structured output (summary boxes + alternate queries) to reduce hallucinations and prompt engineering, with relevancy scores and precise references to corpus
    • Chunks pre- and post-tagging
    • Hierarchical chunking with option to browse corpus
  • Minimize re-wording in final response (important e.g. for legal documents)
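The nested-hash idea above can be sketched as follows; the tokens and weights are made up for illustration, and this is not the xLLM implementation:

```python
# Illustrative sketch, not the xLLM implementation: variable-length
# embeddings stored as sparse nested hashes instead of fixed-size vectors.
embeddings = {
    "data~science": {"machine~learning": 0.82, "statistics": 0.64, "python": 0.31},
    "statistics": {"data~science": 0.64, "probability": 0.77},
}

def related(token, top_k=2):
    # Rank neighbors by stored weight; no dot product or vector DB needed.
    neighbors = embeddings.get(token, {})
    return sorted(neighbors.items(), key=lambda kv: -kv[1])[:top_k]

print(related("data~science"))  # → [('machine~learning', 0.82), ('statistics', 0.64)]
```

Because each token stores only the neighbors it actually co-occurs with, the structure stays sparse and lookups are O(1) per token.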


8 of 63

Elements of our architecture (1/4)


9 of 63

Elements of our architecture (2/4)


10 of 63

Elements of our architecture (3/4)


11 of 63

Elements of our architecture (4/4)


12 of 63

Elements of our architecture


13 of 63

Elements of our architecture


14 of 63

Our Team

DANILO NATO

CEO & CO-FOUNDER

+17 Years of Experience

AB InBev, Global AI Director

BASF, LatAm

Degrees in computer science and business; master's degrees in statistics and psychology

VINCENT GRANVILLE

CHIEF AI ARCH & CO-FOUNDER

Successful exit, Data Science Central sold to Tech Target (2020)

Microsoft, Wells Fargo, Ebay, Visa, NBC

Created xLLM - LLMs 2.0

PhD in image remote sensing, postdoc at University of Cambridge

PETER VOGT

VP OF SALES

Experienced sales executive across industries

Sales Domain expertise in Data, AI, Cloud, Cybersecurity

EDUARDO SOARES

CTO

+15 Years in Software Engineering, Data, AI

Worked at major companies in LatAm

Founder CodeNato

Degrees in computer science

ANI DESWANDIKAR

PRODUCT LEAD

+30 Years of Experience in Software, Data and AI

+10 years at Microsoft, Principal Architect

Netflix, Sr Software Engineer

SADIAH ZAHOOR

AI LEAD

PhD, University of Cambridge

Experienced researcher: University of Cambridge, TATA Institute, Ministry of Defense (India)

FERNANDO GONCALVES

ENGINEER LEAD

+20 Years of Experience in Data, ML

Boticario, Data Engineer lead

GAVB, Data Scientist lead

15 of 63

Part 1: Why xLLM? What is it?


16 of 63

Extreme LLM (xLLM) in a Nutshell

  • Mixture of experts
      • Specialized sub-LLMs, and/or sub-LLMs restricted to authorized users
      • LLM router to manage the sub-LLMs
      • User selects sub-LLM, agents, and hyperparameters
      • Each sub-LLM built with its own taxonomy and contextual environment

  • No neural network, no training
      • Thus, low cost, easy to fine-tune in real-time, in-memory LLM, on-premises
      • Self-tuned based on favorite hyperparameters, intuitive parameters
      • No GPU, no latency, exhaustive concise results, local implementation
  • Concise results
      • Multiple sections: links, related content, x-embeddings based on E-PMI metric
      • Output with relevancy score attached to each item in each section; User offered choices for deeper or alternate queries
      • Great for professional users. Not just a “prompt box”; many options in the UI, like a mini-browser
  • Case studies
      • Corporate data lake corpus
      • Nvidia PDF repository
      • Wolfram corpus: 15k webpages, 5k categories
      • Publisher, 4000 titles: clustering, predicting article performance


xLLM lets enterprises build their own LLMs faster, at lower cost, with increased accuracy, security, and explainable AI


17 of 63

Prompt Results – Card Format (web API)


18 of 63

Prompt Results – Card Format (web API)


19 of 63

Prompt Results – Listing Format (1)


20 of 63

Prompt Results – Listing Format (2)


21 of 63

Prompt Results – Structured Text Format

  • Text entities retrieved from corpus via contextual chunking / indexation
      • Blended with images, datasets, exact URLs/references and so on (multimodal)
      • Featuring categories, tags, related content, titles, timestamps, links, and so on
      • Structured output based on hierarchical chunking and multi-indexing, augmented with auto-tagging, acronyms / synonyms dictionary, and un-stemming
      • Multiple relevancy scores based on multiple types of multi-tokens, then normalized
      • Exact vs broad search, negative keywords, variable weights attached to prompt multi-tokens, search by recency.
      • Auto-correct, stemmer, stopwords specific to corpus; “real estate” or “San Francisco” are single tokens. Chunks visibility (for any given chunk) depends on user privileges.
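As a toy illustration of corpus-specific multi-tokens (the phrase dictionary below is hypothetical, not xLLM's), a tokenizer can merge known phrases such as "real estate" into single tokens:

```python
# Hypothetical phrase dictionary: pairs treated as single multi-tokens.
MULTI_TOKENS = {("real", "estate"), ("san", "francisco")}

def tokenize(text):
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        # Greedily merge a known pair into one multi-token.
        if i + 1 < len(words) and (words[i], words[i + 1]) in MULTI_TOKENS:
            tokens.append(words[i] + "~" + words[i + 1])
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens

print(tokenize("Homes in San Francisco real estate market"))
# → ['homes', 'in', 'san~francisco', 'real~estate', 'market']
```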


22 of 63

Prompt Results – Response Generation

  • Based on structured output (see previous slides)
    • Generic template prompts with premade auto-filled response matched against user prompts
    • Light, fast proprietary DNN; no TensorFlow, Python, or Keras; explainable AI

  • Proprietary non-black-box DNN with explainable AI (patent-pending)
      • With equalizer, stabilizer and other proprietary features to accelerate convergence and increase stability
      • Chaotic gradient descent with temperature decay
      • Sub-epochs, sub-layers, original universal function
      • Global optimization or one sub-layer at a time within an epoch
      • Pre-tabulated functions, generic partial derivatives


23 of 63

DNN Weight Ghosting and Watermarking (1)


24 of 63

DNN Weight Ghosting and Watermarking (2)


25 of 63

Part 2: xLLM Innovative Features


26 of 63

Backend Features

  • Smart crawling to retrieve embedded structure
      • Breadcrumbs (enterprise corpus), concept associations (related links)
      • Metadata, tags, taxonomy, long contextual environment
      • PDF parser (TOC, index, glossaries, synonyms, titles, tables, images)
  • X-embeddings
      • Variable-length embeddings stored as sparse nested hashes
      • Multi-token: “data~science” on top of single tokens “data” and “science”
      • Contextual token: “data^science”, both words in same paragraph but not adjacent
      • PMI (pointwise mutual information) instead of dot product / cosine distance
      • Parametric weights attached to tokens (no loss function to optimize)
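A minimal sketch of plain PMI scoring between two tokens; the chunk counts are invented, and xLLM's E-PMI variant may differ from this textbook formula:

```python
# Sketch of pointwise mutual information replacing dot product / cosine
# distance. Counts below are made-up illustrations, not real corpus data.
from math import log

N = 1000                                    # total number of chunks (assumption)
count = {"data": 120, "science": 80}        # chunks containing each token
joint = {("data", "science"): 40}           # chunks containing both tokens

def pmi(t1, t2):
    # PMI = log P(t1, t2) / (P(t1) P(t2)); positive means tokens co-occur
    # more often than independence would predict.
    p1, p2 = count[t1] / N, count[t2] / N
    p12 = joint.get((t1, t2), 0) / N
    return log(p12 / (p1 * p2)) if p12 > 0 else float("-inf")

print(round(pmi("data", "science"), 3))  # → 1.427
```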


27 of 63

Retrieved Taxonomy: Wolfram Example


28 of 63

Retrieved Context: Enterprise Example


29 of 63

Backend Features (Cont.)

  • Home-made libraries
      • Issues with Python libraries (singularize, autocorrect, “Feller” changed to “seller”)
      • Minimize stemming and text transforms; keep plural if found in corpus
      • Important: accented characters, separators (punctuation), capital letters
      • Ad-hoc lists: home-made stopwords, do-not-singularize, do-not-autocorrect
  • Backend tables (specific to each sub-LLM)
      • X-embeddings are not the most important table; the taxonomy matters more
      • Compression mechanism: sorted n-grams
      • Backend parameters


30 of 63

Backend Features (Cont.)

  • Chunking & Indexing
      • Chunks called text entities: webpage, subsection (PDF), or JSON entity
      • Indexed for fast retrieval of full content, and for easy content linking
      • Chunks of variable length, hierarchical chunking and multi-index, content de-duping
      • Auto-tagging. Use relative font size and other elements to generate contextual fields.
  • NLP
      • Python with workarounds + homemade
      • Weighted graph tokens: multi-tokens found in the context/taxonomy elements
      • Customized pointwise mutual information (PMI), instead of cosine similarity


31 of 63

Frontend Features

  • User Interface
      • Many options, not just a search box (see previous slide)
      • User can choose agents, sub-LLM, or fine-tuning in real time
      • End-user debugging with catch-all parameter set
  • Relevancy multi-scores to rank response chunks
      • Problem: too many results match the user prompt; which ones should be displayed?
      • Graph tokens and multi-tokens with 2+ words: boost score
      • Text entity with 2+ multi-token intersection with prompt, get higher score
      • Rare multi-tokens get extra boost
      • Longer text entities get extra boost
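The boosts listed above might combine roughly as follows; the weights and the formula itself are illustrative assumptions, not xLLM's actual scoring:

```python
# Hedged sketch of a relevancy multi-score. All weights (2.0, 1.5, the log
# forms) are assumptions for illustration only.
from math import log

def relevancy(chunk_tokens, prompt_tokens, token_count, corpus_size, chunk_len):
    matched = chunk_tokens & prompt_tokens
    score = 0.0
    for t in matched:
        n_words = t.count("~") + 1
        boost = 2.0 if n_words >= 2 else 1.0               # multi-token boost
        rarity = log(corpus_size / token_count.get(t, 1))  # rare tokens score higher
        score += boost * rarity
    if len(matched) >= 2:
        score *= 1.5                     # 2+ multi-token intersection with prompt
    score *= log(1 + chunk_len)          # longer text entities get extra boost
    return score
```

For example, a chunk matching the rare multi-token "data~science" outranks one matching the single token "data" with the same frequency.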


32 of 63

Relevancy scores


33 of 63

Frontend Features (Cont.)

  • Distillation
      • If multi-tokens A~B~C and A~B have same count, show results from A~B~C, not A~B
  • Acronyms and synonyms
      • If A and B are synonyms, A in prompt but not in corpus, and B in corpus, map A to B in the prompt to retrieve B in the corpus (Goal: trying to be exhaustive)
  • Self-tuning: the most popular front-end parameters are used to build default parameters
  • Prompt cleanup with stopwords list / stemmer different from backend list
  • Stemming and un-stemming
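The distillation rule above can be sketched as follows (token counts are invented): when a longer multi-token has the same count as one of its prefixes, the prefix adds no information and is dropped.

```python
# Sketch of distillation: if A~B~C and A~B have the same count, keep only A~B~C.
counts = {"machine~learning": 50, "machine~learning~model": 50, "data~science": 80}

def distill(counts):
    kept = dict(counts)
    for t, c in counts.items():
        for other, c2 in counts.items():
            # 'other' extends t (same prefix plus more words) with equal count:
            # t is subsumed, so drop it.
            if other != t and other.startswith(t + "~") and c2 == c:
                kept.pop(t, None)
    return kept

print(distill(counts))
# → {'machine~learning~model': 50, 'data~science': 80}
```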


34 of 63

Distillation


35 of 63

Proprietary JSON-based PDF Parser


36 of 63

Part 3: xLLM Architecture and Evaluation


37 of 63

General Overview


38 of 63

Backend: Overview


39 of 63

Frontend: Overview


40 of 63

Path from Prompt to Results


41 of 63

Path from Crawl to Backend Tables


42 of 63

Details: Indexation


43 of 63

Detail: Relevancy Algorithm


44 of 63

Detail: Sorted N-Grams


45 of 63

Database: Nested Hashes (like JSON)


46 of 63

Evaluation

  • User-based (automated)
      • Collect favorite hyperparameters chosen by users
      • Use smart grid search to set default hyperparameters based on user favorites
      • Fine-tune on one or a few sub-LLMs (like LoRA) before full optimization on (say) 200 sub-LLMs. You may fine-tune all sub-LLMs in parallel.
  • Taxonomy-based (automated)
      • Pretend that the taxonomy backend table comes from external sources
      • Assign categories to webpages based on this “external” taxonomy
      • For each webpage, compare externally assigned to native category
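The taxonomy-based comparison loop might look like this (page names and categories are made up for illustration):

```python
# Sketch of taxonomy-based evaluation: treat the taxonomy backend table as
# "external", assign categories with it, then compare to native categories.
native = {"page1": "Calculus", "page2": "Algebra", "page3": "Calculus"}
assigned = {"page1": "Calculus", "page2": "Algebra", "page3": "Statistics"}

def agreement(native, assigned):
    # Fraction of webpages where the externally assigned category matches
    # the native one.
    hits = sum(assigned[p] == c for p, c in native.items())
    return hits / len(native)

print(round(agreement(native, assigned), 2))  # → 0.67
```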


47 of 63

Evaluation (Cont.)

  • Evaluation challenges
      • We are dealing with unsupervised learning: there is no perfect output except for trivial cases
      • Quality depends on user (professional users and laymen have different criteria)
      • How do you measure exhaustivity, depth, and recency?
      • Output value versus grammatical capabilities
      • How do you integrate the xLLM relevancy scores attached to each item, to evaluate output quality? No other LLM returns these scores


48 of 63

Taxonomy-Based Evaluation


49 of 63

Part 4: xLLM for Clustering, Data Synthetization, Predictive Analytics


50 of 63

Interlude – Adaptive Loss Function (ALF)

  • Adaptive loss function converging to model evaluation metric
      • Boosts quality measured using model evaluation, reduces gradient descent failures


51 of 63

xLLM for Data Synthetization (with ALF)


NoGAN Tabular Data Synthetization

  • Real data: 2 concentric circles
  • Synthesized data (NoGAN synthesizer): blue dots. Synthetization is constrained to keep the loss above some threshold
  • As the loss function gets more granular, the synthesized data gets more similar to the real data (the training set)

52 of 63

xLLM for Predictions

  • Case study – media industry
      • Predicting article performance (pageviews) based on title keywords and category
      • 4000 articles; pageview is normalized and time-adjusted
  • Evaluation and Loss function (identical)
      • Based on comparing predicted with observed quantiles, using 5 quantiles (see code)
      • Good proxy to Kolmogorov-Smirnov distance
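A 5-quantile comparison loss of this kind can be sketched as follows (the details are assumptions, not the exact code referenced above):

```python
# Hedged sketch: compare predicted vs observed distributions on a few
# quantiles; the sup of absolute differences is a crude proxy to the
# Kolmogorov-Smirnov distance.
import numpy as np

def quantile_loss(observed, predicted, n_quantiles=5):
    qs = np.linspace(0, 1, n_quantiles + 1)[1:-1]   # interior quantile levels
    q_obs = np.quantile(observed, qs)
    q_pred = np.quantile(predicted, qs)
    return float(np.max(np.abs(q_obs - q_pred)))    # KS-like sup distance

rng = np.random.default_rng(0)
obs = rng.normal(size=1000)
print(quantile_loss(obs, obs))  # → 0.0 when the distributions match
```

Using the same function as both loss and evaluation metric avoids the mismatch that the ALF interlude above addresses.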


53 of 63

xLLM for Predictions – Model


54 of 63

xLLM for Predictions – Category Encoding

  • Create new codes sequentially as you browse the training set.
  • Aggregate codes with few observations into bundles.
  • Create two key-value mappings. Ex:
      • Category_to_Code[‘Blog’, ‘William’] = 5
      • Code_to_category[5] = [‘Blog’, ‘William’]
  • Replace the categorical features by the newly created feature, “Code”.
  • Number of codes ≤ number of obs.


55 of 63

xLLM for Predictions – Results

  • Observed vs predicted normalized pageview count


56 of 63

xLLM for Clustering

  • Case study – media industry
      • Identifying patterns / clusters in popular articles based on title keywords
      • 4000 articles; pageview is normalized and time-adjusted
  • Methodology
      • Group multi-tokens into clusters based on a similarity metric, with hierarchical clustering and k-medoids
      • Let S(t) be the set of articles containing the multi-token t in the title
      • For each multi-token group G, the list L(G) of articles belonging to G is the union of the sets S(t) over all multi-tokens t in G


57 of 63

xLLM for Clustering (Cont.)

  • Similarity between two multi-tokens t1, t2

  • Remarks
      • Multi-token clusters are non-overlapping, but article clusters may overlap
      • Sklearn clustering methods require a distance matrix as input; the matrix (derived from the similarity metric) is huge but extremely sparse.
      • In my implementation, s(t1, t2) is computed and stored only if it is strictly positive. Using connected components for clustering, it is far more efficient than Sklearn.
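Clustering via connected components on the sparse similarity graph can be sketched as follows (tokens and edges are invented; an edge exists only where the similarity is strictly positive):

```python
# Sketch: multi-token clusters as connected components of the graph whose
# edges are the stored, strictly positive similarities — no dense distance
# matrix needed, unlike Sklearn's clustering methods.
from collections import defaultdict

def connected_components(edges, nodes):
    graph = defaultdict(set)
    for a, b in edges:                    # edge iff s(a, b) > 0
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for n in nodes:                       # iterative DFS from each unseen node
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            seen.add(x)
            stack.extend(graph[x] - comp)
        clusters.append(comp)
    return clusters

nodes = ["ai~ethics", "machine~learning", "deep~learning", "real~estate"]
edges = [("machine~learning", "deep~learning")]
print(connected_components(edges, nodes))
```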


58 of 63

xLLM for Clustering – Sample Structure


59 of 63

xLLM for Clustering – Sample Cluster

  • Cluster of popular articles linked to multi-token cluster with 3 elements, including one contextual multi-token: “Machine^vs” (pv stands for normalized pageview)


60 of 63

Interlude – Fast Nearest Neighbor Search

  • Red dot: prompt-derived embeddings
  • Blue dot: backend table embedding
  • Over time, arrows link red dots to their nearest blue dots
  • Alternative to vector search


61 of 63

xLLM for Next Token Prediction


  • Next token prediction: the mother of all LLMs
  • Here: predict next DNA sub-sequence to generate synthetic genomic data
  • Alphabet has 4 letters
  • Left: Scatterplot comparing observed vs synthetic ECDFs

62 of 63

Part 5: References


63 of 63

References

  • 10 high-level articles describing xLLM:

https://mltblog.com/3GPc8Ss

  • Technical papers and books:

https://mltblog.com/3zsnQ2g

  • Our proprietary DNN technology:

https://mltblog.com/3SA3OJ1
