xLLM: New Generation of Large Language Models for Enterprises

Vincent Granville, PhD
Chief AI Scientist
GenAItechLab.com
vincentg@mltechniques.com
May 29, 2024

No-GPU Full Context Secure Multi-LLM

with Real-Time Fine-Tuning & Explainable AI

Agenda

  1. Why xLLM? What is it?
  2. xLLM Innovative Features
  3. xLLM Architecture and Evaluation
  4. xLLM for Clustering, Data Synthetization, Predictive Analytics
  5. What is coming up next?
  6. References

Part 1
Why xLLM? What is it?

Extreme LLM (xLLM) in a Nutshell

  • Mixture of experts
      • Specialized sub-LLMs, and/or sub-LLMs reserved for authorized users
      • LLM router to manage the sub-LLMs
      • User selects sub-LLM, agent, and hyperparameters
      • Each sub-LLM built with its own taxonomy and knowledge graph
  • No neural network, no training
      • Thus, low cost, easy to fine-tune in real-time
      • Self-tuned based on favorite hyperparameters, and customizable
      • No GPU, no latency, exhaustive concise results, local implementation
  • Concise results
      • Multiple sections displayed to user: links, related content, x-embeddings
      • Output with a relevancy score attached to each item in each section; the user is offered choices for deeper or alternate queries
      • Great for search, professional users, and experts. Not just a “prompt box”: many options in the UI, like a mini browser
  • Case studies
      • Corporate corpus with augmented sources (content + taxonomies)
      • Wolfram corpus: 15 sub-LLMs, 500 sub-categories per sub-LLM
      • Publisher, 4000 titles: clustering, predicting article performance

xLLM's mission is to enable enterprises to build their own LLMs that fit their purpose with precision, faster and cheaper, with security, and open to integration with any other LLM.

Prompt Results – Card Format (web API)

Prompt Results – Listing Format (1)

Prompt Results – Listing Format (2)

Prompt Results – Text Format

  • Text entities retrieved from corpus via context chunking / indexation
      • Blended with images, datasets, URLs and so on (multimodal)
      • Knowledge graph elements included: categories, tags, related content, agents
  • Generating multi-language output (prose) with GenAI
      • Coming soon, different from simple text retrieval
      • Turn output into an English summary; accented characters already implemented for other languages
      • Pre-made customizable synthetic answers (template answers)
      • Blending AI with classic ML: integrating external tools with a large list of pre-made, customizable template sentences to display in prompt results (see the sketch after this list)
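
To make the template-answer idea concrete, below is a minimal sketch of filling pre-made, customizable template sentences with entities retrieved from the corpus. The template strings, field names, and retrieved values are hypothetical, not the actual xLLM templates.

# Minimal sketch: pre-made template sentences filled with retrieved entities.
# Templates and field names are illustrative assumptions.

TEMPLATES = {
    "definition": "According to {source}, {term} refers to {summary}.",
    "related":    "Related content for {term}: {links}.",
}

def render_answer(kind, fields):
    """Fill a pre-made template sentence with entities retrieved from the corpus."""
    return TEMPLATES[kind].format(**fields)

print(render_answer("definition", {
    "source": "corpus section 3.2",   # hypothetical retrieved text entity
    "term": "x-embeddings",
    "summary": "variable-length embeddings stored as sparse nested hashes",
}))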

xLLM Integration with other APIs and LLMs

      • Leverage and blend capabilities of multiple LLMs (GPT, Perplexity, Mistral, etc.)
      • Use external GenAI tools or libraries to turn output into nice, fluid English text
      • CodeValet API for code generation
      • Wolfram/Mathematica API to solve math problems

Part 2
xLLM Innovative Features

Backend Features

  • Smart crawling to retrieve embedded structure
      • Breadcrumbs (enterprise corpus), concept associations (related links)
      • Metadata, tags, taxonomy (category graph)
      • Augmented with user prompts
      • Augmented with PDFs (TOC, index, glossaries, synonyms, titles)
  • X-embeddings (see the sketch after this list)
      • Variable-length embeddings stored as sparse nested hashes
      • Multi-token: “data~science” on top of single tokens “data” and “science”
      • Contextual token: “data^science”, both words in same paragraph but not adjacent
      • PMI (pointwise mutual information) instead of dot product / cosine distance
      • Parametric weights attached to tokens (no loss function to optimize)
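
As a minimal sketch, assuming toy tokens and counts, x-embeddings can be pictured as sparse nested hashes keyed by multi-tokens ("data~science") and contextual tokens ("data^science"), scored with a PMI-style association instead of cosine similarity; the generic PMI below stands in for xLLM's customized variant.

# Sparse nested hash: token -> {related token -> co-occurrence count}.
# Token names, counts, and the exact PMI formula are illustrative assumptions.
from collections import defaultdict
from math import log

embeddings = defaultdict(dict)
token_count = defaultdict(int)

def add_pair(t1, t2, n=1):
    """Register a co-occurrence between two (multi-)tokens."""
    token_count[t1] += n
    token_count[t2] += n
    embeddings[t1][t2] = embeddings[t1].get(t2, 0) + n

add_pair("data~science", "machine~learning", 12)   # adjacent multi-tokens
add_pair("data^science", "statistics", 5)          # same paragraph, not adjacent

def pmi(t1, t2):
    """Pointwise mutual information between two tokens (generic form)."""
    total = sum(token_count.values())
    joint = embeddings.get(t1, {}).get(t2, 0)
    if joint == 0:
        return float("-inf")
    return log(joint * total / (token_count[t1] * token_count[t2]))

print(round(pmi("data~science", "machine~learning"), 3))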

Retrieved Taxonomy: Wolfram Example

Retrieved Context: Enterprise Example

Backend Features (Cont.)

  • Home-made libraries
      • Issues with Python libraries (singularize, autocorrect, “Feller” changed to “seller”)
      • Minimize stemming and text transforms; keep plural if found in corpus
      • Important: accented characters, separators (punctuation), capital letters
      • Ad-hoc lists: home-made stopwords, do-not-singularize, do-not-autocorrect
  • Backend tables (specific to each sub-LLM)
      • X-embeddings not the most important table; taxonomy more important
      • Compression mechanism: sorted n-grams (see the sketch after this list)
      • Backend parameters
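
A minimal sketch of the sorted n-gram compression idea, assuming hypothetical helper names and tokens: word-order variants of a multi-token collapse to one sorted key, so a single backend entry covers all of them.

# Sorted n-grams as a compression mechanism (illustrative sketch).

def sorted_ngram(multi_token, sep="~"):
    """Return the canonical (sorted) form of a multi-token."""
    return sep.join(sorted(multi_token.split(sep)))

counts = {}
for raw in ["science~data", "data~science", "data~science"]:
    key = sorted_ngram(raw)                 # all three collapse to "data~science"
    counts[key] = counts.get(key, 0) + 1

print(counts)   # {'data~science': 3}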

Backend Features (Cont.)

  • Chunking & Indexing
      • Chunks called text entities: webpage, subsection (PDF), or JSON entity
      • Indexed for fast retrieval of full content, and for easy content linking (see the sketch after this list)
      • Chunks of variable length
  • NLP
      • Python with workarounds + home-made code
      • Weighted graph tokens: multi-tokens found in the context/taxonomy elements
      • Customized pointwise mutual information (PMI), instead of cosine similarity
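
A minimal sketch of chunking and indexation, assuming simplified entity IDs, fields, and whitespace tokenization: each chunk (text entity) is stored under an ID for fast full-content retrieval, and an inverted index (an illustrative choice, not necessarily the xLLM structure) maps tokens to entity IDs.

# Chunk (text entity) storage plus a token -> entity ID index (sketch).
from collections import defaultdict

text_entities = {}                      # entity ID -> full chunk (variable length)
inverted_index = defaultdict(set)       # token -> set of entity IDs

def index_chunk(entity_id, content, tags=()):
    text_entities[entity_id] = {"content": content, "tags": list(tags)}
    for token in content.lower().split():
        inverted_index[token].add(entity_id)

index_chunk("web_001", "Knowledge graph elements: categories tags related content")
index_chunk("pdf_2.3", "Taxonomy augmentation with external sources", tags=["external"])

# Retrieval: all entities containing a prompt token, then full content by ID.
hits = inverted_index["taxonomy"]
print([text_entities[eid]["content"] for eid in hits])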

Backend Features (Cont.)

  • Augmentation
      • Easy integration of external sources, tested on corporate corpus
      • External content flagged via tags or other context elements
      • User told if piece of output is internal or external
      • Taxonomy augmentation
  • Agents
      • Assigned post-crawling to text entities via clustering, for easy matching with prompt
      • Different from standard implementations (bottom up rather than top down)
  • Content Deduping

Frontend Features

  • User Interface
      • Many options, not just a search box (see previous slide)
      • User can choose agents, sub-LLM, or fine-tuning in real time
      • End-user debugging with catch-all parameter set
  • Relevancy scores (see the scoring sketch after this list)
      • Goal: too many results match the user prompt; which ones should be displayed?
      • Graph tokens and multi-tokens with 2+ words boost the score
      • Text entities sharing 2+ multi-tokens with the prompt get a higher score
      • Rare multi-tokens get an extra boost
      • Longer text entities get an extra boost
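
A toy scoring sketch following the boosting rules above; the weights, the rarity measure, and the length normalization are assumptions, not the actual relevancy algorithm summarized later in the deck.

# Toy relevancy score: multi-token, rarity, and length boosts (assumed weights).
from math import log

def relevancy(entity_tokens, prompt_tokens, corpus_freq, entity_length):
    shared = set(entity_tokens) & set(prompt_tokens)
    score = 0.0
    for token in shared:
        words = token.count("~") + 1
        boost = 2.0 if words >= 2 else 1.0                  # multi-tokens boost score
        rarity = 1.0 / log(1 + corpus_freq.get(token, 1))   # rare tokens boost score
        score += boost * rarity
    if sum(t.count("~") + 1 >= 2 for t in shared) >= 2:     # 2+ shared multi-tokens
        score *= 1.5
    return score * log(1 + entity_length)                   # longer entities boosted

freq = {"data~science": 3, "python": 250}
print(relevancy(["data~science", "python"], ["data~science", "python"], freq, 400))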

Relevancy scores

Frontend Features (Cont.)

  • Distillation (see the sketch after this list)
      • If multi-tokens A~B~C and A~B have the same count, show results from A~B~C, not A~B
  • Acronyms and synonyms
      • If A and B are synonyms, A is in the prompt but not in the corpus, and B is in the corpus, map A to B in the prompt to retrieve B in the corpus (goal: exhaustivity)
  • Self-tuning: most popular front-end parameters used to build default parameters
  • Prompt cleanup with stopwords list different from backend list
  • Disambiguation (coming soon)
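
A minimal sketch of the distillation rule, assuming toy multi-tokens and counts: when a longer multi-token has the same count as one of its sub-multi-tokens, only the longer one is kept.

# Distillation: drop A~B when A~B~C has the same count (illustrative data).
counts = {"machine~learning": 7, "machine~learning~model": 7, "data~science": 12}

def distill(counts):
    keep = dict(counts)
    for small, n_small in counts.items():
        for big, n_big in counts.items():
            # subset test on word sets (ignores adjacency/order; a simplification)
            is_sub = big != small and set(small.split("~")) < set(big.split("~"))
            if is_sub and n_big == n_small:
                keep.pop(small, None)      # A~B absorbed by A~B~C
    return keep

print(distill(counts))   # 'machine~learning' dropped, 'machine~learning~model' kept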

Distillation

Part 3
xLLM Architecture and Evaluation

Backend: Overview

Frontend: Overview

Path from Prompt to Results

Path from Crawl to Backend Tables

Details: Indexation

Detail: Relevancy Algorithm

Detail: Sorted N-Grams

Database: Nested Hashes (like JSON)

Evaluation

  • User-based (automated)
      • Collect favorite hyperparameters chosen by users
      • Use smart grid search to set default hyperparameters based on user favorites
      • Fine-tune on one or a few sub-LLMs (akin to LoRA) before full optimization on (say) 200 sub-LLMs. All sub-LLMs may be fine-tuned in parallel.
  • Taxonomy-based (automated)
      • Pretend that the taxonomy backend table comes from external sources
      • Assign categories to webpages based on this “external” taxonomy
      • For each webpage, compare the externally assigned category to the native category (see the sketch below)
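
A minimal sketch of this taxonomy-based evaluation, assuming toy categories, token sets, and a simple overlap rule for category assignment:

# Assign a category from the taxonomy table (treated as "external"), then compare
# against the native category. Data and matching rule are simplified assumptions.
taxonomy = {"calculus": {"derivative", "integral", "limit"},
            "statistics": {"variance", "estimator", "sample"}}

pages = [  # (page tokens, native category from the crawl)
    ({"integral", "limit", "proof"}, "calculus"),
    ({"sample", "variance", "test"}, "statistics"),
    ({"derivative", "estimator"}, "statistics"),
]

def assign_category(tokens):
    """Pick the taxonomy category with the largest token overlap."""
    return max(taxonomy, key=lambda cat: len(tokens & taxonomy[cat]))

agreement = sum(assign_category(tokens) == native for tokens, native in pages) / len(pages)
print(f"category agreement: {agreement:.0%}")   # proxy for evaluation quality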

  • Evaluation challenges
      • We are dealing with unsupervised learning: there is no perfect output except for trivial cases
      • Quality depends on user (professional users and laymen have different criteria)
      • How do you measure exhaustivity, depth, and recency?
      • Value of output versus grammatical capabilities
      • How do you integrate the xLLM relevancy scores attached to each item when evaluating output quality? No other LLM returns these scores

Taxonomy-Based Evaluation

Part 4
xLLM for Clustering, Data Synthetization, Predictive Analytics

Interlude – Adaptive Loss Function (ALF)

  • Adaptive loss function converging to model evaluation metric
      • Boosts quality as measured by the model evaluation metric, and reduces gradient descent failures

xLLM for Data Synthetization (with ALF)

NoGAN Tabular Data Synthetization

  • Real data: 2 concentric circles
  • Synthetic data (NoGAN synthesizer): blue dots. Synthetization constrained to keep the loss above some threshold
  • As the loss function gets more granular, the synthesized data gets more similar to the real data (the training set)

xLLM for Predictions

  • Case study – media industry
      • Predicting article performance (pageviews) based on title keywords and category
      • 4000 articles; pageview is normalized and time-adjusted
  • Evaluation and Loss function (identical)
      • Based on comparing predicted with observed quantiles, using 5 quantiles (see code; a toy version follows this list)
      • Good proxy for the Kolmogorov-Smirnov distance
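
A toy version of a 5-quantile comparison between observed and predicted (normalized, time-adjusted) pageviews; the quantile levels, the averaging, and the simulated data are assumptions, and the author's actual code is on GitHub (see References).

# Quantile-based loss: mean gap between observed and predicted quantiles.
import numpy as np

def quantile_loss(observed, predicted, levels=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Mean absolute gap between observed and predicted quantiles (5 quantiles)."""
    q_obs = np.quantile(observed, levels)
    q_pred = np.quantile(predicted, levels)
    return float(np.mean(np.abs(q_obs - q_pred)))   # small gap ~ small KS distance

rng = np.random.default_rng(0)
observed = rng.lognormal(mean=0.0, sigma=1.0, size=4000)   # pageview-like distribution
predicted = observed * rng.normal(1.0, 0.1, size=4000)     # hypothetical model output
print(round(quantile_loss(observed, predicted), 4))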

xLLM for Predictions – Model

xLLM for Predictions – Category Encoding

  • Create new codes sequentially as you browse the training set.
  • Aggregate codes with few observations into bundles.
  • Create two key-value mappings. Ex:
      • Category_to_Code['Blog', 'William'] = 5
      • Code_to_Category[5] = ['Blog', 'William']
  • Replace the categorical features by the newly created feature, “Code”.
  • Number of codes ≤ number of observations (see the sketch below).
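
A minimal sketch of this sequential encoding, assuming hypothetical column values; rare-code bundling is omitted here.

# Sequential category encoding with the two key-value mappings described above.
category_to_code, code_to_category = {}, {}

def encode(category_tuple):
    """Return the code for a (content type, author) pair, creating it if new."""
    if category_tuple not in category_to_code:
        code = len(category_to_code)              # sequential code creation, from 0
        category_to_code[category_tuple] = code
        code_to_category[code] = category_tuple
    return category_to_code[category_tuple]

training_rows = [("Blog", "William"), ("Article", "Jane"), ("Blog", "William")]
codes = [encode(row) for row in training_rows]    # replaces the categorical features
print(codes, code_to_category)                    # number of codes <= number of rows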

xLLM for Predictions – Results

  • Observed vs predicted normalized pageview count

xLLM for Clustering

  • Case study – media industry
      • Identifying patterns / clusters in popular articles based on title keywords
      • 4000 articles; pageview is normalized and time-adjusted
  • Methodology
      • Group multi-tokens into clusters based on a similarity metric, with hierarchical clustering and k-medoids
      • Let S(t) be the set of articles containing the multi-token t in the title
      • For each multi-token group G, the list L(G) of articles belonging to G is the union of the sets S(t) over all multi-tokens t in G

xLLM for Clustering (Cont.)

  • Similarity between two multi-tokens t1, t2

  • Remarks
      • Multi-token clusters are non-overlapping, but article clusters may overlap
      • Sklearn clustering methods require a distance matrix as input; the matrix (derived from the similarity metric) is huge but extremely sparse.
      • In my implementation, s(t1, t2) is computed and stored only if it is strictly positive. Using connected components for clustering, it is far more efficient than Sklearn (see the sketch below).
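
A minimal sketch of connected-components clustering over the sparse similarity structure, assuming toy article sets and a Jaccard-style similarity standing in for the deck's (unstated) s(t1, t2):

# Store only strictly positive similarities, then cluster multi-tokens by connected
# components; article clusters are the unions L(G). Data and similarity are assumptions.
from collections import defaultdict

S = {  # multi-token -> set of article IDs whose title contains it (toy data)
    "machine~learning": {1, 2, 3}, "deep~learning": {2, 3}, "data~viz": {7, 8}, "chart": {8},
}

edges = defaultdict(set)
tokens = list(S)
for i, t1 in enumerate(tokens):
    for t2 in tokens[i + 1:]:
        sim = len(S[t1] & S[t2]) / len(S[t1] | S[t2])   # Jaccard-style stand-in
        if sim > 0:                                     # keep only positive entries
            edges[t1].add(t2)
            edges[t2].add(t1)

def clusters():
    """Connected components = non-overlapping multi-token clusters."""
    seen, out = set(), []
    for t in tokens:
        if t in seen:
            continue
        stack, comp = [t], set()
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                comp.add(u)
                stack.extend(edges[u] - seen)
        out.append(comp)
    return out

for group in clusters():                # article cluster L(G) = union of S(t), t in G
    print(group, set().union(*(S[t] for t in group)))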

xLLM for Clustering – Sample Structure

xLLM for Clustering – Sample Cluster

  • Cluster of popular articles linked to multi-token cluster with 3 elements, including one contextual multi-token: “Machine^vs” (pv stands for normalized pageview)

Interlude – Fast Nearest Neighbor Search

  • Red dots: prompt-derived embeddings
  • Blue dots: backend table embeddings
  • Over time, arrows link red dots to their nearest blue dots
  • Alternative to vector search

xLLM for Next Token Prediction

  • Next token prediction: the mother of all LLMs
  • Here: predict next DNA sub-sequence to generate synthetic genomic data
  • Alphabet has 4 letters
  • Left: Scatterplot comparing observed vs synthetic ECDFs

Part 5
References

References

  • New book: “Building Disruptive AI & LLM Technology from Scratch”
  • First book: “State of the Art in GenAI & LLMs – Creative Projects, with Solutions”
      • Project 2.4 – Adaptive loss function
      • Project 7.2 – Main part, includes smart crawling and x-embeddings
      • Project 8.1 – Fast approximate nearest neighbor search
      • Project 8.2 – Evaluation using taxonomy
      • Project 8.3 – xLLM for clustering and predictions
  • GitHub: code, data: https://github.com/VincentGranville/Large-Language-Models
  • AI Research and book access: https://mltechniques.com/resources/

Thank you!
