1 of 63

Lead Smarter: Stay Ahead of AI Risks

Vincent Granville, PhD
Chief AI Architect
BondingAI.io
vincent@BondingAI.io

August 27, 2025

Large Language Models: security and related issues. What are they? How to address them?

MLtechniques.com – xLLM, by Vincent Granville

1

2 of 63

Agenda


  1. Risks specific to Enterprise LLMs
  2. A better UI for LLMs
  3. Elements of our architecture
  4. Our team

3 of 63

Risks Specific to Enterprise LLMs (1/3)

  • Calls to external APIs (e.g., OpenAI): risk of data leakage
  • Reliance on black-box systems (transformers, DNNs)
    • Hard to train, requires billions of tokens, GPU farms, lots of electricity
    • Hard to distill
    • Yet training to predict the next token is outdated
    • Charging by token incentivizes vendors to use more tokens
    • Analogy: pay-per-click monetization in 2010
  • Algorithmic bias
  • Prompt injection
  • Costly mistakes or hallucinations
  • No relevancy scores displayed to user
  • Not scoring input sources


4 of 63

Risks Specific to Enterprise LLMs (2/3)

  • Not all users should have access to the full corpus
    • Solution: sub-LLMs and even chunks with restricted access
  • Failure to provide precise references to source
    • Heavily re-worded responses magnify the problem and cause hallucinations
  • Most secure solutions are on-premises with full control by client
  • Faulty evaluation metrics
    • LLM as a judge: circular loop
    • Do not capture exhaustivity and other qualities
    • Use of synthetic prompts
    • Evaluation is user-dependent


5 of 63

Risks Specific to Enterprise LLMs (3/3)

  • Issues with Python libraries
    • Autocorrect, stopwords, stemmers (global vs. local)
    • Inability to sample outside the observation range
  • Poor QA
    • No action taken after consistently poor user ratings of responses
  • Failure to connect parts of the corpus to your LLM
    • Execution without consulting with the right people
  • Hard to debug and fine-tune
    • Much easier with our architecture


6 of 63

A Better UI for LLMs (1/2)

  • User can choose:
    • sub-LLMs,
    • Categories,
    • Tags
  • Search options:
    • Exact or broad match
    • Negative keywords
    • Search by recency
  • Offer real-time fine-tuning and re-training, with intuitive parameters for
    • Distillation, stemmer / un-stemmer, embeddings, relevancy scores, …


7 of 63

A Better UI for LLMs (2/2)

  • Lightning-fast testing, training, debugging, and in-memory LLMs, all from the UI
    • Nested hashes (JSON-like)
    • Variable-length embeddings instead of vector DB and dot products
    • In addition to standard response, offer structured output (summary boxes + alternate queries) to reduce hallucinations and prompt engineering, with relevancy scores and precise references to corpus
    • Chunks pre- and post-tagging
    • Hierarchical chunking with option to browse corpus
  • Minimize re-wording in final response (important e.g. for legal documents)
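The nested-hash idea above can be sketched as follows; the tokens and weights are made up for illustration, and this is not the xLLM implementation:

```python
# Illustrative sketch, not the xLLM implementation: variable-length
# embeddings stored as sparse nested hashes instead of fixed-size vectors.
embeddings = {
    "data~science": {"machine~learning": 0.82, "statistics": 0.64, "python": 0.31},
    "statistics": {"data~science": 0.64, "probability": 0.77},
}

def related(token, top_k=2):
    # Rank neighbors by stored weight; no dot product or vector DB needed.
    neighbors = embeddings.get(token, {})
    return sorted(neighbors.items(), key=lambda kv: -kv[1])[:top_k]

print(related("data~science"))  # → [('machine~learning', 0.82), ('statistics', 0.64)]
```

Because each token stores only the neighbors it actually co-occurs with, the structure stays sparse and lookups are O(1) per token.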


8 of 63

Elements of our architecture (1/4)


9 of 63

Elements of our architecture (2/4)


10 of 63

Elements of our architecture (3/4)


11 of 63

Elements of our architecture (4/4)


12 of 63

Elements of our architecture


13 of 63

Elements of our architecture


14 of 63

Our Team

DANILO NATO

CEO & CO-FOUNDER

+17 Years of Experience

AB InBev, Global AI Director

BASF, LatAm

Degrees in computer science and business; master's degrees in statistics and psychology

VINCENT GRANVILLE

CHIEF AI ARCH & CO-FOUNDER

Successful exit, Data Science Central sold to Tech Target (2020)

Microsoft, Wells Fargo, Ebay, Visa, NBC

Created xLLM - LLMs 2.0

PhD in image remote sensing, postdoc at University of Cambridge

PETER VOGT

VP OF SALES

Experienced sales executive across industries

Sales Domain expertise in Data, AI, Cloud, Cybersecurity

EDUARDO SOARES

CTO

+15 Years in Software Engineering, Data, AI

Worked at major companies in LatAm

Founder CodeNato

Degrees in computer science

ANI DESWANDIKAR

PRODUCT LEAD

+30 Years of Experience in Software, Data and AI

+10 years at Microsoft, Principal Architect

Netflix, Sr Software Engineer

SADIAH ZAHOOR

AI LEAD

PhD, University of Cambridge

Experienced researcher: University of Cambridge, TATA Institute, Ministry of Defense (India)

FERNANDO GONCALVES

ENGINEER LEAD

+20 Years of Experience in Data, ML

Boticario, Data Engineer lead

GAVB, Data Scientist lead

15 of 63

Part 1: Why xLLM? What is it?


16 of 63

Extreme LLM (xLLM) in a Nutshell

  • Mixture of experts
      • Specialized sub-LLMs, and/or sub-LLMs restricted to authorized users
      • LLM router to manage the sub-LLMs
      • User selects sub-LLM, agents, and hyperparameters
      • Each sub-LLM built with its own taxonomy and contextual environment

  • No neural network, no training
      • Thus, low cost, easy to fine-tune in real-time, in-memory LLM, on-premises
      • Self-tuned based on favorite hyperparameters, intuitive parameters
      • No GPU, no latency, exhaustive concise results, local implementation
  • Concise results
      • Multiple sections: links, related content, x-embeddings based on E-PMI metric
      • Output with relevancy score attached to each item in each section; User offered choices for deeper or alternate queries
      • Great for professional users. Not just a “prompt box”; many options in the UI, like a mini-browser
  • Case studies
      • Corporate data lake corpus
      • Nvidia PDF repository
      • Wolfram corpus: 15k webpages, 5k categories
      • Publisher, 4000 titles: clustering, predicting article performance


xLLM lets enterprises build their own LLMs faster, at lower cost, with increased accuracy, security, and explainable AI


17 of 63

Prompt Results – Card Format (web API)


18 of 63

Prompt Results – Card Format (web API)


19 of 63

Prompt Results – Listing Format (1)


20 of 63

Prompt Results – Listing Format (2)


21 of 63

Prompt Results – Structured Text Format

  • Text entities retrieved from corpus via contextual chunking / indexation
      • Blended with images, datasets, exact URLs/references and so on (multimodal)
      • Featuring categories, tags, related content, titles, timestamps, links, and so on
      • Structured output based on hierarchical chunking and multi-indexing, augmented with auto-tagging, acronyms / synonyms dictionary, and un-stemming
      • Multiple relevancy scores based on multiple types of multi-tokens, then normalized
      • Exact vs broad search, negative keywords, variable weights attached to prompt multi-tokens, search by recency.
      • Auto-correct, stemmer, stopwords specific to corpus; “real estate” or “San Francisco” are single tokens. Chunks visibility (for any given chunk) depends on user privileges.
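As a toy illustration of corpus-specific multi-tokens (the phrase dictionary below is hypothetical, not xLLM's), a tokenizer can merge known phrases such as "real estate" into single tokens:

```python
# Hypothetical phrase dictionary: pairs treated as single multi-tokens.
MULTI_TOKENS = {("real", "estate"), ("san", "francisco")}

def tokenize(text):
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        # Greedily merge a known pair into one multi-token.
        if i + 1 < len(words) and (words[i], words[i + 1]) in MULTI_TOKENS:
            tokens.append(words[i] + "~" + words[i + 1])
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens

print(tokenize("Homes in San Francisco real estate market"))
# → ['homes', 'in', 'san~francisco', 'real~estate', 'market']
```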


22 of 63

Prompt Results – Response Generation

  • Based on structured output (see previous slides)
    • Generic template prompts with premade auto-filled response matched against user prompts
    • Light, fast proprietary DNN; no TensorFlow, Python, or Keras; explainable AI

  • Proprietary non-black-box DNN with explainable AI (patent-pending)
      • With equalizer, stabilizer and other proprietary features to accelerate convergence and increase stability
      • Chaotic gradient descent with temperature decay
      • Sub-epochs, sub-layers, original universal function
      • Global optimization or one sub-layer at a time within an epoch
      • Pre-tabulated functions, generic partial derivatives


23 of 63

DNN Weight Ghosting and Watermarking (1)


24 of 63

DNN Weight Ghosting and Watermarking (2)


25 of 63

Part 2: xLLM Innovative Features


26 of 63

Backend Features

  • Smart crawling to retrieve embedded structure
      • Breadcrumbs (enterprise corpus), concept associations (related links)
      • Metadata, tags, taxonomy, long contextual environment
      • PDF parser (TOC, index, glossaries, synonyms, titles, tables, images)
  • X-embeddings
      • Variable-length embeddings stored as sparse nested hashes
      • Multi-token: “data~science” on top of single tokens “data” and “science”
      • Contextual token: “data^science”, both words in same paragraph but not adjacent
      • PMI (pointwise mutual information) instead of dot product / cosine distance
      • Parametric weights attached to tokens (no loss function to optimize)
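A minimal sketch of plain PMI scoring between two tokens; the chunk counts are invented, and xLLM's E-PMI variant may differ from this textbook formula:

```python
# Sketch of pointwise mutual information replacing dot product / cosine
# distance. Counts below are made-up illustrations, not real corpus data.
from math import log

N = 1000                                    # total number of chunks (assumption)
count = {"data": 120, "science": 80}        # chunks containing each token
joint = {("data", "science"): 40}           # chunks containing both tokens

def pmi(t1, t2):
    # PMI = log P(t1, t2) / (P(t1) P(t2)); positive means tokens co-occur
    # more often than independence would predict.
    p1, p2 = count[t1] / N, count[t2] / N
    p12 = joint.get((t1, t2), 0) / N
    return log(p12 / (p1 * p2)) if p12 > 0 else float("-inf")

print(round(pmi("data", "science"), 3))  # → 1.427
```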


27 of 63

Retrieved Taxonomy: Wolfram Example


28 of 63

Retrieved Context: Enterprise Example


29 of 63

Backend Features (Cont.)

  • Home-made libraries
      • Issues with Python libraries (singularize, autocorrect, “Feller” changed to “seller”)
      • Minimize stemming and text transforms; keep plural if found in corpus
      • Important: accented characters, separators (punctuation), capital letters
      • Ad-hoc lists: home-made stopwords, do-not-singularize, do-not-autocorrect
  • Backend tables (specific to each sub-LLM)
      • X-embeddings are not the most important table; the taxonomy matters more
      • Compression mechanism: sorted n-grams
      • Backend parameters


30 of 63

Backend Features (Cont.)

  • Chunking & Indexing
      • Chunks called text entities: webpage, subsection (PDF), or JSON entity
      • Indexed for fast retrieval of full content, and for easy content linking
      • Chunks of variable length, hierarchical chunking and multi-index, content de-duping
      • Auto-tagging. Use relative font size and other elements to generate contextual fields.
  • NLP
      • Python with workarounds + homemade
      • Weighted graph tokens: multi-tokens found in the context/taxonomy elements
      • Customized pointwise mutual information (PMI), instead of cosine similarity


31 of 63

Frontend Features

  • User Interface
      • Many options, not just a search box (see previous slide)
      • User can choose agents, sub-LLM, or fine-tuning in real time
      • End-user debugging with catch-all parameter set
  • Relevancy multi-scores to rank response chunks
      • Problem: too many results match the user prompt; which ones should be displayed?
      • Graph tokens and multi-tokens with 2+ words: boost score
      • Text entity with 2+ multi-token intersection with prompt, get higher score
      • Rare multi-tokens get extra boost
      • Longer text entities get extra boost
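The boosts listed above might combine roughly as follows; the weights and the formula itself are illustrative assumptions, not xLLM's actual scoring:

```python
# Hedged sketch of a relevancy multi-score. All weights (2.0, 1.5, the log
# forms) are assumptions for illustration only.
from math import log

def relevancy(chunk_tokens, prompt_tokens, token_count, corpus_size, chunk_len):
    matched = chunk_tokens & prompt_tokens
    score = 0.0
    for t in matched:
        n_words = t.count("~") + 1
        boost = 2.0 if n_words >= 2 else 1.0               # multi-token boost
        rarity = log(corpus_size / token_count.get(t, 1))  # rare tokens score higher
        score += boost * rarity
    if len(matched) >= 2:
        score *= 1.5                     # 2+ multi-token intersection with prompt
    score *= log(1 + chunk_len)          # longer text entities get extra boost
    return score
```

For example, a chunk matching the rare multi-token "data~science" outranks one matching the single token "data" with the same frequency.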


32 of 63

Relevancy scores


33 of 63

Frontend Features (Cont.)

  • Distillation
      • If multi-tokens A~B~C and A~B have same count, show results from A~B~C, not A~B
  • Acronyms and synonyms
      • If A and B are synonyms, A in prompt but not in corpus, and B in corpus, map A to B in the prompt to retrieve B in the corpus (Goal: trying to be exhaustive)
  • Self-tuning: the most popular front-end parameters are used to build default parameters
  • Prompt cleanup with stopwords list / stemmer different from backend list
  • Stemming and un-stemming
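The distillation rule above can be sketched as follows (token counts are invented): when a longer multi-token has the same count as one of its prefixes, the prefix adds no information and is dropped.

```python
# Sketch of distillation: if A~B~C and A~B have the same count, keep only A~B~C.
counts = {"machine~learning": 50, "machine~learning~model": 50, "data~science": 80}

def distill(counts):
    kept = dict(counts)
    for t, c in counts.items():
        for other, c2 in counts.items():
            # 'other' extends t (same prefix plus more words) with equal count:
            # t is subsumed, so drop it.
            if other != t and other.startswith(t + "~") and c2 == c:
                kept.pop(t, None)
    return kept

print(distill(counts))
# → {'machine~learning~model': 50, 'data~science': 80}
```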


34 of 63

Distillation


35 of 63

Proprietary JSON-based PDF Parser


36 of 63

Part 3: xLLM Architecture and Evaluation


37 of 63

General Overview


38 of 63

Backend: Overview


39 of 63

Frontend: Overview


40 of 63

Path from Prompt to Results


41 of 63

Path from Crawl to Backend Tables


42 of 63

Details: Indexation


43 of 63

Detail: Relevancy Algorithm


44 of 63

Detail: Sorted N-Grams


45 of 63

Database: Nested Hashes (like JSON)


46 of 63

Evaluation

  • User-based (automated)
      • Collect favorite hyperparameters chosen by users
      • Use smart grid search to set default hyperparameters based on user favorites
      • Fine-tune on one or a few sub-LLMs (like LoRA) before full optimization on (say) 200 sub-LLMs. You may fine-tune all sub-LLMs in parallel.
  • Taxonomy-based (automated)
      • Pretend that the taxonomy backend table comes from external sources
      • Assign categories to webpages based on this “external” taxonomy
      • For each webpage, compare externally assigned to native category
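The taxonomy-based comparison loop might look like this (page names and categories are made up for illustration):

```python
# Sketch of taxonomy-based evaluation: treat the taxonomy backend table as
# "external", assign categories with it, then compare to native categories.
native = {"page1": "Calculus", "page2": "Algebra", "page3": "Calculus"}
assigned = {"page1": "Calculus", "page2": "Algebra", "page3": "Statistics"}

def agreement(native, assigned):
    # Fraction of webpages where the externally assigned category matches
    # the native one.
    hits = sum(assigned[p] == c for p, c in native.items())
    return hits / len(native)

print(round(agreement(native, assigned), 2))  # → 0.67
```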


47 of 63

Evaluation (Cont.)

  • Evaluation challenges
      • We are dealing with unsupervised learning: there is no perfect output except for trivial cases
      • Quality depends on user (professional users and laymen have different criteria)
      • How do you measure exhaustivity, depth, and recency?
      • Output value versus grammatical capabilities
      • How do you integrate the xLLM relevancy scores attached to each item, to evaluate output quality? No other LLM returns these scores


48 of 63

Taxonomy-Based Evaluation


49 of 63

Part 4: xLLM for Clustering, Data Synthetization, Predictive Analytics


50 of 63

Interlude – Adaptive Loss Function (ALF)

  • Adaptive loss function converging to model evaluation metric
      • Boosts quality measured using model evaluation, reduces gradient descent failures


51 of 63

xLLM for Data Synthetization (with ALF)


NoGAN Tabular Data Synthetization

  • Real data: 2 concentric circles
  • Synthesized data (NoGAN synthesizer): blue dots. Synthetization is constrained to keep the loss above some threshold
  • As the loss function gets more granular, the synthesized data gets more similar to the real data (the training set)

52 of 63

xLLM for Predictions

  • Case study – media industry
      • Predicting article performance (pageviews) based on title keywords and category
      • 4000 articles; pageview is normalized and time-adjusted
  • Evaluation and Loss function (identical)
      • Based on comparing predicted with observed quantiles, using 5 quantiles (see code)
      • Good proxy to Kolmogorov-Smirnov distance
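A 5-quantile comparison loss of this kind can be sketched as follows (the details are assumptions, not the exact code referenced above):

```python
# Hedged sketch: compare predicted vs observed distributions on a few
# quantiles; the sup of absolute differences is a crude proxy to the
# Kolmogorov-Smirnov distance.
import numpy as np

def quantile_loss(observed, predicted, n_quantiles=5):
    qs = np.linspace(0, 1, n_quantiles + 1)[1:-1]   # interior quantile levels
    q_obs = np.quantile(observed, qs)
    q_pred = np.quantile(predicted, qs)
    return float(np.max(np.abs(q_obs - q_pred)))    # KS-like sup distance

rng = np.random.default_rng(0)
obs = rng.normal(size=1000)
print(quantile_loss(obs, obs))  # → 0.0 when the distributions match
```

Using the same function as both loss and evaluation metric avoids the mismatch that the ALF interlude above addresses.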


53 of 63

xLLM for Predictions – Model


54 of 63

xLLM for Predictions – Category Encoding

  • Create new codes sequentially as you browse the training set.
  • Aggregate codes with few observations into bundles.
  • Create two key-value mappings. Ex:
      • Category_to_Code[‘Blog’, ‘William’] = 5
      • Code_to_category[5] = [‘Blog’, ‘William’]
  • Replace the categorical features by the newly created feature, “Code”.
  • Number of codes ≤ number of obs.


55 of 63

xLLM for Predictions – Results

  • Observed vs predicted normalized pageview count


56 of 63

xLLM for Clustering

  • Case study – media industry
      • Identifying patterns / clusters in popular articles based on title keywords
      • 4000 articles; pageview is normalized and time-adjusted
  • Methodology
      • Group multi-tokens into clusters based on a similarity metric, with hierarchical clustering and k-medoids
      • Let S(t) be the set of articles containing the multi-token t in the title
      • For each multi-token group G, the list L(G) of articles belonging to G is the union of the sets S(t) over all multi-tokens t in G


57 of 63

xLLM for Clustering (Cont.)

  • Similarity between two multi-tokens t1, t2

  • Remarks
      • Multi-token clusters are non-overlapping, but article clusters may overlap
      • Sklearn clustering methods require a distance matrix as input; the matrix (derived from the similarity metric) is huge but extremely sparse.
      • In my implementation, s(t1, t2) is computed and stored only if it is strictly positive. Using connected components for clustering, it is far more efficient than Sklearn.
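Clustering via connected components on the sparse similarity graph can be sketched as follows (tokens and edges are invented; an edge exists only where the similarity is strictly positive):

```python
# Sketch: multi-token clusters as connected components of the graph whose
# edges are the stored, strictly positive similarities — no dense distance
# matrix needed, unlike Sklearn's clustering methods.
from collections import defaultdict

def connected_components(edges, nodes):
    graph = defaultdict(set)
    for a, b in edges:                    # edge iff s(a, b) > 0
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for n in nodes:                       # iterative DFS from each unseen node
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            seen.add(x)
            stack.extend(graph[x] - comp)
        clusters.append(comp)
    return clusters

nodes = ["ai~ethics", "machine~learning", "deep~learning", "real~estate"]
edges = [("machine~learning", "deep~learning")]
print(connected_components(edges, nodes))
```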


58 of 63

xLLM for Clustering – Sample Structure


59 of 63

xLLM for Clustering – Sample Cluster

  • Cluster of popular articles linked to multi-token cluster with 3 elements, including one contextual multi-token: “Machine^vs” (pv stands for normalized pageview)


60 of 63

Interlude – Fast Nearest Neighbor Search

  • Red dot: prompt-derived embeddings
  • Blue dot: backend table embedding
  • Over time, arrows link red dots to their nearest blue dots
  • Alternative to vector search


61 of 63

xLLM for Next Token Prediction


  • Next token prediction: the mother of all LLMs
  • Here: predict next DNA sub-sequence to generate synthetic genomic data
  • Alphabet has 4 letters
  • Left: Scatterplot comparing observed vs synthetic ECDFs

62 of 63

Part 5: References


63 of 63

References

  • 10 high-level articles describing xLLM:

https://mltblog.com/3GPc8Ss

  • Technical papers and books:

https://mltblog.com/3zsnQ2g

  • Our proprietary DNN technology:

https://mltblog.com/3SA3OJ1
