Ruby + AI for building multi-agent architecture
Sergey Sergyenko
24 Oct 2024, EDI
Sergey Sergyenko
@sergyenko
Ruby Software Development Agency
Addicted to Ruby since v2.1.9
https://rubyness.co.uk
sergy@rubyness.co.uk
AI + Ruby and Rails
Ruby + AI for multi-agent architecture
When you start an AI project,
in 99% of cases you go with a platform
Microsoft Azure AI, Google Cloud AI, OpenAI API, Hugging Face, AWS AI/ML
Is ChatGPT a language model?
In most cases, one model is not enough
Multi-Agent Architecture
over
“One-Size-Fits-All” LLM
Typical MA Ecosystem
User
Input
Initial Analysis and Routing
LLM Function Calling
Agent Skills Calling
RAG
Internal / External APIs
Evals
Debug / Graph
Agent Step
Analyze
Evals
Continuous Analysis and re-Routing
Router and
Planner function
Router
Planner
Agent Step
Skill 1
Skill N-1
User
Input
Skill 2
Messages
Memory
Critique Step
Router
Planner
Agent Step
Skill N
User
Input
Critique Step
Agent Process (Router - Skill)
Saved State
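The Router → Agent Step → Critique loop above can be sketched in plain Ruby. All class and skill names here are illustrative stubs, not a real gem's API; a real router would call an LLM instead of matching keywords:

```ruby
# Minimal sketch of the Router -> Agent Step loop with a Messages/Memory store.
class Router
  def initialize(skills)
    @skills = skills
  end

  # Pick a skill for a message; a real router would ask an LLM here.
  def route(message)
    @skills.find { |skill| skill.handles?(message) }
  end
end

Skill = Struct.new(:name, :matcher, :handler) do
  def handles?(message)
    message.match?(matcher)
  end

  def call(message)
    handler.call(message)
  end
end

skills = [
  Skill.new(:search,  /find|search|kindle/i, ->(m) { "searching for: #{m}" }),
  Skill.new(:support, /help|broken/i,        ->(m) { "escalating: #{m}" })
]

memory = []                         # the Messages / Memory box
router = Router.new(skills)

input  = "Please find a Kindle e-reader"
skill  = router.route(input)        # Agent Step: routing
result = skill.call(input)          # Agent Step: skill execution
memory << { input: input, skill: skill.name, result: result }
```

A Critique Step would inspect `result` and either accept it or feed it back through the router for re-routing, as the "Continuous Analysis and re-Routing" arrow suggests.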
Router Example
Customer Input
I need a new Kindle e-reader for my reading hobby. Are there any discounts currently?
Classification: OpenAI Call
Item Search
Q/A
LLM
Recommend Item
LLM
Query Response
Purchase
Query
Classification::Purchase
Classification::Query
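The classification step above can be sketched as follows. The "Classification: OpenAI Call" box is stubbed with a keyword heuristic so the flow is visible without an API key; in the talk's example it would be a real OpenAI request:

```ruby
# Route a customer message to the Purchase or Query branch.
module Classification
  Purchase = :purchase
  Query    = :query
end

def classify(input)
  # Stand-in for the OpenAI classification call.
  if input.match?(/\b(buy|need|purchase|order)\b/i)
    Classification::Purchase
  else
    Classification::Query
  end
end

def route_intent(input)
  case classify(input)
  when Classification::Purchase then "Item Search -> LLM -> Recommend Item"
  when Classification::Query    then "Q/A -> LLM -> Query Response"
  end
end

route_intent("I need a new Kindle e-reader. Are there any discounts currently?")
```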
Code
LLM
User Intent
Router
Product Comparison
Product Search
Customer Support
Track Package
Promos and Deals
Chatbot Query: Product, Price, Quality
Unstructured to Structured
Extract Query
Call Order Details API
Call Promo Database for latest promos
Unstructured to Structured
Search API
RAG on Customer Support Docs
Tracking UI
Promo UI
Search API
Rank Products
Chatbot Query: Classification, FAQ
Chat with Live Support Agent
Comparison UI
Summarize Product Description
Add to Wishlist
Checkout
Skills
LLM Call
Application
API Call
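The intent → skill chains in the diagram can be written down as a plain dispatch table. Each step is tagged with its type from the legend (`:llm` for an LLM call, `:api` for an API call, `:app` for the application layer); the chains below cover three of the branches shown:

```ruby
# Skill chains per user intent, following the diagram's legend.
SKILL_CHAINS = {
  product_comparison: [[:llm, "Unstructured to Structured"],
                       [:api, "Search API"],
                       [:app, "Comparison UI"]],
  track_package:      [[:llm, "Extract Query"],
                       [:api, "Call Order Details API"],
                       [:app, "Tracking UI"]],
  promos_and_deals:   [[:api, "Call Promo Database for latest promos"],
                       [:app, "Promo UI"]]
}.freeze

def plan(intent)
  SKILL_CHAINS.fetch(intent).map { |type, step| "#{type}: #{step}" }
end
```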
Router Example
Chatbot Query: Classification, FAQ
Return to Menu
User Intent
Router
Product Comparison
Chatbot Query: Product, Price, Quality
Unstructured to Structured
Search API
Comparison UI
Skills
LLM Call
Application
API Call
Router Example
PROMPT
PROMPT
Function Call
Customer Support
Extract Query
RAG on Customer Support Docs
Chatbot Query: Classification, FAQ
Return to Menu
Function Call
PROMPT
PROMPT
User Intent
Router
Product Comparison
Chatbot Query: Product, Price, Quality
Unstructured to Structured
Search API
Comparison UI
Skills
LLM Call
Application
API Call
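The "Function Call" boxes in this build map to LLM tool definitions. As a sketch, the Customer Support branch's `Extract Query` step could be declared as a Ruby hash in the OpenAI function-calling schema (the function name and parameters are illustrative):

```ruby
# Hypothetical tool definition for the "Extract Query" function call.
EXTRACT_QUERY_TOOL = {
  type: "function",
  function: {
    name: "extract_query",
    description: "Turn a free-form support message into a structured FAQ query",
    parameters: {
      type: "object",
      properties: {
        topic:    { type: "string", description: "FAQ topic, e.g. returns, shipping" },
        question: { type: "string", description: "The customer's actual question" }
      },
      required: ["topic", "question"]
    }
  }
}.freeze
```

The router sends this definition alongside the prompt; when the model responds with a tool call, its structured arguments feed the "RAG on Customer Support Docs" step.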
Conversational Model
Classifier Model
Instructional Model
Extract Query
RAG on Customer Support Docs
Chatbot Query: Classification, FAQ
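One model rarely fits every step, which is the point of the slide: each stage gets the model class it needs. As a sketch, the mapping could be held as plain configuration (the step-to-model pairing follows the slide; any concrete model names would be a deployment choice):

```ruby
# Which class of model serves which pipeline step.
MODEL_PER_STEP = {
  "Chatbot Query: Classification, FAQ" => :classifier_model,     # cheap, fast intent routing
  "Extract Query"                      => :instructional_model,  # structured output
  "RAG on Customer Support Docs"       => :conversational_model  # final user-facing answer
}.freeze
```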
Own Your Own AI
Story behind llama.cpp
Georgi Gerganov
Sofia, Bulgaria
$ git clone https://github.com/ggerganov/llama.cpp.git
$ python3 -m pip install -r requirements.txt
$ cd models
llama.cpp
brew install llama.cpp
🤗
$ git lfs install
$ git clone git@hf.co:openlm-research/open_llama_7b
Clone Model Repository (🤗 huggingface.co)
Convert the Model to GGUF format
$ python3 convert_hf_to_gguf.py models/open_llama_7b
$ make
This conversion enables the model to be loaded and executed with improved performance on CPUs.
Quantization reduces the precision of model weights (e.g., from 32-bit to 16-bit or even 1-bit) to save memory and speed up inference. For example:
- **32-bit weight**: `0.123456789`
- **16-bit weight**: `0.1234`
- **1-bit weight**: `0` or `1`
This allows AI models to run efficiently on devices like smartphones or IoT hardware with minimal accuracy loss.
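The trade-off above can be illustrated with a toy round-trip in Ruby: map a 32-bit float onto 16 levels (4 bits) over a fixed range, then reconstruct it. This is a simplification of what `llama-quantize` does (real schemes like `q4_0` quantize in blocks with per-block scales), but it shows where the accuracy loss comes from:

```ruby
# Quantize a weight to 4 bits (16 levels) over [-1, 1] and reconstruct it.
def quantize_4bit(w, min = -1.0, max = 1.0)
  levels = 15.0                                   # 2**4 - 1 steps
  code   = ((w - min) / (max - min) * levels).round
  approx = min + code * (max - min) / levels
  [code, approx]                                  # [stored 4-bit code, reconstructed weight]
end

code, approx = quantize_4bit(0.123456789)
error = (0.123456789 - approx).abs
# the weight survives as one of only 16 values; `error` is the accuracy cost
```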
Quantize the Model
$ ./llama-quantize \
./models/open_llama_7b/Open_Llama_7B-6.7B-F16.gguf \
./models/open_llama_7b/Open_Llama_7B-6.7B-Q4_0.gguf \
q4_0
Demo
require 'llama_cpp'
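The demo loads the model through the llama_cpp gem. A gem-free alternative is to drive the same quantized model through llama.cpp's `llama-cli` binary; the sketch below only builds the command line (the model path is illustrative), so it runs even without the binary installed:

```ruby
# Build a llama-cli invocation for the quantized GGUF model.
def llama_command(model:, prompt:, n_predict: 128)
  ["llama-cli",
   "-m", model,           # quantized GGUF produced by llama-quantize
   "-p", prompt,          # the prompt text
   "-n", n_predict.to_s]  # number of tokens to generate
end

cmd = llama_command(model:  "models/open_llama_7b/model-q4_0.gguf",
                    prompt: "Hello from Ruby")
# run with: system(*cmd)
```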
rspec-llama
gem install rspec-llama
https://github.com/aifoundry-org/rspec-llama
LLamagator
docker compose up
https://github.com/aifoundry-org/llamagator
LLamagator
LLM-as-a-Judge made easy
https://github.com/aifoundry-org/llamagator
Thank you!
@sergyenko