Unleashing the Power of Gemini
Google's Next-Gen LLM for Every Developer
AI Camp - 18 July 2024
Image by Gemini Advanced
Image from Denis Valášek w/ DALL·E 3
Who are you?
Software developer / Consultant
New York
Google Developer Expert
Google Cloud Champion Innovator
LangChain.js contributor
Co-host "Two Voice Devs"
http://spiders.com/
http://prisoner.com/
LinkedIn: Allen Firstenberg
Google and AI
Responsible AI
3,000
Researchers
7,000
Publications
Built & Tested for Safety
Privacy in design
Upholds high scientific standards
Accountable to People
Socially Beneficial
Avoid creating unfair bias
2015
Google DeepMind AlphaGo defeats Go champion
2016
Google’s DeepMind helps detect eye disease
2017
Google invents Transformer kickstarting LLM revolution
2018
Google’s groundbreaking large language model, BERT
2019
Text-to-Text
Transfer Transformer
LLM 10B P Model Open Sourced
2020
Google LaMDA
Model Trained to converse
2022
AlphaFold predicts structures of all known proteins
2023
Bard: conversational AI Service powered by PaLM2
2024
Gemini Family of multimodal LLMs & products
May 2018
Smart Compose for GMail
June 2021
LaMDA chat demonstrated at Google I/O
Where do People Fit In?
2015
Google DeepMind AlphaGo defeats Go champion
2016
Google’s DeepMind helps detect eye disease
2017
Google invents Transformer kickstarting LLM revolution
2018
Google’s groundbreaking large language model, BERT
2019
Text-to-Text
Transfer Transformer
LLM 10B P Model Open Sourced
2020
Google LaMDA
Model Trained to converse
2022
AlphaFold predicts structures of all known proteins
2023
Bard: conversational AI Service powered by PaLM2
2024
Gemini Family of multimodal LLMs & products
Nov 2022
OpenAI introduces ChatGPT.
GPT APIs
Democratizes LLMs
June 2015
Google Photos introduces image search and facial matching
Google Cloud Platform
Service
APIs
Search / Indexing
Conversation
Text-to-speech
Speech-to-text
Vision
Foundation Model APIs
Gemini
Claude
Build it Yourself
Gemma
Llama 3
Hugging Face
TensorFlow
Gemini - The Big Picture
Gemini Models
Other Models
Other Models
Gemini API
Vertex AI
AI Studio
Local Models
Android AI Core
Chrome
Gemini for
Developers
Cloud
Workspace
Gemini chat app
Developers
"Consumers"
Cloud Generative Models
| AI Studio | Vertex AI |
gemini-1.5-pro | ✓ | ✓ |
gemini-1.5-flash | ✓ | ✓ |
gemini-1.0-ultra | | restricted |
gemini-1.0-pro | ✓ | ✓ |
gemini-1.0-pro-vision | deprecated | ✓ |
attributed question / answer | ✓ | |
Cloud Embeddings Models
Input / Output Size
| Input Tokens[*] | Output Tokens[*] |
gemini-1.5-pro | 128 K 2 M | 8 K |
gemini-1.5-flash | 128 K 1 M | 8 K |
gemini-1.0-pro | 32 K | 8 K |
gemini-1.0-pro-vision | 16 K | 2 K |
Input Multimodality in Gemini 1.5
Multiple media enclosures
Media conversion
AI Studio: File API
Vertex AI: �Google Cloud Storage
Output Multimodality
Nope, but see
Imagen API
Text to Speech API
on Vertex AI and Google Cloud Platform
Text generation
| Gemini 1.5 Pro | Gemini 1.5 Flash | Gemini 1.0 Pro | Gemini 1.0 Pro Vision |
Text Completion | ✓ | ✓ | ✓ | ✓ |
Conversational | ✓ | ✓ | ✓ | |
Safety Settings | ✓ | ✓ | ✓ | ✓ |
JSON Mode | ✓ | ✓ | | |
JSON Schema | ✓ | | | |
System Instructions | ✓ | ✓ | some versions | |
Context Caching[*] | ✓ | ✓ | | |
Knowledge Grounding
| Gemini 1.5 Pro | Gemini 1.5 Flash | Gemini 1.0 Pro | Gemini 1.0 Pro Vision |
Function Calling Tool | ✓ | ✓ | ✓ | ✓ |
Citations | Vertex Only | Vertex Only | Vertex / AQA | |
Google Search Tool[*] | Vertex Only | Vertex Only | | |
Vertex AI Search Tool[*] | Vertex Only | Vertex Only | | |
Code Execution Tool | ✓ | ✓ | | |
AQA | | | Semantic Retriever model | |
Tuning
Two Platforms - One Model
AI Studio / Google Generative AI
Easy to get started
Vertex AI
Full Google Cloud integration
Access
Pricing
| AI Studio | Vertex AI |
Gemini 1.5 Flash input (small context) | $0.35 / 1M tokens | $0.125 / 1M char[*] |
Gemini 1.5 Pro input (small context) | $3.50 / 1M tokens | $1.25 / 1M char[*] |
Output | 3x input context | |
Large context (> 128k tokens) | Double small context | |
Context cache creation | Same as base rate | |
Context cache usage | 1/4 price of base rate | |
Context cache storage (1.5 Flash) | $1.00 / 1M tokens / hour | $0.25 / 1M char[*] / hour |
Context cache storage (1.5 Pro) | $4.50 / 1M tokens / hour | $1.125 / 1M char[*] / hour |
Media storage | Free (48 hour storage) | GCS pricing |
How does Gemini Compare?
LMSYS Chatbot Arena - 2024-07-16
Feature | GPT-4o | GPT-4o mini | Gemini-1.5-pro | Gemini-1.5-flash |
Text input | yes | yes | yes | yes |
Image input | yes | yes | yes | yes |
Audio input | "not yet" | "not yet" | yes | yes |
Video input | via frames | "not yet" | via frames | via frames |
- Automatically converts video files to suggested frame rate | no | N/A | yes | yes |
- Suggested frame rate | 2-4 fps | N/A | 1 fps | 1 fps |
Text output | yes | yes | yes | yes |
Image output | no | no | no | no |
Audio output | no | no | no | no |
Video output | no | no | no | no |
Max Context Window | 128k | 128k | 2000k | 1000k |
Free tier | no | no | yes | yes |
Free rate limit - Tokens / Minute | 0 | 0 | 32,000 | 1,000,000 |
Free rate limit - Requests / Minute | 0 | 0 | 2 | 15 |
Free rate limit - Requests / Day | 0 | 0 | 50 | 1,500 |
Base rate limit - Tokens / Minute | 30,000 | 60,000 | 4,000,000 | 4,000,000 |
Base rate limit - Requests / Minute | 500 | 500 | 360 | 1,000 |
Base rate limit - Requests / Day | N/A | 10,000 | 10,000 | N/A |
Price / 1M input token | $5.00 | $0.15 | $3.50 | $0.35 |
Price / 1M output token | $15.00 | $0.60 | $10.50 | $1.05 |
Beyond Gemini in the Cloud
Conclusion
Image from Denis Valášek w/ DALL·E 3
Questions?
https://deepmind.google/technologies/gemini/
https://ai.google.dev/
https://aistudio.google.com/
https://cloud.google.com/vertex-ai
https://console.cloud.google.com/vertex-ai/model-garden
http://spiders.com/
http://prisoner.com/
LinkedIn: Allen Firstenberg
2 Question Survey