1 of 23

Unleashing the Power of Gemini

Google's Next-Gen LLM for Every Developer

AI Camp - 18 July 2024

Image by Gemini Advanced

Image from Denis Valášek w/ DALL·E 3

2 of 23

Who are you?

Software developer / Consultant

New York

Google Developer Expert

Google Cloud Champion Innovator

LangChain.js contributor

Co-host "Two Voice Devs"

http://spiders.com/

http://prisoner.com/

LinkedIn: Allen Firstenberg

3 of 23

Google and AI

Responsible AI

3,000

Researchers

7,000

Publications

Built & Tested for Safety

Privacy in design

Upholds high scientific standards

Accountable to People

Socially Beneficial

Avoid creating unfair bias

2015

Google DeepMind AlphaGo defeats Go champion

2016

Google’s DeepMind helps detect eye disease

2017

Google invents Transformer kickstarting LLM revolution

2018

Google’s groundbreaking large language model, BERT

2019

Text-to-Text

Transfer Transformer

LLM 10B P Model Open Sourced

2020

Google LaMDA

Model Trained to converse

2022

AlphaFold predicts structures of all known proteins

2023

Bard: conversational AI Service powered by PaLM2

2024

Gemini Family of multimodal LLMs & products

4 of 23

May 2018

Smart Compose for GMail

June 2021

LaMDA chat demonstrated at Google I/O

Where do People Fit In?

2015

Google DeepMind AlphaGo defeats Go champion

2016

Google’s DeepMind helps detect eye disease

2017

Google invents Transformer kickstarting LLM revolution

2018

Google’s groundbreaking large language model, BERT

2019

Text-to-Text

Transfer Transformer

LLM 10B P Model Open Sourced

2020

Google LaMDA

Model Trained to converse

2022

AlphaFold predicts structures of all known proteins

2023

Bard: conversational AI Service powered by PaLM2

2024

Gemini Family of multimodal LLMs & products

Nov 2022

OpenAI introduces ChatGPT.

GPT APIs

Democratizes LLMs

June 2015

Google Photos introduces image search and facial matching

5 of 23

Google Cloud Platform

Service

APIs

Search / Indexing

Conversation

Text-to-speech

Speech-to-text

Vision

Foundation Model APIs

Gemini

Claude

Build it Yourself

Gemma

Llama 3

Hugging Face

TensorFlow

6 of 23

7 of 23

Gemini - The Big Picture

Gemini Models

Other Models

Other Models

Gemini API

Vertex AI

AI Studio

Local Models

Android AI Core

Chrome

Gemini for

Developers

Cloud

Workspace

Gemini chat app

Developers

"Consumers"

8 of 23

Cloud Generative Models

AI Studio

Vertex AI

gemini-1.5-pro

gemini-1.5-flash

gemini-1.0-ultra

restricted

gemini-1.0-pro

gemini-1.0-pro-vision

deprecated

attributed question / answer

9 of 23

Cloud Embeddings Models

  • Text embeddings
    • Output: Up to 768 dimensions
    • Task-specific embeddings
  • Multimodal embeddings (Vertex AI)
    • Input: Text, Image, or Video
    • Output: 128, 256, 512, or 1408 dimensions
    • Shared embedding space

10 of 23

Input / Output Size

Input Tokens[*]

Output Tokens[*]

gemini-1.5-pro

128 K

2 M

8 K

gemini-1.5-flash

128 K

1 M

8 K

gemini-1.0-pro

32 K

8 K

gemini-1.0-pro-vision

16 K

2 K

11 of 23

Input Multimodality in Gemini 1.5

  • Text
    • ~4 characters / token
  • Images
    • 259 tokens / image
  • Video
    • Automatic frame splitting @ 1 fps
    • 263 tokens / second[*]
    • No audio
  • Audio
    • 32 tokens / second

Multiple media enclosures

Media conversion

AI Studio: File API

Vertex AI: �Google Cloud Storage

12 of 23

Output Multimodality

Nope, but see

Imagen API

Text to Speech API

on Vertex AI and Google Cloud Platform

13 of 23

Text generation

Gemini 1.5 Pro

Gemini 1.5 Flash

Gemini 1.0 Pro

Gemini 1.0 Pro Vision

Text Completion

Conversational

Safety Settings

JSON Mode

JSON Schema

System Instructions

some versions

Context Caching[*]

14 of 23

Knowledge Grounding

Gemini 1.5 Pro

Gemini 1.5 Flash

Gemini 1.0 Pro

Gemini 1.0 Pro Vision

Function Calling Tool

Citations

Vertex Only

Vertex Only

Vertex / AQA

Google Search Tool[*]

Vertex Only

Vertex Only

Vertex AI Search Tool[*]

Vertex Only

Vertex Only

Code Execution Tool

AQA

Semantic Retriever model

15 of 23

Tuning

  • Currently available for Gemini 1.0 Pro�"Coming soon" for Gemini 1.5 Flash
  • Parameter Efficient Tuning
    • Allows for easy training and deployment
    • Suitable for modest data sets�Hundreds to Thousands of examples
  • Pricing
    • No cost for tuning
    • Base model cost for use

16 of 23

Two Platforms - One Model

AI Studio / Google Generative AI

  • Free tier w/ usage caveats[*]
  • Paid tier w/ privacy
  • Access with API Key / OAuth[**]
  • Media access through File API
  • Billing by token

Easy to get started

Vertex AI

  • Starter credits
  • Private with Data Residency
  • Access through Google Cloud
  • Media access through GCS
  • Billing by character[***]

Full Google Cloud integration

17 of 23

Access

  • AI Studio / Vertex AI Studio web UI
  • REST / gRPC
  • Google Libraries: Python, Node.js
    • AI Studio: Android, iOS, Flutter
    • Vertex AI: Go, Java, C#
  • Vertex AI for Firebase (web and mobile)
  • Third parties
    • LangChain & LangChain.js
    • Llamaindex
    • Variety of other languages
  • OpenAI API Vertex AI endpoint

18 of 23

Pricing

AI Studio

Vertex AI

Gemini 1.5 Flash input (small context)

$0.35 / 1M tokens

$0.125 / 1M char[*]

Gemini 1.5 Pro input (small context)

$3.50 / 1M tokens

$1.25 / 1M char[*]

Output

3x input context

Large context (> 128k tokens)

Double small context

Context cache creation

Same as base rate

Context cache usage

1/4 price of base rate

Context cache storage (1.5 Flash)

$1.00 / 1M tokens / hour

$0.25 / 1M char[*] / hour

Context cache storage (1.5 Pro)

$4.50 / 1M tokens / hour

$1.125 / 1M char[*] / hour

Media storage

Free (48 hour storage)

GCS pricing

19 of 23

How does Gemini Compare?

LMSYS Chatbot Arena - 2024-07-16

20 of 23

Feature

GPT-4o

GPT-4o mini

Gemini-1.5-pro

Gemini-1.5-flash

Text input

yes

yes

yes

yes

Image input

yes

yes

yes

yes

Audio input

"not yet"

"not yet"

yes

yes

Video input

via frames

"not yet"

via frames

via frames

- Automatically converts video files to suggested frame rate

no

N/A

yes

yes

- Suggested frame rate

2-4 fps

N/A

1 fps

1 fps

Text output

yes

yes

yes

yes

Image output

no

no

no

no

Audio output

no

no

no

no

Video output

no

no

no

no

Max Context Window

128k

128k

2000k

1000k

Free tier

no

no

yes

yes

Free rate limit - Tokens / Minute

0

0

32,000

1,000,000

Free rate limit - Requests / Minute

0

0

2

15

Free rate limit - Requests / Day

0

0

50

1,500

Base rate limit - Tokens / Minute

30,000

60,000

4,000,000

4,000,000

Base rate limit - Requests / Minute

500

500

360

1,000

Base rate limit - Requests / Day

N/A

10,000

10,000

N/A

Price / 1M input token

$5.00

$0.15

$3.50

$0.35

Price / 1M output token

$15.00

$0.60

$10.50

$1.05

21 of 23

Beyond Gemini in the Cloud

  • Gemini Nano on Android and Chrome
  • Access Claude and Mistral families through Vertex AI "Model-as-a-service"
  • Gemma family of open models
  • Hugging Face models deployed on Vertex AI
  • Imagen family of image generation models
  • Other tuned models for code generation, medical processing, etc.

22 of 23

  • Access to Gemini is available through the AI Studio API and Vertex AI API
    • Use your favorite language / library
  • Up to 2M tokens - largest in industry
  • Features that match or exceed other platforms
  • Competitive pricing and performance
  • Builds on years of experience
    • AI and scalable cloud computing

Conclusion

Image from Denis Valášek w/ DALL·E 3

23 of 23

Questions?

https://deepmind.google/technologies/gemini/

https://ai.google.dev/

https://aistudio.google.com/

https://cloud.google.com/vertex-ai

https://console.cloud.google.com/vertex-ai/model-garden

http://spiders.com/

http://prisoner.com/

LinkedIn: Allen Firstenberg

2 Question Survey