1 of 13

461

//CoolPilot

Albert Sun, Andrew Fortner, Donald Thai

2 of 13

The Problem

Large Language Models (LLMs) are powerful, but we have little control over where the data we send them goes. It could be used for training, leaked, or even surface in another user's answer.

3 of 13

Our Solution


4 of 13

Our Solution

To an AI, data is just numbers and patterns; it does not need the real raw data.

Companies want AI code tools to help their developers, but don't want to hand off their internal code to a black box.

We help by training models on encrypted text that return encrypted text, enabling code generation without ever exposing the actual code.
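As a small illustration of the first point, here is what source code looks like once it is tokenized for a model: just a list of integers. This uses the public GPT-2 tokenizer purely as an example; it is tokenization, not encryption, but it shows the representation the model actually consumes.

```python
# Illustration only: to the model, code is a sequence of integer token IDs.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

code = "def add(a, b):\n    return a + b"
ids = tokenizer.encode(code)

print(ids)                    # a list of integers -- no readable source code
print(tokenizer.decode(ids))  # round-trips back to the original text
```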

5 of 13

Homomorphic Encryption

“The conversion of data into ciphertext that can be analyzed and worked with as if it were still in its original form”
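To make the definition concrete, below is a minimal sketch of an additively homomorphic scheme (Paillier): two ciphertexts are combined without ever being decrypted, yet the result decrypts to the sum of the plaintexts. The key sizes are toy values chosen for readability, not a secure implementation, and this scheme is only an illustration of the idea rather than the exact scheme CoolPilot uses.

```python
# Toy Paillier encryption: you can add numbers while they stay encrypted.
import math
import random

def keygen(p=2957, q=3181):
    # Toy primes; a real deployment would use large random primes.
    n = p * q
    g = n + 1                                   # standard choice of generator
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    # mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(priv, c):
    lam, mu, n = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

pub, priv = keygen()
c1, c2 = encrypt(pub, 42), encrypt(pub, 58)
c_sum = (c1 * c2) % (pub[0] ** 2)   # multiply ciphertexts => add plaintexts
print(decrypt(priv, c_sum))         # 100, computed without ever seeing 42 or 58
```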

6 of 13

Traditional LLMs

Unencrypted prompt/training data → LLM API (e.g., GPT)

The data is not safe: an external provider receives our raw data in the clear.

7 of 13

CoolPilot

Encrypted prompt/training data → CoolPilot API → LLM API (e.g., GPT)

Data is encrypted on-prem and stays encrypted while it is processed, so it remains safe and secure end to end.

8 of 13

How It Works

01  Client-side homomorphic encryption
02  Train the LLM on encrypted data
03  The model outputs encrypted data
04  The user decrypts the result
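Putting the four steps together, a minimal round-trip sketch is below. Everything in it is illustrative: toy_encrypt/toy_decrypt are trivial placeholders standing in for a real homomorphic scheme, and encrypted_model stands in for the model trained on ciphertext; none of these names are real CoolPilot APIs.

```python
# Illustrative round trip of the four steps; the "encryption" here is a
# trivial placeholder, not a real homomorphic scheme.

SECRET_SHIFT = 7  # toy client-side key; a real key never leaves the client

def toy_encrypt(token_ids):
    # Step 1 (client): encrypt token IDs before they leave the machine.
    return [t + SECRET_SHIFT for t in token_ids]

def encrypted_model(cipher_ids):
    # Steps 2-3 (server): the model sees and returns only ciphertext.
    # Placeholder "completion": echo the input plus one more ciphertext token.
    return cipher_ids + [cipher_ids[-1] + 1]

def toy_decrypt(cipher_ids):
    # Step 4 (client): only the key holder can recover plaintext IDs.
    return [c - SECRET_SHIFT for c in cipher_ids]

prompt_ids = [101, 7592, 2088]                 # pretend-tokenized prompt
completion = toy_decrypt(encrypted_model(toy_encrypt(prompt_ids)))
print(completion)                              # plaintext IDs, recovered client-side
```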

9 of 13

What We Tried

  • Training an LLM from scratch
  • Fine-tuning pretrained models (GPT-2, CodeGen)
  • Prompt engineering with ChatGPT
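Of these, the fine-tuning route is the easiest to sketch. The outline below uses Hugging Face transformers to continue training GPT-2 with a causal language modeling objective; the two placeholder training strings are made up, and in the CoolPilot setting they would be the already-encrypted text produced client-side.

```python
# A minimal fine-tuning sketch with placeholder training data.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Placeholder corpus; in our setting these strings would already be encrypted.
corpus = Dataset.from_dict({"text": ["enc_17 enc_902 enc_44", "enc_3 enc_51 enc_8"]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="coolpilot-gpt2",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```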

10 of 13

More Applications

  • Voice and facial recognition
  • Conversations remain private
  • Training medical models without sharing patient data
  • Maintains accuracy while preserving security

11 of 13

DEMO

12 of 13

Future Challenges

  • Training a model per user is very expensive
  • The model can predict patterns but does not fundamentally understand language
    • We cannot yet ask questions about the code

13 of 13