1 of 13

461

//CoolPilot

Albert Sun, Andrew Fortner, Donald Thai

2 of 13

The Problem

Large Language Models (LLMs) are powerful, but we have little control over where the data we send them goes. It could be used for training, leaked, or even surface in another user's answer.

3 of 13

Our Solution


4 of 13

Our Solution

To an AI, data is just numbers and patterns; it does not need the real raw data.

Companies want AI code tools to help their developers, but don't want to hand off their internal code to a black box.

We help by training models on encrypted text that return encrypted text, enabling code generation without ever exposing the actual code.
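As a small illustration of the first point, here is what source code looks like once it is tokenized for a model: just a list of integers. This uses the public GPT-2 tokenizer purely as an example; it is tokenization, not encryption, but it shows the representation the model actually consumes.

```python
# Illustration only: to the model, code is a sequence of integer token IDs.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

code = "def add(a, b):\n    return a + b"
ids = tokenizer.encode(code)

print(ids)                    # a list of integers -- no readable source code
print(tokenizer.decode(ids))  # round-trips back to the original text
```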

5 of 13

Homomorphic Encryption

“The conversion of data into ciphertext that can be analyzed and worked with as if it were still in its original form”
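To make the definition concrete, below is a minimal sketch of an additively homomorphic scheme (Paillier): two ciphertexts are combined without ever being decrypted, yet the result decrypts to the sum of the plaintexts. The key sizes are toy values chosen for readability, not a secure implementation, and this scheme is only an illustration of the idea rather than the exact scheme CoolPilot uses.

```python
# Toy Paillier encryption: you can add numbers while they stay encrypted.
import math
import random

def keygen(p=2957, q=3181):
    # Toy primes; a real deployment would use large random primes.
    n = p * q
    g = n + 1                                   # standard choice of generator
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    # mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(priv, c):
    lam, mu, n = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

pub, priv = keygen()
c1, c2 = encrypt(pub, 42), encrypt(pub, 58)
c_sum = (c1 * c2) % (pub[0] ** 2)   # multiply ciphertexts => add plaintexts
print(decrypt(priv, c_sum))         # 100, computed without ever seeing 42 or 58
```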

6 of 13

Traditional LLMs

Unencrypted prompt/training data → LLM API (e.g., GPT)

The data is not safe: an external provider receives our raw data in the clear.

7 of 13

CoolPilot

Encrypted prompt/training data → CoolPilot API → LLM API (e.g., GPT)

Data is encrypted on-prem and stays encrypted while it is processed, so it remains safe and secure end to end.

8 of 13

How It Works

01  Client-side homomorphic encryption
02  Train the LLM on encrypted data
03  The model outputs encrypted data
04  The user decrypts the result
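Putting the four steps together, a minimal round-trip sketch is below. Everything in it is illustrative: toy_encrypt/toy_decrypt are trivial placeholders standing in for a real homomorphic scheme, and encrypted_model stands in for the model trained on ciphertext; none of these names are real CoolPilot APIs.

```python
# Illustrative round trip of the four steps; the "encryption" here is a
# trivial placeholder, not a real homomorphic scheme.

SECRET_SHIFT = 7  # toy client-side key; a real key never leaves the client

def toy_encrypt(token_ids):
    # Step 1 (client): encrypt token IDs before they leave the machine.
    return [t + SECRET_SHIFT for t in token_ids]

def encrypted_model(cipher_ids):
    # Steps 2-3 (server): the model sees and returns only ciphertext.
    # Placeholder "completion": echo the input plus one more ciphertext token.
    return cipher_ids + [cipher_ids[-1] + 1]

def toy_decrypt(cipher_ids):
    # Step 4 (client): only the key holder can recover plaintext IDs.
    return [c - SECRET_SHIFT for c in cipher_ids]

prompt_ids = [101, 7592, 2088]                 # pretend-tokenized prompt
completion = toy_decrypt(encrypted_model(toy_encrypt(prompt_ids)))
print(completion)                              # plaintext IDs, recovered client-side
```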

9 of 13

What We Tried

  • Training an LLM from scratch
  • Fine-tuning pretrained models (GPT-2, CodeGen)
  • Prompt engineering with ChatGPT
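Of these, the fine-tuning route is the easiest to sketch. The outline below uses Hugging Face transformers to continue training GPT-2 with a causal language modeling objective; the two placeholder training strings are made up, and in the CoolPilot setting they would be the already-encrypted text produced client-side.

```python
# A minimal fine-tuning sketch with placeholder training data.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Placeholder corpus; in our setting these strings would already be encrypted.
corpus = Dataset.from_dict({"text": ["enc_17 enc_902 enc_44", "enc_3 enc_51 enc_8"]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="coolpilot-gpt2",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```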

10 of 13

More Applications

  • Voice and facial recognition
  • Conversations remain private
  • Training medical models without sharing patient data
  • Maintains accuracy while preserving security

11 of 13

DEMO

12 of 13

Future Challenges

  • Training a model per user is very expensive
  • The model can predict patterns but does not fundamentally understand language
    • We cannot yet ask questions about the code

13 of 13