MenuSights

Helping at-risk patients choose low-cholesterol items at local restaurants

31.7% of Americans have high LDL cholesterol

High cholesterol can be a result of:

  • Familial hypercholesterolemia
  • Aging
  • Lifestyle and activity level
  • Obesity

High cholesterol can lead to heart disease, but following a low-cholesterol diet can reduce blood cholesterol by 10-15%.

Source: CDC

While it’s easy to find nutritional information for chain restaurant menus, many independent restaurants don't have the resources to provide detailed nutritional information.

MenuSights ranks menu items for cholesterol content using natural language processing and nutritional analysis

[Figure: Training on ground truth. Logistic regression assigns probabilities to each cholesterol category (V. high, High, Medium, Low) for an item such as “Chicken Pad Thai”, using word and 2-gram classification coefficients.]
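The scoring step in the figure, where word and 2-gram coefficients are combined into a probability for each cholesterol category, can be sketched as a multinomial (softmax) logistic regression. This is a minimal sketch, not the MenuSights code: the `COEFS` and `INTERCEPTS` values are made up for illustration, and features come from a simple lowercase split.

```python
import math

CATEGORIES = ["Low", "Medium", "High", "V. high"]

# Hypothetical fitted weights: feature -> one coefficient per category (order of CATEGORIES).
# Features are single words and 2-grams; the numbers are illustrative only.
COEFS = {
    "chicken": [0.3, 0.2, 0.0, -0.1],
    "pad": [0.1, 0.2, 0.0, 0.0],
    "thai": [0.2, 0.1, 0.0, -0.1],
    "pad thai": [0.5, 0.3, 0.0, -0.2],
    "shrimp": [-0.4, 0.0, 0.3, 0.8],
}
INTERCEPTS = [0.0, 0.0, 0.0, 0.0]

def features(name):
    # Word features plus adjacent-word 2-grams.
    words = name.lower().split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams

def predict_proba(name):
    # Linear score per category: intercept plus coefficients of every known feature.
    scores = list(INTERCEPTS)
    for feat in features(name):
        weights = COEFS.get(feat)
        if weights is None:
            continue  # features never seen in training contribute nothing
        for k, w in enumerate(weights):
            scores[k] += w
    # Softmax turns the scores into probabilities that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {cat: e / total for cat, e in zip(CATEGORIES, exps)}

probs = predict_proba("Chicken Pad Thai")
print(max(probs, key=probs.get), probs)
```

The predicted category is simply the one with the highest probability; subtracting the maximum score before exponentiating keeps the softmax numerically stable without changing the result.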

Data workflow

  • Scrape 3,500 recipes from allrecipes.com and 1,765 restaurants from Zomato
  • Divide recipes into low, medium, high, and very-high cholesterol categories
  • Tokenize and stem words, and build n-grams
  • Vectorize recipe names
  • Fit and assess machine learning models
  • Build frontend and apply model to unknown restaurant data

Stages: collect data → process text → model, apply and present
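The text-processing and vectorizing steps can be sketched in plain Python. The slides do not say which tokenizer or stemmer was used; the crude plural-stripping `stem` below is a hypothetical stand-in for a real stemmer (such as Porter's), and `vectorize` simply counts word and 2-gram features against a vocabulary built from the training names.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters.
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    # Crude, hypothetical plural stripping standing in for a real stemmer.
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def ngrams(tokens, n_max=2):
    # All contiguous n-grams from length 1 up to n_max, joined with spaces.
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def analyze(name):
    return ngrams([stem(t) for t in tokenize(name)])

def build_vocabulary(names):
    # Map each feature seen in the training names to a column index.
    vocab = {}
    for name in names:
        for feat in analyze(name):
            vocab.setdefault(feat, len(vocab))
    return vocab

def vectorize(name, vocab):
    # Count-based bag-of-n-grams vector; features outside the vocabulary are dropped.
    vec = [0] * len(vocab)
    for feat, count in Counter(analyze(name)).items():
        if feat in vocab:
            vec[vocab[feat]] += count
    return vec

recipes = ["Chicken Pad Thai", "Spinach Pizza", "Grilled Shrimp Skewers"]
vocab = build_vocabulary(recipes)
print(vectorize("Shrimp Pad Thai", vocab))
```

Because the vocabulary comes only from training names, a new menu item keeps just the features the model has coefficients for; here the unseen 2-gram "shrimp pad" is dropped.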

The coefficients reflect not only the cholesterol content of the foods themselves, but also other words commonly associated with a particular cholesterol level (for example, “japanese” among the low-cholesterol words).

Very High Cholesterol

Word                  Coefficient
“shrimp”              4.2
“rum”                 4.2
“quiche”              4.6
“frittata”            4.86
“seafood”             5.2

Low Cholesterol

Word                  Coefficient
“veggie”              2.83
“cauliflower”         3.02
“japanese”            3.10
“vegan”               3.15
“moroccan chicken”    4.4
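In a fitted multinomial model each category has its own coefficient vector (with scikit-learn, for example, these are the rows of a fitted `LogisticRegression`'s `coef_` array), so tables like the two above come from sorting one category's coefficients and keeping the largest. A minimal sketch using only the word/coefficient pairs shown in the tables; the full vocabulary and fitting code are not shown in the slides.

```python
# Coefficients copied from the two tables above (a small subset of the full model).
COEFFICIENTS = {
    "Very High Cholesterol": {
        "shrimp": 4.2, "rum": 4.2, "quiche": 4.6, "frittata": 4.86, "seafood": 5.2,
    },
    "Low Cholesterol": {
        "veggie": 2.83, "cauliflower": 3.02, "japanese": 3.10, "vegan": 3.15,
        "moroccan chicken": 4.4,
    },
}

def top_features(coefs, k=3):
    # Sort features by coefficient, largest first, and keep the top k.
    return sorted(coefs.items(), key=lambda item: item[1], reverse=True)[:k]

for category, coefs in COEFFICIENTS.items():
    print(category, top_features(coefs))
```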

The model correctly predicts many high-cholesterol items, but fails on creatively named menu items

Item                Prediction   True cholesterol
“Popcorn Shrimp”    high         high
    A serving of shrimp contains 64% of the recommended daily cholesterol intake.
“The Anti-Salad”    low          high
    The description reveals that the item is a meat platter!

The MenuSights logistic regression model predicts cholesterol category with 57% accuracy

Tested the model against ground-truth labels: 57% accuracy (random guessing among the four categories would score 25%)

[Figure: confusion matrix of ground-truth vs. predicted cholesterol category (Low, Med, High, V. high)]
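Both the accuracy figure and the confusion matrix can be computed from paired ground-truth and predicted labels, with accuracy being the diagonal of the matrix divided by the total. The 25% baseline is the expected accuracy of guessing uniformly at random among four categories, whatever the class balance. A minimal sketch; the labels below are made-up toy data, not the MenuSights test results.

```python
from collections import Counter

CATEGORIES = ["Low", "Med", "High", "V. high"]

def confusion_matrix(truth, predicted, labels=CATEGORIES):
    # Rows are ground-truth categories, columns are predicted categories.
    counts = Counter(zip(truth, predicted))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(matrix):
    # Correct predictions lie on the diagonal.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Toy labels for illustration only.
truth = ["Low", "Low", "Med", "High", "V. high", "V. high", "High", "Med"]
predicted = ["Low", "Med", "Med", "High", "V. high", "High", "Low", "Med"]

m = confusion_matrix(truth, predicted)
print(m)
print(f"accuracy = {accuracy(m):.2f}, chance baseline = {1 / len(CATEGORIES):.2f}")
```

Off-diagonal cells show which categories get confused with each other, which the single accuracy number hides.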

Andy Lane, Ph.D.

Postdoc: CRISPR Bioinformatics/Molecular Biology

Ph.D. in Molecular and Cell Biology

Why did I use home recipes to train a restaurant analysis engine?

Initial strategy: can I cluster similar recipes by ingredient?

  • Recipes have extra data in the form of ingredient lists
  • Restaurant menus have just item names and some descriptions
  • Match accuracy is refined using ingredient-cluster matching

Structure of a dish name, e.g. “Chicken Pad Thai”, “Spinach Pizza”, “Mary’s Special Chicken Soup”:

  • Low-frequency, uninformative descriptors come first, if present (“Mary’s Special”)
  • Adjectivalized nouns come before the dish name (“Chicken”, “Spinach”)
  • The primary dish name is the end words (“Pad Thai”, “Pizza”, “Soup”)
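The dish-name structure above can be illustrated with a small heuristic parser. This is a hypothetical sketch, not the MenuSights implementation: `DISH_NAMES` and `FOOD_NOUNS` are made-up lexicons, and the rules follow the slide directly: the primary dish name is matched at the end, food nouns just before it are treated as adjectivalized modifiers, and anything earlier counts as a descriptor.

```python
from typing import NamedTuple

# Hypothetical lexicons for illustration; the slides do not say how the parts are identified.
DISH_NAMES = {"pad thai", "pizza", "soup", "salad"}
FOOD_NOUNS = {"chicken", "spinach", "beef", "shrimp"}

class DishName(NamedTuple):
    descriptors: list  # low-frequency, uninformative words at the start
    modifiers: list    # adjectivalized nouns just before the dish name
    dish: str          # primary dish name at the end

def parse_dish_name(name):
    words = name.lower().split()
    # The primary dish name is the longest known suffix; fall back to the last word.
    dish_start = len(words) - 1
    for i in range(len(words)):
        if " ".join(words[i:]) in DISH_NAMES:
            dish_start = i
            break
    # Walk backwards over food nouns that modify the dish ("chicken" in "chicken soup").
    mod_start = dish_start
    while mod_start > 0 and words[mod_start - 1] in FOOD_NOUNS:
        mod_start -= 1
    return DishName(words[:mod_start], words[mod_start:dish_start], " ".join(words[dish_start:]))

for name in ["Chicken Pad Thai", "Spinach Pizza", "Mary's Special Chicken Soup"]:
    print(name, "->", parse_dish_name(name))
```

Splitting names this way lets the informative parts (dish and modifiers) be weighted separately from one-off descriptors such as a restaurant owner's name.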

Scoping the problem: restaurant menu items aren’t the same as home-cooked recipes

Qualitative differences between recipes and menu items:

  • No serving size information
  • Names structured differently
  • Items may be “composite”, i.e., served with other ingredients or dishes
  • Absolute cholesterol values are unlikely to match, but classifying menu items into cholesterol categories may still be possible

Example names: “Sirloin Steak, 9 oz” vs. “Pete’s Awesome Sirloin”
