FIRE: Food Image to REcipe generation
Prateek Chhikara, Dhiraj Chaurasia, Yifan Jiang, Omkar Masur, and Filip Ilievski
{pchhikar, dchauras, yjiang44, omasur}@usc.edu, f.ilievski@vu.nl
Background
Goal
Contributions
Demonstrated downstream applications (recipe customization and recipe-to-machine-code generation) through integration with few-shot prompting of large LMs.
Proposed Methodology
Proposed architecture to extract ingredients and generate the recipe title and cooking instructions from a food image. (Ingredient quantities are passed only at training time.)
1. Title Generation: We fine-tune a BLIP model on 10% of the Recipe1M dataset.
2. Ingredient Extraction: We extract features from the input food image using a Vision Transformer (ViT); the image embeddings are then passed through an ingredient decoder.
Proposed Methodology (... continued)
3. Cooking Instruction Generation: We fine-tune a T5 model on recipe titles and ingredients. At test time, the title and ingredients predicted by the previous two stages are passed to the fine-tuned T5 model.
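The three stages above compose into a single inference pipeline. A minimal sketch of that composition, with each stage model stubbed out (the stub outputs below are placeholders for illustration, not results from the actual fine-tuned models):

```python
def generate_title(image):
    """Stage 1 (stub): the fine-tuned BLIP captioner would run here."""
    return "Chocolate Chip Cookies"

def extract_ingredients(image):
    """Stage 2 (stub): ViT encoder + ingredient decoder would run here."""
    return ["flour", "butter", "sugar", "chocolate chips", "eggs"]

def generate_instructions(title, ingredients):
    """Stage 3 (stub): the fine-tuned T5 model would consume the
    concatenated title and ingredient list as its input sequence."""
    prompt = f"title: {title} ingredients: {', '.join(ingredients)}"
    return f"[T5 output conditioned on -> {prompt}]"

def fire_pipeline(image):
    # Stages run sequentially; stage 3 sees only the *predicted*
    # title and ingredients at test time, never the ground truth.
    title = generate_title(image)
    ingredients = extract_ingredients(image)
    instructions = generate_instructions(title, ingredients)
    return {"title": title,
            "ingredients": ingredients,
            "instructions": instructions}

recipe = fire_pipeline(image=None)  # a real image tensor would go here
```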
Experiment Setup
Ingredient extraction baselines: RI2L (retrieval-based), RI2LR (retrieval-based), FFTD, and InverseCooking
Cooking instruction generation baselines: InverseCooking and ChefTransformer
Ingredient extraction metrics: IoU and F1
Instruction generation metrics: SacreBLEU and RougeL
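The ingredient metrics treat the predicted and ground-truth ingredient lists as sets. A minimal sketch of how set-level IoU and F1 are typically computed (function names are ours):

```python
def ingredient_iou(pred, gold):
    """Intersection over union of predicted vs. gold ingredient sets."""
    pred, gold = set(pred), set(gold)
    if not pred and not gold:
        return 1.0  # both empty: perfect agreement by convention
    return len(pred & gold) / len(pred | gold)

def ingredient_f1(pred, gold):
    """Harmonic mean of precision and recall over ingredient sets."""
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)  # correctly predicted ingredients
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = ["flour", "sugar", "butter", "eggs"]
pred = ["flour", "sugar", "milk"]
print(round(ingredient_iou(pred, gold), 3))  # → 0.4
print(round(ingredient_f1(pred, gold), 3))   # → 0.571
```

SacreBLEU and RougeL for the generated instructions are standard text-overlap metrics and are usually computed with the `sacrebleu` and `rouge-score` packages.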
Results
End-to-end scores
Ablation Study
Case Study
FIRE Applications
FIRE Applications – Analysis
Conducted a human evaluation with seven experts involving 10 recipes and their customizations.
1. Recipe Customization
2. Recipe-to-Machine-Code Generation
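Both applications rely on few-shot prompting of a large LM over FIRE's generated recipe. A minimal sketch of how a customization prompt could be assembled (the demonstration recipe and wording are illustrative, not the exact prompts used):

```python
# One hand-written demonstration; the real prompt would include several.
FEW_SHOT_EXAMPLES = [
    {
        "recipe": "Pancakes: flour, milk, eggs. Mix and fry.",
        "request": "Make this recipe vegan.",
        "customized": "Pancakes: flour, soy milk, flax eggs. Mix and fry.",
    },
]

def build_customization_prompt(recipe, request):
    """Assemble a few-shot prompt: demonstrations first, then the query."""
    parts = []
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(
            f"Recipe: {ex['recipe']}\n"
            f"Request: {ex['request']}\n"
            f"Customized: {ex['customized']}"
        )
    # The query ends at "Customized:" so the LM completes the answer.
    parts.append(f"Recipe: {recipe}\nRequest: {request}\nCustomized:")
    return "\n\n".join(parts)

prompt = build_customization_prompt(
    recipe="Omelette: eggs, butter, cheese. Whisk, cook, fold.",
    request="Make this recipe dairy-free.",
)
# `prompt` would then be sent to a large LM for completion.
```

Recipe-to-machine-code generation would follow the same pattern, with demonstrations pairing recipes with machine-readable instruction code.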
Future Work
Thanks
Reach out with questions at pchhikar@usc.edu.