Plate2Recipe – Food Image to Recipe Generation

Multimodal deep learning system generating recipes directly from food images.

This project builds an end-to-end system that converts food images into structured recipes by combining computer vision and natural language processing.

  • Used Vision Transformers (ViT) to recognize and classify ingredients in food images, drawing on datasets such as Food-101, Recipe1M, and RecipeNLG.
  • Fine-tuned GPT-2 and trained LSTMs on RecipeNLG to produce coherent cooking instructions based on identified ingredients.
  • Preprocessed data with Hugging Face Datasets, applied data augmentation, and tuned hyperparameters to improve model performance.
  • Produced structured recipes with promising qualitative results, while exposing scalability and robustness challenges for real-world use.
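
The glue between the two stages (ingredient recognition, then instruction generation) can be sketched as below. This is a minimal illustration under stated assumptions, not the project's actual code: the `Recipe` class, the 0.5 score threshold, the prompt layout, and every function name are hypothetical, and `fake_generate` is a stand-in for a fine-tuned language model so that the example runs without model weights.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Recipe:
    """Hypothetical structured output: ingredient names plus ordered steps."""
    ingredients: List[str]
    steps: List[str] = field(default_factory=list)


def select_ingredients(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep labels whose score clears the threshold, highest score first.

    `scores` stands in for per-ingredient confidences from an image
    classifier; the threshold value is an assumption.
    """
    kept = [(name, score) for name, score in scores.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in kept]


def build_prompt(ingredients: List[str]) -> str:
    """Assumed prompt layout for the text generator."""
    return "Ingredients: " + ", ".join(ingredients) + "\nInstructions:\n"


def parse_steps(generated: str) -> List[str]:
    """Split generated text into steps, dropping blank lines and '1.' / '2)' prefixes."""
    steps = []
    for line in generated.splitlines():
        text = re.sub(r"^\s*\d+[.)]\s*", "", line).strip()
        if text:
            steps.append(text)
    return steps


def scores_to_recipe(
    scores: Dict[str, float],
    generate: Callable[[str], str],
    threshold: float = 0.5,
) -> Recipe:
    """Chain the stages: pick ingredients, prompt the generator, structure the output."""
    ingredients = select_ingredients(scores, threshold)
    generated = generate(build_prompt(ingredients))
    return Recipe(ingredients=ingredients, steps=parse_steps(generated))


def fake_generate(prompt: str) -> str:
    """Stand-in for a language model call; returns fixed text for illustration."""
    return "1. Boil the pasta.\n2. Simmer the tomato with basil.\n\n3. Combine and serve."


# Made-up classifier scores for a single image.
scores = {"tomato": 0.91, "pasta": 0.84, "basil": 0.62, "chocolate": 0.03}
recipe = scores_to_recipe(scores, fake_generate)
print(recipe.ingredients)
print(recipe.steps)
```

Passing the generator in as a plain callable keeps the sketch independent of any particular model: a GPT-2 or LSTM wrapper could replace `fake_generate` without changing the rest of the code.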