Plate2Recipe – Food Image to Recipe Generation
Multimodal deep learning system generating recipes directly from food images.
This project aims to create an end-to-end system that converts food images into structured recipes, combining computer vision and natural language processing techniques.
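An end-to-end image-to-recipe pipeline needs a small amount of glue code between its two stages. The vision model scores candidate ingredients, the confident ones are kept, and those are turned into a prompt for the text generator. The sketch below covers only that glue, in plain Python, and leaves out the model calls. The function names, the 0.5 threshold, the `max_items` cap, and the prompt wording are hypothetical choices for illustration, not this project's actual code.

```python
from typing import Dict, List


def select_ingredients(scores: Dict[str, float], threshold: float = 0.5,
                       max_items: int = 10) -> List[str]:
    """Keep ingredients whose predicted probability meets the threshold.

    `scores` is assumed to hold independent per-ingredient probabilities,
    e.g. sigmoid outputs of a multi-label image classifier (hypothetical).
    """
    kept = [(name, p) for name, p in scores.items() if p >= threshold]
    # Most confident first; ties broken alphabetically so the output is deterministic.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in kept[:max_items]]


def build_prompt(ingredients: List[str]) -> str:
    """Format the ingredient list as a text prompt (hypothetical format)."""
    return "Ingredients: " + ", ".join(ingredients) + "\nInstructions:"


if __name__ == "__main__":
    # Made-up classifier scores for a single image.
    scores = {"tomato": 0.92, "basil": 0.71, "mozzarella": 0.64, "chicken": 0.12}
    ingredients = select_ingredients(scores)
    print(ingredients)
    print(build_prompt(ingredients))
```

The sketch thresholds each probability independently instead of taking a single argmax because ingredient recognition is a multi-label task: one dish usually contains several ingredients.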
- Used Vision Transformers (ViT) to recognize and classify ingredients in food images, drawing on datasets such as Food-101, Recipe1M, and RecipeNLG.
- Fine-tuned GPT-2 and trained LSTM models on RecipeNLG to generate coherent cooking instructions from the identified ingredients.
- Preprocessed the data with the Hugging Face Datasets library and experimented with data augmentation and hyperparameter tuning to improve model performance.
- Obtained promising qualitative results (structured recipes), while also exposing challenges in scalability and robustness for real-world applications.
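Fine-tuning a causal language model such as GPT-2 on recipes requires flattening each recipe into a single training string. At generation time, the output then has to be split back into fields to recover a structured recipe. The sketch below shows one way to do that round trip in plain Python. The section markers (`<INGR>`, `<TITLE>`, `<INSTR>`, `<END>`), the `" | "` item separator, and the record layout (a `title` plus `ingredients` and `directions` lists, loosely modeled on RecipeNLG's columns) are all assumptions for illustration. They are not the format used in this project.

```python
from typing import Dict, List

# Hypothetical section markers and list separator.
INGR, TITLE, INSTR, END = "<INGR>", "<TITLE>", "<INSTR>", "<END>"
SEP = " | "


def to_training_text(record: Dict) -> str:
    """Flatten a recipe into one string: ingredients first, then title and steps."""
    ingredients = SEP.join(record["ingredients"])
    steps = SEP.join(record["directions"])
    return f"{INGR} {ingredients} {TITLE} {record['title']} {INSTR} {steps} {END}"


def to_prompt(ingredients: List[str]) -> str:
    """Inference-time prefix; the model is expected to continue after <TITLE>."""
    return f"{INGR} {SEP.join(ingredients)} {TITLE}"


def parse_generation(text: str) -> Dict:
    """Split generated text back into fields; missing sections come back empty."""
    text = text.split(END, 1)[0]  # drop anything generated after the end marker

    def section(start: str, stop: str) -> str:
        if start not in text:
            return ""
        body = text.split(start, 1)[1]
        return body.split(stop, 1)[0].strip()

    def as_list(chunk: str) -> List[str]:
        # Split on the bare separator so "a|b" and "a | b" both work.
        return [item.strip() for item in chunk.split(SEP.strip()) if item.strip()]

    return {
        "title": section(TITLE, INSTR),
        "ingredients": as_list(section(INGR, TITLE)),
        "directions": as_list(section(INSTR, END)),
    }


if __name__ == "__main__":
    record = {
        "title": "Caprese Salad",
        "ingredients": ["tomato", "mozzarella", "basil"],
        "directions": ["Slice the tomatoes and mozzarella.", "Layer with basil and serve."],
    }
    text = to_training_text(record)
    print(text)
    print(parse_generation(text) == record)
```

The sketch places ingredients first so that the prefix of every training string doubles as the inference prompt. If markers like these were used with GPT-2 in Hugging Face Transformers, they would typically be registered with `tokenizer.add_special_tokens({"additional_special_tokens": [...]})` followed by `model.resize_token_embeddings(len(tokenizer))`.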