FoodDB ๐ŸŽ

Local-first personal nutrition facts catalogue โ€” SQLite, Typer CLI, and Streamlit UI sharing one service layer, with a regex label parser and swappable OCR backends.

GitHub ยท Back to Projects


  • Problem: Tracking nutrition facts across different stores and package sizes requires a tool that stores data once but is queryable from anywhere โ€” CLI for quick adds, UI for browsing.
  • Approach: Layered architecture (ORM โ†’ repository โ†’ services โ†’ dual CLI/UI surfaces); regex-based label parser with optional OCR for image imports; swappable OCR backends behind a clean interface.
  • Outcome: A self-contained, portable local tool with consistent behavior across both surfaces, typed error handling, and an extension-friendly design.

Snapshots

Quick look at the UI/workflow.

Food DB: search Search + criteria
Food DB: import ocr Ranking table

FoodDB is a personal utility project that demonstrates software engineering discipline applied to a real problem. The CLI and Streamlit UI are both thin clients over the same repository and service layer โ€” they share one SQLite file, meaning CLI writes are immediately visible in the UI after a page refresh, with no sync required.

Core capabilities:

  • Full CRUD for foods with complete nutrition facts (macros, micros, ingredients, best-by date)
  • Store variants โ€” track the same product at Costco vs. Aldi with price and package sizing
  • Inventory tracking per variant (quantity on hand)
  • Import from label text (paste) or label image (upload): regex parser extracts structured fields; OCR converts image โ†’ text โ†’ parser
  • Fuzzy search with token overlap + SequenceMatcher scoring; filterable by category, brand, store
  • Export to JSON or CSV
  • Unit conversion table with UI calculator

Architecture


  CLI (Typer)        Streamlit UI
        \                /
         \              /
          โ”€โ”€ Services โ”€โ”€
                |
           Repository
                |
          SQLAlchemy ORM
                |
            SQLite file
            

No HTTP server. Both surfaces call the same Python service layer directly. The DB path comes from a single pydantic-settings config object โ€” change it in one place and both surfaces follow.


Engineering Notes


Design Decisions
  • ORM / Pydantic separation โ€” SQLAlchemy ORM classes map to DB tables; Pydantic schemas are the API layer. No ORM objects leak above the repository boundary.
  • foods vs food_variants โ€” canonical nutrition facts live on the food; store/price/package details live on variants. Avoids duplicating nutrition data across stores.
  • ParsedNutrition.warnings list โ€” explicit, inspectable signal for uncertain extractions. The UI surfaces warnings before saving so the user can correct them, rather than silently saving wrong data.
  • Lazy OCR imports โ€” the package works without Tesseract or pytesseract installed. A clear RuntimeError with install instructions is raised if OCR is called without extras.
OCR Backend Swapping
  • Tesseract (default): grayscale โ†’ bilateral filter โ†’ Otsu threshold โ†’ morphology open โ†’ pytesseract. HEIC/HEIF handled via pillow-heif โ†’ temp PNG.
  • Ollama vision (alternative): image โ†’ PNG bytes โ†’ base64 โ†’ multimodal chat completion via OpenAI-compatible API. Switch with OCR_BACKEND=ollama_vision in .env โ€” no code change required.
  • Typed error handling: OCRException with a typed OCRErrorType enum. The import page translates each error type into an actionable user-facing hint.
Fuzzy Search
  • Normalizes query and candidate (lowercase, & โ†’ and, collapse whitespace)
  • Returns 1.0 for direct substring match
  • Otherwise: 0.65 ร— token_overlap + 0.35 ร— SequenceMatcher_ratio
  • Token overlap handles multi-word queries; SequenceMatcher handles misspellings and OCR noise
  • Thresholds configurable via .env without code changes
Config & Tooling
  • pydantic-settings for all configuration โ€” .env support is automatic; no scattered os.getenv()
  • uv for dependency and environment management
  • Streamlit @st.cache_resource ensures a single DB connection per server process
  • Stable Streamlit widget keys via make_form_key(action, entity_id) to avoid form key collisions across dynamic renders
What I'd Build Next
  • LLM-assisted meal planning โ€” surface inventory items as context for a macro-balancing suggestion agent
  • Schema migrations โ€” replace manual ALTER TABLE with Alembic for safer field additions
  • Smarter search โ€” swap SequenceMatcher for BM25 or lightweight embeddings for better recall on synonyms
  • Barcode scanning โ€” integrate an open nutrition DB API for one-shot imports by barcode

Tech Stack

  • Language: Python 3.11+
  • Storage: SQLite ยท SQLAlchemy ORM
  • CLI: Typer ยท Rich
  • UI: Streamlit
  • Schemas: Pydantic ยท pydantic-settings
  • OCR (optional): pytesseract ยท OpenCV ยท pillow-heif ยท Ollama (OpenAI-compatible vision)
  • Environment: uv