FoodDB ๐
Local-first personal nutrition facts catalogue โ SQLite, Typer CLI, and Streamlit UI sharing one service layer, with a regex label parser and swappable OCR backends.
- Problem: Tracking nutrition facts across different stores and package sizes requires a tool that stores data once but is queryable from anywhere โ CLI for quick adds, UI for browsing.
- Approach: Layered architecture (ORM โ repository โ services โ dual CLI/UI surfaces); regex-based label parser with optional OCR for image imports; swappable OCR backends behind a clean interface.
- Outcome: A self-contained, portable local tool with consistent behavior across both surfaces, typed error handling, and an extension-friendly design.
Snapshots
Quick look at the UI/workflow.
FoodDB is a personal utility project that demonstrates software engineering discipline applied to a real problem. The CLI and Streamlit UI are both thin clients over the same repository and service layer โ they share one SQLite file, meaning CLI writes are immediately visible in the UI after a page refresh, with no sync required.
Core capabilities:
- Full CRUD for foods with complete nutrition facts (macros, micros, ingredients, best-by date)
- Store variants โ track the same product at Costco vs. Aldi with price and package sizing
- Inventory tracking per variant (quantity on hand)
- Import from label text (paste) or label image (upload): regex parser extracts structured fields; OCR converts image โ text โ parser
- Fuzzy search with token overlap + SequenceMatcher scoring; filterable by category, brand, store
- Export to JSON or CSV
- Unit conversion table with UI calculator
Architecture
CLI (Typer) Streamlit UI
\ /
\ /
โโ Services โโ
|
Repository
|
SQLAlchemy ORM
|
SQLite file
No HTTP server. Both surfaces call the same Python service layer directly.
The DB path comes from a single pydantic-settings config object โ change it in one place and both surfaces follow.
Engineering Notes
Design Decisions
- ORM / Pydantic separation โ SQLAlchemy ORM classes map to DB tables; Pydantic schemas are the API layer. No ORM objects leak above the repository boundary.
- foods vs food_variants โ canonical nutrition facts live on the food; store/price/package details live on variants. Avoids duplicating nutrition data across stores.
- ParsedNutrition.warnings list โ explicit, inspectable signal for uncertain extractions. The UI surfaces warnings before saving so the user can correct them, rather than silently saving wrong data.
-
Lazy OCR imports โ the package works without Tesseract or pytesseract installed.
A clear
RuntimeErrorwith install instructions is raised if OCR is called without extras.
OCR Backend Swapping
- Tesseract (default): grayscale โ bilateral filter โ Otsu threshold โ morphology open โ pytesseract. HEIC/HEIF handled via pillow-heif โ temp PNG.
-
Ollama vision (alternative): image โ PNG bytes โ base64 โ multimodal chat completion via OpenAI-compatible API.
Switch with
OCR_BACKEND=ollama_visionin.envโ no code change required. -
Typed error handling:
OCRExceptionwith a typedOCRErrorTypeenum. The import page translates each error type into an actionable user-facing hint.
Fuzzy Search
- Normalizes query and candidate (lowercase,
&โand, collapse whitespace) - Returns 1.0 for direct substring match
- Otherwise:
0.65 ร token_overlap + 0.35 ร SequenceMatcher_ratio - Token overlap handles multi-word queries; SequenceMatcher handles misspellings and OCR noise
- Thresholds configurable via
.envwithout code changes
Config & Tooling
pydantic-settingsfor all configuration โ.envsupport is automatic; no scatteredos.getenv()uvfor dependency and environment management- Streamlit
@st.cache_resourceensures a single DB connection per server process - Stable Streamlit widget keys via
make_form_key(action, entity_id)to avoid form key collisions across dynamic renders
What I'd Build Next
- LLM-assisted meal planning โ surface inventory items as context for a macro-balancing suggestion agent
- Schema migrations โ replace manual
ALTER TABLEwith Alembic for safer field additions - Smarter search โ swap SequenceMatcher for BM25 or lightweight embeddings for better recall on synonyms
- Barcode scanning โ integrate an open nutrition DB API for one-shot imports by barcode
Tech Stack
- Language: Python 3.11+
- Storage: SQLite ยท SQLAlchemy ORM
- CLI: Typer ยท Rich
- UI: Streamlit
- Schemas: Pydantic ยท pydantic-settings
- OCR (optional): pytesseract ยท OpenCV ยท pillow-heif ยท Ollama (OpenAI-compatible vision)
- Environment: uv