ProductLens
Agentic AI demo: LLM-driven product comparison tool that ranks products based on user priorities.
Live Demo · Source
- Problem: Product research is slow and inconsistent when criteria are vague or multi-objective.
- Approach: Plan from user intent → research candidates in parallel → normalize findings → score + recommend with tradeoffs.
- Outcome: Deployed a tool-enabled comparison workflow with a simple UI; produces decision-ready summaries from live web sources.
Snapshots
Quick look at the UI/workflow.
ProductLens turns a natural-language request (e.g., “best noise-cancelling headphones for travel and calls”) into a structured comparison: it plans what to evaluate, gathers evidence per product, and returns a ranked recommendation with tradeoffs.
Key components:
- Planner: converts intent into clear evaluation criteria and constraints (see the contract sketch after this list).
- Parallel research: spawns one researcher per product to speed up gathering and reduce omissions.
- Normalizer: aligns findings into consistent fields, then scores with transparent tradeoffs.
- Light tool use: a web browsing tool feeds evidence into the comparison.
- UI: Gradio demo interface; deployed on Hugging Face Spaces.
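A minimal sketch of what the planner contract and a researcher's normalized findings could look like as Pydantic models; the field names here are illustrative, not the deployed schema:

```python
from pydantic import BaseModel, Field


class ComparisonPlan(BaseModel):
    """Planner output: the 'contract' that downstream agents follow."""
    criteria: list[str] = Field(description="What to evaluate, e.g. 'ANC quality'")
    constraints: list[str] = Field(description="Hard limits, e.g. 'under $400'")
    candidates: list[str] = Field(description="Products to research")


class ProductFindings(BaseModel):
    """One researcher's normalized result for a single product."""
    product: str
    scores: dict[str, float]   # criterion -> 0..1 score
    notes: dict[str, str]      # criterion -> short evidence summary
    sources: list[str]         # URLs backing the findings
```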
Workflow
User Request
→ Orchestrator (Comparison Manager)
→ Planner Agent (criteria + candidates)
→ Research Agents (parallel per product)
→ Comparator / Decision Agent (score + tradeoffs)
→ Output (Ranked Recommendation + Table + Sources)
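A rough sketch of that pipeline, assuming the OpenAI Agents SDK's `Agent`/`Runner`/`WebSearchTool` interfaces and reusing the contract models sketched above; agent names and instructions are abbreviated, not the deployed prompts:

```python
import asyncio

from agents import Agent, Runner, WebSearchTool

planner = Agent(
    name="Planner",
    instructions="Turn the user's request into criteria, constraints, and candidate products.",
    output_type=ComparisonPlan,
)
researcher = Agent(
    name="Researcher",
    instructions="Research one product against the given criteria and cite sources.",
    tools=[WebSearchTool()],
    output_type=ProductFindings,
)
comparator = Agent(
    name="Comparator",
    instructions="Score each product per criterion and explain the tradeoffs.",
)


async def compare(request: str) -> str:
    plan = (await Runner.run(planner, request)).final_output
    # Fan out: one researcher run per candidate, executed concurrently.
    runs = await asyncio.gather(*(
        Runner.run(researcher, f"Product: {p}\nCriteria: {plan.criteria}")
        for p in plan.candidates
    ))
    findings = [r.final_output for r in runs]
    result = await Runner.run(comparator, f"Plan: {plan}\nFindings: {findings}")
    return result.final_output
```

The per-product fan-out mirrors the "Research Agents (parallel per product)" step: each researcher sees only its own product and the shared criteria, which keeps prompts small and makes omissions easier to spot.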
Tech stack
- Runtime: Python
- Agent framework: OpenAI Agents SDK (tool calls + traces)
- LLMs: local (Ollama) and/or cloud clients (OpenAI/OpenRouter); see the client-swap sketch below
- Tooling: Web search tool (or equivalent HTTP-based search)
- UI: Gradio
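Swapping between a local and a cloud model can come down to pointing one OpenAI-compatible client at a different base URL. A minimal sketch assuming Ollama's OpenAI-compatible endpoint and the Agents SDK's default-client hooks; the environment variable and URL are examples, not the deployed configuration:

```python
import os

from openai import AsyncOpenAI
from agents import set_default_openai_api, set_default_openai_client

if os.getenv("USE_LOCAL_LLM"):
    # Ollama serves an OpenAI-compatible API under /v1; the api_key is a placeholder.
    client = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    set_default_openai_api("chat_completions")  # local servers speak Chat Completions, not Responses
else:
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

set_default_openai_client(client)
```

The same pattern works for OpenRouter, which also exposes an OpenAI-compatible endpoint.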
Engineering Notes
- Focused agents: each agent does one job (plan, research, compare), which keeps prompts small and outputs predictable.
- Structured intermediate outputs: planner produces a "contract" (sketched under Key components) so downstream agents stay consistent and comparable.
- Source grounding: captures/returns sources used during research for transparency (see the sketch after this list).
- Failure modes: conflicting specs, outdated reviews, and ambiguous user constraints (e.g., “best” without priorities).
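As an illustration of how grounded findings could become the decision-ready table, a sketch that ranks by weighted criterion scores and carries the sources through to the output, reusing `ProductFindings` from the contract sketch above; the weights and formatting are illustrative:

```python
def build_summary(findings: list[ProductFindings], weights: dict[str, float]) -> str:
    """Rank products by weighted criterion scores and emit a markdown table with sources."""
    def total(f: ProductFindings) -> float:
        return sum(weights.get(c, 1.0) * s for c, s in f.scores.items())

    ranked = sorted(findings, key=total, reverse=True)
    rows = [
        "| Rank | Product | Weighted score | Sources |",
        "| --- | --- | --- | --- |",
    ]
    for i, f in enumerate(ranked, start=1):
        rows.append(f"| {i} | {f.product} | {total(f):.2f} | {', '.join(f.sources)} |")
    return "\n".join(rows)
```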
Limitations
- Not a full price tracker; results depend on what sources are available at query time.
- Not production-hardened (no persistent storage, ranking audits, or robust source filtering).
- Evaluation is qualitative; the next step is a repeatable set of test queries and scoring.