Work

Systems I've shipped — explained the way I'd explain them in an interview.

Each case study covers the problem, the approach, the architecture, and the outcome. No screenshots without context.

Problem

Off-the-shelf chatbots either hallucinated citations or dropped context across multi-step queries. Faculty needed answers they could trust enough to publish in a course handbook.

Approach

Hybrid retrieval (BM25 + dense embeddings), cross-encoder reranking, then a constrained generator that refuses to answer when grounding confidence falls below a set threshold. State is managed through a LangGraph state machine with checkpointing for replay.

Architecture
  1. Ingestion → chunking on semantic boundaries → embeddings (OpenAI text-embedding-3) → pgvector
  2. Hybrid retriever combines lexical and dense recall, top-k = 20
  3. Cross-encoder rerank narrows to top 8 with calibrated scores
  4. Generator runs through LangGraph with a deterministic refusal node
  5. Postgres-backed checkpointer for conversation state and audit
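
The fusion-plus-refusal path above can be sketched in a few lines. This is a minimal illustration, not the shipped code: reciprocal rank fusion stands in for whatever fusion the production retriever uses, and the constants (k = 60, threshold = 0.5) are assumed values.

```python
# Illustrative sketch of hybrid retrieval fusion and the refusal gate.
# rrf_fuse merges two ranked lists of chunk ids with reciprocal rank
# fusion; answer_or_refuse drops the query when the best calibrated
# rerank score is below threshold. All constants are assumptions.

def rrf_fuse(lexical_ids, dense_ids, k=60, top_k=20):
    """Merge two rankings of chunk ids into one fused ranking."""
    scores = {}
    for ranking in (lexical_ids, dense_ids):
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def answer_or_refuse(reranked, threshold=0.5):
    """reranked: (chunk_id, calibrated_score) pairs, best first.
    Returns up to 8 grounded chunk ids, or None to refuse."""
    if not reranked or reranked[0][1] < threshold:
        return None  # deterministic refusal: grounding too weak to answer
    return [cid for cid, score in reranked if score >= threshold][:8]
```

In the real pipeline the refusal decision sits in its own LangGraph node, so a refusal is a recorded, replayable state transition rather than a generator quirk.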

Outcome

  • Citation accuracy: 94%
  • p95 latency: 1.8s
  • Sources indexed: 12k+

Problem

Faculty couldn't answer 'how is each team doing?' without manual chasing. Students didn't know what 'on track' looked like. Reviewers received PDFs with no context.

Approach

Built a structured workspace around the project lifecycle (proposal → milestones → review), then layered on AI-assisted progress summaries that read commit history and standup notes to produce honest weekly updates faculty actually read.

Architecture
  1. Django REST backend with role-based permissions (student / mentor / reviewer)
  2. React frontend with optimistic UI for status updates
  3. Background jobs summarize weekly activity per team
  4. Notification system routes only the changes that matter
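
The weekly summary job reduces to a small grouping pass. A hypothetical shape, with invented field names and a stand-in "what matters" rule:

```python
# Hypothetical sketch of the background digest job: group raw activity
# events per team and keep only the changes a reviewer would act on.
# Event fields and the MATERIAL set are illustrative assumptions.
from collections import defaultdict

MATERIAL = {"milestone_done", "milestone_missed", "status_changed"}

def weekly_digest(events):
    """events: dicts like {"team": ..., "kind": ..., "detail": ...}."""
    per_team = defaultdict(list)
    for e in events:
        if e["kind"] in MATERIAL:  # route only the changes that matter
            per_team[e["team"]].append(f'{e["kind"]}: {e["detail"]}')
    return {team: lines for team, lines in per_team.items() if lines}
```

Teams with no material activity drop out of the digest entirely, which is what keeps the notifications worth reading.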

Outcome

  • Active project teams: 40+
  • Faculty time saved per week: ~6 hrs
  • Adoption: department-wide

Problem

Existing aptitude tools optimize for grinding. Students hit plateaus they can't see, so they drill the same topic instead of the next one.

Approach

Modeled topics as a directed graph with prerequisites. Per-skill scores update with every attempt; the engine selects the next test set from the frontier of weakest skills with at-risk dependencies.

Architecture
  1. Topic graph with weighted edges for prerequisite strength
  2. Per-skill score model updated via Bayesian smoothing
  3. Test composer pulls from question bank balanced by skill targets
  4. React frontend with progress visualisations students actually understand
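
The scoring-and-selection loop above can be sketched with a Beta-Bernoulli model, one plausible reading of "Bayesian smoothing"; the mastery bar and all numbers here are assumptions:

```python
# Sketch of per-skill scores and frontier selection. Each skill keeps a
# Beta(alpha, beta) posterior; the frontier is the set of unmastered
# skills whose prerequisites already clear the mastery bar, weakest first.

def update_skill(alpha, beta, correct):
    """Beta-Bernoulli update: one attempt adds one count."""
    return (alpha + 1, beta) if correct else (alpha, beta + 1)

def mastery(alpha, beta):
    return alpha / (alpha + beta)  # posterior mean

def frontier(skills, prereqs, bar=0.7):
    """skills: {name: (alpha, beta)}; prereqs: {name: [prereq names]}."""
    ready = [
        s for s, ab in skills.items()
        if mastery(*ab) < bar
        and all(mastery(*skills[p]) >= bar for p in prereqs.get(s, []))
    ]
    return sorted(ready, key=lambda s: mastery(*skills[s]))
```

The test composer then draws questions targeting the head of this list, which is what stops students drilling an already-mastered topic.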

Outcome

  • Question bank: 1,500+
  • Test types: 12
  • Status: in production

Problem

General-purpose social platforms reward outrage and recency. Knowledge communities lose their best contributors to noise.

Approach

Topic-first navigation with structured threads, contributor reputation tied to peer endorsement, and zero algorithmic ordering by default.

Architecture
  1. Topic graph with moderator-curated taxonomy
  2. Thread model with parent / reply / endorsement edges
  3. Reputation system computed nightly from peer signals
  4. Search built on Postgres full-text + topic filters
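
One plausible shape for the nightly reputation pass, with endorsements weighted by the endorser's own standing; the base and weight constants are invented for illustration:

```python
# Illustrative nightly reputation computation: a contributor's score is a
# base plus endorsements weighted by each endorser's score from the
# previous night. base=1.0 and weight=0.1 are assumed values.
from collections import defaultdict

def nightly_reputation(endorsements, prev, base=1.0, weight=0.1):
    """endorsements: (endorser, endorsee) pairs; prev: {user: last score}."""
    scores = defaultdict(lambda: base)
    for endorser, endorsee in endorsements:
        scores[endorsee] += weight * prev.get(endorser, base)
    return dict(scores)
```

Reading last night's scores on the right-hand side keeps the pass order-independent and cheap enough to run as a batch job.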

Outcome

  • Beta communities: 5
  • Avg. thread depth: 8 replies
  • Status: pilot

Want the architecture diagrams?

Happy to walk through any of these systems in detail — including trade-offs, failure modes, and what I'd do differently next time.

Get in touch