Off-the-shelf chatbots either hallucinated citations or dropped context across multi-step queries. Faculty needed answers they could trust enough to publish in a course handbook.
Hybrid retrieval (BM25 + dense embeddings) feeds cross-encoder reranking, then a constrained generator that refuses to answer when grounded confidence drops below a threshold. State is managed through a LangGraph state machine, with checkpointing for replay.
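The write-up does not say how lexical and dense results are merged. The sketch below assumes reciprocal rank fusion (RRF, k = 60) over a from-scratch Okapi BM25 and cosine similarity on precomputed vectors; toy token lists and 2-D vectors stand in for real chunks and embeddings, and pgvector is not involved. `hybrid_search`, `bm25_scores` and `rrf` are hypothetical names.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized doc for the query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()  # number of docs containing each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rrf(rankings, k=60):
    """Reciprocal rank fusion (assumed fusion method): sum 1 / (k + rank) across rankings."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(fused, key=lambda d: (-fused[d], d))


def hybrid_search(query_terms, query_vec, docs, doc_vecs, top_k=20):
    lexical = bm25_scores(query_terms, docs)
    dense = [cosine(query_vec, v) for v in doc_vecs]
    # Lexical list keeps only docs that match at least one query term.
    lex_rank = [i for i in sorted(range(len(docs)), key=lambda i: -lexical[i]) if lexical[i] > 0]
    dense_rank = sorted(range(len(docs)), key=lambda i: -dense[i])
    return rrf([lex_rank, dense_rank])[:top_k]
```

RRF is used here because it merges rankings without having to put BM25 scores and cosine similarities on a common scale; a weighted score blend is an equally plausible choice the source does not rule out.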
- 01 Ingestion → chunking on semantic boundaries → embeddings (OpenAI text-embedding-3) → pgvector
- 02 Hybrid retriever combines lexical and dense recall, top-k = 20
- 03 Cross-encoder rerank narrows the candidates to the top 8, with calibrated scores
- 04 Generator runs through LangGraph with a deterministic refusal node
- 05 Postgres-backed checkpointer for conversation state and audit trail
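Steps 03 to 05 can be sketched in plain Python. This deliberately avoids LangGraph so it runs on the standard library alone, and it is not LangGraph's API. Everything here is an assumption: `score_fn` stands in for the cross-encoder and returns a calibrated score in [0, 1]; "grounded confidence" is approximated by the top reranked score against a threshold of 0.5; an in-memory SQLite table stands in for the Postgres checkpointer; `toy_score` and the generator passed in are illustrative only.

```python
import json
import sqlite3

REFUSAL = "I can't answer that from the course sources with enough confidence."


class Checkpointer:
    """Hypothetical stand-in for a Postgres-backed checkpointer (in-memory SQLite here)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE checkpoints (thread_id TEXT, step INTEGER, node TEXT, state TEXT)"
        )

    def save(self, thread_id, step, node, state):
        # One row per node execution: enough to replay or audit a conversation.
        self.db.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?, ?)",
            (thread_id, step, node, json.dumps(state)),
        )
        self.db.commit()

    def history(self, thread_id):
        rows = self.db.execute(
            "SELECT node, state FROM checkpoints WHERE thread_id = ? ORDER BY step",
            (thread_id,),
        )
        return [(node, json.loads(state)) for node, state in rows]


def rerank_node(state, score_fn, keep=8):
    # score_fn is a hypothetical cross-encoder stand-in returning a calibrated score in [0, 1].
    scored = sorted(
        ((score_fn(state["question"], p), p) for p in state["candidates"]),
        key=lambda sp: sp[0],
        reverse=True,
    )[:keep]
    return {**state, "passages": [p for _, p in scored], "scores": [s for s, _ in scored]}


def route(state, threshold):
    # Deterministic gate: the top calibrated score is an assumed proxy for grounded confidence.
    top = state["scores"][0] if state["scores"] else 0.0
    return "generate" if top >= threshold else "refuse"


def refuse_node(state):
    return {**state, "answer": REFUSAL, "refused": True}


def generate_node(state, generate_fn):
    # generate_fn is a hypothetical constrained generator that sees only the kept passages.
    return {**state, "answer": generate_fn(state["question"], state["passages"]), "refused": False}


def run(thread_id, question, candidates, score_fn, generate_fn, checkpointer, threshold=0.5):
    state = {"question": question, "candidates": candidates}
    checkpointer.save(thread_id, 0, "retrieve", state)
    state = rerank_node(state, score_fn)
    checkpointer.save(thread_id, 1, "rerank", state)
    branch = route(state, threshold)
    state = generate_node(state, generate_fn) if branch == "generate" else refuse_node(state)
    checkpointer.save(thread_id, 2, branch, state)
    return state


def toy_score(question, passage):
    # Hypothetical demo scorer: share of question words that appear in the passage.
    q = set(question.lower().split())
    return len(q & set(passage.lower().split())) / len(q)
```

With candidates such as "the late penalty policy is 10% per day", the question "late penalty policy" scores 1.0 and takes the generate branch, while "parking permit fees" scores 0.0 and is refused; `history()` then lists which node wrote each checkpoint for that thread. Because the gate is a pure comparison on stored scores, replaying a checkpoint reproduces the same branch.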