Real-Time LLM Enrichment Pipeline

Overview

A streaming service that enriches live Kafka events with LLM-generated labels and scores in near real time, built to stay within a fixed latency and cost budget under heavy load.

The problem

Calling the model inline for every event does not scale in either latency or cost. The goal was real-time enrichment that degrades gracefully instead of blocking the stream.

Approach

Treated the model as a rate-limited dependency: batching, caching, and budget-aware pass-through, so events continue downstream unenriched when the budget is exhausted.
Added a reliability layer with retries, timeouts, and a circuit breaker.
Built an eval suite that gates deploys on label quality.
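A minimal Python sketch of the batching, caching, and budget-aware pass-through described above. Everything here is an assumption rather than the pipeline's actual code. `label_batch` is a hypothetical stand-in for one batched model call. The budget is modeled as a token bucket that charges one unit per uncached text, and the cache is an unbounded in-process dict keyed on event text.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    text: str
    label: Optional[str] = None  # None means the event passed through unenriched


class TokenBucket:
    """Hypothetical budget: refills `rate` units per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_take(self, cost: float) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class Enricher:
    def __init__(self, label_batch: Callable[[List[str]], List[str]],
                 budget: TokenBucket, batch_size: int = 32) -> None:
        self.label_batch = label_batch  # hypothetical batched model call
        self.budget = budget
        self.batch_size = batch_size
        self.cache: Dict[str, str] = {}  # unbounded; illustration only

    def enrich(self, events: List[Event]) -> List[Event]:
        # Unique uncached texts, in first-seen order (dicts keep insertion order).
        pending = list(dict.fromkeys(e.text for e in events if e.text not in self.cache))
        for i in range(0, len(pending), self.batch_size):
            batch = pending[i:i + self.batch_size]
            # Charge one budget unit per text; skip the batch if the budget is spent.
            if not self.budget.try_take(len(batch)):
                continue
            for text, label in zip(batch, self.label_batch(batch)):
                self.cache[text] = label
        # Cached texts get a label; everything else passes through with None.
        for e in events:
            e.label = self.cache.get(e.text)
        return events
```

Deduplicating before batching means repeated texts in the same poll cost one model slot. An event whose batch did not fit the budget leaves with `label=None` instead of waiting. A production cache would likely need an eviction policy (LRU or TTL) and a key that covers the prompt or model version as well as the text.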
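The reliability layer could look roughly like the following asyncio sketch. It wraps a single model call in a per-attempt timeout, retries with jittered exponential backoff, and trips a consecutive-failure circuit breaker. The thresholds, the choice of timeouts and `ConnectionError` as the retryable errors, and the names `CircuitBreaker`, `CircuitOpenError`, and `call_with_resilience` are illustrative assumptions, not the pipeline's actual configuration.

```python
import asyncio
import random
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the model while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0  # consecutive failures
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed
        # Open until the cool-down elapses, then half-open: calls may try again.
        return self.clock() - self.opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the cool-down


async def call_with_resilience(fn: Callable[[], Awaitable[T]], breaker: CircuitBreaker, *,
                               attempts: int = 3, timeout_s: float = 2.0,
                               base_delay_s: float = 0.1) -> T:
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_exc: Optional[BaseException] = None
    for attempt in range(attempts):
        if not breaker.allow():
            raise CircuitOpenError("model circuit is open")
        try:
            # wait_for cancels the call if it exceeds the per-attempt timeout.
            result = await asyncio.wait_for(fn(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:  # assumed transient
            breaker.record_failure()
            last_exc = exc
            if attempt + 1 < attempts:
                # Exponential backoff with full jitter before the next attempt.
                await asyncio.sleep(random.uniform(0, base_delay_s * 2 ** attempt))
        else:
            breaker.record_success()
            return result
    assert last_exc is not None
    raise last_exc
```

When `CircuitOpenError` is raised, the caller can forward the event unenriched, which fits the goal of degrading instead of blocking the stream. Once the cool-down elapses the breaker lets calls through again. One success closes it, and one further failure re-opens it, because the failure count is only reset on success.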
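A deploy gate on label quality might reduce to a check like the one below. The metric (exact-match accuracy against a hand-labeled gold set), the absolute floor, and the allowed regression against the currently deployed model are assumptions for illustration. The actual eval suite may use different metrics, and `label_quality_gate` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GateResult:
    accuracy: float
    passed: bool


def label_quality_gate(predicted: Sequence[str], gold: Sequence[str], *,
                       min_accuracy: float = 0.9,
                       baseline_accuracy: Optional[float] = None,
                       max_regression: float = 0.01) -> GateResult:
    """Hypothetical gate: block a deploy if accuracy is too low or regresses."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and the same length")
    # Exact-match accuracy of candidate labels against the gold labels.
    accuracy = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
    passed = accuracy >= min_accuracy
    if baseline_accuracy is not None:
        # Also fail if the candidate is meaningfully worse than what is deployed.
        passed = passed and accuracy >= baseline_accuracy - max_regression
    return GateResult(accuracy, passed)
```

In CI, a failing result would typically become a non-zero exit status (for example `sys.exit(1)`) so the deploy step never runs.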

Outcome

Sustained steady throughput, with enrichment landing within roughly [X] ms of the source event at a cost of about [Y] per million events. Real figures go here once the run is published.