Ramakrishna
AI Project

Real-Time LLM Enrichment Pipeline

2026 · Java · Apache Kafka · LLM APIs · Redis · Kubernetes

Overview

A streaming service that enriches live Kafka events with LLM-generated labels and scores in near real time, built to stay within a fixed latency and cost budget under heavy load.

The problem

Calling the model inline for every event does not scale: each call adds latency to the stream, and cost grows with event volume. The goal was real-time enrichment that degrades gracefully instead of blocking the stream.
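
As a rough illustration of "degrade instead of block", here is a minimal Java sketch, not the project's actual code. It assumes the model call can be wrapped in a plain function; BoundedEnricher, Enriched, and the timeout value are hypothetical names chosen for the example. If the call is slow or fails, the event is forwarded unenriched.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch: bound each model call with a timeout and fall back to passthrough.
final class BoundedEnricher {
    /** Hypothetical result type: the original event plus an optional label. */
    record Enriched(String event, String label, boolean enriched) {}

    private final Function<String, String> model; // stand-in for a real LLM client call
    private final long timeoutMillis;             // assumed per-event latency budget

    BoundedEnricher(Function<String, String> model, long timeoutMillis) {
        this.model = model;
        this.timeoutMillis = timeoutMillis;
    }

    CompletableFuture<Enriched> enrich(String event) {
        // The unenriched event is what flows downstream when the model is slow or fails.
        Enriched passthrough = new Enriched(event, null, false);
        return CompletableFuture
                .supplyAsync(() -> new Enriched(event, model.apply(event), true))
                // Completes with the passthrough value if the call has not finished in time.
                // Note: this does not cancel the underlying call, it only stops waiting for it.
                .completeOnTimeout(passthrough, timeoutMillis, TimeUnit.MILLISECONDS)
                // Any exception from the model call also degrades to passthrough.
                .exceptionally(err -> passthrough);
    }
}
```

The enriched flag lets downstream consumers tell a labeled event from a passthrough, so unenriched events can be backfilled later if that matters.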

Approach

  • Treated the model as a rate-limited dependency: batching, caching, and budget-aware passthrough.
  • Added a reliability layer with retries, timeouts, and a circuit breaker.
  • Built an eval suite that gates deploys on label quality.
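
The first two points can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the production implementation. An in-memory map stands in for Redis, the budget is a plain per-window call counter, and the circuit breaker is a bare consecutive-failure count with no half-open state. Batching and retries are left out for brevity, and BudgetedEnricher and its parameters are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: cache first, then breaker and budget checks, then the model call.
final class BudgetedEnricher {
    private final Function<String, String> model;          // stand-in for a real LLM client call
    private final Map<String, String> cache = new HashMap<>(); // stand-in for Redis
    private final int callBudget;        // assumed max model calls per window
    private final int failureThreshold;  // consecutive failures before the breaker opens
    private int callsThisWindow = 0;
    private int consecutiveFailures = 0;

    BudgetedEnricher(Function<String, String> model, int callBudget, int failureThreshold) {
        this.model = model;
        this.callBudget = callBudget;
        this.failureThreshold = failureThreshold;
    }

    /** Returns a label, or empty when the event should pass through unenriched. */
    synchronized Optional<String> label(String key) {
        String cached = cache.get(key);
        if (cached != null) return Optional.of(cached);             // cache hit costs nothing
        if (consecutiveFailures >= failureThreshold) return Optional.empty(); // breaker open
        if (callsThisWindow >= callBudget) return Optional.empty();           // budget spent
        callsThisWindow++;
        try {
            String label = model.apply(key);
            consecutiveFailures = 0;                                // success closes the streak
            cache.put(key, label);
            return Optional.of(label);
        } catch (RuntimeException e) {
            consecutiveFailures++;                                  // failures are never cached
            return Optional.empty();
        }
    }

    /** Called by a scheduler at the start of each budget window (scheduling not shown). */
    synchronized void resetWindow() { callsThisWindow = 0; }

    /** Manual breaker reset; a fuller version would probe with a half-open state. */
    synchronized void resetBreaker() { consecutiveFailures = 0; }
}
```

Checking the cache before the breaker and the budget means repeated keys are still enriched while the model is unavailable or the budget is exhausted.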

Outcome

Sustained steady throughput, with enrichment landing within roughly [X] ms of the source event at a cost of about [Y] per million events. Real figures go here once the run is published.
