Building InsightFlow: LangGraph Implementation & Production Metrics

Technical Deep-Dive and Real Pilot Performance Data from Two-Phase Development

This is Part 2 of my multi-agent AI orchestration analysis. In Part 1, I compared the strategic frameworks. Now for the implementation reality.

After choosing LangGraph for InsightFlow's multi-agent coordination (covered in Part 1), the real work began: building a system that could orchestrate specialized AI agents into coherent learning conversations. Here's what I've learned across two pilot iterations.


Table of Contents

  1. Two-Phase Pilot Development
  2. Current Architecture
  3. Agent Coordination Pattern
  4. State Management Reality
  5. Evaluation Results
  6. Production Learnings
  7. Architectural Decision Guide
  8. What's Next

1. Two-Phase Pilot Development

Phase 1: Chainlit-Based Prototype

My initial implementation used Chainlit as the primary interface, which proved invaluable for rapid prototyping and understanding multi-agent coordination patterns.

What worked with Chainlit:

Key learnings from Phase 1:

Phase 2: Production Architecture (Current)

Phase 1 taught me that streaming coordination matters, so in Phase 2 I moved the backend to Google Cloud for better async control while preserving the real-time patterns that worked.
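The streaming pattern that survived the migration can be sketched with plain asyncio: agents run in order, but each token is surfaced as soon as it arrives rather than after the whole pipeline finishes. The agent names and token values here are illustrative stand-ins for real LLM streams.

```python
import asyncio
from typing import AsyncIterator

# Hypothetical agent streams; a real node would stream tokens from an LLM call.
async def agent_stream(name: str, tokens: list) -> AsyncIterator[str]:
    for tok in tokens:
        await asyncio.sleep(0)  # yield control, as awaiting a network chunk would
        yield f"{name}:{tok}"

async def coordinate_stream() -> list:
    """Run agents in order but surface tokens as they arrive, so the frontend
    renders incremental output instead of waiting for the full pipeline."""
    emitted = []
    for name, tokens in [("analyst", ["a1", "a2"]), ("synthesis", ["s1"])]:
        async for tok in agent_stream(name, tokens):
            emitted.append(tok)  # in a FastAPI streaming endpoint: `yield tok`
    return emitted

result = asyncio.run(coordinate_stream())
print(result)
```

In production the `emitted.append` line becomes a `yield` inside a FastAPI `StreamingResponse` generator, which is exactly the "async control" the move away from Chainlit was meant to buy.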

Current Architecture:

[Diagram: Current System Architecture]


2. Current Architecture

Technical Stack:

Full Architecture Details
FastAPI Backend (Google Cloud)
├── LangGraph Orchestration Layer
│   ├── Agent Nodes: analyst_agent, creative_agent, synthesis_agent
│   ├── Shared State Management
│   └── Real-time streaming coordination
├── PostgreSQL (JSONB conversation state)
└── React/Next.js Frontend (Vercel)
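A practical consequence of the "PostgreSQL (JSONB conversation state)" choice is that the whole conversation state must round-trip through JSON. A minimal sketch of that contract, with illustrative field values:

```python
import json

# Hypothetical minimal conversation state matching the JSONB design:
# one JSON document per conversation row.
state = {
    "messages": [{"role": "human", "content": "Explain recursion"}],
    "current_agent": "analyst_agent",
    "coordination_metadata": {"last_agent": None},
}

# What an INSERT ... VALUES (%s::jsonb) parameter would carry:
payload = json.dumps(state)

# Round-trip: what a SELECT hands back to the orchestration layer.
restored = json.loads(payload)
print(restored["current_agent"])
```

Anything that is not JSON-serializable (raw message objects, open handles) has to be converted at the storage boundary, which is worth deciding early.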

Architecture decisions:


3. Agent Coordination Pattern

The core insight from both phases: treat agent coordination as a state machine where each agent contributes to shared conversation context.

Agent Roles Refined Through Testing:

  1. Analyst Agent - Breaks down complex topics into structured components
  2. Creative Agent - Provides analogies, examples, and alternative perspectives
  3. Synthesis Agent - Combines insights from previous agents into coherent responses
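One way to express these specialized roles is a single node factory parameterized by a role prompt, so every agent shares the same state interface. The prompts and the `echo_llm` stub below are illustrative, not the production prompts:

```python
# Hypothetical role prompts; the real system would send these to an LLM.
ROLE_PROMPTS = {
    "analyst_agent": "Break the topic into structured components.",
    "creative_agent": "Offer analogies and alternative perspectives.",
    "synthesis_agent": "Combine prior agent insights into one response.",
}

def make_agent(role, llm):
    """Return a node function that applies one role's prompt to shared state."""
    def node(state):
        prompt = f"{ROLE_PROMPTS[role]}\n\nConversation: {state['messages']}"
        reply = llm(prompt)
        return {"messages": state["messages"] + [reply], "current_agent": role}
    return node

# Stub LLM standing in for a real model call.
echo_llm = lambda prompt: f"({len(prompt)} chars processed)"
analyst = make_agent("analyst_agent", echo_llm)
out = analyst({"messages": ["Explain recursion"]})
print(out["current_agent"])
```

The payoff is that adding a fourth role is a one-line prompt change, not a new node implementation.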

State Management:

from typing import Dict, List, TypedDict

from langchain_core.messages import AIMessage, HumanMessage


class ConversationState(TypedDict):
    messages: List[HumanMessage | AIMessage]
    current_agent: str
    conversation_history: Dict
    coordination_metadata: Dict

Each agent accesses the full conversation state while maintaining its specialized processing logic.
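The state-machine framing can be sketched without LangGraph itself: nodes are functions over the shared state, and edges fix the execution order. This is a toy stand-in for LangGraph's `StateGraph` wiring, with illustrative node bodies:

```python
# Minimal stand-in for graph wiring: each node reads the shared state and
# returns an updated copy; the `order` list plays the role of the edges.
def run_pipeline(state, nodes, order):
    for name in order:
        state = nodes[name](state)
        state["coordination_metadata"]["last_agent"] = name
    return state

nodes = {
    "analyst_agent": lambda s: {**s, "messages": s["messages"] + ["analysis"]},
    "creative_agent": lambda s: {**s, "messages": s["messages"] + ["analogy"]},
    "synthesis_agent": lambda s: {**s, "messages": s["messages"] + ["summary"]},
}
state = {"messages": ["user question"], "coordination_metadata": {}}
final = run_pipeline(
    state, nodes, ["analyst_agent", "creative_agent", "synthesis_agent"]
)
print(final["messages"])
```

What LangGraph adds over this sketch is conditional edges, checkpointing, and streaming, but the mental model of "each node contributes to shared context" is the same.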

[Diagram: Agent Coordination State Machine]


4. State Management Reality

LangGraph + PostgreSQL Implementation

Checkpoint-based persistence (current design):
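The core idea of checkpoint-based persistence is "one snapshot per step, recover from the newest". A toy in-memory sketch of that design (the `CheckpointStore` class is illustrative, not LangGraph's checkpointer API; in production this would be a LangGraph checkpointer backed by the PostgreSQL instance):

```python
import json
from typing import Optional

class CheckpointStore:
    """Toy checkpoint store: one JSON snapshot per pipeline step per thread."""

    def __init__(self):
        self._rows = {}  # thread_id -> list of JSON snapshots

    def save(self, thread_id, state):
        self._rows.setdefault(thread_id, []).append(json.dumps(state))

    def latest(self, thread_id) -> Optional[dict]:
        snaps = self._rows.get(thread_id)
        return json.loads(snaps[-1]) if snaps else None

store = CheckpointStore()
store.save("conv-1", {"messages": ["q"], "current_agent": "analyst_agent"})
store.save("conv-1", {"messages": ["q", "a"], "current_agent": "synthesis_agent"})

# Recovery after a crash: resume from the newest checkpoint for that thread.
print(store.latest("conv-1")["current_agent"])
```

This is what makes "automatic state recovery" in the comparison below cheap: recovery is a read, not a replay.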

State synchronization:

# Simplified state update pattern
def update_conversation_state(
    state: ConversationState, agent_response: str, agent_type: str
) -> ConversationState:
    state["messages"].append(AIMessage(content=agent_response))
    state["coordination_metadata"]["last_agent"] = agent_type
    state["conversation_history"]["agent_contributions"] += 1
    return state

Comparison with Assistants API

State Management Comparison: Checkpoint vs Thread Persistence

| Approach | State Management | Production Reality |
| --- | --- | --- |
| Assistants API | Thread-based with 60-day timeout | Manual thread ID tracking, context window management up to 128k |
| LangGraph + PostgreSQL | Checkpoint-based state persistence | Automatic state recovery, unlimited conversation history |

5. Evaluation Results

Testing Approach: Evaluated 25 conversations from Phase 1 Chainlit testing across 5 learning scenarios.

Implementation:

from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

results = evaluate(dataset, metrics=[context_precision, faithfulness, answer_relevancy])
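For context, the `dataset` fed to RAGAS needs records with `question`, `answer`, and `contexts` columns (plus `ground_truth` for some metrics). A sketch of preparing that shape from conversation logs, with illustrative values:

```python
# One evaluation record per pilot conversation; values are illustrative.
records = [
    {
        "question": "What is spaced repetition?",
        "answer": "A scheduling technique that revisits material over time.",
        "contexts": ["Spaced repetition spaces reviews at increasing intervals."],
        "ground_truth": "Reviewing material at increasing intervals.",
    }
]

# RAGAS consumes a column-wise dataset built from these records
# (e.g. datasets.Dataset.from_dict(columns)).
columns = {key: [r[key] for r in records] for key in records[0]}
print(sorted(columns))
```

Getting this mapping right per agent turn (which agent's output counts as the "answer"?) was itself a design decision in the multi-agent setting.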

Coordination Quality Results

Phase 1 RAGAS Scores (n=25 conversations):

Performance Comparison

| Phase | Average Latency | Infrastructure | Development Speed |
| --- | --- | --- | --- |
| Phase 1 (Chainlit) | ≈2.1 s | Single service | Extremely fast prototyping |
| Phase 2 (FastAPI) | ≈1.8 s | Multi-service | Slower setup, more control |

6. Production Learnings

Error Recovery Patterns

What I discovered: the current design handles errors through graceful degradation. When one agent fails, the system continues with the remaining agents, preserves the conversation context, and applies retry logic so the failed agent can rejoin coordination on the next interaction.

# Example: Graceful degradation when Creative Agent fails
if creative_agent_available:
    response = await orchestrate_full_pipeline(state)
else:
    response = await analyst_synthesis_fallback(state)
    # Log degradation for monitoring
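The retry side of that pattern can be sketched as follows. The helper name, backoff values, and `flaky_agent` are illustrative; the point is that a repeatedly failing agent is skipped this turn and flagged to rejoin on the next one:

```python
import asyncio

async def call_agent_with_retry(agent, state, retries=1):
    """Retry a failing agent a bounded number of times; on final failure,
    mark it for a retry on the next interaction and let the caller degrade."""
    for attempt in range(retries + 1):
        try:
            return await agent(state)
        except RuntimeError:
            if attempt == retries:
                state.setdefault("coordination_metadata", {})["retry_next_turn"] = True
                return None  # caller falls back to the remaining agents
            await asyncio.sleep(0.1 * (attempt + 1))  # simple linear backoff

async def flaky_agent(state):
    raise RuntimeError("model endpoint unavailable")

state = {"coordination_metadata": {}}
result = asyncio.run(call_agent_with_retry(flaky_agent, state, retries=1))
print(result, state["coordination_metadata"]["retry_next_turn"])
```

Keeping the flag inside `coordination_metadata` means the rejoin decision survives the checkpoint round-trip along with the rest of the state.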

Cost Optimization Strategies

Debugging Multi-Agent Workflows

LangGraph Studio's visual debugging proved essential for:


7. Architectural Decision Guide

Based on testing both approaches, here's when each architecture makes sense:

When Chainlit-Style Architecture Works

When FastAPI + Custom Frontend Wins

The Hybrid Approach

Many teams might benefit from:

  1. Prototype Phase (Chainlit): Validate coordination patterns quickly
  2. Production Graduation (Custom): Scale validated patterns based on proven concepts
  3. Optimization Phase: Cost + performance tuning with full infrastructure control

8. What's Next

Current Status: Phase 2 FastAPI + React architecture actively running pilot evaluations with 15+ learning conversations weekly.

Next explorations: Applying cognitive load theory to agent coordination patterns—when does multi-agent thinking help vs. overwhelm learners? Exploring tighter coordination patterns based on learning science research, particularly around spaced repetition triggers.

Technical Deep-Dives Coming: As the production system stabilizes, I'll share more detailed architecture patterns and evaluation frameworks.

What I'm pondering next:

If you're building similar systems, I'd love to hear your perspectives.



Acknowledgments

Special thanks to Laura Funderburk for encouraging me to share these technical explorations with the community.

Thanks also to the AI Maker Space team - Greg Loughnane and Chris Alexiuk - for exceptional instruction that shaped this exploration.

#AIEngineering #LangGraph #MultiAgentSystems #RAGAS #Chainlit #ProdAI


Got experience with multi-agent production systems? I'd especially love to hear about your error handling patterns and cost optimization strategies. What challenges have you solved that I'm still working on? 👇