Building InsightFlow: LangGraph Implementation & Production Metrics
Technical Deep-Dive and Real Pilot Performance Data from two-phase development
This is Part 2 of my multi-agent AI orchestration analysis. In Part 1, I compared the strategic frameworks. Now for the implementation reality.
After choosing LangGraph for InsightFlow's multi-agent coordination (covered in Part 1), the real work began: building a system that could orchestrate specialized AI agents into coherent learning conversations. Here's what I've learned across two pilot iterations.
Table of Contents
- Two-Phase Pilot Development
- Current Architecture
- Agent Coordination Pattern
- State Management Reality
- Evaluation Results
- Production Learnings
- Architectural Decision Guide
- What's Next
1. Two-Phase Pilot Development
Phase 1: Chainlit-Based Prototype
My initial implementation used Chainlit as the primary interface, which proved invaluable for rapid prototyping and understanding multi-agent coordination patterns.
What worked with Chainlit:
- Rapid prototyping: From concept to working multi-agent interface in days
- Real-time streaming: Agent coordination visible to users as it happened
- Built-in RAGAS integration: Could evaluate conversation quality in real-time
- Development velocity: Perfect for experimenting with agent coordination patterns
Key learnings from Phase 1:
- Multi-agent coordination needs visual feedback for both developers and users
- Real-time streaming makes the "thinking process" transparent and engaging
- RAGAS metrics provide valuable signals for tuning agent interactions
- State management becomes critical as conversations extend beyond simple exchanges
Phase 2: Production Architecture (Current)
Phase 1 taught me that streaming coordination matters, so in Phase 2 I moved the backend to Google Cloud for better async control while preserving the real-time patterns that worked.
Current Architecture:
- FastAPI Backend (Google Cloud) - Python-friendly ML stack with better async processing
- React/Next.js Frontend (Vercel) - More flexible UI for complex interactions
- PostgreSQL - Robust state persistence for extended conversations
- LangGraph Orchestration - Refined agent coordination patterns from Phase 1
2. Current Architecture
Technical Stack:
FastAPI Backend (Google Cloud)
├── LangGraph Orchestration Layer
│   ├── Agent Nodes: analyst_agent, creative_agent, synthesis_agent
│   ├── Shared State Management
│   └── Real-time streaming coordination
├── PostgreSQL (JSONB conversation state)
└── React/Next.js Frontend (Vercel)
Architecture decisions:
- Google Cloud for FastAPI: Python-friendly ML stack with better async control
- Vercel for React frontend: Optimized for real-time WebSocket connections
- PostgreSQL over document stores: JSONB provides flexibility with ACID guarantees
- Retained streaming patterns: Chainlit taught me that visible coordination matters (a minimal streaming endpoint sketch follows this list)
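To make the retained streaming patterns concrete, here is a minimal sketch of how a FastAPI WebSocket endpoint can forward agent coordination events to the React frontend. The endpoint path, event shape, and the run_agent_pipeline generator are illustrative placeholders for the LangGraph orchestration covered in the next section, not the production code.

# Sketch: FastAPI WebSocket endpoint that streams agent coordination events.
# `run_agent_pipeline` is a placeholder for the real LangGraph orchestration.
from typing import AsyncIterator

from fastapi import FastAPI, WebSocket

app = FastAPI()


async def run_agent_pipeline(user_message: str) -> AsyncIterator[dict]:
    # Placeholder: in the real system this yields events from the LangGraph run.
    yield {"agent": "analyst_agent", "chunk": "Breaking the topic down..."}
    yield {"agent": "synthesis_agent", "chunk": "Putting it together..."}


@app.websocket("/ws/conversation")
async def conversation_ws(websocket: WebSocket) -> None:
    await websocket.accept()
    user_message = await websocket.receive_text()
    # Forward each coordination event as JSON so the UI can show which
    # agent is "thinking" in real time.
    async for event in run_agent_pipeline(user_message):
        await websocket.send_json(event)
    await websocket.close()

The point of the sketch is the shape of the contract: the frontend subscribes once and receives per-agent events, which is what makes the coordination visible to learners.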
3. Agent Coordination Pattern
The core insight from both phases: treat agent coordination as a state machine where each agent contributes to shared conversation context.
Agent Roles Refined Through Testing:
- Analyst Agent - Breaks down complex topics into structured components
- Creative Agent - Provides analogies, examples, and alternative perspectives
- Synthesis Agent - Combines insights from previous agents into coherent responses
State Management:
from typing import Dict, List, TypedDict

from langchain_core.messages import AIMessage, HumanMessage


class ConversationState(TypedDict):
    messages: List[HumanMessage | AIMessage]
    current_agent: str
    conversation_history: Dict
    coordination_metadata: Dict
Each agent accesses the full conversation state while maintaining its specialized processing logic.
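To make the state-machine framing concrete, here is a minimal sketch of how the three agents could be wired together with LangGraph, using the ConversationState above. The node functions are illustrative stubs standing in for the real agents, and the node names are my shorthand rather than the production identifiers.

# Sketch: wiring the three agents into a LangGraph state machine.
# The agent functions below are illustrative stubs, not the production nodes.
from langchain_core.messages import AIMessage
from langgraph.graph import StateGraph, START, END


def analyst_agent(state: ConversationState) -> dict:
    # Break the latest user message into structured components (stubbed).
    return {"messages": state["messages"] + [AIMessage(content="[analysis]")],
            "current_agent": "analyst_agent"}


def creative_agent(state: ConversationState) -> dict:
    # Add analogies, examples, and alternative perspectives (stubbed).
    return {"messages": state["messages"] + [AIMessage(content="[analogies]")],
            "current_agent": "creative_agent"}


def synthesis_agent(state: ConversationState) -> dict:
    # Combine the previous contributions into one coherent reply (stubbed).
    return {"messages": state["messages"] + [AIMessage(content="[synthesis]")],
            "current_agent": "synthesis_agent"}


workflow = StateGraph(ConversationState)
workflow.add_node("analyst", analyst_agent)
workflow.add_node("creative", creative_agent)
workflow.add_node("synthesis", synthesis_agent)

workflow.add_edge(START, "analyst")
workflow.add_edge("analyst", "creative")
workflow.add_edge("creative", "synthesis")
workflow.add_edge("synthesis", END)

graph = workflow.compile()

Each node reads the full shared state and returns only the keys it updates, which is what lets every agent see the whole conversation while keeping its own processing logic isolated.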
4. State Management Reality
LangGraph + PostgreSQL Implementation
Checkpoint-based persistence (current design; a checkpointing sketch follows this list):
- Conversation state saved to PostgreSQL at natural conversation breakpoints
- JSONB storage for flexible conversation state and learning analytics
- Automatic state recovery for conversations spanning multiple sessions
- No context window limitations for extended learning sessions
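As an illustration of how this looks in code, here is roughly how Postgres-backed checkpointing can be attached to the graph from the earlier sketch, assuming the langgraph-checkpoint-postgres package. The connection string, thread ID, and sample input are placeholders.

# Sketch: attaching Postgres-backed checkpointing, reusing the `workflow`
# graph from the earlier coordination sketch. Connection string and
# thread_id are placeholders.
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = "postgresql://user:password@localhost:5432/insightflow"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # create checkpoint tables on first run

    graph = workflow.compile(checkpointer=checkpointer)

    # The thread_id ties checkpoints to one learning conversation, so a
    # session can be resumed later without replaying the whole dialogue.
    config = {"configurable": {"thread_id": "learner-42-session-7"}}
    graph.invoke(
        {
            "messages": [HumanMessage(content="Can you explain gradient descent?")],
            "current_agent": "",
            "conversation_history": {"agent_contributions": 0},
            "coordination_metadata": {},
        },
        config,
    )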
State synchronization:
# Simplified state update pattern
def update_conversation_state(
    state: ConversationState, agent_response: str, agent_type: str
) -> ConversationState:
    state["messages"].append(AIMessage(content=agent_response))
    state["coordination_metadata"]["last_agent"] = agent_type
    state["conversation_history"]["agent_contributions"] += 1
    return state
Comparison with Assistants API
| Approach | State Management | Production Reality |
|---|---|---|
| Assistants API | Thread-based with 60-day timeout | Manual thread ID tracking, context window management up to 128k tokens |
| LangGraph + PostgreSQL | Checkpoint-based state persistence | Automatic state recovery, unlimited conversation history |
5. Evaluation Results
Testing Approach: Evaluated 25 conversations from Phase 1 Chainlit testing across 5 learning scenarios.
Implementation:
results = evaluate(dataset, [context_precision, faithfulness, answer_relevancy])
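Expanding the one-liner above, here is a sketch of how the Phase 1 conversations might be packaged for RAGAS, assuming the classic Hugging Face datasets-based API; the column names follow the standard RAGAS schema and the sample row is illustrative, not real pilot data.

# Sketch: packaging Phase 1 conversations for RAGAS evaluation.
# Requires an LLM configured for RAGAS (e.g. OPENAI_API_KEY); row is illustrative.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

dataset = Dataset.from_dict({
    "question": ["Explain backpropagation to a beginner."],
    "answer": ["Backpropagation adjusts weights by propagating error backwards..."],
    "contexts": [["Backpropagation computes gradients of the loss with respect to weights..."]],
    "ground_truth": ["Backpropagation is the algorithm used to compute gradients..."],
})

results = evaluate(dataset, metrics=[context_precision, faithfulness, answer_relevancy])
print(results)  # per-metric scores, averaged across the evaluated conversations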
Coordination Quality Results
Phase 1 RAGAS Scores (n=25 conversations):
- Context Precision: 0.89 (measures how well agents use relevant information rather than hallucinating—critical for learning applications where accuracy builds trust)
- Faithfulness: 0.92 (multi-agent response consistency across the conversation)
- Answer Relevancy: 0.87 (topic coherence across agent contributions)
Performance Comparison
| Phase | Average Latency | Infrastructure | Development Speed |
|---|---|---|---|
| Phase 1 (Chainlit) | ≈2.1s | Single service | Extremely fast prototyping |
| Phase 2 (FastAPI) | ≈1.8s | Multi-service | Slower setup, more control |
6. Production Learnings
Error Recovery Patterns
What I discovered: the current design handles errors through graceful degradation. When one agent fails, the system continues with the remaining agents, preserves the conversation context, and uses retry logic so the failed agent can rejoin coordination on the next interaction (a retry sketch follows the degradation example below).
# Example: Graceful degradation when Creative Agent fails
if creative_agent_available:
    response = await orchestrate_full_pipeline(state)
else:
    response = await analyst_synthesis_fallback(state)
    # Log degradation for monitoring
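The retry side is not shown above. A minimal sketch of the idea, using exponential backoff before degrading, might look like this; call_creative_agent and analyst_synthesis_fallback are placeholder helpers, not the production functions.

# Sketch: retry a failing agent with exponential backoff, then degrade.
# `call_creative_agent` and `analyst_synthesis_fallback` are placeholders.
import asyncio


async def creative_with_retry(state, max_attempts: int = 3):
    for attempt in range(max_attempts):
        try:
            return await call_creative_agent(state)
        except Exception:
            # Back off before retrying (0.5s, 1s, 2s, ...).
            await asyncio.sleep(0.5 * 2 ** attempt)
    # All attempts failed: record the degradation for monitoring and let the
    # analyst + synthesis fallback keep the conversation going.
    state["coordination_metadata"]["degraded_agents"] = ["creative_agent"]
    return await analyst_synthesis_fallback(state)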
Cost Optimization Strategies
- Conditional agent activation: Not every conversation requires all three agents (reduced per-conversation costs by ~40% compared to always-on three-agent coordination; see the routing sketch after this list)
- Shared context efficiency: Single state object reduces redundant context passing
- Optimized checkpointing: Balance between data safety and database load
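As a sketch of what conditional agent activation can look like in LangGraph, a routing function can decide after the analyst step whether the creative agent is worth its cost. This builds on the earlier wiring sketch and replaces the static edge from analyst to creative; the routing heuristic is illustrative, not the production rule.

# Sketch: conditional activation of the creative agent via a routing function.
# Builds on the earlier `workflow` and ConversationState sketches.
from langchain_core.messages import HumanMessage


def route_after_analyst(state: ConversationState) -> str:
    # Only pay for the creative agent when the learner asked for examples
    # or analogies; otherwise go straight to synthesis.
    user_turns = [m.content for m in state["messages"] if isinstance(m, HumanMessage)]
    last_user_text = user_turns[-1].lower() if user_turns else ""
    if "example" in last_user_text or "analogy" in last_user_text:
        return "creative"
    return "synthesis"


workflow.add_conditional_edges(
    "analyst",
    route_after_analyst,
    {"creative": "creative", "synthesis": "synthesis"},
)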
Debugging Multi-Agent Workflows
LangGraph Studio's visual debugging proved essential for:
- Flow visualization: See exactly where conversations get stuck
- State inspection: Real-time view of shared state changes (a programmatic equivalent is sketched after this list)
- Performance bottlenecks: Identify which agents create latency
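Outside Studio, the same checkpoints can be inspected programmatically, which is handy in tests and scripts. A sketch, assuming the graph was compiled with the Postgres checkpointer shown earlier and the thread ID is a placeholder:

# Sketch: inspecting checkpointed state programmatically (complements Studio).
# Assumes `graph` was compiled with the Postgres checkpointer shown earlier.
config = {"configurable": {"thread_id": "learner-42-session-7"}}

snapshot = graph.get_state(config)       # latest checkpoint for this conversation
print(snapshot.values["current_agent"])  # which agent acted last

for checkpoint in graph.get_state_history(config):
    # Walk back through coordination steps to find where a conversation stalled.
    print(checkpoint.values.get("current_agent"),
          len(checkpoint.values.get("messages", [])))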
7. Architectural Decision Guide
Based on testing both approaches, here's when each architecture makes sense:
When Chainlit-Style Architecture Works
- Rapid prototyping: Getting multi-agent concepts working in days, not weeks
- Research and experimentation: Understanding how agents should coordinate
- Educational applications: Built-in streaming and evaluation tools
- Small team development: Single developer can manage entire stack
When FastAPI + Custom Frontend Wins
- Production scalability: Need to handle multiple concurrent users
- Custom user experiences: UI requirements beyond standard chat interfaces
- Enterprise integration: APIs for connecting to existing systems
- Long-term maintenance: Full control over infrastructure and dependencies
The Hybrid Approach
Many teams might benefit from:
- Prototype Phase (Chainlit): Validate coordination patterns quickly
- Production Graduation (Custom): Scale validated patterns based on proven concepts
- Optimization Phase: Cost + performance tuning with full infrastructure control
8. What's Next
Current Status: Phase 2 FastAPI + React architecture actively running pilot evaluations with 15+ learning conversations weekly.
Next explorations: Applying cognitive load theory to agent coordination patterns—when does multi-agent thinking help vs. overwhelm learners? Exploring tighter coordination patterns based on learning science research, particularly around spaced repetition triggers.
Technical Deep-Dives Coming: As the production system stabilizes, I'll share more detailed architecture patterns and evaluation frameworks.
What I'm pondering next:
- Graceful degradation patterns - Maintaining conversation coherence when agents fail
- Conditional intelligence economics - Optimizing conversation complexity to computational cost
- Cross-session memory architectures - Systems that remember reasoning patterns, not just dialogue
If you're building similar systems, I'd love to hear your perspectives.
References
- LangGraph Multi-Agent Workflows Documentation. https://langchain-ai.github.io/langgraph/concepts/multi_agent/
- RAGAS: Automated Evaluation Framework. https://docs.ragas.io/en/stable/
- Es, S., et al. (2023). RAGAS: Automated Evaluation of Retrieval Augmented Generation. arXiv preprint arXiv:2309.15217
- Chainlit Documentation. https://docs.chainlit.io/
- Yao, S., et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv preprint arXiv:2210.03629
- Wei, J., et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv preprint arXiv:2201.11903
- Xi, Z., et al. (2023). The Rise and Potential of Large Language Model Based Agents: A Survey. arXiv preprint arXiv:2309.07864
- Wang, G., et al. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv preprint arXiv:2305.16291
Acknowledgments
Special thanks to Laura Funderburk for encouraging me to share these technical explorations with the community.
Thanks also to the AI Maker Space team - Greg Loughnane and Chris Alexiuk - for exceptional instruction that shaped this exploration.
#AIEngineering #LangGraph #MultiAgentSystems #RAGAS #Chainlit #ProdAI
Got experience with multi-agent production systems? I'd especially love to hear about your error handling patterns and cost optimization strategies. What challenges have you solved that I'm still working on? 👇