
Webhooks and Async Patterns for Long-Running Analysis

Patterns for handling DataStoryBot's analysis time: polling, webhooks, and queue-based architectures for production integrations.

By DataStoryBot Team

DataStoryBot's /analyze endpoint takes 10-120 seconds to return. That's fine for a CLI script. It's a problem for a web application where users are staring at a loading spinner, or for a backend service that's holding open an HTTP connection while 200 other requests queue behind it.

Long-running API calls need async patterns. This article covers three approaches — polling, webhooks, and queue-based architectures — with production code for each.

Why Analysis Takes Time

The timing breakdown for a typical /analyze call:

  • Container setup: 1-3 seconds (if a new container is needed)
  • Prompt construction: < 1 second
  • Code generation: 2-5 seconds (LLM generates the Python analysis code)
  • Code execution: 5-60 seconds (pandas operations, statistical tests, chart rendering)
  • Result extraction: 1-2 seconds (parsing Code Interpreter output into story angles)

The execution step is the bottleneck. Complex analysis on large datasets — correlation matrices, multi-group comparisons, time-series decomposition — takes longer. Simple analyses on small datasets can complete in 10-15 seconds.

Pattern 1: Held Connection with SSE Streaming

The simplest approach. The client sends the request, shows a loading state, and waits for the response. The first example holds a plain HTTP connection open; the second streams status updates over Server-Sent Events so the caller sees progress instead of silence.

Frontend (React)

import { useEffect, useState } from "react";

function AnalysisLoader({ containerId, steering, onComplete }) {
  const [status, setStatus] = useState("starting");
  const [elapsed, setElapsed] = useState(0);

  useEffect(() => {
    let cancelled = false;

    // Tick the elapsed counter once per second while the request is open
    const timer = setInterval(() => {
      setElapsed((e) => e + 1);
    }, 1000);

    async function run() {
      setStatus("analyzing");

      try {
        // Start analysis; this request stays open for 10-120 seconds
        const res = await fetch("/api/analyze", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ containerId, steering }),
        });
        if (!res.ok) throw new Error(`Analysis failed: ${res.status}`);
        const data = await res.json();

        if (!cancelled) {
          setStatus("complete");
          onComplete(data);
        }
      } catch {
        if (!cancelled) setStatus("error");
      } finally {
        clearInterval(timer); // stop the counter once the request settles
      }
    }

    run();

    return () => {
      cancelled = true;
      clearInterval(timer);
    };
  }, [containerId, steering]);

  return (
    <div className="flex items-center gap-3 p-4 rounded bg-blue-50">
      <div className="animate-spin h-5 w-5 border-2 border-blue-600 border-t-transparent rounded-full" />
      <div>
        <div className="font-medium">
          {status === "analyzing"
            ? "Analyzing your data..."
            : status === "error"
              ? "Analysis failed"
              : "Complete"}
        </div>
        <div className="text-sm text-gray-500">
          {elapsed}s elapsed
          {elapsed > 15 && " — complex datasets take up to 2 minutes"}
        </div>
      </div>
    </div>
  );
}

Backend (Streaming Response)

Instead of holding the connection open, use Server-Sent Events to stream status updates:

// app/api/analyze-stream/route.ts
import { NextRequest } from "next/server";

export async function POST(request: NextRequest) {
  const { containerId, steering } = await request.json();

  const encoder = new TextEncoder();
  const send = (controller: ReadableStreamDefaultController, payload: unknown) =>
    controller.enqueue(encoder.encode(`data: ${JSON.stringify(payload)}\n\n`));

  const stream = new ReadableStream({
    async start(controller) {
      // Send initial status
      send(controller, { status: "started" });

      // SSE comment lines keep proxies from closing the idle connection
      // while the analysis runs
      const heartbeat = setInterval(() => {
        controller.enqueue(encoder.encode(`: keep-alive\n\n`));
      }, 15000);

      try {
        // Run analysis (this blocks for 10-120s)
        const result = await fetch("https://datastory.bot/api/analyze", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({
            containerId,
            steeringPrompt: steering,
          }),
        });
        if (!result.ok) throw new Error(`upstream returned ${result.status}`);

        const stories = await result.json();
        send(controller, { status: "complete", stories });
      } catch {
        send(controller, { status: "error", message: "Analysis failed" });
      } finally {
        clearInterval(heartbeat);
        controller.close();
      }
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}

Frontend SSE Consumer

async function startAnalysis(containerId, steering, onStatus) {
  const res = await fetch("/api/analyze-stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ containerId, steering }),
  });

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Buffer chunks: a single read can end mid-line or mid-character,
    // so only complete lines are parsed
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep any trailing partial line for the next read

    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const data = JSON.parse(line.slice(6));
      onStatus(data);

      if (data.status === "complete") return data.stories;
      if (data.status === "error") throw new Error(data.message);
    }
  }

  throw new Error("Stream ended without a result");
}

Pattern 2: Job Queue

For applications that can't hold connections open (serverless functions, API gateways with short timeouts), use a job queue.

Client → POST /api/analyze → Returns job_id immediately
Client → GET /api/jobs/{job_id} → Returns status (pending/running/complete)
Worker → Picks up job → Calls DataStoryBot → Stores result
Client → GET /api/jobs/{job_id} → Returns completed result
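
The snippets below reference `db.jobs` and `queue` helpers without defining them. As a sketch, here is a minimal in-memory stand-in (hypothetical; a production system would back these with Postgres, Redis, SQS, or similar):

```typescript
// Hypothetical in-memory stand-ins for the db.jobs and queue helpers
// used in the route handlers and worker below.
type Job = {
  id: string;
  status: "pending" | "running" | "complete" | "failed";
  containerId: string;
  steering: string;
  createdAt: Date;
  result?: unknown;
  error?: string;
};

const jobStore = new Map<string, Job>();

const db = {
  jobs: {
    async create(job: Job): Promise<void> {
      jobStore.set(job.id, job);
    },
    async findById(id: string): Promise<Job | null> {
      return jobStore.get(id) ?? null;
    },
    async update(id: string, patch: Partial<Job>): Promise<void> {
      const job = jobStore.get(id);
      if (job) Object.assign(job, patch);
    },
  },
};

type QueueItem = { jobId: string; containerId: string; steering: string };
const pending: QueueItem[] = [];

const queue = {
  async push(item: QueueItem): Promise<void> {
    pending.push(item);
  },
  async pop(): Promise<QueueItem | undefined> {
    return pending.shift();
  },
};
```

Swapping in a real store only changes these two objects; the route handlers and worker stay the same.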

Job Submission

// app/api/analyze/route.ts
import { randomUUID } from "crypto";
import { NextRequest, NextResponse } from "next/server";

export async function POST(request: NextRequest) {
  const { containerId, steering } = await request.json();

  const jobId = randomUUID();

  // Store job in database
  await db.jobs.create({
    id: jobId,
    status: "pending",
    containerId,
    steering,
    createdAt: new Date(),
  });

  // Enqueue for processing
  await queue.push({
    jobId,
    containerId,
    steering,
  });

  return NextResponse.json({ jobId, status: "pending" });
}

Job Status

// app/api/jobs/[id]/route.ts
import { NextRequest, NextResponse } from "next/server";

export async function GET(
  request: NextRequest,
  { params }: { params: { id: string } }
) {
  const job = await db.jobs.findById(params.id);

  if (!job) {
    return NextResponse.json({ error: "Job not found" }, { status: 404 });
  }

  if (job.status === "complete") {
    return NextResponse.json({
      status: "complete",
      result: job.result,
    });
  }

  if (job.status === "failed") {
    return NextResponse.json({
      status: "failed",
      error: job.error,
    });
  }

  return NextResponse.json({
    status: job.status,
    createdAt: job.createdAt,
  });
}

Worker

// worker.ts
async function processJob(job) {
  await db.jobs.update(job.jobId, { status: "running" });

  try {
    const analyzeRes = await fetch("https://datastory.bot/api/analyze", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        containerId: job.containerId,
        steeringPrompt: job.steering,
      }),
    });
    if (!analyzeRes.ok) throw new Error(`analyze failed: ${analyzeRes.status}`);
    const stories = await analyzeRes.json();
    if (!stories.length) throw new Error("analyze returned no story angles");

    // Refine top story
    const refineRes = await fetch("https://datastory.bot/api/refine", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        containerId: job.containerId,
        selectedStoryTitle: stories[0].title,
      }),
    });
    if (!refineRes.ok) throw new Error(`refine failed: ${refineRes.status}`);
    const report = await refineRes.json();

    await db.jobs.update(job.jobId, {
      status: "complete",
      result: { stories, report },
    });
  } catch (error) {
    await db.jobs.update(job.jobId, {
      status: "failed",
      error: error.message,
    });
  }
}

Client Polling

async function waitForResult(jobId, pollInterval = 3000, maxWaitMs = 600000) {
  const deadline = Date.now() + maxWaitMs;

  while (Date.now() < deadline) {
    const res = await fetch(`/api/jobs/${jobId}`);
    const data = await res.json();

    if (data.status === "complete") return data.result;
    if (data.status === "failed") {
      throw new Error(data.error || "Analysis failed");
    }

    await new Promise((r) => setTimeout(r, pollInterval));
  }

  throw new Error(`Job ${jobId} still running after ${maxWaitMs / 1000}s`);
}
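
A fixed 3-second interval generates many redundant requests for long jobs. One hedged refinement is exponential backoff with jitter (hypothetical helper, not part of the API):

```typescript
// Equal-jitter backoff: the delay window doubles per attempt up to a cap,
// and half of each window is randomized so many clients don't poll in sync.
function nextDelay(attempt: number, baseMs = 1000, capMs = 15000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return exp / 2 + Math.random() * (exp / 2);
}
```

Passing `nextDelay(attempt++)` to the `setTimeout` call instead of a fixed interval keeps early polls responsive while backing off on slow jobs.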

Pattern 3: Webhook Callback

For service-to-service integrations where the client doesn't poll:

Client → POST /api/analyze { callbackUrl: "https://your-app.com/webhook" }
Worker → Calls DataStoryBot → POST result to callbackUrl

Submission with Callback

export async function POST(request: NextRequest) {
  const { containerId, steering, callbackUrl } = await request.json();

  const jobId = randomUUID();

  await queue.push({ jobId, containerId, steering, callbackUrl });

  return NextResponse.json({ jobId, status: "accepted" });
}

Worker with Callback

async function processJobWithCallback(job) {
  let payload;
  try {
    const result = await runAnalysis(job.containerId, job.steering);
    payload = { jobId: job.jobId, status: "complete", result };
  } catch (error) {
    payload = { jobId: job.jobId, status: "failed", error: error.message };
  }

  // Deliver the outcome via webhook. In production, retry delivery with
  // backoff: a single failed POST here would lose the result.
  await fetch(job.callbackUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Job-Id": job.jobId,
      "X-Signature": sign(payload, webhookSecret),
    },
    body: JSON.stringify(payload),
  });
}

Always sign webhook payloads so the receiver can verify they came from your system.
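
The `sign` helper above is left undefined. As a sketch, assuming Node's built-in crypto module and HMAC-SHA256 over the request body:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Sender side: HMAC-SHA256 of the exact JSON that goes on the wire.
function sign(payload: unknown, secret: string): string {
  return createHmac("sha256", secret)
    .update(JSON.stringify(payload))
    .digest("hex");
}

// Receiver side: recompute over the raw request body (not a re-serialized
// object, which may differ byte-for-byte) and compare in constant time.
function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected, "hex");
  const b = Buffer.from(signature, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The constant-time comparison matters: a naive `===` on signatures leaks timing information an attacker can use to forge them byte by byte.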

Choosing the Pattern

Pattern                    Best For                     Complexity   Latency
SSE streaming              Web apps with real-time UI   Low          Real-time
Job queue + polling        Serverless, API gateways     Medium       3-5s polling delay
Webhook callback           Service-to-service           Medium       Real-time
Synchronous (no pattern)   Scripts, CLI tools           None         Blocks caller

For most web applications, SSE streaming is the best choice — it's simple, shows progress, and doesn't require infrastructure for job queues.

For microservice architectures where the analysis consumer is a different service, webhooks are cleaner than polling.

For batch processing (analyzing 100 CSVs overnight), the job queue pattern scales well — just push all jobs onto the queue and let workers process them concurrently.
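
That queue-draining loop can be sketched as a fixed-size worker pool (hypothetical helper; `popJob` and `handle` stand in for your queue's pop operation and the `processJob` function above):

```typescript
// Hypothetical batch runner: N workers drain a shared queue concurrently.
async function runWorkers<T>(
  popJob: () => Promise<T | undefined>, // resolves undefined when queue is empty
  handle: (job: T) => Promise<void>,    // e.g. processJob from the worker above
  concurrency = 4,
): Promise<void> {
  async function worker(): Promise<void> {
    // Each worker loops until the queue hands back undefined
    for (let job = await popJob(); job !== undefined; job = await popJob()) {
      await handle(job);
    }
  }
  // Promise.all resolves once every worker has drained its share
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
}
```

The concurrency value caps how many simultaneous /analyze calls hit DataStoryBot, which keeps the batch within whatever rate limits apply to your account.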

Timeout Handling

Regardless of pattern, handle the 300-second analysis timeout:

import asyncio

async def analyze_with_timeout(container_id, steering, timeout=300):
    """Run analysis with explicit timeout."""
    try:
        result = await asyncio.wait_for(
            call_datastorybot_analyze(container_id, steering),
            timeout=timeout
        )
        return result
    except asyncio.TimeoutError:
        return {
            "status": "timeout",
            "message": (
                "Analysis timed out after 5 minutes. "
                "Try with a smaller dataset or simpler steering prompt."
            )
        }
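
In the TypeScript services above, the same guard can be sketched with Promise.race (hypothetical helper; the 300-second value mirrors the timeout described here):

```typescript
// Hypothetical timeout wrapper: resolves with a fallback value if the
// wrapped promise does not settle within ms milliseconds.
async function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  onTimeout: () => T,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(onTimeout()), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // don't leave the timer holding the event loop open
  }
}
```

For example, wrap the /analyze call as `withTimeout(fetch(url, opts).then((r) => r.json()), 300000, () => ({ status: "timeout" }))` so a stuck analysis degrades into a handleable result instead of a hung request.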

What to Read Next

For the error handling that wraps these async patterns, see error handling and retry patterns for data analysis APIs.

For the API endpoints these patterns call, see the DataStoryBot API reference.

For building a complete user-facing application, read building a self-service reporting portal with DataStoryBot.

Ready to find your data story?

Upload a CSV and DataStoryBot will uncover the narrative in seconds.

Try DataStoryBot →