
From Numbers to Narrative: Automating Data Stories via API

How the DataStoryBot API converts statistical patterns into structured prose. A technical deep-dive into narrative generation from tabular data.

By DataStoryBot Team


A data story is not a summary statistic with a sentence wrapped around it. "Revenue grew 18%" is a fact. "Revenue grew 18% because enterprise adoption in APAC tripled after the Q2 pricing change — and if that rate holds, APAC will be the largest revenue region by Q4" is a narrative. The difference is structure: a hook, contextual grounding, an analytical insight, supporting evidence, and an implication for the audience.

Most analytics tools stop at the fact. They give you the number and leave you to write the story. DataStoryBot generates the full narrative — structured markdown with bold key insights, blockquote callouts for critical findings, and supporting charts — through three API calls.

This article is a technical deep-dive into how that narrative generation works: what the output looks like, how it's structured, how to control it, and where the seams show.

The Narrative Output Format

When you call the /api/refine endpoint, the narrative field in the response is a Markdown string. Not plain text. Not HTML. Markdown — which means you can render it in any context that supports Markdown, or convert it to HTML, PDF, or Slack blocks.

Here's a real example of what the narrative output looks like for a quarterly sales dataset:

## West Region Drives 68% of Q4 Revenue Growth

The West region accounted for **$4.2M of the $6.1M total revenue increase** in Q4,
making it the dominant growth driver by a wide margin. The next largest contributor,
the South region, added $1.1M — less than a quarter of the West's contribution.

> **Key finding:** West region growth was not driven by new customer acquisition.
> Win rates remained flat at 22%. Instead, existing accounts expanded their contracts
> by an average of 34%, suggesting strong product-market fit within the installed base.

The timing of the acceleration is notable. **Growth began inflecting upward in
mid-October**, two weeks after the regional sales team restructured from
geographic territories to vertical specialization. While correlation is not
causation, the alignment is difficult to dismiss.

### Supporting Evidence

Three metrics reinforce the expansion-driven growth thesis:

- **Average contract value** in the West increased from $47K to $63K (+34%)
  between Q3 and Q4, while other regions saw increases of 5-8%.
- **Net revenue retention** for West region accounts hit 142%, compared to
  114% company-wide.
- **Deal cycle length** decreased from 45 days to 31 days for expansion deals,
  suggesting reduced friction in the upsell process.

### Implications

If West region expansion rates hold through Q1, the company will exceed its
annual revenue target by approximately 12% without any improvement in new
customer acquisition. The question for leadership: **is the vertical
specialization model replicable in other regions?**

Notice the structural patterns. Every narrative DataStoryBot generates follows a consistent framework, though the specific sections vary based on the data.

Anatomy of a Generated Narrative

The narratives follow a five-part structure that maps to how analysts naturally communicate findings:

1. Headline (H2)

The story title becomes the H2 heading. It's a complete sentence that states the key finding, not a label. "West Region Drives 68% of Q4 Revenue Growth" tells you the insight before you read a word of the body. Compare that to a typical dashboard title like "Regional Revenue Analysis."

2. Lead Paragraph

The opening paragraph quantifies the headline claim. Bold formatting highlights the specific numbers — $4.2M of the $6.1M total revenue increase — so a reader scanning the document can grasp the magnitude without reading every word. This is deliberate: the narrative is designed for busy stakeholders who skim before they read.

3. Blockquote Callout

The > blockquote syntax creates a visual callout for the most important analytical finding. In rendered Markdown, this typically appears as an indented block with a left border — visually distinct from the surrounding prose. DataStoryBot uses this for the "so what" insight: the finding that changes how you interpret the headline number.

4. Supporting Evidence

A section (often under an H3 heading) that presents the data backing the narrative's claims. This usually takes the form of a bulleted list with bold metric names and specific numbers. Each bullet is independently verifiable — a reader can check each claim against the filtered dataset that comes alongside the narrative.

5. Implications

The closing section translates the analysis into action. What should the audience do with this information? What question does it raise? This is where the narrative earns its value over a dashboard — dashboards show what happened, narratives suggest what to do about it.
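
The five-part structure is consistent enough to check programmatically. A minimal sketch using stdlib regex; the helper name `check_structure` is illustrative, not part of the API, and the specific H3 section titles vary from narrative to narrative:

```python
import re

def check_structure(narrative: str) -> dict:
    """Check a generated narrative for the five structural parts.

    Illustrative helper, not a DataStoryBot API feature. Section
    headings vary per narrative, so these checks are heuristic.
    """
    return {
        "headline": bool(re.search(r"^## .+", narrative, re.M)),       # H2 title
        "bold_numbers": bool(re.search(r"\*\*[^*]+\*\*", narrative)),  # bold highlights
        "callout": bool(re.search(r"^> ", narrative, re.M)),           # blockquote
        "evidence": bool(re.search(r"^### Supporting", narrative, re.M)),
        "implications": bool(re.search(r"^### Implications", narrative, re.M)),
    }

sample = (
    "## Big Finding\n\n**$4.2M** of growth.\n\n"
    "> **Key finding:** expansion-driven.\n\n"
    "### Supporting Evidence\n\n- a metric\n\n### Implications\n\nAct on it."
)
print(check_structure(sample))
```

A failed check is a natural trigger for a follow-up refinement prompt asking for the missing section.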

Generating Your First Narrative

The full pipeline to get from a CSV file to a narrative:

# Step 1: Upload
UPLOAD=$(curl -s -X POST https://datastory.bot/api/upload \
  -F "file=@quarterly_sales.csv")
CONTAINER_ID=$(echo "$UPLOAD" | jq -r '.containerId')
echo "Container: $CONTAINER_ID"

# Step 2: Analyze — get 3 story angles
STORIES=$(curl -s -X POST https://datastory.bot/api/analyze \
  -H "Content-Type: application/json" \
  -d "{\"containerId\": \"$CONTAINER_ID\"}")
echo "Stories found:"
echo "$STORIES" | jq -r '.[].title'

# Step 3: Refine — generate narrative for the first story
# (titles containing double quotes would need jq -n --arg to build the JSON safely)
TITLE=$(echo "$STORIES" | jq -r '.[0].title')
RESULT=$(curl -s -X POST https://datastory.bot/api/refine \
  -H "Content-Type: application/json" \
  -d "{\"containerId\": \"$CONTAINER_ID\", \"selectedStoryTitle\": \"$TITLE\"}")

# Extract and save the narrative
echo "$RESULT" | jq -r '.narrative' > story.md
echo "Narrative saved to story.md ($(wc -c < story.md) bytes)"

And the Python equivalent with more control:

import requests

BASE_URL = "https://datastory.bot/api"

# Upload
with open("quarterly_sales.csv", "rb") as f:
    upload = requests.post(f"{BASE_URL}/upload", files={"file": f}).json()
container_id = upload["containerId"]

# Analyze
stories = requests.post(f"{BASE_URL}/analyze", json={
    "containerId": container_id
}).json()

for story in stories:
    print(f"[{story['id']}] {story['title']}")
    print(f"    {story['summary']}\n")

# Refine the first story
result = requests.post(f"{BASE_URL}/refine", json={
    "containerId": container_id,
    "selectedStoryTitle": stories[0]["title"]
}).json()

# The narrative is ready to use
narrative = result["narrative"]
print(f"Narrative length: {len(narrative)} characters")
print(f"Charts: {len(result['charts'])}")
print(f"\n--- NARRATIVE ---\n")
print(narrative)

Controlling Narrative Style with Refinement Prompts

The refinementPrompt parameter in the /api/refine call adjusts the narrative output without changing the underlying analysis. This is useful because the same data insight needs different framing for different audiences.

Executive Brevity

result = requests.post(f"{BASE_URL}/refine", json={
    "containerId": container_id,
    "selectedStoryTitle": stories[0]["title"],
    "refinementPrompt": "Executive audience. Keep the narrative under 150 words. Lead with the bottom-line impact. Skip methodology details."
}).json()

The output compresses to a tight paragraph with the key number and the recommended action. No supporting evidence section, no methodology — just the finding and the implication.

Technical Detail

result = requests.post(f"{BASE_URL}/refine", json={
    "containerId": container_id,
    "selectedStoryTitle": stories[0]["title"],
    "refinementPrompt": "Analytics team audience. Include statistical methods used, confidence intervals where applicable, and note any caveats about data quality or sample size."
}).json()

The output includes more nuance: which statistical tests were applied, what the confidence bounds are, and explicit caveats about what the data can and cannot support.

Recommendation-Focused

result = requests.post(f"{BASE_URL}/refine", json={
    "containerId": container_id,
    "selectedStoryTitle": stories[0]["title"],
    "refinementPrompt": "Focus on actionable recommendations. For each insight, suggest a specific next step the team should take. Frame recommendations as testable hypotheses."
}).json()

This shifts the implications section from suggestive to prescriptive. Instead of "the question for leadership is..." you get "recommendation: pilot vertical specialization in the South region for Q1 with a 90-day review gate."
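
Since the three variants differ only in the prompt string, a small wrapper keeps call sites tidy. A sketch under the same assumptions as the examples above; `refine` and `build_refine_payload` are illustrative helper names, and only `containerId`, `selectedStoryTitle`, and `refinementPrompt` are actual request fields:

```python
import requests

BASE_URL = "https://datastory.bot/api"

def build_refine_payload(container_id: str, title: str, prompt: str = None) -> dict:
    """Build the JSON body for /api/refine; refinementPrompt is optional."""
    payload = {"containerId": container_id, "selectedStoryTitle": title}
    if prompt:
        payload["refinementPrompt"] = prompt
    return payload

def refine(container_id: str, title: str, prompt: str = None, timeout: int = 60) -> dict:
    """POST to /api/refine and return the parsed response."""
    resp = requests.post(f"{BASE_URL}/refine",
                         json=build_refine_payload(container_id, title, prompt),
                         timeout=timeout)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json()

# Usage, assuming container_id and stories from the earlier steps:
# result = refine(container_id, stories[0]["title"],
#                 "Executive audience. Keep the narrative under 150 words.")
```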

Narrative + Charts + Dataset: The Complete Package

The narrative doesn't stand alone. Each refined story returns three components:

result = requests.post(f"{BASE_URL}/refine", json={
    "containerId": container_id,
    "selectedStoryTitle": stories[0]["title"]
}).json()

# 1. The narrative (Markdown string)
narrative = result["narrative"]

# 2. Charts (list of file references with captions)
for chart in result["charts"]:
    print(f"Chart: {chart['caption']}")
    # Download: GET /api/files/{container_id}/{chart['fileId']}

# 3. Filtered dataset (CSV of relevant rows)
dataset = result["resultDataset"]
print(f"Dataset: {dataset['fileName']} ({dataset['rowCount']} rows)")
# Download: GET /api/files/{container_id}/{dataset['fileId']}

The charts are designed to be embedded alongside the narrative — each caption corresponds to a claim in the text. The filtered dataset contains only the rows the narrative references, which serves as both an audit trail and a starting point for deeper analysis.

Download everything before the container expires (20 minutes from upload):

import os

output_dir = "story_output"
os.makedirs(output_dir, exist_ok=True)

# Save narrative
with open(f"{output_dir}/narrative.md", "w") as f:
    f.write(result["narrative"])

# Save charts
for i, chart in enumerate(result["charts"]):
    img = requests.get(f"{BASE_URL}/files/{container_id}/{chart['fileId']}")
    with open(f"{output_dir}/chart_{i+1}.png", "wb") as f:
        f.write(img.content)

# Save filtered dataset
ds = result["resultDataset"]
ds_resp = requests.get(f"{BASE_URL}/files/{container_id}/{ds['fileId']}")
with open(f"{output_dir}/{ds['fileName']}", "wb") as f:
    f.write(ds_resp.content)

print(f"Story saved to {output_dir}/")
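
With the files on disk, the pieces can be stitched into one self-contained report. A sketch, assuming the chart captions and filenames from the download step; `assemble_report` and the "Charts" heading are illustrative choices, not API output:

```python
def assemble_report(narrative: str, charts: list) -> str:
    """Append chart images, given as (filename, caption) pairs, to the narrative Markdown."""
    parts = [narrative, "\n### Charts\n"]
    for filename, caption in charts:
        parts.append(f"![{caption}]({filename})\n")
    return "\n".join(parts)

report = assemble_report(
    "## West Region Drives 68% of Q4 Revenue Growth\n\nBody text here.",
    [("chart_1.png", "Revenue by region, Q3 vs Q4")],
)
print(report)
```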

How the Narrative Generation Pipeline Works

Under the hood, narrative generation happens in two phases inside the ephemeral Code Interpreter container:

Phase 1: Statistical analysis. The Code Interpreter writes and executes Python code to analyze your data. It profiles distributions, computes aggregations, tests for significance, and identifies the patterns that support the selected story angle. This phase produces numbers, dataframes, and charts.

Phase 2: Narrative synthesis. GPT-4o takes the statistical results — the computed numbers, the chart descriptions, the identified patterns — and composes the Markdown narrative. It applies the five-part structure (headline, lead, callout, evidence, implications), formats bold highlights and blockquotes, and ensures every claim in the text is backed by a specific number from the analysis.

The two phases are tightly coupled. The narrative doesn't hallucinate numbers that weren't in the analysis. Every bold statistic in the text corresponds to a computation that ran inside the container. This is the advantage of Code Interpreter over a pure language model: the LLM writes and runs code first, then writes prose about the code's output. The prose is grounded in computed results, not generated from vibes.

That said, the interpretation of those numbers involves judgment. When the narrative says "the alignment between the restructuring date and the growth inflection is difficult to dismiss," that's an editorial choice by the model. The data supports the temporal correlation. Whether that correlation is meaningful is a judgment call — one the narrative makes explicitly rather than hiding behind a chart.
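
Because every bold statistic maps to a computed result, the pairing can also be audited mechanically. A stdlib sketch that pulls the bold spans out of a narrative for cross-checking against the filtered dataset; `extract_bold_claims` is an illustrative helper, not an API call:

```python
import re

def extract_bold_claims(narrative: str) -> list:
    """Return every **bold** span so each claim can be checked against the data."""
    return re.findall(r"\*\*([^*]+)\*\*", narrative)

text = ("The West region accounted for **$4.2M of the $6.1M total revenue "
        "increase** in Q4. **Net revenue retention** hit 142%.")
print(extract_bold_claims(text))
# → ['$4.2M of the $6.1M total revenue increase', 'Net revenue retention']
```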

Rendering Narratives in Different Contexts

Since the output is Markdown, you can render it anywhere:

Web applications — use any Markdown library (marked.js, react-markdown, markdown-it) to render as HTML.

Email — convert to HTML with Python's markdown library, then embed in an email body. The bold formatting and blockquotes translate cleanly.

PDF — pipe through a Markdown-to-PDF tool like Pandoc or WeasyPrint. The consistent heading structure means the PDF gets a proper table of contents automatically.

Slack — Slack's message format supports a subset of Markdown (bold, links, blockquotes). The narrative renders well in Slack with minimal conversion.

Confluence / Notion — paste the Markdown directly. Both tools support Markdown input.

The charts are separate PNG files, so you handle them according to each platform's image embedding method — <img> tags for HTML, inline attachments for email, \includegraphics for LaTeX.
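
For email specifically, the `markdown` package is the right tool; for contexts where a dependency is unwelcome, the two constructs the narratives lean on hardest can be handled with stdlib regex. A toy converter, not a Markdown implementation:

```python
import re

def bold_and_quotes_to_html(md: str) -> str:
    """Toy converter handling only **bold** and single-line > blockquotes."""
    html = re.sub(r"\*\*([^*]+)\*\*", r"<strong>\1</strong>", md)
    html = re.sub(r"^> ?(.*)$", r"<blockquote>\1</blockquote>", html, flags=re.M)
    return html

print(bold_and_quotes_to_html("**Key finding:** win rates stayed flat.\n> Important note"))
```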

Multiple Stories from One Dataset

The analyze step returns three story angles. You don't have to pick just one. For a comprehensive report, refine all three:

stories = requests.post(f"{BASE_URL}/analyze", json={
    "containerId": container_id
}).json()

full_report = []
for story in stories:
    result = requests.post(f"{BASE_URL}/refine", json={
        "containerId": container_id,
        "selectedStoryTitle": story["title"]
    }).json()
    full_report.append(result["narrative"])

# Combine into a single document
combined = "\n\n---\n\n".join(full_report)
with open("full_report.md", "w") as f:
    f.write(combined)

This gives you a three-section report, each with its own narrative, charts, and supporting data. The horizontal rule (---) separates the sections visually.

Be mindful of the 20-minute container TTL when doing multiple refine calls. Each refine call takes 15-30 seconds, so three calls is well within the window. But if you're also downloading charts and datasets between calls, keep the total pipeline under 15 minutes to leave margin.
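
That margin can be enforced with a monotonic clock rather than guesswork. A sketch; the 15-minute budget mirrors the suggestion above, and `within_budget` is an illustrative helper:

```python
import time

SAFETY_BUDGET_SECONDS = 15 * 60  # stay well inside the 20-minute container TTL

start = time.monotonic()

def within_budget() -> bool:
    """True while the pipeline is safely within the container lifetime."""
    return time.monotonic() - start < SAFETY_BUDGET_SECONDS

# Usage, assuming the stories loop from above:
# for story in stories:
#     if not within_budget():
#         break  # stop refining; download what's already generated
print(within_budget())
```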

The Honest Limitations

Automated narrative generation is powerful but imperfect. Know where the edges are:

Narrative coherence varies. Most generated narratives read well. Occasionally, the structure is awkward — a blockquote that restates the lead paragraph, or an implications section that's too vague. The refinement prompt can fix specific issues, but there's no guarantee of perfection on every run.

Causal claims are suggestions. When the narrative says growth "was driven by" expansion deals, it's reporting a strong correlation found in the data. It's not proving causation. Sophisticated readers will understand this. Less analytical audiences may take causal language at face value. If that's a concern, use a refinement prompt: "Avoid causal language. Use correlational framing."

Tone can feel generic. The narratives are competent but not distinctive. They don't have the voice of your best analyst. For internal reports, this is usually fine. For client-facing deliverables or published content, you'll want a human editing pass on the generated draft.

Long datasets may lose nuance. Code Interpreter processes the full dataset, but the narrative necessarily summarizes. If your data has 50 important segments, the narrative will focus on the 3-5 most significant. The filtered dataset helps — it preserves the detail the narrative glosses over.

What to Read Next

For the conceptual framework behind data storytelling — why narrative structure matters and how the five-part pattern works — read "What Is Data Storytelling?"

For practical guidance on crafting data stories that land with stakeholders, including audience-specific framing techniques, see "How to Write a Data Story."

And to see narrative generation in action without writing code, try the DataStoryBot playground — upload a CSV and watch the narrative assemble itself.

Ready to find your data story?

Upload a CSV and DataStoryBot will uncover the narrative in seconds.

Try DataStoryBot →