
PDF Data Reports from AI: Generate, Format, Distribute

Convert AI-generated data narratives and charts into branded PDF reports using WeasyPrint or Puppeteer. Complete pipeline from CSV to shareable PDF.

By DataStoryBot Team


DataStoryBot produces markdown narratives and PNG charts. Your stakeholders want PDFs. This isn't a tooling limitation — it's a format translation problem, and a solved one. Take the API output, wrap it in HTML with your brand styles, convert to PDF, and distribute.

This article walks through the complete pipeline: DataStoryBot API for the analysis, HTML templating for the layout, and PDF generation via WeasyPrint (Python) or Puppeteer (Node.js). By the end, you'll have a script that turns a CSV file into a branded PDF report.

The Pipeline

CSV → DataStoryBot API → Markdown + Charts → HTML Template → PDF

Each step is independent and composable. You can swap WeasyPrint for Puppeteer, use Jinja2 or Handlebars for templating, or skip the template entirely and use a markdown-to-PDF converter. The DataStoryBot part is always the same three API calls.

Step 1: Generate the Analysis

import requests

BASE_URL = "https://datastory.bot/api"

def analyze_csv(csv_path, steering=None):
    """Upload CSV and get analysis with narrative + charts."""

    # Upload
    with open(csv_path, "rb") as f:
        upload = requests.post(f"{BASE_URL}/upload", files={"file": f})
        upload.raise_for_status()

    container_id = upload.json()["containerId"]

    # Analyze
    payload = {"containerId": container_id}
    if steering:
        payload["steeringPrompt"] = steering

    stories = requests.post(f"{BASE_URL}/analyze", json=payload)
    stories.raise_for_status()
    angles = stories.json()

    # Refine the top story
    report = requests.post(f"{BASE_URL}/refine", json={
        "containerId": container_id,
        "selectedStoryTitle": angles[0]["title"]
    })
    report.raise_for_status()
    result = report.json()

    # Download charts
    charts = []
    for i, chart in enumerate(result.get("charts", [])):
        img = requests.get(
            f"{BASE_URL}/files/{container_id}/{chart['fileId']}"
        )
        path = f"/tmp/chart_{i+1}.png"
        with open(path, "wb") as f:
            f.write(img.content)
        charts.append({"path": path, "caption": chart["caption"]})

    return {
        "narrative": result["narrative"],
        "charts": charts,
        "title": angles[0]["title"]
    }

This gives you a dictionary with the narrative as markdown, chart images saved to disk, and the story title. Everything you need for the PDF.
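Before handing that dictionary to the templating step, a lightweight guard can catch a malformed response early. This is a minimal sketch that assumes the exact shape returned by `analyze_csv` above:

```python
def validate_analysis(analysis):
    """Sanity-check the analysis dict before templating.

    Assumes the shape returned by analyze_csv above: narrative (str),
    charts (list of dicts with 'path' and 'caption'), title (str).
    """
    required = {"narrative", "charts", "title"}
    missing = required - analysis.keys()
    if missing:
        raise ValueError(f"analysis is missing keys: {sorted(missing)}")
    for chart in analysis["charts"]:
        if not {"path", "caption"} <= chart.keys():
            raise ValueError("each chart needs 'path' and 'caption'")
    return analysis
```

Failing fast here beats producing a half-rendered PDF with missing charts.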

Step 2: Build the HTML Template

The PDF is just rendered HTML. Use any templating engine — here's a minimal Jinja2 template:

from jinja2 import Template
import markdown
import base64
from pathlib import Path

REPORT_TEMPLATE = Template("""
<!DOCTYPE html>
<html>
<head>
<style>
  @page {
    size: letter;
    margin: 1in;
    @bottom-center {
      content: "Page " counter(page) " of " counter(pages);
      font-size: 9px;
      color: #666;
    }
  }
  body {
    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
    font-size: 11pt;
    line-height: 1.6;
    color: #1a1a1a;
  }
  .header {
    border-bottom: 3px solid #2563eb;
    padding-bottom: 16px;
    margin-bottom: 32px;
  }
  .header h1 { font-size: 22pt; margin: 0; color: #111; }
  .header .meta { color: #666; font-size: 10pt; margin-top: 8px; }
  .narrative { margin-bottom: 24px; }
  .narrative h2 { color: #2563eb; font-size: 14pt; margin-top: 24px; }
  .narrative blockquote {
    border-left: 3px solid #2563eb;
    padding-left: 16px;
    margin-left: 0;
    color: #333;
    background: #f8fafc;
    padding: 12px 16px;
  }
  .chart-container {
    page-break-inside: avoid;
    margin: 24px 0;
    text-align: center;
  }
  .chart-container img { max-width: 100%; border-radius: 4px; }
  .chart-container .caption {
    font-size: 9pt; color: #666;
    margin-top: 8px; font-style: italic;
  }
  .footer {
    margin-top: 48px;
    padding-top: 16px;
    border-top: 1px solid #e5e7eb;
    font-size: 9pt;
    color: #999;
  }
</style>
</head>
<body>
  <div class="header">
    <h1>{{ title }}</h1>
    <div class="meta">Generated {{ date }} | Source: {{ source_file }}</div>
  </div>

  <div class="narrative">
    {{ narrative_html }}
  </div>

  {% for chart in charts %}
  <div class="chart-container">
    <img src="data:image/png;base64,{{ chart.b64 }}" alt="{{ chart.caption }}">
    <div class="caption">{{ chart.caption }}</div>
  </div>
  {% endfor %}

  <div class="footer">
    Report generated by DataStoryBot API · datastory.bot
  </div>
</body>
</html>
""")

def build_html(analysis, source_file):
    """Convert analysis dict to styled HTML."""
    from datetime import date

    # Convert markdown narrative to HTML
    narrative_html = markdown.markdown(
        analysis["narrative"],
        extensions=["extra", "codehilite"]
    )

    # Embed charts as base64
    charts = []
    for chart in analysis["charts"]:
        b64 = base64.b64encode(Path(chart["path"]).read_bytes()).decode()
        charts.append({"b64": b64, "caption": chart["caption"]})

    return REPORT_TEMPLATE.render(
        title=analysis["title"],
        date=date.today().isoformat(),
        source_file=source_file,
        narrative_html=narrative_html,
        charts=charts
    )

Charts are embedded as base64 so the HTML is self-contained — no external file references to break during PDF conversion.
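The same inlining trick works for any local image. A small helper (hypothetical, not part of the template above) makes the intent explicit:

```python
import base64
from pathlib import Path

def png_data_uri(path):
    """Inline a local PNG as a data: URI so the HTML stays self-contained."""
    b64 = base64.b64encode(Path(path).read_bytes()).decode()
    return f"data:image/png;base64,{b64}"
```

Keep in mind that base64 inflates the payload by roughly a third, which is fine for a handful of charts but worth watching for image-heavy reports.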

Step 3a: Generate PDF with WeasyPrint (Python)

WeasyPrint renders HTML/CSS to PDF with good fidelity. It handles page breaks, headers/footers, and @page rules.

pip install weasyprint

from weasyprint import HTML

def generate_pdf(html_content, output_path):
    """Convert HTML to PDF using WeasyPrint."""
    HTML(string=html_content).write_pdf(output_path)
    print(f"PDF saved: {output_path}")

Step 3b: Generate PDF with Puppeteer (Node.js)

If you're in a Node.js environment or need better CSS support (WeasyPrint doesn't handle all modern CSS), use Puppeteer:

const puppeteer = require("puppeteer");

async function generatePdf(htmlContent, outputPath) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setContent(htmlContent, { waitUntil: "networkidle0" });
  await page.pdf({
    path: outputPath,
    format: "Letter",
    printBackground: true,
    margin: { top: "1in", right: "1in", bottom: "1in", left: "1in" },
    displayHeaderFooter: true,
    footerTemplate: `
      <div style="font-size: 9px; color: #999; text-align: center; width: 100%;">
        Page <span class="pageNumber"></span> of <span class="totalPages"></span>
      </div>
    `,
  });
  await browser.close();
  console.log(`PDF saved: ${outputPath}`);
}

Puppeteer uses Chromium's rendering engine, so your PDF looks exactly like the HTML in a browser. The trade-off is a heavier dependency (headless Chromium vs. a Python library).

The Complete Pipeline (Python)

from datetime import date

def csv_to_pdf(csv_path, output_path, steering=None):
    """End-to-end: CSV file to branded PDF report."""

    # Step 1: Analyze
    analysis = analyze_csv(csv_path, steering=steering)

    # Step 2: Template
    html = build_html(analysis, source_file=csv_path)

    # Step 3: PDF
    generate_pdf(html, output_path)

    return output_path

# Usage
csv_to_pdf(
    "q1_sales.csv",
    f"sales_report_{date.today().isoformat()}.pdf",
    steering="Focus on regional comparisons and quarter-over-quarter trends."
)

Three function calls. CSV in, PDF out. The entire pipeline runs in under a minute for most datasets.
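The usage example above builds the dated filename inline. If you generate reports on a schedule, a tiny helper (hypothetical, named `report_filename` here) keeps the naming consistent:

```python
from datetime import date

def report_filename(prefix):
    """Build a dated output name, e.g. sales_report_<YYYY-MM-DD>.pdf."""
    return f"{prefix}_{date.today().isoformat()}.pdf"
```

Consistent names make it trivial to list, sort, and overwrite-protect reports in whatever storage you distribute them to.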

Branding the Template

The HTML template above is minimal. For production reports, you'll want to add:

Logo and brand colors:

.header {
  border-bottom: 3px solid #your-brand-color;
}
.header::before {
  content: "";
  display: block;
  width: 120px;
  height: 40px;
  background: url(data:image/svg+xml;base64,...) no-repeat;
  margin-bottom: 12px;
}

Cover page:

<div class="cover-page" style="page-break-after: always; text-align: center; padding-top: 40%;">
  <img src="data:image/png;base64,{{ logo_b64 }}" style="width: 200px;">
  <h1 style="font-size: 28pt; margin-top: 48px;">{{ title }}</h1>
  <p style="color: #666; font-size: 14pt;">{{ date }} · Confidential</p>
</div>

Table of contents: If you're generating multi-story reports (analyzing multiple story angles from the same CSV), add a TOC page that links to each section using # anchors.
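One way to wire those anchors up is a slug helper plus a TOC renderer. This is a sketch with a hypothetical `slugify`; the matching `id` attributes would still need to be added to each section heading in the template:

```python
import re

def slugify(title):
    """Turn a story title into an anchor id, e.g. 'Q1 Sales: Trends' -> 'q1-sales-trends'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def toc_html(section_titles):
    """Render an ordered TOC whose links target per-section anchors."""
    items = "".join(
        f'<li><a href="#{slugify(t)}">{t}</a></li>' for t in section_titles
    )
    return f'<ol class="toc">{items}</ol>'
```

Both WeasyPrint and Puppeteer resolve in-document `#` links in the generated PDF, so the TOC entries become clickable.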

Multi-Story Reports

A single DataStoryBot analysis returns 2-4 story angles. For a comprehensive report, refine all of them:

def full_report(csv_path, output_path, steering=None):
    """Generate a multi-story PDF report."""

    with open(csv_path, "rb") as f:
        upload = requests.post(f"{BASE_URL}/upload", files={"file": f})
        upload.raise_for_status()
    container_id = upload.json()["containerId"]

    payload = {"containerId": container_id}
    if steering:
        payload["steeringPrompt"] = steering

    stories = requests.post(f"{BASE_URL}/analyze", json=payload)
    stories.raise_for_status()
    angles = stories.json()

    # Refine all stories
    sections = []
    for angle in angles:
        report = requests.post(f"{BASE_URL}/refine", json={
            "containerId": container_id,
            "selectedStoryTitle": angle["title"]
        })
        report.raise_for_status()
        result = report.json()

        charts = []
        for i, chart in enumerate(result.get("charts", [])):
            img = requests.get(
                f"{BASE_URL}/files/{container_id}/{chart['fileId']}"
            )
            path = f"/tmp/chart_{angle['id']}_{i}.png"
            with open(path, "wb") as f:
                f.write(img.content)
            charts.append({"path": path, "caption": chart["caption"]})

        sections.append({
            "title": angle["title"],
            "narrative": result["narrative"],
            "charts": charts
        })

    # Build multi-section HTML and convert to PDF
    # ... (extend the template with a loop over sections)

This produces a report with multiple chapters — each one a different data story from the same dataset. A single CSV can yield a 10-page report covering trends, comparisons, anomalies, and distributions.
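Assembling those sections is straightforward once each one has been rendered to an HTML fragment. A minimal sketch, assuming you render each section body separately: join the fragments with forced page breaks so every story starts a new chapter.

```python
def stitch_sections(section_htmls):
    """Join rendered section fragments, forcing a page break between chapters."""
    separator = '<div style="page-break-after: always;"></div>'
    return separator.join(section_htmls)
```

The joined string drops into the `.narrative` slot of the single-story template, or into a dedicated multi-section layout.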

Distribution Options

Once you have the PDF, distribution is a separate concern:

Email via SendGrid:

import base64
from datetime import date

import sendgrid
from sendgrid.helpers.mail import Mail, Attachment, FileContent, FileName, FileType

sg = sendgrid.SendGridAPIClient(api_key="your-key")
with open("report.pdf", "rb") as f:
    encoded = base64.b64encode(f.read()).decode()

message = Mail(
    from_email="reports@yourcompany.com",
    to_emails="stakeholders@yourcompany.com",
    subject=f"Data Report — {date.today().isoformat()}"
)
message.attachment = Attachment(
    FileContent(encoded),
    FileName("report.pdf"),
    FileType("application/pdf")
)
sg.send(message)

Upload to S3/R2 for sharing:

import boto3
from datetime import date

s3 = boto3.client("s3")
s3.upload_file(
    "report.pdf",
    "company-reports",
    f"weekly/{date.today().isoformat()}.pdf",
    ExtraArgs={"ContentType": "application/pdf"}
)

Post to Slack:

from datetime import date

from slack_sdk import WebClient

client = WebClient(token="xoxb-your-token")
client.files_upload_v2(
    channel="C0123456789",
    file="report.pdf",
    title=f"Weekly Report — {date.today().isoformat()}"
)

Scheduling with Cron

Combine the pipeline with a cron job for automated recurring reports:

# Every Monday at 8am UTC
0 8 * * 1 /usr/bin/python3 /opt/reports/weekly_report.py

Or if you're using a serverless environment, trigger it via a scheduled Lambda, Cloud Function, or Vercel Cron. The DataStoryBot API is stateless — no persistent connections to maintain.
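Scheduled runs fail silently when the network hiccups, so it can be worth wrapping the API calls in a small retry with exponential backoff. This is a generic sketch, not something the API itself requires:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Usage would look like `with_retries(lambda: csv_to_pdf("q1_sales.csv", "out.pdf"))` inside the scheduled script.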

For a deeper dive into scheduled report pipelines, see automating weekly data reports with DataStoryBot.

What to Read Next

For the API fundamentals that this pipeline builds on, start with how to generate a data report from CSV in one API call.

To automate the entire scheduling and delivery pipeline, read automating weekly data reports.

Or jump to the DataStoryBot playground and generate a report interactively before wiring up the PDF pipeline.

Ready to find your data story?

Upload a CSV and DataStoryBot will uncover the narrative in seconds.

Try DataStoryBot →