PDF Data Reports from AI: Generate, Format, Distribute
Convert AI-generated data narratives and charts into branded PDF reports using WeasyPrint or Puppeteer. Complete pipeline from CSV to shareable PDF.
DataStoryBot produces markdown narratives and PNG charts. Your stakeholders want PDFs. This isn't a tooling limitation — it's a format translation problem, and a solved one. Take the API output, wrap it in HTML with your brand styles, convert to PDF, and distribute.
This article walks through the complete pipeline: DataStoryBot API for the analysis, HTML templating for the layout, and PDF generation via WeasyPrint (Python) or Puppeteer (Node.js). By the end, you'll have a script that turns a CSV file into a branded PDF report.
The Pipeline
CSV → DataStoryBot API → Markdown + Charts → HTML Template → PDF
Each step is independent and composable. You can swap WeasyPrint for Puppeteer, use Jinja2 or Handlebars for templating, or skip the template entirely and use a markdown-to-PDF converter. The DataStoryBot part is always the same three API calls.
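Because each stage is just a function from one artifact to the next, the whole pipeline can be sketched as plain composition. The stage names and signatures below are illustrative, not part of any library; the point is that any stage can be swapped without touching the others:

```python
# A sketch of the pipeline as three swappable stages; names and
# signatures here are illustrative, not part of the DataStoryBot API.
def run_pipeline(csv_path, analyze, template, render):
    """Run analyze -> template -> render; each stage is independently replaceable."""
    analysis = analyze(csv_path)   # e.g. the three DataStoryBot API calls
    html = template(analysis)      # e.g. Jinja2, Handlebars, or no template at all
    return render(html)            # e.g. WeasyPrint or Puppeteer
```

Swapping WeasyPrint for Puppeteer then means passing a different render callable; nothing upstream changes.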
Step 1: Generate the Analysis
import requests

BASE_URL = "https://datastory.bot/api"

def analyze_csv(csv_path, steering=None):
    """Upload CSV and get analysis with narrative + charts."""
    # Upload
    with open(csv_path, "rb") as f:
        upload = requests.post(f"{BASE_URL}/upload", files={"file": f})
    upload.raise_for_status()
    container_id = upload.json()["containerId"]

    # Analyze
    payload = {"containerId": container_id}
    if steering:
        payload["steeringPrompt"] = steering
    stories = requests.post(f"{BASE_URL}/analyze", json=payload)
    stories.raise_for_status()
    angles = stories.json()

    # Refine the top story
    report = requests.post(f"{BASE_URL}/refine", json={
        "containerId": container_id,
        "selectedStoryTitle": angles[0]["title"]
    })
    report.raise_for_status()
    result = report.json()

    # Download charts
    charts = []
    for i, chart in enumerate(result.get("charts", [])):
        img = requests.get(
            f"{BASE_URL}/files/{container_id}/{chart['fileId']}"
        )
        img.raise_for_status()
        path = f"/tmp/chart_{i+1}.png"
        with open(path, "wb") as f:
            f.write(img.content)
        charts.append({"path": path, "caption": chart["caption"]})

    return {
        "narrative": result["narrative"],
        "charts": charts,
        "title": angles[0]["title"]
    }
This gives you a dictionary with the narrative as markdown, chart images saved to disk, and the story title. Everything you need for the PDF.
Step 2: Build the HTML Template
The PDF is just rendered HTML. Use any templating engine — here's a minimal Jinja2 template:
from jinja2 import Template
import markdown
import base64
from pathlib import Path
REPORT_TEMPLATE = Template("""
<!DOCTYPE html>
<html>
<head>
<style>
  @page {
    size: letter;
    margin: 1in;
    @bottom-center {
      content: "Page " counter(page) " of " counter(pages);
      font-size: 9px;
      color: #666;
    }
  }
  body {
    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
    font-size: 11pt;
    line-height: 1.6;
    color: #1a1a1a;
  }
  .header {
    border-bottom: 3px solid #2563eb;
    padding-bottom: 16px;
    margin-bottom: 32px;
  }
  .header h1 { font-size: 22pt; margin: 0; color: #111; }
  .header .meta { color: #666; font-size: 10pt; margin-top: 8px; }
  .narrative { margin-bottom: 24px; }
  .narrative h2 { color: #2563eb; font-size: 14pt; margin-top: 24px; }
  .narrative blockquote {
    border-left: 3px solid #2563eb;
    margin-left: 0;
    color: #333;
    background: #f8fafc;
    padding: 12px 16px;
  }
  .chart-container {
    page-break-inside: avoid;
    margin: 24px 0;
    text-align: center;
  }
  .chart-container img { max-width: 100%; border-radius: 4px; }
  .chart-container .caption {
    font-size: 9pt; color: #666;
    margin-top: 8px; font-style: italic;
  }
  .footer {
    margin-top: 48px;
    padding-top: 16px;
    border-top: 1px solid #e5e7eb;
    font-size: 9pt;
    color: #999;
  }
</style>
</head>
<body>
  <div class="header">
    <h1>{{ title }}</h1>
    <div class="meta">Generated {{ date }} | Source: {{ source_file }}</div>
  </div>
  <div class="narrative">
    {{ narrative_html }}
  </div>
  {% for chart in charts %}
  <div class="chart-container">
    <img src="data:image/png;base64,{{ chart.b64 }}" alt="{{ chart.caption }}">
    <div class="caption">{{ chart.caption }}</div>
  </div>
  {% endfor %}
  <div class="footer">
    Report generated by DataStoryBot API · datastory.bot
  </div>
</body>
</html>
""")
def build_html(analysis, source_file):
    """Convert analysis dict to styled HTML."""
    from datetime import date

    # Convert markdown narrative to HTML
    narrative_html = markdown.markdown(
        analysis["narrative"],
        extensions=["extra", "codehilite"]
    )

    # Embed charts as base64
    charts = []
    for chart in analysis["charts"]:
        b64 = base64.b64encode(Path(chart["path"]).read_bytes()).decode()
        charts.append({"b64": b64, "caption": chart["caption"]})

    return REPORT_TEMPLATE.render(
        title=analysis["title"],
        date=date.today().isoformat(),
        source_file=source_file,
        narrative_html=narrative_html,
        charts=charts
    )
Charts are embedded as base64 so the HTML is self-contained — no external file references to break during PDF conversion.
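The embedding step itself is small enough to show in isolation. Here it is with placeholder bytes standing in for a real chart:

```python
import base64

# The data-URI embedding in isolation, using placeholder bytes in place of a chart.
png_bytes = b"\x89PNG\r\n\x1a\n"  # PNG magic number, standing in for real image data
b64 = base64.b64encode(png_bytes).decode()
data_uri = f"data:image/png;base64,{b64}"
```

Any bytes round-trip through base64 losslessly, so the PDF renderer sees exactly the PNG that DataStoryBot produced.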
Step 3a: Generate PDF with WeasyPrint (Python)
WeasyPrint renders HTML/CSS to PDF with good fidelity. It handles page breaks, headers/footers, and @page rules.
pip install weasyprint
from weasyprint import HTML
def generate_pdf(html_content, output_path):
    """Convert HTML to PDF using WeasyPrint."""
    HTML(string=html_content).write_pdf(output_path)
    print(f"PDF saved: {output_path}")
Step 3b: Generate PDF with Puppeteer (Node.js)
If you're in a Node.js environment or need better CSS support (WeasyPrint doesn't handle all modern CSS), use Puppeteer:
const puppeteer = require("puppeteer");

async function generatePdf(htmlContent, outputPath) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setContent(htmlContent, { waitUntil: "networkidle0" });
  await page.pdf({
    path: outputPath,
    format: "Letter",
    printBackground: true,
    margin: { top: "1in", right: "1in", bottom: "1in", left: "1in" },
    displayHeaderFooter: true,
    footerTemplate: `
      <div style="font-size: 9px; color: #999; text-align: center; width: 100%;">
        Page <span class="pageNumber"></span> of <span class="totalPages"></span>
      </div>
    `,
  });
  await browser.close();
  console.log(`PDF saved: ${outputPath}`);
}
Puppeteer uses Chromium's rendering engine, so your PDF looks exactly like the HTML in a browser. The trade-off is a heavier dependency (headless Chromium vs. a Python library).
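If a codebase has to support both backends, the choice can be deferred to runtime. A minimal sketch, assuming (purely for illustration) that the decision only depends on whether Node is on the PATH:

```python
import shutil

def pick_renderer():
    # Illustrative heuristic: prefer Chromium fidelity when Node is available,
    # otherwise fall back to the pure-Python WeasyPrint path.
    return "puppeteer" if shutil.which("node") else "weasyprint"
```

In practice the decision is usually made once per project, but isolating it in one function keeps the rest of the pipeline renderer-agnostic.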
The Complete Pipeline (Python)
from datetime import date
def csv_to_pdf(csv_path, output_path, steering=None):
    """End-to-end: CSV file to branded PDF report."""
    # Step 1: Analyze
    analysis = analyze_csv(csv_path, steering=steering)
    # Step 2: Template
    html = build_html(analysis, source_file=csv_path)
    # Step 3: PDF
    generate_pdf(html, output_path)
    return output_path

# Usage
csv_to_pdf(
    "q1_sales.csv",
    f"sales_report_{date.today().isoformat()}.pdf",
    steering="Focus on regional comparisons and quarter-over-quarter trends."
)
Three function calls. CSV in, PDF out. The entire pipeline runs in under a minute for most datasets.
Branding the Template
The HTML template above is minimal. For production reports, you'll want to add:
Logo and brand colors:
.header {
  border-bottom: 3px solid #your-brand-color;
}
.header::before {
  content: "";
  display: block;
  width: 120px;
  height: 40px;
  background: url(data:image/svg+xml;base64,...) no-repeat;
  margin-bottom: 12px;
}
Cover page:
<div class="cover-page" style="page-break-after: always; text-align: center; padding-top: 40%;">
  <img src="data:image/png;base64,{{ logo_b64 }}" style="width: 200px;">
  <h1 style="font-size: 28pt; margin-top: 48px;">{{ title }}</h1>
  <p style="color: #666; font-size: 14pt;">{{ date }} · Confidential</p>
</div>
Table of contents: If you're generating multi-story reports (analyzing multiple story angles from the same CSV), add a TOC page that links to each section using # anchors.
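A TOC builder only takes a few lines. A sketch with a hypothetical helper, assuming each section dict carries a title (as in the multi-story code below) and that each section heading carries a matching section-N id:

```python
def build_toc(sections):
    """Build a linked HTML table of contents (hypothetical helper; assumes
    each section dict has a 'title' and headings carry section-N ids)."""
    items = "".join(
        f'<li><a href="#section-{i}">{s["title"]}</a></li>'
        for i, s in enumerate(sections, 1)
    )
    return f'<ul class="toc">{items}</ul>'
```

Both WeasyPrint and Puppeteer resolve # anchors into internal PDF links, so the TOC stays clickable in the final document.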
Multi-Story Reports
A single DataStoryBot analysis returns 2-4 story angles. For a comprehensive report, refine all of them:
def full_report(csv_path, output_path, steering=None):
    """Generate a multi-story PDF report."""
    with open(csv_path, "rb") as f:
        upload = requests.post(f"{BASE_URL}/upload", files={"file": f})
    upload.raise_for_status()
    container_id = upload.json()["containerId"]

    payload = {"containerId": container_id}
    if steering:
        payload["steeringPrompt"] = steering
    stories = requests.post(f"{BASE_URL}/analyze", json=payload)
    stories.raise_for_status()
    angles = stories.json()

    # Refine all stories
    sections = []
    for angle in angles:
        report = requests.post(f"{BASE_URL}/refine", json={
            "containerId": container_id,
            "selectedStoryTitle": angle["title"]
        })
        report.raise_for_status()
        result = report.json()
        charts = []
        for i, chart in enumerate(result.get("charts", [])):
            img = requests.get(
                f"{BASE_URL}/files/{container_id}/{chart['fileId']}"
            )
            img.raise_for_status()
            path = f"/tmp/chart_{angle['id']}_{i}.png"
            with open(path, "wb") as f:
                f.write(img.content)
            charts.append({"path": path, "caption": chart["caption"]})
        sections.append({
            "title": angle["title"],
            "narrative": result["narrative"],
            "charts": charts
        })

    # Build multi-section HTML and convert to PDF
    # ... (extend the template with a loop over sections)
This produces a report with multiple chapters — each one a different data story from the same dataset. A single CSV can yield a 10-page report covering trends, comparisons, anomalies, and distributions.
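One way to finish the full_report function is a section loop in the template. A sketch of the Jinja2 fragment, assuming each section dict gains a pre-rendered narrative_html field and base64-encoded charts (both produced the same way build_html does above):

```html
{% for section in sections %}
<div class="section" style="page-break-before: always;">
  <h2 id="section-{{ loop.index }}">{{ section.title }}</h2>
  {{ section.narrative_html }}
  {% for chart in section.charts %}
  <div class="chart-container">
    <img src="data:image/png;base64,{{ chart.b64 }}" alt="{{ chart.caption }}">
    <div class="caption">{{ chart.caption }}</div>
  </div>
  {% endfor %}
</div>
{% endfor %}
```

The page-break-before rule starts each story on a fresh page, and the section-N ids give a table of contents something to link to.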
Distribution Options
Once you have the PDF, distribution is a separate concern:
Email via SendGrid:
import sendgrid
from sendgrid.helpers.mail import Mail, Attachment, FileContent, FileName, FileType

sg = sendgrid.SendGridAPIClient(api_key="your-key")

with open("report.pdf", "rb") as f:
    encoded = base64.b64encode(f.read()).decode()

message = Mail(
    from_email="reports@yourcompany.com",
    to_emails="stakeholders@yourcompany.com",
    subject=f"Data Report — {date.today().isoformat()}"
)
message.attachment = Attachment(
    FileContent(encoded),
    FileName("report.pdf"),
    FileType("application/pdf")
)
sg.send(message)
Upload to S3/R2 for sharing:
import boto3
s3 = boto3.client("s3")
s3.upload_file(
    "report.pdf",
    "company-reports",
    f"weekly/{date.today().isoformat()}.pdf",
    ExtraArgs={"ContentType": "application/pdf"}
)
Post to Slack:
from slack_sdk import WebClient
client = WebClient(token="xoxb-your-token")
client.files_upload_v2(
    channel="C0123456789",
    file="report.pdf",
    title=f"Weekly Report — {date.today().isoformat()}"
)
Scheduling with Cron
Combine the pipeline with a cron job for automated recurring reports:
# Every Monday at 8am UTC
0 8 * * 1 /usr/bin/python3 /opt/reports/weekly_report.py
Or if you're using a serverless environment, trigger it via a scheduled Lambda, Cloud Function, or Vercel Cron. The DataStoryBot API is stateless — no persistent connections to maintain.
For a deeper dive into scheduled report pipelines, see automating weekly data reports with DataStoryBot.
What to Read Next
For the API fundamentals that this pipeline builds on, start with how to generate a data report from CSV in one API call.
To automate the entire scheduling and delivery pipeline, read automating weekly data reports.
Or jump to the DataStoryBot playground and generate a report interactively before wiring up the PDF pipeline.
Ready to find your data story?
Upload a CSV and DataStoryBot will uncover the narrative in seconds.
Try DataStoryBot →