
Automating Weekly Chart Reports from Live Data

Build a pipeline that pulls data, sends it to DataStoryBot's API, extracts charts, and assembles them into a PDF or email report.

By DataStoryBot Team

This article builds a pipeline that runs on a schedule, pulls fresh CSV data, sends it through the DataStoryBot API, downloads the generated chart PNGs, and assembles them into either an HTML email or a PDF. The focus is specifically on the chart extraction and assembly step — the part that trips people up when they move from interactive API calls to production automation.

If you want the broader narrative-and-email pipeline, that is covered in automating weekly data reports. And if you need to understand chart downloading mechanics in isolation, download and embed AI-generated charts covers that in depth. This article focuses on the full chart-forward pipeline: getting the charts out reliably and rendering them into a finished report document.

Pipeline Architecture

The pipeline has five stages:

[Cron / Scheduler]
    → [Pull CSV from data source]
    → [DataStoryBot API: upload → analyze → refine]
    → [Extract chart file IDs → download PNGs via /files endpoint]
    → [Assemble into HTML email or PDF]
    → [Deliver]

Each stage is a function with a clear input and output. The container lifetime is 20 minutes from upload — short enough that you cannot afford to be slow between the upload and the chart download steps. Structure the pipeline so chart downloads happen immediately after the refine call returns, before you do anything else with the data.
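
One way to keep that constraint visible in code is a small deadline tracker started at upload time. This is a sketch of a hypothetical helper, not part of the DataStoryBot API:

```python
import time

class ContainerDeadline:
    """Tracks the 20-minute container TTL, starting from the upload call."""

    TTL_SECONDS = 20 * 60

    def __init__(self) -> None:
        self.started = time.monotonic()

    def remaining(self) -> float:
        """Seconds left before the container expires."""
        return self.TTL_SECONDS - (time.monotonic() - self.started)

    def check(self, stage: str) -> None:
        """Raise if the container has expired before reaching a stage."""
        if self.remaining() <= 0:
            raise RuntimeError(f"Container TTL expired before stage: {stage}")
```

Instantiate it right after the upload response arrives and call check("download charts") before the download loop; a hard failure with a named stage is easier to diagnose in logs than a bare 404.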

Prerequisites

pip install requests weasyprint markdown python-dotenv
  • requests — HTTP calls to the DataStoryBot API
  • weasyprint — HTML-to-PDF conversion (alternatively use reportlab or pdfkit)
  • markdown — Markdown-to-HTML rendering for the narrative text
  • python-dotenv — environment variable management

Depending on your data source (see Step 1), you may also need boto3 for S3 or psycopg2-binary for Postgres.

You will also need a DATASTORYBOT_API_KEY environment variable if your account requires one. During the current open beta, the API is unauthenticated.
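
To keep the script forward-compatible, you can attach the key only when it is set. The Bearer header scheme below is an assumption; confirm the exact format in the API documentation once authentication is enabled for your account:

```python
import os

def api_headers() -> dict:
    """Request headers for API calls; attaches an API key only if one is set."""
    headers = {"Accept": "application/json"}
    api_key = os.environ.get("DATASTORYBOT_API_KEY")
    if api_key:
        # Assumption: Bearer token auth. Verify the header name and scheme
        # against your account's API docs before relying on this.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

Then pass headers=api_headers() to each requests call; during the open beta the dict simply carries no Authorization header.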

Step 1: Pull CSV Data from Your Source

The pipeline starts by fetching fresh data. The exact implementation depends on your source — a database, an S3 bucket, a REST API, or a file share. Here are three common patterns:

import os
import csv
import io
import requests
from datetime import date

def fetch_from_s3(bucket: str, key: str) -> bytes:
    """Download a CSV from S3 and return raw bytes."""
    import boto3  # optional dependency; only needed for S3 sources
    s3 = boto3.client("s3")
    response = s3.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()

def fetch_from_postgres(dsn: str, query: str) -> bytes:
    """Run a query and return results as CSV bytes."""
    import psycopg2
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    cur.execute(query)
    rows = cur.fetchall()
    headers = [desc[0] for desc in cur.description]
    conn.close()

    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(headers)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")

def fetch_from_url(url: str) -> bytes:
    """Fetch a CSV from an HTTP endpoint."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.content

The return value is always raw CSV bytes. That keeps the rest of the pipeline source-agnostic.
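
The three fetchers can sit behind a single dispatcher keyed on the source string. This is a sketch (fetch_csv is a name introduced here, not from the API) that mirrors the scheme check used later in the pipeline script, with a local-file fallback added:

```python
def fetch_csv(source: str) -> bytes:
    """Route a source string to the matching fetcher from above."""
    if source.startswith("s3://"):
        bucket, _, key = source[5:].partition("/")
        return fetch_from_s3(bucket, key)
    if source.startswith(("http://", "https://")):
        return fetch_from_url(source)
    # Fallback: treat anything else as a local file path
    with open(source, "rb") as f:
        return f.read()
```

With this in place, the pipeline script only ever deals with one entry point regardless of where the data lives.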

Step 2: Upload and Analyze with DataStoryBot

This is the three-call sequence: upload the CSV, discover story angles, refine the chosen story into a narrative with charts. The key constraint is the 20-minute container TTL — once you upload, you have 20 minutes to complete the analyze and refine calls and download all chart files.

BASE_URL = "https://datastory.bot/api"

def run_analysis(csv_bytes: bytes, filename: str, steering: str | None = None) -> dict:
    """
    Upload CSV bytes, run analyze and refine, return the full result.

    Returns a dict with:
        - container_id: str
        - title: str
        - narrative: str (Markdown)
        - charts: list of {fileId, caption}
    """
    # Upload
    upload_resp = requests.post(
        f"{BASE_URL}/upload",
        files={"file": (filename, csv_bytes, "text/csv")},
        timeout=60,
    )
    upload_resp.raise_for_status()
    upload_data = upload_resp.json()
    container_id = upload_data["containerId"]

    print(f"Uploaded {filename}: container {container_id}")
    print(f"  {upload_data['metadata']['rowCount']} rows, "
          f"{upload_data['metadata']['columnCount']} columns")

    # Analyze — discover story angles
    analyze_payload = {"containerId": container_id}
    if steering:
        analyze_payload["steeringPrompt"] = steering

    analyze_resp = requests.post(
        f"{BASE_URL}/analyze",
        json=analyze_payload,
        timeout=120,
    )
    analyze_resp.raise_for_status()
    stories = analyze_resp.json()

    print(f"Found {len(stories)} story angles — selecting: {stories[0]['title']}")

    # Refine the top story
    refine_resp = requests.post(
        f"{BASE_URL}/refine",
        json={
            "containerId": container_id,
            "selectedStoryTitle": stories[0]["title"],
        },
        timeout=180,
    )
    refine_resp.raise_for_status()
    refine_data = refine_resp.json()

    return {
        "container_id": container_id,
        "title": stories[0]["title"],
        "narrative": refine_data["narrative"],
        "charts": refine_data.get("charts", []),
    }

The timeout values matter for unattended runs. The analyze call can take 30–90 seconds; refine can take up to three minutes for complex datasets. Do not rely on the library default: requests waits indefinitely when no timeout is set, which can silently hang a scheduled run.
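
For transient server errors, one pattern worth sketching is a session that auto-retries the idempotent GET calls (the file downloads) while leaving the POSTs alone, since analyze and refine are not safe to repeat blindly. The status codes and backoff below are illustrative choices, not API requirements:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session() -> requests.Session:
    """A session that retries transient server errors on GETs only."""
    retry = Retry(
        total=3,
        backoff_factor=1,            # waits 1s, 2s, 4s between attempts
        status_forcelist=[502, 503, 504],
        allowed_methods=["GET"],     # never auto-retry the POST calls
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session
```

Create one session at the top of the pipeline and pass it into the functions below in place of the module-level requests calls.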

Step 3: Extract Chart URLs and Download PNGs

The refine response contains charts, an array of objects with fileId and caption. The file proxy URL pattern is:

GET https://datastory.bot/api/files/{containerId}/{fileId}

Download all charts immediately after the refine call returns — before any other processing. The container TTL is 20 minutes from upload, and you have already spent some of that time on the analyze and refine calls.

import time

def download_charts(container_id: str, charts: list, output_dir: str = "/tmp") -> list:
    """
    Download all chart PNGs from the file proxy.

    Returns a list of dicts with:
        - path: absolute path to the downloaded PNG
        - caption: chart caption from the API response
        - file_id: original fileId
        - size_bytes: file size
    """
    os.makedirs(output_dir, exist_ok=True)
    results = []

    for i, chart in enumerate(charts):
        file_id = chart["fileId"]
        url = f"{BASE_URL}/files/{container_id}/{file_id}"

        # Retry up to 3 times with backoff
        for attempt in range(3):
            try:
                resp = requests.get(url, timeout=30)
                resp.raise_for_status()
                break
            except requests.RequestException as e:
                if attempt == 2:
                    raise RuntimeError(
                        f"Failed to download chart {file_id} after 3 attempts: {e}"
                    ) from e
                time.sleep(2 ** attempt)

        # Derive a clean filename from the caption
        slug = chart["caption"][:60].lower()
        slug = "".join(c if c.isalnum() or c == " " else "" for c in slug)
        slug = slug.strip().replace(" ", "_")
        filename = os.path.join(output_dir, f"chart_{i+1:02d}_{slug}.png")

        with open(filename, "wb") as f:
            f.write(resp.content)

        results.append({
            "path": filename,
            "caption": chart["caption"],
            "file_id": file_id,
            "size_bytes": len(resp.content),
        })
        print(f"  Downloaded chart {i+1}: {os.path.basename(filename)} "
              f"({len(resp.content) // 1024} KB)")

    return results


def run_analysis_and_download(csv_bytes, filename, steering=None, output_dir="/tmp"):
    """Full pipeline: analyze CSV, download charts, return everything needed for assembly."""
    result = run_analysis(csv_bytes, filename, steering)

    print(f"\nDownloading {len(result['charts'])} charts...")
    downloaded = download_charts(result["container_id"], result["charts"], output_dir)

    return {
        "title": result["title"],
        "narrative": result["narrative"],
        "charts": downloaded,
        "container_id": result["container_id"],
    }

Note that download_charts returns the local file path, caption, and size — everything the assembly step needs. The container ID is no longer needed after this point.
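
Before handing the files to assembly, a cheap sanity check can catch the case where a download saved something other than an image (for example, an HTML error page written to disk). This is a sketch using the PNG magic bytes; validate_chart_files is a hypothetical helper:

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def validate_chart_files(charts: list) -> None:
    """Raise if any downloaded chart file is not a valid PNG."""
    for chart in charts:
        with open(chart["path"], "rb") as f:
            header = f.read(8)
        if header != PNG_MAGIC:
            raise RuntimeError(
                f"{chart['path']} does not look like a PNG (got {header!r}); "
                "the download may have returned an error page."
            )
```

Call it once on the list returned by download_charts; failing here is far cheaper than discovering a corrupt image after the email has gone out.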

Step 4: Assemble as HTML Email

For email delivery, charts embed as CID (Content-ID) attachments. This is the most compatible approach across Gmail, Outlook, and Apple Mail.

import html
import markdown as md
from datetime import date
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.image import MIMEImage
import smtplib

def build_html_email_body(title: str, narrative: str, charts: list) -> str:
    """Convert narrative and charts into an HTML email body."""
    html_narrative = md.markdown(
        narrative,
        extensions=["tables", "fenced_code", "nl2br"],
    )

    charts_section = ""
    for chart in charts:
        cid = chart["file_id"]
        caption = html.escape(chart["caption"])  # captions can contain quotes or angle brackets
        charts_section += f"""
        <div style="background-color:#141414;padding:16px;border-radius:8px;margin:20px 0;">
            <img src="cid:{cid}"
                 alt="{caption}"
                 width="600"
                 style="max-width:100%;height:auto;display:block;" />
            <p style="color:#999999;font-size:13px;margin:8px 0 0 0;line-height:1.4;">
                {caption}
            </p>
        </div>
        """

    report_date = date.today().strftime("%B %d, %Y")

    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width,initial-scale=1">
</head>
<body style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;
             color:#222222;max-width:680px;margin:0 auto;padding:24px 16px;">

  <p style="color:#888888;font-size:13px;margin:0 0 8px 0;">{report_date}</p>
  <h1 style="font-size:22px;font-weight:700;color:#111111;margin:0 0 24px 0;">
    {title}
  </h1>

  <div style="font-size:15px;line-height:1.7;color:#333333;">
    {html_narrative}
  </div>

  <h2 style="font-size:17px;font-weight:600;color:#111111;
             margin:36px 0 16px 0;border-top:1px solid #eeeeee;padding-top:24px;">
    Charts
  </h2>
  {charts_section}

  <hr style="border:none;border-top:1px solid #eeeeee;margin:36px 0 16px 0;" />
  <p style="font-size:12px;color:#aaaaaa;margin:0;">
    Generated by <a href="https://datastory.bot" style="color:#aaaaaa;">DataStoryBot</a>
  </p>
</body>
</html>"""


def send_email_report(
    report: dict,
    from_addr: str,
    to_addrs: list,
    smtp_host: str,
    smtp_port: int = 587,
    smtp_user: str | None = None,
    smtp_password: str | None = None,
) -> None:
    """Send the chart report as an HTML email with inline chart attachments."""
    html_body = build_html_email_body(
        report["title"], report["narrative"], report["charts"]
    )

    msg = MIMEMultipart("related")
    msg["From"] = from_addr
    msg["To"] = ", ".join(to_addrs)
    msg["Subject"] = f"Weekly Report: {report['title']}"

    # HTML body
    msg.attach(MIMEText(html_body, "html", "utf-8"))

    # Inline chart attachments
    for chart in report["charts"]:
        with open(chart["path"], "rb") as f:
            img = MIMEImage(f.read(), "png")
        img.add_header("Content-ID", f"<{chart['file_id']}>")
        img.add_header(
            "Content-Disposition", "inline",
            filename=os.path.basename(chart["path"])
        )
        msg.attach(img)

    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.ehlo()
        server.starttls()
        if smtp_user and smtp_password:
            server.login(smtp_user, smtp_password)
        server.sendmail(from_addr, to_addrs, msg.as_string())

    print(f"Email sent to {', '.join(to_addrs)}")

Step 5: Assemble as PDF

For PDF output, WeasyPrint converts HTML to PDF. The same HTML template used for email works here — the only difference is that images are referenced by file path instead of cid: URLs, and the HTML is rendered to a file rather than sent over SMTP.

import html
from weasyprint import HTML

def build_pdf_html(title: str, narrative: str, charts: list) -> str:
    """Build HTML for PDF rendering. Images reference local paths, not CIDs."""
    html_narrative = md.markdown(
        narrative,
        extensions=["tables", "fenced_code"],
    )

    charts_section = ""
    for chart in charts:
        caption = html.escape(chart["caption"])  # captions can contain HTML-special characters
        # WeasyPrint reads local files via file:// or absolute path
        charts_section += f"""
        <div class="chart-block">
            <img src="file://{chart['path']}" alt="{caption}" />
            <p class="caption">{caption}</p>
        </div>
        """

    report_date = date.today().strftime("%B %d, %Y")

    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
  @page {{
    size: A4;
    margin: 2cm 2.5cm;
  }}
  body {{
    font-family: -apple-system, 'Helvetica Neue', Arial, sans-serif;
    font-size: 11pt;
    line-height: 1.6;
    color: #222222;
  }}
  h1 {{ font-size: 18pt; font-weight: 700; margin: 0 0 6pt 0; }}
  h2 {{ font-size: 14pt; font-weight: 600; margin: 18pt 0 8pt 0; }}
  h3 {{ font-size: 12pt; font-weight: 600; margin: 12pt 0 6pt 0; }}
  p {{ margin: 0 0 8pt 0; }}
  table {{ border-collapse: collapse; width: 100%; margin: 12pt 0; font-size: 9pt; }}
  th, td {{ border: 1px solid #cccccc; padding: 4pt 8pt; text-align: left; }}
  th {{ background-color: #f5f5f5; font-weight: 600; }}
  .date {{ color: #888888; font-size: 9pt; margin: 0 0 12pt 0; }}
  .chart-block {{
    background-color: #141414;
    border-radius: 6pt;
    padding: 12pt;
    margin: 16pt 0;
    page-break-inside: avoid;
  }}
  .chart-block img {{
    max-width: 100%;
    height: auto;
    display: block;
  }}
  .caption {{
    color: #999999;
    font-size: 8.5pt;
    margin: 6pt 0 0 0;
  }}
  .footer {{
    margin-top: 24pt;
    padding-top: 12pt;
    border-top: 1pt solid #eeeeee;
    font-size: 8pt;
    color: #aaaaaa;
  }}
</style>
</head>
<body>
  <p class="date">{report_date}</p>
  <h1>{title}</h1>

  {html_narrative}

  <h2>Charts</h2>
  {charts_section}

  <div class="footer">Generated by DataStoryBot — datastory.bot</div>
</body>
</html>"""


def export_pdf(report: dict, output_path: str) -> str:
    """Render the report to a PDF file. Returns the output path."""
    html_string = build_pdf_html(
        report["title"], report["narrative"], report["charts"]
    )
    HTML(string=html_string).write_pdf(output_path)
    size_kb = os.path.getsize(output_path) // 1024
    print(f"PDF written: {output_path} ({size_kb} KB)")
    return output_path

A typical report with three charts produces a PDF between 800 KB and 2 MB depending on chart complexity. WeasyPrint handles page-break-inside: avoid on chart blocks, so charts do not split across pages.

Step 6: The Complete Pipeline Script

Putting all the pieces together into a single script:

#!/usr/bin/env python3
"""
weekly_chart_report.py — automated weekly chart report pipeline.

Usage:
    python weekly_chart_report.py --mode email
    python weekly_chart_report.py --mode pdf --output /tmp/report.pdf
"""
import argparse
import os
import sys
import tempfile
from datetime import date
from dotenv import load_dotenv

load_dotenv()

# Configuration from environment
CSV_SOURCE = os.environ["REPORT_CSV_SOURCE"]   # s3://bucket/key or https:// URL
STEERING   = os.environ.get("REPORT_STEERING", "")
SMTP_HOST  = os.environ.get("SMTP_HOST", "smtp.gmail.com")
SMTP_PORT  = int(os.environ.get("SMTP_PORT", "587"))
SMTP_USER  = os.environ.get("SMTP_USER")
SMTP_PASS  = os.environ.get("SMTP_PASSWORD")
FROM_EMAIL = os.environ.get("REPORT_FROM")
TO_EMAILS  = os.environ.get("REPORT_TO", "").split(",")


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=["email", "pdf"], default="pdf")
    parser.add_argument("--output", default=f"/tmp/report_{date.today()}.pdf")
    args = parser.parse_args()

    # 1. Fetch CSV
    print("Fetching CSV data...")
    if CSV_SOURCE.startswith("s3://"):
        parts = CSV_SOURCE[5:].split("/", 1)
        csv_bytes = fetch_from_s3(parts[0], parts[1])
        filename = parts[1].split("/")[-1]
    else:
        csv_bytes = fetch_from_url(CSV_SOURCE)
        filename = CSV_SOURCE.split("/")[-1] or "data.csv"

    print(f"Fetched {len(csv_bytes) // 1024} KB from {CSV_SOURCE}")

    # 2. Analyze and download charts into a temp directory
    with tempfile.TemporaryDirectory(prefix="dsbcharts_") as tmpdir:
        report = run_analysis_and_download(
            csv_bytes,
            filename,
            steering=STEERING or None,
            output_dir=tmpdir,
        )

        print(f"\nAnalysis complete: '{report['title']}'")
        print(f"{len(report['charts'])} charts downloaded")

        # 3. Assemble and deliver
        if args.mode == "email":
            send_email_report(
                report,
                from_addr=FROM_EMAIL,
                to_addrs=[e.strip() for e in TO_EMAILS if e.strip()],
                smtp_host=SMTP_HOST,
                smtp_port=SMTP_PORT,
                smtp_user=SMTP_USER,
                smtp_password=SMTP_PASS,
            )
        else:
            export_pdf(report, args.output)
            print(f"\nReport saved to: {args.output}")


if __name__ == "__main__":
    main()

The temporary directory ensures chart PNGs are cleaned up after delivery. For PDF mode, the PDF is written to --output before the temp directory is deleted.

Step 7: Schedule with Cron or a Cloud Scheduler

For a Linux/macOS server, add a crontab entry:

# Edit crontab
crontab -e

# Every Monday at 7:00 AM UTC — PDF mode, log to file
0 7 * * 1 /usr/bin/python3 /opt/reports/weekly_chart_report.py \
    --mode pdf \
    --output /opt/reports/output/report_$(date +\%Y-\%m-\%d).pdf \
    >> /var/log/weekly_chart_report.log 2>&1
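
If one run overlaps the next (a slow refine call, a hung download), two pipelines can race on the same output paths. A small advisory lock at the top of main() prevents that. This sketch uses the standard library's fcntl, so it is POSIX-only (Linux/macOS, matching the cron setup above); acquire_run_lock is a name introduced here:

```python
import fcntl
import sys

def acquire_run_lock(path: str = "/tmp/weekly_chart_report.lock"):
    """Exit early if another report run already holds the lock."""
    lock_file = open(path, "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit("Another report run is still in progress; skipping this one.")
    # Keep the returned handle alive for the life of the process;
    # the lock is released when the file object is closed.
    return lock_file
```

Call it as the first line of main() and hold the returned handle in a variable until the script exits.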

For cloud environments:

GitHub Actions — add a scheduled workflow:

# .github/workflows/weekly-report.yml
name: Weekly Chart Report
on:
  schedule:
    - cron: '0 7 * * 1'
  workflow_dispatch:  # allow manual runs

jobs:
  generate-report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: pip install requests weasyprint markdown python-dotenv
      - name: Generate and send report
        env:
          REPORT_CSV_SOURCE: ${{ secrets.REPORT_CSV_SOURCE }}
          REPORT_STEERING: ${{ secrets.REPORT_STEERING }}
          SMTP_HOST: ${{ secrets.SMTP_HOST }}
          SMTP_USER: ${{ secrets.SMTP_USER }}
          SMTP_PASSWORD: ${{ secrets.SMTP_PASSWORD }}
          REPORT_FROM: ${{ secrets.REPORT_FROM }}
          REPORT_TO: ${{ secrets.REPORT_TO }}
        run: python weekly_chart_report.py --mode email

AWS — use EventBridge + Lambda or ECS. For Lambda, the WeasyPrint dependency requires a layer or a container image because of its native library dependencies (libpango, libcairo). ECS is simpler: build a Docker image with the dependencies installed and run it on a schedule.

Google Cloud Scheduler — trigger a Cloud Run job. Cloud Run containers have enough memory and CPU for WeasyPrint without special configuration.

Error Handling

Three failure modes matter for unattended operation:

Container TTL expiry — If more than 20 minutes pass between the upload and a file download, you will get a 404. This should not happen if the pipeline runs sequentially, but it can happen if an intermediate step hangs. Detect it:

resp = requests.get(url, timeout=30)
if resp.status_code == 404:
    raise RuntimeError(
        f"Container {container_id} expired before chart download completed. "
        "Re-run the full pipeline."
    )
resp.raise_for_status()

Empty chart list — occasionally the refine call returns no charts (the model determined the story did not need them). Handle it gracefully rather than crashing the assembly step:

if not result["charts"]:
    print("Warning: no charts generated. Report will be narrative-only.")
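
If you want the assembly templates themselves to stay oblivious to the empty case, one option is to centralize the decision in a helper. This is a sketch (charts_section_or_placeholder is a hypothetical name, and the markup is trimmed down from the email template above):

```python
def charts_section_or_placeholder(charts: list) -> str:
    """Return the charts HTML, or a short note when no charts were generated."""
    if not charts:
        return '<p style="color:#999999;font-size:13px;">No charts were generated for this report.</p>'
    parts = []
    for chart in charts:
        cid = chart["file_id"]
        caption = chart["caption"]
        parts.append(f'<div><img src="cid:{cid}" alt="{caption}" /></div>')
    return "".join(parts)
```

A narrative-only report with an explicit placeholder reads as intentional; an email with an empty "Charts" heading reads as broken.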

SMTP or PDF write failures — these happen after the analysis is complete, so the work is not lost. Catch delivery failures separately and surface them loudly:

try:
    send_email_report(report, ...)
except Exception as e:
    # Log the error with context
    print(f"Delivery failed: {e}", file=sys.stderr)
    # Save the PDF as a fallback
    export_pdf(report, f"/tmp/report_unsent_{date.today()}.pdf")
    raise

Persisting Charts for Audit Trails

For regulated environments or anywhere you need to reproduce a past report, save the chart PNGs and narrative to durable storage before the temp directory is deleted:

import boto3
import json

def persist_report_to_s3(report: dict, bucket: str, prefix: str) -> list:
    """Upload charts and narrative to S3 for archival."""
    s3 = boto3.client("s3")
    today = date.today().isoformat()
    stored_charts = []

    for chart in report["charts"]:
        key = f"{prefix}/{today}/{os.path.basename(chart['path'])}"
        s3.upload_file(
            chart["path"], bucket, key,
            ExtraArgs={"ContentType": "image/png"}
        )
        stored_charts.append({
            "s3_key": key,
            "caption": chart["caption"],
            "file_id": chart["file_id"],
        })
        print(f"Archived: s3://{bucket}/{key}")

    # Also store the narrative as JSON
    meta_key = f"{prefix}/{today}/report_metadata.json"
    s3.put_object(
        Bucket=bucket,
        Key=meta_key,
        Body=json.dumps({
            "title": report["title"],
            "narrative": report["narrative"],
            "charts": stored_charts,
            "generated_at": today,
        }).encode("utf-8"),
        ContentType="application/json",
    )

    return stored_charts

Call persist_report_to_s3 inside the with tempfile.TemporaryDirectory(...) block, before the context manager exits and deletes the files.

What to Read Next

For the mechanics of chart file downloading in detail — including base64 embedding, CID attachments, and optimization — see download and embed AI-generated charts.

For automatic chart generation from CSV without the full report pipeline, see how to generate charts from CSV data automatically.

For the broader automated reporting pipeline including narrative delivery and multi-audience segmentation, see automating weekly data reports.

Ready to find your data story?

Upload a CSV and DataStoryBot will uncover the narrative in seconds.

Try DataStoryBot →