general · 8 min read

Automating Weekly Data Reports with DataStoryBot

Build a cron-triggered pipeline that pulls CSV data, generates narratives and charts via the DataStoryBot API, and emails polished reports via SendGrid.

By DataStoryBot Team


Every Monday morning, someone on your team exports a CSV, opens a notebook, builds charts, writes a summary, and emails it to stakeholders. It takes an hour or two. The charts look slightly different every time. The narrative varies depending on who writes it. And if that person is on vacation, the report either doesn't go out or it goes out rushed and thin.

This is a pipeline problem disguised as an analyst problem. The data source is the same every week. The audience is the same. The format is the same. The only thing that changes is the data itself — and that's exactly the part an API can handle.

This article builds a complete automated reporting pipeline: a cron job pulls fresh CSV data, sends it through the DataStoryBot API to generate a narrative with charts, formats it as an HTML email, and delivers it via SendGrid. The whole thing runs unattended.

Architecture Overview

The pipeline has four stages:

[Cron Trigger] → [Fetch CSV] → [DataStoryBot API] → [SendGrid Email]
  1. Cron trigger — runs every Monday at 7:00 AM UTC (or whatever cadence you need).
  2. Fetch CSV — pull the latest export from your data warehouse, S3 bucket, or local directory.
  3. DataStoryBot API — upload the CSV, discover story angles, refine the best one into a full narrative with charts.
  4. SendGrid email — format the narrative as HTML, embed charts inline, and send to the distribution list.

Each stage is a function. If one fails, the error is clear and the fix is local.

Prerequisites

You need three things:

  • Python 3.8+ with requests, sendgrid, and markdown packages installed.
  • A SendGrid account with an API key that has mail send permissions. Free tier handles 100 emails per day.
  • A CSV data source — a file path, an S3 URL, or a database export script. This article uses a local file for clarity.

No DataStoryBot API key is required during the current open beta.

Step 1: The DataStoryBot Pipeline

Before wiring up email delivery, let's build the core function that turns a CSV into a report. This is the same three-call flow covered in the getting started guide, wrapped in a reusable function:

import requests

BASE_URL = "https://datastory.bot/api"

def generate_report(csv_path, steering=None):
    """Upload a CSV, discover stories, and return the top narrative with charts."""

    # Step 1: Upload
    with open(csv_path, "rb") as f:
        upload = requests.post(f"{BASE_URL}/upload", files={"file": f})
        upload.raise_for_status()
        upload_data = upload.json()

    container_id = upload_data["containerId"]
    metadata = upload_data["metadata"]
    print(f"Uploaded {metadata['fileName']}: {metadata['rowCount']} rows, "
          f"{metadata['columnCount']} columns")

    # Step 2: Analyze — discover story angles
    analyze_payload = {"containerId": container_id}
    if steering:
        analyze_payload["steeringPrompt"] = steering

    analyze_resp = requests.post(f"{BASE_URL}/analyze", json=analyze_payload)
    analyze_resp.raise_for_status()
    stories = analyze_resp.json()
    if not stories:
        raise ValueError("No story angles returned — check the CSV contents")

    print(f"Found {len(stories)} story angles:")
    for s in stories:
        print(f"  - {s['title']}")

    # Step 3: Refine the top story into a full narrative
    refine_resp = requests.post(f"{BASE_URL}/refine", json={
        "containerId": container_id,
        "selectedStoryTitle": stories[0]["title"]
    })
    refine_resp.raise_for_status()
    report = refine_resp.json()

    # Download all charts as bytes
    chart_images = []
    for chart in report["charts"]:
        img_resp = requests.get(
            f"{BASE_URL}/files/{container_id}/{chart['fileId']}"
        )
        img_resp.raise_for_status()
        chart_images.append({
            "bytes": img_resp.content,
            "caption": chart["caption"],
            "cid": chart["fileId"]
        })

    return {
        "title": stories[0]["title"],
        "narrative": report["narrative"],
        "charts": chart_images
    }

The steering parameter is optional but valuable for weekly reports. If you know the audience cares about week-over-week changes, pass that context in:

report = generate_report(
    "/data/exports/weekly_metrics.csv",
    steering="Focus on week-over-week changes, anomalies, and any metrics that deviated more than 10% from the prior week."
)

Step 2: Format as HTML Email

The narrative comes back as Markdown. Convert it to HTML and embed the charts inline using Content-ID references:

import markdown

def format_email_html(report):
    """Convert the narrative to HTML and embed chart images."""

    # Convert markdown narrative to HTML
    html_narrative = markdown.markdown(
        report["narrative"],
        extensions=["tables", "fenced_code"]
    )

    # Build chart HTML with inline CID references
    charts_html = ""
    for chart in report["charts"]:
        charts_html += f'''
        <div style="background-color: #141414; padding: 16px; border-radius: 8px; margin: 16px 0;">
            <img src="cid:{chart['cid']}" alt="{chart['caption']}" width="600"
                 style="max-width: 100%; height: auto;" />
            <p style="color: #999999; font-size: 13px; margin-top: 8px;">
                {chart['caption']}
            </p>
        </div>
        '''

    # Assemble the full email body
    return f'''
    <html>
    <body style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
                 color: #333333; max-width: 680px; margin: 0 auto; padding: 24px;">
        <h1 style="font-size: 22px; color: #111111;">{report['title']}</h1>
        {html_narrative}
        <h2 style="font-size: 18px; color: #111111; margin-top: 32px;">Charts</h2>
        {charts_html}
        <hr style="border: none; border-top: 1px solid #eeeeee; margin: 32px 0;" />
        <p style="font-size: 12px; color: #999999;">
            Generated by <a href="https://datastory.bot">DataStoryBot</a>.
            Data analyzed on {{date}}.
        </p>
    </body>
    </html>
    '''

Step 3: Send via SendGrid

Now wire it up to SendGrid. The key detail is attaching chart images as inline attachments using Content-ID so they render inside the email body rather than as separate downloads:

import base64
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import (
    Mail, Attachment, FileContent, FileName, FileType,
    Disposition, ContentId, Content, MimeType
)
from datetime import date

SENDGRID_API_KEY = "your-sendgrid-api-key"  # Use env var in production
FROM_EMAIL = "reports@yourcompany.com"
TO_EMAILS = ["team@yourcompany.com", "leadership@yourcompany.com"]

def send_report_email(report):
    """Send the formatted report via SendGrid with inline chart images."""

    html_body = format_email_html(report)
    # The f-string in format_email_html collapses {{date}} to {date},
    # so that is the placeholder to replace here
    html_body = html_body.replace("{date}", date.today().isoformat())

    message = Mail(
        from_email=FROM_EMAIL,
        to_emails=TO_EMAILS,
        subject=f"Weekly Data Report: {report['title']}",
        html_content=Content(MimeType.html, html_body)
    )

    # Attach each chart as an inline image
    for chart in report["charts"]:
        encoded = base64.b64encode(chart["bytes"]).decode("utf-8")
        attachment = Attachment(
            FileContent(encoded),
            FileName(f"{chart['cid']}.png"),
            FileType("image/png"),
            Disposition("inline"),
            ContentId(chart["cid"])
        )
        message.add_attachment(attachment)

    sg = SendGridAPIClient(SENDGRID_API_KEY)
    response = sg.send(message)
    print(f"Email sent: {response.status_code}")
    return response.status_code

Step 4: Putting It Together

The three functions above — generate_report(), format_email_html(), and send_report_email() — compose into a single script. In production, pull configuration from environment variables (SENDGRID_API_KEY, REPORT_CSV_PATH, REPORT_TO_EMAILS) rather than hardcoding them.
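A minimal main script tying the stages together might look like this. The environment variable names are the ones suggested above; load_config is a hypothetical helper, and generate_report and send_report_email are the functions from Steps 1 and 3:

```python
import os

def load_config():
    """Read pipeline configuration from environment variables."""
    return {
        "csv_path": os.environ.get("REPORT_CSV_PATH",
                                   "/data/exports/weekly_metrics.csv"),
        "recipients": os.environ.get("REPORT_TO_EMAILS",
                                     "team@yourcompany.com").split(","),
        "sendgrid_key": os.environ.get("SENDGRID_API_KEY", ""),
    }

def main():
    config = load_config()
    # generate_report and send_report_email are defined in Steps 1 and 3
    report = generate_report(
        config["csv_path"],
        steering="Focus on week-over-week changes and anomalies."
    )
    send_report_email(report)

if __name__ == "__main__":
    main()
```

Keeping all configuration in environment variables means the same script can run unchanged under cron, Lambda, or a GitHub Actions runner.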

Step 5: Schedule with Cron

Add a crontab entry to run the script every Monday at 7:00 AM UTC:

# Edit crontab
crontab -e

# Add this line — every Monday at 7:00 AM UTC
0 7 * * 1 /usr/bin/python3 /opt/reports/weekly_report.py >> /var/log/weekly_report.log 2>&1

If you're using a cloud environment, equivalent options include:

  • AWS: EventBridge (formerly CloudWatch Events) triggering a Lambda function or ECS task.
  • GCP: Cloud Scheduler triggering a Cloud Function.
  • GitHub Actions: a scheduled workflow with cron: '0 7 * * 1', with SENDGRID_API_KEY passed in from repository secrets.

Error Handling for Unattended Pipelines

When a script runs unattended, silent failures are worse than loud ones. The three most common failure modes:

  1. Network timeouts — wrap the generate_report call in a retry loop with exponential backoff (30s, 60s).
  2. Container expiry — if you get a 404 on a file download or an error on analyze/refine, the 20-minute TTL expired. Re-upload and restart.
  3. SendGrid delivery failures — check the response status code and alert on anything outside 200-202.
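The retry loop from point 1 can be sketched as a small wrapper. with_retries is a hypothetical helper, and the default delays match the 30s/60s backoff mentioned above:

```python
import time

def with_retries(fn, *args, delays=(30, 60), **kwargs):
    """Call fn, retrying on exceptions with the given backoff delays.

    After the delays are exhausted, the final attempt re-raises,
    so the failure is loud rather than silent.
    """
    for delay in delays:
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            print(f"Attempt failed ({exc}); retrying in {delay}s")
            time.sleep(delay)
    return fn(*args, **kwargs)  # last attempt: let the exception propagate

# Usage:
# report = with_retries(generate_report, "/data/exports/weekly_metrics.csv")
```

Because a fresh call to generate_report re-uploads the CSV, the same wrapper also covers failure mode 2: an expired container simply triggers a clean re-upload on the next attempt.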

Customizing the Report for Different Audiences

One CSV can produce different reports for different teams by varying the steeringPrompt and refinementPrompt:

audiences = {
    "executive": {
        "steering": "Focus on revenue impact and strategic metrics. High-level trends only.",
        "refinement": "Executive audience. Under 200 words. Lead with the bottom line.",
        "recipients": ["ceo@company.com", "cfo@company.com"]
    },
    "product": {
        "steering": "Focus on feature adoption, user engagement, and retention metrics.",
        "refinement": "Product team audience. Include segment breakdowns.",
        "recipients": ["product@company.com"]
    },
    "ops": {
        "steering": "Focus on operational efficiency, error rates, and throughput.",
        "refinement": "Operations team. Flag anything outside normal ranges.",
        "recipients": ["ops@company.com"]
    }
}

for audience_name, config in audiences.items():
    report = generate_report(CSV_PATH, steering=config["steering"])
    # Pass config["refinement"] as refinementPrompt in the refine call
    # inside generate_report to adjust tone and depth per audience.
    # Extend send_report_email to accept a recipients list instead of
    # the hardcoded TO_EMAILS, then:
    send_report_email(report, recipients=config["recipients"])
    print(f"Sent {audience_name} report")
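Wiring the refinement prompt in could look like this — a sketch assuming the refine endpoint accepts a refinementPrompt field alongside the parameters shown in Step 1 (build_refine_payload is a hypothetical helper):

```python
def build_refine_payload(container_id, story_title, refinement=None):
    """Assemble the JSON body for the refine call, with an optional prompt."""
    payload = {
        "containerId": container_id,
        "selectedStoryTitle": story_title,
    }
    if refinement:
        payload["refinementPrompt"] = refinement
    return payload

# Inside generate_report, Step 3 would become:
# refine_resp = requests.post(f"{BASE_URL}/refine", json=build_refine_payload(
#     container_id, stories[0]["title"], refinement=config["refinement"]))
```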

This is where automated reporting starts to pay for itself. What used to be three separate analyst tasks — build the exec report, build the product report, build the ops report — becomes three configurations of the same pipeline.

Monitoring the Pipeline

For production use, add observability:

  • Log every run with timestamps, CSV metadata (row count, file hash), and story titles selected.
  • Track email delivery via SendGrid webhooks — know if bounces or blocks occur.
  • Alert on failures — if the Monday report doesn't send by 8:00 AM, trigger a Slack notification or PagerDuty alert.

A simple health check: if the log file hasn't been updated since last Monday, something is wrong.
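That health check is a few lines of Python — a sketch (log_is_stale is a hypothetical helper) that flags the pipeline as unhealthy if the log file is missing or hasn't been written in the last seven days:

```python
import os
import time

def log_is_stale(log_path, max_age_days=7):
    """Return True if the log file is missing or older than max_age_days."""
    if not os.path.exists(log_path):
        return True
    age_seconds = time.time() - os.path.getmtime(log_path)
    return age_seconds > max_age_days * 86400

# if log_is_stale("/var/log/weekly_report.log"):
#     ...trigger your Slack or PagerDuty alert here
```

Run it from a second, independent cron entry so a broken reporting script can't also silence its own watchdog.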

What to Read Next

For details on the three-step API pipeline used in this article, see the getting started guide.

To understand how DataStoryBot discovers story angles from raw CSV data — and how to steer it toward the insights that matter most — read how to generate a data report from CSV in one API call.

Or try the full pipeline interactively in the DataStoryBot playground before wiring up automation.

Ready to find your data story?

Upload a CSV and DataStoryBot will uncover the narrative in seconds.

Try DataStoryBot →