How to Use the OpenAI Containers API for File-Based Workflows
Technical deep-dive into the OpenAI Containers API: creating containers, uploading files, managing TTLs, and retrieving outputs for data analysis workflows.
The OpenAI Containers API is the plumbing underneath Code Interpreter. When you upload a file to ChatGPT and ask it to analyze your data, a container spins up, your file gets mounted, Python runs inside it, and the results come back. The Containers API lets you do the same thing programmatically — without the chat interface.
This matters for data workflows because files are the interface. You upload a CSV, the container processes it, and you retrieve the output files (charts, transformed datasets, analysis results). Understanding the container lifecycle — creation, file mounting, execution, and expiry — is essential for building reliable data pipelines.
DataStoryBot uses the Containers API internally. This article explains the API layer that makes it work.
Container Lifecycle
A container goes through four phases:
Create → Upload Files → Execute Code → Expire
Create: You get a container ID. The container is an isolated Linux environment with Python, pandas, matplotlib, seaborn, numpy, scipy, and other common data science libraries pre-installed. No network access — the container is sandboxed.
Upload Files: Mount your data files into the container's filesystem. Files are accessible at a standard path inside the container. You can upload multiple files.
Execute Code: Run Python code inside the container via the Responses API with the code_interpreter tool. The code can read your uploaded files, process them, generate charts, write new files, and return results.
Expire: Containers have a TTL (time-to-live). After the TTL expires, the container and all its files are deleted. Default is 30 minutes; DataStoryBot uses 20 minutes. You can set custom TTLs.
Creating a Container
curl -X POST https://api.openai.com/v1/containers \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "sales-analysis",
"expires_after": {
"anchor": "last_active_at",
"minutes": 30
}
}'
{
"id": "ctr_abc123def456",
"name": "sales-analysis",
"status": "active",
"expires_after": {
"anchor": "last_active_at",
"minutes": 30
},
"created_at": 1711234567,
"expires_at": 1711236367
}
The expires_after configuration controls TTL behavior:
- anchor: "last_active_at" — TTL resets every time the container is used (file upload, code execution). The container stays alive as long as you keep interacting with it.
- anchor: "created_at" — TTL is absolute from creation time. The container dies regardless of activity.
For interactive analysis workflows, use last_active_at. For batch pipelines where you want predictable cleanup, use created_at.
Uploading Files
curl -X POST "https://api.openai.com/v1/containers/ctr_abc123def456/files" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-F "file=@quarterly_sales.csv"
{
"id": "file-xyz789",
"container_id": "ctr_abc123def456",
"name": "quarterly_sales.csv",
"size": 245780,
"created_at": 1711234600
}
You can upload multiple files to the same container. Each file gets a unique ID and is accessible inside the container's filesystem. The Code Interpreter knows the file paths and can reference them in generated code.
Size limits: Individual files can be up to 512 MB. Total container storage is 10 GB. For most CSV analysis workflows, you'll never hit these limits — but if you're working with large datasets, be aware of them.
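These limits are easy to guard against client-side. A minimal pre-flight check before uploading (the byte limits mirror the figures above; the helper name is ours):

```python
import os

MAX_FILE_BYTES = 512 * 1024 * 1024             # 512 MB per-file limit
MAX_CONTAINER_BYTES = 10 * 1024 * 1024 * 1024  # 10 GB total container storage

def check_upload_sizes(paths):
    """Validate local file sizes before uploading; returns total bytes."""
    total = 0
    for path in paths:
        size = os.path.getsize(path)
        if size > MAX_FILE_BYTES:
            raise ValueError(f"{path} exceeds the 512 MB per-file limit")
        total += size
    if total > MAX_CONTAINER_BYTES:
        raise ValueError("combined uploads exceed the 10 GB container limit")
    return total
```

Run the check once over all input paths, then POST each file as in the curl example above.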
Executing Code
Code execution happens through the Responses API with the code_interpreter tool enabled and the container attached:
curl -X POST https://api.openai.com/v1/responses \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"instructions": "You are a data analyst. Analyze the uploaded CSV file.",
"input": "Analyze quarterly_sales.csv. Find the top-performing regions and create a bar chart comparing regional revenue.",
"tools": [
{
"type": "code_interpreter",
"container": {
"id": "ctr_abc123def456"
}
}
]
}'
The response includes the generated code, its output, and any files created:
{
"id": "resp_001",
"output": [
{
"type": "code_interpreter_call",
"id": "ci_001",
"code": "import pandas as pd\nimport matplotlib.pyplot as plt\n\ndf = pd.read_csv('/mnt/data/quarterly_sales.csv')\nrevenue_by_region = df.groupby('region')['revenue'].sum().sort_values(ascending=False)\n\nfig, ax = plt.subplots(figsize=(10, 6))\nrevenue_by_region.plot(kind='bar', ax=ax, color='#2563eb')\nax.set_title('Revenue by Region')\nax.set_ylabel('Revenue ($)')\nplt.tight_layout()\nplt.savefig('/mnt/data/regional_revenue.png', dpi=150)\nprint(revenue_by_region.to_string())",
"results": [
{
"type": "text",
"text": "region\nWest 2847391\nNortheast 2134567\nSoutheast 1923456\nMidwest 1567890\nSouthwest 1234567"
},
{
"type": "file",
"file": {
"id": "file-chart001",
"name": "regional_revenue.png",
"size": 34567
}
}
]
},
{
"type": "message",
"content": "The West region leads revenue at $2.85M, followed by Northeast at $2.13M..."
}
]
}
The code runs inside the container. Files written to /mnt/data/ persist for the container's lifetime and can be retrieved via the files API.
Retrieving Output Files
curl -X GET "https://api.openai.com/v1/containers/ctr_abc123def456/files/file-chart001/content" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
--output regional_revenue.png
This downloads the chart PNG that Code Interpreter generated. You can retrieve any file in the container — both uploaded inputs and generated outputs.
To list all files in a container:
curl -X GET "https://api.openai.com/v1/containers/ctr_abc123def456/files" \
-H "Authorization: Bearer $OPENAI_API_KEY"
Container Management
Check container status:
curl -X GET "https://api.openai.com/v1/containers/ctr_abc123def456" \
-H "Authorization: Bearer $OPENAI_API_KEY"
Delete a container early:
curl -X DELETE "https://api.openai.com/v1/containers/ctr_abc123def456" \
-H "Authorization: Bearer $OPENAI_API_KEY"
List your containers:
curl -X GET "https://api.openai.com/v1/containers" \
-H "Authorization: Bearer $OPENAI_API_KEY"
For production workflows, always delete containers after you've retrieved your results. Don't rely on TTL expiry alone — explicit cleanup prevents hitting concurrent container limits.
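One way to make that cleanup automatic is a context manager that guarantees deletion even when the analysis raises. This is a sketch written against the same containers.create/.delete calls the SDK exposes; the `managed_container` name is ours:

```python
from contextlib import contextmanager

@contextmanager
def managed_container(client, **create_kwargs):
    """Create a container and guarantee deletion on exit, even on error.

    `client` is expected to expose the OpenAI SDK's containers.create
    and containers.delete calls; this is a sketch, not an official API.
    """
    container = client.containers.create(**create_kwargs)
    try:
        yield container
    finally:
        client.containers.delete(container.id)
```

Usage: `with managed_container(client, name="job", expires_after={"anchor": "created_at", "minutes": 10}) as c:` — upload, analyze, and download inside the block; deletion happens on exit no matter how the block ends.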
Python SDK Example
The OpenAI Python SDK wraps these endpoints:
from openai import OpenAI

client = OpenAI()

# Create container
container = client.containers.create(
    name="sales-analysis",
    expires_after={"anchor": "last_active_at", "minutes": 20}
)

# Upload file
with open("quarterly_sales.csv", "rb") as f:
    file = client.containers.files.create(
        container_id=container.id,
        file=f
    )

# Run analysis
response = client.responses.create(
    model="gpt-4o",
    input="Analyze quarterly_sales.csv and create visualizations for the key findings.",
    tools=[{
        "type": "code_interpreter",
        "container": {"id": container.id}
    }]
)

# Extract generated files
for item in response.output:
    if item.type == "code_interpreter_call":
        for result in item.results:
            if result.type == "file":
                content = client.containers.files.content(
                    container_id=container.id,
                    file_id=result.file.id
                )
                with open(result.file.name, "wb") as f:
                    f.write(content.read())
                print(f"Saved: {result.file.name}")

# Cleanup
client.containers.delete(container.id)
How DataStoryBot Uses Containers
DataStoryBot's three-endpoint flow maps directly to container operations:
- /upload → Creates a container, uploads the CSV, returns containerId
- /analyze → Runs Code Interpreter in the container with a structured prompt, extracts story angles from the output
- /refine → Runs a follow-up Code Interpreter call in the same container, generating the full narrative and charts for a selected story
The container reuse is important. Between /analyze and /refine, any DataFrames created during analysis are still in memory (if the Code Interpreter session persists) and files are still on disk. This means the refine step can build on the analysis step's work without re-reading and re-processing the original CSV.
DataStoryBot sets a 20-minute TTL with last_active_at anchoring. Every API call resets the clock, giving users time to browse the story angles before choosing one to refine.
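The reuse pattern reduces to passing the same container ID to successive Responses API calls. A sketch, with the function name and prompts ours:

```python
def run_in_container(client, container_id, prompt, model="gpt-4o"):
    """Run one Code Interpreter turn inside an existing container.

    `client` is an OpenAI SDK instance (as in the Python example above).
    """
    return client.responses.create(
        model=model,
        input=prompt,
        tools=[{
            "type": "code_interpreter",
            "container": {"id": container_id},
        }],
    )

# /analyze step: survey the CSV and propose story angles
# analysis = run_in_container(client, container.id,
#     "Profile quarterly_sales.csv and list three story angles.")

# /refine step: same container, so the CSV and any files the first
# call wrote are still on disk
# story = run_in_container(client, container.id,
#     "Write a full narrative with charts for story angle 2.")
```

Because both calls attach the same container ID, the second turn sees everything the first one left behind.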
TTL Strategy for Different Use Cases
Interactive analysis (20-30 min, last_active_at): For applications where users upload data and explore results interactively. The container stays alive as long as the user is engaged.
Batch processing (5-10 min, created_at): For pipelines that upload, analyze, retrieve results, and move on. Short fixed TTLs prevent container accumulation.
Extended sessions (60 min, last_active_at): For applications that need longer analysis windows — multi-step workflows, iterative refinement, or situations where users might step away and come back.
The cost implication: containers consume compute resources while alive. Shorter TTLs = lower costs. DataStoryBot's 20-minute window balances user experience against resource usage.
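These strategies can be captured as a small preset table. The preset names are illustrative; the payloads match the expires_after shape from the creation example:

```python
# Illustrative TTL presets; pass one as expires_after when creating a container
TTL_PRESETS = {
    "interactive": {"anchor": "last_active_at", "minutes": 30},
    "batch":       {"anchor": "created_at",     "minutes": 10},
    "extended":    {"anchor": "last_active_at", "minutes": 60},
}
```

For example: `client.containers.create(name="nightly-report", expires_after=TTL_PRESETS["batch"])`.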
Error Handling
Common failure modes with the Containers API:
Container expired: If you try to use a container after its TTL, you'll get a 404. Solution: check container status before operations, or catch 404s and re-create.
File too large: 512 MB per file limit. Solution: pre-aggregate or chunk large files before upload.
Code execution timeout: Code Interpreter has an execution timeout (typically 300 seconds). Complex operations on large datasets can hit this. Solution: pre-process data to reduce size, or use simpler analysis prompts.
Concurrent container limits: There's a cap on how many containers you can have active simultaneously. Solution: delete containers promptly after use, don't rely on TTL alone.
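The expired-container case is the one worth coding defensively. A sketch of the check-then-recreate pattern follows; in production you would catch the SDK's NotFoundError specifically, but the broad catch keeps the sketch dependency-free:

```python
def ensure_container(client, container_id, name="analysis", minutes=20):
    """Return a usable container id, re-creating it if the old one expired.

    `client` mirrors the OpenAI SDK's containers resource. An expired
    container surfaces as a 404; catch openai.NotFoundError in real code.
    """
    try:
        existing = client.containers.retrieve(container_id)
        if getattr(existing, "status", None) == "active":
            return existing.id
    except Exception:  # 404: the container and all its files are gone
        pass
    new = client.containers.create(
        name=name,
        expires_after={"anchor": "last_active_at", "minutes": minutes},
    )
    return new.id
```

Note that re-creating the container gives you an empty filesystem: any files the old container held must be re-uploaded before the next analysis call.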
What to Read Next
For the complete Code Interpreter architecture including the Responses API flow, see OpenAI Code Interpreter for data analysis: a complete guide.
To see how DataStoryBot builds on these primitives to create a higher-level data analysis API, read getting started with the DataStoryBot API.
For the Responses API patterns that drive code execution, see building a Code Interpreter workflow with the Responses API.
Ready to find your data story?
Upload a CSV and DataStoryBot will uncover the narrative in seconds.
Try DataStoryBot →