Managing Container Lifecycle: TTLs, Cleanup, and Cost Control
Practical guide to container TTL management, cost implications, and cleanup strategies for OpenAI Code Interpreter containers.
Every container you create with the OpenAI Containers API represents compute resources that cost money. Containers run in an isolated Linux environment, persist files between API calls, and stay alive until their TTL expires or you delete them explicitly. If you're running containers carelessly — leaving them alive long after work is done, or spinning up new ones when an existing one would do — you're wasting money.
This article covers the mechanics of container lifecycle management, the cost model, and concrete strategies for keeping costs predictable.
For background on the Containers API itself, see How to Use the OpenAI Containers API for File-Based Workflows. For a broader treatment of what Code Interpreter is doing inside those containers, see OpenAI Code Interpreter: Complete Guide.
The Container Lifecycle
A container moves through a simple state machine:
```
created → active → expired
               ↘ deleted (explicit)
```
Created: The container exists. It has an ID, a name, and a configured TTL. No compute is happening yet.
Active: Someone interacted with the container — a file was uploaded, or code was executed. The last_active_at timestamp updates. If the TTL anchor is last_active_at, the expiry clock resets.
Expired: The TTL elapsed without activity (or from creation, depending on anchor). The container and all its files are permanently deleted. You cannot recover files from an expired container.
Deleted: You called DELETE /v1/containers/{id} explicitly. Same result as expiry — gone immediately, no recovery.
Understanding which state your containers are in at any given time is the foundation of cost management.
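The allowed moves in that diagram can be captured in a few lines. This is a sketch for reasoning about the lifecycle, not an API feature: the API itself only reports a `status` field, and the transition set below is read off the diagram above.

```python
# Valid transitions in the container state machine described above.
# Expired and deleted are terminal: nothing leaves them.
TRANSITIONS = {
    ("created", "active"),    # first upload or code execution
    ("created", "expired"),   # TTL elapsed before any use
    ("created", "deleted"),   # explicit delete of an idle container
    ("active", "expired"),    # TTL elapsed after last activity
    ("active", "deleted"),    # explicit delete mid-session
}

def can_transition(current: str, target: str) -> bool:
    """True if the lifecycle allows moving from current to target."""
    return (current, target) in TRANSITIONS
```

In particular, `can_transition("expired", "active")` is false: there is no way to revive an expired container or its files.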
TTL Configuration
When you create a container, you configure two things: the anchor (what triggers the TTL countdown) and the duration.
```bash
curl -X POST https://api.openai.com/v1/containers \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "analysis-job-7842",
    "expires_after": {
      "anchor": "last_active_at",
      "minutes": 20
    }
  }'
```
anchor: "last_active_at" — The expiry time resets on every interaction. A container configured for 20 minutes with this anchor will stay alive as long as you keep using it, expiring 20 minutes after the last file upload or code execution call. Good for interactive sessions and multi-step workflows where you don't know upfront how long the work will take.
anchor: "created_at" — The TTL counts down from container creation, regardless of activity. A container configured for 20 minutes with this anchor dies 20 minutes after creation, full stop. Good for batch jobs where you want hard guarantees about cleanup time.
DataStoryBot uses 20-minute TTLs with last_active_at. A user's analysis session typically completes well within that window. If they come back to the same dataset, the container is likely expired, so a new one gets created. The 20-minute window is long enough to cover multi-step workflows without leaving containers alive indefinitely.
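The difference between the two anchors comes down to which timestamp the countdown starts from. A minimal sketch of that rule, treating timestamps as Unix seconds (the helper name is ours, not part of the API):

```python
def expiry_time(anchor: str, created_at: float,
                last_active_at: float, minutes: int) -> float:
    """Compute when a container expires under each anchor mode."""
    base = created_at if anchor == "created_at" else last_active_at
    return base + minutes * 60

# A container created at t=0 with a 20-minute TTL, last touched at t=900:
hard_stop = expiry_time("created_at", 0, 900, 20)    # t=1200, activity ignored
sliding = expiry_time("last_active_at", 0, 900, 20)  # t=2100, clock was reset
```

The same container, the same activity pattern: the `created_at` anchor gives a hard deadline, while `last_active_at` pushes expiry out by a full TTL window after each interaction.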
Cost Implications
The OpenAI billing model for containers charges based on compute time — specifically, the time your container is alive and consuming resources. This is distinct from token costs for the model calls that run inside containers.
The practical implications:
Short-lived containers cost less. A container alive for 5 minutes costs roughly a quarter of a container alive for 20 minutes, for the same amount of actual computation. The idle time matters.
Concurrent containers multiply costs. If you're processing requests in parallel — 10 simultaneous analysis jobs — you have 10 containers running. Each one has its own TTL clock ticking. Monitor concurrent container count as a key metric.
Premature expiry has a hidden cost too. If you set a TTL that's too short and a container expires mid-workflow, you have to recreate it, re-upload files, and re-run setup code. That recovery work costs both time and tokens. Overly aggressive TTLs trade compute cost for latency and token cost.
File storage inside containers is transient. Unlike OpenAI Files (the persistent file storage API), container files don't persist beyond the container's lifetime. You don't pay a separate storage fee for container files — but you also lose everything when the container expires. Design your workflow to extract outputs before expiry.
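A back-of-envelope model makes the first two points concrete. The per-minute rate below is a placeholder assumption, not a published price; substitute the current figure from OpenAI's pricing page.

```python
def estimated_compute_cost(alive_minutes: float, concurrent_containers: int,
                           rate_per_minute: float) -> float:
    """Rough compute cost: every concurrent container accrues its own
    alive time, whether it is doing work or sitting idle.

    rate_per_minute is a hypothetical placeholder, not a real price.
    """
    return alive_minutes * concurrent_containers * rate_per_minute

# 10 parallel jobs, each container alive for its full 20-minute TTL,
# at an illustrative rate of $0.01/minute:
cost = estimated_compute_cost(20, 10, rate_per_minute=0.01)  # → 2.0
```

The model is crude, but it shows why trimming idle time and capping concurrency are the two levers that matter: cost scales linearly with both.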
Strategies by Workflow Type
One-Off Analysis
User uploads a CSV, gets a report, done. No follow-up questions, no iterative refinement.
Use the shortest TTL that safely covers the analysis time, with created_at as the anchor. If your analysis typically completes in 3-4 minutes, a 10-minute TTL gives you a reasonable buffer without leaving the container around unnecessarily.
```python
def create_oneoff_container(name: str) -> str:
    """Short TTL for single-pass analysis jobs."""
    response = client.containers.create(
        name=name,
        expires_after={
            "anchor": "created_at",
            "minutes": 10
        }
    )
    return response.id
```
After the analysis completes, don't wait for the TTL — delete the container explicitly:
```python
def run_analysis_and_cleanup(file_path: str, prompt: str) -> dict:
    container_id = create_oneoff_container(f"analysis-{uuid.uuid4().hex[:8]}")
    try:
        file_id = upload_file(container_id, file_path)
        result = execute_analysis(container_id, file_id, prompt)
        return result
    finally:
        # Always clean up, even if analysis fails
        client.containers.delete(container_id)
```
The finally block ensures cleanup happens regardless of success or failure. This is the most cost-efficient pattern for batch workloads.
Multi-Step Interactive Workflow
User uploads data, asks an initial question, sees results, asks follow-ups. The container needs to stay alive across multiple API calls, and you don't know upfront how long the session will run.
Use last_active_at as the anchor. Each interaction resets the TTL clock. The container expires when the user goes idle.
```python
class AnalysisSession:
    def __init__(self, ttl_minutes: int = 20):
        self.container_id = self._create_container(ttl_minutes)
        self.file_ids: list[str] = []
        self.created_at = time.time()

    def _create_container(self, ttl_minutes: int) -> str:
        response = client.containers.create(
            name=f"session-{uuid.uuid4().hex[:8]}",
            expires_after={
                "anchor": "last_active_at",
                "minutes": ttl_minutes
            }
        )
        return response.id

    def upload(self, file_path: str) -> str:
        file_id = upload_file(self.container_id, file_path)
        self.file_ids.append(file_id)
        return file_id

    def ask(self, prompt: str) -> dict:
        return execute_analysis(self.container_id, self.file_ids, prompt)

    def close(self):
        """Explicit cleanup when session ends."""
        try:
            client.containers.delete(self.container_id)
        except Exception:
            pass  # Container may already be expired

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
```
Using the session as a context manager guarantees cleanup:
```python
with AnalysisSession(ttl_minutes=20) as session:
    session.upload("q3_sales.csv")
    first_result = session.ask("Summarize the key trends")
    second_result = session.ask("Compare this to the previous quarter")
# Container deleted when block exits
```
Batch Pipeline
Scheduled job that processes many files — nightly reports, weekly summaries, automated data refreshes. Predictability matters more than interactivity.
Use created_at anchors with conservative TTLs, and always delete explicitly on completion. Track container creation and deletion in your job logs so you can audit for orphans.
```python
def run_batch_pipeline(files: list[str], ttl_minutes: int = 15) -> list[dict]:
    results = []
    container_id = None
    try:
        container_id = client.containers.create(
            name=f"batch-{datetime.now().strftime('%Y%m%d-%H%M%S')}",
            expires_after={
                "anchor": "created_at",
                "minutes": ttl_minutes
            }
        ).id
        logger.info("container_created", container_id=container_id)
        for file_path in files:
            file_id = upload_file(container_id, file_path)
            result = execute_analysis(container_id, file_id, BATCH_PROMPT)
            results.append(result)
        return results
    finally:
        # Cleanup runs whether the loop finished or raised
        if container_id:
            try:
                client.containers.delete(container_id)
                logger.info("container_deleted", container_id=container_id)
            except Exception as e:
                logger.warning("container_delete_failed",
                               container_id=container_id, error=str(e))
```
Monitoring and Auditing
Listing Active Containers
The Containers API supports listing containers so you can see what's currently running:
```bash
curl https://api.openai.com/v1/containers \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

A typical response:

```json
{
  "object": "list",
  "data": [
    {
      "id": "ctr_abc123",
      "name": "session-7a3f",
      "status": "active",
      "created_at": 1711234567,
      "expires_at": 1711235767,
      "last_active_at": 1711235467
    },
    {
      "id": "ctr_def456",
      "name": "batch-20260324-143022",
      "status": "active",
      "created_at": 1711235000,
      "expires_at": 1711235900
    }
  ]
}
```
Build a monitoring script that alerts when active container count exceeds expected levels:
```python
def check_container_health(max_expected: int = 50) -> dict:
    containers = client.containers.list()
    active = [c for c in containers.data if c.status == "active"]
    now = time.time()
    stale_threshold = 30 * 60  # 30 minutes
    stale = [
        c for c in active
        # Fall back to created_at: a container that was never used
        # may not report a last_active_at timestamp
        if (now - (getattr(c, "last_active_at", None) or c.created_at))
           > stale_threshold
    ]
    return {
        "active_count": len(active),
        "stale_count": len(stale),
        "stale_ids": [c.id for c in stale],
        "alert": len(active) > max_expected or len(stale) > 0
    }
```
Detecting and Cleaning Up Orphaned Containers
An orphaned container is one that should be dead but isn't — typically because a process crashed before it could call delete, or an application bug skipped cleanup. These are pure waste.
```python
def cleanup_orphans(max_age_minutes: int = 60) -> int:
    """Delete containers older than max_age_minutes that are still active."""
    containers = client.containers.list()
    now = time.time()
    cutoff = now - (max_age_minutes * 60)
    deleted = 0
    for container in containers.data:
        if container.status != "active":
            continue
        if container.created_at < cutoff:
            try:
                client.containers.delete(container.id)
                logger.info("orphan_deleted",
                            container_id=container.id,
                            age_minutes=int((now - container.created_at) / 60))
                deleted += 1
            except Exception as e:
                logger.error("orphan_delete_failed",
                             container_id=container.id, error=str(e))
    return deleted
```
Run this as a scheduled job — every 15 minutes is reasonable for most workloads. Any container older than your maximum expected workflow duration is a candidate for forced cleanup.
Naming Conventions for Auditability
Use structured names that encode context into the container ID. This makes it easier to trace containers back to the jobs that created them:
```python
def container_name(job_type: str, job_id: str) -> str:
    """
    Examples:
        container_name("batch", "nightly-2026-03-24") → "batch-nightly-2026-03-24"
        container_name("session", "user-4521") → "session-user-4521"
    """
    return f"{job_type}-{job_id}"
```
When you see batch-nightly-2026-03-24 in your container list, you immediately know it's a stale batch job that should have been cleaned up. Unnamed or randomly-named containers are hard to reason about at scale.
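The convention is also invertible, which is what makes audit tooling cheap to write. A sketch of the inverse (it assumes the job_type segment itself never contains a hyphen, which the naming scheme above guarantees):

```python
def parse_container_name(name: str) -> dict:
    """Split a structured container name back into its parts.

    Relies on job_type containing no hyphen; everything after the
    first hyphen is treated as the job_id.
    """
    job_type, _, job_id = name.partition("-")
    return {"job_type": job_type, "job_id": job_id}

parse_container_name("batch-nightly-2026-03-24")
# → {"job_type": "batch", "job_id": "nightly-2026-03-24"}
```

With this, a monitoring script can group active containers by job type and flag, say, a `batch-*` container that outlived its pipeline.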
What Happens on Expiry
When a container expires, several things happen simultaneously:
- All files inside the container are deleted. You cannot retrieve them via the Files API or the Containers API after expiry.
- Any in-progress code execution is terminated.
- The container ID becomes invalid. Subsequent API calls referencing it will return 404.
The practical implication: extract all outputs before the container expires. If you generated charts or transformed datasets inside the container, retrieve them while the container is still alive.
```python
def extract_outputs(container_id: str, output_file_ids: list[str]) -> list[bytes]:
    """Retrieve generated files from container before it expires."""
    outputs = []
    for file_id in output_file_ids:
        content = client.containers.files.retrieve_content(
            container_id=container_id,
            file_id=file_id
        )
        outputs.append(content)
    return outputs
```
For more on the security model underlying container isolation — including what the sandbox does and doesn't protect against — see Sandboxed Python Execution: Why It Matters for Data APIs.
Practical Cost Targets
There's no universal "right" TTL — it depends on your workload. Some rough guidelines:
| Workflow Type | Recommended TTL | Anchor | Explicit Delete? |
|---|---|---|---|
| Single analysis, automated | 10 min | created_at | Yes, always |
| Interactive session | 20 min | last_active_at | Yes, on session end |
| Batch pipeline | 15 min | created_at | Yes, always |
| Development/testing | 5 min | created_at | Yes |
Set your TTL conservatively — long enough that normal workflows complete, short enough that crashes and abandoned sessions don't leave containers running for hours. Then layer explicit delete calls on top so the TTL is a fallback rather than the primary cleanup mechanism.
Track your container-related costs as a separate line item in your OpenAI spend. If container costs are rising faster than your request volume, you likely have an orphan problem.
Summary
Container lifecycle management is not complicated, but it requires deliberate handling:
- Choose the right TTL anchor (last_active_at for interactive sessions, created_at for batch jobs)
- Always delete containers explicitly when work is done — don't rely on TTL expiry as your only cleanup mechanism
- Extract all outputs before the container expires; files are not recoverable after expiry
- Run a periodic orphan cleanup job to catch containers that escaped normal deletion
- Use structured container names to make auditing and debugging tractable
- Monitor active container count as a leading indicator of cost problems
The containers themselves are stateless and ephemeral by design — your application code is where lifecycle management has to happen.
Ready to find your data story?
Upload a CSV and DataStoryBot will uncover the narrative in seconds.
Try DataStoryBot →