

CrewAI & COBOL: Processing Legacy Flat Files with Pandas


In the world of “Big Iron,” data doesn’t always live in a SQL database. Often, it exists in Flat Files—massive, fixed-width text dumps generated by nightly COBOL batch jobs. These files power the global financial system, yet modern AI agents struggle to read them because they lack structure (no CSV headers, no JSON keys).

This guide provides a FastMCP bridge that allows CrewAI agents to parse, query, and analyze COBOL flat files using Python’s powerful pandas library.


We will build a Dockerized Model Context Protocol (MCP) server. This server exposes a read_cobol_flat_file tool to your Agent.

  1. The Server (FastMCP): A lightweight Python service running inside Docker.
  2. The Engine (Pandas): Uses read_fwf (Fixed Width File) to translate raw text into structured DataFrames based on dynamic column specifications provided by the Agent.
  3. The Client (CrewAI): An autonomous agent that “reads” a technical specification (like a COBOL Copybook) and calls our tool to extract the relevant data.
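The translation step in the engine can be sketched with pandas alone. The snippet below is a minimal, standalone illustration (the sample records and column layout mirror the mock file used later in this guide): `read_fwf` slices each line at fixed character positions and, because COBOL fields rely on leading zeros, everything is kept as strings.

```python
import io
import pandas as pd

# Two sample fixed-width records: ID(0-5), DATE(5-13), CODE(13-17), AMOUNT(17-25)
raw = (
    "0000120231001ABCD00050050\n"
    "0000220231002XYZ 00120000\n"
)
colspecs = [(0, 5), (5, 13), (13, 17), (17, 25)]
names = ["ID", "DATE", "CODE", "AMOUNT"]

# dtype=str preserves leading zeros; read_fwf strips the space padding around values
df = pd.read_fwf(io.StringIO(raw), colspecs=colspecs, names=names, dtype=str)
print(df.to_dict(orient="records"))
# [{'ID': '00001', 'DATE': '20231001', 'CODE': 'ABCD', 'AMOUNT': '00050050'},
#  {'ID': '00002', 'DATE': '20231002', 'CODE': 'XYZ', 'AMOUNT': '00120000'}]
```

Note that no headers or delimiters are needed: the column specification alone carries the structure, which is exactly what a COBOL Copybook provides.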

Create a file named server.py. This script defines the tool your agent will use. We use pandas to handle the heavy lifting of parsing fixed-width data.

server.py
import pandas as pd
import json
import os
from mcp.server.fastmcp import FastMCP

# Initialize FastMCP server
# host must be 0.0.0.0 so the server is reachable from outside the Docker container
mcp = FastMCP("CobolFlatFileAdapter", host="0.0.0.0", port=8000)

# Mock data creation for demonstration purposes
# In production, this would be a mounted volume (e.g., /mnt/mainframe/dumps)
DATA_DIR = "./data"
os.makedirs(DATA_DIR, exist_ok=True)

# Create a dummy COBOL-style fixed-width file
# Layout: ID(0-5), DATE(5-13), CODE(13-17), AMOUNT(17-25)
mock_content = """0000120231001ABCD00050050
0000220231002XYZ 00120000
0000320231003LMN 00005500
"""
with open(os.path.join(DATA_DIR, "SALES_DUMP.TXT"), "w") as f:
    f.write(mock_content)

@mcp.tool()
def read_cobol_flat_file(filename: str, col_specs: list[list[int]], col_names: list[str]) -> str:
    """
    Reads a COBOL fixed-width flat file and returns the data as JSON.

    Args:
        filename: The name of the file to read (must exist in ./data).
        col_specs: A list of [start, end] pairs for column positions.
            Example: [[0, 5], [5, 13], [13, 17], [17, 25]]
        col_names: A list of names corresponding to the columns.
            Example: ["ID", "DATE", "CODE", "AMOUNT"]

    Returns:
        A JSON string representation of the parsed data.
    """
    try:
        file_path = os.path.join(DATA_DIR, filename)
        if not os.path.exists(file_path):
            return f"Error: File '{filename}' not found in data directory."

        # pandas.read_fwf expects 'colspecs' as a list of tuples
        # Convert list of lists to list of tuples if necessary
        formatted_specs = [tuple(spec) for spec in col_specs]
        df = pd.read_fwf(
            file_path,
            colspecs=formatted_specs,
            names=col_names,
            dtype=str  # Keep everything as string to preserve leading zeros typical in COBOL
        )

        # Convert to JSON for the Agent to analyze
        # orient='records' creates a clean list of dicts
        return df.to_json(orient="records", indent=2)
    except Exception as e:
        return f"Error processing file: {str(e)}"

if __name__ == "__main__":
    mcp.run(transport='sse')

This configuration ensures your server runs anywhere (Railway, AWS, local). We expose port 8000 so the CrewAI client can connect via SSE.

# Dockerfile
FROM python:3.11-slim

# Prevent Python from writing pyc files and buffering stdout
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
        gcc \
    && rm -rf /var/lib/apt/lists/*

# Install Python libraries
# mcp ships the FastMCP server (mcp.server.fastmcp)
# pandas is required for the parsing logic
# uvicorn is the ASGI server used by the SSE transport
RUN pip install --no-cache-dir \
        pandas \
        mcp \
        uvicorn

# Copy server code
COPY server.py .

# EXPOSE port 8000 for Railway/external access
EXPOSE 8000

# Run the server
CMD ["python", "server.py"]

Build and run the container locally:

docker build -t cobol-bridge .
docker run -p 8000:8000 cobol-bridge

Now we configure the CrewAI agent. The Agent will connect to our running Docker container via SSE (Server-Sent Events).

We will give the agent a “task” that simulates a real-world scenario: The agent has received a “Copybook” (schema definition) and needs to apply it to a raw file.

client_agent.py
from crewai import Agent, Task, Crew

# 1. Define the Agent with the MCP Server connection
# Note: 'mcps' allows us to connect to the SSE endpoint we exposed
cobol_analyst = Agent(
    role='Legacy Data Specialist',
    goal='Extract and analyze data from Mainframe Flat Files',
    backstory='You are an expert in COBOL data structures. You can read raw hex dumps and fixed-width text files.',
    verbose=True,
    mcps=["http://localhost:8000/sse"]  # Connects to our Dockerized FastMCP server
)

# 2. Define the Task
# We explicitly tell the agent the "schema" (colspecs) so it knows how to call the tool.
analysis_task = Task(
    description="""
    I have a raw flat file named 'SALES_DUMP.TXT'.
    Here is the COBOL Copybook definition for the file layout:
    - TRANSACTION-ID: Positions 0 to 5
    - DATE-STR: Positions 5 to 13
    - PROD-CODE: Positions 13 to 17
    - AMOUNT-CENTS: Positions 17 to 25
    Steps:
    1. Use the 'read_cobol_flat_file' tool to parse this file.
    2. Analyze the 'AMOUNT-CENTS' column. It represents currency in cents.
    3. Calculate the total revenue in Dollars ($).
    4. Return the total revenue and the list of product codes found.
    """,
    expected_output="A summary report of the sales data extracted from the COBOL dump.",
    agent=cobol_analyst
)

# 3. Run the Crew
crew = Crew(
    agents=[cobol_analyst],
    tasks=[analysis_task]
)

result = crew.kickoff()
print("#################")
print("Analysis Result:")
print(result)

When you run python client_agent.py, the agent will:

  1. Recognize it needs to use read_cobol_flat_file.
  2. Translate the “Positions 0 to 5” instruction into [0, 5] for the tool input.
  3. Receive the JSON data from Pandas.
  4. Perform the math (Sum the amounts and divide by 100).
  5. Output: “Total Revenue: $1,755.50. Product Codes found: ABCD, XYZ, LMN.”

Mainframes typically encode files in EBCDIC (cp037), while modern systems use ASCII/UTF-8.

  • The Fix: If your file is true binary EBCDIC, you must decode it before passing it to read_fwf. You can add encoding='cp037' to the pd.read_fwf call in server.py or use Python’s .decode('cp037') on the raw bytes first.
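As a round-trip sketch of that decode step (`cp037` ships with Python's standard codec set; the record below is the sample layout from this guide, encoded to simulate a mainframe transfer):

```python
import io
import pandas as pd

# Simulate an EBCDIC dump: encode a known record as cp037 bytes
record = "0000120231001ABCD00050050\n"
ebcdic_bytes = record.encode("cp037")

# Decode back to text BEFORE handing it to read_fwf
text = ebcdic_bytes.decode("cp037")
df = pd.read_fwf(
    io.StringIO(text),
    colspecs=[(0, 5), (5, 13), (13, 17), (17, 25)],
    names=["ID", "DATE", "CODE", "AMOUNT"],
    dtype=str,
)
print(df.iloc[0]["AMOUNT"])  # 00050050
```

If you skip the decode, `read_fwf` will see unreadable byte soup rather than digits, so it is worth validating the first few lines of any mainframe file before parsing.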

COBOL numbers rarely contain decimal points (e.g., $10.50 is stored as 001050).

  • The Fix: Always instruct your agent (in the Task description) to treat numeric columns as “implied decimals” and divide by 100 or 1000 as appropriate.
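A minimal sketch of that conversion, using amounts like those in this guide's mock file (COBOL `PIC 9(6)V99`-style fields: the decimal point is implied, with the last two digits as cents):

```python
# Implied-decimal amounts as they appear in the flat file
amounts = ["00050050", "00120000", "00005500"]

# Sum as integer cents first, then shift the implied decimal point
total_cents = sum(int(a) for a in amounts)
total_dollars = total_cents / 100
print(f"${total_dollars:,.2f}")  # $1,755.50
```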

Flat files can be gigabytes in size.

  • The Fix: For production, modify server.py to use chunksize in read_fwf or limit the preview to the first 1,000 rows. Do not load a 5GB file entirely into RAM for an Agent to read as a single JSON string.
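A sketch of the chunked approach, assuming the same four-column layout (the synthetic row count and chunk size here are arbitrary): with `chunksize`, `read_fwf` returns an iterator of DataFrames instead of loading the whole file at once.

```python
import io
import pandas as pd

# Build 10 synthetic 25-character records in the ID/DATE/CODE/AMOUNT layout
raw = "".join(f"{i:05d}20231001PROD00000100\n" for i in range(10))

reader = pd.read_fwf(
    io.StringIO(raw),
    colspecs=[(0, 5), (5, 13), (13, 17), (17, 25)],
    names=["ID", "DATE", "CODE", "AMOUNT"],
    dtype=str,
    chunksize=4,  # yields DataFrames of at most 4 rows each
)

# Aggregate per chunk so only one slice is ever in memory
rows_seen = sum(len(chunk) for chunk in reader)
print(rows_seen)  # 10
```

In the server tool, you would aggregate or truncate inside this loop and return only the summary (or a small preview) to the agent, since a multi-gigabyte JSON string would blow past any model's context window anyway.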

  • Status: ✅ Verified
  • Environment: Python 3.11
  • Auditor: AgentRetrofit CI/CD

Transparency: This page may contain affiliate links.