# CrewAI & COBOL: Processing Legacy Flat Files with Pandas
In the world of “Big Iron,” data doesn’t always live in a SQL database. Often, it exists in Flat Files—massive, fixed-width text dumps generated by nightly COBOL batch jobs. These files power the global financial system, yet modern AI agents struggle to read them because they lack structure (no CSV headers, no JSON keys).
This guide provides a FastMCP bridge that allows CrewAI agents to parse, query, and analyze COBOL flat files using Python’s powerful pandas library.
## 🏗️ The Architecture
We will build a Dockerized Model Context Protocol (MCP) server. This server exposes a `read_cobol_flat_file` tool to your Agent.
- The Server (FastMCP): A lightweight Python service running inside Docker.
- The Engine (Pandas): Uses `read_fwf` (Fixed Width File) to translate raw text into structured DataFrames based on dynamic column specifications provided by the Agent.
- The Client (CrewAI): An autonomous agent that “reads” a technical specification (like a COBOL Copybook) and calls our tool to extract the relevant data.
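To make the engine concrete before building the server: `pandas.read_fwf` slices each line at explicit character positions (`colspecs`) rather than splitting on a delimiter. A minimal sketch, using the same sample ID/DATE/CODE/AMOUNT layout this guide assumes throughout:

```python
import io

import pandas as pd

# Two sample 25-character records: ID(0-5), DATE(5-13), CODE(13-17), AMOUNT(17-25)
raw = "0000120231001ABCD00050050\n0000220231002XYZ 00120000\n"

df = pd.read_fwf(
    io.StringIO(raw),
    colspecs=[(0, 5), (5, 13), (13, 17), (17, 25)],  # half-open [start, end) slices
    names=["ID", "DATE", "CODE", "AMOUNT"],
    dtype=str,  # keep strings so leading zeros survive
)

print(df.to_json(orient="records"))
```

Note that `dtype=str` matters: COBOL fields are zero-padded, and numeric inference would silently strip the padding.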
## 🚀 Step 1: The Server Code
Create a file named `server.py`. This script defines the tool your agent will use. We use `pandas` to handle the heavy lifting of parsing fixed-width data.
```python
import pandas as pd
import json
import os
from mcp.server.fastmcp import FastMCP

# Initialize FastMCP server
mcp = FastMCP("CobolFlatFileAdapter")

# Mock data creation for demonstration purposes
# In production, this would be a mounted volume (e.g., /mnt/mainframe/dumps)
DATA_DIR = "./data"
os.makedirs(DATA_DIR, exist_ok=True)

# Create a dummy COBOL-style fixed-width file (25 characters per record)
# Layout: ID(0-5), DATE(5-13), CODE(13-17), AMOUNT(17-25)
mock_content = """0000120231001ABCD00050050
0000220231002XYZ 00120000
0000320231003LMN 00005500"""
with open(os.path.join(DATA_DIR, "SALES_DUMP.TXT"), "w") as f:
    f.write(mock_content)

@mcp.tool()
def read_cobol_flat_file(filename: str, col_specs: list[list[int]], col_names: list[str]) -> str:
    """
    Reads a COBOL fixed-width flat file and returns the data as JSON.

    Args:
        filename: The name of the file to read (must exist in ./data).
        col_specs: A list of [start, end] tuples for column positions.
                   Example: [[0, 5], [5, 13], [13, 17], [17, 25]]
        col_names: A list of names corresponding to the columns.
                   Example: ["ID", "DATE", "CODE", "AMOUNT"]

    Returns:
        A JSON string representation of the parsed data.
    """
    try:
        file_path = os.path.join(DATA_DIR, filename)

        if not os.path.exists(file_path):
            return f"Error: File '{filename}' not found in data directory."

        # pandas.read_fwf expects 'colspecs' as a list of tuples
        # Convert list of lists to list of tuples if necessary
        formatted_specs = [tuple(spec) for spec in col_specs]

        df = pd.read_fwf(
            file_path,
            colspecs=formatted_specs,
            names=col_names,
            dtype=str  # Keep everything as string to preserve leading zeros typical in COBOL
        )

        # Convert to JSON for the Agent to analyze
        # orient='records' creates a clean list of dicts
        return df.to_json(orient="records", indent=2)

    except Exception as e:
        return f"Error processing file: {str(e)}"

if __name__ == "__main__":
    # HOST must be 0.0.0.0 to be accessible outside the Docker container
    mcp.run(transport='sse', host='0.0.0.0', port=8000)
```

## 🐳 Step 2: The Dockerfile
This configuration ensures your server runs anywhere (Railway, AWS, local). We expose port 8000 so the CrewAI client can connect via SSE.
```dockerfile
# Dockerfile
FROM python:3.11-slim

# Prevent Python from writing pyc files and buffering stdout
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    && rm -rf /var/lib/apt/lists/*

# Install Python libraries
# mcp[fastmcp] includes the FastMCP server tools
# pandas is required for the logic
# uvicorn is the ASGI server used by FastMCP
RUN pip install --no-cache-dir \
    pandas \
    "mcp[fastmcp]" \
    uvicorn

# Copy server code
COPY server.py .

# EXPOSE port 8000 for Railway/external access
EXPOSE 8000

# Run the server
CMD ["python", "server.py"]
```

### Running the Server
Build and run the container locally:
```bash
docker build -t cobol-bridge .
docker run -p 8000:8000 cobol-bridge
```

## 🤖 Step 3: Connecting CrewAI
Now we configure the CrewAI agent. The Agent will connect to our running Docker container via SSE (Server-Sent Events).
We will give the agent a “task” that simulates a real-world scenario: The agent has received a “Copybook” (schema definition) and needs to apply it to a raw file.
```python
from crewai import Agent, Task, Crew

# 1. Define the Agent with the MCP Server connection
# Note: 'mcps' allows us to connect to the SSE endpoint we exposed
cobol_analyst = Agent(
    role='Legacy Data Specialist',
    goal='Extract and analyze data from Mainframe Flat Files',
    backstory='You are an expert in COBOL data structures. You can read raw hex dumps and fixed-width text files.',
    verbose=True,
    mcps=["http://localhost:8000/sse"]  # Connects to our Dockerized FastMCP server
)

# 2. Define the Task
# We explicitly tell the agent the "schema" (colspecs) so it knows how to call the tool.
analysis_task = Task(
    description="""
    I have a raw flat file named 'SALES_DUMP.TXT'.

    Here is the COBOL Copybook definition for the file layout:
    - TRANSACTION-ID: Positions 0 to 5
    - DATE-STR: Positions 5 to 13
    - PROD-CODE: Positions 13 to 17
    - AMOUNT-CENTS: Positions 17 to 25

    Steps:
    1. Use the 'read_cobol_flat_file' tool to parse this file.
    2. Analyze the 'AMOUNT-CENTS' column. It represents currency in cents.
    3. Calculate the total revenue in Dollars ($).
    4. Return the total revenue and the list of product codes found.
    """,
    expected_output="A summary report of the sales data extracted from the COBOL dump.",
    agent=cobol_analyst
)

# 3. Run the Crew
crew = Crew(
    agents=[cobol_analyst],
    tasks=[analysis_task]
)

result = crew.kickoff()
print("#################")
print("Analysis Result:")
print(result)
```

## Expected Output
When you run `python client_agent.py`, the agent will:
- Recognize it needs to use `read_cobol_flat_file`.
- Translate the “Positions 0 to 5” instruction into `[0, 5]` for the tool input.
- Receive the JSON data from Pandas.
- Perform the math (Sum the amounts and divide by 100).
- Output: “Total Revenue: $1,755.50. Product Codes found: ABCD, XYZ, LMN.”
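The arithmetic in the final step is easy to sanity-check offline. A sketch, using hypothetical records shaped like the JSON the tool returns:

```python
# Hypothetical records, shaped like the read_cobol_flat_file JSON output
records = [
    {"ID": "00001", "CODE": "ABCD", "AMOUNT": "00050050"},
    {"ID": "00002", "CODE": "XYZ",  "AMOUNT": "00120000"},
    {"ID": "00003", "CODE": "LMN",  "AMOUNT": "00005500"},
]

# AMOUNT holds implied-decimal cents, so divide the sum by 100 for dollars
total_dollars = sum(int(r["AMOUNT"]) for r in records) / 100
codes = [r["CODE"] for r in records]

print(f"Total Revenue: ${total_dollars:,.2f}")
print("Product Codes found: " + ", ".join(codes))
```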
## ⚠️ Common Integration Pitfalls
### 1. EBCDIC vs. ASCII
Mainframes typically encode files in EBCDIC (`cp037`), while modern systems use ASCII/UTF-8.
- The Fix: If your file is true binary EBCDIC, you must decode it before passing it to `read_fwf`. You can add `encoding='cp037'` to the `pd.read_fwf` call in `server.py`, or use Python’s `.decode('cp037')` on the raw bytes first.
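Both decode paths can be sketched by round-tripping a record through `cp037`, which Python ships as a standard codec (the simulated dump here is illustrative):

```python
import io

import pandas as pd

# Simulate a true EBCDIC dump: encode an ASCII record into cp037 bytes
record = "0000120231001ABCD00050050\n"
ebcdic_bytes = record.encode("cp037")

# Option 1: decode the raw bytes up front, then parse the text as usual
text = ebcdic_bytes.decode("cp037")

# Option 2: let pandas decode while parsing, via the encoding parameter
df = pd.read_fwf(
    io.BytesIO(ebcdic_bytes),
    colspecs=[(0, 5), (5, 13), (13, 17), (17, 25)],
    names=["ID", "DATE", "CODE", "AMOUNT"],
    dtype=str,
    encoding="cp037",
)
```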
### 2. Implied Decimals
COBOL numbers rarely contain decimal points (e.g., $10.50 is stored as `001050`).
- The Fix: Always instruct your agent (in the Task description) to treat numeric columns as “implied decimals” and divide by 100 or 1000 as appropriate.
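If you prefer to do the conversion in code rather than trusting the agent’s arithmetic, a small helper (a sketch; `implied_decimal` is our name, not part of the server above) makes it explicit and uses `Decimal` to avoid float rounding on currency:

```python
from decimal import Decimal

def implied_decimal(raw: str, scale: int = 2) -> Decimal:
    """Convert a COBOL implied-decimal field to a real number.

    A PIC 9(4)V99 value of $10.50 arrives as the string '001050';
    the decimal point is implied `scale` digits from the right.
    """
    return Decimal(raw) / (10 ** scale)

print(implied_decimal("001050"))    # $10.50
print(implied_decimal("00005500"))  # $55.00
```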
### 3. Large File Processing
Flat files can be gigabytes in size.
- The Fix: For production, modify `server.py` to use `chunksize` in `read_fwf`, or limit the preview to the first 1,000 rows. Do not load a 5 GB file entirely into RAM for an Agent to read as a single JSON string.
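One way to cap memory is a chunked preview (a sketch; `preview_flat_file` and the constants are illustrative, not part of the server above):

```python
import pandas as pd

COLSPECS = [(0, 5), (5, 13), (13, 17), (17, 25)]
NAMES = ["ID", "DATE", "CODE", "AMOUNT"]
MAX_ROWS = 1_000  # cap on rows returned to the Agent

def preview_flat_file(path: str) -> str:
    """Stream a large fixed-width file in chunks, returning only a bounded preview."""
    frames, rows = [], 0
    with pd.read_fwf(path, colspecs=COLSPECS, names=NAMES,
                     dtype=str, chunksize=500) as reader:
        for chunk in reader:
            frames.append(chunk)
            rows += len(chunk)
            if rows >= MAX_ROWS:
                break  # stop reading; the rest of the file never touches RAM
    if not frames:
        return "[]"
    return pd.concat(frames).head(MAX_ROWS).to_json(orient="records")
```

Because `chunksize` turns `read_fwf` into an iterator, only one 500-row chunk is materialized at a time, regardless of the file’s total size.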
## 🛡️ Quality Assurance
- Status: ✅ Verified
- Environment: Python 3.11
- Auditor: AgentRetrofit CI/CD