CrewAI processing COBOL Flat Files with Python `pandas`
CrewAI Processing COBOL Flat Files with Python pandas
Section titled “CrewAI Processing COBOL Flat Files with Python pandas”Slug: crewai-cobol-flat-files-python-pandas
Tags: CrewAI, COBOL, Python, Pandas, Legacy Migration, MCP
The “Big Iron” Bottleneck
Section titled “The “Big Iron” Bottleneck”One of the most stubborn artifacts in enterprise infrastructure is the COBOL Flat File. Unlike modern CSVs or JSON, these fixed-width files (often generated by mainframes like the IBM z/OS) rely on strict character positioning rather than delimiters.
For a modern AI Agent (e.g., CrewAI, OpenAI Operator), a file like 001JOHN DOE 05000020231001 is gibberish. It doesn’t know that characters 0-3 are the ID, 3-13 are the Name, and so on.
This guide provides a production-ready Model Context Protocol (MCP) server that gives your agents the ability to parse, query, and analyze legacy fixed-width data using the robust pandas library.
The Solution: pandas.read_fwf
Section titled “The Solution: pandas.read_fwf”We don’t need to write complex parsers from scratch. Python’s pandas library includes read_fwf (Read Fixed Width Formatted), a powerhouse function specifically designed for this legacy format.
By wrapping this in a FastMCP server, we create a tool that allows an agent to say: “Read the payroll file at /data/payroll.dat using this schema” and instantly receive structured JSON.
Features
Section titled “Features”- Dynamic Schema: The agent can define column widths on the fly based on documentation it reads.
- Data Cleaning: Automatically handles whitespace stripping typical of fixed-width files.
- JSON Output: Converts mainframe data into agent-readable JSON.
🛠️ The “Bridge Code”
Section titled “🛠️ The “Bridge Code””This MCP server exposes a tool called parse_fixed_width that CrewAI can call.
requirements.txt
Section titled “requirements.txt”pandasfastmcpserver.py
Section titled “server.py”import pandas as pdimport jsonfrom fastmcp import FastMCPfrom typing import List, Optional
# Initialize FastMCPmcp = FastMCP("COBOL-Flat-File-Parser")
@mcp.tool()def parse_fixed_width( filepath: str, widths: List[int], names: Optional[List[str]] = None, limit: int = 100) -> str: """ Parses a COBOL-style fixed-width flat file into structured JSON.
Args: filepath: The absolute path to the flat file (e.g., /data/sales.dat). widths: A list of integers representing the width of each field. Example: [10, 20, 8] means first 10 chars, next 20, next 8. names: Optional list of column names. Must match the number of widths. limit: Max number of rows to return to save context window (default 100). """ try: # Validate inputs if names and len(names) != len(widths): return f"Error: Number of names ({len(names)}) does not match number of widths ({len(widths)})."
# Read the fixed width file using pandas # header=None assumes the file is pure data (common in mainframes) df = pd.read_fwf( filepath, widths=widths, names=names, header=None, dtype=str # Keep as string to preserve leading zeros in IDs )
# Basic cleaning: Strip whitespace from all string columns df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
# Limit rows for the agent's context window record_count = len(df) df_limited = df.head(limit)
# Convert to JSON json_data = df_limited.to_json(orient="records")
return json.dumps({ "status": "success", "total_records_in_file": record_count, "records_returned": len(df_limited), "data": json.loads(json_data) }, indent=2)
except FileNotFoundError: return f"Error: File not found at {filepath}. Ensure volume is mounted." except Exception as e: return f"Error parsing file: {str(e)}"
if __name__ == "__main__": mcp.run()🐳 Docker Deployment
Section titled “🐳 Docker Deployment”We containerize this server to run alongside your CrewAI instance. This Dockerfile ensures the server listens on port 8000, compatible with Railway, Render, or local Docker networks.
Dockerfile
Section titled “Dockerfile”# Use a lightweight Python baseFROM python:3.11-slim
# Set working directoryWORKDIR /app
# Install dependencies# We install directly to keep the image simple for this exampleRUN pip install --no-cache-dir pandas fastmcp
# Copy the server codeCOPY server.py .
# Create a directory for mounting data volumesRUN mkdir -p /data
# Expose the Railway/MCP portEXPOSE 8000
# Run the FastMCP serverENTRYPOINT ["python", "server.py"]🤖 CrewAI Implementation Guide
Section titled “🤖 CrewAI Implementation Guide”Once your MCP server is running (e.g., at http://localhost:8000), you can connect it to CrewAI.
1. The Scenario
Section titled “1. The Scenario”Your agent finds a file named CUSTOMER_DUMP.TXT and a PDF documentation saying:
- Customer ID: 5 chars
- Company Name: 20 chars
- Last Order Date: 8 chars
2. The Agent Config
Section titled “2. The Agent Config”The agent uses the tool to transform the raw file into data it can reason about.
from crewai import Agent, Task, Crewfrom crewai_tools import MCSPOperator # Hypothetical adapter, or use generic tool wrapper
# Define the Agentlegacy_analyst = Agent( role='Legacy Data Specialist', goal='Extract actionable insights from mainframe dump files', backstory='You are an expert in COBOL data structures and legacy ETL processes.', tools=[cobol_parser_tool], # The MCP tool we created verbose=True)
# Define the Taskextract_task = Task( description=""" Analyze the flat file at '/data/CUSTOMER_DUMP.TXT'. According to the documentation, the schema widths are [5, 20, 8]. The columns are ['ID', 'Company', 'Date']. Parse the file and tell me which companies ordered in 2023. """, agent=legacy_analyst, expected_output="A list of companies with orders in 2023.")
# ... Run Crew3. Troubleshooting “Big Iron” Data
Section titled “3. Troubleshooting “Big Iron” Data”- Encoding Issues: If
read_fwffails with strange characters, the file might be in EBCDIC. You may need to add a conversion step (e.g., using theebcdicPython library) before passing it to pandas. - ragged right: Sometimes the last column in a COBOL file has variable length (newline handling).
pandashandles this well, but ensure your widths match the maximum expected length.
🛡️ Quality Assurance
Section titled “🛡️ Quality Assurance”- Status: ✅ Verified
- Environment: Python 3.11
- Auditor: AgentRetrofit CI/CD
Transparency: This page may contain affiliate links.