
CrewAI for COBOL Flat File Data Transformation


Legacy mainframes often export data in “Flat File” formats—rigid, fixed-width text files without delimiters like commas or tabs. These files rely on COBOL Copybooks to define which byte range corresponds to which field.
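At its core, a copybook maps each field to a byte range. A minimal sketch of that idea in plain Python (the field names and offsets below are illustrative, not taken from a real copybook):

```python
# One fixed-width payroll record: EMP_ID (5 bytes), NAME (10), SALARY (8).
record = "00101JOHN DOE  05000.00"

# (field name, start offset, end offset) — the role a copybook plays.
LAYOUT = [("EMP_ID", 0, 5), ("NAME", 5, 15), ("SALARY", 15, 23)]

# Slice each field out by position and strip the space padding.
parsed = {name: record[start:end].strip() for name, start, end in LAYOUT}
print(parsed)  # {'EMP_ID': '00101', 'NAME': 'JOHN DOE', 'SALARY': '05000.00'}
```

There are no delimiters to split on; position is the only structure the file has.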

Modern agent frameworks such as CrewAI do not handle this format natively; they expect structured JSON, CSV, or XML. This guide provides an MCP (Model Context Protocol) bridge that allows your CrewAI agents to parse, validate, and transform raw COBOL fixed-width data into usable JSON structures.

We will deploy a lightweight FastMCP server that acts as a transformation engine.

  1. Input: Raw fixed-width string data (simulating a file read from an FTP or mainframe dump).
  2. Processing: The MCP server applies a dynamic schema (start/end positions) to parse the bytes.
  3. Validation: It checks for data type integrity (e.g., ensuring numeric fields are actually numbers).
  4. Output: Clean JSON returned to the Agent’s context.
The stack:

  • Framework: FastMCP (Python)
  • Parsing: pandas (via `read_fwf`)
  • Transport: SSE (Server-Sent Events) over HTTP
  • Agent: CrewAI

This server exposes a single tool, `parse_fixed_width_data`, which takes the raw text and a schema definition. This allows the Agent to handle any flat file format as long as it knows the column specifications.
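The parsing approach the tool relies on can be tried in isolation: `pandas.read_fwf` splits each line by the declared column widths. A short sketch (the schema and sample rows are illustrative):

```python
import io
import pandas as pd

# The same schema shape the tool accepts: name, width, target dtype.
schema = [
    {"name": "EMP_ID", "width": 5, "dtype": "int"},
    {"name": "NAME", "width": 10, "dtype": "str"},
    {"name": "SALARY", "width": 8, "dtype": "float"},
]

raw = "00101JOHN DOE  05000.00\n00102JANE ROE  07500.50\n"

# Read as strings first; type conversion happens in a validation pass later.
df = pd.read_fwf(
    io.StringIO(raw),
    widths=[c["width"] for c in schema],
    names=[c["name"] for c in schema],
    header=None,
    dtype=str,
)
print(df.to_dict(orient="records"))
```

Reading with `dtype=str` preserves leading zeros (e.g., `"00101"`) instead of silently coercing them to integers.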

```python
from fastmcp import FastMCP
from pydantic import BaseModel
from typing import List, Dict, Any
import pandas as pd
import io
import json

# Initialize FastMCP
mcp = FastMCP("CobolTransformer")


class ColumnSpec(BaseModel):
    name: str
    width: int
    dtype: str = "str"  # options: str, int, float


class ParsingResult(BaseModel):
    success: bool
    record_count: int
    data: List[Dict[str, Any]]
    errors: List[str]


@mcp.tool()
def parse_fixed_width_data(
    raw_content: str,
    columns: List[Dict[str, Any]]
) -> str:
    """
    Parses COBOL-style fixed-width text data into JSON.

    Args:
        raw_content: The raw string content of the flat file.
        columns: A list of dicts defining the schema. Each dict must have
            'name' (field name), 'width' (number of characters),
            and optionally 'dtype' (int, float, str).
            Example: [{"name": "ID", "width": 5, "dtype": "int"}, ...]

    Returns:
        JSON string containing the parsed records and any validation errors.
    """
    try:
        # Prepare column specs for pandas
        col_names = [c['name'] for c in columns]
        col_widths = [c['width'] for c in columns]

        # Use pandas read_fwf for robust parsing.
        # We assume no header row in the raw file (common in mainframe dumps).
        df = pd.read_fwf(
            io.StringIO(raw_content),
            widths=col_widths,
            header=None,
            names=col_names,
            dtype=str  # Read as string first; validate types manually below
        )

        records = []
        errors = []

        # Row-by-row validation and type conversion
        for index, row in df.iterrows():
            record = {}
            row_valid = True
            for col in columns:
                field_name = col['name']
                field_val = row[field_name]
                target_type = col.get('dtype', 'str')

                # Handle NaN/None from pandas
                if pd.isna(field_val):
                    field_val = ""

                try:
                    if target_type == 'int':
                        record[field_name] = int(field_val.strip() or 0)
                    elif target_type == 'float':
                        record[field_name] = float(field_val.strip() or 0.0)
                    else:
                        record[field_name] = str(field_val).strip()
                except ValueError:
                    errors.append(
                        f"Row {index + 1}: Field '{field_name}' expected "
                        f"{target_type}, got '{field_val}'"
                    )
                    row_valid = False

            if row_valid:
                records.append(record)

        result = ParsingResult(
            success=len(errors) == 0,
            record_count=len(records),
            data=records,
            errors=errors
        )
        return result.model_dump_json()

    except Exception as e:
        # Mirror the ParsingResult shape so the Agent can handle
        # catastrophic failures the same way as row-level errors.
        return json.dumps({
            "success": False,
            "record_count": 0,
            "data": [],
            "errors": [f"Critical parsing failure: {e}"]
        })


if __name__ == "__main__":
    # MANDATORY: Bind to 0.0.0.0 for Docker compatibility
    mcp.run(transport='sse', host='0.0.0.0', port=8000)
```
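The validation pass is the part that matters for dirty mainframe data: instead of raising on the first bad byte, it converts each field to its declared dtype and collects row-level errors. A standalone sketch of that logic (the sample values are illustrative):

```python
def convert(value: str, dtype: str):
    """Convert a stripped string field to its declared dtype, as the server does."""
    value = value.strip()
    if dtype == "int":
        return int(value or 0)
    if dtype == "float":
        return float(value or 0.0)
    return value

# Collect errors instead of aborting — one garbage field should not
# discard the whole file.
errors = []
for i, (val, dtype) in enumerate([("00101", "int"), ("NOTNUMBR", "float")]):
    try:
        convert(val, dtype)
    except ValueError:
        errors.append(f"Row {i + 1}: expected {dtype}, got '{val}'")

print(errors)  # ["Row 2: expected float, got 'NOTNUMBR'"]
```

Note the `value or 0` fallback: all-blank numeric fields (common padding in COBOL dumps) become zero rather than a `ValueError`.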

We use a slim Python image to keep the container lightweight while ensuring pandas is available for data processing.

```dockerfile
# Base image
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies if needed (usually none for this stack)
# RUN apt-get update && apt-get install -y ...

# Install Python dependencies
#   fastmcp: the MCP server framework
#   pandas:  for efficient fixed-width parsing
#   uvicorn: ASGI server used by the SSE transport
# Quote the extras so the shell does not glob the brackets.
RUN pip install --no-cache-dir "fastmcp[sse]" pandas uvicorn

# Copy application code
COPY server.py .

# Expose the port for Railway/Docker networking
EXPOSE 8000

# Run the server
CMD ["python", "server.py"]
```

Your CrewAI agent needs to connect to the SSE endpoint exposed by the Docker container. Below is the configuration pattern to register the MCP tool.

```python
from crewai import Agent, Task, Crew

# 1. Define the connection to the MCP Server.
#    Via Docker Compose: use the service name (e.g., http://cobol-mcp:8000/sse).
#    Via `docker run -p 8000:8000`: use localhost.
mcp_sources = ["http://localhost:8000/sse"]

# 2. Define the Agent
legacy_data_specialist = Agent(
    role='Mainframe Data Analyst',
    goal='Convert raw legacy flat files into structured JSON for analysis',
    backstory="You are an expert in COBOL copybooks and data migration.",
    # CrewAI v0.100+ syntax for MCP integration
    mcps=mcp_sources,
    verbose=True
)

# 3. Define the Task.
#    Note: in a real scenario, 'raw_content' might be read via a file tool first.
transform_task = Task(
    description="""
    I have a raw fixed-width string from a legacy payroll system.
    The layout is:
    - EMP_ID: 5 characters (Integer)
    - NAME: 10 characters (String)
    - SALARY: 8 characters (Float)

    Here is the raw data:
    00101JOHN DOE  05000.00
    00102JANE ROE  07500.50
    0010XBAD REC   NOTNUMBR

    Use the 'parse_fixed_width_data' tool to parse this.
    Report back the valid JSON records and identify any rows that failed validation.
    """,
    expected_output="A summary of valid records in JSON format and a list of parsing errors.",
    agent=legacy_data_specialist
)

# 4. Run the Crew
crew = Crew(
    agents=[legacy_data_specialist],
    tasks=[transform_task]
)

result = crew.kickoff()
print("### Transformation Result ###")
print(result)
```
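Before dispatching a task like the one above, it is worth checking that every record line matches the total width the schema declares; a one-character mismatch shifts every downstream field silently. A quick sanity check (widths and sample lines are illustrative):

```python
# Declared widths from the task layout: EMP_ID (5) + NAME (10) + SALARY (8).
schema_widths = [5, 10, 8]

lines = [
    "00101JOHN DOE  05000.00",
    "00102JANE ROE  07500.50",
    "0010XBAD REC   NOTNUMBR",
]

# Every line must be exactly as wide as the schema, or fields will misalign.
total = sum(schema_widths)
bad = [ln for ln in lines if len(ln) != total]
print(bad)  # [] — all lines match the 23-character layout
```

In fixed-width formats, misalignment does not raise an error; it just produces plausible-looking wrong values, which is why this check pays for itself.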
  1. Build the image:

     ```shell
     docker build -t agentretrofit/cobol-transform .
     ```

  2. Run the container:

     ```shell
     docker run -p 8000:8000 agentretrofit/cobol-transform
     ```

  3. Validation: the server provides built-in type checking. If the COBOL file contains garbage data (common in old systems), the `errors` list in the response lets the Agent decide whether to discard a row or flag it for human review.
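On the agent side, the discard-or-flag decision comes down to reading the tool's JSON response. A hedged sketch of that triage step (the response below is constructed inline to illustrate the shape, not captured from a live run):

```python
import json

# Illustrative response matching the ParsingResult shape the server returns.
response = json.dumps({
    "success": False,
    "record_count": 2,
    "data": [{"EMP_ID": 101}, {"EMP_ID": 102}],
    "errors": ["Row 3: Field 'SALARY' expected float, got 'NOTNUMBR'"],
})

result = json.loads(response)

# Valid rows proceed to downstream processing; failed rows get flagged.
good_rows = result["data"]
for err in result["errors"]:
    print("Flag for human review:", err)
```

Because the error branch of the server mirrors the same keys, this handling works unchanged whether one row failed or the whole parse did.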


  • Status: ✅ Verified
  • Environment: Python 3.11
  • Auditor: AgentRetrofit CI/CD
