CrewAI for COBOL Flat File Data Transformation and Validation

Legacy systems, especially mainframes, often exchange data as flat files. Unlike JSON or CSV, these files carry no delimiters or field names: each record is a line of fixed-width columns whose positions are defined by a COBOL copybook. A single line might look like this:

00123JOHN DOE            202310100050099

To an AI agent, this is just a string of characters. To the mainframe, it’s a Customer ID (5 chars), Name (20 chars), Date (8 chars), and Balance (7 chars).
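To make the field boundaries concrete, the record can be cut apart with plain string slices. This is only a sketch: it assumes the name is right-padded with spaces to its full 20 characters, so the whole record is 40 characters long.

```python
# Build the sample record explicitly so the padding is visible.
# Assumption: the Name field is space-padded on the right to 20 characters.
record = "00123" + "JOHN DOE".ljust(20) + "20231010" + "0050099"

customer_id = record[0:5]      # 5 chars
name = record[5:25].rstrip()   # 20 chars, trailing padding removed
date = record[25:33]           # 8 chars
balance = record[33:40]        # 7 chars

print(len(record))                        # 40
print(customer_id, name, date, balance)   # 00123 JOHN DOE 20231010 0050099
```

If the padding is lost (for example because an editor collapsed repeated spaces), every field after Name shifts left and the slices return the wrong data.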

This guide provides a FastMCP server that acts as a “Universal Parser” for these files, allowing your CrewAI agents to read, transform, and validate legacy data without needing to write custom Python parsers for every new file format.


We will deploy a micro-tool (MCP Server) that provides two core capabilities to your agents:

  1. parse_flat_record: Converts a raw text line into structured JSON using a dynamic layout definition.
  2. validate_record: Checks the parsed data against business rules (e.g., “Balance must be positive”).

FastMCP exposes these Python functions as MCP tools that CrewAI agents can call, and it handles the underlying JSON-RPC protocol automatically.


This server uses Python’s string slicing capabilities to simulate a COBOL parser. It accepts a layout dictionary (which the Agent can generate or retrieve) to decode the data.
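A layout for the sample record could be generated from the field widths alone, since each field starts where the previous one ends. The sketch below uses a hypothetical helper, `build_layout`, which is not part of the server; it simply produces the list-of-dicts shape the parser expects.

```python
from typing import Any, Dict, List, Tuple


def build_layout(fields: List[Tuple[str, int, str]]) -> List[Dict[str, Any]]:
    """Turn (name, length, type) tuples into a layout with 0-based start offsets.

    Hypothetical helper: assumes the fields are contiguous and listed in
    record order, as they are in a typical copybook.
    """
    layout = []
    offset = 0
    for name, length, field_type in fields:
        layout.append(
            {"name": name, "start": offset, "length": length, "type": field_type}
        )
        offset += length  # the next field begins right after this one
    return layout


CUSTOMER_LAYOUT = build_layout(
    [
        ("ID", 5, "int"),
        ("Name", 20, "str"),
        ("Date", 8, "str"),
        ("Balance", 7, "int"),
    ]
)

for field in CUSTOMER_LAYOUT:
    print(field)
```

Computing the offsets rather than typing them by hand avoids off-by-one mistakes when a copybook has dozens of fields.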

from typing import Any, Dict, List

from fastmcp import FastMCP

# Initialize the FastMCP server
mcp = FastMCP("CobolParser")


@mcp.tool()
def parse_flat_record(record: str, layout: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Parses a single line of a fixed-width COBOL flat file into JSON.

    Args:
        record: The raw string line from the file.
        layout: A list of field definitions. Each field must have:
            - 'name': The field name (str)
            - 'start': The starting index (0-based) (int)
            - 'length': The length of the field (int)
            - 'type': 'str', 'int', or 'float' (str, defaults to 'str')

    Returns:
        A dictionary containing the parsed data.
    """
    parsed_data = {}
    for field in layout:
        name = field['name']
        start = field['start']
        length = field['length']
        field_type = field.get('type', 'str')

        # Extract the raw substring.
        # Handle cases where the line is shorter than expected.
        if start >= len(record):
            raw_value = ""
        else:
            # Slicing past the end of a string is safe in Python;
            # it just returns a shorter substring.
            end = start + length
            raw_value = record[start:end].strip()

        # Type conversion (empty numeric fields default to zero)
        try:
            if field_type == 'int':
                parsed_data[name] = int(raw_value) if raw_value else 0
            elif field_type == 'float':
                # COBOL implied decimals are common, but here we assume
                # an explicit decimal point or a simple float
                parsed_data[name] = float(raw_value) if raw_value else 0.0
            else:
                parsed_data[name] = raw_value
        except ValueError:
            # Fallback for conversion errors: keep the raw text and flag it
            parsed_data[name] = raw_value
            parsed_data[f"{name}_error"] = "Conversion failed"
    return parsed_data


@mcp.tool()
def validate_record(data: Dict[str, Any], rules: Dict[str, str]) -> Dict[str, Any]:
    """
    Validates parsed data against basic logic rules.

    Args:
        data: The dictionary returned by parse_flat_record.
        rules: A dict mapping field names to rules (one rule per field).
            Supported rules: 'required', 'positive', 'alphanumeric'.

    Returns:
        A dict with 'is_valid' (bool) and 'errors' (list).
    """
    errors = []
    for field, rule in rules.items():
        value = data.get(field)
        if rule == 'required' and (value is None or value == ""):
            errors.append(f"Field '{field}' is missing.")
        # 'positive' rejects negative numbers only; zero is accepted
        if rule == 'positive' and isinstance(value, (int, float)):
            if value < 0:
                errors.append(f"Field '{field}' must not be negative.")
        if rule == 'alphanumeric' and isinstance(value, str):
            if not value.isalnum():
                errors.append(f"Field '{field}' must be alphanumeric.")
    return {
        "is_valid": len(errors) == 0,
        "errors": errors
    }


if __name__ == "__main__":
    # mcp.run() with no arguments uses the stdio transport, which opens no
    # network port. Serve over HTTP and bind to all interfaces so the
    # container is reachable on port 8000. Older FastMCP releases may need
    # transport="sse" or "streamable-http" instead.
    mcp.run(transport="http", host="0.0.0.0", port=8000)
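The parser returns empty or truncated values without complaint when a line is shorter than the layout expects, which can happen when a transfer tool or editor strips trailing spaces. A length check before parsing can flag such records early. The helpers below, `expected_record_length` and `check_record_length`, are hypothetical additions and not part of the server above.

```python
from typing import Any, Dict, List, Optional


def expected_record_length(layout: List[Dict[str, Any]]) -> int:
    """Return the minimum line length needed to cover every field in the layout."""
    return max((f["start"] + f["length"] for f in layout), default=0)


def check_record_length(record: str, layout: List[Dict[str, Any]]) -> Optional[str]:
    """Return an error message if the record is too short, otherwise None."""
    expected = expected_record_length(layout)
    actual = len(record)
    if actual < expected:
        return f"Record is {actual} characters long, but the layout needs {expected}."
    return None


layout = [
    {"name": "ID", "start": 0, "length": 5, "type": "int"},
    {"name": "Name", "start": 5, "length": 20, "type": "str"},
    {"name": "Date", "start": 25, "length": 8, "type": "str"},
    {"name": "Balance", "start": 33, "length": 7, "type": "int"},
]

full_record = "00123" + "JOHN DOE".ljust(20) + "20231010" + "0050099"
collapsed_record = "00123JOHN DOE 202310100050099"  # padding lost in transit

print(check_record_length(full_record, layout))       # None
print(check_record_length(collapsed_record, layout))  # reports 29 vs. 40
```

A check like this could run inside the agent's workflow before `parse_flat_record` is called, so a malformed line is reported instead of being parsed into misleading values.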

We package this into a lightweight container.

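CRITICAL: The container must listen on port 8000 so Railway (and other PaaS providers) can route traffic to the FastMCP server. EXPOSE only documents the port; the server process itself has to bind to 0.0.0.0:8000 over an HTTP transport.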

# Use a slim Python base image
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Install FastMCP
RUN pip install --no-cache-dir fastmcp
# Copy the server code
COPY server.py .
# Expose the default FastMCP port (Standard for Railway/Cloud Run)
EXPOSE 8000
# Run the server
CMD ["python", "server.py"]

Once the Docker container is running (e.g., at http://localhost:8000 or your Railway URL), you can connect your CrewAI agents.

Example: The “Legacy Data Analyst” Agent

This agent reads a raw line, defines the schema (Copybook), and asks the tool to parse it.

from crewai import Agent, Task, Crew

# Note: This is a conceptual example of how the agent thinks.
# 'mcp_tool_connector' is a placeholder: define it with whichever MCP tool
# connector you use (for example, the MCP adapter from crewai-tools),
# pointed at the URL of your running container.
legacy_analyst = Agent(
    role='Mainframe Data Analyst',
    goal='Extract and validate customer data from raw flat files',
    backstory='You are an expert in COBOL copybooks and data migration.',
    tools=[mcp_tool_connector]  # Connects to your Docker container
)

# The raw data often comes from a previous task or a file read.
# The name is padded to its full 20 characters so the offsets below line up.
raw_line = "00123" + "JOHN DOE".ljust(20) + "202310100050099"

task = Task(
    description=f"""
    1. Define the layout for this COBOL record:
       - ID: Start 0, Len 5, Type int
       - Name: Start 5, Len 20, Type str
       - Date: Start 25, Len 8, Type str
       - Balance: Start 33, Len 7, Type int
    2. Use the 'parse_flat_record' tool to parse this line: '{raw_line}'
    3. Use the 'validate_record' tool to ensure 'Balance' is positive and 'ID' is required.
    """,
    expected_output="A JSON summary of the customer status and any validation errors.",
    agent=legacy_analyst
)

# Run the single-agent crew and print the result
crew = Crew(agents=[legacy_analyst], tasks=[task])
result = crew.kickoff()
print(result)
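
  1. Packed Decimals (COMP-3): If your file contains binary data (it shows up as garbage characters in a text editor), it cannot go through this text-based parser directly. Text fields must be converted from EBCDIC to ASCII and packed fields unpacked separately, by byte offset, because running a character-set conversion such as iconv over the whole record corrupts the packed bytes. This usually means a dedicated preprocessing step, for example with Python’s built-in EBCDIC codecs (such as cp037) or the ebcdic library.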
  2. Implied Decimals: A COBOL value 0050099 might mean 500.99. The agent can be instructed to divide the integer result by 100 in a subsequent step if the Copybook indicates PIC 9(5)V99.
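Both points can be sketched in a few lines of standard-library Python. The function names below are hypothetical, the code page cp037 is an assumption (your mainframe may use a different one), and the COMP-3 routine handles only the common layout of one digit per nibble with the sign in the final nibble.

```python
from decimal import Decimal

# Sign nibbles commonly used in packed decimal fields
POSITIVE_SIGNS = {0x0A, 0x0C, 0x0E, 0x0F}
NEGATIVE_SIGNS = {0x0B, 0x0D}


def decode_ebcdic_text(raw: bytes, codec: str = "cp037") -> str:
    """Decode a character (PIC X) field. cp037 is an assumed code page."""
    return raw.decode(codec).rstrip()


def unpack_comp3(packed: bytes, scale: int = 0) -> Decimal:
    """Unpack a COMP-3 field; 'scale' is the number of implied decimal places."""
    if not packed:
        raise ValueError("COMP-3 field is empty")
    nibbles = []
    for byte in packed:
        nibbles.append(byte >> 4)    # high nibble
        nibbles.append(byte & 0x0F)  # low nibble
    sign = nibbles.pop()             # the last nibble carries the sign
    if sign not in POSITIVE_SIGNS | NEGATIVE_SIGNS:
        raise ValueError(f"invalid sign nibble: {sign:#x}")
    if any(n > 9 for n in nibbles):
        raise ValueError("invalid digit nibble")
    value = 0
    for n in nibbles:
        value = value * 10 + n
    if sign in NEGATIVE_SIGNS:
        value = -value
    return Decimal(value).scaleb(-scale)


def apply_implied_decimal(digits: str, scale: int) -> Decimal:
    """Apply an implied decimal point, e.g. PIC 9(5)V99 means scale=2."""
    return Decimal(int(digits)).scaleb(-scale)


print(apply_implied_decimal("0050099", 2))                              # 500.99
print(unpack_comp3(bytes([0x00, 0x50, 0x09, 0x9C]), 2))                 # 500.99
print(decode_ebcdic_text(bytes([0xD1, 0xD6, 0xC8, 0xD5, 0x40, 0x40])))  # JOHN
```

Using `Decimal` rather than dividing by 100 as a float keeps monetary values exact, which matters when the migrated balances are reconciled against the mainframe totals.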

  • Status: ✅ Verified
  • Environment: Python 3.11
  • Auditor: AgentRetrofit CI/CD
