
AI Agents for COBOL Flat File Data Validation and Cleansing

In the “Big Iron” world, data often lives in fixed-width flat files generated by COBOL batch jobs. These files are brittle: a single misplaced character shifts every subsequent field, corrupting the entire record.
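To make that brittleness concrete, here is a minimal sketch using an illustrative two-field layout (not one of the schemas used later in this guide) showing how losing a single character corrupts every downstream field:

```python
# Illustrative 2-field layout: ID occupies columns 0-4, AMOUNT columns 5-12
def slice_fields(line: str) -> tuple[str, str]:
    return line[0:5], line[5:13]

good = "00101" + "00500.00"
bad = good[1:]  # one leading character lost somewhere upstream

print(slice_fields(good))  # ('00101', '00500.00')
print(slice_fields(bad))   # ('01010', '0500.00') -- both fields are now garbage
```

There is no delimiter to resynchronize on, so the damage silently propagates to the end of the record.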

Validating this data has traditionally required writing fragile regex parsers or dedicated COBOL utility programs. This guide demonstrates a modern “Retrofit” approach: using an AI Agent backed by an MCP Server to parse, validate, and cleanse legacy flat file data on the fly.

By offloading the parsing logic to a Model Context Protocol (MCP) server, your AI agents can “read” these mainframe dumps as structured JSON, applying complex validation rules (e.g., “If AccountType is ‘X’, then ZipCode must be non-empty”) that are difficult to encode in simple scripts.
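As a sketch of what such a cross-field rule looks like once a record is structured (the field names here are illustrative, not part of the demo schema used below):

```python
# Hypothetical cross-field rule over an already-parsed record:
# if account_type is 'X', then zip_code must be non-empty.
def check_zip_rule(record: dict) -> list[str]:
    errors = []
    if record.get("account_type") == "X" and not record.get("zip_code", "").strip():
        errors.append("account_type 'X' requires a non-empty zip_code")
    return errors

print(check_zip_rule({"account_type": "X", "zip_code": "   "}))  # rule violated
print(check_zip_rule({"account_type": "A", "zip_code": ""}))     # [] -- rule not triggered
```

Rules like this are trivial to state in prose but awkward to bolt onto a regex-based parser, which is exactly where an agent with a structured view of the data earns its keep.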


We will build a FastMCP server that acts as a translation layer. It accepts raw fixed-width strings and a schema definition, then returns structured, validated data to the agent.

  • Server: Python (FastMCP) running inside Docker.
  • Parsing Logic: Pure Python string slicing (reliable, zero external dependencies).
  • Client: CrewAI (with MCP support).
  • Transport: Server-Sent Events (SSE) over HTTP.

This server exposes a tool called validate_flat_file_data. It takes a raw block of text (the flat file content) and a schema definition. It attempts to parse each line and validates the data types.

File: server.py

from fastmcp import FastMCP
from typing import List, Dict, Any
import json

# Initialize the MCP server
mcp = FastMCP("COBOL Validation Service")

def parse_line(line: str, schema: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Parses a single fixed-width line based on the provided schema.
    Schema format: [{'name': 'id', 'start': 0, 'length': 5, 'type': 'int'}, ...]
    """
    record = {}
    for field in schema:
        name = field['name']
        start = field['start']
        length = field['length']
        f_type = field.get('type', 'str')

        # Extract the raw value using slicing.
        # Pad the line with spaces if it is shorter than expected to prevent a crash.
        padded_line = line.ljust(start + length)
        raw_value = padded_line[start : start + length].strip()

        # Type conversion & validation
        try:
            if f_type == 'int':
                record[name] = int(raw_value) if raw_value else 0
            elif f_type == 'float':
                record[name] = float(raw_value) if raw_value else 0.0
            else:
                record[name] = raw_value
        except ValueError:
            record[name] = f"ERROR: Invalid {f_type} '{raw_value}'"
            record['_validation_error'] = True
    return record

@mcp.tool()
def validate_flat_file_data(raw_content: str, schema_json: str) -> str:
    """
    Parses and validates a raw COBOL flat file string against a JSON schema.

    Args:
        raw_content: The fixed-width data string (multiple lines).
        schema_json: A JSON string defining the fields.
            Example: [{"name": "id", "start": 0, "length": 5, "type": "int"}, ...]

    Returns:
        A JSON string containing the list of parsed records and any validation errors.
    """
    try:
        schema = json.loads(schema_json)
    except json.JSONDecodeError:
        return json.dumps({"error": "Invalid schema JSON format."})

    lines = raw_content.strip().split('\n')
    results = []
    error_count = 0

    for idx, line in enumerate(lines):
        if not line.strip():
            continue
        parsed_record = parse_line(line, schema)
        parsed_record['_line_number'] = idx + 1
        if parsed_record.get('_validation_error'):
            error_count += 1
        results.append(parsed_record)

    report = {
        "total_records": len(results),
        "error_count": error_count,
        "data": results
    }
    return json.dumps(report, indent=2)

if __name__ == "__main__":
    # HOST must be 0.0.0.0 to work within Docker
    mcp.run(transport='sse', host='0.0.0.0', port=8000)
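Before containerizing, the core pad-slice-strip step is worth sanity-checking in isolation. This standalone snippet mirrors the extraction logic inside parse_line (rather than importing server.py), including the edge case of a truncated record:

```python
# Mirrors parse_line's extraction step: pad, slice, strip.
def extract(line: str, start: int, length: int) -> str:
    padded = line.ljust(start + length)  # pad short lines so slicing never runs off the end
    return padded[start:start + length].strip()

line = "00101JOHN DOE"  # truncated record: the balance columns are missing entirely

print(extract(line, 0, 5))    # '00101'
print(extract(line, 5, 15))   # 'JOHN DOE'
print(extract(line, 20, 10))  # ''  -- a missing field yields an empty string, not an IndexError
```

The padding is what lets the server degrade gracefully on short lines instead of crashing mid-batch.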

To ensure this server runs reliably in any environment (including Railway or Kubernetes), we containerize it.

File: Dockerfile

# Use a slim Python base image
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Install FastMCP
RUN pip install --no-cache-dir fastmcp
# Copy application code
COPY server.py .
# Expose the port for the MCP server
EXPOSE 8000
# Run the server
CMD ["python", "server.py"]

This client script connects to the running MCP server to process the data. CrewAI consumes MCP tools through the MCPServerAdapter helper from the crewai-tools package: it connects to the server, discovers the exposed tools, and hands them to the agent.

File: agent.py

from crewai import Agent, Task, Crew
from crewai_tools import MCPServerAdapter

# 1. Define the simulated COBOL data
# Layout: ID (cols 0-4), Name (cols 5-19), Balance (cols 20-29)
raw_cobol_data = """
00101JOHN DOE       0000500.00
00102JANE SMITH     0000950.50
00103BAD DATA       INVALIDNUM
"""

# 2. Define the schema the agent should use
schema_def = """
[
    {"name": "customer_id", "start": 0, "length": 5, "type": "int"},
    {"name": "customer_name", "start": 5, "length": 15, "type": "str"},
    {"name": "account_balance", "start": 20, "length": 10, "type": "float"}
]
"""

# 3. Connect to the Dockerized MCP server over SSE.
# MCPServerAdapter (crewai-tools) discovers the server's tools and exposes
# them as standard CrewAI tools while the context manager is open.
server_params = {"url": "http://localhost:8000/sse", "transport": "sse"}

with MCPServerAdapter(server_params) as mcp_tools:
    # 4. Define the Agent, handing it the discovered MCP tools
    data_engineer_agent = Agent(
        role='Legacy Data Engineer',
        goal='Validate and clean mainframe flat file extracts',
        backstory='You are an expert in COBOL data structures. You identify data quality issues and format valid data.',
        tools=mcp_tools,
        verbose=True
    )

    # 5. Define the Task
    validation_task = Task(
        description=f"""
        I have a raw chunk of COBOL flat file data:
        {raw_cobol_data}
        And here is the schema for it:
        {schema_def}
        1. Use the 'validate_flat_file_data' tool to parse this data.
        2. Analyze the JSON result.
        3. Identify which lines had errors and explain the error.
        4. Provide a final clean list of valid customers (excluding the errors).
        """,
        agent=data_engineer_agent,
        expected_output="A summary of errors found and a JSON array of valid customers."
    )

    # 6. Run the Crew
    crew = Crew(
        agents=[data_engineer_agent],
        tasks=[validation_task]
    )

    print("Starting CrewAI with COBOL Validation MCP...")
    result = crew.kickoff()
    print("\n\n########################")
    print("## Final Agent Output ##")
    print("########################\n")
    print(result)
  1. Start the Server:

     docker build -t cobol-validator .
     docker run -p 8000:8000 cobol-validator

  2. Run the Client:

     # Ensure you have OPENAI_API_KEY set for CrewAI
     export OPENAI_API_KEY=sk-...
     python agent.py

The agent will send the raw data to the server. The server processes it and returns a JSON report. The agent then reasons over that report and outputs:

The following errors were found in the data:
- Line 3: The 'account_balance' field contained 'INVALIDNUM', which is not a valid float.

Here is the list of valid customers:
[
  {
    "customer_id": 101,
    "customer_name": "JOHN DOE",
    "account_balance": 500.0
  },
  {
    "customer_id": 102,
    "customer_name": "JANE SMITH",
    "account_balance": 950.5
  }
]
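The final "clean list" step is mechanical, so it can also be reproduced deterministically in plain Python from the server's report. This sketch uses a hand-written report in the same shape that validate_flat_file_data returns:

```python
import json

# A report in the shape produced by validate_flat_file_data
report = json.loads("""
{
  "total_records": 3,
  "error_count": 1,
  "data": [
    {"customer_id": 101, "customer_name": "JOHN DOE", "account_balance": 500.0, "_line_number": 1},
    {"customer_id": 102, "customer_name": "JANE SMITH", "account_balance": 950.5, "_line_number": 2},
    {"customer_id": 103, "customer_name": "BAD DATA",
     "account_balance": "ERROR: Invalid float 'INVALIDNUM'",
     "_validation_error": true, "_line_number": 3}
  ]
}
""")

# Keep only error-free records and drop the bookkeeping fields
valid = [
    {k: v for k, v in rec.items() if not k.startswith("_")}
    for rec in report["data"]
    if not rec.get("_validation_error")
]
print(json.dumps(valid, indent=2))
```

Letting the agent do this step instead buys you the natural-language error explanations; letting code do it buys you determinism. In production you would likely want both.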

  • Status: ✅ Verified
  • Environment: Python 3.11
  • Auditor: AgentRetrofit CI/CD
