AI Agents for COBOL Flat File data validation and cleansing
AI Agents for COBOL Flat File Data Validation and Cleansing
Section titled “AI Agents for COBOL Flat File Data Validation and Cleansing”In the “Big Iron” world, data often lives in fixed-width flat files generated by COBOL batch jobs. These files are brittle: a single misplaced character shifts every subsequent field, corrupting the entire record.
Validating this data has traditionally required writing fragile regex parsers or dedicated COBOL utility programs. This guide demonstrates a modern “Retrofit” approach: using an AI Agent backed by an MCP Server to parse, validate, and cleanse legacy flat file data on the fly.
By offloading the parsing logic to a Model Context Protocol (MCP) server, your AI agents can “read” these mainframe dumps as structured JSON, applying complex validation rules (e.g., “If AccountType is ‘X’, then ZipCode must be non-empty”) that are difficult to encode in simple scripts.
Architectural Overview
Section titled “Architectural Overview”We will build a FastMCP server that acts as a translation layer. It accepts raw fixed-width strings and a schema definition, then returns structured, validated data to the agent.
The Stack
Section titled “The Stack”- Server: Python (FastMCP) running inside Docker.
- Parsing Logic: Pure Python string slicing (reliable, zero external dependencies).
- Client: CrewAI (with MCP support).
- Transport: Server-Sent Events (SSE) over HTTP.
1. The MCP Server (server.py)
Section titled “1. The MCP Server (server.py)”This server exposes a tool called validate_flat_file_data. It takes a raw block of text (the flat file content) and a schema definition. It attempts to parse each line and validates the data types.
File: server.py
from fastmcp import FastMCPfrom typing import List, Dict, Anyimport json
# Initialize the MCP servermcp = FastMCP("COBOL Validation Service")
def parse_line(line: str, schema: List[Dict[str, Any]]) -> Dict[str, Any]: """ Parses a single fixed-width line based on the provided schema. Schema format: [{'name': 'id', 'start': 0, 'length': 5, 'type': 'int'}, ...] """ record = {} for field in schema: name = field['name'] start = field['start'] length = field['length'] f_type = field.get('type', 'str')
# Extract raw value using slicing # Pad line with spaces if it's shorter than expected to prevent crash padded_line = line.ljust(start + length) raw_value = padded_line[start : start + length].strip()
# Type Conversion & Validation try: if f_type == 'int': record[name] = int(raw_value) if raw_value else 0 elif f_type == 'float': record[name] = float(raw_value) if raw_value else 0.0 else: record[name] = raw_value except ValueError: record[name] = f"ERROR: Invalid {f_type} '{raw_value}'" record['_validation_error'] = True
return record
@mcp.tool()def validate_flat_file_data(raw_content: str, schema_json: str) -> str: """ Parses and validates a raw COBOL flat file string against a JSON schema.
Args: raw_content: The fixed-width data string (multiple lines). schema_json: A JSON string defining the fields. Example: [{"name": "id", "start": 0, "length": 5, "type": "int"}, ...]
Returns: A JSON string containing the list of parsed records and any validation errors. """ try: schema = json.loads(schema_json) except json.JSONDecodeError: return json.dumps({"error": "Invalid schema JSON format."})
lines = raw_content.strip().split('\n') results = [] error_count = 0
for idx, line in enumerate(lines): if not line.strip(): continue
parsed_record = parse_line(line, schema) parsed_record['_line_number'] = idx + 1
if parsed_record.get('_validation_error'): error_count += 1
results.append(parsed_record)
report = { "total_records": len(results), "error_count": error_count, "data": results }
return json.dumps(report, indent=2)
if __name__ == "__main__": # HOST must be 0.0.0.0 to work within Docker mcp.run(transport='sse', host='0.0.0.0', port=8000)2. Docker Configuration
Section titled “2. Docker Configuration”To ensure this server runs reliably in any environment (including Railway or Kubernetes), we containerize it.
File: Dockerfile
# Use a slim Python base imageFROM python:3.11-slim
# Set working directoryWORKDIR /app
# Install FastMCPRUN pip install --no-cache-dir fastmcp
# Copy application codeCOPY server.py .
# Expose the port for the MCP serverEXPOSE 8000
# Run the serverCMD ["python", "server.py"]3. Client Integration (CrewAI)
Section titled “3. Client Integration (CrewAI)”This client script connects to the running MCP server to process data. CrewAI natively supports MCP via the mcps parameter in the Crew definition.
File: agent.py
from crewai import Agent, Task, Crewimport os
# 1. Define the simulated COBOL data# Schema: ID (0-5), Name (5-20), Balance (20-10)raw_cobol_data = """00101JOHN DOE 0000500.0000102JANE SMITH 0000950.5000103BAD DATA INVALIDNUM"""
# 2. Define the Schema the agent should useschema_def = """[ {"name": "customer_id", "start": 0, "length": 5, "type": "int"}, {"name": "customer_name", "start": 5, "length": 15, "type": "str"}, {"name": "account_balance", "start": 20, "length": 10, "type": "float"}]"""
# 3. Define the Agent# The agent will automatically discover the tools from the MCP server defined in the Crewdata_engineer_agent = Agent( role='Legacy Data Engineer', goal='Validate and clean mainframe flat file extracts', backstory='You are an expert in COBOL data structures. You identify data quality issues and format valid data.', verbose=True)
# 4. Define the Taskvalidation_task = Task( description=f""" I have a raw chunk of COBOL flat file data: {raw_cobol_data}
And here is the schema for it: {schema_def}
1. Use the 'validate_flat_file_data' tool to parse this data. 2. Analyze the JSON result. 3. Identify which lines had errors and explain the error. 4. Provide a final clean list of valid customers (excluding the errors). """, agent=data_engineer_agent, expected_output="A summary of errors found and a JSON array of valid customers.")
# 5. Run the Crew with MCP Connection# We explicitly connect to the MCP server running in Dockercrew = Crew( agents=[data_engineer_agent], tasks=[validation_task], mcps=["http://localhost:8000/sse"] # Connects to the Dockerized MCP server)
if __name__ == "__main__": print("Starting CrewAI with COBOL Validation MCP...") result = crew.kickoff() print("\n\n########################") print("## Final Agent Output ##") print("########################\n") print(result)How to Run
Section titled “How to Run”-
Start the Server:
Terminal window docker build -t cobol-validator .docker run -p 8000:8000 cobol-validator -
Run the Client:
Terminal window # Ensure you have OPENAI_API_KEY set for CrewAIexport OPENAI_API_KEY=sk-...python agent.py
Expected Output
Section titled “Expected Output”The agent will send the raw data to the server. The server processes it and returns a JSON report. The agent then reasons over that report and outputs:
The following errors were found in the data:- Line 3: The 'account_balance' field contained 'INVALIDNUM', which is not a valid float.
Here is the list of valid customers:[ { "customer_id": 101, "customer_name": "JOHN DOE", "account_balance": 500.0 }, { "customer_id": 102, "customer_name": "JANE SMITH", "account_balance": 950.5 }]🛡️ Quality Assurance
Section titled “🛡️ Quality Assurance”- Status: ✅ Verified
- Environment: Python 3.11
- Auditor: AgentRetrofit CI/CD
Transparency: This page may contain affiliate links.