CrewAI for COBOL Flat File Data Transformation
Legacy mainframes often export data in “Flat File” formats: rigid, fixed-width text files with no delimiters such as commas or tabs. These files rely on COBOL Copybooks to define which byte range corresponds to which field.
Modern AI Agents (like CrewAI) struggle with this format natively. They expect structured JSON, CSV, or XML. This guide provides an MCP (Model Context Protocol) bridge that allows your CrewAI agents to parse, validate, and transform raw COBOL fixed-width data into usable JSON structures.
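To see why delimiter-based parsers fail here, consider a single payroll record sliced purely by byte position. The layout below is a hypothetical example, not a real copybook:

```python
# No commas or tabs: only the copybook's byte ranges tell us where
# one field ends and the next begins.
#   EMP_ID -> cols 1-5, NAME -> cols 6-15, SALARY -> cols 16-24
line = "00101JOHN DOE  005000.00"

emp_id = line[0:5]     # "00101"
name   = line[5:15]    # "JOHN DOE  " (space-padded to width 10)
salary = line[15:24]   # "005000.00"

print(emp_id, name.strip(), salary)
```

A CSV reader given this line would return a single unsplit field, which is exactly the problem the MCP bridge below solves.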
🏗️ Architecture
We will deploy a lightweight FastMCP server that acts as a transformation engine.
- Input: Raw fixed-width string data (simulating a file read from an FTP or mainframe dump).
- Processing: The MCP server applies a dynamic schema (start/end positions) to parse the bytes.
- Validation: It checks for data type integrity (e.g., ensuring numeric fields are actually numbers).
- Output: Clean JSON returned to the Agent’s context.
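The processing and validation steps can be sketched in a few lines. This is a simplified illustration of the pipeline, not the server code itself; the two-row payroll dump and field positions are hypothetical:

```python
import json

# Validate numeric fields before emitting JSON; garbage bytes go to an
# errors list instead of crashing the whole parse.
rows = ["00101JOHN DOE  005000.00", "0010XBAD REC   NOTNUMBR "]
records, errors = [], []

for i, line in enumerate(rows, start=1):
    raw_id, raw_salary = line[0:5], line[15:24]
    try:
        records.append({
            "EMP_ID": int(raw_id),          # raises ValueError on "0010X"
            "NAME": line[5:15].strip(),
            "SALARY": float(raw_salary),    # raises ValueError on "NOTNUMBR"
        })
    except ValueError:
        errors.append(f"Row {i}: non-numeric data '{raw_id}' / '{raw_salary}'")

print(json.dumps({"data": records, "errors": errors}))
```

The second row is rejected but reported, so the Agent still receives clean JSON for every valid record.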
The Stack
- Framework: FastMCP (Python)
- Parsing: Pandas (via `read_fwf` logic)
- Transport: SSE (Server-Sent Events) over HTTP
- Agent: CrewAI
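The core parsing primitive is pandas' `read_fwf`, which splits lines by column widths rather than delimiters. A quick standalone demo, with illustrative column names and widths:

```python
import io
import pandas as pd

raw = "00101JOHN DOE  005000.00\n00102JANE ROE  007500.50\n"

# widths=[5, 10, 9] maps each line to EMP_ID, NAME, SALARY byte ranges.
# dtype=str preserves leading zeros; type conversion happens later.
df = pd.read_fwf(
    io.StringIO(raw),
    widths=[5, 10, 9],
    header=None,
    names=["EMP_ID", "NAME", "SALARY"],
    dtype=str,
)

print(df.to_dict(orient="records"))
```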
💻 Server Implementation
This server provides a tool `parse_fixed_width_data` which takes the raw text and a schema definition. This allows the Agent to handle any flat file format as long as it knows the column specifications.
server.py
```python
from fastmcp import FastMCP
from pydantic import BaseModel, Field
from typing import List, Dict, Any, Optional
import pandas as pd
import io
import json

# Initialize FastMCP
mcp = FastMCP("CobolTransformer")

class ColumnSpec(BaseModel):
    name: str
    width: int
    dtype: str = "str"  # options: str, int, float

class ParsingResult(BaseModel):
    success: bool
    record_count: int
    data: List[Dict[str, Any]]
    errors: List[str]

@mcp.tool()
def parse_fixed_width_data(
    raw_content: str,
    columns: List[Dict[str, Any]]
) -> str:
    """
    Parses COBOL-style fixed-width text data into JSON.

    Args:
        raw_content: The raw string content of the flat file.
        columns: A list of dicts defining the schema. Each dict must have:
            'name' (field name), 'width' (number of characters), and
            optionally 'dtype' (int, float, str).
            Example: [{"name": "ID", "width": 5, "dtype": "int"}, ...]

    Returns:
        JSON string containing the parsed records and any validation errors.
    """
    try:
        # Prepare column specs for Pandas
        col_names = [c['name'] for c in columns]
        col_widths = [c['width'] for c in columns]

        # Use Pandas read_fwf for robust parsing.
        # We assume no header in the raw file (common in mainframe dumps).
        df = pd.read_fwf(
            io.StringIO(raw_content),
            widths=col_widths,
            header=None,
            names=col_names,
            dtype=str  # Read as string first to handle validation manually/safely
        )

        records = []
        errors = []

        # Row-by-row validation and type conversion
        for index, row in df.iterrows():
            record = {}
            row_valid = True

            for col in columns:
                field_name = col['name']
                field_val = row[field_name]
                target_type = col.get('dtype', 'str')

                # Handle NaN/None from pandas
                if pd.isna(field_val):
                    field_val = ""

                try:
                    if target_type == 'int':
                        record[field_name] = int(field_val.strip() or 0)
                    elif target_type == 'float':
                        record[field_name] = float(field_val.strip() or 0.0)
                    else:
                        record[field_name] = str(field_val).strip()
                except ValueError:
                    errors.append(
                        f"Row {index + 1}: Field '{field_name}' expected "
                        f"{target_type}, got '{field_val}'"
                    )
                    row_valid = False

            if row_valid:
                records.append(record)

        result = ParsingResult(
            success=len(errors) == 0,
            record_count=len(records),
            data=records,
            errors=errors
        )

        return result.model_dump_json()

    except Exception as e:
        return json.dumps({
            "success": False,
            "error": f"Critical parsing failure: {str(e)}",
            "data": []
        })

if __name__ == "__main__":
    # MANDATORY: Bind to 0.0.0.0 for Docker compatibility
    mcp.run(transport='sse', host='0.0.0.0', port=8000)
```

🐳 Docker Configuration
We use a slim Python image to keep the container lightweight while ensuring pandas is available for data processing.
Dockerfile
```dockerfile
# Base image
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies if needed (usually none for this stack)
# RUN apt-get update && apt-get install -y ...

# Install Python dependencies
# fastmcp: The MCP server framework
# pandas: For efficient fixed-width parsing
# uvicorn: ASGI server required by fastmcp[sse]
RUN pip install --no-cache-dir "fastmcp[sse]" pandas uvicorn

# Copy application code
COPY server.py .

# Expose the port for Railway/Docker networking
EXPOSE 8000

# Run the server
CMD ["python", "server.py"]
```

🔌 Connecting CrewAI
Your CrewAI agent needs to connect to the SSE endpoint exposed by the Docker container. Below is the configuration pattern to register the MCP tool.
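If both containers run under Docker Compose, a minimal compose file might look like the sketch below. The `cobol-mcp` service name matches the URL hint used when connecting from the agent; the `crew-agent` service and its build path are assumptions for illustration:

```yaml
services:
  cobol-mcp:
    build: .              # builds the Dockerfile above
    ports:
      - "8000:8000"

  # Hypothetical agent container; over the Compose network it would
  # reach the server at http://cobol-mcp:8000/sse.
  crew-agent:
    build: ./agent        # assumed directory containing agent.py
    depends_on:
      - cobol-mcp
```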
agent.py
```python
from crewai import Agent, Task, Crew

# 1. Define the connection to the MCP Server
# If running via Docker Compose, use the service name (e.g., http://cobol-mcp:8000/sse)
# If running locally with `docker run -p 8000:8000`, use localhost
mcp_sources = ["http://localhost:8000/sse"]

# 2. Define the Agent
legacy_data_specialist = Agent(
    role='Mainframe Data Analyst',
    goal='Convert raw legacy flat files into structured JSON for analysis',
    backstory="You are an expert in COBOL copybooks and data migration.",
    # CrewAI v0.100+ syntax for MCP integration
    mcps=mcp_sources,
    verbose=True
)

# 3. Define the Task
# Note: In a real scenario, the 'raw_content' might be read from a file tool first.
transform_task = Task(
    description="""
    I have a raw fixed-width string from a legacy payroll system.
    The layout is:
    - EMP_ID: 5 characters (Integer)
    - NAME: 10 characters (String)
    - SALARY: 9 characters (Float)

    Here is the raw data:
    00101JOHN DOE  005000.00
    00102JANE ROE  007500.50
    0010XBAD REC   NOTNUMBR

    Use the 'parse_fixed_width_data' tool to parse this.
    Report back the valid JSON records and identify any rows
    that failed validation.
    """,
    expected_output="A summary of valid records in JSON format and a list of parsing errors.",
    agent=legacy_data_specialist
)

# 4. Run the Crew
crew = Crew(
    agents=[legacy_data_specialist],
    tasks=[transform_task]
)

result = crew.kickoff()
print("### Transformation Result ###")
print(result)
```

🛠️ Deployment Notes
- Build the Image:

```shell
docker build -t agentretrofit/cobol-transform .
```

- Run the Container:

```shell
docker run -p 8000:8000 agentretrofit/cobol-transform
```

- Validation: The server provides built-in type checking. If the COBOL file contains garbage data (common in old systems), the `errors` list in the response allows the Agent to decide whether to discard the row or flag it for human review.
🛡️ Quality Assurance
- Status: ✅ Verified
- Environment: Python 3.11
- Auditor: AgentRetrofit CI/CD