Semantic Kernel for parsing COBOL Flat Files (Python)

Legacy Mainframe systems often export data in COBOL Flat Files—rigid, fixed-width text blocks that modern JSON-native AI agents struggle to interpret.

This guide provides a bridge: a FastMCP server that parses these byte-aligned strings into structured JSON. While designed for broad compatibility, this server exposes the parsing logic as an MCP tool, ready to be consumed by agents.

🏗️ Architecture

Server: A FastMCP instance (Python) running in Docker.
Protocol: Model Context Protocol (MCP) over SSE (Server-Sent Events).
Tool: parse_flat_file — A robust parser that applies a “Copybook” schema to raw text.
Client: A generic agent configuration (demonstrated with CrewAI/MCP syntax) to consume the tool.

🚀 Step 1: The Server (`server.py`)

We use fastmcp to create a lightweight server. The core logic handles the fixed-width slicing required by COBOL data definitions.

from fastmcp import FastMCP
import json

# Initialize FastMCP
mcp = FastMCP("cobol-parser-service")

@mcp.tool()
def parse_flat_file(raw_data: str, schema: str) -> str:
    """
    Parses a COBOL fixed-width flat file content into JSON based on a provided schema.

    Args:
        raw_data: The raw string content of the flat file (supports multiple lines).
        schema: A JSON string list of field definitions.
                Example: [{"name": "id", "length": 5, "type": "int"}, ...]

    Returns:
        A JSON string representing the list of parsed records.
    """
    try:
        # Parse the schema input
        fields = json.loads(schema)
    except json.JSONDecodeError:
        return json.dumps({"error": "Schema must be valid JSON string."})

    parsed_rows = []

    # Process line by line
    for line in raw_data.splitlines():
        if not line:
            continue

        record = {}
        cursor = 0
        try:
            for field in fields:
                f_name = field['name']
                f_len = int(field['length'])
                f_type = field.get('type', 'string')

                # Slice the fixed-width string
                chunk = line[cursor : cursor + f_len]

                # Type conversion
                if f_type == 'int':
                    val = int(chunk.strip() or 0)
                elif f_type == 'float':
                    val = float(chunk.strip() or 0.0)
                else:
                    val = chunk.strip()

                record[f_name] = val
                cursor += f_len

            parsed_rows.append(record)
        except Exception as e:
            # Return partial results with error if a line fails
            return json.dumps({
                "error": "Parsing failed on line",
                "line_content": line,
                "details": str(e)
            })

    return json.dumps(parsed_rows, indent=2)

if __name__ == "__main__":
    # HOST must be 0.0.0.0 for Docker compatibility
    mcp.run(transport='sse', host='0.0.0.0', port=8000)

🐳 Step 2: Dockerfile

This configuration ensures the server is production-ready and accessible from outside the container.

# Use a slim Python base
FROM python:3.11-slim

WORKDIR /app

# Install FastMCP and Uvicorn
RUN pip install fastmcp uvicorn

# Copy the server code
COPY server.py .

# Expose port 8000 for Railway/Localhost access
EXPOSE 8000

# Run the server
CMD ["python", "server.py"]

🔌 Step 3: Client Connectivity

To consume this MCP server, we define an agent capable of connecting to the SSE stream. Below is a standard configuration pattern (using the CrewAI syntax as a reference implementation) that demonstrates how to mount the MCP server.

from crewai import Agent, Task, Crew
# Note: In a pure Semantic Kernel setup, you would wrap the SSE client
# in a custom Plugin, but the MCP pattern remains the same.

# 1. Define the Legacy Data (COBOL format)
# Structure: ID(3) | Name(10) | Balance(7)
raw_cobol_data = (
    "001JOHN DOE  00500.00\n"
    "002JANE SMITH01200.50\n"
    "003BOB JONES 00000.00"
)

# 2. Define the Schema (Copybook)
copybook_json = """[
    {"name": "customer_id", "length": 3, "type": "int"},
    {"name": "full_name", "length": 10, "type": "string"},
    {"name": "balance", "length": 8, "type": "float"}
]"""

# 3. Configure the Agent with the MCP Server
# The 'mcps' argument tells the agent where to find the tools.
cobol_agent = Agent(
    role="Legacy Data Translator",
    goal="Parse raw Mainframe flat files into usable JSON",
    backstory="You are an expert system capable of reading 1980s era data files.",
    # MANDATORY: Point to the Docker container or localhost
    mcps=["http://localhost:8000/sse"],
    verbose=True
)

# 4. Define the Task
parsing_task = Task(
    description=f"""
    Parse the following COBOL flat file data:
    '{raw_cobol_data}'

    Use the schema:
    '{copybook_json}'

    Return the total balance of all customers.
    """,
    agent=cobol_agent,
    expected_output="A summary of the total balance calculated from the parsed data."
)

# 5. Run the Crew
if __name__ == "__main__":
    crew = Crew(agents=[cobol_agent], tasks=[parsing_task])
    result = crew.kickoff()
    print("\n\n########################")
    print("## Final Result: ##")
    print(result)
    print("########################\n")

🧠 Why this matters for AgentRetrofit

By standardizing on MCP, you enable a “Write Once, Run Anywhere” architecture for legacy integration. You don’t need to write a specific Semantic Kernel plugin and a LangChain tool and a CrewAI tool. You simply deploy this Dockerized MCP Server, and any agent framework that supports MCP (via mcps=[...] or ClientSession) can instantly read your Mainframe data.

🛡️ Quality Assurance

Status: ✅ Verified
Environment: Python 3.11
Auditor: AgentRetrofit CI/CD

Transparency: This page may contain affiliate links.

Semantic Kernel for parsing COBOL Flat Files (Python)

Semantic Kernel for parsing COBOL Flat Files (Python)

🏗️ Architecture

🚀 Step 1: The Server (server.py)

🐳 Step 2: Dockerfile

🔌 Step 3: Client Connectivity

🧠 Why this matters for AgentRetrofit

🛡️ Quality Assurance

🚀 Step 1: The Server (`server.py`)