
Semantic Kernel for parsing COBOL Flat Files (Python)

Legacy Mainframe systems often export data in COBOL Flat Files—rigid, fixed-width text blocks that modern JSON-native AI agents struggle to interpret.

This guide provides a bridge: a FastMCP server that parses these byte-aligned strings into structured JSON. The server exposes the parsing logic as an MCP tool, so any agent framework that speaks the protocol can consume it.

  • Server: A FastMCP instance (Python) running in Docker.
  • Protocol: Model Context Protocol (MCP) over SSE (Server-Sent Events).
  • Tool: parse_flat_file — A robust parser that applies a “Copybook” schema to raw text.
  • Client: A generic agent configuration (demonstrated with CrewAI/MCP syntax) to consume the tool.

We use fastmcp to create a lightweight server. The core logic handles the fixed-width slicing required by COBOL data definitions.
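Before looking at the full server, the core idea fits in a few lines: walk a cursor across each record, slicing every field at its declared width. A standalone illustration, using the 3/10/8 layout of the demo data later in this guide:

```python
# Fixed-width slicing: each field occupies an exact byte range
line = "001JOHN DOE  00500.00"
widths = [3, 10, 8]  # ID(3) | Name(10) | Balance(8)

cursor = 0
parts = []
for w in widths:
    parts.append(line[cursor:cursor + w].strip())
    cursor += w

print(parts)  # ['001', 'JOHN DOE', '00500.00']
```

There are no delimiters to split on; the "Copybook" schema is the only thing that tells you where one field ends and the next begins.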

from fastmcp import FastMCP
import json

# Initialize FastMCP
mcp = FastMCP("cobol-parser-service")

@mcp.tool()
def parse_flat_file(raw_data: str, schema: str) -> str:
    """
    Parses COBOL fixed-width flat file content into JSON based on a provided schema.

    Args:
        raw_data: The raw string content of the flat file (supports multiple lines).
        schema: A JSON string list of field definitions.
            Example: [{"name": "id", "length": 5, "type": "int"}, ...]

    Returns:
        A JSON string representing the list of parsed records.
    """
    try:
        # Parse the schema input
        fields = json.loads(schema)
    except json.JSONDecodeError:
        return json.dumps({"error": "Schema must be a valid JSON string."})

    parsed_rows = []
    # Process line by line
    for line in raw_data.splitlines():
        if not line:
            continue
        record = {}
        cursor = 0
        try:
            for field in fields:
                f_name = field['name']
                f_len = int(field['length'])
                f_type = field.get('type', 'string')
                # Slice the fixed-width string
                chunk = line[cursor : cursor + f_len]
                # Type conversion
                if f_type == 'int':
                    val = int(chunk.strip() or 0)
                elif f_type == 'float':
                    val = float(chunk.strip() or 0.0)
                else:
                    val = chunk.strip()
                record[f_name] = val
                cursor += f_len
            parsed_rows.append(record)
        except Exception as e:
            # Abort and report the line that failed to parse
            return json.dumps({
                "error": "Parsing failed on line",
                "line_content": line,
                "details": str(e)
            })
    return json.dumps(parsed_rows, indent=2)

if __name__ == "__main__":
    # HOST must be 0.0.0.0 for Docker compatibility
    mcp.run(transport='sse', host='0.0.0.0', port=8000)
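To sanity-check the slicing logic without spinning up the server, the same loop can be exercised as a plain function. This is a standalone sketch that mirrors the body of `parse_flat_file`; `parse_line` is a hypothetical helper, not part of the server code:

```python
# Standalone sketch of the fixed-width walk inside parse_flat_file
def parse_line(line: str, fields: list) -> dict:
    record, cursor = {}, 0
    for field in fields:
        f_len = int(field["length"])
        chunk = line[cursor:cursor + f_len].strip()
        f_type = field.get("type", "string")
        if f_type == "int":
            record[field["name"]] = int(chunk or 0)
        elif f_type == "float":
            record[field["name"]] = float(chunk or 0.0)
        else:
            record[field["name"]] = chunk
        cursor += f_len
    return record

schema = [
    {"name": "customer_id", "length": 3, "type": "int"},
    {"name": "full_name", "length": 10, "type": "string"},
    {"name": "balance", "length": 8, "type": "float"},
]
print(parse_line("002JANE SMITH01200.50", schema))
# {'customer_id': 2, 'full_name': 'JANE SMITH', 'balance': 1200.5}
```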

The Dockerfile below packages the server so it runs anywhere Docker does, with port 8000 exposed for access from outside the container.

# Use a slim Python base
FROM python:3.11-slim
WORKDIR /app
# Install FastMCP and Uvicorn
RUN pip install fastmcp uvicorn
# Copy the server code
COPY server.py .
# Expose port 8000 for Railway/Localhost access
EXPOSE 8000
# Run the server
CMD ["python", "server.py"]
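Building and running the image follows the usual Docker workflow (the image and container names here are illustrative):

```shell
# Build the image from the Dockerfile above
docker build -t cobol-parser-mcp .

# Run it, mapping the SSE port to the host
docker run -d --name cobol-parser -p 8000:8000 cobol-parser-mcp

# The MCP endpoint is now reachable at http://localhost:8000/sse
```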

To consume this MCP server, we define an agent capable of connecting to the SSE stream. Below is a standard configuration pattern (using the CrewAI syntax as a reference implementation) that demonstrates how to mount the MCP server.

from crewai import Agent, Task, Crew

# Note: In a pure Semantic Kernel setup, you would wrap the SSE client
# in a custom Plugin, but the MCP pattern remains the same.

# 1. Define the Legacy Data (COBOL format)
# Structure: ID(3) | Name(10) | Balance(8)
raw_cobol_data = (
    "001JOHN DOE  00500.00\n"
    "002JANE SMITH01200.50\n"
    "003BOB JONES 00000.00"
)

# 2. Define the Schema (Copybook)
copybook_json = """[
    {"name": "customer_id", "length": 3, "type": "int"},
    {"name": "full_name", "length": 10, "type": "string"},
    {"name": "balance", "length": 8, "type": "float"}
]"""

# 3. Configure the Agent with the MCP Server
# The 'mcps' argument tells the agent where to find the tools.
cobol_agent = Agent(
    role="Legacy Data Translator",
    goal="Parse raw Mainframe flat files into usable JSON",
    backstory="You are an expert system capable of reading 1980s-era data files.",
    # MANDATORY: Point to the Docker container or localhost
    mcps=["http://localhost:8000/sse"],
    verbose=True
)

# 4. Define the Task
parsing_task = Task(
    description=f"""
    Parse the following COBOL flat file data:
    '{raw_cobol_data}'
    Use the schema:
    '{copybook_json}'
    Return the total balance of all customers.
    """,
    agent=cobol_agent,
    expected_output="A summary of the total balance calculated from the parsed data."
)

# 5. Run the Crew
if __name__ == "__main__":
    crew = Crew(agents=[cobol_agent], tasks=[parsing_task])
    result = crew.kickoff()
    print("\n\n########################")
    print("## Final Result: ##")
    print(result)
    print("########################\n")
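For reference, the answer the agent should converge on is deterministic. A standalone sketch (no agent, no server) applies the same copybook to the demo data and sums the balance column:

```python
# Deterministic ground truth: apply the copybook and sum the balances
raw = (
    "001JOHN DOE  00500.00\n"
    "002JANE SMITH01200.50\n"
    "003BOB JONES 00000.00"
)
fields = [
    {"name": "customer_id", "length": 3},
    {"name": "full_name", "length": 10},
    {"name": "balance", "length": 8},
]

total = 0.0
for line in raw.splitlines():
    cursor = 0
    for f in fields:
        chunk = line[cursor:cursor + f["length"]]
        if f["name"] == "balance":
            total += float(chunk.strip() or 0.0)
        cursor += f["length"]

print(f"Total balance: {total:.2f}")  # Total balance: 1700.50
```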

By standardizing on MCP, you enable a “Write Once, Run Anywhere” architecture for legacy integration. You don’t need to write a Semantic Kernel plugin, a LangChain tool, and a CrewAI tool separately. You simply deploy this Dockerized MCP server, and any agent framework that supports MCP (via mcps=[...] or ClientSession) can instantly read your Mainframe data.


  • Status: ✅ Verified
  • Environment: Python 3.11
  • Auditor: AgentRetrofit CI/CD
