# Semantic Kernel for parsing COBOL Flat Files (Python)
Legacy mainframe systems often export data as COBOL flat files: rigid, fixed-width text blocks that modern JSON-native AI agents struggle to interpret.
This guide provides a bridge: a FastMCP server that parses these byte-aligned records into structured JSON. The server exposes the parsing logic as an MCP tool, ready to be consumed by any MCP-capable agent.
## 🏗️ Architecture
- **Server:** A FastMCP instance (Python) running in Docker.
- **Protocol:** Model Context Protocol (MCP) over SSE (Server-Sent Events).
- **Tool:** `parse_flat_file`, a robust parser that applies a "Copybook" schema to raw fixed-width text.
- **Client:** A generic agent configuration (demonstrated with CrewAI's MCP syntax) that consumes the tool.
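As a concrete reference, the "copybook" schema the tool expects is just a JSON list of field definitions whose lengths add up to the record width. A minimal sketch (the field names and record here are illustrative, not part of the server):

```python
import json

# A copybook-style schema: each field carries a name, a byte length,
# and an optional type ("string" is the default).
copybook = [
    {"name": "id", "length": 5, "type": "int"},
    {"name": "name", "length": 10, "type": "string"},
]

# A matching fixed-width record: 5 bytes of id, then 10 bytes of name.
record = "00042ALICE     "

# The field lengths must sum to the record width.
assert sum(f["length"] for f in copybook) == len(record)

# The tool receives the schema serialized as a JSON string.
schema_json = json.dumps(copybook)
```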
## 🚀 Step 1: The Server (server.py)
We use `fastmcp` to create a lightweight server. The core logic handles the fixed-width slicing required by COBOL data definitions.
```python
from fastmcp import FastMCP
import json

# Initialize FastMCP
mcp = FastMCP("cobol-parser-service")


@mcp.tool()
def parse_flat_file(raw_data: str, schema: str) -> str:
    """
    Parses COBOL fixed-width flat file content into JSON based on a provided schema.

    Args:
        raw_data: The raw string content of the flat file (supports multiple lines).
        schema: A JSON string list of field definitions.
                Example: [{"name": "id", "length": 5, "type": "int"}, ...]

    Returns:
        A JSON string representing the list of parsed records.
    """
    try:
        # Parse the schema input
        fields = json.loads(schema)
    except json.JSONDecodeError:
        return json.dumps({"error": "Schema must be a valid JSON string."})

    parsed_rows = []

    # Process line by line
    for line in raw_data.splitlines():
        if not line:
            continue

        record = {}
        cursor = 0
        try:
            for field in fields:
                f_name = field['name']
                f_len = int(field['length'])
                f_type = field.get('type', 'string')

                # Slice the fixed-width string
                chunk = line[cursor : cursor + f_len]

                # Type conversion
                if f_type == 'int':
                    val = int(chunk.strip() or 0)
                elif f_type == 'float':
                    val = float(chunk.strip() or 0.0)
                else:
                    val = chunk.strip()

                record[f_name] = val
                cursor += f_len

            parsed_rows.append(record)
        except Exception as e:
            # Surface the failing line instead of raising
            return json.dumps({
                "error": "Parsing failed on line",
                "line_content": line,
                "details": str(e)
            })

    return json.dumps(parsed_rows, indent=2)


if __name__ == "__main__":
    # Host must be 0.0.0.0 for Docker compatibility
    mcp.run(transport='sse', host='0.0.0.0', port=8000)
```
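Before containerizing, you can sanity-check the slicing logic with a standalone sketch that mirrors the tool's field loop (plain Python, no FastMCP required; `parse_line` is a helper name introduced here for illustration):

```python
def parse_line(line, fields):
    """Apply a copybook-style field list to one fixed-width line."""
    record, cursor = {}, 0
    for field in fields:
        # Same slicing and type handling as the MCP tool above
        chunk = line[cursor : cursor + field["length"]].strip()
        if field.get("type") == "int":
            record[field["name"]] = int(chunk or 0)
        elif field.get("type") == "float":
            record[field["name"]] = float(chunk or 0.0)
        else:
            record[field["name"]] = chunk
        cursor += field["length"]
    return record

fields = [
    {"name": "customer_id", "length": 3, "type": "int"},
    {"name": "full_name", "length": 10, "type": "string"},
    {"name": "balance", "length": 8, "type": "float"},
]
print(parse_line("001JOHN DOE  00500.00", fields))
# → {'customer_id': 1, 'full_name': 'JOHN DOE', 'balance': 500.0}
```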
## 🐳 Step 2: Dockerfile

This configuration ensures the server is production-ready and reachable from outside the container.
```dockerfile
# Use a slim Python base
FROM python:3.11-slim

WORKDIR /app

# Install FastMCP and Uvicorn
RUN pip install fastmcp uvicorn

# Copy the server code
COPY server.py .

# Expose port 8000 for Railway/localhost access
EXPOSE 8000

# Run the server
CMD ["python", "server.py"]
```
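Assuming Docker is installed, the image can be built and run with the usual commands (the `cobol-parser` tag is an arbitrary choice, not something the server depends on):

```shell
# Build the image from the directory containing server.py and the Dockerfile
docker build -t cobol-parser .

# Run it, mapping container port 8000 to the host
docker run --rm -p 8000:8000 cobol-parser

# The SSE endpoint should then be reachable at http://localhost:8000/sse
```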
## 🔌 Step 3: Client Connectivity

To consume this MCP server, we define an agent capable of connecting to the SSE stream. Below is a standard configuration pattern (using CrewAI syntax as a reference implementation) that demonstrates how to mount the MCP server.
```python
from crewai import Agent, Task, Crew

# Note: In a pure Semantic Kernel setup, you would wrap the SSE client
# in a custom Plugin, but the MCP pattern remains the same.

# 1. Define the legacy data (COBOL format)
# Structure: ID(3) | Name(10) | Balance(8)
raw_cobol_data = (
    "001JOHN DOE  00500.00\n"
    "002JANE SMITH01200.50\n"
    "003BOB JONES 00000.00"
)

# 2. Define the schema (copybook)
copybook_json = """[
    {"name": "customer_id", "length": 3, "type": "int"},
    {"name": "full_name", "length": 10, "type": "string"},
    {"name": "balance", "length": 8, "type": "float"}
]"""

# 3. Configure the agent with the MCP server
# The 'mcps' argument tells the agent where to find the tools.
cobol_agent = Agent(
    role="Legacy Data Translator",
    goal="Parse raw Mainframe flat files into usable JSON",
    backstory="You are an expert system capable of reading 1980s-era data files.",
    # MANDATORY: Point to the Docker container or localhost
    mcps=["http://localhost:8000/sse"],
    verbose=True
)

# 4. Define the task
parsing_task = Task(
    description=f"""
    Parse the following COBOL flat file data:
    '{raw_cobol_data}'

    Use the schema:
    '{copybook_json}'

    Return the total balance of all customers.
    """,
    agent=cobol_agent,
    expected_output="A summary of the total balance calculated from the parsed data."
)

# 5. Run the crew
if __name__ == "__main__":
    crew = Crew(agents=[cobol_agent], tasks=[parsing_task])
    result = crew.kickoff()
    print("\n\n########################")
    print("## Final Result: ##")
    print(result)
    print("########################\n")
```
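For a quick hand-check of what the agent should report: the three sample records carry balances of 500.00, 1200.50, and 0.00, so the expected total is 1700.50. Slicing the last eight bytes of each 21-byte record in plain Python confirms the arithmetic:

```python
records = [
    "001JOHN DOE  00500.00",
    "002JANE SMITH01200.50",
    "003BOB JONES 00000.00",
]

# Per the copybook, balance occupies bytes 13..21 of each record
total = sum(float(line[13:21]) for line in records)
print(total)  # → 1700.5
```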
## 🧠 Why this matters for AgentRetrofit

By standardizing on MCP, you enable a "write once, run anywhere" architecture for legacy integration. You don't need to write a Semantic Kernel plugin, a LangChain tool, and a CrewAI tool separately. You simply deploy this Dockerized MCP server, and any agent framework that supports MCP (via `mcps=[...]` or `ClientSession`) can instantly read your mainframe data.
## 🛡️ Quality Assurance
- Status: ✅ Verified
- Environment: Python 3.11
- Auditor: AgentRetrofit CI/CD