
LangGraph-driven parsing of COBOL Flat Files (Python)


One of the most stubborn artifacts of the “Big Iron” era is the COBOL Flat File. These fixed-width text files (often dumped from Mainframes via FTP) power a shocking amount of the global banking and insurance infrastructure.

Unlike JSON or CSV, these files have no delimiters. Data is defined purely by character position. To an AI Agent, a line like 00125JOHN DOE 00050099 is hallucination fuel. It needs a rigid schema to understand that 00125 is an ID, JOHN DOE is a Name, and 00050099 is a currency value ($500.99).
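In plain Python, that positional decoding is just string slicing. A quick sketch for the sample line above (the offsets and the two implied decimal places are illustrative assumptions, not a real copybook):

```python
line = "00125JOHN DOE 00050099"

customer_id = line[0:5]    # "00125"
name = line[5:14].strip()  # "JOHN DOE"
# Currency fields on mainframes often carry two implied decimal places
amount = int(line[14:22]) / 100  # 500.99

print(customer_id, name, amount)
```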

This guide builds a FastMCP server that acts as a translation layer. It allows a LangGraph agent to upload raw flat-file data and a schema definition, receiving structured JSON in return.


We will use the Model Context Protocol (MCP) to expose a Python-based parser. The LangGraph agent will consume this via Server-Sent Events (SSE).

This server exposes a tool called parse_fixed_width. It uses Python’s robust string slicing capabilities to transform legacy data into clean dictionaries.

from typing import Any, Dict, List

from mcp.server.fastmcp import FastMCP

# Initialize FastMCP.
# MANDATORY: Bind to 0.0.0.0 for Docker compatibility.
# Note: host/port are FastMCP settings, not arguments to run().
mcp = FastMCP("CobolParser", host="0.0.0.0", port=8000)

@mcp.tool()
def parse_fixed_width(raw_data: str, layout: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """
    Parses COBOL-style fixed-width flat file data into JSON based on a provided layout.

    Args:
        raw_data: The content of the flat file as a single string (lines separated by newlines).
        layout: A list of dictionaries defining the schema.
            Each dict must have: "field" (name), "start" (0-based index), "length" (int).
            Optional: "type" ("string", "int", "float").

    Returns:
        A list of dictionaries representing the parsed records.
    """
    parsed_records = []
    lines = raw_data.strip().split('\n')
    for line in lines:
        if not line.strip():
            continue
        record = {}
        for field_def in layout:
            name = field_def.get("field")
            start = field_def.get("start")
            length = field_def.get("length")
            dtype = field_def.get("type", "string")
            # Safety check: the line is too short to contain this field
            if len(line) <= start:
                val = None
            else:
                # Extract the raw substring for this field
                val_str = line[start : start + length]
                # Type conversion
                try:
                    if dtype == "int":
                        val = int(val_str)
                    elif dtype == "float":
                        # Mainframes often imply decimals, e.g. "00500" may mean 5.00.
                        # Here we assume standard float parsing for simplicity.
                        val = float(val_str)
                    else:
                        val = val_str.strip()
                except ValueError:
                    val = val_str.strip()  # Fall back to string on error
            record[name] = val
        parsed_records.append(record)
    return parsed_records

@mcp.tool()
def get_common_layouts(schema_name: str) -> List[Dict[str, Any]]:
    """
    Retrieves pre-defined COBOL copybook layouts for common legacy types.
    Useful if the Agent doesn't want to define the schema manually.
    """
    layouts = {
        "SAP_CUSTOMER_EXPORT": [
            {"field": "CUST_ID", "start": 0, "length": 10, "type": "string"},
            {"field": "NAME", "start": 10, "length": 30, "type": "string"},
            {"field": "LIMIT", "start": 40, "length": 12, "type": "float"},
        ],
        "MAINFRAME_TRANSACTION": [
            {"field": "TXN_ID", "start": 0, "length": 16, "type": "string"},
            {"field": "DATE", "start": 16, "length": 8, "type": "int"},  # YYYYMMDD
            {"field": "AMOUNT", "start": 24, "length": 10, "type": "float"},
        ],
    }
    return layouts.get(schema_name, [])

if __name__ == "__main__":
    mcp.run(transport='sse')
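The float branch above assumes the decimal point is written out. Real copybooks frequently use implied decimals (e.g. PIC 9(5)V99, where "0050099" means 500.99). A minimal sketch of a converter you could wire into the tool; the helper name and the scale parameter are assumptions, not part of the server above:

```python
def parse_implied_decimal(val_str: str, scale: int) -> float:
    """Interpret a digits-only COBOL numeric field with `scale` implied
    decimal places, e.g. PIC 9(5)V99 -> scale=2."""
    return int(val_str) / (10 ** scale)

# "0050099" under PIC 9(5)V99 is 500.99
print(parse_implied_decimal("0050099", 2))  # 500.99
```

For money, a production version would return `decimal.Decimal` instead of `float` to avoid rounding surprises.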

We need a lightweight Python container that exposes port 8000.

# Use a slim Python base
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Install MCP dependencies
RUN pip install "mcp[cli]" uvicorn
# Copy server code
COPY server.py .
# MANDATORY: Expose the port for Railway/Docker networking
EXPOSE 8000
# Run the server
CMD ["python", "server.py"]
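Assuming the server file sits next to this Dockerfile as server.py, a typical build-and-run sequence looks like this (the image and container names are arbitrary):

```shell
# Build the image from the Dockerfile in the current directory
docker build -t cobol-parser-mcp .

# Run it, publishing port 8000 to the host
docker run --rm -p 8000:8000 --name cobol-parser cobol-parser-mcp
```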

This client script demonstrates a true LangGraph workflow. It establishes the connection to the MCP server, wraps the MCP tools into LangChain-compatible tools, and executes a graph that can reason about which layout to use.

import asyncio
from typing import Annotated

from typing_extensions import TypedDict
from langchain_openai import ChatOpenAI
from langchain_core.tools import StructuredTool
from langchain_core.messages import HumanMessage
from langgraph.graph import StateGraph, START
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition
from mcp import ClientSession
from mcp.client.sse import sse_client

# --- Configuration ---
# The list of MCP servers we want to connect to.
mcps = ["http://localhost:8000/sse"]

# --- State Definition ---
class AgentState(TypedDict):
    messages: Annotated[list, add_messages]

# --- Helper: Convert MCP Tools to LangChain Tools ---
def mcp_tool_to_langchain(mcp_tool, session):
    """
    Wraps an MCP tool definition into a LangChain StructuredTool.
    """
    async def wrapped_func(**kwargs):
        result = await session.call_tool(mcp_tool.name, arguments=kwargs)
        # Extract the text content from the result
        return result.content[0].text

    # Basic schema mapping (simplified for this guide)
    return StructuredTool.from_function(
        func=None,
        coroutine=wrapped_func,
        name=mcp_tool.name,
        description=mcp_tool.description or "No description provided",
        # In production, you would map mcp_tool.inputSchema to a Pydantic args_schema
    )

async def run_langgraph_agent():
    # Connect to the first MCP server in our configuration list
    server_url = mcps[0]
    print(f"🔌 Connecting to MCP Server: {server_url}")
    async with sse_client(server_url) as streams:
        async with ClientSession(streams[0], streams[1]) as session:
            # The MCP handshake must complete before listing or calling tools
            await session.initialize()

            # 1. Initialize tools: fetch the available tools from the MCP server
            mcp_tools_list = await session.list_tools()
            langchain_tools = []
            for tool in mcp_tools_list.tools:
                lc_tool = mcp_tool_to_langchain(tool, session)
                langchain_tools.append(lc_tool)
                print(f" -> Loaded Tool: {tool.name}")

            # 2. Define the graph nodes
            llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
            llm_with_tools = llm.bind_tools(langchain_tools)

            def agent_node(state: AgentState):
                return {"messages": [llm_with_tools.invoke(state["messages"])]}

            # 3. Build the graph
            workflow = StateGraph(AgentState)
            workflow.add_node("agent", agent_node)
            workflow.add_node("tools", ToolNode(langchain_tools))
            workflow.add_edge(START, "agent")
            workflow.add_conditional_edges("agent", tools_condition)
            workflow.add_edge("tools", "agent")
            app = workflow.compile()

            # 4. Execute the workflow
            # We provide raw COBOL data and ask the agent to parse it.
            # The agent should figure out it needs a layout first.
            # (Widths match SAP_CUSTOMER_EXPORT: CUST_ID 10, NAME 30, LIMIT 12.)
            raw_data = (
                "10005     Acme Corp                     000005000.00\n"
                "10006     Globex Inc                    000012500.50"
            )
            user_prompt = (
                f"I have this raw COBOL data:\n{raw_data}\n\n"
                "It matches the 'SAP_CUSTOMER_EXPORT' schema. "
                "Please parse this into JSON for me."
            )
            print("\n🤖 Agent Running...")
            async for event in app.astream({"messages": [HumanMessage(content=user_prompt)]}):
                for key, value in event.items():
                    if key == "agent":
                        print(f" [Agent]: {value['messages'][0].content}")
                    elif key == "tools":
                        # Tool output arrives as ToolMessage objects
                        print(" [Tools]: (Executed tool call)")

if __name__ == "__main__":
    asyncio.run(run_langgraph_agent())
When you run the client, the expected flow is:

  1. Agent connects to the Dockerized MCP server.
  2. Graph starts: The Agent receives the raw data.
  3. Reasoning: The Agent realizes it has parse_fixed_width but needs a layout. It sees get_common_layouts.
  4. Tool Call 1: Calls get_common_layouts(schema_name="SAP_CUSTOMER_EXPORT").
  5. Tool Output: Receives the JSON schema definition.
  6. Tool Call 2: Calls parse_fixed_width with the raw data and the retrieved schema.
  7. Final Answer: Outputs the parsed JSON array.
[
  {
    "CUST_ID": "10005",
    "NAME": "Acme Corp",
    "LIMIT": 5000.0
  },
  {
    "CUST_ID": "10006",
    "NAME": "Globex Inc",
    "LIMIT": 12500.5
  }
]
  1. Docker Connection Refused: Ensure mcp.run(..., host='0.0.0.0') is set in server.py. If client.py is running on your host machine, localhost:8000 is correct. If client.py is in another container, use the container name (e.g., http://server:8000/sse).
  2. “Tool input not valid”: LangGraph/LangChain can sometimes be strict about arguments. Ensure the mcp_tool_to_langchain wrapper properly handles the arguments passed by the LLM.
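One way to make the wrapper stricter is to derive argument types from the tool's inputSchema before handing it to the LLM. A minimal stdlib-only sketch (the helper name schema_to_annotations is an assumption; a fuller version would build a Pydantic args_schema from this mapping):

```python
from typing import Any, Dict, Type

# Map JSON Schema primitive types to Python annotations
JSON_TO_PY: Dict[str, Type[Any]] = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
    "array": list,
    "object": dict,
}

def schema_to_annotations(input_schema: Dict[str, Any]) -> Dict[str, Type[Any]]:
    """Turn an MCP tool's JSON inputSchema into {arg_name: python_type}."""
    props = input_schema.get("properties", {})
    return {
        name: JSON_TO_PY.get(spec.get("type", "string"), str)
        for name, spec in props.items()
    }

# Example: a schema like the one FastMCP generates for parse_fixed_width
print(schema_to_annotations(
    {"properties": {"raw_data": {"type": "string"}, "layout": {"type": "array"}}}
))
```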

  • Status: ✅ Verified
  • Environment: Python 3.11
  • Auditor: AgentRetrofit CI/CD
