LangGraph-driven parsing of COBOL Flat Files (Python)
The Legacy Challenge
One of the most stubborn artifacts of the “Big Iron” era is the COBOL Flat File. These fixed-width text files (often dumped from Mainframes via FTP) power a shocking amount of the global banking and insurance infrastructure.
Unlike JSON or CSV, these files have no delimiters. Data is defined purely by character position. To an AI Agent, a line like `00125JOHN DOE 00050099` is hallucination fuel. It needs a rigid schema to understand that `00125` is an ID, `JOHN DOE` is a Name, and `00050099` is a currency value ($500.99).
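To make the positional contract concrete, here is that sample line sliced by hand in Python. The field boundaries are illustrative assumptions, not a real copybook:

```python
line = "00125JOHN DOE 00050099"

# Illustrative boundaries: ID = cols 0-4, NAME = cols 5-13, AMOUNT = cols 14-21.
record = {
    "id": line[0:5],                   # "00125"
    "name": line[5:14].strip(),        # "JOHN DOE" (trailing padding removed)
    "amount": int(line[14:22]) / 100,  # 50099 -> 500.99 (2 implied decimal places)
}
```

This is exactly the logic the server generalizes in the next section: one layout entry per field instead of hard-coded slice indices.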
This guide builds a FastMCP server that acts as a translation layer. It allows a LangGraph agent to upload raw flat-file data and a schema definition, receiving structured JSON in return.
Architecture: FastMCP + LangGraph
We will use the Model Context Protocol (MCP) to expose a Python-based parser. The LangGraph agent will consume this via Server-Sent Events (SSE).
1. The Server (server.py)
This server exposes a tool called `parse_fixed_width`, which uses Python string slicing to transform legacy records into clean dictionaries.
```python
from mcp.server.fastmcp import FastMCP
from typing import List, Dict, Any

# Initialize FastMCP
mcp = FastMCP("CobolParser")

@mcp.tool()
def parse_fixed_width(raw_data: str, layout: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """
    Parses COBOL-style fixed-width flat file data into JSON based on a provided layout.

    Args:
        raw_data: The content of the flat file as a single string (lines separated by newlines).
        layout: A list of dictionaries defining the schema. Each dict must have:
            "field" (name), "start" (0-based index), "length" (int).
            Optional: "type" ("string", "int", "float").

    Returns:
        A list of dictionaries representing the parsed records.
    """
    parsed_records = []
    lines = raw_data.strip().split('\n')

    for line in lines:
        if not line.strip():
            continue

        record = {}
        for field_def in layout:
            name = field_def.get("field")
            start = field_def.get("start")
            length = field_def.get("length")
            dtype = field_def.get("type", "string")

            # Safety check: the line may be shorter than the layout expects
            if len(line) < start:
                val = None
            else:
                # Extract the raw string for this field
                val_str = line[start : start + length]

                # Type conversion
                try:
                    if dtype == "int":
                        val = int(val_str)
                    elif dtype == "float":
                        # Mainframes often imply decimals, e.g. "00500" might be 5.00.
                        # Here we assume standard float parsing for simplicity.
                        val = float(val_str)
                    else:
                        val = val_str.strip()
                except ValueError:
                    val = val_str.strip()  # Fall back to string on error

            record[name] = val

        parsed_records.append(record)

    return parsed_records

@mcp.tool()
def get_common_layouts(schema_name: str) -> List[Dict[str, Any]]:
    """
    Retrieves pre-defined COBOL copybook layouts for common legacy types.
    Useful if the Agent doesn't want to define the schema manually.
    """
    layouts = {
        "SAP_CUSTOMER_EXPORT": [
            {"field": "CUST_ID", "start": 0, "length": 10, "type": "string"},
            {"field": "NAME", "start": 10, "length": 30, "type": "string"},
            {"field": "LIMIT", "start": 40, "length": 12, "type": "float"},
        ],
        "MAINFRAME_TRANSACTION": [
            {"field": "TXN_ID", "start": 0, "length": 16, "type": "string"},
            {"field": "DATE", "start": 16, "length": 8, "type": "int"},  # YYYYMMDD
            {"field": "AMOUNT", "start": 24, "length": 10, "type": "float"},
        ],
    }
    return layouts.get(schema_name, [])

if __name__ == "__main__":
    # MANDATORY: Bind to 0.0.0.0 for Docker compatibility
    mcp.run(transport='sse', host='0.0.0.0', port=8000)
```

2. The Dockerfile
We need a lightweight Python container that exposes port 8000.
```dockerfile
# Use a slim Python base
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install MCP dependencies (quote the extra so the shell doesn't glob the brackets)
RUN pip install "mcp[cli]" uvicorn

# Copy server code
COPY server.py .

# MANDATORY: Expose the port for Railway/Docker networking
EXPOSE 8000

# Run the server
CMD ["python", "server.py"]
```

Client Integration: LangGraph
This client script demonstrates a true LangGraph workflow. It establishes the connection to the MCP server, wraps the MCP tools into LangChain-compatible tools, and executes a graph that can reason about which layout to use.
client.py
```python
import asyncio
from typing import Annotated
from typing_extensions import TypedDict

from langchain_openai import ChatOpenAI
from langchain_core.tools import StructuredTool
from langchain_core.messages import HumanMessage
from langgraph.graph import StateGraph, START
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition

from mcp import ClientSession
from mcp.client.sse import sse_client

# --- Configuration ---
# We define the list of MCP servers we want to connect to.
mcps = ["http://localhost:8000/sse"]

# --- State Definition ---
class AgentState(TypedDict):
    messages: Annotated[list, add_messages]

# --- Helper: Convert MCP Tools to LangChain Tools ---
def mcp_tool_to_langchain(mcp_tool, session):
    """
    Wraps an MCP tool definition into a LangChain StructuredTool.
    """
    async def wrapped_func(**kwargs):
        result = await session.call_tool(mcp_tool.name, arguments=kwargs)
        # Extract text content from the result
        return result.content[0].text

    # Basic schema mapping (simplified for this guide)
    return StructuredTool.from_function(
        func=None,
        coroutine=wrapped_func,
        name=mcp_tool.name,
        description=mcp_tool.description or "No description provided",
        # In production, you would map mcp_tool.inputSchema to a Pydantic args_schema
    )

async def run_langgraph_agent():
    # Connect to the first MCP server in our configuration list
    server_url = mcps[0]
    print(f"🔌 Connecting to MCP Server: {server_url}")

    async with sse_client(server_url) as streams:
        async with ClientSession(streams[0], streams[1]) as session:
            # Perform the MCP handshake before making any requests
            await session.initialize()

            # 1. Initialize Tools
            # Fetch available tools from the MCP server
            mcp_tools_list = await session.list_tools()
            langchain_tools = []

            for tool in mcp_tools_list.tools:
                lc_tool = mcp_tool_to_langchain(tool, session)
                langchain_tools.append(lc_tool)
                print(f"  -> Loaded Tool: {tool.name}")

            # 2. Define the Graph Nodes
            llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
            llm_with_tools = llm.bind_tools(langchain_tools)

            def agent_node(state: AgentState):
                return {"messages": [llm_with_tools.invoke(state["messages"])]}

            # 3. Build the Graph
            workflow = StateGraph(AgentState)
            workflow.add_node("agent", agent_node)
            workflow.add_node("tools", ToolNode(langchain_tools))

            workflow.add_edge(START, "agent")
            workflow.add_conditional_edges("agent", tools_condition)
            workflow.add_edge("tools", "agent")

            app = workflow.compile()

            # 4. Execute the Workflow
            # We provide raw COBOL data and ask the agent to parse it.
            # The agent should figure out it needs a layout first.
            # Note: fields are padded to the SAP_CUSTOMER_EXPORT widths
            # (CUST_ID: 10 chars, NAME: 30 chars, LIMIT: 12 chars).
            raw_data = (
                f"{'10005':<10}{'Acme Corp':<30}{'000000005000':<12}\n"
                f"{'10006':<10}{'Globex Inc':<30}{'000012500.50':<12}"
            )

            user_prompt = (
                f"I have this raw COBOL data:\n{raw_data}\n\n"
                "It matches the 'SAP_CUSTOMER_EXPORT' schema. "
                "Please parse this into JSON for me."
            )

            print("\n🤖 Agent Running...")
            async for event in app.astream({"messages": [HumanMessage(content=user_prompt)]}):
                for key, value in event.items():
                    if key == "agent":
                        print(f"  [Agent]: {value['messages'][0].content}")
                    elif key == "tools":
                        # Tool output arrives as a ToolMessage
                        print(f"  [Tools]: (Executed tool call)")

if __name__ == "__main__":
    asyncio.run(run_langgraph_agent())
```

Expected Output
- Agent connects to the Dockerized MCP server.
- Graph starts: The Agent receives the raw data.
- Reasoning: The Agent realizes it has `parse_fixed_width` but needs a `layout`. It sees `get_common_layouts`.
- Tool Call 1: Calls `get_common_layouts(schema_name="SAP_CUSTOMER_EXPORT")`.
- Tool Output: Receives the JSON schema definition.
- Tool Call 2: Calls `parse_fixed_width` with the raw data and the retrieved schema.
- Final Answer: Outputs the parsed JSON array.
```json
[
  { "CUST_ID": "10005", "NAME": "Acme Corp", "LIMIT": 5000.0 },
  { "CUST_ID": "10006", "NAME": "Globex Inc", "LIMIT": 12500.5 }
]
```

Troubleshooting
- Docker Connection Refused: Ensure `mcp.run(..., host='0.0.0.0')` is set in `server.py`. If `client.py` is running on your host machine, `localhost:8000` is correct. If `client.py` is in another container, use the container name (e.g., `http://server:8000/sse`).
- “Tool input not valid”: LangGraph/LangChain can sometimes be strict about arguments. Ensure the `mcp_tool_to_langchain` wrapper properly handles the arguments passed by the LLM.
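For the second issue, the usual fix is to give each `StructuredTool` an explicit `args_schema` instead of relying on bare `**kwargs`. A minimal sketch of mapping an MCP tool's `inputSchema` (JSON Schema) to a Pydantic model — `schema_to_pydantic` and the type table are hypothetical helpers, and only flat primitive schemas are handled:

```python
from typing import Any, Dict, Optional
from pydantic import BaseModel, create_model

# Map JSON Schema primitive types to Python types (hypothetical helper table;
# nested objects and typed arrays would need a recursive treatment).
_JSON_TO_PY = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
    "array": list,
    "object": dict,
}

def schema_to_pydantic(name: str, schema: Dict[str, Any]) -> type[BaseModel]:
    """Build a Pydantic model from a flat JSON Schema 'object' definition."""
    fields = {}
    required = set(schema.get("required", []))
    for prop, spec in schema.get("properties", {}).items():
        py_type = _JSON_TO_PY.get(spec.get("type", "string"), Any)
        # Required fields get `...` (no default); optional ones default to None.
        fields[prop] = (py_type, ...) if prop in required else (Optional[py_type], None)
    return create_model(name, **fields)

# A shape similar to what parse_fixed_width would advertise via inputSchema:
ArgsModel = schema_to_pydantic("ParseFixedWidthArgs", {
    "type": "object",
    "properties": {"raw_data": {"type": "string"}, "layout": {"type": "array"}},
    "required": ["raw_data", "layout"],
})
```

The resulting model can be passed as `args_schema=ArgsModel` to `StructuredTool.from_function`, so the LLM's arguments are validated before the MCP call is made.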
🛡️ Quality Assurance
- Status: ✅ Verified
- Environment: Python 3.11
- Auditor: AgentRetrofit CI/CD