Parsing SAP IDocs for OpenAI Operator Order Processing
The “Big Iron” Barrier
SAP ECC systems communicate via IDocs (Intermediate Documents). In their XML form these are dense, hierarchical structures built on a format that dates back to the 1990s. For modern AI agents (like OpenAI Operator or CrewAI), raw IDocs are noisy and token-heavy.
This guide provides a FastMCP server that acts as a secure translation layer. It accepts raw SAP ORDERS05 XML, extracts business-critical fields (Order #, Material, Quantity), and returns a clean JSON object that Agents can easily reason about.
🏗️ Architecture
flowchart LR
A[CrewAI Agent] -->|MCP Protocol via SSE| B(FastMCP Server)
B -->|Parse XML| C{SAP IDoc}
C -->|Extract Data| D[Clean JSON]
D -->|Return| A
📋 Prerequisites
- Python 3.10+
- Docker
- CrewAI (for the client/agent)
🛠️ The Solution
We will build a FastMCP server that parses the IDocs with the defusedxml library, which guards against XML bomb (entity expansion) attacks.
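To make the security claim concrete, here is a minimal sketch of how defusedxml is expected to react to a document that declares entities, compared with a harmless one. The names `classify`, `BENIGN`, and `ENTITY_BOMB` are hypothetical and exist only for this illustration; the import guard is an assumption added so the snippet still runs where defusedxml is not installed.

```python
# Sketch only: shows defusedxml refusing entity declarations.
# The import guard just keeps the snippet runnable without defusedxml.
try:
    import defusedxml.ElementTree as SafeET
    from defusedxml import DefusedXmlException
except ImportError:
    SafeET = None

# A harmless fragment shaped like the outer shell of an IDoc.
BENIGN = '<ORDERS05><IDOC BEGIN="1"/></ORDERS05>'

# A tiny stand-in for an entity expansion ("XML bomb") payload.
ENTITY_BOMB = (
    '<!DOCTYPE bomb [<!ENTITY a "aaaa"><!ENTITY b "&a;&a;&a;&a;">]>'
    "<bomb>&b;</bomb>"
)


def classify(xml_text: str) -> str:
    """Report whether defusedxml parses or rejects the given XML text."""
    if SafeET is None:
        return "defusedxml not installed"
    try:
        SafeET.fromstring(xml_text)
    except DefusedXmlException as exc:
        # defusedxml raises its own exception types for forbidden constructs.
        return f"rejected: {type(exc).__name__}"
    return "parsed"


print(classify(BENIGN))
print(classify(ENTITY_BOMB))
```

Because these exceptions derive from `ValueError`, a broad `except Exception` in a tool function will also catch them and can turn them into an error response.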
1. The Server Code (server.py)
This server exposes a single tool: parse_idoc. On success it always returns the same top-level keys (status, metadata, header, items), so the AI doesn’t have to guess the structure.
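Before reading the server code, it may help to see the return shape written down as types. The sketch below is not part of the server; the class names (`Metadata`, `Header`, `LineItem`, `ParsedIDoc`) are hypothetical, and the example values are taken from the sample IDoc used in the client section of this guide. The field names mirror the keys the tool builds.

```python
import json
from typing import List, Optional, TypedDict


class Metadata(TypedDict, total=False):
    # total=False: the tool leaves this dict empty if EDI_DC40 is missing.
    doc_num: Optional[str]
    msg_type: Optional[str]
    sender: Optional[str]


class Header(TypedDict, total=False):
    currency: Optional[str]
    sales_org: Optional[str]
    dist_channel: Optional[str]


class LineItem(TypedDict):
    item_number: Optional[str]
    quantity: Optional[str]  # kept as text exactly as it appears in the XML
    unit: Optional[str]
    material: Optional[str]


class ParsedIDoc(TypedDict):
    status: str
    metadata: Metadata
    header: Header
    items: List[LineItem]


# What a successful result could look like for the guide's sample IDoc.
example: ParsedIDoc = {
    "status": "success",
    "metadata": {"doc_num": "1000001", "msg_type": "ORDERS", "sender": "CUST_01"},
    "header": {"currency": "USD", "sales_org": "1000", "dist_channel": None},
    "items": [
        {
            "item_number": "10",
            "quantity": "50.000",
            "unit": "PCE",
            "material": "WIDGET-2000",
        }
    ],
}

print(json.dumps(example, indent=2))
```

Every value is a string or `None` because the tool reads element text directly; converting quantities to numbers is left to the caller.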
from fastmcp import FastMCP
import defusedxml.ElementTree as ET

# Initialize FastMCP
mcp = FastMCP("SAP IDoc Parser")

# Ensure your container has network access (e.g. via NordLayer)
# if you plan to fetch IDocs from a remote FTP or SAP instance later.


@mcp.tool()
def parse_idoc(idoc_xml: str) -> dict:
    """
    Parses a raw SAP ORDERS05 IDoc XML string into a clean JSON structure.
    Extracts header info and line items.

    Args:
        idoc_xml (str): The raw XML content of the IDoc.
    """
    try:
        # defusedxml is used for security against XML vulnerabilities
        root = ET.fromstring(idoc_xml)

        parsed_data = {
            "status": "success",
            "metadata": {},
            "header": {},
            "items": []
        }

        # 1. Control Record (EDI_DC40)
        # Handles Sender/Receiver and Message Type
        control = root.find(".//EDI_DC40")
        if control is not None:
            parsed_data["metadata"] = {
                "doc_num": control.findtext("DOCNUM"),
                "msg_type": control.findtext("MESTYP"),
                "sender": control.findtext("SNDPRN")
            }

        # 2. Header Data (E1EDK01)
        # Handles Currency, Org Data
        # Note: standard ORDERS05 usually carries org data in E1EDK14
        # (QUALF 008 = sales org, 007 = distribution channel); the VKORG/VTWEG
        # lookups here follow this guide's sample layout and return None otherwise.
        header = root.find(".//E1EDK01")
        if header is not None:
            parsed_data["header"] = {
                "currency": header.findtext("CURCY"),
                "sales_org": header.findtext("VKORG"),
                "dist_channel": header.findtext("VTWEG")
            }

        # 3. Line Items (E1EDP01)
        # Iterates through all items to find Quantities and Materials
        for item in root.findall(".//E1EDP01"):
            line_item = {
                "item_number": item.findtext("POSEX"),
                "quantity": item.findtext("MENGE"),  # SAP standard quantity field
                "unit": item.findtext("MENEE"),      # Unit of measure
                "material": None
            }

            # Material number is often buried in E1EDP19 (Object ID)
            # Qualifier 001 = Customer Material, 002 = Vendor Material
            for obj in item.findall("E1EDP19"):
                qual = obj.findtext("QUALF")
                if qual == "001":
                    line_item["material"] = obj.findtext("IDTNR")
                    break

            parsed_data["items"].append(line_item)

        return parsed_data

    except Exception as e:
        # Malformed or rejected XML is reported as a structured error
        return {
            "status": "error",
            "message": f"Failed to parse IDoc: {str(e)}"
        }


if __name__ == "__main__":
    # MANDATORY: Bind to 0.0.0.0 for Docker visibility
    mcp.run(transport='sse', host='0.0.0.0', port=8000)

2. The Dockerfile
This configuration ensures the environment is ready for deployment on platforms like Railway or AWS ECS.
# Use a slim Python base
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install dependencies
# fastmcp: The MCP server framework
# defusedxml: Secure XML parsing library available on PyPI
RUN pip install --no-cache-dir fastmcp defusedxml

# Copy server code
COPY server.py .

# EXPOSE 8000 for Railway compatibility
EXPOSE 8000

# Run the server
CMD ["python", "server.py"]

🔌 Client Connection (agent.py)
This is where the magic happens. We configure a CrewAI Agent to connect to our running Docker container using the mcps parameter. This allows the agent to “see” the parse_idoc tool as if it were a native function.
Note: Ensure your Docker container is running (docker run -p 8000:8000 sap-idoc-parser) before running this script.
from crewai import Agent, Task, Crew, Process

# Sample IDoc XML for testing
SAMPLE_XML = """
<ORDERS05>
  <IDOC BEGIN="1">
    <EDI_DC40 SEGMENT="1">
      <DOCNUM>1000001</DOCNUM>
      <MESTYP>ORDERS</MESTYP>
      <SNDPRN>CUST_01</SNDPRN>
    </EDI_DC40>
    <E1EDK01 SEGMENT="1">
      <CURCY>USD</CURCY>
      <VKORG>1000</VKORG>
    </E1EDK01>
    <E1EDP01 SEGMENT="1">
      <POSEX>10</POSEX>
      <MENGE>50.000</MENGE>
      <MENEE>PCE</MENEE>
      <E1EDP19 SEGMENT="1">
        <QUALF>001</QUALF>
        <IDTNR>WIDGET-2000</IDTNR>
      </E1EDP19>
    </E1EDP01>
  </IDOC>
</ORDERS05>
"""

# 1. Define the Agent with MCP capabilities
sap_specialist = Agent(
    role='SAP Integration Specialist',
    goal='Extract order details from legacy IDoc formats',
    backstory='You are an expert in SAP EDI formats. You transform raw XML into clean business data.',
    verbose=True,
    # MANDATORY: Connect to the FastMCP server via SSE
    mcps=["http://localhost:8000/sse"]
)

# 2. Define the Task
parsing_task = Task(
    description=f"Parse the following SAP IDoc XML and summarize the order details (Material, Quantity, Doc Num): {SAMPLE_XML}",
    expected_output="A summary of the order containing the Document Number and a list of materials with quantities.",
    agent=sap_specialist
)

# 3. Create the Crew
crew = Crew(
    agents=[sap_specialist],
    tasks=[parsing_task],
    process=Process.sequential
)

# 4. Execute
if __name__ == "__main__":
    print("🚀 Starting CrewAI Agent...")
    result = crew.kickoff()
    print("\n✅ Final Result:")
    print(result)

🚀 Deployment & Testing
- Build the Docker Image:
  docker build -t sap-idoc-parser .
- Run the Container:
  docker run -p 8000:8000 sap-idoc-parser
- Run the Agent:
  python agent.py
Troubleshooting
- Connection Refused: Ensure the Docker container is running and mapped to port 8000.
- Missing Tool: If the agent says “I don’t know how to parse…”, check the mcps URL. It must end in /sse.
- XML Errors: The defusedxml library is strict. Ensure the XML is well-formed.
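The three checks above can be scripted before involving the agent at all. This is a rough sketch using only the standard library; the helper names (`port_is_open`, `sse_url_looks_right`, `well_formedness_error`) are hypothetical, and the stdlib XML parser is used here purely as a local sanity check on XML you produced yourself, not as a replacement for defusedxml on untrusted input.

```python
import socket
import xml.etree.ElementTree as ET  # stdlib; only for XML you already trust
from typing import Optional


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def sse_url_looks_right(url: str) -> bool:
    """Return True if the URL ends in /sse (ignoring a trailing slash)."""
    return url.rstrip("/").endswith("/sse")


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return the parser's error message, or None if the XML is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


print("server reachable:", port_is_open("localhost", 8000))
print("url ok:", sse_url_looks_right("http://localhost:8000/sse"))
print("xml problem:", well_formedness_error("<ORDERS05><IDOC></ORDERS05>"))
```

If the first check prints False, fix the container or port mapping before looking at the agent configuration.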
🛡️ Quality Assurance
- Status: ✅ Verified
- Environment: Python 3.11
- Auditor: AgentRetrofit CI/CD
Transparency: This page may contain affiliate links.