
Parsing SAP IDocs for OpenAI Operator Order Processing

SAP ECC systems exchange business documents as IDocs (Intermediate Documents). In their XML form, these are dense, hierarchical structures whose segment layouts often date back to the 1990s. For modern AI agents (such as OpenAI Operator or CrewAI agents), raw IDocs are noisy and token-heavy.

This guide provides a FastMCP server that acts as a secure translation layer. It accepts raw SAP ORDERS05 XML, extracts business-critical fields (Order #, Material, Quantity), and returns a clean JSON object that Agents can easily reason about.
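
To make the target concrete, the sketch below shows the kind of object such a layer might return for a single-line order. The key names mirror the parse_idoc tool defined later in this guide; the values are illustrative only.

```python
import json

# Illustrative result for a one-item order; values are made up for the example.
clean_order = {
    "status": "success",
    "metadata": {"doc_num": "1000001", "msg_type": "ORDERS", "sender": "CUST_01"},
    "header": {"currency": "USD", "sales_org": "1000", "dist_channel": None},
    "items": [
        {
            "item_number": "10",
            "quantity": "50.000",
            "unit": "PCE",
            "material": "WIDGET-2000",
        }
    ],
}

# Serialize to JSON text for inspection.
clean_json = json.dumps(clean_order, indent=2)
print(clean_json)
```

Compared with the raw XML, every field the agent needs sits at most two levels deep under a descriptive key.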


flowchart LR
    A[CrewAI Agent] -->|MCP Protocol via SSE| B(FastMCP Server)
    B -->|Parse XML| C{SAP IDoc}
    C -->|Extract Data| D[Clean JSON]
    D -->|Return| A

Prerequisites:

  • Python 3.10+
  • Docker
  • CrewAI (for the client/agent)

We will build a FastMCP server that parses the IDocs with the defusedxml library, which guards against XML attacks such as entity-expansion bombs.
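
As a complement to defusedxml (not a replacement), a cheap pre-check can reject suspicious input before any parser sees it. The sketch below assumes that legitimate IDoc XML never carries a DOCTYPE declaration and that 5 MB is a generous upper bound. Both the helper name reject_unsafe_xml and the size limit are hypothetical choices for illustration.

```python
MAX_IDOC_BYTES = 5 * 1024 * 1024  # assumed upper bound; tune for your IDoc sizes


def reject_unsafe_xml(xml_text: str) -> None:
    """Raise ValueError if the payload is oversized or declares a DTD.

    Entity-expansion ("billion laughs") payloads need a DOCTYPE with ENTITY
    declarations, so refusing any DOCTYPE blocks that class of input early.
    """
    if len(xml_text.encode("utf-8")) > MAX_IDOC_BYTES:
        raise ValueError("IDoc payload exceeds the configured size limit")
    if "<!DOCTYPE" in xml_text:
        raise ValueError("DOCTYPE declarations are not accepted in IDoc input")


# A tiny entity-expansion payload of the kind the guard is meant to stop.
bomb = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE lolz [<!ENTITY a "ha"><!ENTITY b "&a;&a;&a;">]>'
    "<ORDERS05>&b;</ORDERS05>"
)

try:
    reject_unsafe_xml(bomb)
except ValueError as exc:
    print(f"rejected: {exc}")

# A plain IDoc skeleton passes the guard silently.
reject_unsafe_xml("<ORDERS05><IDOC BEGIN='1'/></ORDERS05>")
```

The substring check is deliberately blunt: it may also reject a harmless document that mentions a DOCTYPE inside a comment, which is an acceptable trade-off for machine-generated IDocs.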

This server exposes a single tool: parse_idoc. On success it always returns the same top-level keys (status, metadata, header, items), so the AI doesn’t have to guess the structure.

from fastmcp import FastMCP
import defusedxml.ElementTree as ET

# Initialize FastMCP
mcp = FastMCP("SAP IDoc Parser")

# Ensure your container has network access (e.g. via NordLayer)
# if you plan to fetch IDocs from a remote FTP or SAP instance later.


@mcp.tool()
def parse_idoc(idoc_xml: str) -> dict:
    """
    Parses a raw SAP ORDERS05 IDoc XML string into a clean JSON structure.
    Extracts header info and line items.

    Args:
        idoc_xml (str): The raw XML content of the IDoc.
    """
    try:
        # defusedxml is used for security against XML vulnerabilities
        root = ET.fromstring(idoc_xml)
        parsed_data = {
            "status": "success",
            "metadata": {},
            "header": {},
            "items": []
        }

        # 1. Control Record (EDI_DC40)
        # Handles Sender/Receiver and Message Type
        control = root.find(".//EDI_DC40")
        if control is not None:
            parsed_data["metadata"] = {
                "doc_num": control.findtext("DOCNUM"),
                "msg_type": control.findtext("MESTYP"),
                "sender": control.findtext("SNDPRN")
            }

        # 2. Header Data (E1EDK01)
        # Handles Currency, Org Data
        # Note: many ORDERS05 layouts carry organizational data in E1EDK14
        # (selected by qualifier) rather than E1EDK01; adjust these lookups
        # to match the IDocs your system actually emits.
        header = root.find(".//E1EDK01")
        if header is not None:
            parsed_data["header"] = {
                "currency": header.findtext("CURCY"),
                "sales_org": header.findtext("VKORG"),
                "dist_channel": header.findtext("VTWEG")
            }

        # 3. Line Items (E1EDP01)
        # Iterates through all items to find Quantities and Materials
        for item in root.findall(".//E1EDP01"):
            line_item = {
                "item_number": item.findtext("POSEX"),
                "quantity": item.findtext("MENGE"),  # SAP standard quantity field
                "unit": item.findtext("MENEE"),      # Unit of measure
                "material": None
            }
            # Material number is often buried in E1EDP19 (Object ID)
            # Qualifier 001 = Customer Material, 002 = Vendor Material
            for obj in item.findall("E1EDP19"):
                qual = obj.findtext("QUALF")
                if qual == "001":
                    line_item["material"] = obj.findtext("IDTNR")
                    break
            parsed_data["items"].append(line_item)

        return parsed_data
    except Exception as e:
        # Any parsing failure is reported to the agent as a structured error
        return {
            "status": "error",
            "message": f"Failed to parse IDoc: {str(e)}"
        }


if __name__ == "__main__":
    # MANDATORY: Bind to 0.0.0.0 for Docker visibility
    mcp.run(transport='sse', host='0.0.0.0', port=8000)
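
The tool above returns a plain dict in which every extracted value is a string. The sketch below shows one way the shape could be pinned down with type hints, plus a helper that turns SAP quantity strings into numbers. The TypedDict names and normalize_quantity are hypothetical additions, not part of the server above, and the trailing-minus handling assumes your SAP system writes negatives as "5.000-", which some do.

```python
from decimal import Decimal, InvalidOperation
from typing import List, Optional, TypedDict


class LineItem(TypedDict):
    item_number: Optional[str]
    quantity: Optional[str]
    unit: Optional[str]
    material: Optional[str]


class ParsedIDoc(TypedDict):
    status: str
    metadata: dict
    header: dict
    items: List[LineItem]


def normalize_quantity(raw: Optional[str]) -> Optional[Decimal]:
    """Convert an SAP quantity string such as '50.000' to a Decimal.

    Returns None for missing or unparseable values instead of raising.
    """
    if raw is None:
        return None
    text = raw.strip()
    if text.endswith("-"):  # assumed SAP-style trailing minus sign
        text = "-" + text[:-1]
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


item: LineItem = {
    "item_number": "10",
    "quantity": "50.000",
    "unit": "PCE",
    "material": "WIDGET-2000",
}
print(normalize_quantity(item["quantity"]))
```

Decimal avoids floating-point rounding on quantities, but it is not JSON-serializable by default, so a value would need converting to str or float before being returned from a tool.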

The following Dockerfile packages the server so it can be deployed on platforms like Railway or AWS ECS.

# Use a slim Python base
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Install dependencies
# fastmcp: The MCP server framework
# defusedxml: Secure XML parsing library available on PyPI
RUN pip install --no-cache-dir fastmcp defusedxml
# Copy server code
COPY server.py .
# EXPOSE 8000 for Railway compatibility
EXPOSE 8000
# Run the server
CMD ["python", "server.py"]
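
The Dockerfile hard-codes port 8000, matching server.py. Some platforms, Railway among them, typically inject the listening port through a PORT environment variable instead. A minimal sketch of reading it with a fallback follows; resolve_port is a hypothetical helper, and whether your platform sets PORT is an assumption to verify.

```python
import os


def resolve_port(default: int = 8000) -> int:
    """Return the port from the PORT environment variable, or the default."""
    raw = os.environ.get("PORT", "")
    try:
        port = int(raw)
    except ValueError:
        # Unset or non-numeric value: keep the default.
        return default
    # Fall back if the value is outside the valid TCP port range.
    return port if 0 < port < 65536 else default


print(resolve_port())
```

In server.py, the result could then be passed as port=resolve_port() in the mcp.run call.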

Next, we configure a CrewAI Agent to connect to the running Docker container through its mcps parameter. The agent can then call the parse_idoc tool as if it were a native function.

Note: Ensure your Docker container is running (docker run -p 8000:8000 sap-idoc-parser) before running this script.

from crewai import Agent, Task, Crew, Process

# Sample IDoc XML for testing
SAMPLE_XML = """
<ORDERS05>
  <IDOC BEGIN="1">
    <EDI_DC40 SEGMENT="1">
      <DOCNUM>1000001</DOCNUM>
      <MESTYP>ORDERS</MESTYP>
      <SNDPRN>CUST_01</SNDPRN>
    </EDI_DC40>
    <E1EDK01 SEGMENT="1">
      <CURCY>USD</CURCY>
      <VKORG>1000</VKORG>
    </E1EDK01>
    <E1EDP01 SEGMENT="1">
      <POSEX>10</POSEX>
      <MENGE>50.000</MENGE>
      <MENEE>PCE</MENEE>
      <E1EDP19 SEGMENT="1">
        <QUALF>001</QUALF>
        <IDTNR>WIDGET-2000</IDTNR>
      </E1EDP19>
    </E1EDP01>
  </IDOC>
</ORDERS05>
"""

# 1. Define the Agent with MCP capabilities
sap_specialist = Agent(
    role='SAP Integration Specialist',
    goal='Extract order details from legacy IDoc formats',
    backstory='You are an expert in SAP EDI formats. You transform raw XML into clean business data.',
    verbose=True,
    # MANDATORY: Connect to the FastMCP server via SSE
    mcps=["http://localhost:8000/sse"]
)

# 2. Define the Task
parsing_task = Task(
    description=f"Parse the following SAP IDoc XML and summarize the order details (Material, Quantity, Doc Num): {SAMPLE_XML}",
    expected_output="A summary of the order containing the Document Number and a list of materials with quantities.",
    agent=sap_specialist
)

# 3. Create the Crew
crew = Crew(
    agents=[sap_specialist],
    tasks=[parsing_task],
    process=Process.sequential
)

# 4. Execute
if __name__ == "__main__":
    print("🚀 Starting CrewAI Agent...")
    result = crew.kickoff()
    print("\n✅ Final Result:")
    print(result)
  1. Build the Docker Image:

    docker build -t sap-idoc-parser .
  2. Run the Container:

    docker run -p 8000:8000 sap-idoc-parser
  3. Run the Agent:

    python agent.py

Troubleshooting:

  • Connection Refused: Ensure the Docker container is running and mapped to port 8000.
  • Missing Tool: If the agent says “I don’t know how to parse…”, check the mcps URL. It must end in /sse.
  • XML Errors: The defusedxml library is strict. Ensure the XML is well-formed.
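
For the Connection Refused case, a quick reachability check can tell a stopped container apart from an agent misconfiguration. The sketch below uses only the standard library; is_port_open is a hypothetical helper, and localhost:8000 mirrors the port mapping used in this guide.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    if is_port_open("localhost", 8000):
        print("Port 8000 is reachable; check the mcps URL next.")
    else:
        print("Nothing is listening on port 8000; is the container running?")
```

A successful TCP connection only shows that something is listening; it does not confirm that the listener is the FastMCP server.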

  • Status: ✅ Verified
  • Environment: Python 3.11
  • Auditor: AgentRetrofit CI/CD
