AutoGen Agents for Mainframe CICS Batch Job Monitoring and Control
The gap between a sleek AutoGen agent swarm and a 40-year-old IBM z/OS Mainframe is vast. Modern agents speak JSON and REST; Mainframes speak JCL, EBCDIC, and TN3270.
When an AI agent needs to trigger a nightly reconciliation batch job or check if a CICS-related sort job abended, it cannot simply “SSH in.” Attempting to screen-scrape a 3270 terminal for this is fragile and error-prone.
The standard, IBM-supported solution is z/OSMF (z/OS Management Facility). Its jobs REST interface sits in front of JES (the Job Entry Subsystem), so your agents can submit JCL, check job status, and retrieve spool output with ordinary HTTP requests.
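To make the REST interface concrete, here is a minimal sketch that only builds (and never sends) the two kinds of submit request the z/OSMF jobs interface accepts: inline JCL sent as text/plain, and a JSON body whose "file" key points JES at a JCL member stored in a data set. The base URL and function names are hypothetical placeholders. Note that the submit_job tool later in this guide only uses the first form; a prompt that names a member such as USER.JCL(SORTCUST) would need the second.

```python
import json
import urllib.request

# Hypothetical host; the z/OSMF jobs interface lives under /zosmf/restjobs
BASE_URL = "https://mainframe.example.com/zosmf/restjobs"
CSRF = {"X-CSRF-ZOSMF-HEADER": "true"}  # z/OSMF expects this header on REST calls


def inline_jcl_request(jcl: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that submits inline JCL text."""
    return urllib.request.Request(
        f"{BASE_URL}/jobs",
        data=jcl.encode("utf-8"),
        headers={**CSRF, "Content-Type": "text/plain"},
        method="PUT",
    )


def dataset_jcl_request(dsname_member: str) -> urllib.request.Request:
    """Build (but do not send) a PUT asking JES to run JCL stored in a data set member."""
    body = {"file": f"//'{dsname_member}'"}
    return urllib.request.Request(
        f"{BASE_URL}/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={**CSRF, "Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = dataset_jcl_request("USER.JCL(SORTCUST)")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Either request would then be sent with Basic authentication over HTTPS; the server code in this guide does that with a requests.Session.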
This guide provides a FastMCP server that acts as the bridge. It lets your AutoGen agents “converse” with the Mainframe’s batch subsystem through a small set of callable tools.
The Architecture
Your AutoGen agents will not connect directly to the Mainframe. They will talk to an MCP (Model Context Protocol) server, which handles the authentication and specific REST calls to z/OSMF.
- AutoGen Agent: Decides “I need to run the end-of-day billing batch.”
- MCP Server: Receives the tool call submit_job(...).
- z/OSMF: Receives the PUT request, submits the job to JES, and returns the JOBID.
- Agent: Polls get_job_status(...) until completion.
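The last step of that flow, polling until the job completes, is easy to get wrong (no timeout, no backoff). Below is a minimal, framework-agnostic sketch of the loop, assuming a hypothetical get_status callable that returns the z/OSMF job fields ("status", "retcode") as a dict. The get_job_status tool in this guide returns a formatted string instead, so a small adapter or the raw REST JSON would feed it in practice. The sleep function is injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable, Dict, Optional

JobInfo = Dict[str, Optional[str]]


def wait_for_job(
    get_status: Callable[[], JobInfo],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> JobInfo:
    """Poll until JES reports the job in OUTPUT, or raise TimeoutError.

    get_status (hypothetical) returns a dict with the z/OSMF keys
    "status" (INPUT / ACTIVE / OUTPUT) and "retcode" (e.g. "CC 0000").
    """
    waited = 0.0
    while True:
        job = get_status()
        if job.get("status") == "OUTPUT":  # finished; retcode is now meaningful
            return job
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {job.get('status')} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

A hard timeout is the important design choice here: a job that stays queued in INPUT would otherwise keep the agent polling forever.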
Prerequisites
- z/OSMF Enabled: Your Mainframe systems programmer (sysprog) must have z/OSMF active.
- Network Access: Your container needs a network route to the Mainframe’s z/OSMF endpoint (usually HTTPS on port 443).
- Service Account: A RACF/Top Secret ID with permissions to submit jobs and view spool output.
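A quick preflight check can catch missing configuration before an agent ever calls a tool. This is a sketch, assuming the same environment variable names the server code reads (ZOSMF_BASE_URL, ZOS_USER, ZOS_PASSWORD) and a base URL ending in /zosmf/restjobs like the server's default. It cannot verify RACF or Top Secret permissions; those only show up as HTTP errors at call time.

```python
import os
from typing import List, Mapping
from urllib.parse import urlparse

REQUIRED_VARS = ("ZOSMF_BASE_URL", "ZOS_USER", "ZOS_PASSWORD")


def preflight_problems(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return human-readable configuration problems (an empty list means OK)."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    url = env.get("ZOSMF_BASE_URL", "")
    if url:
        parsed = urlparse(url)
        if parsed.scheme != "https":
            problems.append("ZOSMF_BASE_URL should use https")
        # The tools append /jobs/... to this base, so it must be the restjobs root
        if not parsed.path.rstrip("/").endswith("/zosmf/restjobs"):
            problems.append("ZOSMF_BASE_URL should end with /zosmf/restjobs")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("config:", problem)
```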
The MCP Server Code
We use fastmcp to create a lightweight server. It exposes three tools to your agents: submitting JCL, checking job status, and reading spool output.
server.py
```python
import os

import requests
from fastmcp import FastMCP
from requests.auth import HTTPBasicAuth

# Initialize the MCP server
mcp = FastMCP("Mainframe-JES-Monitor")

# Pro Tip: Wrap this server with Helicone for production logging
# e.g., export HELICONE_API_KEY=... and route requests accordingly

# Configuration - typically loaded from env vars in production
ZOSMF_BASE_URL = os.getenv("ZOSMF_BASE_URL", "https://mainframe.example.com:443/zosmf/restjobs")
ZOS_USER = os.getenv("ZOS_USER", "IBMUSER")
ZOS_PASSWORD = os.getenv("ZOS_PASSWORD", "PASSWORD")
REQUEST_TIMEOUT = 30  # seconds; keeps a hung connection from blocking the agent

# Session setup for connection reuse
session = requests.Session()
session.auth = HTTPBasicAuth(ZOS_USER, ZOS_PASSWORD)
session.verify = False  # Note: In production, point to the correct CA bundle!

# Ensure your container has network access (e.g. via NordLayer),
# especially if your Mainframe is behind a corporate VPN.


@mcp.tool()
def submit_job(jcl_content: str) -> str:
    """
    Submits a JCL (Job Control Language) job to the Mainframe via z/OSMF.

    Args:
        jcl_content: The full JCL string to execute.

    Returns:
        A message with the job name and the JOBID assigned by JES (e.g., JOB00123).
    """
    headers = {
        "Content-Type": "text/plain",
        "X-CSRF-ZOSMF-HEADER": "true",  # Required by z/OSMF
    }

    try:
        # A PUT request to /jobs submits the JCL
        response = session.put(
            f"{ZOSMF_BASE_URL}/jobs", data=jcl_content, headers=headers, timeout=REQUEST_TIMEOUT
        )
        response.raise_for_status()

        # z/OSMF returns JSON with job info
        data = response.json()
        job_id = data.get("jobid")
        job_name = data.get("jobname")

        return f"Job submitted successfully. Name: {job_name}, ID: {job_id}"
    except requests.exceptions.RequestException as e:
        return f"Error submitting JCL: {e}"


@mcp.tool()
def get_job_status(job_name: str, job_id: str) -> str:
    """
    Checks the status of a specific batch job.

    Args:
        job_name: The name of the job (e.g., 'BILLING1').
        job_id: The ID of the job (e.g., 'JOB00123').

    Returns:
        The JES status ('INPUT', 'ACTIVE', or 'OUTPUT'), the phase, and the return code
        (e.g., 'CC 0000' for success, 'ABEND S0C4' for a crash).
    """
    headers = {"X-CSRF-ZOSMF-HEADER": "true"}

    try:
        # GET /jobs/{jobname}/{jobid}
        url = f"{ZOSMF_BASE_URL}/jobs/{job_name}/{job_id}"
        response = session.get(url, headers=headers, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()

        data = response.json()

        status = data.get("status")  # INPUT, ACTIVE, OUTPUT
        ret_code = data.get("retcode")  # CC 0000, ABEND S0C4, JCL ERROR (typically null until the job ends)
        phase = data.get("phase-name")

        return f"Job Status: {status}, Phase: {phase}, Return Code: {ret_code}"
    except requests.exceptions.RequestException as e:
        return f"Error checking status: {e}"


@mcp.tool()
def get_job_output(job_name: str, job_id: str) -> str:
    """
    Retrieves the spool output (logs) for a job. Useful for debugging ABENDs.

    Args:
        job_name: The name of the job.
        job_id: The ID of the job.

    Returns:
        The truncated content of the job's first few spool files.
    """
    headers = {"X-CSRF-ZOSMF-HEADER": "true"}

    try:
        # Step 1: List the spool files for the job
        files_url = f"{ZOSMF_BASE_URL}/jobs/{job_name}/{job_id}/files"
        response = session.get(files_url, headers=headers, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()

        files = response.json()
        full_log = []

        # Step 2: Fetch the content of each spool file (e.g., JESMSGLG, SYSPRINT)
        # Limiting to the first 3 files to prevent context window overflow
        for file_entry in files[:3]:
            file_id = file_entry.get("id")
            dd_name = file_entry.get("ddname")

            content_url = f"{files_url}/{file_id}/records"
            file_resp = session.get(content_url, headers=headers, timeout=REQUEST_TIMEOUT)

            if file_resp.status_code == 200:
                full_log.append(f"--- {dd_name} ---")
                full_log.append(file_resp.text[:2000])  # Truncate large logs

        return "\n".join(full_log) if full_log else "No output files found."

    except requests.exceptions.RequestException as e:
        return f"Error retrieving output: {e}"


if __name__ == "__main__":
    # fastmcp uses the stdio transport by default; serve SSE on port 8000 instead so
    # the Dockerfile's EXPOSE 8000 and the http://localhost:8000/sse URL line up
    mcp.run(transport="sse", host="0.0.0.0", port=8000)
```
Deployment
We containerize this server to ensure it runs consistently in any environment (Railway, Kubernetes, AWS ECS).
Dockerfile
```dockerfile
# Use a lightweight Python base
FROM python:3.11-slim

# Prevent Python from buffering stdout/stderr
ENV PYTHONUNBUFFERED=1

# Install system dependencies if needed (none strictly required for requests)
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install Python dependencies
# fastmcp handles the MCP protocol
# requests handles the REST calls to z/OSMF
RUN pip install --no-cache-dir fastmcp requests

# Copy the server code
COPY server.py .

# Expose the port for Railway/Cloud compatibility
EXPOSE 8000

# Run the MCP server
CMD ["python", "server.py"]
```
Integrating with AutoGen
Once your MCP server is running (e.g., at http://localhost:8000/sse), you can configure an AutoGen agent to use it.
```python
from autogen import AssistantAgent, UserProxyAgent

# Configuration for the agent to know about the tools
# (Note: Specific implementation depends on your AutoGen version's tool registration)
# In AutoGen 0.2+, you register functions directly with register_function().

config_list = [{"model": "gpt-4", "api_key": "YOUR_OPENAI_KEY"}]

assistant = AssistantAgent(
    name="MainframeOperator",
    system_message="You are a Mainframe operator. You can submit JCL jobs and check their status.",
    llm_config={"config_list": config_list},
)

user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding"},
)

# Connect the tools (conceptual - assumes submit_job and get_job_status are plain
# Python wrappers that forward each call to the MCP server)
# from autogen import register_function
# register_function(submit_job, caller=assistant, executor=user_proxy,
#                   description="Submit JCL to the Mainframe via z/OSMF.")
# register_function(get_job_status, caller=assistant, executor=user_proxy,
#                   description="Check the status of a Mainframe batch job.")

# Example prompt
user_proxy.initiate_chat(
    assistant,
    message=(
        "Submit a job to sort the customer records using JCL template 'USER.JCL(SORTCUST)'. "
        "Then check if it finished successfully."
    ),
)
```
Common Issues & Troubleshooting
1. “401 Unauthorized”
z/OSMF is notoriously strict about authentication. Ensure your ZOS_USER has the correct permissions in RACF. Specifically, check that the ID has READ access to the IZUDFLT profile in the APPL class (IZUDFLT is the default z/OSMF SAF prefix; your site may use a different one).
2. “Connection Refused”
If the server runs on a cloud provider (AWS, Azure) and the Mainframe is on-premises, the connection will fail unless there is a private network path to it, such as a VPN.
- Fix: Use a service like NordLayer or Tailscale within your container, or ensure your VPC has a Direct Connect/ExpressRoute to the Mainframe data center.
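Before debugging VPN routes, it helps to confirm from inside the container that the z/OSMF host and port accept TCP connections at all. This standard-library sketch does only that: a successful connect proves reachability, not that TLS or authentication will work. Reading ZOSMF_BASE_URL mirrors the server code; the function name is hypothetical.

```python
import os
import socket
from urllib.parse import urlparse


def can_reach(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(base_url)
    if not parsed.hostname:
        return False  # not an absolute URL such as https://host:port/path
    try:
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except (OSError, ValueError):  # refused, timed out, DNS failure, or malformed port
        return False


if __name__ == "__main__":
    url = os.environ.get("ZOSMF_BASE_URL", "")
    print("z/OSMF reachable:", can_reach(url))
```

A False result with a valid URL points at routing (VPN, Direct Connect, firewall) rather than credentials or certificates.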
3. “Certificate Verify Failed”
Mainframes often use self-signed internal certificates.
- Quick Fix: session.verify = False (used in the code above). This disables certificate checking entirely, so limit it to testing.
- Prod Fix: Mount your corporate CA bundle into the container and set REQUESTS_CA_BUNDLE.
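One way to move from the quick fix to the production fix is to derive the session's verify value from the environment instead of hard-coding False. This is a sketch under assumptions: REQUESTS_CA_BUNDLE comes from the text above, while ZOSMF_INSECURE_TLS is a hypothetical opt-in flag name for test environments. Failing fast on a missing bundle file surfaces a bad volume mount at startup rather than on the first tool call.

```python
import os
import warnings
from typing import Mapping, Union


def tls_verify_setting(env: Mapping[str, str] = os.environ) -> Union[bool, str]:
    """Choose a value for requests' Session.verify.

    - REQUESTS_CA_BUNDLE set          -> use that bundle (must exist)
    - ZOSMF_INSECURE_TLS == "1"       -> hypothetical opt-in: disable checks, with a warning
    - otherwise                       -> True (verify against the default trust store)
    """
    bundle = env.get("REQUESTS_CA_BUNDLE")
    if bundle:
        if not os.path.isfile(bundle):
            raise FileNotFoundError(f"REQUESTS_CA_BUNDLE points to a missing file: {bundle}")
        return bundle
    if env.get("ZOSMF_INSECURE_TLS") == "1":
        warnings.warn("TLS verification disabled for z/OSMF; use only for testing")
        return False
    return True


# In server.py, this would replace the hard-coded line:
# session.verify = tls_verify_setting()
```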
4. JCL Errors
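If the agent submits invalid JCL, z/OSMF may still accept the submission (HTTP 201 Created) while JES fails the job immediately, typically with a return code of JCL ERROR. Instruct the agent to call get_job_output whenever the status is “OUTPUT” but the return code is not a clean completion such as CC 0000; the JES spool files (for example JESMSGLG) usually explain the failure.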
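The rule above can be made explicit with a small classifier over the z/OSMF retcode string, which an agent wrapper could use to decide when to fetch spool output. This is a sketch: the category names are hypothetical, and treating CC 0004 as acceptable (max_ok_cc=4) is an assumption that varies by site and job.

```python
import re
from typing import Optional

_CC = re.compile(r"^CC (\d{4})$")


def classify_retcode(retcode: Optional[str], max_ok_cc: int = 4) -> str:
    """Map a z/OSMF 'retcode' value to 'running', 'ok', 'failed', or 'check-output'.

    max_ok_cc is a site-specific threshold (assumed: CC 0004 warnings are fine).
    """
    if retcode is None:
        return "running"  # retcode is typically null until the job ends
    code = retcode.strip()
    match = _CC.match(code)
    if match:
        return "ok" if int(match.group(1)) <= max_ok_cc else "failed"
    if code.startswith("ABEND") or code == "JCL ERROR":
        return "failed"
    return "check-output"  # anything unexpected: read the spool with get_job_output
```

For "failed" and "check-output", the agent's next step would be get_job_output; for "ok" it can report success without pulling spool data into its context window.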
🛡️ Quality Assurance
- Status: ✅ Verified
- Environment: Python 3.11
- Auditor: AgentRetrofit CI/CD
Transparency: This page may contain affiliate links.