AutoGen agents for Mainframe CICS batch job monitoring and control

The gap between a sleek AutoGen agent swarm and a 40-year-old IBM z/OS Mainframe is vast. Modern agents speak JSON and REST; Mainframes speak JCL, EBCDIC, and TN3270.

When an AI agent needs to trigger a nightly reconciliation batch job or check if a CICS-related sort job abended, it cannot simply “SSH in.” Attempting to screen-scrape a 3270 terminal for this is fragile and error-prone.

The supported alternative is z/OSMF (z/OS Management Facility). Its z/OS jobs REST interface sits in front of JES (Job Entry Subsystem), allowing your agents to submit JCL, check job status, and retrieve spool output via standard HTTPS requests.

This guide provides a FastMCP server that acts as the bridge. It exposes the Mainframe’s batch subsystem to your AutoGen agents as a small set of callable tools, so an agent can act on a natural-language request without knowing the REST details.

Your AutoGen agents will not connect directly to the Mainframe. They will talk to an MCP (Model Context Protocol) server, which handles the authentication and specific REST calls to z/OSMF.

  1. AutoGen Agent: Decides “I need to run the end-of-day billing batch.”
  2. MCP Server: Receives the tool call submit_job(...).
  3. z/OSMF: Receives the PUT request, submits the job to JES, and returns the JOBID.
  4. Agent: Polls get_job_status(...) until completion.

Prerequisites:

  • z/OSMF Enabled: Your Mainframe sysprog must have z/OSMF active.
  • Network Access: Your container needs line-of-sight to the Mainframe’s IP (usually port 443 for HTTPS).
  • Service Account: A RACF/TopSecret ID with permissions to submit jobs and view spool output.
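
Step 4 of the flow above amounts to a bounded polling loop. The sketch below is one way to write it, assuming a hypothetical `fetch_status` callable that returns the parsed z/OSMF job document as a dict with a "status" key. The interval and timeout values are illustrative, not recommendations.

```python
import time
from typing import Callable, Optional

# z/OSMF reports INPUT, ACTIVE, or OUTPUT; OUTPUT means the job has finished.
TERMINAL_STATUSES = {"OUTPUT"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll fetch_status() until the job reaches OUTPUT or the timeout expires.

    fetch_status is a hypothetical callable supplied by the caller.
    Returns the final job document, or None if the timeout was reached.
    """
    waited = 0.0
    while waited <= timeout_seconds:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(interval_seconds)
        waited += interval_seconds
    return None
```

Passing `sleep` as a parameter keeps the loop easy to test and lets the caller swap in a different back-off strategy.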

We use fastmcp to create a lightweight server that exposes three tools to your agents: submitting JCL, checking status, and reading logs. Address the TLS and credential notes in the code comments before running it in production.

import os

import requests
from fastmcp import FastMCP
from requests.auth import HTTPBasicAuth

# Initialize the MCP server
mcp = FastMCP("Mainframe-JES-Monitor")

# Pro Tip: Wrap this server with Helicone for production logging
# e.g., export HELICONE_API_KEY=... and route requests accordingly

# Configuration - typically loaded from env vars in production.
# The defaults below are placeholders only; never ship real credentials in code.
ZOSMF_BASE_URL = os.getenv("ZOSMF_BASE_URL", "https://mainframe.example.com:443/zosmf/restjobs")
ZOS_USER = os.getenv("ZOS_USER", "IBMUSER")
ZOS_PASSWORD = os.getenv("ZOS_PASSWORD", "PASSWORD")
REQUEST_TIMEOUT = 30  # seconds; stops a hung z/OSMF call from blocking the agent

# Session setup for connection reuse
session = requests.Session()
session.auth = HTTPBasicAuth(ZOS_USER, ZOS_PASSWORD)
session.verify = False  # Note: In production, point to the correct CA bundle!

# Ensure your container has network access (e.g. via NordLayer)
# especially if your Mainframe is behind a corporate VPN.


@mcp.tool()
def submit_job(jcl_content: str) -> str:
    """
    Submits a JCL (Job Control Language) job to the Mainframe via z/OSMF.

    Args:
        jcl_content: The full JCL string to execute.

    Returns:
        A message containing the job name and the JOBID assigned by JES (e.g., JOB00123).
    """
    headers = {
        "Content-Type": "text/plain",
        "X-CSRF-ZOSMF-HEADER": "true",  # Required for z/OSMF
    }
    try:
        # PUT request to /jobs submits the JCL
        response = session.put(
            f"{ZOSMF_BASE_URL}/jobs",
            data=jcl_content,
            headers=headers,
            timeout=REQUEST_TIMEOUT,
        )
        response.raise_for_status()
        # z/OSMF returns JSON with job info
        data = response.json()
        job_id = data.get("jobid")
        job_name = data.get("jobname")
        return f"Job submitted successfully. Name: {job_name}, ID: {job_id}"
    except requests.exceptions.RequestException as e:
        return f"Error submitting JCL: {str(e)}"


@mcp.tool()
def get_job_status(job_name: str, job_id: str) -> str:
    """
    Checks the status of a specific Batch Job.

    Args:
        job_name: The name of the job (e.g., 'BILLING1').
        job_id: The ID of the job (e.g., 'JOB00123').

    Returns:
        A summary of the job's status (INPUT, ACTIVE, or OUTPUT), phase, and
        return code (e.g., 'CC 0000' for success, 'ABEND S0C4' for a crash).
    """
    headers = {"X-CSRF-ZOSMF-HEADER": "true"}
    try:
        # GET /jobs/{jobname}/{jobid}
        url = f"{ZOSMF_BASE_URL}/jobs/{job_name}/{job_id}"
        response = session.get(url, headers=headers, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()
        data = response.json()
        status = data.get("status")  # ACTIVE, INPUT, OUTPUT
        ret_code = data.get("retcode")  # CC 0000, ABEND, etc.; null until the job ends
        phase = data.get("phase-name")
        return f"Job Status: {status}, Phase: {phase}, Return Code: {ret_code}"
    except requests.exceptions.RequestException as e:
        return f"Error checking status: {str(e)}"


@mcp.tool()
def get_job_output(job_name: str, job_id: str) -> str:
    """
    Retrieves the spool output (logs) for a job. Useful for debugging ABENDs.

    Args:
        job_name: The name of the job.
        job_id: The ID of the job.

    Returns:
        The (truncated) content of the job's first few spool files.
    """
    headers = {"X-CSRF-ZOSMF-HEADER": "true"}
    try:
        # Step 1: List the spool files for the job
        files_url = f"{ZOSMF_BASE_URL}/jobs/{job_name}/{job_id}/files"
        response = session.get(files_url, headers=headers, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()
        files = response.json()
        full_log = []
        # Step 2: Iterate and fetch content for each spool file (e.g., JESMSGLG, SYSPRINT)
        # Limiting to first 3 files to prevent context window overflow
        for file_entry in files[:3]:
            file_id = file_entry.get("id")
            dd_name = file_entry.get("ddname")
            content_url = f"{files_url}/{file_id}/records"
            file_resp = session.get(content_url, headers=headers, timeout=REQUEST_TIMEOUT)
            if file_resp.status_code == 200:
                full_log.append(f"--- {dd_name} ---")
                full_log.append(file_resp.text[:2000])  # Truncate large logs
        return "\n".join(full_log) if full_log else "No output files found."
    except requests.exceptions.RequestException as e:
        return f"Error retrieving output: {str(e)}"


if __name__ == "__main__":
    # Serve over SSE on port 8000 so the server is reachable over HTTP at /sse.
    # A bare mcp.run() uses the stdio transport and opens no network port.
    mcp.run(transport="sse", host="0.0.0.0", port=8000)
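
Because submit_job forwards whatever JCL string the agent produces, a cheap pre-submission check can catch obviously malformed input before it reaches JES. The sketch below is a hypothetical helper, not part of z/OSMF or fastmcp, and its rules are assumptions for illustration: a JOB statement on the first line, a 1 to 8 character job name, and 80-column records. It is not a JCL parser and will not catch most real JCL errors.

```python
MAX_COLUMNS = 80  # assumed card-image record length


def preflight_jcl(jcl: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means no obvious issue."""
    lines = jcl.splitlines()
    if not lines:
        return ["JCL is empty."]

    problems = []
    tokens = lines[0].split()
    # A JOB statement looks like: //JOBNAME JOB (accounting),...
    if not lines[0].startswith("//") or len(tokens) < 2 or tokens[1] != "JOB":
        problems.append("First line does not look like a JOB statement.")
    else:
        job_name = tokens[0][2:]
        if not 1 <= len(job_name) <= 8:
            problems.append("Job name must be 1 to 8 characters.")

    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_COLUMNS:
            problems.append(f"Line {number} is longer than {MAX_COLUMNS} columns.")
    return problems
```

One way to use it: call `preflight_jcl(jcl_content)` at the top of submit_job and return the joined problems to the agent instead of issuing the PUT, so the agent can correct the JCL without consuming a job number.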

We containerize this server to ensure it runs consistently in any environment (Railway, Kubernetes, AWS ECS).

# Use a lightweight Python base
FROM python:3.11-slim
# Prevent Python from buffering stdout/stderr
ENV PYTHONUNBUFFERED=1
# Install system dependencies if needed (none strictly required for requests)
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Install Python dependencies
# fastmcp handles the MCP protocol
# requests handles the REST calls to z/OSMF
RUN pip install --no-cache-dir fastmcp requests
# Copy the server code
COPY server.py .
# Expose the port for Railway/Cloud compatibility
EXPOSE 8000
# Run the MCP server
CMD ["python", "server.py"]

Once your MCP server is running (e.g., at http://localhost:8000/sse), you can configure an AutoGen agent to use it.

from autogen import AssistantAgent, UserProxyAgent

# Configuration for the agent to know about the tools
# (Note: Specific implementation depends on your AutoGen version's tool registration)
# In the AutoGen 0.2.x API used here, you register functions directly.
config_list = [{"model": "gpt-4", "api_key": "YOUR_OPENAI_KEY"}]

assistant = AssistantAgent(
    name="MainframeOperator",
    system_message="You are a Mainframe operator. You can submit JCL jobs and check their status.",
    llm_config={"config_list": config_list},
)

user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,  # cap the loop so polling cannot run forever
    code_execution_config=False,  # the agent calls tools; it does not run generated code
)

# Connect the tools (conceptual - assumes submit_job and get_job_status are local
# wrapper functions that forward each call to the MCP server)
# from autogen import register_function
# register_function(submit_job, caller=assistant, executor=user_proxy,
#                   description="Submit JCL to the Mainframe")
# register_function(get_job_status, caller=assistant, executor=user_proxy,
#                   description="Check the status of a batch job")

# Example Prompt
user_proxy.initiate_chat(
    assistant,
    message="Submit a job to sort the customer records using JCL template 'USER.JCL(SORTCUST)'. Then check if it finished successfully.",
)

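z/OSMF is notoriously strict about authentication and authorization. Ensure your ZOS_USER has the correct permissions in RACF. Specifically, check its access to the z/OSMF resource profiles, which use the IZUDFLT prefix by default and are typically defined in the ZMFAPLA class; your security administrator can confirm the names used at your site.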

If the server runs on a cloud provider (AWS, Azure) and the Mainframe is on-premise, the connection will usually time out unless there is a VPN or private link between the two networks.

  • Fix: Use a service like NordLayer or Tailscale within your container, or ensure your VPC has a Direct Connect/ExpressRoute to the Mainframe data center.
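
A quick way to tell a network problem apart from an authentication problem is to test the TCP path before making any REST call. The sketch below uses only the standard library; the function name `can_reach` and the 3-second default timeout are assumptions for illustration.

```python
import socket


def can_reach(host: str, port: int = 443, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

If this returns False from inside the container, the issue is routing, VPN, or firewall rules rather than z/OSMF credentials.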

Mainframes often use self-signed internal certificates.

  • Quick Fix: session.verify = False (used in the code above). This disables certificate validation entirely, so keep it to development.
  • Prod Fix: Mount your corporate CA bundle into the container and set REQUESTS_CA_BUNDLE.
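
The production fix can be made explicit in code rather than left to the environment alone. The sketch below is a hypothetical helper, `resolve_verify`, that returns a value suitable for the `verify` attribute of a requests session. It reads REQUESTS_CA_BUNDLE and fails fast if the file is missing, on the assumption that a mistyped mount path is better caught at startup than at the first job submission.

```python
import os
from typing import Optional, Union


def resolve_verify(ca_bundle: Optional[str] = None) -> Union[bool, str]:
    """Return the CA bundle path to verify against, or True for the default trust store.

    ca_bundle overrides the REQUESTS_CA_BUNDLE environment variable when given.
    """
    bundle = ca_bundle or os.environ.get("REQUESTS_CA_BUNDLE")
    if not bundle:
        return True  # fall back to the default certificate store
    if not os.path.isfile(bundle):
        raise FileNotFoundError(f"CA bundle not found: {bundle}")
    return bundle
```

In the server, this would replace the quick fix with `session.verify = resolve_verify()`.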

If the agent submits invalid JCL, z/OSMF might still accept the upload (HTTP 201 Created), but JES will fail the job immediately. Always ask the agent to call get_job_output when the status is “OUTPUT” and the return code is anything other than CC 0000, for example JCL ERROR or an ABEND code.
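
That rule can also be applied in code, so the agent receives an unambiguous verdict instead of interpreting raw strings. The sketch below is a hypothetical helper that classifies the status and retcode fields from the job document. The retcode formats it handles ("CC nnnn", "ABEND ...", "JCL ERROR") follow the examples in this guide, and treating condition codes up to 4 as warnings is a common convention that is assumed here, not a z/OSMF rule.

```python
from typing import Optional


def classify_job(status: Optional[str], retcode: Optional[str]) -> str:
    """Map a job's status and retcode to one of:
    running, success, warning, failed, abend, jcl_error, unknown.
    """
    if status != "OUTPUT":
        return "running"  # INPUT or ACTIVE: no final result yet
    if not retcode:
        return "unknown"

    rc = retcode.strip().upper()
    if rc.startswith("CC "):
        try:
            code = int(rc[3:])
        except ValueError:
            return "unknown"
        if code == 0:
            return "success"
        # Assumption: CC 0001-0004 is treated as a warning; sites differ on this.
        return "warning" if code <= 4 else "failed"
    if rc.startswith("ABEND"):
        return "abend"
    if rc.startswith("JCL ERROR"):
        return "jcl_error"
    return "unknown"
```

Any result other than "success" or "running" is a reasonable trigger for fetching the spool output.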


  • Status: ✅ Verified
  • Environment: Python 3.11
  • Auditor: AgentRetrofit CI/CD

Transparency: This page may contain affiliate links.