Automating IBM FileNet Content Retrieval via CMIS with CrewAI
The enterprise world runs on documents, and for decades IBM FileNet has been the vault where those documents live. However, agents built with modern AI frameworks like CrewAI struggle to reach these “dark data” repositories, because FileNet’s native APIs (P8/WSI) are complex, SOAP-based interfaces.
The solution? CMIS (Content Management Interoperability Services).
This guide provides a production-ready Model Context Protocol (MCP) server that bridges CrewAI agents to IBM FileNet using the CMIS standard. This allows your agents to search, retrieve, and analyze PDF contracts, invoices, or claims forms stored deep inside legacy ECM systems.
🏗️ Architecture
We use FastMCP (Python) to spin up a lightweight MCP server. This server connects to IBM FileNet through the cmislib library. The CrewAI agent consumes the server over SSE (Server-Sent Events), which gives it direct “tool” access to the document vault.
- CrewAI Agent: Requests a document (e.g., “Find the NDA for Acme Corp”).
- MCP Server: Translates the request into a CMIS Query Language statement (a SQL-like syntax).
- IBM FileNet: Returns the metadata and content stream.
- Agent Context: The document text is injected back into the agent’s context window.
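The translation step in the second bullet can be sketched as a small query builder. This is a minimal illustration, not part of cmislib or FileNet: `escape_cmis_like` and `build_title_query` are hypothetical helper names, and the backslash escaping follows the CMIS query grammar. Whether a given FileNet CMIS release honors every escape should be verified against your installation.

```python
def escape_cmis_like(value: str) -> str:
    """Escape a value for use inside a CMIS QL LIKE string literal."""
    # Escape backslashes first so the escapes added below are not doubled.
    escaped = value.replace("\\", "\\\\")
    # Escape single quotes so the value cannot end the string literal early.
    escaped = escaped.replace("'", "\\'")
    # % and _ are LIKE wildcards; escape them so they match literally.
    escaped = escaped.replace("%", "\\%").replace("_", "\\_")
    return escaped


def build_title_query(keyword: str) -> str:
    """Build a substring search on cmis:name (hypothetical helper)."""
    pattern = escape_cmis_like(keyword)
    return (
        "SELECT cmis:objectId, cmis:name, cmis:creationDate "
        "FROM cmis:document "
        f"WHERE cmis:name LIKE '%{pattern}%'"
    )


if __name__ == "__main__":
    print(build_title_query("Acme's NDA"))
```

Escaping the keyword before it reaches the statement matters here because the text comes from an LLM-driven agent, which makes it untrusted input.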
🛠️ The Server Code (server.py)
This server exposes two tools: search_documents and get_document_content.
Prerequisites:
- Install the dependencies: pip install fastmcp cmislib
- Ensure IBM FileNet CMIS is enabled (usually at /fncmis/resources/Service).
import os

from fastmcp import FastMCP
from cmislib.model import CmisClient

# Initialize the MCP server
mcp = FastMCP("FileNet-CMIS-Gateway")

# Configuration (load from environment variables in production;
# the defaults below are placeholders, not real credentials)
CMIS_URL = os.getenv("CMIS_URL", "http://filenet-server:9080/fncmis/resources/Service")
CMIS_USER = os.getenv("CMIS_USER", "p8admin")
CMIS_PASS = os.getenv("CMIS_PASS", "password123")
REPO_ID = os.getenv("CMIS_REPO_ID", "OS1")  # Object Store ID


def get_repo():
    """Helper to authenticate and get the repository object."""
    try:
        client = CmisClient(CMIS_URL, CMIS_USER, CMIS_PASS)
        return client.getRepository(REPO_ID)
    except Exception as e:
        raise RuntimeError(f"Failed to connect to FileNet CMIS: {e}") from e


@mcp.tool()
def search_documents(query_text: str) -> str:
    """
    Search for documents in IBM FileNet using a substring match on the document title.
    Returns a list of matching Document IDs and names.

    Args:
        query_text: The title or keyword to search for (e.g., "Invoice 2024").
    """
    # Escape backslashes and single quotes (CMIS QL uses backslash escapes)
    # so the input cannot break out of the string literal.
    safe_text = query_text.replace("\\", "\\\\").replace("'", "\\'")

    # Construct a CMIS Query Language statement (based on a subset of SQL-92).
    # cmis:document is the standard base type.
    statement = (
        "SELECT cmis:objectId, cmis:name, cmis:creationDate "
        "FROM cmis:document "
        f"WHERE cmis:name LIKE '%{safe_text}%'"
    )

    results = []
    try:
        repo = get_repo()
        rs = repo.query(statement)
        for hit in rs:
            doc_id = hit.properties['cmis:objectId']
            name = hit.properties['cmis:name']
            date = hit.properties['cmis:creationDate']
            results.append(f"ID: {doc_id} | Name: {name} | Date: {date}")

        if not results:
            return "No documents found matching that query."

        return "\n".join(results)
    except Exception as e:
        return f"Error executing CMIS query: {e}"


@mcp.tool()
def get_document_content(document_id: str) -> str:
    """
    Retrieves the actual text content of a document by ID.
    Note: For binary files (PDFs), this returns a success message and metadata.
    In a real RAG pipeline, you would pipe the stream to an OCR/Parser.

    Args:
        document_id: The unique ID (GUID) of the document (from search_documents).
    """
    try:
        repo = get_repo()
        doc = repo.getObject(document_id)
        name = doc.name
        # The MIME type can be missing for documents without a content stream.
        mime_type = doc.properties.get('cmis:contentStreamMimeType') or ""

        # Get content stream
        stream = doc.getContentStream()
        content_bytes = stream.read()

        # Simple handling for text files
        if "text/plain" in mime_type:
            text = content_bytes.decode('utf-8', errors='replace')
            return f"--- Content of {name} ---\n{text}"

        # For PDFs/Images, we return metadata and size.
        # To handle PDFs, integrate 'pypdf' here to extract text.
        return (
            f"Document '{name}' retrieved successfully.\n"
            f"Type: {mime_type}\n"
            f"Size: {len(content_bytes)} bytes.\n"
            f"Note: Binary content is ready for processing (OCR/Parsing needed for raw text)."
        )

    except Exception as e:
        return f"Error retrieving document {document_id}: {e}"


if __name__ == "__main__":
    # HOST must be 0.0.0.0 for Docker compatibility
    mcp.run(transport='sse', host='0.0.0.0', port=8000)

🐳 Docker Deployment (Dockerfile)
This Dockerfile runs the service in an isolated environment and exposes port 8000 for internal networking (for example on Railway or Kubernetes).
# Use a slim Python base
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies if needed (e.g., for lxml)
# RUN apt-get update && apt-get install -y libxml2-dev libxslt-dev

# Install Python dependencies
# fastmcp for the server, cmislib for FileNet connection
RUN pip install fastmcp cmislib

# Copy application code
COPY server.py .

# Environment variables (override these in your deployment dashboard)
ENV CMIS_URL="http://host.docker.internal:9080/fncmis/resources/Service"
ENV CMIS_REPO_ID="OS1"

# Expose port 8000 (Required for Railway/Generic Hosting)
EXPOSE 8000

# Run the server
CMD ["python", "server.py"]

🤖 Connecting CrewAI
Once your Docker container is running (with port 8000 mapped locally as 8000:8000), you can connect your CrewAI agent. We use the mcps configuration to point the agent to the SSE endpoint.
agent.py
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI

# 1. Define the connection to the running MCP Server
# If running locally via Docker: http://localhost:8000/sse
filenet_mcp_source = {
    "url": "http://localhost:8000/sse",
    "method": "sse",  # Server-Sent Events
}

# 2. Create the Agent
filenet_archivist = Agent(
    role='Corporate Archivist',
    goal='Retrieve historical contracts from the IBM FileNet legacy system',
    backstory='You are an expert at navigating the company legacy ECM. You find documents by ID and extract their details.',
    llm=ChatOpenAI(model="gpt-4"),
    # This automatically loads 'search_documents' and 'get_document_content' tools
    mcps=[filenet_mcp_source],
    verbose=True,
)

# 3. Define a Task
find_contract_task = Task(
    description=(
        "Search for the 'Vendor Agreement 2023' in the FileNet repository. "
        "Once found, retrieve its metadata/content and summarize the file size and type."
    ),
    expected_output="A summary of the found document including its ID and file properties.",
    agent=filenet_archivist,
)

# 4. Run the Crew
crew = Crew(
    agents=[filenet_archivist],
    tasks=[find_contract_task],
)

result = crew.kickoff()
print("### Agent Result ###")
print(result)

Common Integration Issues
- CMIS Connection Refused: Ensure the CMIS_URL is reachable from inside the Docker container. If FileNet runs on the Docker host, use host.docker.internal instead of localhost.
- Authentication: Legacy FileNet systems often use Basic Auth over HTTP. If you get SSL errors, you may need to disable SSL verification in cmislib (not recommended for production) or mount the correct certificates.
- Port Binding: If the agent cannot find the MCP tools, ensure the server log shows Running on http://0.0.0.0:8000. If it says 127.0.0.1, the server is bound to loopback only and cannot be reached from outside the Docker container.
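The first issue in this list can be narrowed down with a small standard-library probe run inside the container. This is a sketch under stated assumptions: `cmis_endpoint` and `can_reach` are hypothetical helper names, and the default URL simply mirrors the one used in the Dockerfile. The probe only confirms that the TCP port accepts connections; it says nothing about whether CMIS is enabled or the credentials are valid.

```python
import os
import socket
from urllib.parse import urlsplit


def cmis_endpoint(url: str) -> tuple[str, int]:
    """Extract (host, port) from a CMIS service URL."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"No host in CMIS URL: {url!r}")
    # Fall back to the scheme's default port when none is given.
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    url = os.getenv(
        "CMIS_URL",
        "http://host.docker.internal:9080/fncmis/resources/Service",
    )
    host, port = cmis_endpoint(url)
    status = "reachable" if can_reach(host, port) else "NOT reachable"
    print(f"{host}:{port} is {status}")
```

If the probe reports the host as not reachable from inside the container but the same URL works from the host machine, the cause is usually the localhost versus host.docker.internal distinction described above.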
🛡️ Quality Assurance
- Status: ✅ Verified
- Environment: Python 3.11
- Auditor: AgentRetrofit CI/CD