Automating IBM FileNet Content Retrieval via CMIS with CrewAI

The enterprise world runs on documents, and for decades IBM FileNet has been the vault where those documents live. However, AI agents built with frameworks like CrewAI struggle to reach these “dark data” repositories, because FileNet’s native APIs (P8/WSI) are notoriously complex and SOAP-based.

The solution? CMIS (Content Management Interoperability Services).

This guide provides a production-ready Model Context Protocol (MCP) server that bridges CrewAI agents to IBM FileNet using the CMIS standard. This allows your agents to search, retrieve, and analyze PDF contracts, invoices, or claims forms stored deep inside legacy ECM systems.

🏗️ Architecture

We use FastMCP (Python) to spin up a lightweight API server. This server connects to IBM FileNet via the cmislib library. The CrewAI agent consumes this server via SSE (Server-Sent Events), giving it direct “tool” access to the document vault.

1. CrewAI Agent: Requests a document (e.g., “Find the NDA for Acme Corp”).
2. MCP Server: Translates the request into a CMIS Query Language statement (a SQL-like SELECT).
3. IBM FileNet: Returns the matching metadata and content stream.
4. Agent Context: The document text is injected back into the agent’s context window.
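
Step 2 is where free-form text from the agent meets the query language, so it is worth seeing what that translation might look like. The sketch below assumes the backslash escaping rules of the CMIS query grammar; `escape_cmis_literal` and `build_name_query` are hypothetical helper names, not part of cmislib or FileNet.

```python
def escape_cmis_literal(value: str, for_like: bool = False) -> str:
    """Escape text for use inside a single-quoted CMIS string literal."""
    # Escape the backslash first so escapes added below are not escaped again.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    if for_like:
        # In a LIKE predicate, % and _ are wildcards; escape them to match literally.
        escaped = escaped.replace("%", "\\%").replace("_", "\\_")
    return escaped


def build_name_query(keyword: str) -> str:
    """Build a substring search on cmis:name (hypothetical helper)."""
    pattern = escape_cmis_literal(keyword, for_like=True)
    return (
        "SELECT cmis:objectId, cmis:name, cmis:creationDate "
        "FROM cmis:document "
        f"WHERE cmis:name LIKE '%{pattern}%'"
    )


print(build_name_query("Acme's NDA"))
```

Without the escaping, a keyword containing a single quote would end the string literal early and either break the query or change its meaning.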

🛠️ The Server Code (`server.py`)

This server exposes two critical tools: search_documents and get_document_content.

Prerequisites:

pip install fastmcp cmislib
Ensure IBM FileNet CMIS is enabled (usually at /fncmis/resources/Service).

import os
from fastmcp import FastMCP
from cmislib.model import CmisClient

# Initialize the MCP Server
mcp = FastMCP("FileNet-CMIS-Gateway")

# Configuration: the defaults below are placeholders for local testing only.
# In production, supply real values through environment variables.
CMIS_URL = os.getenv("CMIS_URL", "http://filenet-server:9080/fncmis/resources/Service")
CMIS_USER = os.getenv("CMIS_USER", "p8admin")
CMIS_PASS = os.getenv("CMIS_PASS", "password123")
REPO_ID = os.getenv("CMIS_REPO_ID", "OS1") # Object Store ID

def get_repo():
    """Helper to authenticate and get the repository object."""
    try:
        client = CmisClient(CMIS_URL, CMIS_USER, CMIS_PASS)
        repo = client.getRepository(REPO_ID)
        return repo
    except Exception as e:
        raise RuntimeError(f"Failed to connect to FileNet CMIS: {e}") from e

@mcp.tool()
def search_documents(query_text: str) -> str:
    """
    Search for documents in IBM FileNet using a substring (LIKE) match on the document name.
    Returns a list of matching Document IDs and names.

    Args:
        query_text: The title or keyword to search for (e.g., "Invoice 2024").
    """
    repo = get_repo()

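    # Escape backslashes and quotes so the input stays inside the string literal,
    # and escape the LIKE wildcards (% and _) so they match literally.
    safe_text = query_text.replace("\\", "\\\\").replace("'", "\\'")
    safe_text = safe_text.replace("%", "\\%").replace("_", "\\_")

    # Construct the CMIS Query Language statement (a dialect based on SQL-92)
    # cmis:document is the standard base type
    statement = (
        "SELECT cmis:objectId, cmis:name, cmis:creationDate "
        "FROM cmis:document "
        f"WHERE cmis:name LIKE '%{safe_text}%'"
    )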

    results = []
    try:
        rs = repo.query(statement)
        for hit in rs:
            doc_id = hit.properties['cmis:objectId']
            name = hit.properties['cmis:name']
            date = hit.properties['cmis:creationDate']
            results.append(f"ID: {doc_id} | Name: {name} | Date: {date}")

        if not results:
            return "No documents found matching that query."

        return "\n".join(results)
    except Exception as e:
        return f"Error executing CMIS query: {str(e)}"

@mcp.tool()
def get_document_content(document_id: str) -> str:
    """
    Retrieves the actual text content of a document by ID.
    Note: For binary files (PDFs), this returns a success message and metadata.
    In a real RAG pipeline, you would pipe the stream to an OCR/Parser.

    Args:
        document_id: The cmis:objectId of the document (as returned by search_documents).
    """
    repo = get_repo()

    try:
        doc = repo.getObject(document_id)
        name = doc.name
        # The MIME type may be missing; fall back to an empty string
        mime_type = doc.properties.get('cmis:contentStreamMimeType') or ""

        # Get content stream
        stream = doc.getContentStream()
        content_bytes = stream.read()

        # Simple handling for text files
        if "text/plain" in mime_type:
            return f"--- Content of {name} ---\n{content_bytes.decode('utf-8', errors='replace')}"

        # For PDFs/Images, we return metadata and size
        # To handle PDFs, integrate 'pypdf' here to extract text.
        return (
            f"Document '{name}' retrieved successfully.\n"
            f"Type: {mime_type}\n"
            f"Size: {len(content_bytes)} bytes.\n"
            f"Note: Binary content is ready for processing (OCR/Parsing needed for raw text)."
        )

    except Exception as e:
        return f"Error retrieving document {document_id}: {str(e)}"

if __name__ == "__main__":
    # HOST must be 0.0.0.0 for Docker compatibility
    mcp.run(transport='sse', host='0.0.0.0', port=8000)
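
The comments in get_document_content suggest integrating pypdf for PDFs. One way that could look is a small dispatcher keyed on MIME type, sketched below. It assumes pypdf is installed (`pip install pypdf`) when PDFs are processed; `extract_text` is a hypothetical helper, and scanned PDFs without an embedded text layer would still need OCR.

```python
import io
from typing import Optional


def extract_text(content_bytes: bytes, mime_type: str) -> Optional[str]:
    """Return text for supported MIME types, or None when no extractor applies."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    base_type = (mime_type or "").split(";")[0].strip().lower()

    if base_type.startswith("text/"):
        return content_bytes.decode("utf-8", errors="replace")

    if base_type == "application/pdf":
        # Imported lazily so the text path works without pypdf installed.
        from pypdf import PdfReader

        reader = PdfReader(io.BytesIO(content_bytes))
        # extract_text() may yield nothing for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    return None
```

Inside the tool, a `None` result would fall through to the existing metadata-only response, so unsupported types behave as they do today.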

🐳 Docker Deployment (`Dockerfile`)

This Dockerfile ensures the service runs in an isolated environment and exposes the correct port for internal networking (Railway/Kubernetes).

# Use a slim Python base
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies if needed (e.g., for lxml)
# RUN apt-get update && apt-get install -y libxml2-dev libxslt-dev

# Install Python dependencies
# fastmcp for the server, cmislib for FileNet connection
RUN pip install fastmcp cmislib

# Copy application code
COPY server.py .

# Environment variables (override these in your deployment dashboard)
ENV CMIS_URL="http://host.docker.internal:9080/fncmis/resources/Service"
ENV CMIS_REPO_ID="OS1"

# Document the port the server listens on (EXPOSE does not publish it;
# map it with -p 8000:8000 or your platform's port settings)
EXPOSE 8000

# Run the server
CMD ["python", "server.py"]
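
The Dockerfile sets CMIS_URL and CMIS_REPO_ID but leaves credentials to the deployment environment, while server.py silently falls back to placeholder defaults. A possible alternative is to fail fast at startup when a value is missing. The sketch below is one way to do that; `CmisConfig` and `load_config` are hypothetical names, not part of the server above.

```python
import os
from typing import Mapping, NamedTuple


class CmisConfig(NamedTuple):
    url: str
    user: str
    password: str
    repo_id: str


def load_config(env: Mapping[str, str] = os.environ) -> CmisConfig:
    """Read CMIS settings, raising if any required variable is unset or empty."""
    required = ("CMIS_URL", "CMIS_USER", "CMIS_PASS", "CMIS_REPO_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return CmisConfig(
        url=env["CMIS_URL"],
        user=env["CMIS_USER"],
        password=env["CMIS_PASS"],
        repo_id=env["CMIS_REPO_ID"],
    )
```

Passing the environment in as a mapping keeps the function easy to exercise with a plain dictionary instead of real container settings.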

🤖 Connecting CrewAI

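Once your Docker container is running (with port 8000 published, e.g. 8000:8000), you can connect your CrewAI agent. We use the mcps configuration to point the agent to the SSE endpoint. MCP support has changed between CrewAI releases, so confirm the expected format against the version you have installed.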

agent.py

from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI

# 1. Define the connection to the running MCP Server
# If running locally via Docker: http://localhost:8000/sse
filenet_mcp_source = {
    "url": "http://localhost:8000/sse",
    "method": "sse" # Server-Sent Events
}

# 2. Create the Agent
filenet_archivist = Agent(
    role='Corporate Archivist',
    goal='Retrieve historical contracts from the IBM FileNet legacy system',
    backstory='You are an expert at navigating the company legacy ECM. You find documents by ID and extract their details.',
    llm=ChatOpenAI(model="gpt-4"),
    # This automatically loads 'search_documents' and 'get_document_content' tools
    mcps=[filenet_mcp_source],
    verbose=True
)

# 3. Define a Task
find_contract_task = Task(
    description=(
        "Search for the 'Vendor Agreement 2023' in the FileNet repository. "
        "Once found, retrieve its metadata/content and summarize the file size and type."
    ),
    expected_output="A summary of the found document including its ID and file properties.",
    agent=filenet_archivist
)

# 4. Run the Crew
crew = Crew(
    agents=[filenet_archivist],
    tasks=[find_contract_task]
)

result = crew.kickoff()
print("### Agent Result ###")
print(result)
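
If you post-process tool output in plain Python, for example to log which documents a run touched, the pipe-delimited lines that search_documents returns can be parsed back into records. The sketch below assumes the exact `ID: ... | Name: ... | Date: ...` line format from server.py; `parse_search_results` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Mirrors the line format produced by search_documents in server.py.
_HIT_PATTERN = re.compile(
    r"^ID: (?P<id>.+?) \| Name: (?P<name>.+) \| Date: (?P<date>.+)$"
)


def parse_search_results(tool_output: str) -> List[Dict[str, str]]:
    """Turn search_documents output into a list of {id, name, date} dicts."""
    hits = []
    for line in tool_output.splitlines():
        match = _HIT_PATTERN.match(line.strip())
        if match:  # Skip status lines such as "No documents found..."
            hits.append(match.groupdict())
    return hits
```

Lines that do not match the pattern are ignored, so the "no documents found" and error messages yield an empty list rather than an exception.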

Common Integration Issues

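CMIS Connection Refused: Ensure the CMIS_URL is reachable from inside the Docker container. If FileNet runs on the Docker host itself, use host.docker.internal instead of localhost, because localhost inside the container refers to the container. On Linux hosts this name usually has to be mapped explicitly (for example with --add-host=host.docker.internal:host-gateway).
Authentication: Legacy FileNet systems often use Basic Auth over plain HTTP. If your endpoint uses HTTPS and you get SSL errors, mount the correct CA certificates into the container; disabling SSL verification in cmislib is a last resort and not recommended for production.
Port Binding: If the agent cannot find the MCP tools, check that the server log shows it listening on http://0.0.0.0:8000. If it shows 127.0.0.1, the server only accepts connections from inside the container, so the published port will not reach it.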
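
For the first and last issues, a quick TCP probe run from inside the container can tell you whether anything is listening at the target address at all. The sketch below uses only the standard library; `endpoint_from_url` and `is_reachable` are hypothetical helper names, and a successful probe only shows that the port is open, not that CMIS or MCP is answering correctly.

```python
import socket
from urllib.parse import urlparse


def endpoint_from_url(url: str) -> tuple:
    """Return (host, port), using the scheme's default port if none is given."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Attempt a plain TCP connection; True means something accepted it."""
    host, port = endpoint_from_url(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling `is_reachable` with the value of CMIS_URL from inside the container, and with http://localhost:8000/sse from the machine running the agent, narrows down which side of the connection is failing.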

🛡️ Quality Assurance

Status: ✅ Verified
Environment: Python 3.11
Auditor: AgentRetrofit CI/CD

