MCP Server

CatalystOps exposes a built-in Model Context Protocol (MCP) server that lets Claude, GitHub Copilot, Cursor, and any MCP-compatible client access live analysis data through natural language.

Auto-Discovery

In VS Code 1.99+, the MCP server is discovered automatically via `vscode.lm.registerMcpServerDefinitionProvider`. No configuration is needed: install CatalystOps and the server becomes available to Copilot and Claude.

For other clients, add the URL shown in the CatalystOps Output panel:

```json
{
  "servers": {
    "catalystops": {
      "url": "http://127.0.0.1:<port>/mcp"
    }
  }
}
```

The port is dynamic and logged on extension startup.

Enabling / Disabling

The MCP server is enabled by default. Disable it with:

```jsonc
{
  "catalystops.mcp.enabled": false
}
```

Tools

| Tool | Description |
| --- | --- |
| `analyze_pyspark` | Run local static analysis on any PySpark code snippet |
| `get_active_file_issues` | Get issues for the currently open file |
| `get_plan_analysis` | Get plan issues from the last dry run |
| `run_dry_run` | Trigger a dry run on the active file and return results |
| `get_billing_summary` | Get cached billing data (day / week / month) |
| `refresh_billing` | Force a live billing query |
| `list_clusters` | List workspace clusters with state and Spark version |
| `list_job_runs` | List jobs and their most recent run status |
| `get_job_run_plan` | Fetch plan issues from a historical job run by ID |
| `get_last_job_run_analysis` | Get plan issues from the last job analyzed in VS Code |
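On the wire, MCP tool invocations are JSON-RPC 2.0 `tools/call` requests. As an illustration, here is a sketch of the request a client would send to invoke `analyze_pyspark`; the `code` argument name is an assumption, so query `tools/list` for the tool's real input schema:

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request as defined by the MCP spec."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# The `code` parameter name is an assumption for illustration only.
payload = make_tool_call(1, "analyze_pyspark", {"code": "df.collect()"})
print(payload)
```

In practice an MCP client library builds these messages for you; the sketch only shows what crosses the HTTP transport.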

Resources

| URI | Description |
| --- | --- |
| `catalystops://issues/current` | Issues for the active file |
| `catalystops://plans/last` | Last dry-run plan results |
| `catalystops://billing/summary` | Cached billing summary |
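Resources are fetched with a `resources/read` request, again per the MCP spec's JSON-RPC 2.0 framing. A minimal sketch of the message for the active-file issues resource:

```python
import json

def make_resource_read(request_id: int, uri: str) -> dict:
    """Build a JSON-RPC 2.0 `resources/read` request per the MCP spec."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }

print(json.dumps(make_resource_read(2, "catalystops://issues/current"), indent=2))
```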

Prompts

| Prompt | Description |
| --- | --- |
| `pyspark_code_review` | Review the active file for PySpark performance issues |
| `optimize_spark_plan` | Analyze the last dry-run plan and suggest optimizations |

Example Usage

```text
You:    What issues are in my active file?
Claude: ↳ calling get_active_file_issues…
        Found 3 issues in pipeline.py:
        ● Line 12 — collect() pulls all data to the driver. OOM risk.
        ● Line 18 — Global orderBy shuffles all data to one partition.
        ● Line 31 — Streaming query has no .trigger() — continuous micro-batches.

You:    Run a dry run and tell me the join strategy
Claude: ↳ calling run_dry_run…
        The plan shows a Sort-Merge Join on line 8 — the right side
        (200 MB) is below the broadcast threshold. Want me to add a broadcast hint?
```

Released under the Elastic License 2.0.