Release Notes March 9, 2026 v0.8.0

See Inside Your Spark Jobs:
Query Plan Visualization Comes to VS Code

If you've ever stared at a slow Databricks job wondering why it's slow — this release is for you. CatalystOps 0.8.0 ships the Explain Plan view and DAG visualization: two new tools that bring your Spark physical query plan directly into VS Code, with cost scores on every node and one-click fixes for the most common performance killers.

The Problem

Debugging a slow PySpark job has always meant leaving your editor, navigating to the Databricks UI, clicking through the Spark UI, squinting at a wall of plan text, and then manually figuring out what to do about it.

Even experienced data engineers find the raw physical plan hard to read. And once you've identified the problem — a bad join, an unnecessary shuffle, a repeated scan — you still have to remember the right fix and apply it by hand.

CatalystOps 0.8.0 changes that.

What's New in 0.8.0

🔍 Explain Plan View

After running a dry run against your Databricks cluster, CatalystOps now populates a sidebar tree showing your full physical query plan — broken down node by node, with a cost score on each operation.

No more digging through the Spark UI. The plan lives in your editor, right next to your code. Sort-merge joins, exchanges (shuffles), and repeated scans are flagged automatically so you know exactly where your DBUs are going.
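To make the idea of per-node cost scores concrete, here is a minimal sketch in plain Python. It is not the extension's actual scoring logic: the point values are copied from the example DAG later in this post, every other operator is assumed to score 0, and the plan text is hand-written and simplified rather than real `DataFrame.explain()` output.

```python
import re

# Points per operator, copied from the example DAG in this post.
# Scoring every other operator as 0 is an assumption of this sketch.
COST_POINTS = {"SortMergeJoin": 50, "Exchange": 30}

# Hand-written, simplified plan text for illustration (not real Spark output).
SAMPLE_PLAN = """\
HashAggregate(keys=[region], functions=[sum(revenue)])
+- Exchange hashpartitioning(region, 200)
   +- SortMergeJoin [product_id], [product_id], Inner
      :- SortMergeJoin [customer_id], [customer_id], Inner
      :  :- FileScan parquet orders
      :  +- FileScan parquet customers
      +- FileScan parquet products
"""

# Skip tree-drawing characters and an optional "*(n)" codegen marker,
# then capture the operator name.
OPERATOR = re.compile(r"^[\s:+\-]*(?:\*\(\d+\)\s*)?([A-Za-z]+)")


def score_plan(plan_text):
    """Return one (operator, points) pair per plan line, in plan order."""
    scored = []
    for line in plan_text.splitlines():
        match = OPERATOR.match(line)
        if not match:
            continue  # blank line or a line with no operator name
        operator = match.group(1)
        scored.append((operator, COST_POINTS.get(operator, 0)))
    return scored


# Keep only the nodes that carry a non-zero score.
flagged = [(op, pts) for op, pts in score_plan(SAMPLE_PLAN) if pts > 0]
for op, pts in flagged:
    print(f"{op}: {pts} pts")
```

Keeping the scores in a lookup table keeps the rules easy to read and adjust. A real analyzer would also need operator arguments such as join keys and partitioning, which this sketch ignores.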

🗺 Interactive DAG Visualization

Open the Plan DAG (CatalystOps: Show Plan DAG) to see your query rendered as an interactive graph in a VS Code webview panel. The graph exposes unnecessary stages, fan-outs, and redundant operations that are hard to spot in the tree view.
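One reason a graph helps is that shuffle chains become a simple path property. The sketch below is plain Python and purely illustrative: the node names and edges are a hypothetical plan shape, not output from CatalystOps, and `max_shuffles` is a helper invented for this example.

```python
# Hypothetical plan graph: each node maps to the nodes it reads from.
# The edges are assumed for illustration only.
PLAN_DAG = {
    "HashAggregate": ["Exchange#1"],
    "Exchange#1": ["SortMergeJoin#2"],
    "SortMergeJoin#2": ["Exchange#2", "FileScan products"],
    "Exchange#2": ["SortMergeJoin#1"],
    "SortMergeJoin#1": ["FileScan orders", "FileScan customers"],
    "FileScan orders": [],
    "FileScan customers": [],
    "FileScan products": [],
}


def max_shuffles(dag, node):
    """Largest number of Exchange nodes on any path from `node` down to a scan."""
    own = 1 if node.startswith("Exchange") else 0
    children = dag.get(node, [])
    # A leaf has no children, so its deepest child path contributes 0.
    return own + max((max_shuffles(dag, child) for child in children), default=0)


print(max_shuffles(PLAN_DAG, "HashAggregate"))  # 2 for the graph above
```

In this made-up graph the deepest path crosses two exchanges, which is the kind of multi-stage shuffle chain the DAG view is meant to surface.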

🗺 Live Example: Query Plan DAG
A typical PySpark join query and what CatalystOps flags in its plan
analysis.py
from pyspark.sql import functions as F

# `spark` is assumed to be an active SparkSession
orders = spark.table("orders")
customers = spark.table("customers")
products = spark.table("products")

# Large join: no broadcast hints
result = (
  orders
  .join(customers, "customer_id")
  .join(products, "product_id")
  .groupBy("region")
  .agg(F.sum("revenue"))
)
CatalystOps: Plan DAG (2 issues found)
HashAggregate sum(revenue)
Exchange hashpartition · 30 pts
SortMergeJoin orders ⋈ customers · 50 pts
Exchange hashpartition · 30 pts
SortMergeJoin ⋈ products · 50 pts
FileScan orders
FileScan customers
FileScan products
Severity legend: Critical, Warning, Info, OK

⚡ Context-Aware Quick Fixes

CatalystOps doesn't just show you the problem — it offers one-click fixes directly on plan tree nodes. These aren't generic suggestions; they're generated from the actual plan for your specific query:

Problem detected | Quick fix offered
Inefficient join (missing broadcast) | Add broadcast hint
Unnecessary exchange / shuffle | Add repartition hint
Repeated scan without caching | Add .persist()
Sort-merge join with AQE disabled | Set AQE config
Cartesian product | Add join condition hint
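For reference, three of these fixes correspond to standard PySpark calls you could also apply by hand. The sketch below shows roughly what that might look like for the example query. The function names are hypothetical, the code CatalystOps generates may differ, and the broadcast hint assumes that customers and products are small enough to ship to every executor.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, not needed at runtime
    from pyspark.sql import DataFrame, SparkSession


def join_with_broadcast(
    orders: "DataFrame", customers: "DataFrame", products: "DataFrame"
) -> "DataFrame":
    """Hint Spark to broadcast both lookup tables instead of shuffling orders."""
    # Assumption: customers and products fit comfortably in executor memory.
    return (
        orders
        .join(customers.hint("broadcast"), "customer_id")
        .join(products.hint("broadcast"), "product_id")
    )


def cache_for_reuse(df: "DataFrame") -> "DataFrame":
    """Mark a DataFrame that several downstream queries read for caching."""
    # persist() is lazy: the data is cached when the first action runs.
    return df.persist()


def enable_aqe(spark: "SparkSession") -> None:
    """Turn on Adaptive Query Execution for the current session."""
    spark.conf.set("spark.sql.adaptive.enabled", "true")
```

A broadcast hint is a request, not a guarantee, and it can hurt if the hinted table turns out to be large, so it is worth re-running the dry run after applying one to confirm the sort-merge join is actually gone from the plan.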

⏱ Configurable Dry-Run Timeout

Set your own timeout via catalystops.dryRun.timeoutSeconds (default: 300s, minimum: 30s). No more jobs silently timing out mid-analysis on large plans.

Why This Matters

Most PySpark performance issues fall into a handful of categories: bad joins, unnecessary shuffles, repeated scans, missing statistics. They're common, they're expensive, and they're fixable — if you can spot them.

The problem has always been visibility. The Spark physical plan contains everything you need, but it's buried in a UI most developers don't open until something is already on fire. CatalystOps 0.8.0 brings that visibility into your editor, before you ship to production, with the fixes already written for you.

🔥 Pro tip

After your first dry run, open the Plan DAG alongside the plan tree. The graph view makes multi-stage shuffle chains obvious at a glance.

Getting Started

  1. Install CatalystOps from the VS Code Marketplace or Open VSX
  2. Connect your Databricks workspace in the extension settings
  3. Open a PySpark file and run a dry run (CatalystOps: Run Dry Run)
  4. Open the Explain Plan panel in the sidebar
  5. Click any node to see cost details and available quick fixes
  6. Open the DAG view — CatalystOps: Show Plan DAG

What's Next

0.8.0 lays the groundwork for deeper plan analysis. On the roadmap: multi-file plan correlation, historical plan comparison, and cost trend tracking across runs.

Got feedback? Open an issue on GitHub or drop a review on the VS Code Marketplace.


Free & open source

Try CatalystOps today

Catch PySpark performance issues before they hit production — inline, in your editor.