All Rules Reference

This page lists every rule CatalystOps checks. For context and fix examples, see Local Analysis and Plan Analysis.

Static Code Rules

Critical

Rule ID	Name	Description
`CODE_COLLECT_001`	collect() Usage	Brings all data to the driver — OOM risk
`CODE_ITER_COLLECT_001`	for-loop over collect()	Row-by-row iteration on the driver
`CODE_CROSSJOIN_001`	Cross Join	Cartesian product: output rows = left rows × right rows
`CODE_SQL_INJECT_001`	SQL Injection	f-string interpolated into spark.sql(); untrusted values become part of the query text
`CODE_KAFKA_COMMIT_001`	Kafka Auto-Commit	Can cause data loss or duplication
`CODE_DLT_CDC_ORDER_001`	DLT CDC Clause Order	APPLY AS DELETE WHEN in wrong order
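
To make the idea of a static rule concrete, the sketch below flags two of the patterns from the table above by walking a Python syntax tree. It is an illustration only and makes no claim about how CatalystOps implements these rules; the function name `find_violations` and the matching heuristics (any zero-argument `.collect()` call, any `.sql()` call whose first argument is an f-string) are assumptions chosen for brevity.

```python
import ast


def find_violations(source: str) -> list[tuple[str, int]]:
    """Return (rule_id, line_number) pairs for two patterns in `source`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only method-style calls such as obj.method(...) are of interest.
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        # Heuristic (assumption): a zero-argument .collect() call.
        if node.func.attr == "collect" and not node.args and not node.keywords:
            findings.append(("CODE_COLLECT_001", node.lineno))
        # Heuristic (assumption): .sql(...) whose first argument is an f-string.
        if (
            node.func.attr == "sql"
            and node.args
            and isinstance(node.args[0], ast.JoinedStr)
        ):
            findings.append(("CODE_SQL_INJECT_001", node.lineno))
    # ast.walk is breadth-first, so sort by line number for stable output.
    return sorted(findings, key=lambda finding: finding[1])


# The sample is only parsed, never executed, so `df` and `spark` need not exist.
sample = '''
rows = df.collect()
spark.sql(f"SELECT * FROM users WHERE id = {user_id}")
spark.sql("SELECT 1")
'''

findings = find_violations(sample)
print(findings)
```

Because the check works on source text alone, it needs no cluster and no Spark session. The trade-off is that it cannot tell a Spark `collect()` from an unrelated method of the same name.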

Warning

Rule ID	Name	Description
`CODE_PANDAS_001`	toPandas()	Brings all data to the driver; OOM risk
`CODE_COUNT_001`	count() > 0	Use isEmpty() instead
`CODE_COALESCE_001`	coalesce(1)	Funnels all data to one partition
`CODE_ORDERBY_001`	Global orderBy	Full shuffle
`CODE_REPARTITION_001`	repartition before write	Forces a full shuffle; use coalesce() to reduce partition count
`CODE_WINDOW_001`	Window without partitionBy	Global window; all rows move to a single partition
`CODE_WITHCOLUMN_LOOP_001`	withColumn in loop	New plan node per iteration
`CODE_REPEATED_ACTIONS_001`	Repeated actions without cache	Recomputes the DataFrame
`CODE_UDF_FILTER_001`	UDF in filter()	Blocks predicate pushdown
`CODE_STREAMING_TRIGGER_001`	No .trigger()	Micro-batches run back-to-back with no interval
`CODE_STREAMING_WATERMARK_001`	Streaming groupBy without watermark	Unbounded state
`CODE_STREAMING_INNER_JOIN_001`	Streaming inner join	Silently drops late events
`CODE_DYNAMIC_ALLOC_001`	Dynamic allocation on streaming	Cluster instability
`CODE_AUTOLOADER_RATE_001`	Auto Loader without rate limit	Unbounded ingestion
`CODE_CHECKPOINT_DBFS_001`	Checkpoint on DBFS	Use cloud storage
`CODE_OPTIMIZE_MERGE_001`	OPTIMIZE after every MERGE	Latency spikes
`CODE_DROP_CREATE_001`	DROP + CREATE TABLE	Non-atomic, use CREATE OR REPLACE
`CODE_FLOAT_FINANCIAL_001`	FLOAT for financial data	Use DECIMAL
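
The reasoning behind `CODE_FLOAT_FINANCIAL_001` can be seen without Spark. The sketch below uses plain Python, whose `float` is a 64-bit binary double; a 32-bit FLOAT column has less precision still. It is meant only to show the class of rounding error that a decimal type avoids, and the amounts are made up for illustration.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 exactly, so repeated sums drift.
float_total = sum([0.10] * 3)

# Decimal stores base-10 digits exactly; start the sum from a Decimal zero.
decimal_total = sum([Decimal("0.10")] * 3, Decimal("0"))

print(float_total == 0.30)               # False
print(decimal_total == Decimal("0.30"))  # True
```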

Info

Rule ID	Name	Description
`CODE_UDF_001`	UDF Usage	Prevents query plan optimization
`CODE_SCHEMA_001`	Schema Inference	Extra data pass
`CODE_SELECT_STAR_001`	SELECT *	Reads unnecessary columns
`CODE_WRITE_MODE_001`	Missing write mode	Defaults to errorifexists; the write fails if the target already exists
`CODE_REPRO_001`	Repeated source without cache	Multiple scans
`CODE_ROCKSDB_001`	Stateful streaming without RocksDB	State store performance
`CODE_UNNAMED_QUERY_001`	Unnamed streaming query	Hard to monitor
`CODE_MERGE_DV_001`	MERGE without Deletion Vectors	Performance on large tables
`CODE_MERGE_RLC_001`	MERGE without Row-Level Concurrency	Concurrency issues
`CODE_ZORDER_001`	ZORDER	Legacy layout technique; Databricks recommends Liquid Clustering for new tables
`CODE_ANALYZE_001`	Missing ANALYZE TABLE	Stale statistics
`CODE_DLT_PARTITION_001`	DLT PARTITIONED BY	Use Liquid Clustering
`CODE_READ_FILES_SCHEMA_001`	read_files() without schemaHints	Schema inference per run

Plan-Level Detectors

These run after a dry run or job run analysis and require an actual Catalyst execution plan:

Name	Description
`BroadcastHashJoin` (missing)	Sort-merge join where broadcast would be faster
`CartesianProduct`	Cartesian join in the physical plan
`ShuffleExchange`	Unnecessary shuffle
`SinglePartitionBottleneck`	Exchange SinglePartition — all data on one executor
`SortAggregate`	Sort-based aggregation (prefer hash-based)
`GlobalWindow` / `RunningWindowFunction`	Global window, Photon-aware
`RepeatedTableScan`	Same table scanned multiple times
`MissingPartitionFilter`	Partition filters empty — full table scan
`MissingTableStatistics`	Table has no statistics
`CacheSpill`	Cached data spilling to disk
`TooFewPartitions`	Low parallelism for data size
`CrossJoin`	Cartesian join (also caught by local analysis)
`UnionSchemaMismatch`	Column order mismatch (also caught by local analysis)
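
As a rough illustration of what a plan-level detector looks at, the sketch below searches physical plan text (such as what `df.explain()` prints) for operator names from the table above. It is a simplified assumption, not the CatalystOps implementation: the `PLAN_MARKERS` mapping and `detect` function are hypothetical, and the sample plan is a hand-written, abbreviated stand-in rather than output from a real run.

```python
# Hypothetical mapping: detector name -> substring searched for in the plan text.
PLAN_MARKERS = {
    "CartesianProduct": "CartesianProduct",
    "SinglePartitionBottleneck": "Exchange SinglePartition",
    "SortAggregate": "SortAggregate",
}


def detect(plan_text: str) -> list[str]:
    """Return the detector names whose marker appears in the plan text."""
    return [name for name, marker in PLAN_MARKERS.items() if marker in plan_text]


# Hand-written, abbreviated plan for illustration only (not real output).
plan = """
== Physical Plan ==
Window [...]
+- Sort [...]
   +- Exchange SinglePartition
      +- Scan [...]
"""

detected = detect(plan)
print(detected)
```

Substring matching is enough to show why these detectors need a real execution plan: operators such as `Exchange SinglePartition` appear only after Catalyst has planned the query, so they cannot be inferred reliably from source code.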