
Static Cost Estimation

Get an instant dollar estimate for a PySpark script without running it — by annotating your file with compute and data size hints.

Annotations

Add a # @compute comment anywhere in the file to describe your cluster, and # @size comments on read operations to describe data sizes:

```python
# @compute: nodes=4, cores=2, memory=16GB, rate=0.25

events = spark.read.parquet("s3://bucket/events")   # @size: 50GB
lookup = spark.read.csv("s3://bucket/lookup")       # @size: 200MB
```

CatalystOps shows the estimated cost inline via CodeLens above the # @compute annotation.
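To make the relationship between the hints and the dollar figure concrete, here is a toy model of how compute and size hints *could* combine into an estimate. This is an illustrative sketch only, not CatalystOps' actual model: the per-core throughput assumption and the `estimate_cost` helper are hypothetical.

```python
# Toy cost model (illustrative, not the CatalystOps implementation).
# Assumption: runtime scales with total input size divided by cluster
# throughput, and cost is runtime * nodes * DBU rate.

GB = 1024 ** 3

def estimate_cost(total_input_bytes, nodes, cores, rate_per_hour,
                  throughput_bytes_per_core_hour=50 * GB):
    """Rough cost in dollars for scanning the annotated inputs once."""
    cluster_throughput = nodes * cores * throughput_bytes_per_core_hour
    runtime_hours = total_input_bytes / cluster_throughput
    return runtime_hours * nodes * rate_per_hour

# 50 GB + 200 MB of input on the 4-node, 2-core cluster at $0.25/hr
total = 50 * GB + 200 * 1024 ** 2
print(round(estimate_cost(total, nodes=4, cores=2, rate_per_hour=0.25), 4))  # 0.1255
```

The real estimator also has to account for the query plan (joins, shuffles, filters), which is why the annotations sit alongside actual PySpark code rather than in a standalone calculator.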

Compute Parameters

| Parameter | Description |
| --- | --- |
| `nodes` | Number of worker nodes |
| `cores` | vCPUs per node |
| `memory` | RAM per node (e.g. `16GB`) |
| `rate` | DBU rate in $/hr |
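Since the annotation is a plain key=value comment, parsing it is straightforward. The sketch below (not CatalystOps source) shows one way to pull the parameters out of a line, assuming the comma-separated format above:

```python
import re

# Parse a "# @compute: key=value, ..." annotation into a dict of strings.
# Hypothetical helper for illustration; values are left untyped here.
COMPUTE_RE = re.compile(r"#\s*@compute:\s*(.+)")

def parse_compute(line):
    match = COMPUTE_RE.search(line)
    if not match:
        return None
    params = {}
    for pair in match.group(1).split(","):
        key, _, value = pair.strip().partition("=")
        params[key] = value
    return params

print(parse_compute("# @compute: nodes=4, cores=2, memory=16GB, rate=0.25"))
# {'nodes': '4', 'cores': '2', 'memory': '16GB', 'rate': '0.25'}
```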

Size Parameters

# @size accepts values like 50GB, 200MB, 1TB. Place it at the end of the line with the spark.read call.
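Converting those size strings to bytes is a small exercise in itself. A possible sketch, assuming binary units (1 GB = 1024³ bytes; CatalystOps may interpret units differently):

```python
import re

# Convert a "# @size: 50GB"-style hint into a byte count.
# Illustrative helper; the binary-unit assumption is mine.
UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}
SIZE_RE = re.compile(r"#\s*@size:\s*([\d.]+)\s*(KB|MB|GB|TB)", re.IGNORECASE)

def parse_size(line):
    match = SIZE_RE.search(line)
    if not match:
        return None
    value, unit = match.groups()
    return int(float(value) * UNITS[unit.upper()])

print(parse_size('events = spark.read.parquet("s3://bucket/events")   # @size: 50GB'))
# 53687091200
```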

DBU Rate

The default DBU rate comes from catalystops.cost.dbuRatePerHour (default 0.4). Override it per-file with the rate parameter in # @compute.
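Assuming CatalystOps is configured through VS Code settings (the CodeLens behavior above suggests an editor extension), the default rate could be changed in `settings.json` like this. The setting name comes from the text above; the surrounding JSON is illustrative:

```json
{
  "catalystops.cost.dbuRatePerHour": 0.55
}
```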

Released under the Elastic License 2.0.