This is Part 1 of a 3-part series on Spark SQL Metrics:

What Are SQL Metrics?

Every physical operator in Spark SQL can define metrics — counters that track what happened during query execution. When you click on a query in the Spark SQL tab and see numbers like “number of output rows: 5,000” or “peak memory: 512.0 MiB”, those are SQL metrics.

They are built on Spark’s AccumulatorV2 framework: each task updates its local copy, and the driver aggregates them after task completion.

The Five Metric Types

Spark defines five metric types, each with different aggregation and display semantics:

1. Sum (createMetric)

The simplest type. Values from all tasks are summed into a single total.

Display format: 1,234,567

Typical usage: Row counts, file counts, partition counts.

"numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows")

2. Size (createSizeMetric)

For byte-based measurements. Shows the total plus per-task distribution.

Display format: total (min, med, max): 512.0 MiB (128.0 MiB, 128.0 MiB, 128.0 MiB)

Typical usage: Peak memory, spill size, data size, shuffle bytes.

"peakMemory" -> SQLMetrics.createSizeMetric(sparkContext, "peak memory")

The (min, med, max) breakdown reveals per-task distribution — essential for detecting skew. If max is 10x the median, one task is doing most of the work.

3. Timing (createTimingMetric)

For millisecond durations. Shows total plus per-task distribution.

Display format: total (min, med, max): 5.0 s (100 ms, 1.2 s, 2.0 s)

Typical usage: Aggregation time, sort time, broadcast time, hash map build time.

"aggTime" -> SQLMetrics.createTimingMetric(sparkContext, "time in aggregation build")

4. Nanosecond Timing (createNanoTimingMetric)

Same as timing but accepts nanosecond values, converted to milliseconds for display.

Display format: Same as timing.

Typical usage: Shuffle write time (measured in nanoseconds for precision).

"shuffleWriteTime" -> SQLMetrics.createNanoTimingMetric(sc, "shuffle write time")

5. Average (createAverageMetric)

For per-task averages. Shows distribution of the average values across tasks.

Display format: avg (min, med, max): (1.2, 2.5, 6.3)

Typical usage: Hash probe efficiency.

"avgHashProbe" -> SQLMetrics.createAverageMetric(sparkContext, "avg hash probes per key")

Reading the “total (min, med, max)” Format

This is the most important format to understand:

peak memory
total (min, med, max)
512.0 MiB (128.0 MiB, 128.0 MiB, 128.0 MiB (stage 3.0: task 36))
FieldMeaning
totalSum across all tasks
minSmallest task value
medMedian (50th percentile)
maxLargest task value, annotated with (stage X: task Y)

Balanced workload: min ≈ med ≈ max

Skewed workload: max » med — investigate the annotated task

Complete SQL Metrics Reference

Scan Operators

MetricDisplay NameTypeOperators
numOutputRowsnumber of output rowssumDataSourceScanExec, DataSourceV2ScanExecBase, InMemoryTableScanExec, LocalTableScanExec
numFilesnumber of files readsumDataSourceScanExec
filesSizesize of files readsizeDataSourceScanExec
numPartitionsnumber of partitions readsumDataSourceScanExec
staticFilesNumstatic number of files readsumDataSourceScanExec
staticFilesSizestatic size of files readsizeDataSourceScanExec
metadataTimemetadata timetimingDataSourceScanExec
scanTimescan timetimingDataSourceScanExec
pruningTimedynamic partition pruning timetimingDataSourceScanExec

Aggregation Operators

MetricDisplay NameTypeOperators
numOutputRowsnumber of output rowssumAll aggregate operators
aggTimetime in aggregation buildtimingHashAggregateExec, ObjectHashAggregateExec, SortAggregateExec
peakMemorypeak memorysizeHashAggregateExec
spillSizespill sizesizeHashAggregateExec, ObjectHashAggregateExec
avgHashProbeavg hash probes per keyaverageHashAggregateExec
numTasksFallBackednumber of sort fallback taskssumHashAggregateExec, ObjectHashAggregateExec

Join Operators

MetricDisplay NameTypeOperators
numOutputRowsnumber of output rowssumAll join operators
buildDataSizedata size of build sidesizeShuffledHashJoinExec
buildTimetime to build hash maptimingShuffledHashJoinExec
spillSizespill sizesizeSortMergeJoinExec

Sort Operator

MetricDisplay NameTypeOperators
sortTimesort timetimingSortExec
peakMemorypeak memorysizeSortExec
spillSizespill sizesizeSortExec

Shuffle Exchange

MetricDisplay NameTypeOperators
dataSizedata sizesizeShuffleExchangeExec
numPartitionsnumber of partitionssumShuffleExchangeExec
shuffleBytesWrittenshuffle bytes writtensizeShuffle write
shuffleRecordsWrittenshuffle records writtensumShuffle write
shuffleWriteTimeshuffle write timensTimingShuffle write

Shuffle Read (via AQEShuffleReadExec)

MetricDisplay NameTypeOperators
numPartitionsnumber of partitionssumAQEShuffleReadExec
partitionDataSizepartition data sizesizeAQEShuffleReadExec
numCoalescedPartitionsnumber of coalesced partitionssumAQEShuffleReadExec
numSkewedPartitionsnumber of skewed partitionssumAQEShuffleReadExec
numSkewedSplitsnumber of skewed partition splitssumAQEShuffleReadExec
numEmptyPartitionsnumber of empty partitionssumAQEShuffleReadExec
remoteBlocksFetchedremote blocks readsumShuffle read
localBlocksFetchedlocal blocks readsumShuffle read
remoteBytesReadremote bytes readsizeShuffle read
remoteBytesReadToDiskremote bytes read to disksizeShuffle read
localBytesReadlocal bytes readsizeShuffle read
fetchWaitTimefetch wait timetimingShuffle read
recordsReadrecords readsumShuffle read
remoteReqsDurationremote reqs durationtimingShuffle read
remoteMergedReqsDurationremote merged reqs durationtimingShuffle read

Broadcast Exchange

MetricDisplay NameTypeOperators
dataSizedata sizesizeBroadcastExchangeExec
numOutputRowsnumber of output rowssumBroadcastExchangeExec
collectTimetime to collecttimingBroadcastExchangeExec
buildTimetime to buildtimingBroadcastExchangeExec
broadcastTimetime to broadcasttimingBroadcastExchangeExec

Python UDF Operators

MetricDisplay NameTypeOperators
pythonDataSentdata sent to Python workerssizeAll Python operators
pythonDataReceiveddata returned from Python workerssizeAll Python operators
pythonBootTimetime to start Python workerstimingAll Python operators
pythonInitTimetime to initialize Python workerstimingAll Python operators
pythonTotalTimetime to run Python workerstimingAll Python operators
pythonProcessingTimetime to execute Python codetimingAll Python operators
pythonNumRowsReceivednumber of output rowssumAll Python operators

Window Operators

MetricDisplay NameTypeOperators
spillSizespill sizesizeWindowExec, ArrowWindowPythonExec

Write Operators

MetricDisplay NameTypeOperators
numFilesnumber of written filessumFile writes
numOutputByteswritten outputsizeFile writes
numOutputRowsnumber of output rowssumFile writes
numPartsnumber of dynamic partsumFile writes
taskCommitTimetask commit timetimingFile writes
jobCommitTimejob commit timetimingFile writes

MERGE INTO Operator

MetricDisplay NameTypeOperators
numTargetRowsCopiedtarget rows copied unmodifiedsumMergeRowsExec
numTargetRowsInsertedtarget rows insertedsumMergeRowsExec
numTargetRowsUpdatedtarget rows updatedsumMergeRowsExec
numTargetRowsDeletedtarget rows deletedsumMergeRowsExec
numTargetRowsMatchedUpdatedtarget rows updated by matched clausesumMergeRowsExec
numTargetRowsMatchedDeletedtarget rows deleted by matched clausesumMergeRowsExec
numTargetRowsNotMatchedBySourceUpdatedtarget rows updated by not matched by sourcesumMergeRowsExec
numTargetRowsNotMatchedBySourceDeletedtarget rows deleted by not matched by sourcesumMergeRowsExec

Stateful Streaming Operators

MetricDisplay NameTypeOperators
numOutputRowsnumber of output rowssumStateful operators
numTotalStateRowsnumber of total state rowssumStateful operators
numRowsDroppedByWatermarkrows dropped by watermarksumStateful operators
stateMemorymemory used by statesizeStateful operators
allUpdatesTimeMstime to updatetimingStateful operators
allRemovalsTimeMstime to removetimingStateful operators
commitTimeMstime to commit changestimingStateful operators

Other Operators

MetricDisplay NameTypeOperators
numOutputRowsnumber of output rowssumFilterExec, ProjectExec, ExpandExec, GenerateExec, CommandResultExec, WindowGroupLimitExec, UnionLoopExec, PythonWorkerLogsExec
numAnchorOutputRowsnumber of anchor output rowssumUnionLoopExec
numIterationsnumber of recursive iterationssumUnionLoopExec
dataSizedata sizesizeSubqueryBroadcastExec
collectTimetime to collecttimingSubqueryBroadcastExec

Columnar Batch Operators

MetricDisplay NameTypeOperators
numInputBatchesnumber of input batchessumColumnar transitions
numOutputBatchesnumber of output batchessumColumnar transitions
numInputRowsnumber of input rowssumColumnar transitions
numOutputRowsnumber of output rowssumColumnar transitions

WholeStageCodegen and Metric Scope

Most operators (FilterExec, ProjectExec, HashAggregateExec, joins) are fused by WholeStageCodegen into a single JVM method. Their row count metrics (numOutputRows) are individually accurate, but they don’t have individual timing because they execute as one compiled function.

Operators that have phases executing outside the codegen pipeline and have their own timing:

  • SortExec (sort time)
  • Aggregations (aggregation build time)
  • ShuffledHashJoinExec (hash map build time)
  • BroadcastExchangeExec (collect/build/broadcast time)
  • ShuffleExchangeExec (shuffle write time)
  • Python UDF operators (Python worker time)
  • Stateful streaming operators (update/remove/commit time)

In Part 2, we’ll cover how SQL metrics are implemented internally (the AccumulatorV2 lifecycle), and how AQE uses shuffle statistics at runtime to rewrite query plans. In Part 3, we’ll cover the DataSource V2 CustomMetric extension API, UI rendering, and the REST API.