Deep Dive into Spark SQL Metrics (Part 1): Types, Full Reference, and What They Mean

Part 1 of a multi-part deep dive into Apache Spark’s SQL metrics system. Covers the 5 metric types, a complete reference of 100+ metrics across all operators, and how to read the numbers in the Spark UI.

April 1, 2026 · 7 min · Kent Yao

Deep Dive into Spark SQL Metrics (Part 2): Internals and How AQE Uses Them

Part 2 of the SQL Metrics deep dive. How metrics flow from tasks to driver, and how Adaptive Query Execution uses shuffle statistics to rewrite plans at runtime.

April 1, 2026 · 7 min · Kent Yao

Deep Dive into Spark SQL Metrics (Part 3): Extension APIs, UI, and REST API

Part 3 of the SQL Metrics deep dive. How to extend Spark with custom metrics via the DataSource V2 API, how the UI renders them, and how to query metrics programmatically.

April 1, 2026 · 9 min · Kent Yao

Deep Dive into Spark SQL Metrics (Part 4): How Gluten Extends the Metrics System

Part 4 of the SQL Metrics deep dive. How Apache Gluten bridges native Velox/ClickHouse metrics back to Spark’s SQL Metrics framework, adding 60+ metrics that vanilla Spark doesn’t have.

April 1, 2026 · 12 min · Kent Yao

Spark Declarative Pipelines: A Paradigm Shift for Data Engineering

Apache Spark 4.1 introduces Spark Declarative Pipelines (SDP) — a declarative framework that lets you define what your data should look like, not how to compute it. As a Spark PMC Member, here’s my take on what this means for data engineering.

March 28, 2026 · 3 min · Kent Yao

Introducing spark-advisor: An AI-Powered Spark Performance Engineer

spark-advisor is an agent skill that turns your AI coding assistant into a Spark performance engineer — diagnosing slow jobs, detecting skew, comparing benchmark runs, and producing actionable tuning recommendations.

March 20, 2026 · 4 min · Kent Yao

spark-history-cli: Making the Spark History Server Agent-Friendly

spark-history-cli brings the Spark History Server to your terminal — an interactive REPL and one-shot CLI that covers all 20 REST API endpoints. List apps, inspect jobs, drill into stages, check SQL executions, and download event logs without ever opening a browser. It also ships as a GitHub Copilot CLI skill.

March 18, 2026 · 6 min · Kent Yao

The SQL Execution Detail Page Finally Shows You What Your Jobs Are Doing

The SQL execution detail page in Spark’s Web UI used to show jobs as comma-separated IDs. Now it has a full Associated Jobs table with status, duration, stage progress, and task progress bars — so you can debug SQL queries without clicking through each job individually.

March 14, 2026 · 4 min · Kent Yao

Dark Mode Comes to the Apache Spark Web UI

Apache Spark’s Web UI now supports dark mode — a long-awaited quality-of-life improvement for developers who spend hours debugging jobs. Here’s why we built it and what it means for the Spark community.

March 6, 2026 · 4 min · Kent Yao

Rethinking SQL Plan Visualization in Apache Spark

The Spark SQL plan visualization just got a major upgrade — compact node labels, clickable metric panels, and edge row counts that make join explosions immediately visible.

March 5, 2026 · 4 min · Kent Yao