<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Query-Optimizer on Kent Yao</title>
    <link>https://yaooqinn.github.io/tags/query-optimizer/</link>
    <description>Recent content in Query-Optimizer on Kent Yao</description>
    <generator>Hugo -- 0.157.0</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 27 May 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://yaooqinn.github.io/tags/query-optimizer/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>−46% or −2%? Rule-Based Rewriters Only Work at Home</title>
      <link>https://yaooqinn.github.io/posts/query-engines/rule-rewrite-blindspot-dsb/</link>
      <pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate>
      <guid>https://yaooqinn.github.io/posts/query-engines/rule-rewrite-blindspot-dsb/</guid>
      <description>On TPC-H 10GB, a state-of-the-art learned rewriter cuts mean execution time from 69.84s to 37.57s, a 46% win. On DSB 10GB, the same rewriter moves mean execution time from 32.62s to 31.93s, a 2.1% non-event. The gap isn&amp;rsquo;t query difficulty; it&amp;rsquo;s whether the benchmark is in the rewriter&amp;rsquo;s training distribution. &amp;ldquo;Rule-based systems are stable and reliable&amp;rdquo; is often a benchmark artifact, not an engineering fact.</description>
    </item>
    <item>
      <title>Anatomy of a 120-Line Prompt That Lets an LLM Rewrite Physical Plans</title>
      <link>https://yaooqinn.github.io/posts/query-engines/prompt-anatomy-for-plan-generation/</link>
      <pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate>
      <guid>https://yaooqinn.github.io/posts/query-engines/prompt-anatomy-for-plan-generation/</guid>
      <description>DBPlanBench gets GPT-5 to deliver a 4.78× geometric-mean speedup on DataFusion TPC-H SF10 by letting the model rewrite physical plans directly. I read its sql_optimization_prompts.py end to end — 120 lines, 30 of methodology, 90 of contract. That ratio is the most transferable thing in the paper.</description>
    </item>
    <item>
      <title>Branch Flip Analysis: A White-Box Way to Find Performance Bugs, and What It Means for Spark</title>
      <link>https://yaooqinn.github.io/posts/spark/branch-flip-analysis-from-postgres-to-spark/</link>
      <pubDate>Tue, 26 May 2026 00:00:00 +0000</pubDate>
      <guid>https://yaooqinn.github.io/posts/spark/branch-flip-analysis-from-postgres-to-spark/</guid>
      <description>An ETH paper finds 21 previously unknown performance bugs in PostgreSQL, MySQL, CockroachDB and MariaDB by flipping optimization branches on and off. The technique is conceptually simple, the surface in Spark is unusually inviting, and the open-source engine community already ships one of the building blocks.</description>
    </item>
    <item>
      <title>Just Asking an LLM to Rewrite SQL Does Almost Nothing</title>
      <link>https://yaooqinn.github.io/posts/query-engines/llm-only-rewrite-doesnt-work/</link>
      <pubDate>Tue, 26 May 2026 00:00:00 +0000</pubDate>
      <guid>https://yaooqinn.github.io/posts/query-engines/llm-only-rewrite-doesnt-work/</guid>
      <description>On TPC-H 10GB, asking GPT-4o to rewrite SQL takes mean execution time from 78.81s down to 74.92s — almost nothing. Swap in an open 14B model, feed it plans, add a reward, fine-tune once, and the same workload drops to 29.67s. Whether LLMs can help SQL rewriting is not a question about model strength; it&amp;rsquo;s a question about whether you&amp;rsquo;re willing to give the model the signals it actually needs.</description>
    </item>
    <item>
      <title>LLMs for Join Order: An Apache Spark Perspective on the Three-Tier Ladder</title>
      <link>https://yaooqinn.github.io/posts/spark/llm-for-join-order-an-apache-spark-perspective/</link>
      <pubDate>Mon, 25 May 2026 00:00:00 +0000</pubDate>
      <guid>https://yaooqinn.github.io/posts/spark/llm-for-join-order-an-apache-spark-perspective/</guid>
      <description>Databricks and UPenn put an LLM agent to work as an offline join-order tuner and cut P90 latency by 41%, with a 1.288× geomean speedup, on JOB&amp;rsquo;s 113 queries, beating even perfect cardinality estimates. From the trenches of an open-source query engine, here is what that result does and does not prove.</description>
    </item>
  </channel>
</rss>
