DataFusion

Anatomy of a 120-Line Prompt That Lets an LLM Rewrite Physical Plans

DBPlanBench gets GPT-5 to deliver a 4.78× geometric-mean speedup on DataFusion TPC-H SF10 by letting the model rewrite physical plans directly. I read its sql_optimization_prompts.py end to end: 120 lines, of which 30 cover methodology and 90 specify the contract. That 1:3 ratio is the most transferable thing in the paper.

LLMs Shouldn't Replace the Query Optimizer — They Should Sit Behind It

Placing the LLM after the optimizer, where it emits JSON patches for local plan tuning, is easier to reason about as an engineering problem than asking it to replace the cost-based optimizer.
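
To make the "sit behind the optimizer" idea concrete, here is a minimal sketch of applying a model-emitted JSON patch to an already-optimized plan. The plan schema, the field names, and the apply_patch helper are all assumptions made for illustration; they are not DataFusion's serialization or DBPlanBench's actual patch format. The patch syntax follows the RFC 6902 style, and only the "test" and "replace" operations are handled.

```python
import copy
import json

# Hypothetical optimizer output as nested dicts. This schema is an assumption,
# not DataFusion's or DBPlanBench's real plan serialization.
plan = {
    "op": "HashJoinExec",
    "mode": "Partitioned",
    "children": [
        {"op": "ParquetExec", "table": "lineitem"},
        {"op": "ParquetExec", "table": "orders"},
    ],
}


def resolve(doc, pointer):
    """Walk a JSON Pointer (RFC 6901) and return (parent_container, final_key)."""
    if not pointer.startswith("/"):
        raise ValueError(f"pointer must start with '/': {pointer!r}")
    parts = [p.replace("~1", "/").replace("~0", "~") for p in pointer[1:].split("/")]
    node = doc
    for part in parts[:-1]:
        node = node[int(part)] if isinstance(node, list) else node[part]
    last = parts[-1]
    return node, (int(last) if isinstance(node, list) else last)


def apply_patch(doc, patch):
    """Apply 'test' and 'replace' ops to a deep copy; the input is never mutated."""
    patched = copy.deepcopy(doc)
    for op in patch:
        parent, key = resolve(patched, op["path"])
        if op["op"] == "test":
            # Guard: the model must be editing the plan it believes it is editing.
            if parent[key] != op["value"]:
                raise ValueError(f"test failed at {op['path']}")
        elif op["op"] == "replace":
            parent[key]  # raises KeyError/IndexError if the target does not exist
            parent[key] = op["value"]
        else:
            raise ValueError(f"unsupported op: {op['op']}")
    return patched


# Example model output: switch the join mode and swap the two join inputs.
llm_output = """[
  {"op": "test", "path": "/op", "value": "HashJoinExec"},
  {"op": "replace", "path": "/mode", "value": "CollectLeft"},
  {"op": "replace", "path": "/children/0/table", "value": "orders"},
  {"op": "replace", "path": "/children/1/table", "value": "lineitem"}
]"""

patched = apply_patch(plan, json.loads(llm_output))
print(json.dumps(patched, indent=2))
```

Because the patch is applied to a copy and every operation either succeeds or raises, a rejected patch leaves the optimizer's original plan as the fallback, which is much of what makes this arrangement easy to reason about.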