Matt Fuller, Co-founder at Starburst
10 minute read
Next week, we will be releasing the Starburst Distribution of Presto 195e. Based on prestodb/presto 0.195, Starburst’s 195e will ship with Presto’s first cost-based optimizer! In our performance testing and in collaboration with customers in our beta program, we are measuring greater than an order of magnitude performance improvement for many analytical queries such as TPC-H and TPC-DS queries.
Over the last year, our team at Starburst in collaboration with the team at Facebook have been heads down working on a state of the art cost-based optimizer (CBO). In the coming weeks leading up to and after our release, we’ll be publishing a series of blog posts describing our the CBO design, engineering efforts, and benchmarking results.
All the CBO work we’ve done will be open sourced. Much of it is already merged into prestodb/presto and we at Starburst will continue to work on merging the remaining pieces. In the meantime, once 195e is released you’ll be able to download 195e and try out Presto’s first CBO!
Before I tell you more about the CBO in Presto, let me first give some background. SQL is a declarative language where you describe what data you would like and not the algorithmic steps on how to do it. It is up to the SQL query engine to determine the sequence of steps, commonly referred to as a query plan, to process the query and return the data to the user.
Depending on the complexity of the SQL query there are many (often exponentially many) different query plans that return the same results. However, the performance of each plan varies dramatically. Yes, there are queries that could run subsecond or never finish in multiple lifetimes depending on the chosen plan!
One of the main goals of the CBO is to explore the space of possible query plans and find the optimal one. However because this is an NP search space, it may not be the optimal plan, but a close to an optimal plan. e.g. the absolute optimal query plan probably does not matter if it takes hours or more and lots of resources to find when a “pretty good” plan can be found in milliseconds.
Presto with Non-CBO
Prior to the CBO, Presto’s optimizer consisted only of a large number of query rewrites. These rewrite optimizations are important and rely on good-enough-heuristics. For example, predicate push down is generally a good choice and even a wrong decision does not cause a performance disaster. All these rewrites will still exist in the coming release and the CBO will address the biggest gaps in Presto optimization:
- Join Reordering
- Join Distribution Choice
These optimizations are much harder to rely on heuristics and therefore must use a model to cost the trade-offs of the multiple query plans. These are also the most important optimizations to make. We will save discussion of other optimizations such as partition pruning, dynamic filtering, column pruning, query runtime optimizations for other blog posts.
Prior to the CBO, Presto determined the join order based on how the query was syntactically written. It was up to the person writing SQL to carefully craft the query so it performs well — remember the part about SQL supposed to be declarative?
For example consider,
SELECT * FROM foo JOIN bar ON foo.a=bar.x JOIN pio on pio.q = bar.z;
As written, Presto would first join tables foo and bar. And then join table pio to the result of that. However, depending on the data properties, it might be optimal to join bar and pio first. And then join foo to the result of that. Using the CBO, Presto will be able to intelligently decide the best sequence. We will discuss this in MUCH greater detail in the coming blog posts.
Join Distribution Choice
In Presto, there are two types of join distribution choices:
- Replicated (sometimes also called broadcast)
We won’t go into too many details now, but each choice has their own advantages depending on the data properties and upstream pieces of the plan we are joining. A repartitioned join allows for distributed join execution across a cluster by partitioning the data on both join inputs. This choice usually works best when both inputs are large. However, if one input to the join is small enough, it may make more sense to simply replicate or copy the contents to each node in the cluster. Doing so could avoid an expensive repartition of the other join input.
Prior to the CBO, there was a feature flag in Presto to determine if the join distribution type should be either Replicated or Repartitioned. However, it was impossible to mix the two types for queries with more than 1 join. So you had to decide which is overall better — often flipping the flag may help some joins and hurt others making performance for other queries unpredictable.
Presto with CBO
In the Presto 195e release (and in a nearterm release of prestodb/presto), we address these two aforementioned gaps in Presto by the introduction of the CBO. In the coming series of blog posts we will describe in detail how Presto’s CBO chooses an optimal plan. Topics will include Join Enumeration, Cost Model, and Statistics, and SPI changes to plug Presto connectors into the CBO. This will certainly be one of the most exciting Presto releases to date!
Why does this matter?
Presto is widely used as the ad hoc query engine for querying data from S3, HDFS, MySQL and many any other data sources (...we’re working on Azure Blob storage and others). Not only is it a pain to mindlessly wait for your query to complete, it is also increasing your infrastructure bill and causes you to do less in the time you have!
Surely a “human optimizer” could spend time cleverly crafting their queries and determine which feature flags to flip for each query. And maybe the human time spent still beats the time to execute a disastrous query plan. However, many interact with Presto via a tool such a Tableau, Superset, Microstrategy, Qlikview, etc. These tools programmatically generate the queries and the ability to override the SQL is very limited or completely impossible.
We hope you check back in the coming weeks as some of our Presto committers that worked on this CBO project will be blogging about it. Expect to see posts from:
Expect to read details about the statistics calculations, changes to the Presto SPI for connectors to plug into the CBO, cost modeling, join enumeration, and more.