Kamil Bajda-Pawlikowski, Co-founder and CTO at Starburst
As you may have learned from our first press release, we have announced the creation of Starburst, a new independent company solely focused on Presto, an open source distributed SQL engine.
If you are new to Presto, please read more about its unique SQL-on-Anything capabilities.
In this blog post, I would like to talk a bit about how we got where we are today.
Who are we?
Let’s start with some introductions!
Justin Borgman and I are both Yale alumni and co-founders of Hadapt, the first SQL-on-Hadoop company, acquired by Teradata in 2014 to form the Teradata Center for Hadoop. Our focus was to evangelize Presto and accelerate its customer adoption.
Matt Fuller and Wojciech Biela were engineering leaders at Teradata, heading the Presto product development efforts in the Boston and Warsaw offices respectively. Both Matt and Wojciech came from Hadapt along with Justin and me, as did the majority of our engineers.
Together, we spearheaded Teradata’s first widely successful involvement in an open source project. We engaged with the Presto community and over time became the most active contributors and committers to the project outside Facebook.
Teradata & Presto
Our team has been working with Presto for three years. We really liked the Presto architecture and codebase, as well as the team at Facebook who started the project, so we decided to officially join the open source community in 2015. We announced our involvement at Hadoop Summit in June of that year by making the initial release of our distribution of Presto generally available.
With that first release, Teradata committed to helping the community make Presto a fully featured, enterprise-ready SQL engine through open source contributions from engineers at the Teradata Center for Hadoop. Our team released quarterly enterprise editions of the Presto distribution and delivered on our initial promise.
Let’s take a brief look at the key contributions from our team while we were at Teradata:
- Presto-Admin, an easy-to-use installation & management utility for Presto
- Security Integrations such as Kerberos, LDAP, and in-transit encryption
- ANSI SQL syntax enhancements which led to full TPC-H and TPC-DS support
- Enterprise-grade ODBC and JDBC drivers to enable BI tools such as Tableau and Qlik
- Presto connector enhancements for SQL Server and Cassandra
- Numerous query execution performance improvements
- Spill-to-disk capabilities for large intermediate data sets
A future blog post will dive into more details of the capabilities we delivered.
Presto Users and Community
Throughout our involvement in the open source community, Teradata sponsored various Hadoop conferences and meetups featuring Presto as a great interactive and highly scalable SQL engine for all data sources. Our voice was heard, and in the last two years Presto experienced unprecedented growth in popularity and user adoption at companies of all shapes and sizes, from fast-growing internet companies to the Fortune 500.
A large number of new users significantly expanded the initial group of early adopters such as Facebook, Airbnb, Dropbox, Groupon, and Netflix. The acceleration of the Presto roadmap and successful proofs of concept led to production deployments at Twitter, Uber, LinkedIn, and Slack, as well as Bloomberg, Yahoo! Japan, Comcast, and FINRA.
Dozens of others are listed on the GitHub wiki page, and there are many more who prefer to remain unnamed or who leverage Presto as part of cloud offerings such as Amazon EMR or Athena for their particular use cases.
We will share the success stories of many Presto users in the near future.
Starburst will continue to contribute to the open source Presto project and ship new releases of our Presto distribution every few months, just as we did at Teradata. The goal remains the same: to provide a well-tested, production-ready Presto for enterprise usage. We will be advancing Presto rapidly and offering the same high-quality support. Starburst will help the community keep Presto the best open source SQL engine for a variety of data sources at massive scale.