Presto Newsletter #1

Welcome to the first issue of the Presto Newsletter.

 

Events

Presto Summit 2018 recap
The first-ever all-day Presto Summit brought together many Presto users, committers, and other big data analytics fans. Participants from over 40 companies joined us on July 16th. The agenda was filled with high-quality talks from some of the leading members of the Presto community.
Here is a link to the topics covered and the slides:
https://www.starburstdata.com/technical-blog/presto-summit-2018-recap/

 

Presto News and Knowledge

Querying 8.66 Billion Records – a Performance and Cost Comparison between Presto and Redshift (including Spectrum)
This is a very detailed post from Ernesto at Concurrency Labs comparing Presto to Redshift. The comparison includes cost and performance for both solutions and is worth the read:
https://www.concurrencylabs.com/blog/starburst-presto-vs-aws-redshift/

Using Presto to query on-premises object stores
Nitish at Minio, a distributed object store for private clouds (roll your own S3), wrote a great post on creating your own object store analytics hub using Presto:
https://blog.minio.io/presto-modern-interactive-sql-query-engine-for-enterprise-ce56d7aea931

Demo: Querying Presto from Qlik Sense
This demo shows how easy it is to use Qlik Sense to query Presto:
https://www.youtube.com/watch?v=X9lFdues_wE

Presto query optimizer: Pursuit of performance
Starburst CTO Kamil Bajda-Pawlikowski and Facebook’s Martin Traverso presented at the DataWorks Summit on Presto’s new cost-based optimizer:
https://dataworkssummit.com/san-jose-2018/session/presto-query-optimizer-pursuit-of-performance/

Using Presto for GeoSpatial Analytics
Also at DataWorks Summit, Uber engineers talked about using Presto for GeoSpatial Analytics:
https://dataworkssummit.com/san-jose-2018/session/geospatial-data-platform-at-uber/

Presto at Tivo, Boston Hadoop Meetup
See how TiVo uses Presto for SQL analytics. This excellent presentation covers a few important topics:
– TiVo’s decision-making process for choosing Presto over Redshift Spectrum
– Choosing the correct AWS instance type for their Presto workloads
– How the different memory structures in Presto work together
– Using MySQL and S3 together to create TiVo’s data warehouse
https://www.slideshare.net/JustinBorgman1/presto-at-tivo-boston-hadoop-meetup

Presto TPC-DS benchmark on AWS
Before the cost-based optimizer was introduced, Presto had issues running all of the TPC-DS queries. That’s no longer the case, and performance is much better than in older versions of Presto:
https://www.starburstdata.com/technical-blog/starburst-presto-on-aws-18x-faster-than-emr/

Big Data File Formats – ORC, Parquet & AVRO
At Starburst, we field a lot of questions from customers and prospects on which source file format to use. The answer is usually situation-dependent. This article on Datanami from Alex Woodie does an excellent job of breaking down each format and its advantages and disadvantages in different situations:
https://www.datanami.com/2018/05/16/big-data-file-formats-demystified/

Third-party Presto benchmarks
Here are two excellent articles on Presto performance comparison benchmarks. It’s no wonder Presto’s popularity has exploded over the last few years:
http://bytes.schibsted.com/bigdata-sql-query-engine-benchmark/
https://virtuslab.com/blog/benchmarking-spark-sql-presto-hive-bi-processing-googles-cloud-dataproc/

 

Releases and New Features

Starburst Presto 203e released:
https://www.starburstdata.com/technical-blog/starburst-enterprise-distribution-of-presto-203e-now-available/

-AWS Glue Integration
-New geospatial functions and improved geospatial function performance
-Additional SQL subquery support
-SQL FILTER clause for aggregations
-Column-level access control
-Support for authentication with JWT access token
-Various bug fixes that continue to improve the robustness of Presto
-Improvements to query scheduling and resource management

We would like to thank the members of the Presto community for the following contributions:
-Maria Basmanova from Facebook – new geospatial functions and optimizations
-Rentao Wu from AWS – Glue Catalog support
-Li Ding – SQL FILTER clause for aggregations
and many, many more!

 

Engineer’s Corner

Iceberg – A modern table format for big data from Netflix
During the first-ever Presto Summit last week, Netflix presented “Iceberg,” a new table format for storing large, slow-moving tabular data. Their presentation and GitHub links:
https://www.slideshare.net/kbajda/presto-summit-2018-09-netflix-iceberg/
https://github.com/Netflix/iceberg