Data Lakes without Hadoop

Migrating to the cloud has dominated the news, and many companies are shuttering their data centers and letting cloud providers handle infrastructure for them. Elasticity, simplicity, and infrastructure agility are all great reasons to move, but many companies continue to host their own infrastructure. Their reasons may be security, or a belief that the cloud doesn’t provide cost benefits in their scenario.

For these companies, building a data lake usually means…

Read More »

Presto Memory Connector

There is a highly efficient connector for Presto! It works by storing all data in memory on Presto Worker nodes, which allows for extremely fast access times with high throughput while keeping CPU overhead to a bare minimum.

Read More »

True Separation of Storage and Compute

For the last few years, the hot topic in any organization has been the separation of storage and compute. With data volumes and the variety of data being stored increasing daily, placing this data on a flexible storage medium such as HDFS, or on cloud object storage such as Amazon’s S3 and Azure’s Blob storage, gives a company great flexibility in when and where it consumes this data.

Read More »

Presto Join Enumeration

Welcome back to the series of blog posts (check out our previous post!) about Presto’s first Cost-Based Optimizer! Today let’s focus on the challenge of choosing the optimal join order. The order in which relations are joined substantially affects the performance of a query. A poor join order can introduce unnecessary CPU and network overhead. To overcome that, the Starburst Presto release includes a state-of-the-art join enumeration algorithm that will greatly benefit its users. Let’s start with a quick introduction to how the Presto join enumerator will speed up your common queries, and then we will discuss the algorithm in more detail.
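To make the effect of join order concrete, here is a minimal sketch of cost-based join ordering. It is a toy model and not Presto's actual algorithm: the table names, row counts, selectivities, and the functions `join_selectivity`, `plan_cost`, and `best_join_order` are all hypothetical, and it only considers left-deep plans whose cost is the sum of intermediate result sizes.

```python
from itertools import permutations

# Hypothetical row counts per table (illustrative only, not benchmark data).
ROWS = {"lineitem": 6_000_000, "orders": 1_500_000, "customer": 150_000}

# Hypothetical join selectivities: the fraction of the cross product
# that survives the join predicate between two tables.
SELECTIVITY = {
    frozenset({"lineitem", "orders"}): 1 / 1_500_000,
    frozenset({"orders", "customer"}): 1 / 150_000,
}


def join_selectivity(joined, new_table):
    """Combined selectivity of all predicates linking new_table to the joined set.

    Pairs without a predicate contribute 1.0, which models a cross join.
    """
    sel = 1.0
    for table in joined:
        sel *= SELECTIVITY.get(frozenset({table, new_table}), 1.0)
    return sel


def plan_cost(order):
    """Toy cost of a left-deep plan: the sum of estimated intermediate row counts."""
    joined = [order[0]]
    rows = ROWS[order[0]]
    cost = 0.0
    for table in order[1:]:
        rows = rows * ROWS[table] * join_selectivity(joined, table)
        cost += rows
        joined.append(table)
    return cost


def best_join_order(tables):
    """Exhaustively enumerate every left-deep order and keep the cheapest."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    for order in permutations(ROWS):
        print(" JOIN ".join(order), f"cost={plan_cost(order):,.0f}")
    print("cheapest:", " JOIN ".join(best_join_order(list(ROWS))))
```

In this toy model, any order that joins `lineitem` directly with `customer` first has no predicate to apply and builds a huge cross product, so its cost is orders of magnitude higher than that of orders that follow the join predicates. Exhaustive enumeration is only practical for a handful of tables, which is why real optimizers rely on smarter enumeration strategies.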

Read More »

Presto Cost-Based Optimizer rocks the TPC benchmarks!

As mentioned in our previous blog post about the Starburst Presto release and its hottest addition, the Cost-Based Optimizer for Presto, we’re happy to share the results of the benchmarks we ran for this release (195e), comparing it to the ‘vanilla’ Presto release 195. We will now continue the process of getting all those CBO-related changes merged into the ‘vanilla’ Presto repository.

The benchmarks were performed using a standard set of TPC-H and TPC-DS queries. As a side note, I would like to highlight that, thanks to our team’s contributions over the last couple of years, Presto supports 100% of the TPC benchmark queries and executes them unmodified, that is, with no prohibited query modifications. You can find the queries in our repository.

Read More »

Starburst Enterprise Distribution of Presto 195e Now Available!

Today, I am pleased to announce the availability of Presto 195e, including Presto’s first Cost-Based Optimizer! With the new optimizer you should expect to see significant improvements in Presto’s query performance. Our team, in collaboration with Facebook, spent the last year heads down working on it, so you can understand why we are pretty excited that this day has finally come! You can read more about Starburst’s state-of-the-art optimizer here.

Read More »

Presto gets EVEN FASTER, with a 10-15x performance boost in upcoming release!

Next week, we will be releasing the Starburst Distribution of Presto 195e. Based on prestodb/presto 0.195, Starburst’s 195e will ship with Presto’s first cost-based optimizer! In our performance testing, and in collaboration with customers in our beta program, we are measuring greater than an order of magnitude performance improvement for many analytical queries, such as those in TPC-H and TPC-DS.

Read More »

AWS Data Analytics Platform – Starburst Data’s Vision

As more and more companies turn to low-cost object storage to store a majority of their data, providing easy access to this data has become vitally important. Transforming and loading data into other systems that provide specific features is still necessary for certain requirements, but querying an object store directly is gaining popularity.

Read More »