
Reference Architectures

General Architecture

Presto is a distributed system that runs on one or more machines to form a cluster. An installation includes one Presto Coordinator and any number of Presto Workers. Users submit their queries to the Coordinator, which is responsible for parsing, planning, and scheduling query execution across the Workers. Adding more Workers increases parallelism, which can make query processing faster.
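As a minimal sketch of how the Coordinator/Worker split appears in configuration, the Python snippet below renders an `etc/config.properties` file for each role. The property names (`coordinator`, `node-scheduler.include-coordinator`, `http-server.http.port`, `discovery-server.enabled`, `discovery.uri`) follow the open source Presto deployment documentation. The host name, the port, and the helper function names are hypothetical, and a production file would typically also set memory limits.

```python
# Sketch: render Presto's etc/config.properties for a Coordinator and a Worker.
# Host name, port, and function names are hypothetical placeholders.


def render_properties(props: dict) -> str:
    """Serialize a dict as Java-style key=value lines (insertion order kept)."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def coordinator_config(discovery_uri: str, port: int = 8080) -> str:
    return render_properties({
        "coordinator": "true",
        # Keep the Coordinator free for parsing, planning, and scheduling.
        "node-scheduler.include-coordinator": "false",
        "http-server.http.port": str(port),
        # The Coordinator hosts the discovery service that Workers register with.
        "discovery-server.enabled": "true",
        "discovery.uri": discovery_uri,
    })


def worker_config(discovery_uri: str, port: int = 8080) -> str:
    return render_properties({
        "coordinator": "false",
        "http-server.http.port": str(port),
        # Workers locate the Coordinator through its discovery URI.
        "discovery.uri": discovery_uri,
    })


if __name__ == "__main__":
    uri = "http://coordinator.example.net:8080"  # hypothetical Coordinator host
    print(coordinator_config(uri))
    print(worker_config(uri))
```

Every Worker receives the same file, so adding Workers means starting more nodes with `worker_config(...)` and letting them register with the Coordinator.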

Presto on AWS

Deploy Presto on Amazon EC2 instances using the Starburst Marketplace offering. You can then configure the Presto cluster to query data in an existing Hadoop cluster, Amazon EMR, Amazon S3, or any other data source the cluster can access.
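To query tables that an existing Hadoop or EMR cluster registers in a Hive metastore (with data in HDFS or S3), Presto needs a Hive catalog file. The sketch below renders a minimal `etc/catalog/hive.properties`, assuming the open source PrestoDB Hive connector (`connector.name=hive-hadoop2`) and the usual metastore Thrift port 9083. The metastore host and the function name are hypothetical, and other Presto distributions may use a different connector name.

```python
# Sketch: render a minimal etc/catalog/hive.properties pointing Presto at an
# existing Hive metastore. Host name and function name are hypothetical.


def hive_catalog(metastore_host: str, metastore_port: int = 9083) -> str:
    props = {
        # Hive connector name in PrestoDB; may differ in other distributions.
        "connector.name": "hive-hadoop2",
        # Thrift URI of the metastore on the Hadoop/EMR cluster.
        "hive.metastore.uri": f"thrift://{metastore_host}:{metastore_port}",
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


if __name__ == "__main__":
    print(hive_catalog("emr-master.example.internal"))  # hypothetical host
```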

Presto on Azure

Deploy Presto as an HDInsight application to access data in Azure Blob Storage, Azure Data Lake Storage, and other data sources Presto can access, such as Microsoft SQL Server.
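As an illustration of adding SQL Server as a data source, the sketch below renders a catalog file such as `etc/catalog/sqlserver.properties`. It assumes the open source Presto SQL Server connector and its `connection-url`, `connection-user`, and `connection-password` properties. The host, database, credentials, and function name are hypothetical; in practice the password should come from a secret store rather than a literal.

```python
# Sketch: render a Presto catalog file for Microsoft SQL Server.
# All host names, credentials, and the function name are hypothetical.


def sqlserver_catalog(host: str, port: int, database: str,
                      user: str, password: str) -> str:
    props = {
        "connector.name": "sqlserver",
        # JDBC URL format used by the SQL Server connector.
        "connection-url": f"jdbc:sqlserver://{host}:{port};database={database}",
        "connection-user": user,
        "connection-password": password,
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


if __name__ == "__main__":
    print(sqlserver_catalog("sql.example.net", 1433, "sales", "presto", "change-me"))
```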

Presto on GCP

Deploy Presto directly from the Google Cloud Marketplace with Starburst Enterprise. Alternatively, deploy Presto on Google Kubernetes Engine (GKE) using the kubectl tool and a YAML file that describes the configuration.
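The kubectl route applies a Kubernetes manifest to the GKE cluster. The sketch below builds a minimal `apps/v1` Deployment for Presto Workers as a Python dict and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. This is not the Starburst manifest: the container image, names, and labels are hypothetical, and the configuration files each Worker needs are left out.

```python
# Sketch: a minimal Kubernetes Deployment for Presto Workers on GKE.
# Image, names, and labels are hypothetical; config mounts are omitted.
import json


def worker_deployment(replicas: int, image: str) -> dict:
    labels = {"app": "presto", "component": "worker"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "presto-worker", "labels": labels},
        "spec": {
            # Number of Worker pods; more replicas means more parallelism.
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "presto-worker",
                        "image": image,
                        "ports": [{"containerPort": 8080}],
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = worker_deployment(3, "registry.example.com/presto:latest")
    # Save the output to a file and apply it: kubectl apply -f presto-worker.json
    print(json.dumps(manifest, indent=2))
```

Because Workers register with the Coordinator, scaling out can be done by raising `replicas`, for example with `kubectl scale deployment presto-worker --replicas=5`. A complete setup would also need a Coordinator Deployment, a Service in front of it, and the Presto configuration mounted into each container.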

Starburst on GCP Architecture

Presto on Premises

Deploy Presto on premises, either co-located on your Hadoop cluster or on its own standalone cluster. If Presto is co-located on the Hadoop cluster, it must be the only compute engine running. For example, Spark and Presto complement each other in a data pipeline, but they should not run at the same time. You can also connect Presto to an on-premises object store such as MinIO, Ceph, Cloudian, or OpenIO.
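Object stores like MinIO and Ceph expose an S3-compatible API, so one common approach is to point the Hive connector's S3 settings at the local endpoint. The sketch below renders such a catalog file, assuming the open source Presto Hive connector properties `hive.s3.endpoint`, `hive.s3.path-style-access`, `hive.s3.aws-access-key`, and `hive.s3.aws-secret-key`. The endpoint, metastore URI, keys, and function name are hypothetical.

```python
# Sketch: a Hive catalog that reads from an on-premises S3-compatible store.
# Endpoint, metastore URI, keys, and function name are hypothetical.


def object_store_catalog(metastore_uri: str, endpoint: str,
                         access_key: str, secret_key: str) -> str:
    props = {
        "connector.name": "hive-hadoop2",
        "hive.metastore.uri": metastore_uri,
        # Send S3 requests to the local object store instead of AWS.
        "hive.s3.endpoint": endpoint,
        # Many S3-compatible stores expect path-style bucket URLs.
        "hive.s3.path-style-access": "true",
        "hive.s3.aws-access-key": access_key,
        "hive.s3.aws-secret-key": secret_key,
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


if __name__ == "__main__":
    print(object_store_catalog(
        "thrift://metastore.example.net:9083",
        "http://minio.example.net:9000",
        "ACCESS_KEY_PLACEHOLDER",
        "SECRET_KEY_PLACEHOLDER",
    ))
```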