
Securing Presto with Apache Ranger

Starburst is excited to announce the general availability of Starburst Presto Enterprise 208e. This is the first Presto release to bring you Apache Ranger and Apache Sentry integration, vastly enhancing the security in any enterprise Presto deployment.

Both Ranger and Sentry provide Role-Based Access Control (RBAC) for services that interact with data in the Hadoop ecosystem as well as data in S3 and other object stores. This allows Presto to provide fine-grained access to your data, including column-level access control. Without Ranger and Sentry, Presto can leverage SQL Standard Access Control via Hive. However, Ranger and Sentry have become the standard for data access control, providing a centralized place for managing security policies. Furthermore, Ranger and Sentry provide additional features not possible in SQL Standard Access Control.

Apache Ranger or Apache Sentry

Both Apache Ranger and Apache Sentry are excellent options for providing RBAC in Presto and the choice simply depends on the Hadoop distribution you are using. While Presto does not necessarily require Hadoop, it uses the Hive Metastore to store the metadata.

Apache Ranger comes packaged with Hortonworks Data Platform, and Apache Sentry is packaged with Cloudera Enterprise. Therefore, it makes the most sense to simply use what is packaged with your Hadoop distribution. Apache Ranger is also an excellent option for those not tied to a specific Hadoop distribution, and it has wide adoption in the public cloud, for example on AWS.

In this blog post we’ll go into more detail about the Presto and Apache Ranger integration. In a following blog post, we will dive deeper into Presto and Apache Sentry.

Apache Ranger & Presto

Apache Ranger is a centralized platform to manage security policies across a variety of services such as Presto or Hive. It provides a simple and intuitive Web-based console for creating and managing policies used to control access to the data.

The Ranger integration with Presto enforces the access policies defined in Ranger on objects such as databases, tables, and columns. If a user does not have the privilege to query an object, the query will fail, and an error will be returned indicating access denied.

Presto integration with Ranger is an extension of the Presto Hive connector. The plugin pulls in the policies from the central Ranger server and caches them locally. When a SQL query is issued, the Ranger plugin in Presto intercepts the request and evaluates it against the security policies defined in Ranger.
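
The pull, cache, and evaluate flow described above can be sketched in a few lines of Python. This is only an illustrative model and not the actual plugin code: the names Policy, PolicyCache, check_select, and fetch_from_ranger, as well as the 30-second refresh interval, are assumptions made for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified stand-in for a Ranger access policy."""
    database: str                      # "*" matches any database
    table: str                         # "*" matches any table
    column: str                        # "*" matches any column
    users: FrozenSet[str] = frozenset()
    groups: FrozenSet[str] = frozenset()


class PolicyCache:
    """Holds a local copy of the policies and refetches it once it is stale."""

    def __init__(self, fetch: Callable[[], Iterable[Policy]],
                 max_age_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._policies: List[Policy] = []
        self._loaded_at: Optional[float] = None

    def policies(self) -> List[Policy]:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            # Stands in for a request to the central Ranger server.
            self._policies = list(self._fetch())
            self._loaded_at = now
        return self._policies


def _matches(pattern: str, value: str) -> bool:
    return pattern == "*" or pattern == value


def check_select(cache: PolicyCache, user: str, groups: Set[str],
                 database: str, table: str, column: str) -> None:
    """Raise PermissionError unless a cached policy grants the access."""
    for p in cache.policies():
        resource_ok = (_matches(p.database, database)
                       and _matches(p.table, table)
                       and _matches(p.column, column))
        principal_ok = user in p.users or bool(p.groups & groups)
        if resource_ok and principal_ok:
            return
    raise PermissionError(
        f"Access Denied: Cannot select from columns [{column}] "
        f"in table or view {database}.{table}")


def fetch_from_ranger() -> List[Policy]:
    # Hypothetical fetch returning a single allow-all policy for alice_group.
    return [Policy("*", "*", "*", groups=frozenset({"alice_group"}))]


cache = PolicyCache(fetch_from_ranger)
check_select(cache, "alice", {"alice_group"}, "default", "foo", "x")  # allowed
try:
    check_select(cache, "bob", {"bob_group"}, "default", "foo", "x")
except PermissionError as err:
    print(err)
```

Because the check runs against the local copy, a policy change made in Ranger only takes effect in this model after the next refresh.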

Authentication is handled outside of Ranger (for example, using LDAP, AD, or Kerberos), and Ranger associates the authenticated user and that user’s groups with the policy definitions. Ranger pulls users and groups from the operating system, LDAP, or AD.

Walkthrough Example

Configuration

Let’s demonstrate how Starburst Presto and Apache Ranger work together via an example. Before we begin, Presto must be configured to work with Ranger. The basic configuration may look like the following:

connector.name=hive-hadoop2
… usual configuration settings for the Hive connector …
hive.security=ranger
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive
ranger.authentication-type=KERBEROS
ranger.kerberos-principal=presto-server/presto-server-node@EXAMPLE.COM
ranger.kerberos-keytab=/etc/presto/conf/presto-server.keytab
ranger.plugin-policy-ssl-config-file=/etc/hive/conf/ranger-policymgr-ssl.xml

Refer to our documentation to learn more about configuration.
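
Since a single mistyped or merged line in the catalog file can silently disable the integration, a quick sanity check before restarting Presto may be useful. The following Python sketch is not part of Presto or Ranger. It uses a deliberately simplified properties parser, the helper names are made up for this example, and treating every listed key as required is an assumption based only on the sample configuration above.

```python
from typing import Dict, List

# Keys copied from the sample catalog file. Treating all of them as required
# for a Kerberos-based setup is an assumption made for this sketch.
EXPECTED_KEYS = [
    "hive.security",
    "ranger.policy-rest-url",
    "ranger.service-name",
    "ranger.authentication-type",
    "ranger.kerberos-principal",
    "ranger.kerberos-keytab",
]


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def find_problems(props: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = [f"missing {key}" for key in EXPECTED_KEYS if key not in props]
    if props.get("hive.security", "ranger") != "ranger":
        problems.append("hive.security is not set to ranger")
    return problems


SAMPLE = """\
connector.name=hive-hadoop2
hive.security=ranger
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive
ranger.authentication-type=KERBEROS
ranger.kerberos-principal=presto-server/presto-server-node@EXAMPLE.COM
ranger.kerberos-keytab=/etc/presto/conf/presto-server.keytab
"""

print(find_problems(parse_properties(SAMPLE)))  # prints []
```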

Users and Groups

We have 3 users: alice, bob, and charlie. We also have 2 groups: alice_group and bob_group. User alice belongs to the alice_group and user bob belongs to the bob_group. User charlie does not belong to these groups.

To start, alice_group has open permissions on all databases, tables, and columns, whereas bob_group and charlie do not have any permissions.

[Screenshot: Presto security with Ranger - choosing policy]

RBAC

Let’s connect to Presto as the alice user and create a table.

$ presto-cli --catalog hive --schema default --user alice

presto:default> create table foo(x int, y int, z int);
CREATE TABLE

presto:default> insert into foo values (1, 2, 3);
INSERT: 1 row

presto:default> select * from foo;
 x | y | z
---+---+---
 1 | 2 | 3
(1 row)

Now let’s connect to Presto as the bob user and try to query that table.

$ presto-cli --catalog hive --schema default --user bob

presto:default> select x from foo;
Query failed: Access Denied: Cannot select from columns [x] in table or view default.foo

In this example you can see that bob cannot query the table created by alice. This is the correct behavior, as bob has not been granted permission to access that table.

[Screenshot: Presto Security with Ranger - column security]

To grant bob permissions to access the table, let’s add the bob_group to the same policy that has the alice_group.

[Screenshot: Presto Security with Ranger - security group]

Once we’ve saved the changes in Ranger, let’s try querying again using the bob user.

$ presto-cli --catalog hive --schema default --user bob

presto:default> select x from foo;
 x
---
 1
(1 row)

presto:default> select y from foo;
 y
---
 2
(1 row)

Hooray! User bob can now access the table. Next, let’s try to run the query as charlie.

$ presto-cli --catalog hive --schema default --user charlie

presto:default> select x from foo;
Query failed: Access Denied: Cannot select from columns [x] in table or view default.foo

As you may have expected, charlie cannot access the table. As we did with bob, let’s grant charlie access to query the table. This time, however, we will add the user charlie to the policy directly instead of adding a group, since charlie does not belong to any group.

[Screenshot: Presto Security with Ranger - permissions]

Once the Ranger policy is saved, charlie can now access the data.

$ presto-cli --catalog hive --schema default --user charlie

presto:default> select x from foo;
 x
---
 1
(1 row)
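
The three stages of the walkthrough so far (only alice_group granted, then bob_group added, then the user charlie added) can be replayed with a toy model. This Python sketch only mirrors the group and user logic described in this post. AllAccessPolicy and USER_GROUPS are hypothetical names, and real Ranger policies carry much more detail.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Group membership from the walkthrough; charlie belongs to no group.
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"alice_group"},
    "bob": {"bob_group"},
    "charlie": set(),
}


@dataclass
class AllAccessPolicy:
    """Toy model of the policy covering all databases, tables, and columns."""
    groups: Set[str] = field(default_factory=set)
    users: Set[str] = field(default_factory=set)

    def allows(self, user: str) -> bool:
        in_granted_group = bool(self.groups & USER_GROUPS.get(user, set()))
        return user in self.users or in_granted_group


def snapshot(policy: AllAccessPolicy) -> Dict[str, bool]:
    """Record which of the three users the policy currently lets through."""
    return {user: policy.allows(user) for user in USER_GROUPS}


policy = AllAccessPolicy(groups={"alice_group"})
initial = snapshot(policy)

policy.groups.add("bob_group")        # grant through a group
after_group_grant = snapshot(policy)

policy.users.add("charlie")           # grant to an individual user
after_user_grant = snapshot(policy)

print(initial)            # {'alice': True, 'bob': False, 'charlie': False}
print(after_group_grant)  # {'alice': True, 'bob': True, 'charlie': False}
print(after_user_grant)   # {'alice': True, 'bob': True, 'charlie': True}
```

Granting through a group scales better when many users share the same access needs, while a direct user grant is the only option here for a user without any group.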

Column Level Permissions

Next we will demonstrate a finer level of access control and restrict data access at the column level. First, remove charlie from the policy that allows access to all databases, tables, and columns. Charlie will no longer be able to access the data in our example table.

Next, we will create a new policy in Ranger. This policy will define an exclusion for querying from the y column of our example table. We will add the charlie user to this new policy.

[Screenshot: Presto Security with Ranger - update policy]

Now let’s run some queries as charlie. You’ll find that Presto enforces the column-level permissions defined in Ranger.

$ presto-cli --catalog hive --schema default --user charlie

presto:default> select x from foo;
 x
---
 1
(1 row)

presto:default> select y from foo;
Query failed: Access Denied: Cannot select from columns [y] in table or view default.foo
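
The behavior above can be modeled with a small sketch. It assumes that the exclusion means the policy applies to every column of the table except the listed ones. ColumnPolicy and its fields are hypothetical names for this illustration and do not reflect Ranger’s internal data model.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ColumnPolicy:
    """Toy model of a policy on one table with an include/exclude column list."""
    database: str
    table: str
    columns: FrozenSet[str]
    exclude: bool              # True: covers every column EXCEPT `columns`
    users: FrozenSet[str]

    def allows(self, user: str, database: str, table: str, column: str) -> bool:
        if user not in self.users:
            return False
        if (database, table) != (self.database, self.table):
            return False
        listed = column in self.columns
        return not listed if self.exclude else listed


# The new policy from the walkthrough: charlie may read default.foo except y.
charlie_policy = ColumnPolicy(
    database="default",
    table="foo",
    columns=frozenset({"y"}),
    exclude=True,
    users=frozenset({"charlie"}),
)

for column in ("x", "y", "z"):
    allowed = charlie_policy.allows("charlie", "default", "foo", column)
    print(column, "allowed" if allowed else "denied")
```

In this model, a query touching several columns would need every referenced column to pass the check, which is consistent with the per-column error message shown in the transcript.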

Summary

Starburst Presto Enterprise 208e provides RBAC by leveraging integrations with Apache Ranger and Apache Sentry. In this blog post we focused primarily on the Apache Ranger integration. In the next blog post in this series, we will discuss the Presto and Apache Sentry integration and provide some walkthrough examples.

Interested in trying out Starburst Presto Enterprise? Contact us below.

 
