Databricks
The following setup allows Alvin to access your Databricks metadata and query history, without being able to touch the underlying data*.
This setup only supports Databricks with Unity Catalog enabled. For other types of Databricks environment, please get in touch with our support.
* To extract and monitor the data volumes of tables, Alvin needs an additional permission, which is specified at step 6. That step is optional and enables additional features in Alvin.
Under Workspace settings / Identity and access / Service Principals, create a new service principal. You may call it databricks_unity_catalog_extractor.
Get the displayed value for Application Id, for example:
1d62fbf3-2a96-44bd-942b-55f89cd38a77
Make sure the following Entitlements are enabled:
Make sure the service principal created at step 1.1 has Can Use permission under Token Usage.
Example:
You will need the generated <my-token_value> to complete the connection setup later on.
Run this as a user who is allowed to GRANT permissions, usually an ADMIN user, to give the following permissions to the service principal you created at step 1.1:
Example commands for listing the available system schemas:
Enabling the schemas used by Alvin:
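As a rough illustration of the two calls described above, the following sketch builds the request method and URL for listing system schemas and for enabling one of them. It assumes the Unity Catalog system schemas REST endpoints under /api/2.1/unity-catalog/metastores/; the host, metastore ID, and the schema name "access" are hypothetical placeholders, not the list of schemas Alvin actually requires. Nothing is sent over the network.

```python
from typing import Optional, Tuple
from urllib.parse import quote


def system_schema_request(host: str, metastore_id: str,
                          schema: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP method, URL) for listing or enabling system schemas.

    Without a schema name this is the "list" call; with a name it is the
    "enable" call for that single schema.
    """
    base = (f"https://{host}/api/2.1/unity-catalog/metastores/"
            f"{quote(metastore_id, safe='')}/systemschemas")
    if schema is None:
        return ("GET", base)
    return ("PUT", f"{base}/{quote(schema, safe='')}")


# Hypothetical values; replace with your workspace host and metastore ID.
HOST = "dbc-example.cloud.databricks.com"
METASTORE_ID = "11111111-2222-3333-4444-555555555555"

print(system_schema_request(HOST, METASTORE_ID))            # list schemas
print(system_schema_request(HOST, METASTORE_ID, "access"))  # enable one schema
```

Each request would also need an Authorization header carrying a token from a user with metastore admin rights.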
If you have a non-production warehouse, you may reuse it for Alvin, but the recommended approach is to create a new one.
Click the 'Permissions' button and give the Alvin service principal 'Can use' permissions.
For Alvin to extract the number of rows and bytes of tables, the following permission must be granted at the catalog, schema, or individual table level:
Alvin only runs SELECT count aggregations and DESCRIBE commands on tables, which can be audited in the Alvin user environment.
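The sketch below shows what such a grant could look like at each of the three levels. It assumes the permission in question is SELECT (inferred from the SELECT count statements mentioned above) and that the service principal is referenced by its Application Id. The helper name and the object names are hypothetical. In Unity Catalog the principal also needs USE CATALOG and USE SCHEMA on the parent objects for a SELECT grant to be usable.

```python
LEVELS = {"catalog": "CATALOG", "schema": "SCHEMA", "table": "TABLE"}


def grant_select(application_id: str, level: str, name: str) -> str:
    """Build a GRANT SELECT statement for a catalog, schema, or table.

    `name` is a dot-separated object name such as "main.sales.orders";
    each part is wrapped in backticks, as is the principal.
    """
    if level not in LEVELS:
        raise ValueError(f"level must be one of {sorted(LEVELS)}")
    quoted = ".".join(f"`{part}`" for part in name.split("."))
    return f"GRANT SELECT ON {LEVELS[level]} {quoted} TO `{application_id}`;"


# Example Application Id taken from the service principal step.
APP_ID = "1d62fbf3-2a96-44bd-942b-55f89cd38a77"

# Hypothetical catalog, schema, and table names.
print(grant_select(APP_ID, "catalog", "main"))
print(grant_select(APP_ID, "schema", "main.sales"))
print(grant_select(APP_ID, "table", "main.sales.orders"))
```

Granting at the catalog level covers every schema and table below it, while table-level grants keep the scope as narrow as possible.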
If your organization restricts Databricks access to a specific set of IP addresses, add the following IP to your Allowed IP Addresses list. Alvin will only access your Databricks through this address: 34.159.141.113
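If you manage the allow list through the workspace IP access lists REST API rather than the UI, the request body could be built along these lines. This is a minimal sketch that assumes the POST /api/2.0/ip-access-lists endpoint; the label is a hypothetical name, and the request itself is not sent.

```python
import ipaddress
import json

ALVIN_IP = "34.159.141.113"  # the address given above


def allow_list_payload(label: str, ips: list) -> dict:
    """Build an ALLOW-list request body, normalising each IP to CIDR form."""
    cidrs = [str(ipaddress.ip_network(ip)) for ip in ips]
    return {"label": label, "list_type": "ALLOW", "ip_addresses": cidrs}


# "alvin" is a hypothetical label for the new list.
payload = allow_list_payload("alvin", [ALVIN_IP])
print(json.dumps(payload, indent=2))
```

Passing the address through ipaddress.ip_network rejects malformed input early and turns the single address into a /32 range.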
Follow the instructions to give service principal permissions to use access tokens.
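For reference, the same permission can be expressed as a request body for the workspace permissions API. The sketch below assumes a PATCH to /api/2.0/permissions/authorization/tokens, which may differ from the linked instructions; the function name is hypothetical and no request is sent.

```python
import json


def token_usage_acl(application_id: str) -> str:
    """Build the JSON body granting Can Use on token usage to a service principal."""
    body = {
        "access_control_list": [
            {
                "service_principal_name": application_id,
                "permission_level": "CAN_USE",
            }
        ]
    }
    return json.dumps(body, indent=2)


# Example Application Id from the service principal step.
print(token_usage_acl("1d62fbf3-2a96-44bd-942b-55f89cd38a77"))
```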
Follow the instructions to generate an access token for the service principal. If you want the connection to Databricks to be uninterrupted by token expiry, set lifetime_seconds to null to prevent the token from expiring. Save this access token somewhere safe.
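To make the lifetime_seconds setting concrete, here is a minimal sketch of a token request body. It assumes the token is created on behalf of the service principal via POST /api/2.0/token-management/on-behalf-of/tokens, which may not be the exact route the linked instructions use; the function name and the comment text are hypothetical. Some workspaces enforce a maximum token lifetime, in which case a non-expiring token may be rejected.

```python
import json
from typing import Optional


def obo_token_body(application_id: str,
                   lifetime_seconds: Optional[int] = None,
                   comment: str = "Alvin metadata extraction") -> str:
    """Build the JSON body for an on-behalf-of token request.

    Leaving lifetime_seconds as None serialises it as JSON null,
    which is the non-expiring setting described above.
    """
    return json.dumps({
        "application_id": application_id,
        "comment": comment,
        "lifetime_seconds": lifetime_seconds,
    })


# Example Application Id from the service principal step.
body = obo_token_body("1d62fbf3-2a96-44bd-942b-55f89cd38a77")
print(body)
```

The response to such a request contains the token value only once, so it should be stored securely as soon as it is returned.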
These steps must be executed only once, if they have never been executed before:
Follow the instructions to create a SQL Warehouse for Alvin to use. You will use the Host, Port and HTTP path from the 'Connection details' tab when creating the connection to Databricks in Alvin.
Create a new connection. Make sure the SQL Warehouse is up and running before hitting Test Connection; otherwise it might take a long time to validate the connection.
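One way to confirm the warehouse is running before testing the connection is to look at its state through the SQL Warehouses REST API. The sketch below assumes the HTTP path has the usual /sql/1.0/warehouses/<id> form and that GET /api/2.0/sql/warehouses/<id> returns a JSON body with a "state" field. The helper names and the sample values are hypothetical, and the HTTP call itself is left out.

```python
import json

HTTP_PATH_PREFIX = "/sql/1.0/warehouses/"


def warehouse_id_from_http_path(http_path: str) -> str:
    """Extract the warehouse ID from the HTTP path in 'Connection details'."""
    if not http_path.startswith(HTTP_PATH_PREFIX):
        raise ValueError(f"unexpected HTTP path: {http_path!r}")
    return http_path[len(HTTP_PATH_PREFIX):].strip("/")


def is_running(response_body: str) -> bool:
    """Return True if a warehouse GET response reports the RUNNING state."""
    return json.loads(response_body).get("state") == "RUNNING"


# Hypothetical HTTP path and sample response body.
warehouse_id = warehouse_id_from_http_path("/sql/1.0/warehouses/abc123def456")
print(f"/api/2.0/sql/warehouses/{warehouse_id}")
print(is_running('{"id": "abc123def456", "state": "STARTING"}'))
```

If the state is anything other than RUNNING, start the warehouse and wait for it to come up before using Test Connection.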
Databricks Unity Catalog does not provide a privilege such as READ_METADATA, as it had on the legacy Hive metastore.