Databricks

The following setup allows Alvin to access your Databricks metadata and query history, without being able to touch the underlying data*.


Last updated 9 months ago

This setup only supports Databricks with Unity Catalog enabled; for other types of Databricks environments, please get in touch with our support. * In order for Alvin to extract and monitor the data volumes of tables, we ask for an additional permission, specified in step 6. That step is optional and enables additional features in Alvin.

1. Generate a Databricks access token

1.1 - Create Service Principal

Under Workspace settings / Identity and access / Service Principals, create a new service principal; you may call it, for example, databricks_unity_catalog_extractor.

Get the displayed value for Application Id, for example: 1d62fbf3-2a96-44bd-942b-55f89cd38a77

Make sure the required Entitlements are enabled for the service principal.

1.2 - Grant token usage to service principal in workspace

Follow the Databricks instructions to give the service principal permission to use access tokens. Make sure the service principal created in step 1.1 has the Can Use permission under Token Usage.

1.3 - Generate an access token for service principal

Follow the Databricks instructions to generate an access token for the service principal. If you want the connection to Databricks to be uninterrupted by the token expiring, set lifetime_seconds to null to prevent the token from expiring. Save the token somewhere safe: you will need the generated <my-token_value> to complete the connection setup later on.

Example:

databricks token-management create-obo-token {service_principal_application_id}

{
  "token_info": {
    "comment":"",
    "created_by_id":571098782507535,
    "created_by_username":"user@company.com",
    "creation_time":1721859192753,
    "expiry_time":-1,
    "owner_id":6607881346135267,
    "token_id":"<my-token_id>"
  },
  "token_value":"<my-token_value>"
}
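The CLI returns the token as JSON. A minimal sketch of pulling out the token value and checking whether the token can expire, using the field names from the (abridged) sample response above:

```python
import json

# Abridged sample response from `databricks token-management create-obo-token`.
response = """
{
  "token_info": {
    "comment": "",
    "expiry_time": -1,
    "token_id": "<my-token_id>"
  },
  "token_value": "<my-token_value>"
}
"""

data = json.loads(response)
token = data["token_value"]  # shown only once; store it securely
never_expires = data["token_info"]["expiry_time"] == -1

print(token)
print(never_expires)
```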

2. Grant permissions to service principal on each catalog you want Alvin to extract

Run the following with a user that is allowed to GRANT permissions (usually an ADMIN user), giving these permissions to the service principal you created in step 1.1:

-- this step must run for each catalog that has unity catalog tables
GRANT USE_CATALOG ON CATALOG prod TO `{service_principal_application_id}`;
GRANT USE_SCHEMA ON CATALOG prod TO `{service_principal_application_id}`;
GRANT BROWSE ON CATALOG prod TO `{service_principal_application_id}`;

-- this step must run once for the system catalog; only for the
-- system catalog you must also grant SELECT, to extract the system
-- metadata tables (these tables contain only metadata)
GRANT USE_CATALOG ON CATALOG system TO `{service_principal_application_id}`;
GRANT USE_SCHEMA ON CATALOG system TO `{service_principal_application_id}`;
GRANT SELECT ON CATALOG system TO `{service_principal_application_id}`;
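If you have many catalogs, the per-catalog grants can be generated rather than typed by hand. A minimal sketch; the catalog names and the Application Id below are placeholders for your own:

```python
# Generate the per-catalog GRANT statements from step 2.
app_id = "1d62fbf3-2a96-44bd-942b-55f89cd38a77"  # service principal Application Id
catalogs = ["prod", "staging"]                   # your Unity Catalog catalogs

privileges = ["USE_CATALOG", "USE_SCHEMA", "BROWSE"]
statements = [
    f"GRANT {priv} ON CATALOG {cat} TO `{app_id}`;"
    for cat in catalogs
    for priv in privileges
]

for stmt in statements:
    print(stmt)
```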

3. Make sure system tables are enabled

These steps must be executed only once, if they have never been executed before. See https://docs.databricks.com/en/admin/system-tables/index.html#enable for details.

Example commands. To list the available system schemas:

curl -v -X GET -H "Authorization: Bearer {my_admin_token}" "https://{databricks_host}/api/2.0/unity-catalog/metastores/{metastore_id}/systemschemas"

To enable the schemas used by Alvin:

# Query history:
curl -v -X PUT -H "Authorization: Bearer {my_admin_token}" "https://{databricks_host}/api/2.0/unity-catalog/metastores/{metastore_id}/systemschemas/query"

# Billing:
curl -v -X PUT -H "Authorization: Bearer {my_admin_token}" "https://{databricks_host}/api/2.0/unity-catalog/metastores/{metastore_id}/systemschemas/billing"

# Information Schema:
curl -v -X PUT -H "Authorization: Bearer {my_admin_token}" "https://{databricks_host}/api/2.0/unity-catalog/metastores/{metastore_id}/systemschemas/information_schema"
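The same calls can be scripted. A sketch that builds the three enable requests with Python's standard library; the host, metastore ID, and token are placeholders, and nothing is sent until you call `urllib.request.urlopen`:

```python
import urllib.request

host = "dbc-example.cloud.databricks.com"  # placeholder workspace host
metastore_id = "<metastore_id>"            # placeholder
token = "<my_admin_token>"                 # an admin token, not the Alvin one

def enable_request(schema: str) -> urllib.request.Request:
    """Build the PUT request that enables one system schema."""
    url = (f"https://{host}/api/2.0/unity-catalog/metastores/"
           f"{metastore_id}/systemschemas/{schema}")
    return urllib.request.Request(
        url,
        method="PUT",
        headers={"Authorization": f"Bearer {token}"},
    )

reqs = [enable_request(s) for s in ("query", "billing", "information_schema")]
# To actually enable the schemas: urllib.request.urlopen(req) for each req.
```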

4. Create a Databricks SQL Warehouse for Alvin

Follow the Databricks instructions to create a SQL Warehouse for Alvin to use. If you have a non-production warehouse, you may reuse it for Alvin, but the recommended approach is to create a new one.

  1. Click the 'Permissions' button and give the Alvin service principal 'Can use' permissions.

You will use the Host, Port, and HTTP path from the 'Connection details' tab when creating the connection to Databricks in Alvin.

5. Add connection to Alvin

Create a new connection in Alvin's connection settings. Make sure the SQL Warehouse is up and running before hitting Test Connection, otherwise it might take a long time to validate the connection.

6. Additional row counts permissions (Optional)

In order for Alvin to extract the number of rows and bytes of tables, the following permission must be granted at the catalog, schema, or individual table level. Databricks Unity Catalog does not provide a less granular permission such as READ_METADATA, as the hive_metastore did:

-- this step must run for each catalog that has unity catalog tables
GRANT SELECT ON CATALOG prod TO `{service_principal_application_id}`;

Alvin only runs SELECT count aggregations and DESCRIBE commands on tables, which can be audited in the Alvin user environment.
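For illustration, these are the kinds of statement shapes that this permission is used for: COUNT aggregations and DESCRIBE, per table. The table names are placeholders, and the exact statements Alvin issues may differ:

```python
# Illustration of the statement shapes enabled by the step-6 permission:
# a COUNT aggregation and a DESCRIBE per table (table names are placeholders).
tables = ["prod.analytics.orders", "prod.analytics.customers"]

audit_statements = []
for table in tables:
    audit_statements.append(f"SELECT COUNT(*) FROM {table}")
    audit_statements.append(f"DESCRIBE TABLE {table}")

for stmt in audit_statements:
    print(stmt)
```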

7. Whitelist Alvin IP (Optional)

If your organization restricts Databricks access to a specific set of IP addresses, add the following IP to your Allowed IP Addresses list; Alvin will only access your Databricks from this address: 34.159.141.113
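If you manage your Databricks IP access list as code, here is a quick sanity check that the Alvin IP is covered; the allowed networks below are placeholders for your own:

```python
import ipaddress

ALVIN_IP = ipaddress.ip_address("34.159.141.113")

# Placeholder for your workspace's allowed CIDR ranges.
allowed = [
    ipaddress.ip_network("34.159.141.113/32"),
    ipaddress.ip_network("10.0.0.0/8"),
]

alvin_allowed = any(ALVIN_IP in net for net in allowed)
print(alvin_allowed)
```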

