Implementation
With Databricks, you can leverage your Databricks cluster to compute transformations and training sets. Featureform, however, does not handle storage in non-local mode, so you must separately register a file store provider, such as Azure Blob Storage, to store the results of its computations.
Requirements
- Databricks Cluster
- Remote file storage (eg. Azure Blob Storage)
Required Azure Configurations
Azure Blob Storage
If registering a transformation fails, you may need to disable certain default container configurations. In the Azure portal, navigate to Home > Storage Accounts > YOUR STORAGE ACCOUNT and select “Data Protection” under the “Data Management” section. Then uncheck the following (or use the Azure CLI equivalent shown after this list):
- Enable soft delete for blobs
- Enable soft delete for containers
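If you prefer to script this change, the same two settings can be disabled with the Azure CLI. This is a sketch; the storage account and resource group names are placeholders for your own values:

```bash
az storage account blob-service-properties update \
    --account-name <YOUR_STORAGE_ACCOUNT> \
    --resource-group <YOUR_RESOURCE_GROUP> \
    --enable-delete-retention false \
    --enable-container-delete-retention false
```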
Databricks
If registering a transformation fails, you may need to add credentials for your Azure Blob Storage account to your Databricks cluster:
- Launch your Databricks workspace from the Azure portal (e.g. by clicking “Launch Workspace” on the “Overview” page of your Databricks workspace)
- Select “Compute” from the left-hand menu and click on your cluster
- Click “Edit”, then select the “Advanced Options” tab to show the “Spark Config” text input field
- Add the following configuration to the “Spark Config” text input field:
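The configuration value itself is not reproduced above. For Azure Blob Storage, the standard Hadoop property that grants Spark access via the storage account key looks like the following, where the account name and key are placeholders for your own values:

```
fs.azure.account.key.<YOUR_STORAGE_ACCOUNT>.blob.core.windows.net <YOUR_ACCOUNT_KEY>
```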
Transformation Sources
Using Databricks as an Offline Store, you can define new transformations via SQL and Spark DataFrames. Using either these transformations or preexisting tables in your file store, you can chain transformations and register columns in the resulting tables as new features and labels.
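As an illustrative sketch, assuming the spark provider registered in the Configuration section below and a previously registered source named transactions (decorator signatures may vary between Featureform versions), a SQL transformation and a DataFrame transformation could look like:

```python
# SQL transformation: the returned query is executed on the Databricks cluster.
# Featureform resolves {{ transactions.default }} to the registered source.
@spark.sql_transformation()
def avg_user_transaction():
    return (
        "SELECT CustomerID AS user_id, AVG(TransactionAmount) AS avg_txn "
        "FROM {{ transactions.default }} GROUP BY CustomerID"
    )

# DataFrame transformation: the function receives the source as a Spark DataFrame.
@spark.df_transformation(inputs=[("transactions", "default")])
def total_user_transaction(df):
    from pyspark.sql import functions as F  # available on the Databricks cluster
    return df.groupBy("CustomerID").agg(F.sum("TransactionAmount").alias("total_txn"))
```

Training Sets and Inference Store Materialization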
Any column in a preexisting table or user-created transformation can be registered as a feature or label. These features and labels can be used, as with any other Offline Store, for creating training sets and for inference serving.
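The following is a hedged sketch of that registration flow, assuming the transformation above plus a previously registered transactions source and redis inference store; the column and resource names are illustrative placeholders, and exact signatures vary between Featureform versions:

```python
import featureform as ff

user = ff.register_entity("user")

# Register a column of the transformation as a feature served from the inference store.
avg_user_transaction.register_resources(
    entity=user,
    entity_column="user_id",
    inference_store=redis,
    features=[
        {"name": "avg_transactions", "column": "avg_txn", "type": "float32"},
    ],
)

# Register a column of the raw source as a label.
transactions.register_resources(
    entity=user,
    entity_column="CustomerID",
    labels=[
        {"name": "fraudulent", "column": "IsFraud", "type": "bool"},
    ],
)

# Combine the feature and label into a training set computed by the offline store.
ff.register_training_set(
    "fraud_training", "default",
    label=("fraudulent", "default"),
    features=[("avg_transactions", "default")],
)
```

Configuration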
To configure Databricks as a provider, you need a Databricks cluster. Featureform automatically downloads and uploads the necessary files to handle all the functionality of a native offline store like Postgres or BigQuery.
databricks_definition.py
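The contents of databricks_definition.py are not reproduced above. A minimal sketch, assuming token authentication for the Databricks cluster and account-key authentication for Azure Blob Storage (all angle-bracketed values are placeholders), might look like:

```python
import featureform as ff

# Executor: the Databricks cluster that computes transformations and training sets
databricks = ff.DatabricksCredentials(
    host="<DATABRICKS_HOST>",
    token="<DATABRICKS_TOKEN>",
    cluster_id="<CLUSTER_ID>",
)

# File store: where Featureform writes the results of its computations
azure_blob = ff.register_blob_store(
    name="azure-blob",
    account_name="<AZURE_ACCOUNT_NAME>",
    account_key="<AZURE_ACCOUNT_KEY>",
    container_name="<AZURE_CONTAINER_NAME>",
    root_path="featureform/",
)

# Combine executor and file store into a single Spark offline store provider
spark = ff.register_spark(
    name="spark-databricks",
    description="Databricks cluster backed by Azure Blob Storage",
    executor=databricks,
    filestore=azure_blob,
)
```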
Mutable Configuration Fields
- description
- username (Executor)
- password (Executor)
- token (Executor)
- account_key (File Store)
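Because these fields are mutable, you can, for example, rotate a Databricks token or a storage account key by editing databricks_definition.py and re-applying it with the Featureform CLI; the host value below is a placeholder:

```bash
featureform apply databricks_definition.py --host <FEATUREFORM_HOST>
```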