Kubernetes
Featureform supports Kubernetes as an Offline Store.
Implementation
Since Featureform is deployed natively on a Kubernetes cluster, it can use its own cluster to compute transformations and training sets. However, Featureform does not handle storage in non-local mode, so you must separately register a file store provider, such as Azure, to store the results of its computation.
Requirements
Transformation Sources
Using Kubernetes as an Offline Store, you can define new transformations via SQL and Pandas DataFrames. Using either these transformations or preexisting tables in your file store, you can chain transformations and register columns in the resulting tables as new features and labels.
Training Sets and Inference Store Materialization
Any column in a preexisting table or user-created transformation can be registered as a feature or label. These features and labels can be used, as with any other Offline Store, for creating training sets and inference serving.
Configuration
To configure a Kubernetes store as a provider, you only need to have Featureform running in your Kubernetes cluster and register a compatible file store to hold the output of the computation. Featureform automatically downloads and uploads the required files, so the provider offers the same functionality as a native offline store like Postgres or BigQuery.
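The registration described above can be sketched as follows. This is a minimal sketch, not the definitive setup: it assumes the Featureform Python SDK exposes `register_blob_store` and `register_k8s` with the keyword arguments shown, and every name, container, and path is a placeholder. The helper `register_kubernetes_offline_store` is hypothetical, and the SDK module is passed in as `ff` so the sketch carries no import-time dependency.

```python
def register_kubernetes_offline_store(ff, account_name, account_key):
    """Register an Azure Blob file store and a Kubernetes provider that uses it.

    `ff` is the imported Featureform SDK module (`import featureform as ff`).
    All names and paths below are placeholders, not required values.
    """
    # File store that holds the output of every computation (assumed SDK call).
    azure_blob = ff.register_blob_store(
        name="azure-quickstart",
        description="Azure Blob store for offline and inference data",
        container_name="my_company_container",
        root_path="custom/path/in/container",
        account_name=account_name,
        account_key=account_key,
    )

    # Kubernetes provider: compute runs in the cluster, results land in the
    # file store registered above (assumed SDK call).
    k8s = ff.register_k8s(
        name="k8s",
        description="Native Featureform Kubernetes compute",
        store=azure_blob,
    )
    return k8s
```

In practice you would call the helper with the real module, for example `register_kubernetes_offline_store(ff, "<azure_account_name>", "<azure_account_key>")`, and then apply the definitions with the Featureform CLI or client.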
Mutable Configuration Fields
- description
- docker_image
See your file store provider's documentation for its mutable fields.
DataFrame Transformations
Using Kubernetes as a provider, you can define transformations in SQL, as with other offline providers.
In addition, registering Kubernetes as a provider lets you perform DataFrame transformations using your source tables as inputs.
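The two transformation styles can be sketched as plain Python functions. This is an illustration under assumptions: the source name `transactions`, its variant `kaggle`, and the column names are hypothetical, and the decorators shown in the comments (`sql_transformation`, `df_transformation`) are the SDK calls assumed to register each function on the Kubernetes provider.

```python
# Assumed registration: @k8s.sql_transformation()
def max_transaction_amount():
    """Largest transaction amount per customer, expressed as SQL."""
    # {{transactions.kaggle}} refers to a registered source by name and variant
    # (both hypothetical here).
    return (
        "SELECT CustomerID AS user_id, "
        "MAX(TransactionAmount) AS max_transaction_amt "
        "FROM {{transactions.kaggle}} GROUP BY CustomerID"
    )


# Assumed registration: @k8s.df_transformation(inputs=[("transactions", "kaggle")])
def average_user_transaction(transactions):
    """Mean transaction amount per customer, computed on a Pandas DataFrame."""
    # `transactions` arrives as a DataFrame built from the input source table.
    return (
        transactions.groupby("CustomerID")["TransactionAmount"]
        .mean()
        .reset_index()
    )
```

The SQL function returns only a query string, while the DataFrame function receives one argument per input source and returns the transformed table.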
Custom Compute Images
By default, the Docker image used to run compute for the transformations only has Pandas pre-installed. However, you can build custom images on top of Featureform’s base image to enable the use of other third-party libraries.
Building A Custom Image
Custom images can be built and stored in your own repository, as long as the Kubernetes cluster has permission to access that repository.
You can install additional Python packages on top of the base Featureform image.
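As an illustration, the Dockerfile for such an image needs little more than a `FROM` line and a `pip install` step. The hypothetical helper below renders that text; the base image name `featureformcom/k8s_runner:latest` is an assumption, so check the image and tag published for your Featureform version.

```python
def render_dockerfile(base_image, packages):
    """Return Dockerfile text that installs extra Python packages on a base image."""
    if not packages:
        raise ValueError("list at least one package to install")
    lines = [
        f"FROM {base_image}",
        # A single RUN layer; --no-cache-dir skips pip's download cache.
        "RUN pip install --no-cache-dir " + " ".join(packages),
    ]
    return "\n".join(lines) + "\n"


# Assumed base image name; verify it against your Featureform release.
print(render_dockerfile("featureformcom/k8s_runner:latest", ["scikit-learn"]), end="")
```

Save the rendered text as a Dockerfile, then build and push it with `docker build` and `docker push` to a repository your cluster can pull from.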
Using A Custom Image (Provider-Wide)
Once you’ve built your custom image and pushed it to your Docker repository, you can use it in your Featureform cluster.
To use the custom-built image, you can add it to the Kubernetes Provider registration. This will override the default image for all jobs run with this provider.
To use the libraries installed in the custom image, import them within the transformation definition.
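A provider-wide image might be wired up as in the sketch below. It assumes `register_k8s` accepts the `docker_image` field listed under the mutable configuration fields; the image tag, provider name, and column names are placeholders, and both helper functions are hypothetical. The SDK module is passed in as `ff`.

```python
def register_provider_with_image(ff, file_store, image="my-repo/image:version"):
    """Register a Kubernetes provider whose jobs all run on a custom image."""
    return ff.register_k8s(
        name="k8s-custom-image",
        description="Kubernetes compute using a custom image",
        store=file_store,
        docker_image=image,  # overrides the default image for every job
    )


# Assumed registration: @k8s.df_transformation(inputs=[("transactions", "kaggle")])
def scaled_transactions(transactions):
    """Add a min-max scaled copy of the transaction amount."""
    # Import inside the body so the name resolves in the compute pod,
    # where the custom image provides scikit-learn.
    from sklearn.preprocessing import MinMaxScaler

    scaled = MinMaxScaler().fit_transform(transactions[["TransactionAmount"]])
    transactions["ScaledAmount"] = scaled.ravel()
    return transactions
```

Keeping the import inside the function body means the machine that registers the transformation does not need scikit-learn installed.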
Using A Custom Image (Per-Transformation)
Custom images can also be used per transformation. The specified image overrides the default image or the provider-wide image, but only for the transformation in which it is specified.
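A per-transformation override could look like the following sketch. It assumes `df_transformation` accepts a `docker_image` keyword; the source name, variant, column names, and image tag are hypothetical, as is the wrapper `define_amount_zscore`, which takes the registered provider object as `k8s`.

```python
def define_amount_zscore(k8s, image="my-repo/image:version"):
    """Define a transformation that runs on its own image."""

    @k8s.df_transformation(
        inputs=[("transactions", "kaggle")],
        docker_image=image,  # applies to this transformation only
    )
    def amount_zscore(transactions):
        # SciPy is assumed to be installed in the custom image.
        from scipy.stats import zscore

        transactions["AmountZScore"] = zscore(transactions["TransactionAmount"])
        return transactions

    return amount_zscore
```

Other transformations on the same provider keep using the default or provider-wide image.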
Custom Resource Requests and Limits
By default, transformation pods will be scheduled without resource requests or limits. This means that the pods will be scheduled on any node that has available resources.
You can specify resource requests and limits for the transformation pods. Requests ensure that a pod is scheduled only onto a node with enough unreserved capacity, and limits cap the resources the pod may consume.
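The sketch below shows one way requests and limits might be attached to a transformation. It assumes the SDK provides a `K8sResourceSpecs` class with these four fields and that `df_transformation` accepts it through a `resource_specs` keyword; treat both as assumptions to verify against your SDK version. The values use standard Kubernetes quantity notation (`250m` is a quarter of a CPU core, `1Gi` is one gibibyte), and the wrapper `define_with_resources` is hypothetical.

```python
def define_with_resources(ff, k8s):
    """Define a transformation whose pod carries resource requests and limits."""
    specs = ff.K8sResourceSpecs(
        cpu_request="250m",   # minimum CPU reserved for scheduling
        cpu_limit="500m",     # hard cap on CPU usage
        memory_request="1Gi",  # minimum memory reserved for scheduling
        memory_limit="2Gi",    # hard cap on memory usage
    )

    @k8s.df_transformation(
        inputs=[("transactions", "kaggle")],
        resource_specs=specs,
    )
    def average_user_transaction(transactions):
        return (
            transactions.groupby("CustomerID")["TransactionAmount"]
            .mean()
            .reset_index()
        )

    return average_user_transaction
```

Setting requests below limits, as here, lets the pod burst above its reserved share when the node has spare capacity.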