Types of Infrastructure Providers
The two main types of infrastructure providers are the Offline Store and Inference Store. The offline store is where primary data sets are stored and transformed into features, labels, and training sets. The training sets are served directly from the offline store, while features are materialized into an inference store for serving.
An inference store allows feature values to be looked up at inference time. Featureform maintains the current value of each feature per entity in the inference store and provides a Python API for serving.
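The lookup behavior described above can be illustrated with a minimal in-memory sketch. This is not Featureform's API or implementation. The class `InMemoryInferenceStore`, its `materialize` and `lookup` methods, and the feature and entity names are all hypothetical, chosen only to show the idea of keeping one current value per feature and entity.

```python
from typing import Dict, Hashable, Tuple


class InMemoryInferenceStore:
    """Toy stand-in for an inference store: one current value per (feature, entity)."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, Hashable], object] = {}

    def materialize(self, feature: str, entity: Hashable, value: object) -> None:
        # Writing again for the same key overwrites: only the current value is kept.
        self._values[(feature, entity)] = value

    def lookup(self, feature: str, entity: Hashable) -> object:
        # Raises KeyError if no value was materialized for this entity.
        return self._values[(feature, entity)]


store = InMemoryInferenceStore()
store.materialize("avg_transaction", "user_1", 20.0)
store.materialize("avg_transaction", "user_1", 25.0)  # newer value replaces the old one
current = store.lookup("avg_transaction", "user_1")
```

A real inference store provider such as Redis plays the role of the dictionary here, which is why lookup latency is largely determined by the provider you choose.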
Choosing an Inference Store
When choosing an inference store provider, consider three variables: price, deployment complexity, and latency. Deployment complexity refers to the cost of the expertise needed to host the inference store provider, whether that is internal IT headcount, a vendor, or a cloud platform. Price refers to the actual cost of storing and serving data in the inference store. Typically, the lower an inference store’s latency, the higher its cost per GB of storage. In near-real-time situations like a recommender system, a low-latency inference store provider like Redis or Cassandra is the right choice. In a batch use case, on the other hand, Snowflake may be sufficient and more cost-efficient.
The Offline Store provides dataset storage, transformation capabilities, and training set serving. Featureform coordinates transformations on the offline store to have it reach the user’s desired state.
Choosing an Offline Store
The Offline Store performs the majority of the heavy lifting for the feature store. It stores the primary sources and runs all of the transformations. Featureform also uses it to create training sets and generate the data for the inference store. The Offline Store provider you choose should be able to handle your scale of data and support the transformation language you’d like to use, whether it be SQL, Dataframes, or something else.
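To make the offline store's three jobs concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for a SQL offline store. It does not use Featureform. The table names, columns, and sample rows are assumptions made up for illustration, and a real deployment would have Featureform coordinate equivalent steps on the provider you register.

```python
import sqlite3

# In-memory SQLite stands in for the offline store (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Primary source: raw rows as they would be stored in the offline store.
conn.executescript("""
    CREATE TABLE transactions (user_id TEXT, amount REAL, is_fraud INTEGER);
    INSERT INTO transactions VALUES
        ('u1', 10.0, 0), ('u1', 30.0, 0), ('u2', 500.0, 1);
""")

# Transformation: derive a per-user feature from the primary source.
conn.execute("""
    CREATE TABLE avg_amount AS
    SELECT user_id, AVG(amount) AS avg_amount
    FROM transactions
    GROUP BY user_id
""")

# Training set: pair each label row with the feature value for its entity.
training_set = conn.execute("""
    SELECT f.avg_amount, t.is_fraud
    FROM transactions AS t
    JOIN avg_amount AS f USING (user_id)
    ORDER BY t.rowid
""").fetchall()

# Data for the inference store: the current feature value per entity.
materialized = dict(
    conn.execute("SELECT user_id, avg_amount FROM avg_amount").fetchall()
)
```

The same three steps (store the source, transform it, join features with labels) apply whether the transformation language is SQL or dataframes. Only the provider running them changes.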
We’ll begin by specifying our providers in a Python file.
import featureform as ff

client = ff.Client(host=host)

redis = ff.register_redis(
    name="redis",
    description="Example inference store",
    team="Featureform",
    host="0.0.0.0",
    port=6379,
    password="",
    db=0,
)

postgres = ff.register_postgres(
    name="postgres_docs",
    description="Example offline store",
    team="Featureform",
    host="0.0.0.0",
    port="5432",
    user="postgres",
    password="password",
    database="postgres",
)

client.apply()
To update a previously registered provider’s configuration, make the necessary changes in the file where it is defined (providers.py, for example) and register the provider again.
The description field can always be updated, but which configuration fields are updatable depends on the provider type.