Featureform Workflow
The feature engineering process involves three key stages: experimentation, production, and evaluation. Collaboration among data scientists is crucial during these stages, as it often leads to the creation of innovative features and insights. Featureform streamlines the feature engineering workflow, facilitating collaboration and enhancing the efficiency of the entire process.
Experimentation: Define and Interact with Resources
Once your data infrastructure is connected, you can begin the experimentation phase. In this phase, you register primary data sets and define transformations, features, and training sets.
Offline stores offer methods like register_table
, register_file
, and register_directory
for registering initial data sets. Subsequently, you define transformations that build derivative training sets. Transformations involve decorating a method with metadata, specifying where it runs, inputs, and other relevant details.
@spark_provider.df_transformation(inputs=[("source", "v4")], variant="var")
def fn(src):
"""This transformation selects columns 'a', 'b', and 'c' from the 'source' dataset."""
return src[["a", "b", "c"]]
During experimentation, you may want to visualize and analyze data. Featureform allows any registered data set to be retrieved as a Dataframe using the client.dataframe
call.
client.dataframe
accepts either a tuple specifying the name and variant or the function object or data set object itself:
df = client.dataframe(fn)
The experimentation phase enables data exploration and feature creation without affecting production systems.
Production: Serve Features and Training Sets
In the production phase, you register features and labels based on data sets defined in the experimentation phase. This enables the creation of training sets and real-time feature serving. Features and labels are associated with entities, which function as primary keys.
import featureform as ff
@ff.entity
class User:
posts_per_hour = ff.Feature(dataset[["user", "post_freq"]], type=ff.Float32, variant="simple")
is_bot = ff.Label(dataset[["user", "is_bot"]], variant="simple", type=ff.Bool)
client.apply()
You can use the ff.register_training_set
method to create training sets by joining labels and features based on entity keys.
ff.register_training_set("bot_prediction", "v1", label=User.is_bot, features=[User.posts_per_hour])
client.apply()
Features can be served using the client.features
or client.training_set
methods.
client.training_set("bot_prediction", "v1").dataframe()
client.features([("posts_per_hour", "simple")], entities={"user": userId})
Featureform seamlessly integrates with inference stores and vector databases, enabling efficient real-time feature serving and nearest neighbor lookups when needed. This production phase ensures that your machine learning models use accurate and up-to-date data.
Evaluation: Monitor and Address Drift and Failures
Continuous evaluation is vital for monitoring model performance and identifying issues like concept drift or model failures. Featureform supports proactive monitoring and addressing of concept drift. You can set clear objectives, establish monitoring frequency, and regularly track model performance and data distribution.
When concept drift is detected, revisit feature engineering, adjust model hyperparameters, and retrain the model with the latest data. Featureform’s monitoring and evaluation capabilities help maintain the reliability of your machine learning systems.
Collaboration and Organization: Search, Discover, and Re-Use Features and Data Sets
Collaboration is central to effective feature engineering. Featureform promotes collaboration by allowing data scientists to discover and reuse features and data sets created by others. You can search for existing resources, explore their definitions, and leverage them in your work. Each Featureform resource can be tagged, versioned, and described, and it’s immutable by default, ensuring safe usage without concerns about upstream changes.
This collaborative approach encourages knowledge sharing and accelerates feature engineering efforts. By building on the work of others and sharing your contributions, you can create more robust and powerful machine learning models.
In summary, the Featureform workflow covers experimentation, production, evaluation, and collaboration, providing a comprehensive framework for feature engineering in machine learning. Whether you’re defining features, serving them in real-time, monitoring for concept drift, or collaborating with your team, Featureform streamlines the entire process, making it more efficient and effective.