Experimentation: Define and Interact with Resources

Once your data infrastructure is connected, you can begin the experimentation phase. In this phase, you register primary data sets and define transformations, features, and training sets.

Offline stores offer methods like register_table, register_file, and register_directory for registering initial data sets. Subsequently, you define transformations that build derivative training sets. Transformations involve decorating a method with metadata, specifying where it runs, inputs, and other relevant details.

@spark_provider.df_transformation(inputs=[("source", "v4")], variant="var")
def fn(src):
  """This transformation selects columns 'a', 'b', and 'c' from the 'source' dataset."""
  return src[["a", "b", "c"]]

During experimentation, you may want to visualize and analyze data. Featureform allows any registered data set to be retrieved as a Dataframe using the client.dataframe call.

client.dataframe accepts either a tuple specifying the name and variant or the function object or data set object itself:

df = client.dataframe(fn)

The experimentation phase enables data exploration and feature creation without affecting production systems.

Production: Serve Features and Training Sets

In the production phase, you register features and labels based on data sets defined in the experimentation phase. This enables the creation of training sets and real-time feature serving. Features and labels are associated with entities, which function as primary keys.

import featureform as ff

@ff.entity
class User:
  posts_per_hour = ff.Feature(dataset[["user", "post_freq"]], type=ff.Float32, variant="simple")
  is_bot = ff.Label(dataset[["user", "is_bot"]], variant="simple", type=ff.Bool)

client.apply()

You can use the ff.register_training_set method to create training sets by joining labels and features based on entity keys.

ff.register_training_set("bot_prediction", "v1", label=User.is_bot, features=[User.posts_per_hour])

client.apply()

Features can be served using the client.features or client.training_set methods.

client.training_set("bot_prediction", "v1").dataframe()
client.features([("posts_per_hour", "simple")], entities={"user": userId})

Featureform seamlessly integrates with inference stores and vector databases, enabling efficient real-time feature serving and nearest neighbor lookups when needed. This production phase ensures that your machine learning models use accurate and up-to-date data.

Evaluation: Monitor and Address Drift and Failures

Continuous evaluation is vital for monitoring model performance and identifying issues like concept drift or model failures. Featureform supports proactive monitoring and addressing of concept drift. You can set clear objectives, establish monitoring frequency, and regularly track model performance and data distribution.

When concept drift is detected, revisit feature engineering, adjust model hyperparameters, and retrain the model with the latest data. Featureform’s monitoring and evaluation capabilities help maintain the reliability of your machine learning systems.

Collaboration and Organization: Search, Discover, and Re-Use Features and Data Sets

Collaboration is central to effective feature engineering. Featureform promotes collaboration by allowing data scientists to discover and reuse features and data sets created by others. You can search for existing resources, explore their definitions, and leverage them in your work. Each Featureform resource can be tagged, versioned, and described, and it’s immutable by default, ensuring safe usage without concerns about upstream changes.

This collaborative approach encourages knowledge sharing and accelerates feature engineering efforts. By building on the work of others and sharing your contributions, you can create more robust and powerful machine learning models.

In summary, the Featureform workflow covers experimentation, production, evaluation, and collaboration, providing a comprehensive framework for feature engineering in machine learning. Whether you’re defining features, serving them in real-time, monitoring for concept drift, or collaborating with your team, Featureform streamlines the entire process, making it more efficient and effective.