In most scenarios, primary data sets serve as the raw materials, which are then transformed into data sets containing the set of features and labels required for serving and training machine learning models. These transformations can be directly applied to primary data sets or sequenced and executed on other previously transformed data sets. It’s essential to understand that, with the exception of pandas, Featureform itself doesn’t perform the data transformations. Instead, it orchestrates your existing data infrastructure to execute the transformations.
Use the sql_transformation method provided by an offline store provider to decorate a Python function that returns a formatted SQL string. By default, the function name is used as the data set’s name, and a variant is automatically generated. The sql_transformation method accepts a keyword argument named inputs, which is a list of either (name, variant) tuples or Featureform data set objects. The function receives the Dataframe representation of these inputs as positional arguments. Both the name and variant can be overridden using keyword arguments of the same names in sql_transformation. Additionally, the function’s docstring serves as the data set’s description.
Example:
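A minimal sketch of the decorator pattern for a SQL transformation. So that it runs without Featureform installed, StubProvider is a hypothetical stand-in that only records what the decorator receives; it is not Featureform's implementation. The table and column names, the "auto-generated" fallback variant, and the {{ }} placeholder used to reference the input inside the SQL string are likewise assumptions made for illustration.

```python
class StubProvider:
    """Hypothetical stand-in for an offline store provider (not Featureform's class)."""

    def __init__(self):
        # Maps (name, variant) to the recorded transformation metadata.
        self.registered = {}

    def sql_transformation(self, name=None, variant=None, inputs=None):
        def decorator(fn):
            # The name defaults to the function name; the variant fallback
            # is a placeholder for an automatically generated value.
            key = (name or fn.__name__, variant or "auto-generated")
            self.registered[key] = {
                "inputs": list(inputs or []),
                # The docstring becomes the data set's description.
                "description": (fn.__doc__ or "").strip(),
                "sql": fn,
            }
            return fn

        return decorator


provider = StubProvider()


@provider.sql_transformation(variant="v1", inputs=[("transactions", "kaggle")])
def average_user_transaction(transactions):
    """Average transaction amount per customer."""
    # Assumed convention: the parameter name matches the placeholder in the SQL.
    return (
        "SELECT CustomerID AS user_id, AVG(TransactionAmount) AS avg_amount "
        "FROM {{transactions}} GROUP BY user_id"
    )
```

With a real provider object in place of StubProvider, the decorated function only supplies the query text; the query itself would be executed by the offline store rather than locally.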
Use the df_transformation method provided by an offline store provider to decorate a Python function that returns a Dataframe. The df_transformation method requires a keyword argument named inputs, which is a list of either (name, variant) tuples or Featureform data set objects. The function receives the Dataframe representation of these inputs as positional arguments. As with SQL transformations, the default name and variant are generated from the function’s name, but both can be customized using keyword arguments to df_transformation. The function’s docstring serves as the data set’s description.
Example:
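A minimal sketch of the decorator pattern for a Dataframe transformation, including a name and variant override. StubProvider is again a hypothetical stand-in that only records the registration so the sketch runs without Featureform installed; it is not Featureform's implementation. The data set name, column names, "auto-generated" fallback variant, and the pandas-style calls in the function body are assumptions for illustration.

```python
class StubProvider:
    """Hypothetical stand-in for an offline store provider (not Featureform's class)."""

    def __init__(self):
        # Maps (name, variant) to the recorded transformation metadata.
        self.registered = {}

    def df_transformation(self, *, inputs, name=None, variant=None):
        # `inputs` is keyword-only and has no default, so it is required.
        def decorator(fn):
            key = (name or fn.__name__, variant or "auto-generated")
            self.registered[key] = {
                "inputs": list(inputs),
                # The docstring becomes the data set's description.
                "description": (fn.__doc__ or "").strip(),
                "definition": fn,
            }
            return fn

        return decorator


provider = StubProvider()


@provider.df_transformation(
    name="avg_transaction",  # overrides the default (the function name)
    variant="v1",            # overrides the generated variant
    inputs=[("transactions", "kaggle")],
)
def average_user_transaction(transactions):
    """Average transaction amount per customer."""
    # `transactions` is assumed to be a pandas-style Dataframe with
    # CustomerID and TransactionAmount columns.
    grouped = transactions.groupby("CustomerID")["TransactionAmount"].mean()
    return grouped.reset_index()
```

The function body is not run at definition time; the decorator only registers it. Each entry in inputs corresponds to one positional parameter of the function, in order.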