Featureform supports three languages for transformations: SQL, Dataframes, and Python. Python transformations are available only for on-demand transformations and streaming transformations.

SQL Transformations

Featureform supports SQL transformations on providers that natively support SQL, such as Snowflake, Spark, and Postgres. Because Featureform orchestrates transformations on your existing data infrastructure rather than executing them itself, SQL transformations are written in your provider’s own SQL dialect.

To register a SQL transformation, use the sql_transformation method of an offline store provider to decorate a Python function that returns a templated SQL string. By default, the function name is used as the data set’s name and a variant is generated automatically; both can be overridden with the name and variant keyword arguments of sql_transformation. The function’s docstring serves as the data set’s description. The inputs keyword argument takes a list of either (name, variant) tuples or Featureform data set objects. Each input corresponds to a parameter of the decorated function, and the SQL string refers to an input by wrapping its parameter name in double curly braces, as with {{sales_data}} in the example below. Featureform resolves each placeholder to the corresponding table on the provider.

Example:

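# Assumes snowflake_provider is a registered Snowflake offline store and
# sales_data is a data set already registered with Featureform.
@snowflake_provider.sql_transformation(variant="var", inputs=[sales_data])
def fn(sales_data):
    """This transformation filters data where the value is greater than 10."""
    # {{sales_data}} refers to the sales_data input declared in the decorator.
    return "SELECT * FROM {{sales_data}} WHERE value > 10"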
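Featureform performs the placeholder substitution itself. As a rough illustration only, the sketch below shows the idea in plain Python: each {{name}} in the returned string is swapped for the table that backs the corresponding input. The render_sql helper and the table name ANALYTICS.SALES_V1 are hypothetical and are not part of Featureform’s API.

```python
import re

# Hypothetical sketch, not Featureform's implementation.
# Matches {{name}} or {{ name }}; dotted or other complex references are not handled.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_sql(template: str, tables: dict) -> str:
    """Replace each {{name}} placeholder with the table mapped to that name."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in tables:
            raise KeyError(f"no input named {name!r}")
        return tables[name]

    return PLACEHOLDER.sub(substitute, template)


# Template from the example above; "ANALYTICS.SALES_V1" is a made-up table name.
template = "SELECT * FROM {{sales_data}} WHERE value > 10"
print(render_sql(template, {"sales_data": "ANALYTICS.SALES_V1"}))
# prints: SELECT * FROM ANALYTICS.SALES_V1 WHERE value > 10
```

Raising on an unknown placeholder, rather than leaving it in place, surfaces a typo in the SQL string before the query ever reaches the provider.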

Dataframe Transformations

Featureform also supports Dataframe transformations on providers that natively support Dataframes, such as Spark and Pandas on K8s. The Dataframe object is the provider’s native type: a PySpark DataFrame on Spark, or a pandas DataFrame on Pandas on K8s.

To register a Dataframe transformation, use the df_transformation method of an offline store provider to decorate a Python function that returns a Dataframe. The df_transformation method requires a keyword argument named inputs, which is a list of either (name, variant) tuples or Featureform data set objects. The function receives each input as a Dataframe argument, in the order the inputs are listed. As with SQL transformations, the data set’s name defaults to the function’s name and a variant is generated automatically; both can be overridden with the name and variant keyword arguments of df_transformation. The function’s docstring serves as the data set’s description.

Example:

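# Assumes spark_provider is a registered Spark offline store and a data set
# named "source" with variant "v4" is already registered.
@spark_provider.df_transformation(inputs=[("source", "v4")], variant="var")
def fn(src):
    """This transformation selects columns 'a', 'b', and 'c' from the 'source' dataset."""
    # src is a PySpark DataFrame; select() returns a new DataFrame with only these columns.
    return src.select("a", "b", "c")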
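To make the registration defaults concrete, here is a self-contained model of what a df_transformation-style decorator records. It is a hypothetical stand-in, not Featureform’s implementation: mock_df_transformation, TransformationSpec, and run_transformation are invented names, lists of dicts stand in for real Dataframes, and the random eight-character variant is an assumption, since the format of Featureform’s auto-generated variants is not described here.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TransformationSpec:
    name: str
    variant: str
    description: str
    inputs: list
    fn: Callable


# Hypothetical in-memory registry keyed by (name, variant).
REGISTRY: dict = {}


def mock_df_transformation(inputs, name: Optional[str] = None, variant: Optional[str] = None):
    """Hypothetical decorator modeling the defaults described in the text."""

    def decorator(fn):
        spec = TransformationSpec(
            name=name or fn.__name__,                 # default: the function's name
            variant=variant or uuid.uuid4().hex[:8],  # assumption: random variant
            description=(fn.__doc__ or "").strip(),   # docstring becomes the description
            inputs=list(inputs),
            fn=fn,
        )
        REGISTRY[(spec.name, spec.variant)] = spec
        return spec  # the decorated name now refers to the spec, not the function

    return decorator


def run_transformation(spec, datasets):
    # Resolve each (name, variant) input and pass them positionally, in listed order.
    args = [datasets[key] for key in spec.inputs]
    return spec.fn(*args)


@mock_df_transformation(inputs=[("source", "v4")], variant="var")
def fn(src):
    """Keeps only columns 'a', 'b', and 'c'."""
    return [{k: row[k] for k in ("a", "b", "c")} for row in src]


datasets = {("source", "v4"): [{"a": 1, "b": 2, "c": 3, "d": 4}]}
print(fn.name, fn.variant, fn.description)  # fn var Keeps only columns 'a', 'b', and 'c'.
print(run_transformation(fn, datasets))     # [{'a': 1, 'b': 2, 'c': 3}]
```

Keeping the transformation body as an ordinary function also means its logic can be exercised on small in-memory data before it is registered against a real provider.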

Featureform’s transformation API lets you build the features and labels your machine learning models need with the syntax and logic you already use, while running on your existing data infrastructure.