Transforming Data
Featureform allows users to define transformations to create the tables they need for their features and training sets. Transformation definitions are versioned and stored immutably. They can be run on a schedule, as streaming transformations, or on demand. Featureform runs the transformations on your registered infrastructure providers.
Registering Primary Data Sources
All features and labels originate from a set of initial data sources. These sources can be streams, files, or tables. A feature can simply be a single field of a data source. More commonly, it’s created via a set of transformations from one or multiple sources.
Files
In the case of Offline Stores that work with a file system, the primary data may be a file or a set of files. Even when the Offline Store is a database, the primary data may be files that should be copied into a table in the Offline Store.
Tables
In the case of an Offline Store, the primary data may be a table that already exists. In this case, we can register the table with Featureform.
Streams
Streaming support in Enterprise Featureform. Book a call with us today!
Defining transformations
There are two supported transformation types: SQL and Dataframes. Not all providers support all transformation types. For example, a dataframe transformation cannot currently be run on Snowflake. Each transformation definition also includes a set of metadata like its name, variant, and description.
SQL
SQL transformations are defined by a decorated python function that returns a templated SQL string. The decorator specifies the necessary metadata, and the SQL string has its table names replaced with templated source names and versions.
Dataframes
Dataframe are defined by a decorated python function that takes a set of input dataframes and returns an output dataframe. The decorator specifies the necessary metadata and inputs, then the function outlines the transformation logic.
Chaining transformations
Transformations can be chained together, where one transformation is the input into another transformation. Transformations will wait for any dependent transformations to complete successfully before running.
Checking status
Since some registrations may take longer than other depending on size of dataset and complexity of a query, the Featureform API has the ability to check the status of any resource programatically.