Primary Data Sets

Registering a primary data set only creates a metadata reference to the data; it does NOT copy the data to Featureform. All data stays within your infrastructure and is transformed there, with Featureform acting as the orchestrator. You can register three types of primary data sets: directories, files, and tables.

Tables

Table-based Offline Stores, such as Snowflake, are organized around tables. These providers expose a method, .register_table(name, variant, table=""), that registers an existing table as a primary data set. Once the table is registered, you can define transformations on the primary data set or register features and labels derived from it.
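
To make the signature concrete, here is a minimal sketch of what a registration call records. FakeTableProvider and TableReference are hypothetical stand-ins written for this example, not Featureform's client; only the register_table(name, variant, table="") signature comes from the text above, and the resource, variant, and table names are illustrative. The point it shows is that registration stores a reference and never touches the rows.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableReference:
    """Metadata describing where a primary data set lives (no rows are stored)."""
    name: str
    variant: str
    table: str


@dataclass
class FakeTableProvider:
    """Hypothetical stand-in for a table-based offline store provider."""
    registry: dict = field(default_factory=dict)

    def register_table(self, name, variant, table=""):
        # Store only a reference keyed by (name, variant); the data stays put.
        ref = TableReference(name=name, variant=variant, table=table)
        self.registry[(name, variant)] = ref
        return ref


provider = FakeTableProvider()
transactions = provider.register_table(
    name="transactions",   # illustrative resource name
    variant="v1",          # illustrative variant label
    table="Transactions",  # illustrative table name in the warehouse
)
```

With a real provider object the call would have the same shape; the returned reference is what later transformations, features, and labels would point to.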

Files

Offline Stores like Spark work with file stores such as S3 and HDFS. These providers expose a method, .register_file(name, variant="", path=""). Featureform currently supports CSV and Parquet files. If your use case requires a different file format, please raise an issue on GitHub or reach out to our community on Slack. We value your feedback.
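
Because only CSV and Parquet are supported, it can help to check a path's extension before registering it. The sketch below is an assumption-laden illustration: is_supported_file and the register_file function are hypothetical helpers that only mirror the signature above, the bucket and file paths are made up, and whether Featureform itself validates by file extension is not stated here.

```python
from pathlib import PurePosixPath

# Formats named in the text above.
SUPPORTED_SUFFIXES = {".csv", ".parquet"}


def is_supported_file(path: str) -> bool:
    """Return True if the path ends in a supported file extension."""
    return PurePosixPath(path).suffix.lower() in SUPPORTED_SUFFIXES


def register_file(name, variant="", path=""):
    """Hypothetical stand-in mirroring the signature above; returns metadata only."""
    if not is_supported_file(path):
        raise ValueError(f"unsupported file format: {path}")
    # Only the location is recorded; the file itself is never read or copied.
    return {"name": name, "variant": variant, "path": path}


file_ref = register_file(
    name="transactions",                             # illustrative resource name
    variant="v1",                                    # illustrative variant label
    path="s3://example-bucket/transactions.parquet", # made-up location
)
```

Checking on the client side surfaces an unsupported format immediately, rather than when a downstream transformation first tries to read the file.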
