# Featureform

## Docs

- [Abstractions](https://docs.featureform.com/abstractions/abstractions.md)
- [Data Infrastructure Provider](https://docs.featureform.com/abstractions/data-infrastructure-providers.md): Featureform functions as a Virtual Feature Store, strategically positioned atop your existing data infrastructure. It acts as an orchestrator, conducting your infrastructure to construct and serve the features you define. Importantly, this approach means your data remains within your infrastructure—it's n…
- [Embeddings](https://docs.featureform.com/abstractions/embedding.md): **Embeddings** represent a specialized type of [feature](/abstractions/feature) stored in a [Vector DB](/llms-embeddings-and-vector-databases/llm-workflow-with-featureform). They are primarily used for nearest neighbor lookups. If you'd like to explore a comprehensive explanation of what an embeddin…
- [Entities](https://docs.featureform.com/abstractions/entity.md): An **entity** serves as a collection of semantically related features and labels. Users define entities to map to the domain of their specific use cases. For instance, in the context of a ride-hailing service, entities could include customers and drivers, grouping related features and labels associa…
- [Features](https://docs.featureform.com/abstractions/feature.md): **Features** represent the core abstraction in Featureform. They serve as inputs to machine learning models, providing context or observations that the model leverages to make inferences. In practice, feature engineering often yields the highest return on investment for data scientists, significantl…
- [Labels](https://docs.featureform.com/abstractions/label.md): **Labels** are a core component of a [training set](/abstractions/training-set). A model relies on a set of [features](/abstractions/feature) to make an inference. During the training process, this inference is compared to a label, and the model is adjusted incrementally.
- [Primary Data Sets](https://docs.featureform.com/abstractions/primary-data-sets.md): Once you've configured your data infrastructure, you can commence the process of registering your primary data sets. These primary data sets either directly contain your features and labels or serve as the foundation for their creation.
- [Training Sets](https://docs.featureform.com/abstractions/training-set.md): Models require training, a process that typically involves feeding in a set of [features](/abstractions/feature) with known [labels](/abstractions/label). During training, the model makes inferences based on these features, and the labels are used to adjust the model's weights.
- [Transforming Data Sets](https://docs.featureform.com/abstractions/transforming-data-sets.md): In most scenarios, primary data sets serve as the raw materials, which are then transformed into data sets containing the set of features and labels required for serving and training machine learning models. These transformations can be directly applied to primary data sets or sequenced and executed…
- [Dataframe and SQL Transformation Support](https://docs.featureform.com/concepts/dataframe-sql-transformation.md): Data scientists have diverse preferences in tools, with some favoring SQL while others lean towards Dataframes. Featureform transformations also exhibit varying compatibility with each API, often influenced by underlying data infrastructure like Postgres, which may support only one of the two.
- [Exploring Resources with Dataframes](https://docs.featureform.com/concepts/dataframes.md): When it comes to working with data for machine learning, dataframes are ubiquitous. Featureform simplifies interaction with its sources and transformations, allowing you to fetch them into local memory as dataframes using the *client.dataframe()* API.
- [Open-Source vs. Enterprise Featureform: What Sets Them Apart](https://docs.featureform.com/concepts/enterprise-vs-open-source-feature-store.md): At Featureform, we are deeply committed to delivering an exceptional standalone experience with our open-source product. However, as a business, we adhere to an open-core philosophy. Since the inception of our open-source offering, we've been transparent about the two primary distinctions between th…
- [Governance and Access Control: Ensuring Compliance](https://docs.featureform.com/concepts/governance-and-access-control.md): Featureform Enterprise's product includes governance, access controls, and audit logs. Many machine learning models, such as fraud detection and recommender systems, rely on features that are projected from personally identifiable information (PII). Consequently, establishing and enforcing robust pol…
- [Immutability, Lineage, and Directed Acyclic Graphs (DAGs)](https://docs.featureform.com/concepts/immutability-lineage-and-dags.md): In the Featureform ecosystem, our declarative API establishes a DAG that outlines the relationship between resources. Starting from primary sources, these resources undergo transformations and ultimately evolve into features and training sets. For enhanced visibility and insight, you can readily exp…
- [Calculating On-Demand Features at Request Time](https://docs.featureform.com/concepts/on-demand-features-request-time.md): Certain machine learning predictions rely on data available only at the time of the request. For instance, testing a user transaction for fraud might require data that's passed with the request and cannot be preprocessed. While stream processing offers near real-time features, it can lead to race co…
- [Achieving Point-in-Time Correctness and Handling Historical Features with Time Series Data](https://docs.featureform.com/concepts/point-in-time-correctness-historical-features-timeseries-data.md): In the realm of time-series data, it's a common scenario for feature values to evolve over time. For instance, in a fraud detection model, you might define a feature like a user's average transaction amount based on a series of transactions from your users. This value will continuously change as new t…
- [The LLM Workflow with Featureform](https://docs.featureform.com/concepts/rag-vector-db-llms.md): Large Language Models (LLMs) are pre-trained models that take a text prompt as input and generate a response based on the prompt.
- [Search and Discover Features and Transformations](https://docs.featureform.com/concepts/search-and-discovery-features.md): Features and the transformations backing them often hold value that transcends individual models and teams. Featureform offers built-in search and discovery capabilities, accessible via various avenues:
- [Streaming Data: Real-time Updates](https://docs.featureform.com/concepts/streaming.md): Certain features necessitate continuous updates through a data stream, surpassing the capabilities of scheduled batch processing or triggered executions. *Featureform Enterprise* offers an API tailored for streaming feature values. This not only ensures real-time relevance but also retains historica…
- [Setting Custom Tags and Properties](https://docs.featureform.com/concepts/tags-and-properties.md): In Featureform, your ML data resources' metadata is stored comprehensively. Our metadata engine offers adaptability, enabling you to establish personalized tags and properties. These become particularly crucial when utilizing our Governance APIs within Featureform Enterprise.
- [Versioning and Variants](https://docs.featureform.com/concepts/versioning-and-variants.md): Managing versioning is crucial for effective ML resource management. Featureform empowers you to implement versioning across your data sources, transformations, features, labels, and training sets. Each of these resources is inherently immutable by default, ensuring you can confidently utilize versi…
- [Backup and Restore](https://docs.featureform.com/deployment/backup-and-restore.md): Featureform can be configured to take periodic snapshots of itself that are backed up to your specified cloud storage. In case of an incident, this snapshot can be pulled and reloaded to restore Featureform to a previous state.
- [Kubernetes](https://docs.featureform.com/deployment/kubernetes.md): This guide will walk through deploying Featureform on Kubernetes. The Featureform ingress currently supports AWS load balancers.
- [AWS](https://docs.featureform.com/deployment/quickstart-aws.md): This quickstart will walk through creating a few simple features, labels, and a training set using Postgres and Redis. We will use a transaction fraud training set.
- [Azure](https://docs.featureform.com/deployment/quickstart-azure.md): This quickstart will walk through creating a few simple features, labels, and a training set using Postgres and Redis. We will use a transaction fraud training set.
- [Quickstart](https://docs.featureform.com/deployment/quickstart-docker.md): A quick start guide for Featureform with Docker.
- [Google Cloud](https://docs.featureform.com/deployment/quickstart-gcp.md): This quickstart will walk through creating a few simple features, labels, and a training set using BigQuery and Firestore. We will use a transaction fraud training set.
- [Architecture and Components](https://docs.featureform.com/getting-started/architecture-and-components.md): Featureform's architecture and components are designed to streamline the feature engineering process. It follows a Virtual Feature Store architecture, allowing for pluggable data infrastructure and serving as an overarching application framework for feature definition, management, and serving. Let's…
- [Registering Infrastructure Providers](https://docs.featureform.com/getting-started/connecting-your-data-infrastructure.md): Featureform coordinates a set of infrastructure providers to act together as a feature store. This Virtual Feature Store approach allows teams to choose the right infrastructure to meet their needs and interface across them with the same abstraction. Teams can also use multiple infrastructure provid…
- [Featureform Workflow](https://docs.featureform.com/getting-started/featureform-workflow.md): The feature engineering process involves three key stages: experimentation, production, and evaluation. Collaboration among data scientists is crucial during these stages, as it often leads to the creation of innovative features and insights. Featureform streamlines the feature engineering workflow,…
- [Model to Feature Lineage](https://docs.featureform.com/getting-started/model-to-feature-lineage.md): In the Featureform ecosystem, our declarative API establishes a DAG that outlines the relationship between resources. Starting from primary sources, these resources undergo transformations and ultimately evolve into features and training sets. For enhanced visibility and insight, you can readily exp…
- [Calculating On-Demand Features at Request Time](https://docs.featureform.com/getting-started/on-demand-features-request-time.md): Certain machine learning predictions rely on data available only at the time of the request. For instance, testing a user transaction for fraud might require data that's passed with the request and cannot be preprocessed. While stream processing offers near real-time features, it can lead to race co…
- [Register and Serve Features, Labels, and Training Sets](https://docs.featureform.com/getting-started/register-and-serve-features-training-sets.md): Once you've created your primary data sets, you can define features, labels, and training sets based on them.
- [Transforming Data Sets](https://docs.featureform.com/getting-started/registering-transforming-and-interacting-with-data-sets.md)
- [Scheduling Resources](https://docs.featureform.com/getting-started/scheduling-resources.md): Oftentimes, we want to keep our Features and Training Sets up to date with the latest data. Featureform offers the ability to schedule and run updates for Transformations, Features, Labels, and Training Sets.
- [Exploring the Feature Registry](https://docs.featureform.com/getting-started/search-monitor-discovery-feature-registry-ui-cli.md): Once we have everything registered (e.g. features, training sets, providers), we can see information about them on the Feature Registry.
- [Streaming Data: Real-time Updates](https://docs.featureform.com/getting-started/streaming-features.md): Certain features necessitate continuous updates through a data stream, surpassing the capabilities of scheduled batch processing or triggered executions. *Featureform Enterprise* offers an API tailored for streaming feature values. This not only ensures real-time relevance but also retains historica…
- [Cassandra](https://docs.featureform.com/inference-online-stores/cassandra.md): Featureform supports [Cassandra](https://cassandra.apache.org/%5F/index.html) as an Inference Store.
- [DynamoDB](https://docs.featureform.com/inference-online-stores/dynamodb.md): Featureform supports [DynamoDB](https://aws.amazon.com/dynamodb/) as an Inference Store.
- [Firestore](https://docs.featureform.com/inference-online-stores/firestore.md): Featureform supports [Firestore](https://firebase.google.com/docs/firestore) as an Inference Store.
- [MongoDB](https://docs.featureform.com/inference-online-stores/mongodb.md): Featureform supports [MongoDB](https://www.mongodb.com/) as an Inference Store.
- [Redis](https://docs.featureform.com/inference-online-stores/redis.md): Featureform supports [Redis](https://redis.io/) as an Inference Store.
- [What is Featureform?](https://docs.featureform.com/introduction.md)
- [Building a Chatbot with OpenAI and a Vector Database](https://docs.featureform.com/llms-embeddings-and-vector-databases/building-a-chatbot-with-openai-and-a-vector-database.md)
- [LLM Workflow](https://docs.featureform.com/llms-embeddings-and-vector-databases/llm-workflow-with-featureform.md)
- [S3](https://docs.featureform.com/providers/aws-s3.md): Featureform supports [AWS S3](https://aws.amazon.com/s3/) as a [File Store](/providers/object-and-file-stores)
- [Azure Blobs](https://docs.featureform.com/providers/azure-blob-store.md): Featureform supports [Azure Blob Store](https://azure.microsoft.com/en-us/products/storage/blobs/) as a [File Store](/providers/object-and-file-stores)
- [BigQuery](https://docs.featureform.com/providers/bigquery.md): Featureform supports [BigQuery](https://cloud.google.com/bigquery) as an Offline Store.
- [ClickHouse](https://docs.featureform.com/providers/clickhouse.md): Featureform supports [ClickHouse](https://clickhouse.com/) as an Offline Store.
- [Extending Featureform with Custom Providers and Requesting New Providers](https://docs.featureform.com/providers/custom-providers.md): Featureform's architecture is built upon a foundation of provider abstractions, which include Offline Stores, Object/File Stores, Inference Stores, and Vector Databases. Each of these providers adheres to a generic interface, allowing Featureform to seamlessly manage various types of infrastructure.…
- [Google Cloud (GCS)](https://docs.featureform.com/providers/google-cloud-gcs.md): Featureform supports [Google Cloud Storage (GCS)](https://cloud.google.com/storage) as a [File Store](/providers/object-and-file-stores)
- [HDFS](https://docs.featureform.com/providers/hdfs.md)
- [Kubernetes](https://docs.featureform.com/providers/kubernetes.md): Featureform supports [Kubernetes](https://kubernetes.io/) as an Offline Store.
- [Object and File Stores](https://docs.featureform.com/providers/object-and-file-stores.md): Object and File Stores serve as fundamental components within the Featureform framework, particularly in the context of ETL-based offline stores like Spark and Pandas on K8s. These stores fulfill two primary functions within Featureform:
- [Offline Store](https://docs.featureform.com/providers/offline-store.md): The Offline Store provider is a versatile component within Featureform, serving multiple key functions. It plays a central role in running transformations and storing data. Due to Featureform's virtual architecture, you can expect similar performance and cost characteristics from the Offline Store a…
- [Overview of Infrastructure Providers](https://docs.featureform.com/providers/overview.md): Featureform is designed around a Virtual Feature Store architecture, which manages metadata and orchestrates various infrastructure providers. This approach allows data scientists to interact with their data using the Featureform framework while ensuring that data continues to be stored and processe…
- [Pinecone](https://docs.featureform.com/providers/pinecone.md): Featureform supports [Pinecone](https://pinecone.io/) as a [Vector DB](/providers/vector-db).
- [Postgres](https://docs.featureform.com/providers/postgres.md): Featureform supports [Postgres](https://www.postgresql.org/) as an Offline Store.
- [Redis](https://docs.featureform.com/providers/redis.md): **The RediSearch module is required to use Redis as a Vector DB**
- [Redshift](https://docs.featureform.com/providers/redshift.md): Featureform supports [Redshift](https://aws.amazon.com/redshift/) as an Offline Store.
- [Snowflake](https://docs.featureform.com/providers/snowflake.md): Featureform supports [Snowflake](https://www.snowflake.com/) as an Offline Store.
- [Spark](https://docs.featureform.com/providers/spark.md)
- [Spark with Databricks](https://docs.featureform.com/providers/spark_databricks.md): Featureform supports [Databricks](https://www.databricks.com) as an Offline Store.
- [Spark with EMR](https://docs.featureform.com/providers/spark_emr.md): Featureform supports [Spark on AWS](https://aws.amazon.com/emr/features/spark/) as an Offline Store.
- [Vector Database](https://docs.featureform.com/providers/vector-db.md): A Vector Database provider is designed to facilitate nearest neighbor lookups. It shares several similarities with an inference store but is distinguished by its support for the `client.nearest` API. Configuration is typically done when registering an [embedding](../abstractions/embedding.md) associ…
- [Weaviate](https://docs.featureform.com/providers/weaviate.md): Featureform supports [Weaviate](https://weaviate.io/) as a [Vector DB](/providers/vector-db).
- [System Architecture](https://docs.featureform.com/system-architecture.md)
- [Fraud Detection and Featureform](https://docs.featureform.com/use-cases/fraud-detection.md): Fraud detection use-cases showcase several key advantages of the Featureform platform.
- [Retrieval Augmented Generation (RAG) Workflow for Chatbots with Featureform](https://docs.featureform.com/use-cases/llm-chatbot-rag.md): The retrieval augmented generation workflow pulls information that’s relevant to the user’s query and feeds it into the LLM via the prompt. That information might be similar documents pulled from a vector database, or features looked up from an inference store.

## Optional

- [Community](https://join.slack.com/t/featureform-community/shared_invite/zt-xhqp2m4i-JOCaN1vRN2NDXSVif10aQg)
- [GitHub](https://github.com/featureform)
- [Python SDK](https://sdk.featureform.com/)