To illustrate point-in-time correctness, consider the creation of a training set using the following fraud label data:
| Transaction | User | Fraudulent Charge | Timestamp |
|---|---|---|---|
| 1 | A | False | Jan 3, 2022 |
| 2 | A | True | Jan 5, 2022 |
Suppose we have a feature denoting the user’s average purchase price:
| User | Avg Purchase Price | Timestamp |
|---|---|---|
| A | 5 | Jan 2, 2022 |
| A | 10 | Jan 4, 2022 |
The resulting training set should appear as follows:
| Avg Purchase Price | Fraudulent Charge |
|---|---|
| 5 | False |
| 10 | True |
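Notice that the first row's feature value is 5. On Jan 3, 2022, when the first label was recorded, the most recent average purchase price for user A was the Jan 2 value of 5. The second row's feature value is 10, because by Jan 5, 2022, when the second label was recorded, the feature had been updated to 10 on Jan 4. Pairing each label with the feature value as it stood at the label's timestamp, rather than with the feature's latest value, is what makes the training set point-in-time correct.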
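The following stdlib-only Python sketch reproduces the example above. It is an illustration, not Featureform's implementation. `value_as_of` picks the most recent feature value recorded at or before a label's timestamp, while `latest_value` mimics a naive join that always uses the newest value. Treating a feature value stamped exactly at the label time as available is an assumption of this sketch.

```python
from datetime import date

# User A's feature history as (timestamp, avg_purchase_price), from the table above.
feature_history = [(date(2022, 1, 2), 5), (date(2022, 1, 4), 10)]
# User A's labels as (timestamp, fraudulent_charge), from the table above.
labels = [(date(2022, 1, 3), False), (date(2022, 1, 5), True)]


def value_as_of(history, ts):
    """Most recent value recorded at or before ts, or None if there is none."""
    eligible = [(t, v) for t, v in history if t <= ts]
    return max(eligible, key=lambda tv: tv[0])[1] if eligible else None


def latest_value(history):
    """Naive lookup: the newest value, regardless of the label's timestamp."""
    return max(history, key=lambda tv: tv[0])[1]


point_in_time = [(value_as_of(feature_history, ts), label) for ts, label in labels]
naive = [(latest_value(feature_history), label) for _, label in labels]

print(point_in_time)  # [(5, False), (10, True)]
print(naive)  # [(10, False), (10, True)]: the Jan 4 value leaks into the Jan 3 row
```

The naive version hands the model a value from the future for the first label, which is exactly the leakage that point-in-time correctness prevents.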
When defining features and labels in Featureform, you can specify a timestamp column. Featureform then uses these timestamps when building a training set, automatically pairing each label with the feature values that were in effect at the label's timestamp.
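Across many entities, this kind of alignment is commonly called an "as-of" join. Below is a hedged, stdlib-only sketch of one way to compute it; it is not how Featureform implements the join. `point_in_time_join` and its column-name parameters are hypothetical and only loosely mirror a resource's entity column and `timestamp_column`, and user B's rows are invented data. Labels with no earlier feature value get `None` here; how Featureform handles that case is not covered on this page.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


def point_in_time_join(labels, features, entity_col, timestamp_col, value_col):
    """Attach to each label row its entity's latest feature value at or before the label time.

    `labels` and `features` are lists of dicts. The column-name arguments are
    hypothetical stand-ins for a resource's entity and timestamp columns.
    """
    # Group feature rows by entity as (timestamp, value) pairs, sorted by time.
    history = defaultdict(list)
    for row in features:
        history[row[entity_col]].append((row[timestamp_col], row[value_col]))
    for pairs in history.values():
        pairs.sort(key=lambda pair: pair[0])
    # Precompute each entity's sorted timestamps for binary search.
    times = {entity: [t for t, _ in pairs] for entity, pairs in history.items()}

    joined = []
    for label in labels:
        entity = label[entity_col]
        # Count of feature rows stamped at or before the label's timestamp.
        i = bisect_right(times.get(entity, []), label[timestamp_col])
        value = history[entity][i - 1][1] if i > 0 else None
        joined.append({**label, value_col: value})
    return joined


# Hypothetical data: user A matches the tables above; user B is invented.
features = [
    {"user": "A", "ts": date(2022, 1, 2), "avg_price": 5},
    {"user": "A", "ts": date(2022, 1, 4), "avg_price": 10},
    {"user": "B", "ts": date(2022, 1, 6), "avg_price": 42},
]
labels = [
    {"user": "A", "ts": date(2022, 1, 3), "fraud": False},
    {"user": "A", "ts": date(2022, 1, 5), "fraud": True},
    {"user": "B", "ts": date(2022, 1, 5), "fraud": False},  # no earlier value for B
]

for row in point_in_time_join(labels, features, "user", "ts", "avg_price"):
    print(row["user"], row["fraud"], row["avg_price"])
# A False 5
# A True 10
# B False None
```

Sorting each entity's history once and searching it with `bisect_right` keeps each label lookup logarithmic instead of scanning every feature row. In pandas, `pandas.merge_asof` with `by=` and `direction="backward"` performs a similar join.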
Here’s an example of defining features and labels with timestamps in Featureform:
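```python
import featureform as ff

# Assumes the `average_user_transaction` and `transactions` sources and the
# `redis` inference store were registered earlier, as in the quickstart.
# Point the client at your own Featureform deployment.
client = ff.Client(host="localhost:7878", insecure=True)


@ff.entity
class User:
    # Feature: each user's average transaction amount, with its timestamp column.
    avg_transactions = ff.Feature(
        average_user_transaction[["CustomerID", "TransactionAmount"]],
        variant="quickstart",
        type=ff.Float32,
        inference_store=redis,
        timestamp_column="timestamp",
    )
    # Label: whether a transaction was fraudulent, with its timestamp column.
    fraudulent = ff.Label(
        transactions[["CustomerID", "IsFraud"]],
        variant="quickstart",
        type=ff.Bool,
        timestamp_column="timestamp",
    )


# Combine the label and feature into a training set; Featureform uses the
# timestamp columns declared above to align them.
ff.register_training_set(
    "fraud_training",
    "quickstart",
    label=("fraudulent", "quickstart"),
    features=[("avg_transactions", "quickstart")],
)

client.apply()

# The training set's feature values will be point-in-time correct.
ts = client.training_set("fraud_training", "quickstart").dataframe()
```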
When you use the streaming data API in Featureform Enterprise, timestamps are tracked as well, and you can backfill data to build point-in-time correct training sets. This keeps historical feature values accurate throughout your machine learning workflows.