With time-series data, feature values commonly change over time. In a fraud detection model, for instance, you might define a feature such as a user's average transaction amount, computed from that user's series of transactions. This value changes continuously as new transactions arrive. A typical training set consists of a label (what the model aims to predict) and a set of features, and each row often represents a historical transaction. The feature values in each row must therefore reflect their state at the time of the associated label, not their current state. This requirement is known as point-in-time correctness: training needs the historical values of the features.
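
Labels, one row per historical transaction:

Transaction | User | Fraudulent Charge | Timestamp |
---|---|---|---|
1 | A | False | Jan 3, 2022 |
2 | A | True | Jan 5, 2022 |

Feature values for user A as they changed over time:

User | Avg Purchase Price | Timestamp |
---|---|---|
A | 5 | Jan 2, 2022 |
A | 10 | Jan 4, 2022 |

Resulting training set, with each label paired with the feature value that was current at the label's timestamp:

Avg Purchase Price | Fraudulent Charge |
---|---|
5 | False |
10 | True |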
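
The lookup behind the tables above can be sketched in a few lines of Python. This is a minimal illustration, not a production feature store: it assumes the rule is "use the latest feature value recorded at or before the label's timestamp" (the tables only show strictly earlier values, so including equal timestamps is an assumption), and the names `feature_history` and `feature_as_of` are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Label rows from the first table: (transaction, user, fraudulent, timestamp).
labels = [
    (1, "A", False, date(2022, 1, 3)),
    (2, "A", True, date(2022, 1, 5)),
]

# Feature history from the second table, keyed by user.
# Each list holds (timestamp, avg_purchase_price) and must be sorted by timestamp.
feature_history = {
    "A": [(date(2022, 1, 2), 5), (date(2022, 1, 4), 10)],
}


def feature_as_of(user, ts):
    """Return the latest feature value recorded at or before ts, or None."""
    history = feature_history.get(user, [])
    timestamps = [t for t, _ in history]
    # bisect_right counts the entries with timestamp <= ts.
    idx = bisect_right(timestamps, ts)
    return history[idx - 1][1] if idx else None


# Pair each label with the feature value that was current at label time.
training_set = [
    (feature_as_of(user, ts), fraudulent)
    for _, user, fraudulent, ts in labels
]
print(training_set)  # [(5, False), (10, True)]
```

A lookup for a timestamp before the first recorded feature value returns `None`, which makes the missing history explicit instead of leaking a later value into the row. For tabular data, `pandas.merge_asof` with `direction="backward"` performs the same kind of as-of join.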