Registering Entities, Features and Labels
Every feature must describe an entity. An entity can be thought of as a primary key table, and every feature must include at least one foreign-key field that refers to an entity. Common entities include users, items, and purchases, but an entity can be anything that a feature can describe.
This example is based on a fraud training set with a CustomerID, TransactionID, Amount, and Transaction Time.
Entity Column Type
NOTE: Currently, the data type of a feature’s entity column (e.g. "CustomerID") must be a string.
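If the source data stores the entity column as a number, one way to satisfy the note above is to cast the column to a string before registering it. The snippet below is a minimal sketch using pandas rather than this library's own API; the DataFrame name `transactions` and its values are made up for illustration, and only the column names come from the example above.

```python
import pandas as pd

# Hypothetical sample of the fraud dataset; the values are illustrative only.
transactions = pd.DataFrame({
    "CustomerID": [1001, 1002],
    "TransactionID": [1, 2],
    "Amount": [25.0, 310.5],
})

# Cast the entity column to a string so it meets the string-only requirement.
transactions["CustomerID"] = transactions["CustomerID"].astype(str)

print(transactions["CustomerID"].tolist())
```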
Registering Training Sets
Once we have our features and labels registered, we can create a training set. Training set creation works by joining a label with a set of features via their entity value and timestamp. For each row of the label, the entity value is used to look up all of the feature values in the training set. When a timestamp is included in the label and the feature, the training set will contain the latest feature value where the feature’s timestamp is less than the label’s.
Point-in-Time Correctness
Training sets are point-in-time correct. To illustrate point-in-time correctness, imagine that we are creating a training set from the fraud label previewed below:

| Transaction | User | Fraudulent Charge | Timestamp |
|---|---|---|---|
| 1 | A | False | Jan 3, 2022 |
| 2 | A | True | Jan 5, 2022 |

and the average purchase price feature previewed below:

| User | Avg Purchase Price | Timestamp |
|---|---|---|
| A | 5 | Jan 2, 2022 |
| A | 10 | Jan 4, 2022 |

The resulting training set pairs each label row with the latest feature value recorded before that label's timestamp:

| Avg Purchase Price | Fraudulent Charge |
|---|---|
| 5 | False |
| 10 | True |
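The point-in-time lookup illustrated by these tables can be reproduced outside the library with a backward as-of join. The snippet below is a minimal sketch using pandas `merge_asof`, not the library's own implementation; the DataFrame names are hypothetical, and `allow_exact_matches=False` is an assumption chosen to mirror the "less than" wording used earlier.

```python
import pandas as pd

# Label table from the example above.
labels = pd.DataFrame({
    "Transaction": [1, 2],
    "User": ["A", "A"],
    "Fraudulent Charge": [False, True],
    "Timestamp": pd.to_datetime(["2022-01-03", "2022-01-05"]),
})

# Feature table from the example above.
features = pd.DataFrame({
    "User": ["A", "A"],
    "Avg Purchase Price": [5, 10],
    "Timestamp": pd.to_datetime(["2022-01-02", "2022-01-04"]),
})

# For each label row, take the most recent feature row for the same user
# whose timestamp is strictly earlier than the label's timestamp.
training_set = pd.merge_asof(
    labels.sort_values("Timestamp"),
    features.sort_values("Timestamp"),
    on="Timestamp",
    by="User",
    direction="backward",
    allow_exact_matches=False,  # assumption: strict "less than" comparison
)[["Avg Purchase Price", "Fraudulent Charge"]]

print(training_set)
```

Because the join only looks backward in time, the label from Jan 3 never sees the feature value written on Jan 4, which is what keeps future information out of the training data.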