Quickstart
This Quickstart uses Featureform’s Local Mode to get you up and running quickly. Local Mode requires nothing to be deployed; however, it does not currently allow you to connect to and interact with most of your external data infrastructure.
You can follow the instructions below to install Featureform locally and try out the dashboard.
You can also try Local Mode in the example Google Colab notebook.
Step 1: Install Featureform
Requirements
- Python 3.7+
Install the Featureform SDK via pip.
pip install featureform
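Before installing, it can help to confirm that the interpreter meets the Python 3.7+ requirement. The following is a small sketch using only the standard library; the helper name `meets_requirement` is made up for this example and is not part of Featureform.

```python
import sys

MIN_VERSION = (3, 7)  # minimum Python version listed in the requirements above


def meets_requirement(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if meets_requirement():
        print("Python version OK:", sys.version.split()[0])
    else:
        print("Featureform requires Python 3.7 or newer")
```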
Step 2: Download test data
For this quickstart, we’ll use a fraudulent transaction dataset that can be found here: https://featureform-demo-files.s3.amazonaws.com/transactions.csv
The data contains 8 columns, almost all of which would require some feature engineering before use in a typical model.
TransactionID,CustomerID,CustomerDOB,CustLocation,CustAccountBalance,TransactionAmount (INR),Timestamp,IsFraud
T1,C5841053,10/1/94,JAMSHEDPUR,17819.05,25,2022-04-09 11:33:09,False
T2,C2142763,4/4/57,JHAJJAR,2270.69,27999,2022-03-27 01:04:21,False
T3,C4417068,26/11/96,MUMBAI,17874.44,459,2022-04-07 00:48:14,False
T4,C5342380,14/9/73,MUMBAI,866503.21,2060,2022-04-14 07:56:59,True
T5,C9031234,24/3/88,NAVI MUMBAI,6714.43,1762.5,2022-04-13 07:39:19,False
T6,C1536588,8/10/72,ITANAGAR,53609.2,676,2022-03-26 17:02:51,True
T7,C7126560,26/1/92,MUMBAI,973.46,566,2022-03-29 08:00:09,True
T8,C1220223,27/1/82,MUMBAI,95075.54,148,2022-04-12 07:01:02,True
T9,C8536061,19/4/88,GURGAON,14906.96,833,2022-04-10 20:43:10,True
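As a quick illustration of why the raw columns need some preparation, the sketch below parses a few of the sample rows with the standard library and converts the string fields into typed values. It embeds three rows from the sample above so it runs offline; `parse_row` is a hypothetical helper, and leaving `CustomerDOB` as a string is a choice made for this example because the two-digit year is ambiguous.

```python
import csv
import io
from datetime import datetime

# Three rows copied from the sample above, embedded so the sketch runs offline.
SAMPLE = """TransactionID,CustomerID,CustomerDOB,CustLocation,CustAccountBalance,TransactionAmount (INR),Timestamp,IsFraud
T1,C5841053,10/1/94,JAMSHEDPUR,17819.05,25,2022-04-09 11:33:09,False
T2,C2142763,4/4/57,JHAJJAR,2270.69,27999,2022-03-27 01:04:21,False
T4,C5342380,14/9/73,MUMBAI,866503.21,2060,2022-04-14 07:56:59,True
"""


def parse_row(row):
    """Convert the raw strings of one CSV row into typed Python values."""
    return {
        "TransactionID": row["TransactionID"],
        "CustomerID": row["CustomerID"],
        # Kept as a string: the two-digit year makes the century ambiguous.
        "CustomerDOB": row["CustomerDOB"],
        "CustLocation": row["CustLocation"],
        "CustAccountBalance": float(row["CustAccountBalance"]),
        "TransactionAmount": float(row["TransactionAmount (INR)"]),
        "Timestamp": datetime.strptime(row["Timestamp"], "%Y-%m-%d %H:%M:%S"),
        "IsFraud": row["IsFraud"] == "True",
    }


reader = csv.DictReader(io.StringIO(SAMPLE))
columns = reader.fieldnames  # header names, read from the first line
rows = [parse_row(r) for r in reader]

print(len(columns), "columns:", columns)
print(rows[0])
```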
Step 3: Register files
We can write a config file in Python that registers our test data file.
import featureform as ff
from featureform import local
# This is where you would typically register your infrastructure providers.
client = ff.Client(local=True)
transactions = local.register_file(
    name="transactions",
    variant="quickstart",
    description="A dataset of fraudulent transactions",
    path="transactions.csv"
)
Next, we’ll define a Dataframe transformation on our dataset.
@local.df_transformation(variant="quickstart",
                         inputs=[("transactions", "quickstart")])
def average_user_transaction(transactions):
    """the average transaction amount for a user """
    return transactions.groupby("CustomerID")["TransactionAmount"].mean()
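To make the aggregation concrete, here is a rough plain-Python equivalent of a per-customer mean. It is only meant to show what the transformation computes. The records are invented for illustration (they are not taken from transactions.csv), and `average_by_customer` is a hypothetical helper, not a Featureform or pandas function.

```python
from collections import defaultdict


def average_by_customer(records):
    """Return {customer_id: mean amount} for an iterable of (customer_id, amount) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for customer_id, amount in records:
        totals[customer_id] += amount
        counts[customer_id] += 1
    return {cid: totals[cid] / counts[cid] for cid in totals}


# Invented records for illustration only.
records = [("C1", 100.0), ("C1", 300.0), ("C2", 50.0)]
averages = average_by_customer(records)
print(averages)  # {'C1': 200.0, 'C2': 50.0}
```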
Next, we’ll register a user entity to associate with a feature and label.
@ff.entity
class User:
    avg_transactions = ff.Feature(
        average_user_transaction[["CustomerID", "TransactionAmount"]],  # We can optionally include the `timestamp_column` "Timestamp" here
        variant="quickstart",
        type=ff.Float32,
        # We can switch this out for an inference store like Redis in production.
        inference_store=local,
    )
    fraudulent = ff.Label(
        transactions[["CustomerID", "IsFraud"]], variant="quickstart", type=ff.Bool
    )
The ff.entity decorator will use the lowercased class name as the entity name. The class attributes avg_transactions and fraudulent will be registered as a feature and label, respectively, associated with the user entity. Indexing into the sources (e.g. average_user_transaction) with [["<ENTITY COLUMN>", "<FEATURE/LABEL COLUMN>"]] returns the required parameters to the Feature and Label registration classes.
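The mechanics described above can be pictured with a toy sketch. Everything in it is hypothetical: `ToySource` and `toy_entity` are invented stand-ins and do not reflect Featureform's actual implementation. It only shows how a class decorator can derive a lowercased entity name and how `source[["entity", "value"]]` reaches `__getitem__` as a list of column names.

```python
class ToySource:
    """Hypothetical stand-in for a registered source or transformation."""

    def __init__(self, name, variant):
        self.name = name
        self.variant = variant

    def __getitem__(self, columns):
        # source[["entity_col", "value_col"]] passes the inner list to __getitem__.
        if not isinstance(columns, list) or len(columns) not in (2, 3):
            raise KeyError(
                "expected [entity_column, value_column] with an optional timestamp column"
            )
        return {
            "source": (self.name, self.variant),
            "entity_column": columns[0],
            "value_column": columns[1],
            "timestamp_column": columns[2] if len(columns) == 3 else None,
        }


def toy_entity(cls):
    """Hypothetical decorator: record the lowercased class name as the entity name."""
    cls.entity_name = cls.__name__.lower()
    return cls


@toy_entity
class User:
    avg_transactions = ToySource("average_user_transaction", "quickstart")[
        ["CustomerID", "TransactionAmount"]
    ]


print(User.entity_name)
print(User.avg_transactions)
```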
When registering more than one variant, we can use the Variants registration class:
@ff.entity
class User:
    avg_transactions = ff.Variants(
        {
            "quickstart": ff.Feature(
                average_user_transaction[["CustomerID", "TransactionAmount"]],
                type=ff.Float32,
                inference_store=local,
            ),
            "quickstart_v2": ff.Feature(
                average_user_transaction[["CustomerID", "TransactionAmount"]],
                type=ff.Float32,
                inference_store=local,
            ),
        }
    )
    fraudulent = ff.Label(
        transactions[["CustomerID", "IsFraud"]], variant="quickstart", type=ff.Bool
    )
Finally, we’ll join together the feature and label into a training set.
ff.register_training_set(
    "fraud_training", "quickstart",
    label=("fraudulent", "quickstart"),
    features=[("avg_transactions", "quickstart")],
)
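Conceptually, a training set pairs each label with the feature values recorded for the same entity. The sketch below shows that idea in plain Python under stated assumptions: the values are invented, `build_training_set` is a hypothetical helper, and dropping labels that have no matching feature value is a choice made for this example, not a statement about how Featureform handles that case.

```python
def build_training_set(label_rows, feature_by_entity):
    """Pair each label with the feature value stored for the same entity ID."""
    training_rows = []
    for entity_id, label in label_rows:
        # Assumption for this sketch: skip labels with no matching feature value.
        if entity_id in feature_by_entity:
            training_rows.append((feature_by_entity[entity_id], label))
    return training_rows


# Invented values for illustration only.
feature_by_entity = {"C1": 200.0, "C2": 50.0}  # e.g. average transaction per customer
label_rows = [("C1", False), ("C2", True), ("C3", False)]  # (CustomerID, IsFraud)

training_rows = build_training_set(label_rows, feature_by_entity)
print(training_rows)  # [(200.0, False), (50.0, True)]
```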
Now that our definitions are complete, we can apply them to our Featureform instance. The command below assumes the definitions were saved in a file named definitions.py.
featureform apply definitions.py --local
Step 4: Serve features for training and inference
Once we have our training set and features registered, we can train our model.
import featureform as ff
client = ff.Client(local=True)
train = client.training_set("fraud_training", "quickstart").dataframe()
Once our trained model is deployed, we can also serve features in production.
import featureform as ff
client = ff.Client(local=True)
fpf = client.features([("avg_transactions", "quickstart")], {"user": "C1010012"})
# Run features through model
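The final comment leaves the model step open. As a placeholder only, the sketch below shows one way served feature values might be passed to a model. It assumes the features arrive as a sequence of values in the order requested; `toy_fraud_model` and its threshold are invented for illustration and stand in for a real trained model.

```python
def toy_fraud_model(features, threshold=10000.0):
    """Placeholder model: flag a user whose average transaction exceeds a threshold."""
    avg_transaction = features[0]  # first (and only) requested feature
    return avg_transaction > threshold


# Invented stand-ins for the values a feature lookup might return.
high_spender = [27999.0]
low_spender = [25.0]

print(toy_fraud_model(high_spender))  # True
print(toy_fraud_model(low_spender))   # False
```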