Quickstart
A quick start guide for Featureform on AWS EKS.
This quickstart will walk through creating a few simple features, labels, and a training set using Postgres and Redis. We will use a transaction fraud training set.

Step 1: Install Featureform client

Install the Featureform SDK via pip.
pip install featureform

Step 2: Deploy EKS

This step walks through a simple AWS deployment of Featureform using our quick start Helm chart, which contains Postgres and Redis. For other environments, you can follow our Minikube or Kubernetes deployment guide instead.
Install the AWS CLI and eksctl, then run the following command to create an EKS cluster.
eksctl create cluster \
    --name featureform \
    --version 1.21 \
    --region us-east-1 \
    --nodegroup-name linux-nodes \
    --nodes 1 \
    --nodes-min 1 \
    --nodes-max 4 \
    --with-oidc \
    --managed

Step 3: Install Helm charts

We'll be installing three Helm Charts: Featureform, the Quickstart Demo, and Certificate Manager.
First we need to add the Helm repositories.
helm repo add featureform https://storage.googleapis.com/featureform-helm/
helm repo add jetstack https://charts.jetstack.io
helm repo update
Now we can install the Helm charts.
helm install certmgr jetstack/cert-manager \
    --set installCRDs=true \
    --version v1.8.0 \
    --namespace cert-manager \
    --create-namespace
helm install my-featureform featureform/featureform
helm install quickstart featureform/quick-start

Step 4: Create TLS certificate

We can create a self-signed TLS certificate for connecting directly to the load balancer.
Wait until the load balancer has been created. It can be checked using:
kubectl get ingress
When the ingresses have a valid address, we can update the deployment to create the public certificate.
export FEATUREFORM_HOST=$(kubectl get ingress | grep "grpc-ingress" | awk '{print $4}' | column -t)
helm upgrade my-featureform featureform/featureform --set global.hostname="$FEATUREFORM_HOST"
kubectl rollout restart statefulset/featureform-etcd
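The grep and awk pipeline above picks the fourth whitespace-separated field of the `grpc-ingress` row, which is the ADDRESS column of `kubectl get ingress`. A minimal Python sketch of the same extraction follows; the function name `ingress_address` and the sample output (including the address and the second ingress name) are hypothetical and for illustration only.

```python
from typing import Optional


def ingress_address(kubectl_output: str, name_fragment: str = "grpc-ingress") -> Optional[str]:
    """Return the fourth whitespace-separated field of the first matching row."""
    for line in kubectl_output.splitlines():
        if name_fragment in line:  # same substring match that grep performs
            columns = line.split()
            # Expected columns: NAME CLASS HOSTS ADDRESS PORTS AGE
            if len(columns) >= 4:
                return columns[3]
    return None


# Hypothetical output, not taken from a real cluster.
SAMPLE_OUTPUT = """\
NAME            CLASS    HOSTS   ADDRESS                    PORTS     AGE
grpc-ingress    <none>   *       abc123.elb.amazonaws.com   80, 443   5m
other-ingress   <none>   *       abc123.elb.amazonaws.com   80, 443   5m
"""

print(ingress_address(SAMPLE_OUTPUT))
```

If the ADDRESS column is still empty, whitespace splitting shifts the remaining columns, so the fourth field is no longer the address. The shell pipeline has the same limitation, which is why it matters to wait for a valid address first.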
We can save and export our self-signed certificate.
kubectl get secret featureform-ca-secret -o=custom-columns=':.data.tls\.crt' | base64 -d > tls.crt
export FEATUREFORM_CERT=$(pwd)/tls.crt
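Kubernetes stores secret values base64-encoded, which is why the command above pipes the value through `base64 -d` before writing `tls.crt`. The sketch below shows the same decode-and-save step in Python. The helper name `save_certificate` and the placeholder certificate body are assumptions for illustration; a real certificate would come from the `featureform-ca-secret` secret.

```python
import base64
import tempfile
from pathlib import Path


def save_certificate(encoded_cert: str, path: Path) -> Path:
    """Decode a base64-encoded certificate and write it to path."""
    # strip() drops the blank header line and trailing newline kubectl may print
    pem_bytes = base64.b64decode(encoded_cert.strip())
    path.write_bytes(pem_bytes)
    return path.resolve()


# Placeholder content standing in for a real PEM certificate.
placeholder_pem = b"-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n"
encoded = "\n" + base64.b64encode(placeholder_pem).decode("ascii") + "\n"

with tempfile.TemporaryDirectory() as tmp:
    cert_path = save_certificate(encoded, Path(tmp) / "tls.crt")
    print(cert_path.name, cert_path.read_bytes() == placeholder_pem)
```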
The dashboard is now viewable at your ingress address.
echo $FEATUREFORM_HOST

Step 5: Register providers

The Quickstart Helm chart creates a Postgres instance with preloaded data, as well as an empty standalone Redis instance. Now that they are deployed, we can write a config file in Python.
definitions.py
import featureform as ff

redis = ff.register_redis(
    name="redis-quickstart",
    host="quickstart-redis",  # The internal DNS name for Redis
    port=6379,
    description="A Redis deployment we created for the Featureform quickstart",
)

postgres = ff.register_postgres(
    name="postgres-quickstart",
    host="quickstart-postgres",  # The internal DNS name for Postgres
    port="5432",
    user="postgres",
    password="password",
    database="postgres",
    description="A Postgres deployment we created for the Featureform quickstart",
)
Once we create our config file, we can apply it to our Featureform deployment.
featureform apply definitions.py

Step 6: Define our resources

We will create a user profile for ourselves and set it as the default owner for all of the following resource definitions.
definitions.py
ff.register_user("featureformer").make_default_owner()
Now we'll register our transaction fraud dataset in Featureform.
definitions.py
transactions = postgres.register_table(
    name="transactions",
    variant="kaggle",
    description="Fraud Dataset From Kaggle",
    table="Transactions",  # This is the table's name in Postgres
)
Next, we'll define a SQL transformation on our dataset.
definitions.py
@postgres.register_transformation(variant="quickstart")
def average_user_transaction():
    """The average transaction amount for a user."""
    return (
        "SELECT CustomerID, avg(TransactionAmount) AS avg_transaction_amt "
        "FROM {{transactions.kaggle}} GROUP BY CustomerID"
    )
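To see what this transformation computes, the sketch below runs the same aggregation, grouped by the selected `CustomerID` column, against an in-memory SQLite table. It is an illustration only: SQLite stands in for Postgres, the rows are made up, the helper `average_per_customer` is hypothetical, and the string replacement is a stand-in for however Featureform resolves the `{{transactions.kaggle}}` placeholder.

```python
import sqlite3

TEMPLATE = (
    "SELECT CustomerID, avg(TransactionAmount) AS avg_transaction_amt "
    "FROM {{transactions.kaggle}} GROUP BY CustomerID"
)


def average_per_customer(rows):
    """Load rows into an in-memory table and return {CustomerID: average amount}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Transactions (CustomerID TEXT, TransactionAmount REAL)")
    conn.executemany("INSERT INTO Transactions VALUES (?, ?)", rows)
    # Stand-in for resolving the name.variant placeholder to a physical table
    query = TEMPLATE.replace("{{transactions.kaggle}}", "Transactions")
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result


# Made-up rows: two transactions for C1, one for C2.
sample_rows = [("C1", 10.0), ("C1", 30.0), ("C2", 5.0)]
print(average_per_customer(sample_rows))
```

Each customer ends up with exactly one row, which is what lets the result be keyed by entity when it is registered as a feature.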
Next, we'll register a user entity to associate with a feature and label.
definitions.py
user = ff.register_entity("user")
# Register a column from our transformation as a feature
average_user_transaction.register_resources(
    entity=user,
    entity_column="CustomerID",
    inference_store=redis,
    features=[
        {"name": "avg_transactions", "variant": "quickstart", "column": "avg_transaction_amt", "type": "float64"},
    ],
)
# Register label from the original file
transactions.register_resources(
    entity=user,
    entity_column="CustomerID",
    labels=[
        {"name": "fraudulent", "variant": "quickstart", "column": "ISFRAUD", "type": "int"},
    ],
)
Finally, we'll join together the feature and label into a training set.
definitions.py
ff.register_training_set(
    "fraud_training", "quickstart",
    label=("fraudulent", "quickstart"),
    features=[("avg_transactions", "quickstart")],
)
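Conceptually, a training set pairs each label with the feature value of the same entity. The sketch below shows that idea in plain Python. It is not Featureform's implementation, which may treat missing values or timestamps differently; the helper `build_training_rows` and the sample values are hypothetical.

```python
def build_training_rows(features, labels):
    """Pair each labeled entity with its feature value; skip entities with no feature."""
    return [
        (features[entity], label)
        for entity, label in labels
        if entity in features
    ]


# Made-up data keyed by CustomerID.
avg_transactions = {"C1": 20.0, "C2": 5.0}   # feature values per entity
fraudulent = [("C1", 0), ("C2", 1), ("C3", 0)]  # (entity, label) pairs

for feature_value, label in build_training_rows(avg_transactions, fraudulent):
    print(feature_value, label)
```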
Now that our definitions are complete, we can apply them to our Featureform instance.
featureform apply definitions.py

Step 7: Serve features for training and inference

Once we have our training set and features registered, we can train our model.
import featureform as ff

client = ff.ServingClient()
dataset = client.dataset("fraud_training", "quickstart")
training_dataset = dataset.repeat(10).shuffle(1000).batch(8)
for feature_batch, label_batch in training_dataset:
    # Train model on this batch
    pass
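The chained calls above repeat the dataset, shuffle it, and group rows into batches. The following sketch illustrates those semantics in plain Python, assuming they behave like the similarly named operations in common data-loading libraries. It is a simplification, not Featureform's implementation: it shuffles each pass in full rather than through a fixed-size buffer, and the function name and `seed` parameter are hypothetical.

```python
import random
from itertools import islice


def repeat_shuffle_batch(rows, repeats, batch_size, seed=0):
    """Yield (feature_batch, label_batch) pairs from (feature, label) rows."""
    rng = random.Random(seed)
    stream = []
    for _ in range(repeats):
        epoch = list(rows)
        rng.shuffle(epoch)  # simplification: shuffle each full pass
        stream.extend(epoch)
    iterator = iter(stream)
    # Take batch_size rows at a time; the last batch may be smaller
    while batch := list(islice(iterator, batch_size)):
        yield [feature for feature, _ in batch], [label for _, label in batch]


# Made-up rows of (feature, label).
sample = [(float(i), i % 2) for i in range(5)]
for feature_batch, label_batch in repeat_shuffle_batch(sample, repeats=3, batch_size=4):
    print(len(feature_batch), len(label_batch))
```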
Once we deploy our trained model, we can also serve features in production.
import featureform as ff

client = ff.ServingClient()
fpf = client.features([("avg_transactions", "quickstart")], {"CustomerID": "1"})
# Run features through model