Quickstart (Google Cloud)

A quickstart guide for deploying Featureform on GCP using Terraform.
This quickstart will walk through creating a few simple features, labels, and a training set using BigQuery and Firestore. We will use a transaction fraud training set.


Step 1: Clone the Featureform Repo
git clone <featureform-repo-url> # The URL of the Featureform repository
cd featureform/terraform/gcp

Step 2: Create GCP Services

We'll start BigQuery, Firestore, and Google Kubernetes Engine (GKE). (Specific services can be enabled or disabled as needed in the Terraform configuration.)
We need to set the following environment variables:
export PROJECT_ID=<your-project-id> # Your GCP Project ID
export DATASET_ID=featureform # The BigQuery Dataset we'll use
export BUCKET_NAME=<your-bucket-name> # A GCP Storage Bucket where we can store test data
export COLLECTION_ID=featureform_collection # A Firestore Collection ID
export FEATUREFORM_HOST=<your-domain-name> # The domain name that you own
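Every later command depends on these variables, so it can help to confirm they are all set before running Terraform. The following is a minimal sketch in plain Python; the helper name `missing_vars` is hypothetical and is not part of Featureform, gcloud, or Terraform.

```python
import os

# Variable names taken from the exports above.
REQUIRED_VARS = [
    "PROJECT_ID",
    "DATASET_ID",
    "BUCKET_NAME",
    "COLLECTION_ID",
    "FEATUREFORM_HOST",
]


def missing_vars(env=None):
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Run it in the same shell session as the exports, since variables set in one terminal are not visible in another.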

Next, we point the gcloud CLI at our project and create the services with Terraform:

cd gcp_services
gcloud auth application-default login # Gives Terraform access to GCP
gcloud config set project $PROJECT_ID # Sets our GCP Project
terraform init
terraform apply -auto-approve \
  -var="project_id=$PROJECT_ID" \
  -var="bigquery_dataset_id=$DATASET_ID" \
  -var="storage_bucket_name=$BUCKET_NAME"

Step 3: Configure Kubectl

We need to load the GKE config into our kubeconfig.
gcloud container clusters get-credentials $(terraform output -raw kubernetes_cluster_name) --region $(terraform output -raw region)

Step 4: Install Featureform

We'll use Terraform to install Featureform on our GKE cluster.
cd ../featureform
terraform init; terraform apply -auto-approve -var="featureform_hostname=$FEATUREFORM_HOST"

Step 5: Direct Your Domain To Featureform

Featureform automatically provisions a public certificate for your domain name.
To connect, you need to point your domain name at the Featureform GKE Cluster.
We can get the IP Address for the cluster using:
kubectl get ingress | grep "grpc-ingress" | awk '{print $4}'
You need to add 2 records to your DNS provider for the (sub)domain you intend to use:
  1. A CAA record with the value: 0 issuewild "". This allows Let's Encrypt to automatically generate a public certificate.
  2. An A record whose value is the IP address returned by the command above.
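DNS changes can take a while to propagate, so it may be worth checking that the domain resolves to the ingress address before continuing. The following is a minimal sketch using Python's standard `socket` module; the function name `points_at` and the example hostname and IP address are hypothetical.

```python
import socket


def points_at(hostname, expected_ip, resolve=socket.gethostbyname):
    """Return True if `hostname` currently resolves to `expected_ip`.

    `resolve` is injectable so the check can be exercised without network access.
    """
    try:
        return resolve(hostname) == expected_ip
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised while the
        # name does not resolve at all.
        return False


if __name__ == "__main__":
    # Placeholder values: substitute your own domain and the ingress IP.
    print(points_at("featureform.example.com", "203.0.113.10"))
```

This only checks the IPv4 address returned by the local resolver; cached DNS answers may lag behind the records you just created.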

Step 6: Load Demo Data

We can load some demo data into BigQuery that we can transform and serve.
# Load sample data into a bucket in the same project
curl | gsutil cp - gs://$BUCKET_NAME/transactions.csv
# Load the bucket data into BigQuery
bq load --autodetect --source_format=CSV $DATASET_ID.Transactions gs://$BUCKET_NAME/transactions.csv

Step 7: Install the Featureform SDK

pip install featureform

Step 8: Register providers

Registering GCP providers requires a GCP credentials file for a user that has permissions for Firestore and BigQuery.
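import os
import featureform as ff
# Read the project and dataset IDs exported in Step 2
project_id = os.getenv("PROJECT_ID")
dataset_id = os.getenv("DATASET_ID")
firestore = ff.register_firestore(
    description="A Firestore deployment we created for the Featureform quickstart",
    project_id=project_id,
    # ... remaining arguments (provider name, collection, credentials) are not shown here
)
bigquery = ff.register_bigquery(
    description="A BigQuery deployment we created for the Featureform quickstart",
    project_id=project_id,
    dataset_id=dataset_id,
    # ... remaining arguments (provider name, credentials) are not shown here
)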
Once we create our config file, we can apply it to our Featureform deployment.
featureform apply

Step 9: Define our resources

We will create a user profile for ourselves and set it as the default owner for all the following resource definitions.
Now we'll register our user fraud dataset in Featureform.
transactions = bigquery.register_table(
    name="transactions",
    description="Fraud Dataset From Kaggle",
    table="Transactions",  # This is the table's name in BigQuery
)
Next, we'll define a SQL transformation on our dataset.
@bigquery.sql_transformation()
def average_user_transaction():
    # Average transaction amount per customer
    return "SELECT CustomerID as user_id, avg(TransactionAmount) " \
        "as avg_transaction_amt from {{transactions.default}} GROUP BY user_id"
Next, we'll register a user entity to associate with a feature and label.
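user = ff.register_entity("user")
# Register a column from our transformation as a feature
average_user_transaction.register_resources(
    entity=user, entity_column="user_id", inference_store=firestore,
    features=[{"name": "avg_transactions", "column": "avg_transaction_amt", "type": "float32"}])
# Register label from our base Transactions table
transactions.register_resources(
    entity=user, entity_column="CustomerID",
    labels=[{"name": "fraudulent", "column": "isfraud", "type": "bool"}])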
Finally, we'll join together the feature and label into a training set.
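Conceptually, a training set pairs each label with the feature values of the same entity. The following is a minimal sketch of that join in plain Python, using invented values and a hypothetical helper `build_training_set`. It is not Featureform's implementation; Featureform performs the join in the offline store, and its handling of timestamps and missing values may differ.

```python
def build_training_set(features, labels):
    """Pair each (entity, label) row with that entity's feature value.

    `features` maps entity ID -> feature value; `labels` is a list of
    (entity ID, label) tuples. Rows without a matching feature are skipped.
    """
    return [
        ([features[entity]], label)
        for entity, label in labels
        if entity in features
    ]


# Invented sample data keyed by user ID.
avg_transactions = {"C1": 200.0, "C2": 50.0}
fraudulent = [("C1", False), ("C2", True), ("C3", False)]

training_rows = build_training_set(avg_transactions, fraudulent)
print(training_rows)  # [([200.0], False), ([50.0], True)]
```

The user `C3` has a label but no feature value, so it is dropped in this sketch.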
Now that our definitions are complete, we can apply them to our Featureform instance.
featureform apply

Step 10: Serve features for training and inference

Once we have our training set and features registered, we can train our model.
import featureform as ff
client = ff.ServingClient()
dataset = client.training_set("fraud_training")
training_set = dataset.shuffle(10000)
for batch in training_set:
    print("Features:", batch.features(), ", Label:", batch.label())
Example Output:
Features: [279.76] , Label: False
Features: [254.] , Label: False
Features: [1000.] , Label: False
Features: [5036.] , Label: False
Features: [10.] , Label: False
Features: [884.08] , Label: False
Features: [56.] , Label: False
Once we deploy our trained model, we can also serve features in production.
import featureform as ff
client = ff.ServingClient()
fpf = client.features(["avg_transactions"], {"user": "C1011381"})
print(fpf)
Example Output: