Step 1: Clone the Featureform Repo

cd featureform/terraform/gcp

Step 2: Create GCP Services

We’ll start BigQuery, Firestore, and Google Kubernetes Engine (GKE). (Specific services can be enabled/disabled as needed in

We need to set the following environment variables:

export PROJECT_ID=           # Your GCP Project ID
export DATASET_ID=featureform                 # The BigQuery Dataset we'll use
export BUCKET_NAME=         # A GCP Storage Bucket where we can store test data
export COLLECTION_ID=featureform_collection   # A Firestore Collection ID
export FEATUREFORM_HOST=    # The domain name that you own
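
Several of the variables above are left blank for you to fill in, so it can help to confirm that none are still empty before running Terraform. The following is a minimal sketch, not part of Featureform; the helper name `missing_vars` is hypothetical, and the variable names are taken from the export lines above.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = [
    "PROJECT_ID",
    "DATASET_ID",
    "BUCKET_NAME",
    "COLLECTION_ID",
    "FEATUREFORM_HOST",
]


def missing_vars(env):
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Run it in the same shell session where you exported the variables, since exported values are only visible to child processes of that shell.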

Point the gcloud CLI at our project, then create the services with Terraform:

cd gcp_services
gcloud auth application-default login   # Gives Terraform access to GCP
gcloud config set project $PROJECT_ID   # Sets our GCP project
terraform init; \
terraform apply -auto-approve \
-var="project_id=$PROJECT_ID" \
-var="bigquery_dataset_id=$DATASET_ID" \
-var="storage_bucket_name=$BUCKET_NAME" \

Step 3: Configure Kubectl

We need to load the GKE config into our kubeconfig.

gcloud container clusters get-credentials $(terraform output -raw kubernetes_cluster_name) --region $(terraform output -raw region)

Step 4: Install Featureform

We’ll use Terraform to install Featureform on our GKE cluster.

cd ../featureform
terraform init; terraform apply -auto-approve -var="featureform_hostname=$FEATUREFORM_HOST"

Step 5: Direct Your Domain To Featureform

Featureform automatically provisions a public certificate for your domain name.

To connect, you need to point your domain name at the Featureform GKE Cluster.

We can get the IP Address for the cluster using:

kubectl get ingress | grep "grpc-ingress" | awk '{print $4}' | column -t
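
The pipeline above picks the fourth whitespace-separated field from the `grpc-ingress` row. The same extraction can be sketched in Python, for example when scripting the DNS setup. The sample output below is made up for illustration (documentation-range IP, hypothetical host), and `ingress_address` is a hypothetical helper name.

```python
def ingress_address(kubectl_output, name="grpc-ingress"):
    """Return the 4th field of the first line mentioning `name`, or None.

    Mirrors `grep "grpc-ingress" | awk '{print $4}'`: a substring match on
    the line, then a split on whitespace.
    """
    for line in kubectl_output.splitlines():
        if name in line:
            fields = line.split()
            if len(fields) >= 4:
                return fields[3]
    return None


# Made-up sample of `kubectl get ingress` output, for illustration only.
SAMPLE = """\
NAME           CLASS    HOSTS            ADDRESS        PORTS     AGE
grpc-ingress   <none>   ff.example.com   203.0.113.10   80, 443   5m
http-ingress   <none>   ff.example.com   203.0.113.10   80, 443   5m
"""

if __name__ == "__main__":
    print(ingress_address(SAMPLE))
```

Like the shell version, this relies on column position, so if the ingress has not been assigned an address yet, the fourth field will be a different column. Re-run the command after a few minutes in that case.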

You need to add two records with your DNS provider for the (sub)domain you intend to use:

  1. A CAA record with the value: 0 issuewild "". This allows Let's Encrypt to automatically generate a public certificate.

  2. An A record whose value is the IP address returned by the command above.
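
DNS changes can take a while to propagate, so it may be worth checking that the A record resolves to the ingress address before continuing. This is a minimal sketch using only the standard library; `resolves_to` is a hypothetical helper, and the host and IP in the usage lines are placeholders.

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return True if `hostname` currently resolves to `expected_ip`.

    `resolver` is injectable so the check can be exercised without network
    access; by default it performs a real IPv4 lookup.
    """
    try:
        return resolver(hostname) == expected_ip
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when the name
        # does not resolve.
        return False


if __name__ == "__main__":
    # Placeholder values: substitute your FEATUREFORM_HOST and ingress IP.
    fake_dns = {"ff.example.com": "203.0.113.10"}
    print(resolves_to("ff.example.com", "203.0.113.10", resolver=fake_dns.__getitem__))
```

Note that `socket.gethostbyname` returns a single IPv4 address, which matches the single A record described above.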

Step 6: Load Demo Data

We can load some demo data into BigQuery that we can transform and serve.

# Load sample data into a bucket in the same project
curl | gsutil cp - gs://$BUCKET_NAME/transactions.csv

# Load the bucket data into BigQuery
bq load --autodetect --source_format=CSV $DATASET_ID.Transactions gs://$BUCKET_NAME/transactions.csv
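
Because `bq load --autodetect` derives the schema from the file, it can be useful to confirm that the CSV header contains the columns the later steps reference (`CustomerID`, `TransactionAmount`, and `isfraud`). The sketch below is an illustration only: `missing_columns` is a hypothetical helper, the sample header is made up, and the comparison is case-insensitive on the assumption that column-name casing in the file may differ from the casing used in the queries.

```python
import csv
import io

# Column names referenced by the transformation and label in later steps.
REQUIRED_COLUMNS = ["CustomerID", "TransactionAmount", "isfraud"]


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required columns absent from the CSV header (case-insensitive)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {col.strip().lower() for col in header}
    return [col for col in required if col.lower() not in present]


if __name__ == "__main__":
    # Made-up sample; in practice, pass the first lines of transactions.csv.
    sample = "CustomerID,TransactionAmount,IsFraud\nC1,10.0,False\n"
    print(missing_columns(sample))
```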

Step 7: Install the Featureform SDK

pip install featureform

Step 8: Register providers

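Providers registered on GCP require a GCP credentials file for a user that has permissions for Firestore and BigQuery.

import os
import featureform as ff

project_id = os.getenv("PROJECT_ID")
dataset_id = os.getenv("DATASET_ID")

# Assumption: the name and credentials_path values below are placeholders, and the keyword names follow the Featureform SDK.
firestore = ff.register_firestore(
    name="firestore-quickstart",
    description="A Firestore deployment we created for the Featureform quickstart",
    project_id=project_id,
    collection=os.getenv("COLLECTION_ID"),
    credentials_path="<path to GCP credentials file>",
)

bigquery = ff.register_bigquery(
    name="bigquery-quickstart",
    description="A BigQuery deployment we created for the Featureform quickstart",
    project_id=project_id,
    dataset_id=dataset_id,
    credentials_path="<path to GCP credentials file>",
)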

Once we create our config file, we can apply it to our Featureform deployment.

featureform apply

Step 9: Define our resources

We’ll create a user profile for ourselves and set it as the default owner of all the resource definitions that follow.

Now we’ll register our user fraud dataset in Featureform.

transactions = bigquery.register_table(
    name="transactions",  # Referenced as {{transactions.default}} in the transformation
    description="Fraud Dataset From Kaggle",
    table="Transactions",  # This is the table's name in BigQuery
)

Next, we’ll define a SQL transformation on our dataset.

@bigquery.sql_transformation()
def average_user_transaction():
    # Average transaction amount per customer
    return "SELECT CustomerID as user_id, avg(TransactionAmount) " \
           "as avg_transaction_amt from {{transactions.default}} GROUP BY user_id"

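To see what the transformation's query computes, the same aggregation can be run locally against a few made-up rows. This sketch uses SQLite from the Python standard library rather than BigQuery, replaces the `{{transactions.default}}` placeholder with a literal table name, and adds an `ORDER BY` only to make the output deterministic. The rows and the helper name `average_per_user` are invented for illustration.

```python
import sqlite3


def average_per_user(rows):
    """Return (user_id, avg_transaction_amt) pairs for (CustomerID, amount) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Transactions (CustomerID TEXT, TransactionAmount REAL)"
        )
        conn.executemany("INSERT INTO Transactions VALUES (?, ?)", rows)
        query = (
            "SELECT CustomerID AS user_id, avg(TransactionAmount) AS avg_transaction_amt "
            "FROM Transactions GROUP BY user_id ORDER BY user_id"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up rows: two transactions for C1, one for C2.
    sample = [("C1", 100.0), ("C1", 300.0), ("C2", 50.0)]
    print(average_per_user(sample))  # [('C1', 200.0), ('C2', 50.0)]
```

Each customer collapses to a single row, which is why `user_id` can later serve as the entity column for the `avg_transaction_amt` feature.
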
Next, we’ll register a user entity to associate with a feature and label.

user = ff.register_entity("user")

# Register a column from our transformation as a feature
        {"name": "avg_transactions", "column": "avg_transaction_amt", "type": "float32"},
# Register label from our base Transactions table
        {"name": "fraudulent", "column": "isfraud", "type": "bool"},

Finally, we’ll join together the feature and label into a training set.

Now that our definitions are complete, we can apply them to our Featureform instance.

featureform apply

Step 10: Serve features for training and inference

Once we have our training set and features registered, we can train our model.

import featureform as ff

client = ff.ServingClient()
dataset = client.training_set("fraud_training")
training_set = dataset.shuffle(10000)
for batch in training_set:
    print(batch)

Example Output:

Features: [279.76] , Label: False
Features: [254.] , Label: False
Features: [1000.] , Label: False
Features: [5036.] , Label: False
Features: [10.] , Label: False
Features: [884.08] , Label: False
Features: [56.] , Label: False
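
The `shuffle(10000)` call above takes a numeric argument. Assuming it is a shuffle-buffer size, as in many dataset APIs, the general technique can be sketched as follows. This illustrates buffer-based shuffling in plain Python and is not Featureform's implementation; `buffered_shuffle` is a hypothetical name.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield `items` in a randomized order using a bounded buffer.

    Items are held in a buffer of at most `buffer_size` elements; once the
    buffer overflows, a random element is emitted. Larger buffers give a
    more thorough shuffle at the cost of memory.
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) > buffer_size:
            # Swap a random element to the end and emit it.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Drain whatever remains in random order.
    rng.shuffle(buffer)
    yield from buffer


if __name__ == "__main__":
    print(list(buffered_shuffle(range(10), buffer_size=3, seed=0)))
```

With a buffer at least as large as the dataset, this degenerates to a full shuffle; with a small buffer, items can only move a limited distance from their original position.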

Once we deploy our trained model, we can also serve features in production.

import featureform as ff

client = ff.ServingClient()
fpf = client.features(["avg_transactions"], {"user": "C1011381"})
print(fpf)

Example Output: