Serving Embeddings for Inference
Using embeddings for inference is straightforward in the most common case: the key-value and nearest-neighbor interfaces cover the typical lookup and retrieval patterns.
```python
def similar_items(item):
    return items_space.nearest_neighbor(10, key=item)


def user_recommendations(user):
    user_vec = user_space.get(user)
    return items_space.nearest_neighbor(10, vector=user_vec)
```
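The semantics behind these calls can be sketched with a tiny in-memory space. The vectors, similarity metric, and helper below are illustrative stand-ins, not the Embeddinghub implementation: a real space holds the vectors server-side and uses an approximate index rather than a full scan.

```python
import math

# Hypothetical in-memory "space": ids mapped to embedding vectors.
items = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
}

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def nearest_neighbor(n, vector=None, key=None):
    # Mirrors the two query styles above: by raw vector, or by the
    # stored vector of an existing key.
    query = items[key] if key is not None else vector
    ranked = sorted(items, key=lambda k: cosine(items[k], query), reverse=True)
    if key is not None:
        ranked.remove(key)  # don't return the query item itself
    return ranked[:n]
```

Querying by `key` excludes the query item from its own results, which is the behavior you usually want for "similar items".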

Batching for increased throughput

The key-value and nearest-neighbor interfaces both support performing multiple operations in a single call. In use cases where a machine learning model is run on batches of data, this can dramatically improve throughput.
```python
def batch_user_recommendations(user_batch):
    user_vecs = user_space.multiget(user_batch)
    return items_space.multi_nearest_neighbor(10, vectors=user_vecs)
```
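What the batched calls compute can be sketched with plain dicts in place of the server-side spaces. The names, vectors, and dot-product similarity below are hypothetical; the point is that one request carries many keys (or query vectors), amortizing a single network round trip across the whole batch.

```python
# Hypothetical spaces: ids mapped to embedding vectors.
user_space = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
item_space = {"a": [0.9, 0.1], "b": [0.1, 0.9]}

def multiget(space, keys):
    # One request, many keys: each key's vector, in order.
    return [space[k] for k in keys]

def multi_nearest_neighbor(space, n, vectors):
    # One nearest-neighbor query per input vector, answered together.
    def score(vec, q):
        return sum(x * y for x, y in zip(vec, q))  # dot product as similarity
    return [
        sorted(space, key=lambda k: score(space[k], q), reverse=True)[:n]
        for q in vectors
    ]

def batch_user_recommendations(user_batch):
    user_vecs = multiget(user_space, user_batch)
    return multi_nearest_neighbor(item_space, 1, user_vecs)
```

The batched loop does the same work as calling the single-item operations repeatedly; the throughput win comes from paying the client-server round trip once per batch instead of once per item.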

Snapshots for minimal latency

Most Embeddinghub operations require a full round trip from client to server. Batching pipelines these round trips and increases throughput, but every request still pays network latency. If your system has very low latency requirements, a local snapshot can be used instead.
```python
user_snapshot = user_space.download_snapshot()
item_snapshot = item_space.download_snapshot()


def user_recommendations(user):
    user_vec = user_snapshot.get(user)
    return item_snapshot.nearest_neighbor(10, vector=user_vec)
```
A snapshot refresh can be triggered either synchronously or asynchronously, allowing fine-grained control over the Embeddinghub's performance characteristics.
```python
user_snapshot.refresh(wait=False)
```
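The synchronous/asynchronous distinction can be sketched with a toy refreshable snapshot. This is not the Embeddinghub class; it is a minimal model, assuming `wait=True` blocks until the fresh copy is loaded while `wait=False` returns immediately and reloads in the background.

```python
import threading

class Snapshot:
    # Toy model of a refreshable local snapshot (illustrative only).
    def __init__(self, fetch):
        self._fetch = fetch          # callable returning fresh data
        self._data = fetch()
        self._lock = threading.Lock()

    def _reload(self):
        fresh = self._fetch()        # fetch outside the lock
        with self._lock:
            self._data = fresh       # swap in atomically

    def refresh(self, wait=True):
        if wait:
            self._reload()           # synchronous: blocks the caller
            return None
        t = threading.Thread(target=self._reload, daemon=True)
        t.start()
        return t                     # asynchronous: join() to await it

    def get(self, key):
        with self._lock:
            return self._data[key]
```

An asynchronous refresh keeps serving the old snapshot while the new one downloads, so read latency stays flat; a synchronous refresh guarantees the next read sees fresh data at the cost of blocking.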