Firestore Vector Store

The Firestore plugin provides retriever implementations that use Google Cloud Firestore as a vector store.

Installation

pip3 install genkit-plugin-firebase

Prerequisites

  • A Firebase project with Cloud Firestore enabled.
  • The genkit package installed.
  • gcloud CLI for managing credentials and Firestore indexes.

Configuration

To use this plugin, specify it when you initialize Genkit:

from genkit.ai import Genkit
from genkit.plugins.firebase.firestore import FirestoreVectorStore
from google.cloud import firestore

firestore_client = firestore.Client()

ai = Genkit(
    plugins=[
        FirestoreVectorStore(
            name='my_firestore_retriever',
            collection='my_collection',
            vector_field='embedding',
            content_field='text',
            embedder='vertexai/text-embedding-004',
            firestore_client=firestore_client,
        ),
    ]
)

Configuration Options

  • name (str): A unique name for this retriever instance.
  • collection (str): The name of the Firestore collection to query.
  • vector_field (str): The name of the document field that contains the vector embedding.
  • content_field (str): The name of the document field that contains the text content.
  • embedder (str): The name of the embedding model to use. Must match an embedder configured in your Genkit project.
  • firestore_client: The Firestore client object used for all queries to the vector store.
  • embedder_options (optional): Configuration to pass to the embedder.
  • distance_measure (optional): The distance measure to use when comparing vectors. Defaults to COSINE.
  • metadata_fields (optional): A list of metadata fields to include.

Usage

  1. Create a Firestore Client:

    from google.cloud import firestore
    firestore_client = firestore.Client()
    
  2. Define a Firestore Retriever:

    from genkit.ai import Genkit
    from genkit.plugins.firebase.firestore import FirestoreVectorStore
    
    ai = Genkit(
        plugins=[
            FirestoreVectorStore(
                name='my_firestore_retriever',
                collection='my_collection',
                vector_field='embedding',
                content_field='text',
                embedder='vertexai/text-embedding-004',
                firestore_client=firestore_client,
            ),
        ]
    )
    
  3. Retrieve Documents:

    async def retrieve_documents():
        return await ai.retrieve(
            query="What are the main topics?",
            retriever='my_firestore_retriever',
        )
    

Populating the Index

Before you can retrieve documents, you need to populate your Firestore collection with data and their corresponding vector embeddings. Here's how you can do it:

  1. Prepare your Data: Organize your data into documents. Each document should have at least two fields: a text field containing the content you want to retrieve, and an embedding field that holds the vector embedding of the content. You can add any other metadata as well.

  2. Generate Embeddings: Use the same embedding model that your FirestoreVectorStore is configured with to generate vector embeddings for your text content, for example by calling ai.embed().

  3. Upload Documents to Firestore: Use the Firestore client to upload the documents with their embeddings to the specified collection.

Here's an example of how to index data:

from genkit.ai import Document
from genkit.types import TextPart
from google.cloud.firestore_v1.vector import Vector

async def index_documents(documents: list[str], collection_name: str):
    """Indexes the documents in Firestore."""
    # Embed all documents in one call, using the same embedder as the retriever.
    genkit_documents = [Document(content=[TextPart(text=doc)]) for doc in documents]
    embed_response = await ai.embed(embedder='vertexai/text-embedding-004', documents=genkit_documents)
    embeddings = [emb.embedding for emb in embed_response.embeddings]

    for i, document_text in enumerate(documents):
        doc_id = f'doc-{i + 1}'

        # Field names must match content_field and vector_field in FirestoreVectorStore.
        doc_ref = firestore_client.collection(collection_name).document(doc_id)
        doc_ref.set({
            'text': document_text,
            # Store the embedding as a Firestore vector value so the vector index can use it.
            'embedding': Vector(embeddings[i]),
            'metadata': f'metadata for doc {i + 1}',
        })

# Example usage (await must run inside an async function or an async-aware REPL)
documents = [
    "This is document one.",
    "This is document two.",
    "This is document three.",
]
await index_documents(documents, 'my_collection')
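
Vector search relies on every stored embedding having the same dimension, so it can help to validate the data before writing it. The following is a minimal sketch of such a check in plain Python. build_payloads and EXPECTED_DIMENSION are hypothetical names, not part of Genkit or the Firestore plugin, and the 'text' and 'embedding' field names simply mirror the example above.

```python
# Hypothetical helper for illustration; not part of Genkit or the Firestore plugin.
EXPECTED_DIMENSION = 768  # dimension of vertexai/text-embedding-004


def build_payloads(texts, embeddings, expected_dimension=EXPECTED_DIMENSION):
    """Pair each text with its embedding and return (doc_id, payload) tuples.

    Raises ValueError if the inputs are misaligned or an embedding has
    the wrong number of dimensions.
    """
    if len(texts) != len(embeddings):
        raise ValueError(
            f'got {len(texts)} texts but {len(embeddings)} embeddings'
        )

    payloads = []
    for i, (text, embedding) in enumerate(zip(texts, embeddings)):
        if len(embedding) != expected_dimension:
            raise ValueError(
                f'embedding {i} has {len(embedding)} dimensions, '
                f'expected {expected_dimension}'
            )
        # Document IDs follow the doc-1, doc-2, ... scheme used above.
        payloads.append((f'doc-{i + 1}', {'text': text, 'embedding': list(embedding)}))
    return payloads


# Tiny demonstration with 3-dimensional stand-in vectors.
demo = build_payloads(
    ['first', 'second'],
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    expected_dimension=3,
)
print(demo[0])
```

Each payload can then be written with the Firestore client as in the indexing example; the check only guards against mismatched data and does not replace the index configuration.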

Creating a Firestore Index

To enable vector similarity search, you need to create a vector index in your Firestore database. Use the following command:

gcloud firestore indexes composite create \
  --project=<FIREBASE-PROJECT> \
  --collection-group=<COLLECTION-NAME> \
  --query-scope=COLLECTION \
  --field-config=vector-config='{"dimension":"<YOUR_DIMENSION_COUNT>","flat": "{}"}',field-path=<VECTOR-FIELD>

  • Replace <FIREBASE-PROJECT> with the name of your Firebase project.
  • Replace <COLLECTION-NAME> with the name of your Firestore collection.
  • Replace <YOUR_DIMENSION_COUNT> with the correct dimension for your embedding model. Common values are:
    • 768 for vertexai/text-embedding-004
  • Replace <VECTOR-FIELD> with the name of the field containing vector embeddings (e.g. embedding).
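
If you create indexes for several collections, filling in the placeholders by hand is error-prone. The sketch below assembles the same command as an argument list in Python. build_vector_index_command is a hypothetical helper name, and the example values ('my-project', 'my_collection', 'embedding') are placeholders, not values from a real project.

```python
import shlex


def build_vector_index_command(project, collection, vector_field, dimension):
    """Return the gcloud argument list for creating a flat vector index."""
    if dimension <= 0:
        raise ValueError('dimension must be a positive integer')

    # Same JSON shape as the command above, with the dimension filled in.
    vector_config = '{"dimension":"' + str(dimension) + '","flat": "{}"}'
    return [
        'gcloud', 'firestore', 'indexes', 'composite', 'create',
        f'--project={project}',
        f'--collection-group={collection}',
        '--query-scope=COLLECTION',
        f'--field-config=vector-config={vector_config},field-path={vector_field}',
    ]


command = build_vector_index_command('my-project', 'my_collection', 'embedding', 768)

# shlex.join adds the shell quoting needed to paste the command into a terminal.
print(shlex.join(command))
```

The list form can also be passed to subprocess.run(command, check=True) to execute it directly, in which case no shell quoting is involved.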

API Reference

Bases: genkit.ai.Plugin

Firestore retriever plugin.

Args:

  • name: The name of the retriever.
  • collection: The name of the Firestore collection to query.
  • vector_field: The name of the field containing the vector embeddings.
  • content_field: The name of the field containing the document content you wish to return.
  • embedder: The embedder to use with this retriever.
  • embedder_options: Optional configuration to pass to the embedder.
  • distance_measure: The distance measure to use when comparing vectors. Defaults to 'COSINE'.
  • firestore_client: The Firestore database instance from which to query.
  • metadata_fields: Optional list of metadata fields to include.
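
To give some intuition for the default distance_measure, the sketch below computes cosine distance (one minus cosine similarity) in plain Python and ranks a few stored vectors by it. This is a brute-force illustration only; it does not show how Firestore executes vector queries, and cosine_distance and nearest are hypothetical names.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same dimension')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError('cosine distance is undefined for zero vectors')
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, stored, limit=2):
    """Return the IDs of the `limit` stored vectors closest to `query`.

    `stored` maps a document ID to its embedding vector.
    """
    ranked = sorted(stored, key=lambda doc_id: cosine_distance(query, stored[doc_id]))
    return ranked[:limit]


stored = {
    'doc-1': [1.0, 0.0],
    'doc-2': [0.0, 1.0],
    'doc-3': [0.9, 0.1],
}
print(nearest([1.0, 0.0], stored))
```

Smaller distances mean more similar vectors, so a vector pointing in the same direction as the query has distance 0 and an orthogonal one has distance 1.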

Source code in plugins/firebase/src/genkit/plugins/firebase/firestore.py
class FirestoreVectorStore(Plugin):
    """Firestore retriever plugin.
    Args:
        name: name of the retriever.
        collection: The name of the Firestore collection to query.
        vector_field: The name of the field containing the vector embeddings.
        content_field: The name of the field containing the document content you wish to return.
        embedder: The embedder to use with this retriever.
        embedder_options: Optional configuration to pass to the embedder.
        distance_measure: The distance measure to use when comparing vectors. Defaults to 'COSINE'.
        firestore_client: The Firestore database instance from which to query.
        metadata_fields: Optional list of metadata fields to include.
    """

    name = 'firebaseFirestore'

    def __init__(
        self,
        name: str,
        firestore_client: Any,
        collection: str,
        vector_field: str,
        content_field: str | Callable[[DocumentSnapshot], list[dict[str, str]]],
        embedder: str,
        embedder_options: dict[str, Any] | None = None,
        distance_measure: DistanceMeasure = DistanceMeasure.COSINE,
        metadata_fields: list[str] | MetadataTransformFn | None = None,
    ):
        """Initialize the firestore plugin.

        Args:
            params: List of firestore retriever configurations.
        """
        self.name = name
        self.firestore_client = firestore_client
        self.collection = collection
        self.vector_field = vector_field
        self.content_field = content_field
        self.embedder = embedder
        self.embedder_options = embedder_options
        self.distance_measure = distance_measure
        self.metadata_fields = metadata_fields

    def initialize(self, ai: GenkitRegistry) -> None:
        """Initialize firestore plugin.

        Register actions with the registry making them available for use in the Genkit framework.

        Args:
            ai: The registry to register actions with.

        Returns:
            None
        """
        retriever = FirestoreRetriever(
            ai=ai,
            name=self.name,
            firestore_client=self.firestore_client,
            collection=self.collection,
            vector_field=self.vector_field,
            content_field=self.content_field,
            embedder=self.embedder,
            embedder_options=self.embedder_options,
            distance_measure=self.distance_measure,
            metadata_fields=self.metadata_fields,
        )

        return ai.define_retriever(
            name=firestore_action_name(self.name),
            fn=retriever.retrieve,
        )

__init__(name, firestore_client, collection, vector_field, content_field, embedder, embedder_options=None, distance_measure=DistanceMeasure.COSINE, metadata_fields=None)

Initialize the firestore plugin.

Parameters:

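Name Type Description Default
name str

The name of the retriever.

required
firestore_client Any

The Firestore database instance from which to query.

required
collection str

The name of the Firestore collection to query.

required
vector_field str

The name of the field containing the vector embeddings.

required
content_field str | Callable[[DocumentSnapshot], list[dict[str, str]]]

The name of the field containing the document content you wish to return.

required
embedder str

The embedder to use with this retriever.

required
embedder_options dict[str, Any] | None

Optional configuration to pass to the embedder.

None
distance_measure DistanceMeasure

The distance measure to use when comparing vectors.

DistanceMeasure.COSINE
metadata_fields list[str] | MetadataTransformFn | None

Optional list of metadata fields to include.

None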
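
The constructor signature types content_field as either a field name or a callable that receives a DocumentSnapshot, and metadata_fields as a list of field names or a transform function. The sketch below shows what such a content callable and a list-based metadata selection might look like. It rests on several assumptions: FakeSnapshot is a stand-in that models only the to_dict() method, the returned dicts are assumed to be text parts of the form {'text': ...}, and title_and_body and pick_metadata are hypothetical names. How the plugin itself consumes these values is not shown here.

```python
class FakeSnapshot:
    """Stand-in for a Firestore DocumentSnapshot; only to_dict() is modelled."""

    def __init__(self, data):
        self._data = data

    def to_dict(self):
        return dict(self._data)


def title_and_body(snapshot):
    """Hypothetical content callable: merge two fields into one text part."""
    data = snapshot.to_dict()
    return [{'text': f"{data['title']}\n\n{data['body']}"}]


def pick_metadata(snapshot, fields):
    """Illustrates list-style metadata selection: keep only the named fields."""
    data = snapshot.to_dict()
    return {field: data[field] for field in fields if field in data}


snapshot = FakeSnapshot({
    'title': 'Release notes',
    'body': 'Version 2 adds vector search.',
    'author': 'docs-team',
    'embedding': [0.1, 0.2, 0.3],
})

parts = title_and_body(snapshot)
metadata = pick_metadata(snapshot, ['author', 'missing_field'])
print(parts, metadata)
```

Under these assumptions, a callable like title_and_body would be passed as content_field in place of a field name, which is useful when the text to return is spread across several fields.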

initialize(ai)

Initialize firestore plugin.

Register actions with the registry making them available for use in the Genkit framework.

Parameters:

Name Type Description Default
ai genkit.ai.GenkitRegistry

The registry to register actions with.

required

Returns:

Type Description
None

None

plugin_name()

The name of the plugin.

Returns:

Type Description

The name of the plugin.

Source code in packages/genkit/src/genkit/ai/_plugin.py
def plugin_name(self):
    """The name of the plugin.

    Returns:
        The name of the plugin.
    """
    return self.name

resolve_action(ai, kind, name)

Resolves an action by adding it to the provided GenkitRegistry.

Parameters:

Name Type Description Default
ai genkit.ai._registry.GenkitRegistry

The Genkit registry.

required
kind genkit.core.registry.ActionKind

The kind of action to resolve.

required
name str

The name of the action to resolve.

required

Returns:

Type Description
None

None, action resolution is done by side-effect on the registry.

Source code in packages/genkit/src/genkit/ai/_plugin.py
def resolve_action(self, ai: GenkitRegistry, kind: ActionKind, name: str) -> None:
    """Resolves an action by adding it to the provided GenkitRegistry.

    Args:
        ai: The Genkit registry.
        kind: The kind of action to resolve.
        name: The name of the action to resolve.

    Returns:
        None, action resolution is done by side-effect on the registry.
    """
    pass