gcp_vertex_ai_embeddings

beta

Generates vector embeddings to represent a text string, using the Vertex AI API.

# Configuration fields, showing default values
label: ""
gcp_vertex_ai_embeddings:
  project: "" # No default (required)
  credentials_json: "" # No default (optional)
  location: us-central1
  model: text-embedding-004 # No default (required)
  task_type: RETRIEVAL_DOCUMENT
  text: "" # No default (optional)
  output_dimensions: 0 # No default (optional)

This processor sends text strings to the Vertex AI API, which generates vector embeddings for them. By default, the processor submits the entire payload of each message as a string, unless you use the text field to customize it.
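
As a minimal sketch, the processor can be added to a pipeline with only the required fields set. The project ID `my-gcp-project` is a placeholder. Because `text` is not set, the entire message payload is submitted.

```yaml
pipeline:
  processors:
    - gcp_vertex_ai_embeddings:
        project: my-gcp-project # placeholder project ID, replace with your own
        model: text-embedding-004
        # location and task_type are omitted, so they fall back to their
        # defaults (us-central1 and RETRIEVAL_DOCUMENT)
```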

For more information, see the Vertex AI documentation.

Fields

credentials_json

Set your Google Service Account Credentials as JSON.

This field contains sensitive information that usually shouldn’t be added to a configuration directly. Before adding it to your configuration, see Manage Secrets.

Type: string
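
One way to keep the credentials out of the configuration file is environment variable interpolation. The sketch below assumes the service account JSON is available in an environment variable named `GCP_CREDENTIALS_JSON`. Both that name and the project ID are hypothetical.

```yaml
gcp_vertex_ai_embeddings:
  project: my-gcp-project # placeholder project ID
  model: text-embedding-004
  # Resolved from the environment when the configuration is loaded;
  # GCP_CREDENTIALS_JSON is a hypothetical variable name
  credentials_json: "${GCP_CREDENTIALS_JSON}"
```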

location

The location of the Vertex AI large language model (LLM) that you want to use.

Type: string

Default: us-central1

model

The name of the LLM to use. For a full list of models, see the Vertex AI Model Garden.

Type: string

# Examples:
model: text-embedding-004
model: text-multilingual-embedding-002

output_dimensions

The maximum length of a generated vector embedding. If this value is set, generated embeddings are truncated to this size.

Type: int
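
For illustration, the following fragment requests shorter embeddings. The value 256 is arbitrary, and whether a given size is accepted depends on the model you choose.

```yaml
gcp_vertex_ai_embeddings:
  project: my-gcp-project # placeholder project ID
  model: text-embedding-004
  output_dimensions: 256 # illustrative value; embeddings are truncated to this size
```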

project

The ID of your Google Cloud project.

Type: string

task_type

Use the following options to optimize embeddings that the model generates for specific use cases.

Type: string

Default: RETRIEVAL_DOCUMENT

Option Summary

CLASSIFICATION

optimize for classifying texts according to preset labels

CLUSTERING

optimize for clustering texts based on their similarities

FACT_VERIFICATION

optimize for queries that prove or disprove a fact, such as "apples grow underground"

QUESTION_ANSWERING

optimize for searches phrased as proper questions, such as "Why is the sky blue?"

RETRIEVAL_DOCUMENT

optimize for documents that will be searched (also known as a corpus)

RETRIEVAL_QUERY

optimize for queries such as "What is the best fish recipe?" or "best restaurant in Chicago"

SEMANTIC_SIMILARITY

optimize for text similarity
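
Based on the option summaries above, a search use case would typically pair two task types: `RETRIEVAL_DOCUMENT` for the corpus and `RETRIEVAL_QUERY` for the queries run against it. The sketch below shows the two processor fragments as separate YAML documents. The project ID is a placeholder, and how the two pipelines are deployed is left open.

```yaml
# Indexing side: embed the documents that will be searched
gcp_vertex_ai_embeddings:
  project: my-gcp-project # placeholder project ID
  model: text-embedding-004
  task_type: RETRIEVAL_DOCUMENT
---
# Query side: embed incoming search queries
gcp_vertex_ai_embeddings:
  project: my-gcp-project # placeholder project ID
  model: text-embedding-004
  task_type: RETRIEVAL_QUERY
```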

text

The text you want to generate vector embeddings for. By default, the processor submits the entire payload as a string. This field supports interpolation functions.

Type: string
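
As a sketch of the interpolation support, the fragment below embeds a single field of a JSON message instead of the whole payload. The field name `description` is hypothetical and stands in for whichever field holds your text.

```yaml
gcp_vertex_ai_embeddings:
  project: my-gcp-project # placeholder project ID
  model: text-embedding-004
  # Interpolation function that reads the (hypothetical) description
  # field from each JSON message
  text: '${! json("description") }'
```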