cohere_embeddings

beta

Generates vector embeddings to represent input text, using the Cohere API.

# Configuration fields, showing default values
label: ""
cohere_embeddings:
  base_url: https://api.cohere.com
  api_key: "" # No default (required)
  model: embed-english-v3.0 # No default (required)
  text_mapping: "" # No default (optional)
  input_type: search_document
  dimensions: 0 # No default (optional)

This processor sends text strings to your chosen large language model (LLM), which generates vector embeddings for them using the Cohere API. By default, the processor submits the entire payload of each message as a string. To submit a different string, set the text_mapping field.

To learn more about vector embeddings, see the Cohere API documentation.

Examples

Store embedding vectors in Qdrant

Compute embeddings for some generated data and store them in Qdrant using the qdrant output.

input:
  generate:
    interval: 1s
    mapping: |
      root = {"text": fake("paragraph")}
pipeline:
  processors:
  - cohere_embeddings:
      model: embed-english-v3.0
      api_key: "${COHERE_API_KEY}"
      text_mapping: "root = this.text"
output:
  qdrant:
    grpc_host: localhost:6334
    collection_name: "example_collection"
    id: "root = uuid_v4()"
    vector_mapping: "root = this"

Fields

api_key

The API key for the Cohere API.

This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see Manage Secrets before adding it to your configuration.

Type: string

base_url

The base URL to use for API requests.

Type: string

Default: https://api.cohere.com

dimensions

The number of dimensions (numerical values) in each vector embedding generated by this processor. This parameter is supported only by embed-v4.0 and newer models.

Type: int
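
As a rough illustration of the dimensions field, the fragment below requests shorter vectors from an embed-v4.0 model. The value 1024 and the text field name are assumptions made for this sketch. Check the Cohere documentation for the sizes your model accepts.

```yaml
pipeline:
  processors:
  - cohere_embeddings:
      api_key: "${COHERE_API_KEY}"
      model: embed-v4.0
      text_mapping: "root = this.text" # assumes each message has a text field
      dimensions: 1024 # assumed value, not a documented default
```

If you store the vectors in a vector database, the collection's configured vector size typically needs to match the value you set here.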

input_type

The type of text input passed to the model.

Type: string

Default: search_document

Option Summary

classification

Used for embeddings passed through a text classifier.

clustering

Used for embeddings run through a clustering algorithm.

search_document

Used for embeddings stored in a vector database for search use cases.

search_query

Used for embeddings of search queries run against a vector DB to find relevant documents.
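
The search_document and search_query options are usually used as a pair: documents are embedded with search_document when they are indexed, and incoming queries are embedded with search_query at lookup time. A minimal sketch of the query side follows, assuming each message carries the user's query in a hypothetical question field:

```yaml
pipeline:
  processors:
  - cohere_embeddings:
      api_key: "${COHERE_API_KEY}"
      model: embed-english-v3.0
      text_mapping: "root = this.question" # question is a hypothetical field name
      input_type: search_query
```

Use the same model for queries as for the stored documents. Vectors produced by different models are not comparable.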

model

The name of the Cohere LLM you want to use.

Type: string

# Examples:
model: embed-english-v3.0
model: embed-english-light-v3.0
model: embed-multilingual-v3.0
model: embed-multilingual-light-v3.0

text_mapping

A mapping that returns the text you want to generate a vector embedding for, for example root = this.text. By default, the processor submits the entire payload as a string.

Type: string
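
When the text to embed is spread across several fields, the mapping can assemble a single string. The sketch below assumes structured messages with hypothetical title and body string fields, and joins them before embedding:

```yaml
pipeline:
  processors:
  - cohere_embeddings:
      api_key: "${COHERE_API_KEY}"
      model: embed-english-v3.0
      # title and body are hypothetical field names used for illustration
      text_mapping: |
        root = this.title + ": " + this.body
```

Because the mapping only selects what is sent to the model, fields left out of it (such as IDs or timestamps) do not influence the resulting embedding.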