Embeddings

Arian Amiramjadi edited this page Dec 24, 2025 · 1 revision

Create vector embeddings for semantic search, RAG, and similarity comparison.

Quick Start

// Single text embedding
embedding, err := ai.Embed("Hello world").First()

// Multiple texts at once
embeddings, err := ai.EmbedMany("text1", "text2", "text3").Do()

Embedding Models

// OpenAI models
ai.Embed("text").Model(ai.EmbedTextSmall3)  // Default, 1536 dims
ai.Embed("text").Model(ai.EmbedTextLarge3)  // 3072 dims, higher quality
ai.Embed("text").Model(ai.EmbedTextAda002)  // Legacy

// Google models
ai.Embed("text").Model(ai.EmbedGecko)
ai.Embed("text").Model(ai.EmbedGeckoLatest)

// Ollama (local)
ai.Embed("text").Model(ai.EmbedNomic)
ai.Embed("text").Model(ai.EmbedMxbai)

Dimension Control

Some models support dimension reduction for efficiency:

// Reduce dimensions (faster, less storage)
embedding, err := ai.Embed("text").
    Model(ai.EmbedTextLarge3).
    Dimensions(512).
    First()
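If you ever shorten full-size vectors client-side instead of requesting fewer dimensions from the API, re-normalize the truncated vector so dot-product and cosine scores stay meaningful (OpenAI recommends this for the text-embedding-3 models). A minimal plain-Go sketch, independent of this library:

```go
package main

import (
	"fmt"
	"math"
)

// normalize scales v to unit length so a truncated embedding remains
// comparable to others via dot product or cosine similarity.
func normalize(v []float64) []float64 {
	var n float64
	for _, x := range v {
		n += x * x
	}
	n = math.Sqrt(n)
	out := make([]float64, len(v))
	if n == 0 {
		return out
	}
	for i, x := range v {
		out[i] = x / n
	}
	return out
}

func main() {
	truncated := []float64{3, 4} // stand-in for a client-side-truncated embedding
	fmt.Println(normalize(truncated)) // prints [0.6 0.8]
}
```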

Provider-Specific

// Use OpenAI directly
embedding, err := ai.OpenAI().Embed("text").First()

// Use Google
embedding, err := ai.Google().Embed("text").Model(ai.EmbedGecko).First()

Response Metadata

resp, err := ai.Embed("text").DoWithMeta()
if err != nil {
    panic(err)
}

fmt.Printf("Model: %s\n", resp.Model)
fmt.Printf("Dimensions: %d\n", resp.Dimensions)
fmt.Printf("Tokens used: %d\n", resp.TotalTokens)

Similarity Functions

Built-in similarity calculations:

// Cosine similarity (most common, -1 to 1)
score := ai.CosineSimilarity(embedding1, embedding2)

// Dot product
score := ai.DotProduct(embedding1, embedding2)

// Euclidean distance
distance := ai.EuclideanDistance(embedding1, embedding2)
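Cosine similarity is just the dot product of the two vectors divided by the product of their magnitudes. A plain-Go sketch of the math behind the built-in, shown only for illustration:

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors:
// dot(a, b) / (|a| * |b|), in the range [-1, 1].
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0 // zero vector has no direction
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Println(cosine([]float64{1, 0}, []float64{1, 0})) // same direction: 1
	fmt.Println(cosine([]float64{1, 0}, []float64{0, 1})) // orthogonal: 0
}
```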

Semantic Search

Find the most similar texts in a corpus:

query := "What is machine learning?"
corpus := []string{
    "Machine learning is a subset of AI",
    "The weather is nice today",
    "Deep learning uses neural networks",
}

results, err := ai.SemanticSearch(query, corpus, 2) // top 2 results
if err != nil {
    panic(err)
}
for _, r := range results {
    fmt.Printf("Score: %.3f - %s\n", r.Score, r.Text)
}
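Conceptually, a search like this reduces to: embed the query, score it against each corpus embedding, and keep the top k. A hand-rolled sketch over precomputed vectors (the `cosine` and `topK` helpers and the toy 2-d vectors are illustrative assumptions, not part of the library):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type hit struct {
	Index int
	Score float64
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK ranks corpus embeddings against a query embedding and returns
// the k highest-scoring indices.
func topK(query []float64, corpus [][]float64, k int) []hit {
	hits := make([]hit, len(corpus))
	for i, v := range corpus {
		hits[i] = hit{Index: i, Score: cosine(query, v)}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	if k > len(hits) {
		k = len(hits)
	}
	return hits[:k]
}

func main() {
	query := []float64{1, 0}
	corpus := [][]float64{{0.9, 0.1}, {0, 1}, {0.5, 0.5}}
	for _, h := range topK(query, corpus, 2) {
		fmt.Printf("doc %d: %.3f\n", h.Index, h.Score)
	}
}
```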

Batch Embedding

For large datasets:

texts := []string{
    "doc 1", "doc 2", "doc 3", 
    // ... thousands more
}

// Batch with automatic chunking (100 per batch)
embeddings, err := ai.EmbedBatch(texts, 100)
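EmbedBatch chunks for you, but if you roll your own batching (to add retries or rate limiting between requests, say), the splitting step is simple. `chunkTexts` below is an illustrative helper, not part of the library:

```go
package main

import "fmt"

// chunkTexts splits texts into consecutive slices of at most size elements;
// the final batch holds whatever remains.
func chunkTexts(texts []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for len(texts) > 0 {
		n := size
		if n > len(texts) {
			n = len(texts)
		}
		batches = append(batches, texts[:n])
		texts = texts[n:]
	}
	return batches
}

func main() {
	texts := []string{"doc 1", "doc 2", "doc 3", "doc 4", "doc 5"}
	for i, b := range chunkTexts(texts, 2) {
		fmt.Printf("batch %d: %d texts\n", i, len(b)) // each batch would be one API call
	}
}
```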

Use Cases

Document Similarity

func findSimilarDocs(query string, docs []string) {
    results, _ := ai.SemanticSearch(query, docs, 5)
    for _, r := range results {
        preview := r.Text
        if len(preview) > 50 { // guard: slicing a shorter string would panic
            preview = preview[:50]
        }
        fmt.Printf("%.2f: %s\n", r.Score, preview)
    }
}

RAG (Retrieval Augmented Generation)

// 1. Embed your knowledge base
knowledge := []string{"fact1", "fact2", "fact3"}
embeddings, _ := ai.EmbedMany(knowledge...).Do()

// 2. On query, find relevant context
query := "user question"
results, _ := ai.SemanticSearch(query, knowledge, 3)

// 3. Use as context for generation
context := ""
for _, r := range results {
    context += r.Text + "\n"
}

answer, _ := ai.Claude().
    System("Answer based on this context:\n" + context).
    Ask(query)

Clustering

// Embed all documents
docs := []string{"doc1", "doc2", "doc3"}
embeddings, _ := ai.EmbedMany(docs...).Do()

// Use embeddings for clustering (e.g., k-means)
// clusters := kmeans(embeddings, k)
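Lloyd's k-means algorithm over embedding vectors is short enough to sketch directly. This minimal version (fixed iteration count, naive initialization from the first k points) is illustrative only, not production clustering:

```go
package main

import (
	"fmt"
	"math"
)

// sqDist returns the squared Euclidean distance between two vectors.
func sqDist(a, b []float64) float64 {
	var s float64
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// kmeans assigns each vector to one of k clusters via Lloyd's algorithm.
func kmeans(vecs [][]float64, k, iters int) []int {
	dim := len(vecs[0])
	centroids := make([][]float64, k)
	for i := range centroids {
		centroids[i] = append([]float64(nil), vecs[i]...) // naive init: first k points
	}
	labels := make([]int, len(vecs))
	for it := 0; it < iters; it++ {
		// Assignment step: each vector joins its nearest centroid.
		for i, v := range vecs {
			best, bestD := 0, math.Inf(1)
			for c, cent := range centroids {
				if d := sqDist(v, cent); d < bestD {
					best, bestD = c, d
				}
			}
			labels[i] = best
		}
		// Update step: each centroid moves to the mean of its members.
		sums := make([][]float64, k)
		counts := make([]int, k)
		for c := range sums {
			sums[c] = make([]float64, dim)
		}
		for i, v := range vecs {
			counts[labels[i]]++
			for j := range v {
				sums[labels[i]][j] += v[j]
			}
		}
		for c := range centroids {
			if counts[c] == 0 {
				continue // empty cluster keeps its old centroid
			}
			for j := range centroids[c] {
				centroids[c][j] = sums[c][j] / float64(counts[c])
			}
		}
	}
	return labels
}

func main() {
	vecs := [][]float64{{0, 0}, {10, 10}, {0.1, 0.2}, {9.8, 10.1}}
	fmt.Println(kmeans(vecs, 2, 5)) // prints [0 1 0 1]
}
```

In practice you would feed the embedding slices returned by EmbedMany straight into a function like this, or into a dedicated clustering library.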

Default Model

// Change the default embedding model
ai.DefaultEmbeddingModel = ai.EmbedTextLarge3
