Google Vertex AI

Access Claude, Gemini, and MaaS models through Google Cloud's Vertex AI platform.

Configuration

Vertex AI uses Google Cloud OAuth2 authentication. ReqLLM uses Application Default Credentials (ADC) by default.

Application Default Credentials (Recommended)

For local development:

gcloud auth application-default login
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_REGION="global"

ReqLLM checks ADC sources in this order:

  1. GOOGLE_APPLICATION_CREDENTIALS pointing to an ADC credential file
  2. GOOGLE_APPLICATION_CREDENTIALS_JSON containing ADC credential JSON
  3. The well-known gcloud ADC file, such as ~/.config/gcloud/application_default_credentials.json (honors CLOUDSDK_CONFIG)
  4. The Google Cloud metadata server

Supported ADC credential types include user ADC credentials from gcloud auth application-default login, service account keys, workload identity credential configuration files, and metadata server credentials. Local ADC files of type impersonated_service_account are not supported yet.
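The lookup order above can be sketched as a small pure function. This is only an illustration of the documented precedence, not ReqLLM's actual implementation: the module name `ADCOrderSketch`, its return tuples, and the injectable `exists_fun` are all hypothetical, and the well-known path assumes a Unix-like home directory.

```elixir
defmodule ADCOrderSketch do
  # Hypothetical helper (not part of ReqLLM) mirroring the documented ADC lookup order.
  # `env` is a map of environment variables; `exists_fun` is injectable for testing.
  def resolve(env \\ System.get_env(), exists_fun \\ &File.exists?/1) do
    well_known = well_known_path(env)

    cond do
      # 1. Path to an ADC credential file
      present?(env["GOOGLE_APPLICATION_CREDENTIALS"]) ->
        {:credentials_file, env["GOOGLE_APPLICATION_CREDENTIALS"]}

      # 2. Inline ADC credential JSON
      present?(env["GOOGLE_APPLICATION_CREDENTIALS_JSON"]) ->
        {:credentials_json, env["GOOGLE_APPLICATION_CREDENTIALS_JSON"]}

      # 3. The well-known gcloud ADC file
      exists_fun.(well_known) ->
        {:well_known_file, well_known}

      # 4. Fall back to the metadata server
      true ->
        :metadata_server
    end
  end

  # CLOUDSDK_CONFIG, when set, replaces the default gcloud config directory.
  def well_known_path(env) do
    dir = env["CLOUDSDK_CONFIG"] || Path.join([env["HOME"] || "~", ".config", "gcloud"])
    Path.join(dir, "application_default_credentials.json")
  end

  defp present?(value), do: is_binary(value) and value != ""
end
```

Calling `ADCOrderSketch.resolve()` with no arguments inspects the current environment, which can help when debugging why an unexpected credential source is being picked up.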

Service Account

Environment Variables:

GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
GOOGLE_CLOUD_PROJECT="your-project-id"
GOOGLE_CLOUD_REGION="global"

Application Config:

config :req_llm, :google_vertex,
  service_account_json: "/path/to/service-account.json",
  project_id: "your-project-id",
  region: "global"

Provider Options:

ReqLLM.generate_text(
  "google_vertex:claude-sonnet-4-5@20250929",
  "Hello",
  provider_options: [
    service_account_json: "/path/to/service-account.json",
    project_id: "your-project-id",
    region: "global"
  ]
)

Model Specs

For the full model-spec workflow, see Model Specs.

Use exact Vertex model IDs from LLMDB.xyz when possible. For MaaS and other OpenAI-compatible Vertex models that are not in the registry yet, build a full explicit model spec with ReqLLM.model!/1. Some MaaS model IDs also need extra.family when the family cannot be inferred from the ID alone.

Provider Options

Pass these options via the :provider_options keyword:

service_account_json

  • Type: String (file path or JSON string) or map
  • Purpose: Explicit Google Cloud service account JSON credentials
  • Fallback: config :req_llm, :google_vertex
  • Example: provider_options: [service_account_json: "/path/to/credentials.json"]
  • Note: For normal ADC usage, prefer GOOGLE_APPLICATION_CREDENTIALS instead of this option.

access_token

  • Type: String
  • Purpose: Use an existing OAuth2 access token generated outside ReqLLM (e.g., via Goth or gcloud)
  • Behavior: Bypasses the service account JSON flow and internal token management
  • Fallback: config :req_llm, :google_vertex
  • Example: provider_options: [access_token: "your-access-token"]

project_id

  • Type: String
  • Purpose: Google Cloud project ID
  • Fallback: config :req_llm, :google_vertex, then GOOGLE_CLOUD_PROJECT env var
  • Example: provider_options: [project_id: "my-project-123"]
  • Required: Yes

region

  • Type: String
  • Default: "global"
  • Purpose: GCP region for Vertex AI endpoint
  • Fallback: config :req_llm, :google_vertex, then GOOGLE_CLOUD_REGION env var
  • Example: provider_options: [region: "us-central1"]
  • Note: Use "global" for the newest models and a specific region for regional deployments
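The fallback chain for project_id and region (provider option, then application config, then environment variable, then any default) can be pictured with a minimal sketch. `VertexOptionSketch` and its argument list are hypothetical and exist only to illustrate the precedence described above; ReqLLM's internal resolution may be structured differently.

```elixir
defmodule VertexOptionSketch do
  # Hypothetical resolver (not ReqLLM's code) for the documented precedence:
  # provider_options, then application config, then environment variable, then default.
  def resolve(key, env_var, provider_options, app_config, env, default \\ nil) do
    Keyword.get(provider_options, key) ||
      Keyword.get(app_config, key) ||
      Map.get(env, env_var) ||
      default
  end
end

# Example: resolve the region the way the documentation describes it.
region =
  VertexOptionSketch.resolve(
    :region,
    "GOOGLE_CLOUD_REGION",
    [region: "us-central1"],
    [region: "europe-west1"],
    %{"GOOGLE_CLOUD_REGION" => "asia-east1"},
    "global"
  )

IO.puts(region)
```

In a real application the last three arguments would typically come from `Application.get_env(:req_llm, :google_vertex, [])`, `System.get_env()`, and the documented default.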

additional_model_request_fields

  • Type: Map
  • Purpose: Model-specific request fields (e.g., thinking configuration)
  • Example:
    provider_options: [
      additional_model_request_fields: %{
        thinking: %{type: "enabled", budget_tokens: 4096}
      }
    ]

labels

  • Type: Map of strings to strings
  • Purpose: Custom metadata labels attached to the request. Used by Google Cloud for billing and reporting — labels are filterable in billing reports and BigQuery exports.
  • Constraints: Up to 64 labels per request; keys 1–63 chars starting with a lowercase letter; keys and values may only contain lowercase letters, numbers, underscores, and dashes.
  • Availability: Vertex AI only — the direct Gemini API (generativelanguage.googleapis.com) does not support this field.
  • Example:
    provider_options: [
      labels: %{
        "team" => "engineering",
        "environment" => "production",
        "use_case" => "contract_analysis"
      }
    ]
  • Reference: Custom metadata labels
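Because malformed labels are rejected by the API, it can be useful to check them before sending a request. The following is a minimal sketch that encodes only the constraints listed above; `VertexLabelsSketch` is a hypothetical helper and not part of ReqLLM, and it assumes empty values are acceptable since no minimum value length is stated here.

```elixir
defmodule VertexLabelsSketch do
  # Hypothetical pre-flight check (not part of ReqLLM) for the label constraints above.
  @max_labels 64

  def validate(labels) when is_map(labels) do
    cond do
      map_size(labels) > @max_labels ->
        {:error, :too_many_labels}

      bad = Enum.find(labels, fn {key, value} -> not valid_pair?(key, value) end) ->
        {:error, {:invalid_label, bad}}

      true ->
        :ok
    end
  end

  # Keys: 1-63 chars, starting with a lowercase letter.
  # Keys and values: lowercase letters, digits, underscores, and dashes only.
  defp valid_pair?(key, value) when is_binary(key) and is_binary(value) do
    Regex.match?(~r/\A[a-z][a-z0-9_-]{0,62}\z/, key) and
      Regex.match?(~r/\A[a-z0-9_-]*\z/, value)
  end

  defp valid_pair?(_key, _value), do: false
end
```

A caller could run `VertexLabelsSketch.validate(labels)` and only pass `labels` into `provider_options` when it returns `:ok`.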

Claude-Specific Options

Vertex AI supports the same Claude options as native Anthropic:

anthropic_top_k

  • Type: Integer in the range 1..40
  • Purpose: Sample from top K options per token
  • Example: provider_options: [anthropic_top_k: 20]

stop_sequences

  • Type: List of strings
  • Purpose: Custom stop sequences
  • Example: provider_options: [stop_sequences: ["END", "STOP"]]

anthropic_metadata

  • Type: Map
  • Purpose: Request metadata for tracking
  • Example: provider_options: [anthropic_metadata: %{user_id: "123"}]

thinking

  • Type: Map
  • Purpose: Enable extended thinking/reasoning
  • Example: provider_options: [thinking: %{type: "enabled", budget_tokens: 4096}]
  • Access: ReqLLM.Response.thinking(response)

anthropic_prompt_cache

  • Type: Boolean
  • Purpose: Enable prompt caching
  • Example: provider_options: [anthropic_prompt_cache: true]

anthropic_prompt_cache_ttl

  • Type: String (e.g., "1h")
  • Purpose: Cache TTL (defaults to about 5 minutes if omitted)
  • Example: provider_options: [anthropic_prompt_cache_ttl: "1h"]
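Several of these options are often combined in one request. As a rough sketch, the hypothetical helper below assembles a `provider_options` keyword list for extended thinking plus prompt caching; the module name and its `:thinking_budget`, `:cache`, and `:cache_ttl` input keys are invented for illustration, while the output keys are the documented option names.

```elixir
defmodule ClaudeOptionsSketch do
  # Hypothetical builder (not part of ReqLLM) that assembles documented Claude options.
  # Unset options are dropped so only explicitly requested features are sent.
  def build(opts \\ []) do
    [
      thinking: thinking(Keyword.get(opts, :thinking_budget)),
      anthropic_prompt_cache: Keyword.get(opts, :cache, false),
      anthropic_prompt_cache_ttl: Keyword.get(opts, :cache_ttl)
    ]
    |> Enum.reject(fn {_key, value} -> value in [nil, false] end)
  end

  defp thinking(nil), do: nil
  defp thinking(budget), do: %{type: "enabled", budget_tokens: budget}
end

# Prints the keyword list that would be passed as :provider_options.
IO.inspect(ClaudeOptionsSketch.build(thinking_budget: 4096, cache: true, cache_ttl: "1h"))
```

The result can be passed straight through, for example `ReqLLM.generate_text("google_vertex:claude-sonnet-4-5@20250929", "Hello", provider_options: ClaudeOptionsSketch.build(thinking_budget: 4096))`, and any reasoning output read back with `ReqLLM.Response.thinking(response)`.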

Supported Models

Claude 4.5 & 4.1 Family

  • Haiku 4.5: google_vertex:claude-haiku-4-5@20251001

    • Fast, cost-effective
    • Full tool calling and reasoning support
  • Sonnet 4.5: google_vertex:claude-sonnet-4-5@20250929

    • Balanced performance and capability
    • Extended thinking support
  • Opus 4.1: google_vertex:claude-opus-4-1@20250805

    • Highest capability
    • Advanced reasoning

Claude 4.0 & Earlier

  • Sonnet 4.0: google_vertex:claude-sonnet-4@20250514
  • Opus 4.0: google_vertex:claude-opus-4@20250514
  • Sonnet 3.7: google_vertex:claude-3-7-sonnet@20250219
  • Sonnet 3.5 v2: google_vertex:claude-3-5-sonnet@20241022
  • Haiku 3.5: google_vertex:claude-3-5-haiku@20241022

Model ID Format

Vertex uses the @ symbol for versioning:

  • Format: claude-{tier}-{version}@{date} (Claude 3.x IDs put the version before the tier, e.g., claude-3-7-sonnet@20250219)
  • Example: claude-sonnet-4-5@20250929
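To make the `@` convention concrete, here is a minimal sketch that splits a Vertex model ID into its name and date suffix. `VertexModelIdSketch` is a hypothetical helper, not a ReqLLM function, and it assumes the suffix is always an eight-digit YYYYMMDD date as in the IDs listed above.

```elixir
defmodule VertexModelIdSketch do
  # Hypothetical helper (not part of ReqLLM): splits "name@YYYYMMDD" into its parts.
  # An optional "google_vertex:" provider prefix is stripped first.
  def parse("google_vertex:" <> model_id), do: parse(model_id)

  def parse(model_id) when is_binary(model_id) do
    with [name, <<y::binary-size(4), m::binary-size(2), d::binary-size(2)>>] <-
           String.split(model_id, "@", parts: 2),
         {:ok, date} <- Date.from_iso8601("#{y}-#{m}-#{d}") do
      {:ok, %{name: name, version_date: date}}
    else
      _ -> {:error, :invalid_model_id}
    end
  end
end

IO.inspect(VertexModelIdSketch.parse("google_vertex:claude-sonnet-4-5@20250929"))
```

The same function also accepts the older ordering, such as `claude-3-7-sonnet@20250219`, because it only relies on the `@` separator and not on the position of the tier.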

Wire Format Notes

  • Authentication: OAuth2 with service account tokens (auto-refreshed)
  • Endpoint: Model-specific paths under aiplatform.googleapis.com
  • API: Uses Anthropic's raw message format (compatible with native API)
  • Streaming: Standard Server-Sent Events (SSE)
  • Region routing: Global endpoint for newest models, regional for specific deployments

ReqLLM handles all of these differences automatically.

Resources