
llm-venice


LLM plugin to access models available via the Venice AI API.

Installation

Install llm-venice with its dependency llm using your package manager of choice, for example:

pip install llm-venice

Or install it alongside an existing LLM install:

llm install llm-venice

Configuration

Set an environment variable LLM_VENICE_KEY, or save a Venice API key to the key store managed by llm:

llm keys set venice

To fetch a list of the models available over the Venice API:

llm venice refresh

You should re-run the refresh command upon changes to the Venice API, for example when:

  • New models have been made available
  • Deprecated models have been removed
  • New capabilities have been added

The models are stored in venice_models.json in the llm user directory.
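
If you want to inspect that cache from Python, llm exposes the user directory via llm.user_dir(); a minimal sketch:

import json
import llm

# llm.user_dir() resolves the same directory the refresh command writes to
models_path = llm.user_dir() / "venice_models.json"
print(json.loads(models_path.read_text()))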

Usage

List available Venice models:

llm models --query venice

Prompting

Run a prompt:

llm --model venice/llama-3.3-70b "Why is the earth round?"

Start an interactive chat session:

llm chat --model venice/mistral-small-3-2-24b-instruct
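
The same models are available from llm's Python API:

import llm

model = llm.get_model("venice/llama-3.3-70b")
response = model.prompt("Why is the earth round?")
print(response.text())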

Structured Outputs

Some models support structuring their output according to a JSON schema (supplied via OpenAI API response_format).

This works via llm's --schema option, for example:

llm -m venice/zai-org-glm-4.6 --schema "name, age int, one_sentence_bio" "Invent an evil supervillain"

Consult llm's schemas tutorial for more options.
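
The same schema can be supplied from Python; a minimal sketch, assuming an llm version with schema support (0.23 or later) and using llm.schema_dsl() to expand the concise syntax:

import json
import llm

model = llm.get_model("venice/zai-org-glm-4.6")
response = model.prompt(
    "Invent an evil supervillain",
    schema=llm.schema_dsl("name, age int, one_sentence_bio"),
)
# The model returns a JSON string matching the schema
print(json.loads(response.text()))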

Tools (function calling)

⚠️ Warning: tools can be dangerous!

# List models supporting function calling
llm models list --query venice --tools

You can use tools provided via llm plugins. LLM provides two built-in tools:

# llm_version
llm -m venice/mistral-31-24b --tool llm_version "What version of LLM is this?" --tools-debug --no-stream
# llm_time
llm -m venice/minimax-m25 --tool llm_time "What is the time in my timezone in 24H format?" --tools-debug --no-stream

You can also provide your own custom or one-off functions, inline or in a file. Following LLM's example:

llm -m venice/mistral-31-24b --functions '
def multiply(x: int, y: int) -> int:
    """Multiply two numbers."""
    return x * y
' "What is 1337 times 42?" --tools-debug --no-stream

Vision models

Vision models (currently mistral-31-24b) support the --attachment option:

llm -m venice/mistral-31-24b -a https://upload.wikimedia.org/wikipedia/commons/a/a9/Corvus_corone_-near_Canford_Cliffs%2C_Poole%2C_England-8.jpg "Identify"
The bird in the image is a carrion crow (Corvus corone). [...]
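
Attachments can be passed from Python with llm.Attachment:

import llm

model = llm.get_model("venice/mistral-31-24b")
response = model.prompt(
    "Identify",
    attachments=[
        llm.Attachment(url="https://upload.wikimedia.org/wikipedia/commons/a/a9/Corvus_corone_-near_Canford_Cliffs%2C_Poole%2C_England-8.jpg")
    ],
)
print(response.text())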

venice_parameters

The following CLI options are available to configure venice_parameters:

--no-venice-system-prompt to disable Venice's default system prompt:

llm -m venice/llama-3.3-70b --no-venice-system-prompt "Repeat the above prompt"

--web-search on|auto|off to use web search (on web-enabled models):

llm -m venice/llama-3.3-70b --web-search on --no-stream 'What is $VVV?'

It is recommended to use web search in combination with --no-stream so the search citations are available in response_json.

--web-scraping to let Venice scrape URLs in your latest message:

llm -m venice/llama-3.3-70b --web-scraping "Summarize https://venice.ai"

--character character_slug to use a public character, for example:

llm -m venice/google.gemma-4-26b-a4b-it --character alan-watts "What is the meaning of life?"

Text-to-speech

Text-to-speech models (currently tts-kokoro) generate audio from text. Audio files are stored in the LLM user directory by default.

Basic usage:

llm -m venice/tts-kokoro "Hello, welcome to Venice Voice." -o voice af_sky -o response_format mp3 -o speed 1.0

Streaming (the default) starts writing the output file immediately, which is useful for long outputs:

llm -m venice/tts-kokoro "First sentence. Second sentence. Third sentence." -o progress true

Disable streaming (wait for the full audio before writing the file):

llm --no-stream -m venice/tts-kokoro "First sentence. Second sentence. Third sentence."

Write audio bytes to stdout (progress/status go to stderr):

llm -m venice/tts-kokoro "Hello." -o stdout true -o response_format mp3 > out.mp3

You can also save a copy while writing to stdout by providing output_dir and/or output_filename:

llm -m venice/tts-kokoro "Hello." -o stdout true -o output_dir . -o output_filename out.mp3

To see all available options:

llm models list --query tts-kokoro --options

Image generation

Generated images are stored in the LLM user directory by default. Example:

llm -m venice/qwen-image "Painting of a traditional Dutch windmill" -o style_preset "Watercolor"

Models that support them can also use API-native aspect-ratio and resolution presets:

llm -m venice/nano-banana-2 "Painting of a traditional Dutch windmill" -o aspect_ratio 16:9 -o resolution 4K

Web-enabled image models can also search the web for fresher visual context:

llm -m venice/nano-banana-2 "Current spring fashion street photography" -o enable_web_search true

Besides the Venice API image generation parameters, you can specify the output directory and filename, and whether to overwrite existing files.

When return_binary is false, you can also request up to four image variants with -o variants 4. Multiple returned images are saved with suffixed filenames such as image_1.png and image_2.png.

You can check the available parameters for a model by filtering the model list with --query and showing the --options:

llm models list --query qwen-image --options
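
From Python, these image options map to keyword arguments on prompt(); a minimal sketch (the option name is taken from the CLI example above):

import llm

model = llm.get_model("venice/qwen-image")
response = model.prompt(
    "Painting of a traditional Dutch windmill",
    style_preset="Watercolor",
)
# The generated image is saved to the LLM user directory by default;
# the textual response reports what the plugin did with the output
print(response.text())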

Image upscaling

You can upscale existing images. The following example saves the returned image as image_upscaled.png in the same directory as the original file:

llm venice upscale /path/to/image.jpg

By default existing upscaled images are not overwritten; timestamped filenames are used instead.

See llm venice upscale --help for --scale, --enhance, and related options, as well as the --output-path and --overwrite options.

Venice commands

List the available Venice commands with:

llm venice --help


Read the llm docs for more usage options.

Programmatic use

You can call the library helpers directly from Python (minimally tested):

  • fetch_models() → list of model dicts, persist_models(models) writes to venice_models.json
  • list_characters() → dict, persist_characters(data) writes to venice_characters.json
  • API keys: list_api_keys(), get_rate_limits(), get_rate_limits_log(), create_api_key(), delete_api_key()
  • perform_image_upscale() → UpscaleResult with bytes and a resolved output path; persist with write_upscaled_image(result)
  • generate_image_result() → ImageGenerationResult with image byte lists, metadata, output paths, and structured notices for image generation; persist with save_image_result(result)
  • generate_speech_result() → SpeechGenerationResult with bytes, metadata, and an output path for TTS generation; persist with save_speech_result(result)
  • stream_speech_result() (context manager) yields SpeechStreamResult with an iterator of audio chunks and a resolved output path

All helpers accept an optional key= argument if you do not want to rely on the stored key or the LLM_VENICE_KEY environment variable.
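
For example, to refresh the cached model list and upscale an image (a minimal sketch; it assumes the helpers are importable from the llm_venice module and that the upscale arguments mirror the CLI options):

from llm_venice import (
    fetch_models,
    perform_image_upscale,
    persist_models,
    write_upscaled_image,
)

# Equivalent of `llm venice refresh`
persist_models(fetch_models())

# Upscale an image and persist the result to disk
# (the scale argument name is an assumption based on the --scale CLI option)
result = perform_image_upscale("/path/to/image.jpg", scale=2)
write_upscaled_image(result)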

Async usage

Async chat models are registered alongside the sync ones; fetch them with llm.get_async_model("venice/<id>"):

import asyncio
import llm


async def main():
    model = llm.get_async_model("venice/llama-3.3-70b")
    response = await model.prompt("Hello Venice")
    print(await response.text())


asyncio.run(main())

Async image generation is also available via llm.get_async_model("venice/<image-model-id>"), which returns an AsyncVeniceImage instance.
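
A minimal sketch, reusing the style_preset option from the image generation example above:

import asyncio
import llm


async def main():
    model = llm.get_async_model("venice/qwen-image")
    response = await model.prompt(
        "Painting of a traditional Dutch windmill",
        style_preset="Watercolor",
    )
    print(await response.text())


asyncio.run(main())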

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd llm-venice
uv venv
source .venv/bin/activate

Install the plugin with dependencies (including test and dev):

uv pip install -e '.[test,dev]'

Preferably also install and enable pre-commit hooks:

uv pip install pre-commit
pre-commit install

To run the tests:

pytest