LLM plugin to access models available via the Venice AI API.
Install llm-venice with its dependency llm using your package manager of choice, for example:
pip install llm-venice
Or install it alongside an existing LLM install:
llm install llm-venice
Set an environment variable LLM_VENICE_KEY, or save a Venice API key to the key store managed by llm:
llm keys set venice
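To use the environment variable instead, for example in your shell (the value shown is a placeholder):
export LLM_VENICE_KEY="<your-venice-api-key>"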
To fetch a list of the models available over the Venice API:
llm venice refresh
Re-run the refresh command whenever the Venice API changes, for example when:
- New models have been made available
- Deprecated models have been removed
- New capabilities have been added
The models are stored in venice_models.json in the llm user directory.
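To locate that file programmatically, a quick sketch assuming llm's user_dir() helper:
import llm

# Path to the cached model list inside the llm user directory
print(llm.user_dir() / "venice_models.json")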
List available Venice models:
llm models --query venice
Run a prompt:
llm --model venice/llama-3.3-70b "Why is the earth round?"
Start an interactive chat session:
llm chat --model venice/mistral-small-3-2-24b-instruct
Some models support structuring their output according to a JSON schema (supplied via the OpenAI-compatible response_format parameter).
This works via llm's --schema option, for example:
llm -m venice/zai-org-glm-4.6 --schema "name, age int, one_sentence_bio" "Invent an evil supervillain"
Consult llm's schemas tutorial for more options.
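The same schema support is available from llm's Python API; a minimal sketch (the model id and schema shape are illustrative):
import llm

model = llm.get_model("venice/zai-org-glm-4.6")
# Pass a JSON schema dict; llm forwards it via response_format
response = model.prompt(
    "Invent an evil supervillain",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "one_sentence_bio": {"type": "string"},
        },
        "required": ["name", "age", "one_sentence_bio"],
    },
)
print(response.text())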
# List models supporting function calling
llm models list --query venice --tools
You can use tools provided via llm plugins. LLM provides two built-in tools:
# llm_version
llm -m venice/mistral-31-24b --tool llm_version "What version of LLM is this?" --tools-debug --no-stream
# llm_time
llm -m venice/minimax-m25 --tool llm_time "What is the time in my timezone in 24H format?" --tools-debug --no-stream
You can also provide your own custom or one-off functions, either inline or in a file. Following LLM's example:
llm -m venice/mistral-31-24b --functions '
def multiply(x: int, y: int) -> int:
"""Multiply two numbers."""
return x * y
' "What is 1337 times 42?" --tools-debug --no-streamVision models (currently mistral-31-24b) support the --attachment option:
llm -m venice/mistral-31-24b -a https://upload.wikimedia.org/wikipedia/commons/a/a9/Corvus_corone_-near_Canford_Cliffs%2C_Poole%2C_England-8.jpg "Identify"
The bird in the image is a carrion crow (Corvus corone). [...]
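Attachments can also be passed from llm's Python API; a sketch using the same image URL:
import llm

model = llm.get_model("venice/mistral-31-24b")
response = model.prompt(
    "Identify",
    attachments=[
        llm.Attachment(
            url="https://upload.wikimedia.org/wikipedia/commons/a/a9/Corvus_corone_-near_Canford_Cliffs%2C_Poole%2C_England-8.jpg"
        )
    ],
)
print(response.text())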
The following CLI options are available to configure venice_parameters:
--no-venice-system-prompt to disable Venice's default system prompt:
llm -m venice/llama-3.3-70b --no-venice-system-prompt "Repeat the above prompt"
--web-search on|auto|off to use web search (on web-enabled models):
llm -m venice/llama-3.3-70b --web-search on --no-stream 'What is $VVV?'
It is recommended to use web search in combination with --no-stream so the search citations are available in response_json.
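To inspect those citations afterwards (assuming prompt logging is enabled, the llm default), dump the most recent log entry as JSON:
llm logs -n 1 --json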
--web-scraping to let Venice scrape URLs in your latest message:
llm -m venice/llama-3.3-70b --web-scraping "Summarize https://venice.ai"
--character character_slug to use a public character, for example:
llm -m venice/google.gemma-4-26b-a4b-it --character alan-watts "What is the meaning of life?"
Text-to-speech models (currently tts-kokoro) generate audio from text. Audio files are stored in the LLM user directory by default.
Basic usage:
llm -m venice/tts-kokoro "Hello, welcome to Venice Voice." -o voice af_sky -o response_format mp3 -o speed 1.0
Streaming (the default; audio is written to the output file as it arrives, useful for long outputs):
llm -m venice/tts-kokoro "First sentence. Second sentence. Third sentence." -o progress true
Disable streaming (wait for the full audio before writing the file):
llm --no-stream -m venice/tts-kokoro "First sentence. Second sentence. Third sentence."
Write audio bytes to stdout (progress/status go to stderr):
llm -m venice/tts-kokoro "Hello." -o stdout true -o response_format mp3 > out.mp3
You can also save a copy while writing to stdout by providing output_dir and/or output_filename:
llm -m venice/tts-kokoro "Hello." -o stdout true -o output_dir . -o output_filename out.mp3
To see all available options:
llm models list --query tts-kokoro --options
Generated images are stored in the LLM user directory by default. Example:
llm -m venice/qwen-image "Painting of a traditional Dutch windmill" -o style_preset "Watercolor"
Models that support API-native aspect-ratio and resolution presets can also use them directly:
llm -m venice/nano-banana-2 "Painting of a traditional Dutch windmill" -o aspect_ratio 16:9 -o resolution 4K
Web-enabled image models can also search the web for fresher visual context:
llm -m venice/nano-banana-2 "Current spring fashion street photography" -o enable_web_search true
Besides the Venice API image generation parameters, you can specify the output directory and filename, and whether or not to overwrite existing files.
When return_binary is false, you can also request up to four image variants with -o variants 4. Multiple returned images are saved as suffixed filenames such as image_1.png, image_2.png.
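For example, a sketch combining variants with an output location (the output_dir option name is assumed to mirror the TTS options above; confirm with --options):
llm -m venice/qwen-image "Painting of a traditional Dutch windmill" -o variants 2 -o output_dir ./images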
You can check the available parameters for a model by filtering the model list with --query, and show the --options:
llm models list --query qwen-image --options
You can upscale existing images.
The following example saves the returned image as image_upscaled.png in the same directory as the original file:
llm venice upscale /path/to/image.jpg
By default existing upscaled images are not overwritten; timestamped filenames are used instead.
See llm venice upscale --help for the --scale, --enhance, and related options, as well as the --output-path and --overwrite options.
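A sketch of a combined invocation (exact flag shapes assumed; confirm with --help):
llm venice upscale /path/to/image.jpg --scale 2 --overwrite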
List the available Venice commands with:
llm venice --help
Read the llm docs for more usage options.
You can call the library helpers directly from Python (minimally tested):
- `fetch_models()` → list of model dicts; `persist_models(models)` writes to `venice_models.json`
- `list_characters()` → dict; `persist_characters(data)` writes to `venice_characters.json`
- API keys: `list_api_keys()`, `get_rate_limits()`, `get_rate_limits_log()`, `create_api_key()`, `delete_api_key()`
- `perform_image_upscale()` → `UpscaleResult` with bytes and a resolved output path; persist with `write_upscaled_image(result)`
- `generate_image_result()` → `ImageGenerationResult` with image byte lists, metadata, output paths, and structured `notices` for image generation; persist with `save_image_result(result)`
- `generate_speech_result()` → `SpeechGenerationResult` with bytes/metadata/output path for TTS generation; persist with `save_speech_result(result)`
- `stream_speech_result()` (context manager) yields `SpeechStreamResult` with an iterator of audio chunks and a resolved output path
All helpers accept an optional key= argument if you do not want to rely on the stored key or the LLM_VENICE_KEY environment variable.
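For example, a minimal sketch refreshing the cached model list (the llm_venice import path is assumed):
from llm_venice import fetch_models, persist_models

# Fetch the current model list from the Venice API and cache it locally
models = fetch_models()
persist_models(models)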
Async chat models are registered alongside the sync ones; fetch them with llm.get_async_model("venice/<id>"):
import asyncio
import llm

async def main():
    model = llm.get_async_model("venice/llama-3.3-70b")
    response = await model.prompt("Hello Venice")
    print(await response.text())

asyncio.run(main())

Async image generation is also available via llm.get_async_model("venice/<image-model-id>"), which returns an AsyncVeniceImage instance.
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd llm-venice
uv venv
source .venv/bin/activate
Install the plugin with dependencies (including test and dev):
uv pip install -e '.[test,dev]'
Preferably also install and enable pre-commit hooks:
uv pip install pre-commit
pre-commit install
To run the tests:
pytest