Skip to content

mnfst/awesome-free-llm-apis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

17 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation


Awesome Free LLM APIs

Awesome

LLM APIs with permanent free tiers for text inference.



Contents

Provider APIs

APIs run by the companies that train or fine-tune the models themselves.

  • Cohere ๐Ÿ‡บ๐Ÿ‡ธ - Command A, Command R+, Aya Expanse 32B +9 more. 20 RPM, 1K/mo.
  • Google Gemini ๐Ÿ‡บ๐Ÿ‡ธ - Gemini 2.5 Pro, Flash, Flash-Lite +4 more. 5-15 RPM, 100-1K RPD. 1
  • Mistral AI ๐Ÿ‡ช๐Ÿ‡บ - Mistral Large 3, Small 3.1, Ministral 8B +3 more. 1 req/s, 1B tok/mo.
  • Zhipu AI ๐Ÿ‡จ๐Ÿ‡ณ - GLM-4.7-Flash, GLM-4.5-Flash, GLM-4.6V-Flash. Limits undocumented.

Inference providers

Third-party platforms that host open-weight models from various sources.

  • Cerebras ๐Ÿ‡บ๐Ÿ‡ธ - Llama 3.3 70B, Qwen3 235B, GPT-OSS-120B +3 more. 30 RPM, 14,400 RPD.
  • Cloudflare Workers AI ๐Ÿ‡บ๐Ÿ‡ธ - Llama 3.3 70B, Qwen QwQ 32B +47 more. 10K neurons/day.
  • GitHub Models ๐Ÿ‡บ๐Ÿ‡ธ - GPT-4o, Llama 3.3 70B, DeepSeek-R1 +more. 10-15 RPM, 50-150 RPD.
  • Groq ๐Ÿ‡บ๐Ÿ‡ธ - Llama 3.3 70B, Llama 4 Scout, Kimi K2 +17 more. 30 RPM, 1K RPD (14,400 for Llama 3.1 8B). 2
  • Hugging Face ๐Ÿ‡บ๐Ÿ‡ธ - Llama 3.3 70B, Qwen2.5 72B, Mistral 7B +many more. $0.10/mo in free credits.
  • Kluster AI ๐Ÿ‡บ๐Ÿ‡ธ - DeepSeek-R1, Llama 4 Maverick, Qwen3-235B +2 more. Limits undocumented.
  • LLM7.io ๐Ÿ‡ฌ๐Ÿ‡ง - DeepSeek R1, Flash-Lite, Qwen2.5 Coder +27 more. 30 RPM (120 with token).
  • NVIDIA NIM ๐Ÿ‡บ๐Ÿ‡ธ - Llama 3.3 70B, Mistral Large, Qwen3 235B +more. 40 RPM.
  • Ollama Cloud ๐Ÿ‡บ๐Ÿ‡ธ - DeepSeek-V3.2, Qwen3.5, Kimi-K2.5 +17 more. 1 concurrent model, light usage. 3
  • OpenRouter ๐Ÿ‡บ๐Ÿ‡ธ - DeepSeek R1, Llama 3.3 70B, GPT-OSS-120B +29 more. 20 RPM, 50 RPD (1K with $10+ in purchased credits). 4
  • SiliconFlow ๐Ÿ‡จ๐Ÿ‡ณ - Qwen3-8B, DeepSeek-R1-Distill-Qwen-7B, GLM-4.1V-9B-Thinking +10 more. 1K RPM, 50K TPM.

Contributing

Know a free tier that's missing? Open a PR. Include the provider, endpoint, rate limits (link to their docs), and a few notable models. Trial credits and time-limited promos don't count.

Footnotes

  • RPM -- requests per minute. RPD -- requests per day.
  • "Limits undocumented" means the provider doesn't publish their rate limits.
  • All endpoints are OpenAI SDK-compatible unless noted.
  • Each link points to the provider's API key page.

Footnotes

  1. Free tier not available in the EU, UK, or Switzerland (available regions). โ†ฉ

  2. 14,400 RPD only applies to Llama 3.1 8B Instant. Most other models (Llama 3.3 70B, Llama 4 Scout, Kimi K2, etc.) are limited to 1,000 RPD (rate limits). โ†ฉ

  3. Ollama Cloud measures usage by GPU time, not tokens or requests. Free tier described as "light usage" with session limits resetting every 5 hours and weekly limits every 7 days. Pro (50x more) and Max (250x more) plans available. Not OpenAI SDK-compatible; uses Ollama API. โ†ฉ

  4. Free models default to 50 RPD. A one-time purchase of $10+ in credits unlocks 1,000 RPD for free models. OpenRouter also offers a Free Models Router (openrouter/free) and model fallbacks for chaining models in priority order. โ†ฉ

Releases

No releases published

Packages

 
 
 

Contributors

โšก