Inference Providers Cheat Sheet · Groq · Cerebras

Inference Providers — Cheat Sheet

Groq · Cerebras · OpenRouter · all three use the OpenAI-compatible API — change 3 lines to switch · COS · 32dots.de

Provider quick-reference

Provider	Free tier	Speed	`base_url`	Get key at	Models	Best for
Groq	~14,400 req/day · ~30 req/min · per-model daily caps	~500–800 tok/s	`https://api.groq.com/openai/v1`	console.groq.com	Llama 3.x, DeepSeek-R1, gpt-oss, Whisper STT	Fast iteration, n8n pipelines, voice-to-text
Cerebras	1,000,000 tokens/day · ~8K context cap · ~30 req/min	~1,800–3,000 tok/s	`https://api.cerebras.ai/v1`	cloud.cerebras.ai	Llama 3.3 70B, Llama 4 Scout, Qwen3-235B	High-volume free usage, fastest big models
OpenRouter	BYO key — pay per model; some models have free tiers	Varies by model	`https://openrouter.ai/api/v1`	openrouter.ai	GPT-4o, Claude, Gemma, Llama, hundreds more	Widest model choice, one key for everything
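The per-minute limits in the table can be respected on the client side with a small throttle. The sketch below is one possible approach, not something any of the providers ship: the `Throttle` class and its parameters are hypothetical names, and the 30 req/min figure is just the approximate value from the table.

```python
import time


class Throttle:
    """Space calls at least 60 / per_minute seconds apart (hypothetical helper)."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock      # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


# Usage: call throttle.wait() right before each API request.
throttle = Throttle(per_minute=30)  # ~30 req/min, taken from the table above
```

Evenly spacing requests is simpler than tracking a sliding window and is usually enough for a batch loop; it does not account for the daily caps.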

⇄ The OpenAI-compatible swap: 3 lines to switch provider. Change base_url · api_key · model

curl

curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama-3.1-8b-instant",
    "messages": [{"role":"user",
      "content":"Hello"}]
  }'

# Swap to Cerebras: change the URL, the key, and the model:
# https://api.cerebras.ai/v1/chat/completions
# key:   your Cerebras key instead of $GROQ_KEY
# model: llama-3.3-70b

# Swap to OpenRouter: same three changes:
# https://openrouter.ai/api/v1/chat/completions
# key:   your OpenRouter key
# model: openai/gpt-4o  (or any slug)

Python — openai SDK

from openai import OpenAI

# --- Groq ---
client = OpenAI(
  base_url="https://api.groq.com/openai/v1",
  api_key="YOUR_GROQ_KEY"
)

# --- Cerebras (swap these 2 lines, plus model below) ---
# base_url="https://api.cerebras.ai/v1",
# api_key="YOUR_CEREBRAS_KEY"

# --- OpenRouter (swap these 2 lines, plus model below) ---
# base_url="https://openrouter.ai/api/v1",
# api_key="YOUR_OR_KEY"

r = client.chat.completions.create(
  # Groq model; use "llama-3.3-70b" on Cerebras or "openai/gpt-4o" on OpenRouter
  model="llama-3.1-8b-instant",
  messages=[{"role":"user",
             "content":"Hello"}]
)
print(r.choices[0].message.content)
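To avoid commenting lines in and out, the three values that change can live in one lookup table. This is a minimal sketch, not part of any SDK: `PROVIDERS` and `provider_settings` are hypothetical names, the base URLs are the ones from the table, and the model names are simply the examples used in this sheet.

```python
# The three things that change per provider: base_url, api_key, model.
PROVIDERS = {
    "groq": {
        "base_url": "https://api.groq.com/openai/v1",
        "model": "llama-3.1-8b-instant",
    },
    "cerebras": {
        "base_url": "https://api.cerebras.ai/v1",
        "model": "llama-3.3-70b",
    },
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "model": "openai/gpt-4o",
    },
}


def provider_settings(name, api_key):
    """Return (client keyword arguments, model name) for one provider."""
    cfg = PROVIDERS[name]  # raises KeyError for an unknown provider name
    client_kwargs = {"base_url": cfg["base_url"], "api_key": api_key}
    return client_kwargs, cfg["model"]


kwargs, model = provider_settings("cerebras", "YOUR_CEREBRAS_KEY")
print(kwargs["base_url"], model)
```

With the openai SDK installed, the result plugs into the snippet above as `client = OpenAI(**kwargs)` and `model=model` in the `create` call.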

n8n — HTTP Request node

Method: POST

# Groq
URL: https://api.groq.com/openai/v1/chat/completions

# Cerebras (change URL, key, and model)
URL: https://api.cerebras.ai/v1/chat/completions

# OpenRouter (change URL, key, and model)
URL: https://openrouter.ai/api/v1/chat/completions

# Use the key that matches the URL above
Headers:
  Authorization: Bearer {{ $env.GROQ_KEY }}
  Content-Type: application/json

Body (JSON):
{
  "model": "llama-3.1-8b-instant",
  "messages": [
    {"role":"user","content":"{{ $json.prompt }}"}
  ]
}

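The rule: all three providers accept the OpenAI chat completions request format shown above. Your SDK code, error handling, and response parsing stay the same; only the URL, key, and model name change. Less common parameters and provider-specific extras may differ, so check the provider docs before relying on them. Whisper STT is Groq-only among these three: POST /audio/transcriptions (under the Groq base_url), model whisper-large-v3.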
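Because the response shape is shared, one parsing function can serve all three providers. The sketch below works on the raw JSON body (for example from the n8n node or curl) and assumes the OpenAI-style layout, including the `{"error": {"message": ...}}` error shape; `extract_reply` is a hypothetical helper and the sample payload is hand-written, not real provider output.

```python
def extract_reply(payload):
    """Return the assistant text from a chat completions response dict."""
    if "error" in payload:
        # Assumption: error bodies follow the OpenAI shape {"error": {"message": ...}}.
        err = payload["error"]
        message = err.get("message", str(err)) if isinstance(err, dict) else str(err)
        raise RuntimeError(f"provider returned an error: {message}")
    choices = payload.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample in the chat completions response shape.
sample = {
    "model": "llama-3.1-8b-instant",
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hi there"}}
    ],
}
print(extract_reply(sample))
```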

Own-key warning: always use your own free key — never a shared course key. The 32dots project Groq key was suspended after a silent batch job spent €19.71 overnight. Personal free keys are instant to create and isolated to you.
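One way to act on this warning is to read your personal key from the environment and put a hard cap on how many requests a batch job may send. This is a sketch under assumptions: `load_own_key` and `RequestBudget` are hypothetical helpers, `GROQ_KEY` is the variable name used in the curl example, and the cap counts requests, not money.

```python
import os


def load_own_key(env_var, environ=None):
    """Read a personal API key from the environment; fail loudly if it is missing."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; create a personal key and export it first")
    return key


class RequestBudget:
    """Stop a batch job after a fixed number of requests."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.used = 0

    def spend(self):
        """Count one request; raise once the cap has been reached."""
        if self.used >= self.max_requests:
            raise RuntimeError(f"request budget of {self.max_requests} exhausted")
        self.used += 1


# Usage: call budget.spend() before every API request in the batch loop.
budget = RequestBudget(max_requests=100)
```

A job that raises after the cap fails visibly instead of running unattended overnight.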