Inference providers

Fast, cheap model access in the cloud — the hosted inference APIs

Where to get model API access when running locally is not enough: hosted inference providers that serve open models over an OpenAI-compatible API. Groq for raw speed, OpenRouter for the widest model choice, and Cerebras for the fastest single-model inference with a 1M-tokens/day free tier.

After this chapter you can

0/3 courses done — a course counts as done once you've finished all its lessons

Feature
Videos tutorials on YouTube	▶ 2	▶ 3	▶ 3
Detailed course hands-on lessons & templates	Open →	Open →	Open →
Animated walkthrough watch each lesson play out	✗	✗	✗
Free tier experiment at no cost	✓	✓	✓
Speed (tokens/sec) how fast a long answer appears	500–800	provider rate	1,800–3,000
Model choice how many models to pick from	◐	✓	◐
OpenAI-API compatible swap one base_url, keep your code	✓	✓	✓
Closed models (GPT/Claude) access frontier proprietary models	✗	✓	✗
Real-time / low latency voice, live tools, fast iteration	✓	◐	✓
Best for the one-line verdict	fast batch	most models	fastest single-model

✓ strong · ◐ partial · ✗ no · scores are qualitative