Inference providers
Fast, cheap model access in the cloud — the hosted inference APIs
Where to get model API access when running locally is not enough: hosted inference providers that serve open models over an OpenAI-compatible API. Groq for raw speed, OpenRouter for the widest model choice, and Cerebras for the fastest single-model inference with a 1M-tokens/day free tier.
After this chapter you can
0/3 courses done — a course counts as done once you've finished all its lessons
| Feature | |||
|---|---|---|---|
| Videos tutorials on YouTube | ▶ 2 | ▶ 3 | ▶ 3 |
| Detailed course hands-on lessons & templates | Open → | Open → | Open → |
| Animated walkthrough watch each lesson play out | ✗ | ✗ | ✗ |
| Free tier experiment at no cost | ✓ | ✓ | ✓ |
| Speed (tokens/sec) how fast a long answer appears | 500–800 | provider rate | 1,800–3,000 |
| Model choice how many models to pick from | ◐ | ✓ | ◐ |
| OpenAI-API compatible swap one base_url, keep your code | ✓ | ✓ | ✓ |
| Closed models (GPT/Claude) access frontier proprietary models | ✗ | ✓ | ✗ |
| Real-time / low latency voice, live tools, fast iteration | ✓ | ◐ | ✓ |
| Best for the one-line verdict | fast batch | most models | fastest single-model |
✓ strong · ◐ partial · ✗ no · scores are qualitative