32dots HEIDELBERG AI

Inference providers

Fast, cheap model access in the cloud — the hosted inference APIs

Where to get model API access when running locally is not enough: hosted inference providers that serve open models over an OpenAI-compatible API. Groq for raw speed, OpenRouter for the widest model choice, and Cerebras for the fastest single-model inference with a 1M-tokens/day free tier.

After this chapter you can
0/3 courses done — a course counts as done once you've finished all its lessons
Feature
Videos tutorials on YouTube▶ 2▶ 3▶ 3
Detailed course hands-on lessons & templatesOpen →Open →Open →
Animated walkthrough watch each lesson play out
Free tier experiment at no cost
Speed (tokens/sec) how fast a long answer appears500–800provider rate1,800–3,000
Model choice how many models to pick from
OpenAI-API compatible swap one base_url, keep your code
Closed models (GPT/Claude) access frontier proprietary models
Real-time / low latency voice, live tools, fast iteration
Best for the one-line verdictfast batchmost modelsfastest single-model

strong · partial · no · scores are qualitative