Session 2
The 8K-context gotcha & when to use Cerebras
Know the free-tier context limit before it bites you
Cerebras free tier caps context at 8,192 tokens — enough for short prompts and short documents, but a hard wall for long papers, large RAG chunks, or llama-4-scout's advertised extended context window.
Log into n8n.32dots.de with the email and password you received when you signed up. The instance will be live on session day.
- 1 Know the numbers: free tier = 8,192-token context (prompt + completion combined). Paid / on-request tiers = up to 262K. Pasting a 20-page methods section can hit this wall silently, so watch for truncated responses.
- 2 Request a context increase if you need more: email Cerebras support from the dashboard (Settings → Contact). Academic accounts are often granted a higher limit quickly, at no cost — worth asking.
- 3 Choose the right provider for long documents: for RAG over full papers or summarising long genomics reports, use OpenRouter (many of its models offer 128K+ context) or another provider without a free-tier context cap. On the free tier, Cerebras is not the right tool for these tasks.
- 4 Use the provider cheatsheet at /inference-providers-cheatsheet.html to compare Cerebras, Groq, and OpenRouter side by side: Cerebras = fastest inference on big models + a generous budget for short calls; Groq = batch jobs + audio speech-to-text (STT); OpenRouter = widest model choice, including proprietary models.
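Because the 8,192-token cap covers prompt and completion together, a quick pre-flight check can flag calls that will not fit before you send them. The sketch below is a hypothetical helper, not part of any Cerebras SDK. It assumes roughly 4 characters per token, a crude heuristic for English text; real counts depend on the model's tokenizer and can differ noticeably.

```python
# Hypothetical pre-flight check for the Cerebras free-tier context window.
# ASSUMPTION: ~4 characters per token is a rough heuristic for English text;
# the model's real tokenizer may count quite differently.

FREE_TIER_CONTEXT = 8_192  # tokens, prompt + completion combined
CHARS_PER_TOKEN = 4        # crude heuristic, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate: number of characters / CHARS_PER_TOKEN, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_free_tier(prompt: str, max_completion_tokens: int,
                   limit: int = FREE_TIER_CONTEXT) -> bool:
    """True if estimated prompt tokens plus the requested completion stay within the limit."""
    return estimate_tokens(prompt) + max_completion_tokens <= limit


if __name__ == "__main__":
    short_abstract = "word " * 300       # 1,500 characters, ~375 tokens by this heuristic
    long_methods = "word " * 10_000      # 50,000 characters, ~12,500 tokens by this heuristic
    print(fits_free_tier(short_abstract, max_completion_tokens=1_024))  # True
    print(fits_free_tier(long_methods, max_completion_tokens=1_024))    # False
```

Remember to count the completion budget too: a 7,500-token prompt with a 1,024-token completion already exceeds the cap.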
You can now identify which of your workflows fit within the 8K context (and can stay on the Cerebras free tier) and which need a different provider or a context-increase request.
Provider routing checklist
Turn the lesson into a reusable decision rule for your lab.
Your task
Write a short comment block or text note that states when to route calls to Cerebras vs Groq vs OpenRouter, based on your actual workload.
- 1 List your top 3 LLM tasks (e.g. "classify abstracts", "summarise full PDFs", "transcribe lab recordings").
- 2 Assign each task a provider using the cheatsheet rules: short calls + big model → Cerebras; audio → Groq; long context or proprietary model needed → OpenRouter.
- 3 Add a note about the 8K cap so the next person on the project does not waste time debugging a silent truncation error on the free tier.
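If your lab keeps its tooling in Python, the routing rule can live as a comment block next to a small function that applies it. This is a minimal sketch for a hypothetical lab: the `Task` fields, the token estimates, and the three example tasks are illustrative assumptions, and the provider names are plain strings rather than calls to any real client library.

```python
# PROVIDER ROUTING RULE (example for a hypothetical lab; adapt to your own tasks)
#
#   Audio transcription (speech-to-text)          -> Groq
#   Long context or a proprietary model needed    -> OpenRouter
#   Short calls with a big model (fits the cap)   -> Cerebras
#
# NOTE: the Cerebras free tier caps context at 8,192 tokens, prompt + completion
# combined. Exceeding it can fail silently (watch for truncated responses), so
# route longer jobs elsewhere or request a context increase from Cerebras.

from dataclasses import dataclass

CEREBRAS_FREE_CONTEXT = 8_192  # tokens, prompt + completion combined


@dataclass
class Task:
    name: str
    est_prompt_tokens: int          # your own estimate of the input size
    max_completion_tokens: int      # completion budget you request
    is_audio: bool = False
    needs_proprietary_model: bool = False


def route(task: Task) -> str:
    """Return a provider name for the task, following the rule in the comment block."""
    if task.is_audio:
        return "Groq"
    if task.needs_proprietary_model:
        return "OpenRouter"
    if task.est_prompt_tokens + task.max_completion_tokens > CEREBRAS_FREE_CONTEXT:
        return "OpenRouter"  # too long for the Cerebras free-tier window
    return "Cerebras"        # short call that fits the 8K cap


if __name__ == "__main__":
    tasks = [
        Task("classify abstracts", est_prompt_tokens=600, max_completion_tokens=50),
        Task("summarise full PDFs", est_prompt_tokens=25_000, max_completion_tokens=1_000),
        Task("transcribe lab recordings", est_prompt_tokens=0,
             max_completion_tokens=0, is_audio=True),
    ]
    for t in tasks:
        print(f"{t.name}: {route(t)}")
```

Running it prints one line per task (Cerebras, OpenRouter, Groq in that order). The checks run audio first and the context cap last, so a short task that still needs a proprietary model goes to OpenRouter rather than Cerebras.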
Deliverable
A short written routing rule (comment block or text note) naming which tasks go to which provider and flagging the Cerebras 8K free-tier cap.