Speed + the big free budget: fast agent loops & large models

USE 0 - 15 min

What makes Cerebras worth using

Cerebras runs on wafer-scale silicon that delivers ~2,600 tok/s — the fastest public inference for large models anywhere. Combined with 1 M free tokens/day, it is the right choice for workflows that fire many short requests.

Log into n8n.32dots.de with the email and password you received when you signed up. Will be live on session day

1 Understand the speed edge vs Groq: Cerebras is fastest on large models (70B–405B) and has a far larger free daily budget. Groq wins for audio (Whisper STT) and broad batch jobs; OpenRouter wins on model variety.
2 Pick a model for your task: llama-3.3-70b (best everyday balance), qwen-3-235b or llama3.1-405b (maximum reasoning at still-fast speed), deepseek-r1-distill (chain-of-thought), llama-4-scout (extended-context model — but read the 8K cap warning in the next lesson before using it on the free tier).
3 Run a tight agent loop — fire 10–20 classification or extraction calls in sequence (e.g. classify each abstract in a literature batch). At 2,600 tok/s the round-trip is dominated by network latency, not model time.
4 Monitor your quota in the Cerebras dashboard. 1 M tokens resets daily; a typical research loop (10 calls × 500 tokens) consumes ~5,000 tokens — well within budget.

✓

You have run a multi-call loop (5+ requests) and confirmed usage in the Cerebras dashboard is well under the 1 M daily limit.

BUILD 15 - 20 min

Batch-abstract classifier

Build a short script that classifies a list of paper abstracts using llama-3.3-70b and prints throughput.

Your task

Given a list of 10–20 short paper abstracts, classify each as 'methods paper', 'review', or 'results paper' in a loop and print total elapsed time.

1 Create a list of 10 real or dummy abstracts (one sentence each is fine for testing).
2 Write a loop that sends each abstract to Cerebras (base_url="https://api.cerebras.ai/v1", model="llama-3.3-70b") with a one-line system prompt: "Classify this abstract as: methods paper, review, or results paper. Reply with one label only."
3 Time the full run with Python's time module and print tokens-per-second. Compare to what you would expect from a slower provider.

Deliverable

A script that prints one classification per abstract plus total elapsed time and estimated throughput.