32dots HEIDELBERG AI
Session 1 easy

Speed + the big free budget: fast agent loops & large models

USE 0 - 15 min

What makes Cerebras worth using

Cerebras runs on wafer-scale silicon that delivers ~2,600 tok/s — the fastest public inference for large models anywhere. Combined with 1 M free tokens/day, it is the right choice for workflows that fire many short requests.

Log into n8n.32dots.de with the email and password you received when you signed up. Will be live on session day
  1. 1 Understand the speed edge vs Groq: Cerebras is fastest on large models (70B–405B) and has a far larger free daily budget. Groq wins for audio (Whisper STT) and broad batch jobs; OpenRouter wins on model variety.
  2. 2 Pick a model for your task: llama-3.3-70b (best everyday balance), qwen-3-235b or llama3.1-405b (maximum reasoning at still-fast speed), deepseek-r1-distill (chain-of-thought), llama-4-scout (extended-context model — but read the 8K cap warning in the next lesson before using it on the free tier).
  3. 3 Run a tight agent loop — fire 10–20 classification or extraction calls in sequence (e.g. classify each abstract in a literature batch). At 2,600 tok/s the round-trip is dominated by network latency, not model time.
  4. 4 Monitor your quota in the Cerebras dashboard. 1 M tokens resets daily; a typical research loop (10 calls × 500 tokens) consumes ~5,000 tokens — well within budget.

You have run a multi-call loop (5+ requests) and confirmed usage in the Cerebras dashboard is well under the 1 M daily limit.

BUILD 15 - 20 min

Batch-abstract classifier

Build a short script that classifies a list of paper abstracts using llama-3.3-70b and prints throughput.

Given a list of 10–20 short paper abstracts, classify each as 'methods paper', 'review', or 'results paper' in a loop and print total elapsed time.

  1. 1 Create a list of 10 real or dummy abstracts (one sentence each is fine for testing).
  2. 2 Write a loop that sends each abstract to Cerebras (base_url="https://api.cerebras.ai/v1", model="llama-3.3-70b") with a one-line system prompt: "Classify this abstract as: methods paper, review, or results paper. Reply with one label only."
  3. 3 Time the full run with Python's time module and print tokens-per-second. Compare to what you would expect from a slower provider.
Deliverable

A script that prints one classification per abstract plus total elapsed time and estimated throughput.