Install Ollama & run your first model

USE 0 - 20 min

Get a local model answering questions in under 20 minutes

Ollama is a free, open-source tool that turns running open AI models into a single command. It handles the download, serves a quantized (compressed) build of the model, and uses your GPU when one is available. You just name a model and it runs, and your prompts and replies never leave your machine. It works on macOS 14+, Windows, and Linux. The first command you run downloads a model and drops you into a chat; a good first pick is llama3, an 8-billion-parameter model that fits in roughly 5–6 GB of RAM and runs on most modern laptops. Larger models (13B, 30B, 70B) generally give better answers but need more RAM and usually a GPU.
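
The 5–6 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is not part of Ollama: `estimate_ram_gb` is a hypothetical helper, and both the 4.5 bits per weight (typical of a 4-bit quantized build) and the 1 GB of runtime overhead are assumptions chosen for illustration. Real usage varies with the quantization, the context length, and the runtime.

```python
def estimate_ram_gb(params_billion: float,
                    bits_per_weight: float = 4.5,  # assumption: roughly 4-bit quantized build
                    overhead_gb: float = 1.0) -> float:  # assumption: context cache + runtime
    """Rough memory footprint in GB: quantized weights plus a fixed overhead."""
    # billions of parameters * bytes per parameter = gigabytes of weights
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


for size in (8, 13, 30, 70):
    print(f"{size}B parameters: about {estimate_ram_gb(size):.1f} GB")
```

Under these assumptions an 8B model comes out at 5.5 GB, inside the range quoted above, while a 70B model needs several times more memory than a typical laptop has.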

1 Install Ollama. On macOS or Windows, download the installer from ollama.com/download (macOS needs version 14 Sonoma or later) and run it. On Linux, run `curl -fsSL https://ollama.com/install.sh | sh`. After installation, Ollama runs as a background service automatically.
2 Run your first model. Open a terminal and run `ollama run llama3`. The first time, it downloads the model (a few GB) and then drops you into an interactive chat prompt.
3 Ask a science question. At the `>>>` prompt, type: Explain RNA-seq in two sentences for a biologist who has never heard of it.
4 Read the reply. If it arrives, even slowly, your setup is working. Your prompt and the reply never left your machine; only the model download in step 2 used the internet.
5 Exit the chat by typing `/bye` (or pressing Ctrl+D). The model stays downloaded for next time.
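
The same question can also be sent from a script, because the background service listens on a local HTTP port. This is a minimal sketch, assuming the service is at its default address `http://localhost:11434` and that `llama3` is already downloaded. `build_generate_request` and `ask_local_model` are hypothetical helper names, and only the Python standard library is used.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address


def build_generate_request(model: str, prompt: str,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request for the local /api/generate endpoint."""
    # stream=False asks for one complete JSON reply instead of a token stream
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def ask_local_model(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to the local service and return the reply text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)["response"]
```

With the service running, `print(ask_local_model("llama3", "Explain RNA-seq in two sentences for a biologist who has never heard of it."))` should print a reply comparable to the one in the chat. Because the URL points at localhost, the request stays on your machine.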

✓

You received a coherent reply from a local model at the `ollama run` prompt. While the model is answering, your operating system's network monitor shows no outbound traffic to any AI service.

BUILD 20 - 30 min

Find the smallest model that answers well enough for your work

Bigger is not always better when RAM is the constraint. The goal is the smallest model that gives you answers you can trust for your actual tasks.
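
That decision rule can be written down explicitly. The sketch below is only an illustration: `Candidate` and `pick_daily_driver` are hypothetical names, the `good_enough` flag is your own judgment after reading each model's answers, and the example values are made up rather than measured.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    ram_gb: float       # memory the model needs (your estimate or measurement)
    good_enough: bool   # your own verdict on its answers


def pick_daily_driver(candidates: list, ram_budget_gb: float) -> Optional[Candidate]:
    """Return the smallest model that fits in RAM and answers well enough, or None."""
    usable = [c for c in candidates if c.good_enough and c.ram_gb <= ram_budget_gb]
    return min(usable, key=lambda c: c.ram_gb, default=None)


# Made-up example values, not benchmark results
trial = [Candidate("small-model", 3.0, False),
         Candidate("medium-model", 5.5, True),
         Candidate("large-model", 40.0, True)]
choice = pick_daily_driver(trial, ram_budget_gb=16)
print(choice.name if choice else "no model fits")
```

Here the small model is rejected on quality and the large one on memory, so the medium model wins even though it is not the biggest.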

Your task

Run a second model of a different size or family, ask both the same science question, and decide which is your daily driver.

1 Pick a second model from the model library on ollama.com. A different family works best for comparison (e.g. qwen2.5, gemma, or mistral).
2 Run it with `ollama run` followed by the model's name, and ask the same prompt you used in the USE phase.
3 Compare answer quality and response speed between the two models.
4 Run `ollama list` to see both downloaded models and their sizes on disk, then pick a winner and note why: quality, speed, or memory fit.
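
Speed and size can be compared with numbers rather than by feel. The sketch below assumes you already have the JSON replies from the local service's `/api/generate` endpoint (which reports `eval_count` and `eval_duration`, the latter in nanoseconds) and from `/api/tags` (which lists downloaded models with `size` in bytes). The helper names and the sample values are made up for illustration; they are not measurements.

```python
def tokens_per_second(generate_reply: dict) -> float:
    """Generation speed from an /api/generate reply (eval_duration is in nanoseconds)."""
    return generate_reply["eval_count"] / generate_reply["eval_duration"] * 1e9


def sizes_in_gb(tags_reply: dict) -> dict:
    """Map each downloaded model name to its size on disk in GB."""
    return {m["name"]: m["size"] / 1e9 for m in tags_reply["models"]}


# Illustrative values only (not real measurements)
sample_reply = {"eval_count": 120, "eval_duration": 6_000_000_000}
sample_tags = {"models": [{"name": "model-a:latest", "size": 4_700_000_000},
                          {"name": "model-b:latest", "size": 2_000_000_000}]}

print(f"{tokens_per_second(sample_reply):.1f} tokens/s")
for name, gb in sizes_in_gb(sample_tags).items():
    print(f"{name}: {gb:.1f} GB")
```

If you prefer to stay in the chat, running the model with `ollama run` and the `--verbose` flag should print similar timing statistics after each reply.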

Deliverable

A one-sentence verdict: which model you chose and the reason (quality / speed / RAM).