32dots HEIDELBERG AI
Session 4 intermediate

Serve your whole lab from one headless machine

USE 0 - 20 min

Expose Ollama on your network so the whole lab can share it

By default Ollama only listens on localhost, so only the machine it runs on can reach it. With one environment variable — OLLAMA_HOST — you can bind it to all network interfaces and turn a spare laptop or a headless server into a shared private AI endpoint that every machine in your lab can point its scripts at. There is no GUI required: Ollama is CLI- and service-first by design, which is exactly what you want on a server.

  1. Pick the machine that will host the model, ideally the one with the most RAM or a GPU. Install Ollama on it and pull a model with ollama pull llama3.
  2. Set OLLAMA_HOST to bind all interfaces. On macOS: launchctl setenv OLLAMA_HOST "0.0.0.0:11434", then restart the Ollama app. On Linux (systemd): run sudo systemctl edit ollama.service, add the line Environment="OLLAMA_HOST=0.0.0.0:11434" under [Service], then run sudo systemctl daemon-reload && sudo systemctl restart ollama. On Windows: add OLLAMA_HOST=0.0.0.0:11434 to your account's environment variables and restart Ollama.
  3. Find the host machine's LAN IP (e.g. ifconfig on macOS, ip addr on Linux, ipconfig on Windows). It will look something like 192.168.1.42.
  4. From another machine on the same network, test it: `curl http://192.168.1.42:11434/api/chat -d '{"model": "llama3", "messages": [{"role": "user", "content": "Say hello"}], "stream": false}'`. A JSON reply means the shared endpoint is live.
  5. Optionally keep the model warm. Set OLLAMA_KEEP_ALIVE (e.g. 24h, or a negative number such as -1 to keep it loaded indefinitely) so the model does not unload between requests. This is useful for a shared endpoint that people hit irregularly.
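
For the Linux (systemd) path in step 2, the same setting can be written without an interactive editor. The sketch below assumes the drop-in location that systemctl edit normally uses (/etc/systemd/system/ollama.service.d/override.conf). The function name render_override is hypothetical and only prints the file content, so you can inspect it before installing it.

```bash
#!/usr/bin/env bash
# render_override: hypothetical helper that prints a systemd drop-in for Ollama.
# Arguments (all optional): bind host, port, keep-alive duration.
render_override() {
  local host=${1:-0.0.0.0} port=${2:-11434} keep_alive=${3:-}
  printf '[Service]\n'
  printf 'Environment="OLLAMA_HOST=%s:%s"\n' "$host" "$port"
  # Only emit OLLAMA_KEEP_ALIVE when a value was passed (e.g. 24h or -1).
  if [ -n "$keep_alive" ]; then
    printf 'Environment="OLLAMA_KEEP_ALIVE=%s"\n' "$keep_alive"
  fi
}

# Example installation on the host (assumed path, run manually):
#   sudo mkdir -p /etc/systemd/system/ollama.service.d
#   render_override 0.0.0.0 11434 24h | sudo tee /etc/systemd/system/ollama.service.d/override.conf
#   sudo systemctl daemon-reload && sudo systemctl restart ollama
```

Printing the file instead of writing it directly keeps the helper safe to run on any machine; the sudo steps in the comments are the only part that changes the host.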

You called the Ollama server from a different machine on your LAN using its IP and got a model response. The lab now has one shared private endpoint.
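
Once the endpoint answers, lab scripts on other machines can wrap the same /api/chat call in a small helper. This is a minimal sketch, not part of Ollama: the variable LAB_LLM_URL and the functions json_escape, chat_payload, and ask_lab_llm are hypothetical names, and the escaping only handles backslashes and double quotes (prompts containing newlines or control characters would need a real JSON tool).

```bash
#!/usr/bin/env bash
# Assumed address of the shared host; replace with your host's LAN IP.
LAB_LLM_URL=${LAB_LLM_URL:-http://192.168.1.42:11434}

# Escape backslashes and double quotes so the prompt fits inside a JSON string.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Build the request body for /api/chat with a single user message.
chat_payload() {
  local model=$1 prompt
  prompt=$(json_escape "$2")
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":false}' \
    "$model" "$prompt"
}

# Send one prompt to the shared endpoint and print the raw JSON reply.
ask_lab_llm() {
  curl -sS "$LAB_LLM_URL/api/chat" \
    -H 'Content-Type: application/json' \
    -d "$(chat_payload "${2:-llama3}" "$1")"
}

# Usage: ask_lab_llm "Say hello"
```

Keeping the payload builder separate from the curl call makes it easy to check the JSON locally before anything is sent over the network.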

BUILD 20 - 30 min

Write a startup script for your lab's shared private endpoint

A one-file script that pulls your chosen model, starts the server bound to the network, and checks that it is reachable is all you need to turn a spare machine into a shared private AI endpoint.

Write a `start-lab-llm.sh` (macOS/Linux) that sets `OLLAMA_HOST`, ensures the model is pulled, starts the server, and prints the endpoint URL.

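  1. Create the script file. On macOS/Linux:

```bash
#!/bin/bash
# Bind to all interfaces so other machines on the LAN can connect.
export OLLAMA_HOST="0.0.0.0:11434"
# Start the server first: ollama pull talks to a running server.
ollama serve &
sleep 2
ollama pull llama3
# hostname -I is Linux-only; on macOS use: ipconfig getifaddr en0
echo "Ollama API ready at http://$(hostname -I | awk '{print $1}'):11434/v1/"
ollama list
```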
  2. Make it executable: chmod +x start-lab-llm.sh.
  3. Run it and confirm the server starts and the model is available.
  4. Test from a second machine on the same network using curl with the host's LAN IP instead of localhost.
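
A fixed sleep 2 in a startup script can be too short on a slow machine. One possible refinement is to poll the server until it answers. The sketch below assumes the /api/tags route (Ollama's model-list endpoint) as the health probe; the function name wait_for_ollama and its default values are hypothetical.

```bash
#!/usr/bin/env bash
# wait_for_ollama: poll the server until it responds or the attempts run out.
# Arguments (all optional): base URL, number of attempts, delay in seconds.
# Returns 0 once the server answers, 1 if it never does.
wait_for_ollama() {
  local url=${1:-http://localhost:11434} tries=${2:-30} delay=${3:-1} i
  for ((i = 0; i < tries; i++)); do
    # -f makes curl fail on HTTP errors; output and errors are discarded.
    if curl -fsS -o /dev/null "$url/api/tags" 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Usage inside a startup script, in place of a fixed sleep:
#   ollama serve &
#   wait_for_ollama || { echo "Ollama did not start" >&2; exit 1; }
```

The same function works from a second machine if you pass the host's LAN address, for example wait_for_ollama http://192.168.1.42:11434 5 1.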
Deliverable

A working startup script and a curl response from another machine (or from localhost if you are working solo).