32dots HEIDELBERG AI
Session 2 easy

Call the OpenAI-compatible local API

USE 0 - 20 min

Hit your local model from curl and Python — like the OpenAI API

Once Ollama is installed, its server normally runs in the background at http://localhost:11434. It exposes two APIs: a native one at /api/chat and an OpenAI-compatible one under /v1/. The second is the powerful part: any script, tool, or SDK that talks to the OpenAI API can talk to your local model instead. Swap one base URL, keep your existing code, and no data leaves the machine. This is what makes Ollama the local engine behind your analysis scripts and tools like Hermes.
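To make the "swap one base URL" idea concrete, here is a minimal sketch. The helper names `client_settings` and `chat_completions_url` and the `use_local` flag are illustrative and not part of Ollama or the OpenAI SDK; the two base URLs are the hosted OpenAI default and the local Ollama address given above.

```python
# Base URLs for the hosted OpenAI API and for Ollama's OpenAI-compatible API.
HOSTED_BASE_URL = "https://api.openai.com/v1/"
LOCAL_BASE_URL = "http://localhost:11434/v1/"


def client_settings(use_local=True, api_key=None):
    """Return keyword arguments for an OpenAI-style client (hypothetical helper)."""
    if use_local:
        # Ollama ignores the key, but the SDK insists on a non-empty string.
        return {"base_url": LOCAL_BASE_URL, "api_key": "ollama"}
    if not api_key:
        raise ValueError("a real API key is needed for the hosted API")
    return {"base_url": HOSTED_BASE_URL, "api_key": api_key}


def chat_completions_url(base_url):
    """Full URL of the chat completions route under a given base URL."""
    return base_url.rstrip("/") + "/chat/completions"
```

With the openai package installed, the result could be passed straight to the client, for example `OpenAI(**client_settings())`. The rest of a script would then stay the same whichever backend is selected.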

  1. Confirm the server is up: run `ollama list`. If it responds, the server at localhost:11434 is running. (If not, run `ollama serve` to start it manually.)
  2. Test the native API with curl:

     ```
     curl http://localhost:11434/api/chat -d '{
       "model": "llama3",
       "messages": [{"role": "user", "content": "Name three open-access genomics databases in one sentence each."}],
       "stream": false
     }'
     ```

     With "stream": false you get one JSON object back; the message.content field holds the text.
  3. Test the OpenAI-compatible API with Python. If you have the openai package (`pip install openai`), run:

     ```python
     from openai import OpenAI

     # Point the standard OpenAI client at the local Ollama server.
     client = OpenAI(base_url='http://localhost:11434/v1/', api_key='ollama')
     response = client.chat.completions.create(
         model='llama3',
         messages=[{'role': 'user', 'content': 'Summarise the central dogma in 30 words.'}]
     )
     print(response.choices[0].message.content)
     ```

     The api_key value is required by the SDK but ignored by Ollama; any non-empty string works.
  4. Notice how little differs from real OpenAI code: base_url points at localhost, api_key is a placeholder, and model names a local model. Everything else is identical.
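For readers who prefer to stay in Python, the curl call from step 2 can be sketched with the standard library alone. The function names below are illustrative; the URL, the payload fields, and the `message.content` field are the ones described in the steps above.

```python
import json
import urllib.request

NATIVE_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt, model="llama3"):
    """Build the same POST request that the curl command in step 2 sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        NATIVE_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_native_reply(body):
    """Pull the reply text out of a non-streaming /api/chat response body."""
    return json.loads(body)["message"]["content"]


def chat_native(prompt, model="llama3", timeout=120):
    """Send the request and return the reply text (needs a running server)."""
    request = build_chat_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_native_reply(response.read())
```

Calling `chat_native("Name three open-access genomics databases in one sentence each.")` should then return the same text that curl shows in the message.content field. The 120-second timeout is an arbitrary choice that leaves room for slower hardware.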

curl and Python both returned a model-generated response from `localhost:11434`. You have a private OpenAI-compatible endpoint running on your own machine.
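If either call could not connect, a small programmatic version of the `ollama list` check may help. This is a sketch that assumes Ollama's native /api/tags route, which lists locally installed models; the function names are illustrative.

```python
import json
import urllib.request


def parse_model_names(body):
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in json.loads(body).get("models", [])]


def list_local_models(base="http://localhost:11434", timeout=5):
    """Return installed model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(base + "/api/tags", timeout=timeout) as response:
            return parse_model_names(response.read())
    except OSError:
        # Covers refused connections and timeouts (URLError subclasses OSError).
        return None
```

A return value of `None` suggests the server is not running, while an empty list suggests it is up but has no models pulled yet.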

BUILD 20 - 30 min

Write a one-function Python helper that wraps your local model

A reusable wrapper means you can call your local model from any analysis script with one import — the same way you would use the real OpenAI SDK.

Write a short Python function `ask_local(prompt, model='llama3')` that hits your Ollama server and returns the text reply. Test it with a science question.

  1. Create a file `local_llm.py` with a function that creates an `openai.OpenAI(base_url='http://localhost:11434/v1/', api_key='ollama')` client, sends the prompt with `client.chat.completions.create(...)`, and returns `response.choices[0].message.content`.
  2. Accept `model` as a parameter with a sensible default (the model name you use most).
  3. Call it with: `print(ask_local('List three bioinformatics tools for differential expression analysis.'))`
  4. Confirm the reply is correct and the response time is acceptable for your hardware.

Deliverable

A working `local_llm.py` file with the `ask_local` function and one test output.
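One possible shape for `local_llm.py` is sketched below; it is one way to meet the brief, not the only one. Two things go beyond the task statement and are assumptions of this sketch: the optional `client` argument, which lets the function be exercised with a stand-in object when no server is running, and the `timed` helper for the response-time check in step 4.

```python
import time

LOCAL_BASE_URL = "http://localhost:11434/v1/"


def ask_local(prompt, model="llama3", client=None):
    """Send one user prompt to the local Ollama server and return the reply text.

    `client` is an optional extra (not in the task statement); when omitted,
    an OpenAI client pointed at the local server is created.
    """
    if client is None:
        # Imported lazily so the module loads even before `pip install openai`.
        from openai import OpenAI

        client = OpenAI(base_url=LOCAL_BASE_URL, api_key="ollama")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

From another script, `from local_llm import ask_local` is then the single import needed, and `reply, seconds = timed(ask_local, 'List three bioinformatics tools for differential expression analysis.')` gives both the answer and a rough timing for your hardware.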