Run open models locally · private · OpenAI-compatible API · headless-capable (v0.4+)
| Model size | Quantization | RAM needed | Good for |
|---|---|---|---|
| 7–8 B | Q4_K_M | 5–6 GB | Most laptops: daily chat, drafts, code snippets |
| 7–8 B | Q8_0 | 8–9 GB | Higher quality output, same size model, needs a bit more headroom |
| 13 B | Q4_K_M | 8–10 GB | 16 GB RAM Mac/PC: better reasoning, longer context |
| 30–34 B | Q4_K_M | 18–22 GB | 32 GB RAM: strong coding, analysis tasks |
| 70 B | Q4_K_M | 40–45 GB | GPU workstation / server: near-frontier quality, slow on CPU |
Start with a 7–8 B Q4_K_M model from one of these families: Llama 3.1/3.2, Qwen2.5, Mistral, or Phi-3. In the Discover tab, filter by your RAM. If the model loads but replies take longer than 30 s, try a smaller quant or a smaller parameter count.
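The RAM column can be roughly approximated from parameter count and quantization width. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes about 4.85 bits per weight for Q4_K_M and 8.5 for Q8_0 (approximate figures that do not come from the table), plus a flat 1 GB allowance for context and runtime overhead, which is a guess that grows with context length. The `fits` helper and its `reserve_gb` default are hypothetical.

```python
# Approximate bits per weight; assumptions, not values taken from the table.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5}


def estimate_ram_gb(params_billion: float, quant: str, overhead_gb: float = 1.0) -> float:
    """Rough RAM estimate in GB: weight storage plus a flat overhead allowance."""
    # billions of parameters * (bits per weight / 8) = gigabytes of weights
    weights_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return weights_gb + overhead_gb


def fits(params_billion: float, quant: str, ram_gb: float, reserve_gb: float = 4.0) -> bool:
    """True if the estimate still leaves reserve_gb for the OS and other apps."""
    return estimate_ram_gb(params_billion, quant) + reserve_gb <= ram_gb


if __name__ == "__main__":
    for size, quant in [(8, "Q4_K_M"), (8, "Q8_0"), (13, "Q4_K_M"), (70, "Q4_K_M")]:
        print(f"{size} B {quant}: ~{estimate_ram_gb(size, quant):.1f} GB")
```

Treat the result as a starting point for filtering; the Discover tab and an actual load attempt remain the real test.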
…localhost:1234. Stop with the same button.
…localhost to 0.0.0.0 (or your LAN IP). Restart server. Other machines reach http://<your-ip>:1234/v1.
…lms on your PATH so you can run the server without the GUI.
lms CLI reference
| Command | What it does |
|---|---|
| lms server start | Start the API server (headless, no GUI needed) |
| lms server stop | Stop the server cleanly |
| lms server status | Show whether the server is running and on which port |
| lms load <name> | Load a downloaded model by its exact name (as shown by lms ls) |
| lms unload <name> | Unload a model and free its RAM |
| lms ls | List all downloaded models |
| lms --version | Print the CLI version |
| lms --help | Full CLI help |
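For scripted setups, the server commands in the table can be driven from Python. This is a minimal sketch, assuming lms is on your PATH and that lms server start returns once the server is up (not verified here). The helper names `run_lms` and `start_and_report` are hypothetical, and the 60-second timeout is an arbitrary choice.

```python
import shutil
import subprocess
from typing import Optional


def run_lms(*args: str, exe: str = "lms") -> Optional[subprocess.CompletedProcess]:
    """Run an lms subcommand; return None if the CLI is not on PATH."""
    path = shutil.which(exe)
    if path is None:
        return None
    # check=False: leave interpretation of the exit code to the caller.
    # Raises subprocess.TimeoutExpired if the command takes over 60 s.
    return subprocess.run([path, *args], capture_output=True, text=True,
                          check=False, timeout=60)


def start_and_report() -> str:
    """Start the server, then return whatever `lms server status` prints."""
    if run_lms("server", "start") is None:
        return "lms not found on PATH"
    status = run_lms("server", "status")
    if status is None:
        return "lms not found on PATH"
    return (status.stdout or status.stderr).strip()
```

Calling `print(start_and_report())` from a login script or scheduled job gives a quick confirmation without opening the GUI.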
Always-on daemon: llmster is a standalone LM Studio server daemon (lmstudio.ai/llmster) that runs as a system service and starts on boot — no GUI, no manual lms server start each time. Install it on a spare machine to create a shared private endpoint for your lab.
Local machine
http://localhost:1234/v1
Another machine on your LAN
http://<your-lan-ip>:1234/v1
This is an OpenAI-compatible (and Anthropic-compatible) endpoint. Any tool or SDK that accepts a custom base_url can use it — including the Python openai package, LangChain, AnythingLLM, and Hermes. The API key field can be any non-empty string; local servers ignore it.
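Before pointing a tool at the endpoint, it can help to confirm the server is reachable and to see the exact model identifiers it reports. The sketch below uses only the standard library. It assumes the server answers the OpenAI-style GET /v1/models listing, which is not stated above. The function names are hypothetical, and "local" is just the placeholder key.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional


def parse_model_ids(payload: dict) -> List[str]:
    """Pull the id of each entry out of an OpenAI-style model listing."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(base_url: str = "http://localhost:1234/v1",
                   timeout: float = 3.0) -> Optional[List[str]]:
    """Return the model ids the server reports, or None if it cannot be reached."""
    request = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": "Bearer local"},  # placeholder key
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return None
    return parse_model_ids(payload)
```

For a server on another machine, pass its address, for example `list_model_ids("http://<your-lan-ip>:1234/v1")`.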
curl http://localhost:1234/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
"messages": [
{"role": "user", "content": "Explain the central dogma of molecular biology in two sentences."}
]
}'
Replace the model name with whatever you have loaded. If the server is running you will see a JSON response; the choices[0].message.content field holds the text.
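To make the response shape concrete, here is a small sketch that pulls the reply text out of a saved response body. The sample JSON is made up for illustration and trimmed to the fields used; a real response carries more fields. The `extract_reply` helper is hypothetical.

```python
import json

# Hypothetical, trimmed sample in the shape of a chat completion response.
SAMPLE = """
{
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Example reply text."},
      "finish_reason": "stop"
    }
  ]
}
"""


def extract_reply(raw: str) -> str:
    """Return choices[0].message.content from a raw JSON response body."""
    data = json.loads(raw)
    choices = data.get("choices") or []
    if not choices:
        # Surface the server's error payload, if any, instead of an IndexError.
        raise ValueError(f"no choices in response: {data.get('error', data)}")
    return choices[0]["message"]["content"]


if __name__ == "__main__":
    print(extract_reply(SAMPLE))
```

Raising on a missing choices list keeps a failed request (for example, a wrong model name) from being mistaken for an empty reply.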
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="local",  # any value; local servers ignore it
)
response = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    messages=[
        {"role": "user", "content": "Summarise RNA-seq analysis in 3 bullet points."}
    ],
)
print(response.choices[0].message.content)
Edit ~/.hermes/config.yaml — two lines. Then add a placeholder key to ~/.hermes/.env.
~/.hermes/config.yaml
provider: custom
base_url: "http://localhost:1234/v1"
For a model on another machine, replace localhost with its LAN IP.
~/.hermes/.env
OPENAI_API_KEY=local
Local servers ignore the key value. Hermes still requires the variable to be present.
See the Hermes course, lesson hermes-00 for full install instructions. Once these two lines are set, start Hermes with hermes --tui — it will use your local LM Studio model for all conversations, with no data leaving your machine.
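A quick way to catch typos in the two files is a small preflight check. The sketch below assumes the two settings appear as flat top-level key: value lines, as in the snippet shown; if your config nests them, this simple parser will not find them. The function `check_hermes_settings` and its messages are hypothetical and are not part of Hermes.

```python
from pathlib import Path
from typing import List


def check_hermes_settings(config_text: str, env_text: str) -> List[str]:
    """Return a list of problems; an empty list means both files look right."""
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        settings[key.strip()] = value.strip().strip("\"'")

    problems = []
    if settings.get("provider") != "custom":
        problems.append("config.yaml: provider should be 'custom'")
    if not settings.get("base_url", "").endswith("/v1"):
        problems.append("config.yaml: base_url should end in /v1")

    has_key = any(
        line.strip().startswith("OPENAI_API_KEY=") and line.strip() != "OPENAI_API_KEY="
        for line in env_text.splitlines()
    )
    if not has_key:
        problems.append(".env: OPENAI_API_KEY is missing or empty")
    return problems


def check_files(home: Path = Path.home()) -> List[str]:
    """Read the two files from ~/.hermes and check them."""
    base = home / ".hermes"
    return check_hermes_settings(
        (base / "config.yaml").read_text(), (base / ".env").read_text()
    )
```

Running `print(check_files())` before hermes --tui should print an empty list when both files match the settings above.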
| Situation | Use |
|---|---|
| Sensitive or unpublished data (patient records, pre-submission results) | Local LM Studio |
| Offline work (plane, field, no-internet lab) | Local LM Studio |
| Testing which model answers your domain questions best | Local — side-by-side compare |
| Shared lab endpoint, no GUI on server | Local — headless lms / llmster |
| Complex reasoning, very long documents, frontier capability | Cloud (Claude / GPT / Gemini) |
| Hardware is a constraint (old laptop, no GPU) | Cloud or a smaller local model |