LM Studio — Quick-reference

Run open models locally · private · OpenAI-compatible API · headless-capable (v0.4+)

Model size & RAM guide

Model size	Quantization	RAM needed	Good for
7–8 B	`Q4_K_M`	5–6 GB	Most laptops — daily chat, drafts, code snippets
7–8 B	`Q8_0`	8–9 GB	Higher-quality output from the same model; needs about 3 GB more RAM than `Q4_K_M`
13 B	`Q4_K_M`	8–10 GB	16 GB RAM Mac/PC — better reasoning, longer context
30–34 B	`Q4_K_M`	18–22 GB	32 GB RAM — strong coding, analysis tasks
70 B	`Q4_K_M`	40–45 GB	GPU workstation / server — near-frontier quality, slow on CPU

Start with a 7–8 B model at Q4_K_M from a well-supported family: Llama 3.1/3.2, Qwen2.5, Mistral, or Phi-3. In the Discover tab, filter by your RAM. If the model loads but replies take >30 s, try a lower-bit quantization or a model with fewer parameters.
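The RAM column in the table above can be approximated from parameter count and quantization. The sketch below is a rough estimate, not LM Studio's own calculation: the bits-per-weight figures and the flat 1 GB allowance for context and runtime are assumptions chosen to land close to the table's ranges, and `estimate_ram_gb` and `fits` are hypothetical helper names.

```python
# Approximate bits per weight for two common GGUF quantizations (assumed values).
# Q8_0 stores 8-bit weights plus a per-block scale; Q4_K_M varies slightly by model.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5}


def estimate_ram_gb(params_billion: float, quant: str, overhead_gb: float = 1.0) -> float:
    """Rough RAM estimate: weight size plus a flat allowance for context and runtime."""
    # 1e9 parameters * bits per weight / 8 bits per byte = size in GB
    weights_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return round(weights_gb + overhead_gb, 1)


def fits(params_billion: float, quant: str, ram_gb: float, reserve_gb: float = 4.0) -> bool:
    """True if the estimate still leaves reserve_gb free for the OS and other apps."""
    return estimate_ram_gb(params_billion, quant) <= ram_gb - reserve_gb
```

On a 16 GB machine, `fits(13, "Q4_K_M", 16)` comes out True and `fits(34, "Q4_K_M", 16)` comes out False, in line with the table.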

GUI quick actions

Chat tab

Chat with a local model

Select a loaded model in the top dropdown → type your prompt. No account, no internet.

Chat tab

Compare models side by side

Click the multi-model button next to the model selector → add a second model → same prompt goes to both.

Discover tab

Download a model

Search by name → choose a GGUF variant → click Download. Models are downloaded from Hugging Face and stored locally.

Developer tab

Start the local API server

Click Start Server. Status bar turns green: localhost:1234. Stop with the same button.

Developer tab

Serve on your LAN

Change the network binding from localhost to 0.0.0.0 (or your LAN IP). Restart server. Other machines reach http://<your-ip>:1234/v1.
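From another machine, a quick way to confirm the LAN binding took effect is to test whether the port accepts TCP connections. This is a minimal sketch, assuming the default port 1234 shown above; `base_url` and `is_reachable` are hypothetical helper names, and 192.168.1.20 in the note below is a placeholder address.

```python
import socket


def base_url(host: str, port: int = 1234) -> str:
    """Build the OpenAI-compatible base URL for a server at host:port."""
    return f"http://{host}:{port}/v1"


def is_reachable(host: str, port: int = 1234, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unreachable
        return False
```

If `is_reachable("192.168.1.20")` returns False, the usual suspects are the binding still set to localhost, a firewall rule, or a stopped server.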

Settings

Install the CLI tools

Settings → Enable 'Install CLI tools'. This puts lms on your PATH so you can run the server without the GUI.

`lms` CLI reference

Command	What it does
`lms server start`	Start the API server (headless, no GUI needed)
`lms server stop`	Stop the server cleanly
`lms server status`	Show whether the server is running and on which port
`lms load <name>`	Load a downloaded model by its exact name (as shown by `lms ls`)
`lms unload <name>`	Unload a model and free its RAM
`lms ls`	List all downloaded models
`lms --version`	Print the CLI version
`lms --help`	Full CLI help

Always-on daemon: llmster is a standalone LM Studio server daemon (lmstudio.ai/llmster) that runs as a system service and starts on boot — no GUI, no manual lms server start each time. Install it on a spare machine to create a shared private endpoint for your lab.
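For scripted headless startup, it can help to start the server and then wait until it answers before sending requests. The sketch below assumes that `lms server start` returns once the server has been launched and that the server listens on the default localhost:1234; `server_is_up` and `start_and_wait` are hypothetical helper names, and the readiness probe uses the OpenAI-compatible `/v1/models` route.

```python
import subprocess
import time
import urllib.error
import urllib.request


def server_is_up(base_url: str = "http://localhost:1234/v1", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):  # refused, timed out, or HTTP error
        return False


def start_and_wait(is_up=server_is_up, run=subprocess.run,
                   attempts: int = 10, delay: float = 1.0) -> bool:
    """Run `lms server start`, then poll until the server responds or attempts run out."""
    run(["lms", "server", "start"], check=True)
    for _ in range(attempts):
        if is_up():
            return True
        time.sleep(delay)
    return False
```

The `is_up` and `run` parameters exist only so the polling logic can be exercised without a real server; in normal use, call `start_and_wait()` with the defaults.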

Local API — base URL & endpoint

Local machine

http://localhost:1234/v1

Another machine on your LAN

http://<your-lan-ip>:1234/v1

This is an OpenAI-compatible (and Anthropic-compatible) endpoint. Any tool or SDK that accepts a custom base_url can use it — including the Python openai package, LangChain, AnythingLLM, and Hermes. The API key field can be any non-empty string; local servers ignore it.

curl test

curl http://localhost:1234/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    "messages": [
      {"role": "user", "content": "Explain the central dogma of molecular biology in two sentences."}
    ]
  }'

Replace the model name with the identifier of the model you have loaded. If the server is running, you will see a JSON response; the choices[0].message.content field holds the reply text.
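To find the exact identifier to put in the "model" field, the server can be asked which models it reports. A minimal sketch using only the standard library, assuming the OpenAI-style `/v1/models` response shape (`{"data": [{"id": ...}, ...]}`); `model_ids` and `list_models` are hypothetical helper names.

```python
import json
import urllib.request


def model_ids(payload: dict) -> list[str]:
    """Extract model identifiers from a parsed /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = "http://localhost:1234/v1") -> list[str]:
    """Fetch the identifiers the server reports at <base_url>/models."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
        return model_ids(json.load(resp))
```

With the server running, `print(list_models())` shows the names that can be pasted into the request body.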

Python (openai SDK)

# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="local"          # any non-empty string; the local server ignores it
)

response = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    messages=[
        {"role": "user", "content": "Summarise RNA-seq analysis in 3 bullet points."}
    ]
)
print(response.choices[0].message.content)
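Where installing the openai package is not an option, the same request can be made with the standard library alone. This is a sketch mirroring the curl test above, not an official client; `build_chat_request`, `extract_text`, and `chat` are hypothetical helper names, and the 120-second timeout is an arbitrary allowance for slow local generation.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str,
                       base_url: str = "http://localhost:1234/v1") -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Pull the reply text out of a parsed chat completion response."""
    return response["choices"][0]["message"]["content"]


def chat(prompt: str, model: str, base_url: str = "http://localhost:1234/v1") -> str:
    """Send one user message and return the model's reply text."""
    request = build_chat_request(prompt, model, base_url)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return extract_text(json.load(resp))
```

Splitting request building from response parsing keeps both pieces checkable without a running server.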

Connect Hermes to LM Studio

Add two lines to ~/.hermes/config.yaml, then add a placeholder key to ~/.hermes/.env.

~/.hermes/config.yaml

provider: custom
base_url: "http://localhost:1234/v1"

For a model served from another machine, replace localhost with that machine's LAN IP.

~/.hermes/.env

OPENAI_API_KEY=local

Local servers ignore the key value. Hermes still requires the variable to be present.

See the Hermes course, lesson hermes-00, for full install instructions. Once the config lines and the placeholder key are set, start Hermes with hermes --tui. It will use your local LM Studio model for all conversations, and with base_url on localhost no data leaves your machine.

Local vs cloud — quick decision

Situation	Use
Sensitive or unpublished data (patient records, pre-submission results)	Local LM Studio
Offline work (plane, field, no-internet lab)	Local LM Studio
Testing which model answers your domain questions best	Local — side-by-side compare
Shared lab endpoint, no GUI on server	Local — headless `lms` / llmster
Complex reasoning, very long documents, frontier capability	Cloud (Claude / GPT / Gemini)
Hardware is a constraint (old laptop, no GPU)	Cloud or a smaller local model