LM Studio — Quick-reference

Run open models locally · private · OpenAI-compatible API · headless-capable (v0.4+)

Model size & RAM guide

Model size   Quantization   RAM needed   Good for
7–8 B        Q4_K_M         5–6 GB       Most laptops: daily chat, drafts, code snippets
7–8 B        Q8_0           8–9 GB       Higher-quality output from the same size model; needs a bit more headroom
13 B         Q4_K_M         8–10 GB      Mac/PC with 16 GB RAM: better reasoning, longer context
30–34 B      Q4_K_M         18–22 GB     Machine with 32 GB RAM: strong coding, analysis tasks
70 B         Q4_K_M         40–45 GB     GPU workstation / server: near-frontier quality, slow on CPU

Start with a 7–8 B Q4_K_M model from one of these families: Llama 3.1/3.2, Qwen2.5, Mistral, or Phi-3. In the Discover tab, filter by your RAM. If the model loads but replies take longer than 30 s, try a smaller quant or a model with fewer parameters.
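The RAM column above can be roughly reproduced with back-of-envelope arithmetic: weight memory is about parameters × bits per weight ÷ 8, plus some headroom for the context cache and runtime. The sketch below assumes approximate averages of 4.85 bits per weight for Q4_K_M and 8.5 for Q8_0, and a flat 1 GB of overhead. These figures and the function name are assumptions for illustration, not values reported by LM Studio, and real usage grows with context length.

```python
# Rough RAM estimate for a quantized GGUF model (illustrative only).
# Assumed average bits per weight; the true figure varies by architecture.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5}


def estimate_ram_gb(params_billion: float, quant: str, overhead_gb: float = 1.0) -> float:
    """Return approximate RAM in GB: quantized weights plus a flat, assumed overhead."""
    weights_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return weights_gb + overhead_gb


if __name__ == "__main__":
    for size, quant in [(8, "Q4_K_M"), (13, "Q4_K_M"), (34, "Q4_K_M"), (70, "Q4_K_M")]:
        print(f"{size} B {quant}: ~{estimate_ram_gb(size, quant):.1f} GB")
```

If an estimate lands close to your total RAM, pick the next smaller row; the operating system and other applications need memory too.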

GUI quick actions

Chat tab: Chat with a local model
Select a loaded model in the top dropdown → type your prompt. No account, no internet.

Chat tab: Compare models side by side
Click the multi-model button next to the model selector → add a second model → the same prompt goes to both.

Discover tab: Download a model
Search by name → choose a GGUF variant → click Download. Models are downloaded from Hugging Face and stored locally.

Developer tab: Start the local API server
Click Start Server. The status bar turns green and shows localhost:1234. Stop with the same button.

Developer tab: Serve on your LAN
Change the network binding from localhost to 0.0.0.0 (or your LAN IP), then restart the server. Other machines reach it at http://<your-ip>:1234/v1.

Settings: Install the CLI tools
Settings → Enable 'Install CLI tools'. This puts lms on your PATH so you can run the server without the GUI.

lms CLI reference

Command                  What it does
lms server start         Start the API server (headless, no GUI needed)
lms server stop          Stop the server cleanly
lms server status        Show whether the server is running and on which port
lms load <name>          Load a downloaded model by its exact name (as shown in the model list)
lms unload <name>        Unload a model and free its RAM
lms ls                   List all downloaded models
lms --version            Print the CLI version
lms --help               Full CLI help

Always-on daemon: llmster is a standalone LM Studio server daemon (lmstudio.ai/llmster) that runs as a system service and starts on boot — no GUI, no manual lms server start each time. Install it on a spare machine to create a shared private endpoint for your lab.
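Whether the server was started with lms server start or runs as a daemon, a script may want to confirm that something is listening before it sends requests. A minimal sketch using only the standard library follows; the function name is hypothetical, and the check only proves that a TCP port is open, not that a model is loaded.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 1234, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    print("LM Studio server reachable:", server_is_listening())
```

For a shared lab endpoint, pass the server's LAN IP as host instead of localhost.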

Local API — base URL & endpoint

Local machine

http://localhost:1234/v1

Another machine on your LAN

http://<your-lan-ip>:1234/v1

This is an OpenAI-compatible (and Anthropic-compatible) endpoint. Any tool or SDK that accepts a custom base_url can use it — including the Python openai package, LangChain, AnythingLLM, and Hermes. The API key field can be any non-empty string; local servers ignore it.
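Because the endpoint follows the OpenAI API shape, a quick way to see which model identifiers the server exposes is to query /v1/models. The sketch below uses only the Python standard library; the helper names are hypothetical, and it assumes the usual OpenAI-style response with a top-level "data" list of objects that each carry an "id".

```python
import json
import urllib.request


def models_url(host: str = "localhost", port: int = 1234) -> str:
    """Build the model-listing URL for a local or LAN server."""
    return f"http://{host}:{port}/v1/models"


def model_ids(payload: dict) -> list[str]:
    """Extract model identifiers from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(host: str = "localhost", port: int = 1234, timeout: float = 5.0) -> list[str]:
    """Fetch and return the model identifiers the server reports."""
    with urllib.request.urlopen(models_url(host, port), timeout=timeout) as resp:
        return model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print(list_models())
    except OSError as exc:  # URLError is a subclass of OSError
        print("Server not reachable:", exc)
```

The identifiers returned here are the values to pass in the "model" field of a chat request.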

curl test

curl http://localhost:1234/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    "messages": [
      {"role": "user", "content": "Explain the central dogma of molecular biology in two sentences."}
    ]
  }'

Replace the model name with whatever you have loaded. If the server is running you will see a JSON response; the choices[0].message.content field holds the text.
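On machines without curl, the same test can be sketched with the Python standard library. The helper names below are hypothetical; the request body and the choices[0].message.content path mirror the curl example above.

```python
import json
import urllib.request


def build_request(prompt: str, model: str,
                  base_url: str = "http://localhost:1234/v1") -> urllib.request.Request:
    """Build a POST request for the chat completions endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Return the reply text from a chat completions response."""
    return response["choices"][0]["message"]["content"]


if __name__ == "__main__":
    req = build_request(
        "Explain the central dogma of molecular biology in two sentences.",
        "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",  # replace with your loaded model
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            print(extract_text(json.load(resp)))
    except OSError as exc:  # covers URLError and HTTPError
        print("Request failed:", exc)
```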

Python (openai SDK)

# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="local"          # any value; local servers ignore it
)

response = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    messages=[
        {"role": "user", "content": "Summarise RNA-seq analysis in 3 bullet points."}
    ]
)
print(response.choices[0].message.content)

Connect Hermes to LM Studio

Add two lines to ~/.hermes/config.yaml, then add a placeholder key to ~/.hermes/.env.

~/.hermes/config.yaml

provider: custom
base_url: "http://localhost:1234/v1"

For a model on another machine, replace localhost with its LAN IP.

~/.hermes/.env

OPENAI_API_KEY=local

Local servers ignore the key value. Hermes still requires the variable to be present.

See the Hermes course, lesson hermes-00, for full install instructions. Once the two config lines and the placeholder key are set, start Hermes with hermes --tui. It will use your local LM Studio model for all conversations, with no data leaving your machine.
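Before launching Hermes, the two files can be sanity-checked with a short script. This is a sketch under assumptions: it handles only the flat key: value and KEY=value lines shown above (it is not a general YAML or dotenv parser), and the function names and checks are hypothetical rather than part of Hermes.

```python
from pathlib import Path


def parse_pairs(text: str, sep: str) -> dict:
    """Parse flat 'key<sep>value' lines, skipping blanks and # comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or sep not in line:
            continue
        key, value = line.split(sep, 1)
        result[key.strip()] = value.strip().strip("\"'")
    return result


def find_problems(config_text: str, env_text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the setup looks fine."""
    config = parse_pairs(config_text, ":")
    env = parse_pairs(env_text, "=")
    problems = []
    if config.get("provider") != "custom":
        problems.append("config.yaml: provider should be 'custom'")
    if not config.get("base_url", "").endswith("/v1"):
        problems.append("config.yaml: base_url should end with /v1")
    if not env.get("OPENAI_API_KEY"):
        problems.append(".env: OPENAI_API_KEY is missing or empty")
    return problems


if __name__ == "__main__":
    hermes_dir = Path.home() / ".hermes"
    config_path, env_path = hermes_dir / "config.yaml", hermes_dir / ".env"
    if config_path.exists() and env_path.exists():
        print(find_problems(config_path.read_text(), env_path.read_text()) or "Looks fine")
    else:
        print("Hermes config files not found in", hermes_dir)
```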

Local vs cloud — quick decision

Situation                                                                  Use
Sensitive or unpublished data (patient records, pre-submission results)    Local LM Studio
Offline work (plane, field, no-internet lab)                               Local LM Studio
Testing which model answers your domain questions best                     Local (side-by-side compare)
Shared lab endpoint, no GUI on server                                      Local (headless lms / llmster)
Complex reasoning, very long documents, frontier capability                Cloud (Claude / GPT / Gemini)
Hardware is a constraint (old laptop, no GPU)                              Cloud or a smaller local model