Jan — Quick-reference

Open-source (Apache-2.0) · offline-first · 100% local · OpenAI-compatible API · optional cloud per thread

What Jan is

Privacy
100% local & offline-first
Local models run on your machine, so prompts and data stay on it. Works with the network off: on a plane or in a no-internet lab.
Licence
Free & open source
Apache-2.0. Running models locally costs nothing, and there is no paid tier. Cloud models are billed by their provider through your own key.
Platforms
Mac · Windows · Linux
A familiar ChatGPT-style desktop app. No account and no command line required to get started.
Models
Built-in model Hub
One-click downloads of open-weight LLMs (Llama, Gemma, Qwen, gpt-oss) from the built-in Hub, which lists models hosted on Hugging Face.
Cloud (optional)
Bring your own key
Add an OpenAI, Anthropic, Mistral, or Groq key to call a cloud model per thread — one app for local and cloud.
Agentic
Assistants, Projects & MCP
Save custom Assistants, group work into Projects, and connect external tools via the Model Context Protocol.

Picking a model size

Model size | Trade-off | Good for
Small | Fast, light on memory, lower quality | Modest laptops: daily chat, drafts, quick edits
Medium | Better reasoning, needs more memory | Capable laptop or desktop: analysis, longer context
Large | Best quality, heavy on resources | Workstation or GPU: near-frontier quality, slow on weak hardware

Start with a small model that the Hub recommends for your machine, from families like Llama, Gemma, Qwen, or gpt-oss. If replies are too slow, switch to a smaller model; if answers are too weak and you have memory to spare, try a larger one. Model size is bounded by your RAM (or GPU memory), so modest hardware caps the quality you can run locally.
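To see why model size is bounded by memory, here is a rough rule of thumb (not something Jan itself reports): a model's weights need about parameters × bits-per-weight ÷ 8 bytes, plus headroom for the context cache and runtime. The sketch below assumes a 1.2× overhead factor and uses hypothetical helper names; real usage varies with quantization format and context length, so treat the numbers as ballpark only.

```python
def estimate_memory_gb(params_billions: float, bits_per_weight: float,
                       overhead: float = 1.2) -> float:
    """Rough memory needed to run a model, in GB (10**9 bytes).

    Weights take params * bits / 8 bytes; `overhead` is an ASSUMED
    multiplier for the context (KV) cache and runtime buffers.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


def fits(params_billions: float, bits_per_weight: float, available_gb: float) -> bool:
    """True if the rough estimate fits in the memory you can spare."""
    return estimate_memory_gb(params_billions, bits_per_weight) <= available_gb


if __name__ == "__main__":
    # Hypothetical sizes at 4-bit quantization, checked against 16 GB of free memory.
    for size in (1, 8, 70):
        need = estimate_memory_gb(size, 4)
        print(f"{size}B @ 4-bit: ~{need:.1f} GB, fits in 16 GB: {fits(size, 4, 16)}")
```

Quantization matters as much as parameter count: by this estimate the same 8B model needs about four times the memory at 16-bit as at 4-bit.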

Quick actions in the app

Hub
Download a model
Open the model Hub → pick an open-weight model → click to download. The file is stored locally on your machine.
Chat
Chat with a local model
Start a new thread → select a downloaded model → type your prompt. No account, no internet.
Chat
Work offline
Turn Wi-Fi off and keep chatting — a local model needs no connection. Sensitive text never leaves the machine.
Settings
Add a cloud provider
Paste an OpenAI / Anthropic / Mistral / Groq API key → choose that provider's model per thread when you want cloud capability.
Settings
Enable the local API server
Turn on the local server so other apps can call your model at http://localhost:1337.
Settings
Assistants & MCP
Save a custom Assistant (model + instruction), group threads into Projects, and connect external tools via MCP.
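A custom Assistant pairs a model with a standing instruction. Jan stores Assistants inside the app; when calling the local API from your own code, one way to approximate the same behaviour (an assumption about usage, not Jan's Assistant mechanism) is to send the instruction as a system message ahead of the conversation. The helper name below is hypothetical.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def assistant_messages(instruction: str, user_prompt: str,
                       history: Optional[List[Message]] = None) -> List[Message]:
    """Build an OpenAI-style message list: system instruction, earlier turns, new prompt."""
    return [
        {"role": "system", "content": instruction},  # the "Assistant" part
        *(history or []),                              # prior user/assistant turns, if any
        {"role": "user", "content": user_prompt},      # the new question
    ]
```

Pass the result as `messages=` to `client.chat.completions.create(...)` with an openai client pointed at http://localhost:1337/v1.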

Local API — base URL

Server endpoint

http://localhost:1337

OpenAI-compatible path

http://localhost:1337/v1

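Jan exposes an OpenAI-compatible local server. Any tool or SDK that accepts a custom base_url can use it, including the Python openai package and editor extensions for private coding assistance; point the base URL at the /v1 path. Clients such as the openai SDK require an API key value: use the key configured in Jan's local server settings if you set one; if no key is configured, any non-empty placeholder string should work.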
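If you cannot install the openai package, the standard library is enough. The sketch below builds the same POST as the curl test against the /v1 base URL; the function names are hypothetical, "local-model" is a placeholder for the model ID shown in Jan, and the Bearer header is only added when you pass a key.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:1337/v1"  # Jan's OpenAI-compatible path


def build_chat_request(prompt: str, model: str = "local-model",
                       base_url: str = BASE_URL,
                       api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to {base_url}/chat/completions, mirroring the curl test."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed if you set a key in Jan's server settings
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON reply (needs Jan's server running)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `send(build_chat_request("Explain the central dogma of molecular biology in two sentences."))` returns the decoded response; the generous default timeout allows for slow local generation.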

curl test

curl http://localhost:1337/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "local-model",
    "messages": [
      {"role": "user", "content": "Explain the central dogma of molecular biology in two sentences."}
    ]
  }'

Before running it, enable the local API server in Jan's settings and load a model. Set the model field to the model ID shown in Jan. If you configured an API key for the server, also pass -H 'Authorization: Bearer <your-key>'. The reply text is in the choices[0].message.content field of the JSON response.
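To check that the server is up and find the exact ID for the model field, you can query the models endpoint. This sketch assumes Jan answers GET /v1/models in the OpenAI list format ({"data": [{"id": ...}]}); the helper names are hypothetical, and a None result simply means the request failed.

```python
import json
import urllib.request
from typing import List, Optional

BASE_URL = "http://localhost:1337/v1"


def model_ids(models_json: dict) -> List[str]:
    """IDs from an OpenAI-style GET /models response: {"data": [{"id": ...}, ...]}."""
    return [entry["id"] for entry in models_json.get("data", [])]


def fetch_model_ids(base_url: str = BASE_URL, api_key: Optional[str] = None,
                    timeout: float = 5.0) -> Optional[List[str]]:
    """Ask the local server which models it exposes.

    Returns None if the request fails: server disabled, wrong port, or a
    rejected key (urllib's URLError and HTTPError are OSError subclasses).
    """
    request = urllib.request.Request(f"{base_url.rstrip('/')}/models")
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return model_ids(json.load(resp))
    except OSError:  # connection refused, HTTP errors, timeouts
        return None


def reply_text(completion_json: dict) -> str:
    """The assistant's text from a chat completion: choices[0].message.content."""
    choices = completion_json.get("choices") or []
    if not choices:
        raise ValueError("completion has no choices")
    return choices[0]["message"]["content"]
```

Call `fetch_model_ids()` first: None points to the server setting (or the key), while an empty list suggests no model is available yet.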

Python (openai SDK)

# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1337/v1",
    api_key="local",         # placeholder; use your Jan server key if you set one
)

response = client.chat.completions.create(
    model="local-model",     # use the model name shown in Jan
    messages=[
        {"role": "user", "content": "Summarise RNA-seq analysis in 3 bullet points."}
    ]
)
print(response.choices[0].message.content)
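On slower hardware it helps to see tokens as they arrive. The sketch below assumes your Jan build honours `stream=True` the way OpenAI's API does; it converts each SDK chunk to a dict with `model_dump()` so the text-extraction helper can be tested without a server. Function names and the "local-model" default are placeholders.

```python
from typing import Iterable, Iterator


def iter_deltas(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield the text pieces from streamed chat-completion chunks (as dicts)."""
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some servers send empty or usage-only chunks
        piece = (choices[0].get("delta") or {}).get("content")
        if piece:  # the first delta often carries only the role, with no content
            yield piece


def stream_reply(prompt: str, model: str = "local-model",
                 base_url: str = "http://localhost:1337/v1",
                 api_key: str = "local") -> str:
    """Print a reply as it is generated and return the full text."""
    from openai import OpenAI  # lazy import: iter_deltas works without the SDK

    client = OpenAI(base_url=base_url, api_key=api_key)
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for piece in iter_deltas(chunk.model_dump() for chunk in stream):
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)
```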

Local vs cloud — quick decision

Situation | Use
Sensitive or unpublished data (patient records, pre-submission results) | Local model in Jan
Offline work (plane, field, no-internet lab) | Local model in Jan
Private coding assistance in your editor | Jan local API (localhost:1337)
A reusable, specialised persona for a recurring task | Custom Assistant
Complex reasoning, very long documents, frontier capability | Cloud model per thread (your key)
Hardware is a constraint (old laptop, no GPU) | Smaller local model, or cloud per thread
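The table can also be read as a precedence rule. The sketch below is one interpretation, assuming that sensitive data or no connectivity rules out cloud entirely, so a weak machine then means a smaller local model rather than a cloud call. The function name and labels are hypothetical; the editor and persona rows are left out because they describe how you use Jan, not where the model runs.

```python
def pick_backend(sensitive: bool, offline: bool, needs_frontier: bool,
                 weak_hardware: bool) -> str:
    """Map the table's situations to a choice; privacy and connectivity win."""
    if sensitive or offline:
        # Cloud is off the table, so shrink the model instead of leaving the machine.
        return "smaller local model" if weak_hardware else "local model"
    if needs_frontier:
        return "cloud model (your key)"
    if weak_hardware:
        return "smaller local model or cloud"
    return "local model"
```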

Note: Jan is offline-first and not built to run as a shared, always-on server, and its curated Hub is smaller than a full Hugging Face browser. For a shared lab endpoint or the widest model selection, pair it with a tool built for that.