Jan: Quick reference

Open-source (Apache-2.0) · offline-first · 100% local · OpenAI-compatible API · optional cloud per thread

What Jan is

Privacy

100% local & offline-first

Local models run on your machine, so your prompts and documents stay on it. They keep working with the network off: on a plane, or in a lab with no internet.

Licence

Free & open source

Apache-2.0. Jan is free to use with local models, and there is no paid tier. If you add a cloud key, that provider bills you for its own usage.

Platforms

Mac · Windows · Linux

A familiar ChatGPT-style desktop app. No account and no command line required to get started.

Models

Built-in model Hub

One-click downloads of open-weight LLMs (Llama, Gemma, Qwen, GPT-oss) from the built-in Hub or from Hugging Face.

Cloud (optional)

Bring your own key

Add an OpenAI, Anthropic, Mistral, or Groq key to call a cloud model per thread — one app for local and cloud.

Agentic

Assistants, Projects & MCP

Save custom Assistants, group work into Projects, and connect external tools via the Model Context Protocol.

Picking a model size

Model size	Trade-off	Good for
Small	Fast, light on memory, lower quality	Modest laptops — daily chat, drafts, quick edits
Medium	Better reasoning, needs more memory	Capable laptop / desktop — analysis, longer context
Large	Best quality, heavy on resources	Workstation / GPU — near-frontier quality, slow on weak hardware

Start with a model the Hub recommends for your machine, from families such as Llama, Gemma, Qwen, or GPT-oss. If replies are too slow, switch to a smaller model; if answers fall short and you have memory to spare, try a larger one. Local models need a capable machine, and modest hardware limits the quality you can reach.
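A rough rule of thumb for why the larger rows need more memory: the weights alone take about parameter count × bits per weight ÷ 8 bytes. The sketch below is not from Jan; it assumes 4-bit quantised weights and a hypothetical 20% overhead for the context cache and runtime buffers, so treat its numbers as ballpark figures only.

```python
def estimate_memory_gb(params_billion: float, bits_per_weight: float = 4.0,
                       overhead: float = 1.2) -> float:
    """Rough RAM/VRAM needed to run a model, in GB (1 GB = 10**9 bytes).

    bits_per_weight: assumed quantisation level (4-bit is a common choice).
    overhead: assumed multiplier for context cache and runtime buffers.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


if __name__ == "__main__":
    # Illustrative parameter counts for a small, medium and large model.
    for size in (3, 8, 70):
        print(f"{size}B @ 4-bit: ~{estimate_memory_gb(size):.1f} GB")
```

Under those assumptions a 3B model needs about 1.8 GB, an 8B model about 4.8 GB, and a 70B model about 42 GB. The Hub's own recommendation for your machine remains the better guide.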

Quick actions in the app

Hub

Download a model

Open the model Hub → pick an open-weight model → click to download. The file is stored locally on your machine.

Chat

Chat with a local model

Start a new thread → select a downloaded model → type your prompt. No account, no internet.

Chat

Work offline

Turn Wi-Fi off and keep chatting — a local model needs no connection. Sensitive text never leaves the machine.

Settings

Add a cloud provider

Paste an OpenAI / Anthropic / Mistral / Groq API key → choose that provider's model per thread when you want cloud capability. Messages in that thread are sent to the provider.

Settings

Enable the local API server

Turn on the local server so other apps can call your model at http://localhost:1337.
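Once the server is switched on, a quick way to confirm that something is listening on the default port is a plain TCP connection check. This is a minimal sketch using only Python's standard library; `server_is_listening` is a hypothetical helper name, and a successful connection does not prove that the listener is Jan or that a model is loaded.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 1337,
                        timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This only shows that a process is listening on the port; it does not
    confirm that the process is Jan or that a model is loaded.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable host, or timeout
        return False


if __name__ == "__main__":
    if server_is_listening():
        print("Something is listening on localhost:1337")
    else:
        print("Nothing on localhost:1337; enable the local API server in Jan")
```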

Settings

Assistants & MCP

Save a custom Assistant (model + instruction), group threads into Projects, and connect external tools via MCP.

Local API — base URL

Server endpoint

http://localhost:1337

OpenAI-compatible path

http://localhost:1337/v1

Jan exposes an OpenAI-compatible local server, so any tool or SDK that accepts a custom base_url can use it, including the Python openai package and editor extensions for private coding assistance. If you have set an API key in Jan's server settings, send that key; otherwise the SDKs still expect a value, and any non-empty string will do.
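If you would rather not install the openai package, the same OpenAI-style request can be sent with Python's standard library. This sketch assumes the default base URL above; `your-model-id` is a placeholder for the model ID shown in Jan, and the optional `system` argument is a hypothetical way to attach a standing instruction (similar in spirit to an Assistant) as a system message, not a description of how Jan stores Assistants.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:1337/v1"  # Jan's default local server address


def build_chat_payload(model: str, prompt: str, system: Optional[str] = None) -> dict:
    """Build an OpenAI-style /chat/completions request body."""
    messages = []
    if system:
        # Optional standing instruction, sent first as a system message.
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return {"model": model, "messages": messages}


def extract_reply(response: dict) -> str:
    """Return the assistant text from a chat completions response."""
    return response["choices"][0]["message"]["content"]


def chat(prompt: str, model: str = "your-model-id", api_key: str = "local",
         system: Optional[str] = None) -> str:
    """POST one prompt to the local server and return the reply text."""
    body = json.dumps(build_chat_payload(model, prompt, system)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return extract_reply(json.load(resp))


if __name__ == "__main__":
    try:
        print(chat("Explain the central dogma of molecular biology in two sentences."))
    except OSError as err:  # URLError and timeouts are both OSError subclasses
        print(f"Request failed ({err}); is Jan's local API server enabled?")
```

Swap your-model-id for a real ID; keeping payload building separate from the network call makes the request shape easy to inspect before anything is sent.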

curl test

curl http://localhost:1337/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer local' \
  -d '{
    "model": "your-model-id",
    "messages": [
      {"role": "user", "content": "Explain the central dogma of molecular biology in two sentences."}
    ]
  }'

Make sure the local API server is enabled in Jan's settings and a model is loaded. Replace your-model-id with the model ID shown in Jan, and local with your key if you set one in the server settings. The choices[0].message.content field of the JSON response holds the text.

Python (openai SDK)

# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1337/v1",
    api_key="local"          # any non-empty string, or the key set in Jan's server settings
)

response = client.chat.completions.create(
    model="local-model",     # use the model name shown in Jan
    messages=[
        {"role": "user", "content": "Summarise RNA-seq analysis in 3 bullet points."}
    ]
)
print(response.choices[0].message.content)
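To find the exact string to pass as `model`, you can ask the server which models it offers. This sketch assumes Jan serves the standard OpenAI `GET /v1/models` route, which returns an object whose `data` list holds entries with an `id`; if your build behaves differently, copy the name from Jan's UI instead. With the openai package installed, `client.models.list()` makes the same call.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://localhost:1337/v1"  # Jan's default local server address


def model_ids(models_response: dict) -> List[str]:
    """Extract model IDs from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def list_local_models(api_key: str = "local") -> List[str]:
    """Fetch the model list from the local server (assumed /v1/models route)."""
    request = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        for model_id in list_local_models():
            print(model_id)
    except OSError as err:  # server off, refused, or timed out
        print(f"Could not reach the local server: {err}")
```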

Local vs cloud — quick decision

Situation	Use
Sensitive or unpublished data (patient records, pre-submission results)	Local model in Jan
Offline work (plane, field, no-internet lab)	Local model in Jan
Private coding assistance in your editor	Jan local API `localhost:1337`
A reusable, specialised persona for a recurring task	Custom Assistant
Complex reasoning, very long documents, frontier capability	Cloud model per thread (your key)
Hardware is a constraint (old laptop, no GPU)	Smaller local model, or cloud per thread
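The table can also be read as a small decision function. This is an illustrative sketch, not part of Jan: the flag names are hypothetical, and it assumes that privacy and offline constraints override everything else (so sensitive data stays local even when a task would benefit from a frontier model), with a local model as the default when no row applies.

```python
def choose_backend(sensitive: bool = False, offline: bool = False,
                   needs_frontier: bool = False, weak_hardware: bool = False) -> str:
    """Encode the local-vs-cloud table as a function (illustrative only)."""
    if sensitive or offline:
        # Cloud is ruled out; weak hardware just means a smaller local model.
        return "smaller local model in Jan" if weak_hardware else "local model in Jan"
    if needs_frontier:
        return "cloud model per thread (your key)"
    if weak_hardware:
        return "smaller local model, or cloud per thread"
    # Assumed default: stay local, in line with Jan's offline-first design.
    return "local model in Jan"
```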

Note: Jan is offline-first and not built to run as a shared, always-on server, and its curated Hub is smaller than a full Hugging Face browser. For a shared lab endpoint or the widest model selection, pair it with a tool built for that.