Open-source (Apache-2.0) · offline-first · 100% local · OpenAI-compatible API · optional cloud per thread
| Model size | Trade-off | Good for |
|---|---|---|
| Small | Fast, light on memory, lower quality | Modest laptops — daily chat, drafts, quick edits |
| Medium | Better reasoning, needs more memory | Capable laptop / desktop — analysis, longer context |
| Large | Best quality, heavy on resources | Workstation / GPU — near-frontier quality, slow on weak hardware |
Start with the smallest model the Hub recommends for your machine, from a family such as Llama, Gemma, Qwen, or GPT-oss. If replies are too slow, switch to a smaller model. Local models need a capable machine, and weaker hardware limits you to smaller, lower-quality models.
Server endpoint
http://localhost:1337
OpenAI-compatible path
http://localhost:1337/v1
Jan exposes an OpenAI-compatible local server. Any tool or SDK that accepts a custom base_url can use it — including the Python openai package and editor extensions for private coding assistance. The API key field can be any non-empty string; local servers ignore it.

    curl http://localhost:1337/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{
        "messages": [
          {"role": "user", "content": "Explain the central dogma of molecular biology in two sentences."}
        ]
      }'

Make sure the local API server is enabled in Jan's settings and a model is loaded. Set the model field to the model name shown in Jan if your build requires it. The choices[0].message.content field of the JSON response holds the text.
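For readers who prefer to avoid extra dependencies, the request and the response lookup described above can be sketched with the Python standard library alone. This is a minimal sketch, not Jan's own client: the helper names `build_payload`, `extract_text`, and `ask` are hypothetical, the `sample` dictionary only illustrates the OpenAI-style response shape, and the 120-second timeout is an arbitrary assumption.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1337/v1"  # OpenAI-compatible path from this page


def build_payload(prompt, model=None):
    """Build a chat-completions request body; include "model" only when given."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    if model is not None:
        payload["model"] = model
    return payload


def extract_text(response):
    """Return choices[0].message.content from a parsed JSON response."""
    return response["choices"][0]["message"]["content"]


def ask(prompt, model=None, timeout=120):
    """POST the prompt to the local server and return the reply text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_text(json.load(reply))


# Illustrative response shape only, not real output from Jan.
sample = {"choices": [{"message": {"role": "assistant", "content": "Hello."}}]}
print(extract_text(sample))
```

With the local API server enabled and a model loaded, `ask("Explain the central dogma of molecular biology in two sentences.")` should return the reply text. Pass `model=` with the name shown in Jan if your build requires it.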

    # pip install openai
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:1337/v1",
        api_key="local",  # any value; local servers ignore it
    )

    response = client.chat.completions.create(
        model="local-model",  # use the model name shown in Jan
        messages=[
            {"role": "user", "content": "Summarise RNA-seq analysis in 3 bullet points."}
        ],
    )
    print(response.choices[0].message.content)

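If you are unsure which model name to pass, one option is to ask the server. The sketch below assumes Jan's local server implements the OpenAI-style `GET /v1/models` route and returns the usual `{"data": [{"id": ...}]}` listing, which this page does not confirm. The function names and the example IDs are hypothetical.

```python
import json
import urllib.request


def model_ids(listing):
    """Pull the "id" of each entry from an OpenAI-style model listing."""
    return [entry["id"] for entry in listing.get("data", [])]


def fetch_model_ids(base_url="http://localhost:1337/v1", timeout=10):
    """GET {base_url}/models; assumes the server implements this route."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as reply:
        return model_ids(json.load(reply))


# Hypothetical listing for illustration; real IDs come from your Jan install.
example = {
    "object": "list",
    "data": [{"id": "example-model-a"}, {"id": "example-model-b"}],
}
print(model_ids(example))
```

If the route is available, `fetch_model_ids()` returns the names you can pass as `model=`. If it is not, use the name shown in Jan's interface.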
| Situation | Use |
|---|---|
| Sensitive or unpublished data (patient records, pre-submission results) | Local model in Jan |
| Offline work (plane, field, no-internet lab) | Local model in Jan |
| Private coding assistance in your editor | Jan local API (http://localhost:1337/v1) |
| A reusable, specialised persona for a recurring task | Custom Assistant |
| Complex reasoning, very long documents, frontier capability | Cloud model per thread (your key) |
| Hardware is a constraint (old laptop, no GPU) | Smaller local model, or cloud per thread |
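The table above can also be read as a small set of routing rules. The sketch below is one possible encoding, not part of Jan. The `Task` fields and the `choose_backend` function are hypothetical names, and the rule order (privacy and offline needs first, then capability, then hardware) is an assumption the table itself does not state.

```python
from dataclasses import dataclass


@dataclass
class Task:
    sensitive: bool = False       # patient records, pre-submission results
    offline: bool = False         # plane, field, no-internet lab
    needs_frontier: bool = False  # complex reasoning, very long documents
    weak_hardware: bool = False   # old laptop, no GPU


def choose_backend(task):
    """Map a task to a row of the table; the rule order is an assumption."""
    if task.sensitive or task.offline:
        # Data must stay on the machine, so stay local and go smaller if needed.
        return "smaller local model" if task.weak_hardware else "local model"
    if task.needs_frontier:
        return "cloud model per thread"
    if task.weak_hardware:
        return "smaller local model or cloud per thread"
    # Default to local, in keeping with the offline-first design.
    return "local model"


print(choose_backend(Task(sensitive=True)))
print(choose_backend(Task(needs_frontier=True)))
```

Putting the privacy check first means a sensitive task is never routed to the cloud, even when it would benefit from a frontier model. Reorder the rules if your priorities differ.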
Note: Jan is offline-first and not built to run as a shared, always-on server, and its curated Hub is smaller than a full Hugging Face browser. For a shared lab endpoint or the widest model selection, pair it with a tool built for that.