Nexus One AI (Basic Tier)
Model Library

AI Models on Your System

All models run entirely on your server: no internet required, no per-query cost. Choose the right one for your task.

💡 To use a model, open Open WebUI and select it from the model dropdown at the top of the chat. You can switch models at any time without losing your conversation.
Chat & Reasoning Models
| Model | Best for | Speed | VRAM | Context |
|---|---|---|---|---|
| llama3.2:3b (Lightweight) | Quick lookups, simple questions, high-concurrency use | Fastest | ~2 GB | 128K tokens |
| mistral:7b (Reasoning) | Structured tasks, code, logical reasoning, JSON extraction | Very Fast | ~5 GB | 32K tokens |
| llama3.1:70b (High accuracy) | Complex reasoning, legal/technical analysis, nuanced writing | Slower | ~42 GB | 128K tokens |
| gemma2:9b (Efficient) | Instruction following, structured outputs, multilingual | Very Fast | ~6 GB | 8K tokens |
| phi3:mini (Microsoft) | Simple tasks, drafting, lightweight deployments | Fastest | ~2.3 GB | 128K tokens |
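
The figures listed above can be treated as a simple lookup: given the free GPU memory on the server and the context length a task needs, which models qualify? The following is a minimal sketch under stated assumptions. The `MODELS` dictionary and the `models_that_fit` helper are hypothetical names made up for illustration and are not part of Open WebUI; the numbers are the approximate values from this page, with "128K" rounded to 128,000 tokens.

```python
# Approximate figures copied from the model list on this page.
# VRAM is in GB, context is in tokens ("128K" is treated as 128,000 here).
MODELS = {
    "llama3.2:3b": {"vram_gb": 2.0, "context": 128_000},
    "mistral:7b": {"vram_gb": 5.0, "context": 32_000},
    "llama3.1:70b": {"vram_gb": 42.0, "context": 128_000},
    "gemma2:9b": {"vram_gb": 6.0, "context": 8_000},
    "phi3:mini": {"vram_gb": 2.3, "context": 128_000},
}


def models_that_fit(free_vram_gb, needed_context=0):
    """Return the models that fit in the given VRAM and offer at least
    the requested context window, largest VRAM footprint first."""
    fits = [
        name
        for name, spec in MODELS.items()
        if spec["vram_gb"] <= free_vram_gb and spec["context"] >= needed_context
    ]
    return sorted(fits, key=lambda name: MODELS[name]["vram_gb"], reverse=True)


# Hypothetical server with 8 GB of free VRAM.
print(models_that_fit(8))
# Same server, but the task needs a 100,000-token context window.
print(models_that_fit(8, needed_context=100_000))
```

Sorting by footprint is only one possible ordering; a larger model is not always the better choice for a given task, so treat the result as a shortlist rather than a recommendation.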
Embedding Models
Embedding models don't generate text; they convert documents and queries into numbers (vectors) that ChromaDB uses for semantic search. You don't select these in chat; they run automatically when you upload documents.
| Model | Best for | Dimensions | VRAM |
|---|---|---|---|
| mxbai-embed-large (Higher accuracy) | Better retrieval accuracy for complex technical documents | 1024 | ~670 MB |
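
To make the idea of "documents as vectors" concrete, here is a minimal sketch of how a semantic search can rank documents once they have been embedded. Everything in it is an assumption for illustration: the three-dimensional vectors and file names are made up (real mxbai-embed-large vectors have 1024 dimensions), and cosine similarity is only one common way to compare vectors; the measure ChromaDB actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, documents):
    """Return document names ordered from most to least similar to the query."""
    return sorted(
        documents,
        key=lambda name: cosine_similarity(query_vec, documents[name]),
        reverse=True,
    )


# Made-up 3-dimensional "embeddings" standing in for real 1024-dimensional ones.
docs = {
    "leave_policy.txt": [0.9, 0.1, 0.0],
    "server_setup.txt": [0.1, 0.8, 0.3],
    "invoice_template.txt": [0.0, 0.2, 0.9],
}
# Pretend embedding of a question about leave policy.
query = [0.8, 0.2, 0.1]

print(rank(query, docs))
```

The document whose vector points in nearly the same direction as the query vector is ranked first, which is why a search can match on meaning even when the wording differs.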
Which model should I use?
💬 Everyday chat and Q&A: llama3.1:8b
Fast enough to feel instant, capable enough for most office tasks.

📂 Chat with documents: llama3.1:8b
The large 128K context window lets it read long documents in one go.

⚡ High concurrent users: llama3.2:3b
The very low VRAM footprint means more users can be served simultaneously.

🧠 Complex analysis: llama3.1:70b
Significantly more accurate for nuanced reasoning, legal, and technical work. Slower.

💻 Code generation: mistral:7b
Strong at structured outputs, JSON, SQL, Python, and logical step-by-step tasks.

🌐 Multilingual tasks: gemma2:9b
Good multilingual coverage for Indian regional languages and English.

💡 Context window explained: The context window is how much text the model can read at once. 128K tokens is roughly 90,000 words, enough for most full documents. If your document is very large, split it into sections or use ChromaDB to chunk and search it automatically.
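
The rule of thumb above (128K tokens for about 90,000 words) works out to roughly 0.7 words per token, which is enough for a rough check of whether a document fits and for splitting it when it does not. The sketch below is only an estimate under that assumption: `WORDS_PER_TOKEN`, `estimate_tokens`, and `split_into_sections` are hypothetical names for illustration, real token counts depend on the model's tokenizer and the language of the text, and splitting on word counts can cut a sentence in half.

```python
# Ratio implied by "128K tokens is roughly 90,000 words"; real ratios vary
# by tokenizer and language, so treat this as a rough estimate only.
WORDS_PER_TOKEN = 0.7


def estimate_tokens(text):
    """Estimate the token count of a text from its whitespace-separated words."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def split_into_sections(text, max_tokens):
    """Split text into consecutive sections of at most ~max_tokens each."""
    max_words = max(1, round(max_tokens * WORDS_PER_TOKEN))
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# Placeholder document of 1,000 words.
document = "word " * 1000
print(estimate_tokens(document))

# Split for a hypothetical 500-token budget per section.
sections = split_into_sections(document, max_tokens=500)
print(len(sections))
```

In practice, leave headroom below the model's context limit for the question and the answer, since both share the same window as the document text.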