Model Library — Nexus One AI Portal

💡 To use a model, open Open WebUI and select it from the model dropdown at the top of the chat. You can switch models at any time without losing your conversation.

Chat & Reasoning Models

Model                                  Best for                                                      Speed      VRAM     Context
llama3.1:8b (Recommended, start here)  General Q&A, document chat, writing, summarising              Very Fast  ~5 GB    128K tokens
llama3.2:3b (Lightweight)              Quick lookups, simple questions, high-concurrency use         Fastest    ~2 GB    128K tokens
mistral:7b (Reasoning)                 Structured tasks, code, logical reasoning, JSON extraction    Very Fast  ~5 GB    32K tokens
llama3.1:70b (High accuracy)           Complex reasoning, legal/technical analysis, nuanced writing  Slower     ~42 GB   128K tokens
gemma2:9b (Efficient)                  Instruction following, structured outputs, multilingual       Very Fast  ~6 GB    8K tokens
phi3:mini (Microsoft)                  Simple tasks, drafting, lightweight deployments               Fastest    ~2.3 GB  128K tokens
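
The VRAM figures listed above can be used for a rough capacity check. The sketch below is illustrative only: the function names and the 1 GB headroom default are assumptions, not part of the portal, and real memory use also depends on context length and the number of concurrent requests.

```python
# Approximate VRAM figures (GB) copied from the Chat & Reasoning Models list.
MODEL_VRAM_GB = {
    "llama3.1:8b": 5.0,
    "llama3.2:3b": 2.0,
    "mistral:7b": 5.0,
    "llama3.1:70b": 42.0,
    "gemma2:9b": 6.0,
    "phi3:mini": 2.3,
}


def models_that_fit(vram_budget_gb, headroom_gb=1.0):
    """Return the models whose listed VRAM plus headroom fits the budget.

    headroom_gb is an assumed safety margin, not a measured value.
    """
    return sorted(
        name
        for name, need_gb in MODEL_VRAM_GB.items()
        if need_gb + headroom_gb <= vram_budget_gb
    )


def total_vram_gb(model_names):
    """Sum the listed VRAM for models that would be loaded at the same time."""
    return sum(MODEL_VRAM_GB[name] for name in model_names)


if __name__ == "__main__":
    print(models_that_fit(8.0))
    print(total_vram_gb(["llama3.1:8b", "mistral:7b"]))
```

Treat the result as a first filter rather than a guarantee; a model that fits on paper can still run out of memory with very long prompts.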

Embedding Models

Embedding models don't generate text — they convert documents and queries into numbers (vectors) that ChromaDB uses for semantic search. You don't select these in chat; they run automatically when you upload documents.

Model                                Best for                                                   Dimensions  VRAM
nomic-embed-text (Default for RAG)   General document search and Q&A                            768         ~270 MB
mxbai-embed-large (Higher accuracy)  Better retrieval accuracy for complex technical documents  1024        ~670 MB
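
To make the idea of searching by vectors concrete, here is a minimal sketch of similarity ranking. It uses made-up 3-dimensional vectors instead of real 768- or 1024-dimensional embeddings, and cosine similarity as one common way to compare vectors; the document names and numbers are hypothetical, and this is not a description of how ChromaDB is configured on this system.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors invented for illustration; a real embedding model produces
# hundreds of dimensions per document chunk.
DOCS = {
    "leave policy": [0.9, 0.1, 0.0],
    "server setup guide": [0.1, 0.8, 0.3],
    "expense claims": [0.7, 0.0, 0.4],
}


def search(query_vec, docs, top_k=2):
    """Return the names of the top_k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in docs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


if __name__ == "__main__":
    # A query vector that points in roughly the same direction as "leave policy".
    print(search([1.0, 0.0, 0.1], DOCS))
```

The ranking depends only on the direction of the vectors, which is why two texts with similar meaning can match even when they share few exact words.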

Which model should I use?

💬 Everyday chat and Q&A: llama3.1:8b
Fast enough to feel instant, capable enough for most office tasks.

📂 Chat with documents: llama3.1:8b
Its large 128K context window lets it read long documents in one go.

⚡ High concurrent users: llama3.2:3b
Its very low VRAM footprint means more users can be served simultaneously.

🧠 Complex analysis: llama3.1:70b
Significantly more accurate for nuanced reasoning, legal, and technical work, but slower.

💻 Code generation: mistral:7b
Strong at structured outputs, JSON, SQL, Python, and logical step-by-step tasks.

🌐 Multilingual tasks: gemma2:9b
Good multilingual coverage for Indian regional languages and English.

💡 Context window explained: The context window is how much text the model can read at once. 128K tokens is roughly 90,000 words, enough for most full documents. If your document is larger than that, split it into sections or use ChromaDB to chunk and search it automatically.
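
The arithmetic in the note above can be sketched as a quick estimate. This is a rough approximation, assuming 128K means 128,000 tokens and using the words-per-token ratio implied by the 90,000-word figure; actual token counts vary by model and language, and the function names and the 4,000-token reserve for the model's answer are hypothetical.

```python
import math

# Ratio implied by "128K tokens is roughly 90,000 words" (about 0.70).
WORDS_PER_TOKEN = 90_000 / 128_000


def estimate_tokens(text):
    """Estimate the token count from a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_in_context(text, context_tokens=128_000, reserve_tokens=4_000):
    """Check whether text fits, leaving an assumed reserve for the reply."""
    return estimate_tokens(text) <= context_tokens - reserve_tokens


def split_into_sections(text, max_tokens):
    """Split text into word-based sections of at most about max_tokens each."""
    max_words = max(1, int(max_tokens * WORDS_PER_TOKEN))
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    document = "word " * 9_000
    # Example: checking against an 8K-token model such as gemma2:9b.
    print(fits_in_context(document, context_tokens=8_000))
    print(len(split_into_sections(document, max_tokens=4_000)))
```

Splitting on word counts ignores paragraph and heading boundaries, so for real documents it is usually better to cut at natural section breaks and use the estimate only to check each piece.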

AI Models on Your System