FAQ — Nexus One AI Portal

The Basics

What's the difference between Ollama and Open WebUI? ›

Ollama is the engine — it runs AI models on the GPU. You never interact with it directly. Open WebUI is the interface — the browser-based chat window that you type into. Open WebUI sends your messages to Ollama, Ollama runs the AI model, and Open WebUI shows you the response. Think of Ollama as the kitchen and Open WebUI as the dining room.

Can I use this system without internet access? ›

Yes — completely. Your Nexus One AI system is designed to run entirely on your local network with no internet connection required. All models are stored on the server's local storage. All queries stay on your network. This is one of the core reasons for choosing on-premise AI over cloud services. If your system is configured in air-gapped mode, it has no internet connection at all by design.

How many people can use it at the same time? ›

The Entry tier (1× RTX Pro 6000, 96 GB VRAM) comfortably handles 5–20 concurrent users on standard inference tasks. "Concurrent" means actively sending and receiving messages at the same moment — not just logged in. If all 20 users send a message simultaneously, responses are queued and returned within seconds. The system degrades gracefully under load — it slows down rather than crashing.

Is my data secure? Can anyone else see my conversations? ›

Your conversations stay on your server — they are never sent to Cezen, NVIDIA, or any external party. Within your organisation, Open WebUI separates conversations by user account, so other users cannot see your chat history. Administrators can access all conversations in the Admin Panel. If you have specific data isolation requirements, speak to your administrator about enabling conversation encryption.

Can the AI access the internet or search the web? ›

No. By design, your Nexus One AI system is isolated from the internet. The AI answers questions based on its training knowledge and any documents you upload — it cannot browse websites, check current news, or look up live information. This is intentional for data security. The trade-off is that the AI's knowledge has a training cutoff date and won't know about very recent events.

Using the AI

What file types can I upload for document Q&A? ›

Open WebUI supports: PDF, DOCX (Word), TXT, Markdown (.md), CSV, and XLSX (Excel). Images, PowerPoint files, and archives (.zip) are not supported for text extraction. Important: PDFs must be text-based (searchable), not scanned images — if you can highlight and copy text from the PDF, it will work. Scanned documents need OCR conversion first.

What happens when the AI doesn't know something? ›

It depends on the model. Well-behaved models will say "I don't know" or "I'm not sure." However, all AI models can hallucinate — generate confident-sounding but incorrect answers. This is a known limitation of the technology. For any factual, legal, regulatory, or financial query, always verify the AI's answer against authoritative sources. Use document upload mode when you need accuracy — the AI is far less likely to hallucinate when it's reading actual text you provided.

Can I use the AI from my mobile phone? ›

Yes — Open WebUI works in mobile browsers (Chrome, Safari). Open http://ai.local:3001 (or the server IP) on your phone while connected to the office Wi-Fi. The interface is responsive and works well on phones and tablets. There is no dedicated mobile app — the browser version is the recommended way to use it on mobile.

Can I give the AI a specific role or personality? ›

Yes — this is called a system prompt. In Open WebUI, go to Workspace → Models and create a custom model. Give it a name (e.g. "HR Policy Assistant"), a system prompt (e.g. "You are an expert in this organisation's HR policies. Answer questions based on the uploaded HR documents. Always cite which policy section your answer is from."), and assign it a knowledge base. This custom model will then appear in the model selector for all users.

How do I create accounts for my team? ›

The system administrator can manage users from Admin Panel → Users in Open WebUI. You can invite users by email, create accounts manually, or enable open registration (where anyone on the network can sign up). Role levels are: Admin (full control), User (chat and upload), and Pending (signed up but waiting for admin approval). Admins can reset passwords and delete accounts from the same panel.

AI Tools & Features

What's the difference between Multimodal Chat and the standard Open WebUI chat? ›

The Multimodal Chat Prompt Studio in the Cezen portal is a purpose-built interface on top of the same AI models. Key differences: it stores your full conversation history in the browser (searchable across all past chats), supports image uploads alongside text, lets you switch the AI model mid-conversation without starting over, renders formatted responses (headers, tables, code blocks) properly, and lets you pin important conversations. Use the Cezen chat for anything you want to keep and refer back to; use Open WebUI for quick one-off queries.

How does the Meeting Assistant work — and is it recording? ›

The Meeting Assistant uses the browser's built-in microphone API to capture audio locally — nothing is streamed to any external server. You click Record, the audio is processed on-premises through a speech-to-text model (Whisper), and the transcript is then passed to an LLM for summarisation, action item extraction, and draft follow-up emails. You can also upload a pre-recorded audio file (.mp3, .wav, .m4a) if you don't want to record live. The audio and transcript never leave your network.

What's an AI Agent and when should I use one? ›

An AI Agent is an AI that can take a multi-step goal and carry it out autonomously — calling tools, reading files, making decisions, and looping until complete — without you guiding every step. Regular chat is a back-and-forth dialogue; an agent is more like delegating a task. Use agents for: processing batches of documents, monitoring a folder and acting on new files, generating reports from multiple data sources, or any workflow that currently takes you several manual steps. Agents on Cezen run with governance controls — actions above a certain risk threshold are held for human approval before executing.

What can Workflow Automation do that an Agent can't? ›

Workflows are deterministic pipelines — you define a fixed sequence of steps (trigger → process → output) that runs the same way every time. Agents are more flexible — they plan their own steps based on the goal you give them. Use a workflow when the process is well-defined and repeatable (e.g., "every night at 11pm, summarise today's audit log and email it to the admin"). Use an agent when the task requires judgment and adaptation (e.g., "review these 50 contracts and flag any that contain unusual termination clauses"). In practice, you'll often use both — a scheduled workflow that kicks off an agent as one of its steps.

What are Secure Chat Rooms and how are they different from regular chat? ›

Chat Rooms are team spaces where multiple users can collaborate together with the AI in a shared conversation. Regular chat is 1-on-1 between you and the AI; a chat room brings your whole team into the same AI-assisted thread. Useful for: project teams that need a shared AI workspace, department-level Q&A rooms where the AI has access to relevant knowledge bases, or support desks where staff can collectively ask and search AI responses. All rooms are private to invited members — no messages leave your server.

Can I upload images and ask questions about them? ›

Yes, if a vision-capable model is installed. In Multimodal Chat Prompt Studio, click the image icon or drag a photo into the chat input — you can then ask questions about what's in the image ("describe this diagram", "what does this chart show?", "extract the text from this screenshot"). The model that handles vision tasks on your system is typically LLaVA 13B — switch to it using the model switcher in the chat toolbar before uploading an image. JPG, PNG, and WEBP are supported.

Automation & Scheduled Jobs

How do I set up a task that runs automatically every day? ›

Use Scheduled Jobs. Create a new job, write the prompt or task the AI should perform, and set a cron schedule (e.g., 0 7 * * * for 7am daily). The job runs on the server automatically — no one needs to be logged in. Common examples: daily log summaries, nightly document ingestion, weekly usage reports, morning briefings. Results are stored in the job's run history and can optionally be emailed or written to a file via an output connector.

How do I connect Nexus One AI to our existing database or file server? ›

Use Connectors. A connector is a configured bridge between Nexus One AI and an external system — your file share, SQL database, REST API, email server, or SharePoint. Once a connector is set up by an admin, agents and workflows can use it as a data source or output destination. For example: an agent with a SQL connector can query your database in real time; a workflow with a SharePoint connector can save generated documents directly to the correct library. Connector credentials are stored encrypted and only accessible to the Cezen backend.

Technical Questions

How do I add more AI models to the system? ›

If your system has internet access: SSH into the server and run ollama pull [model-name]. For example, ollama pull mistral:7b. The model will download and appear in Open WebUI immediately after. If your system is air-gapped, models must be transferred via a secure USB drive provided by your administrator — contact Cezen support for the process. See the model library for what's available.

Can I connect this AI system to our existing software? ›

Yes — Ollama exposes a REST API at http://[server-ip]:11434 that is compatible with the OpenAI API format. Any application that can make an HTTP request can call the AI. The API supports the same /v1/chat/completions endpoint format as OpenAI, so many existing integrations work by just changing the base URL. For custom integrations, see the FastAPI page.

What's the difference between RAG and fine-tuning? ›

RAG (Retrieval Augmented Generation) — you upload documents and the AI searches them at query time to find relevant content, which it then reads and answers from. Fast to set up, no training required, easy to update your documents. Best for Q&A over changing documents.

Fine-tuning — you train the model on your data so it learns your terminology, style, and patterns permanently into its weights. Takes hours to set up, requires training data, but makes the model natively better at your specific domain. Best for consistent style, format, or domain-specific reasoning tasks. See the glossary for more detail.

How do guardrails work — can the AI be made to refuse certain topics? ›

Yes. Guardrails sit between the user and the model and inspect every query before it reaches the AI, and every response before it's returned. You can configure: keyword blocks (queries containing specific words are blocked outright), regex rules (e.g., block any query containing a 16-digit number that looks like a card number), PII redaction (automatically strip email addresses, phone numbers, ID numbers from responses before they're shown), and prompt injection detection (catch attempts to override the system prompt). Rules can have actions: block (query is rejected), redact (sensitive data is removed), or warn (query proceeds but is flagged in the audit log). Guardrails are managed by admins at Admin → Tools → Guardrails.

My document uploads don't seem to be returning relevant answers. What's wrong? ›

This is usually a RAG quality issue. Check the RAG Quality Dashboard for your knowledge base collection — it shows chunk quality scores, retrieval accuracy, and which documents are performing poorly. Common causes: PDFs that are scanned images (not text-based) will produce empty chunks; very short documents may not chunk well; documents in unusual formats or with complex layouts lose structure during extraction. Fixes: re-upload PDFs as text-based versions, adjust chunk size in collection settings, or split very large documents into smaller files. The RAG Quality page has a "Test Query" tool that shows exactly which chunks are being retrieved for any given question.

Which model should I use for which task? ›

The Model Router can handle this automatically — configure rules and the system routes each query to the right model. As a manual guide: llama3.1:8b — best general default, fast, capable; llama3.1:70b — complex reasoning, long documents, nuanced analysis (slower); codellama:34b — code generation, debugging, technical review; llava:13b — image understanding (vision queries); mistral:7b — structured output, multilingual tasks; llama3.2:3b — very fast simple tasks where quality matters less than speed. See the full model library for detailed comparisons and use-case recommendations.

How often does the system need maintenance? ›

Nexus One AI systems are designed to run continuously with minimal maintenance. Services are configured to start automatically on boot. Routine tasks include: periodic software updates (delivered by Cezen via USB for air-gapped systems), disk space monitoring (models and documents consume storage over time), and GPU health checks. Cezen provides a maintenance schedule and support SLA as part of your package. Contact support@cezentech.com for your specific maintenance plan.

Frequently Asked Questions