Ollama — Nexus One AI Portal

What is it?

In plain terms

Ollama is the software that loads AI models onto your GPU and serves them. Think of it as the engine room — you don't interact with it directly day-to-day, but everything else (Open WebUI, LangChain, your applications) sends requests to Ollama to get AI responses.

What it handles

Loading and unloading AI models from GPU memory
Processing requests and returning responses
Downloading new models from the library
Managing multiple models on the same system

How to access it

Access Points

REST API http://localhost:11434

Terminal ollama [command]

Status systemctl status ollama

💡 Most users never need to open Ollama directly — Open WebUI gives you a friendly chat interface that talks to Ollama automatically.

Common tasks

See which models are installed

Open a terminal and run ollama list. You'll see all downloaded models, their size, and when they were last used.

Download a new model

Run ollama pull llama3.1:8b to download a model. Replace llama3.1:8b with any model name from the Ollama library. Note: your system may be air-gapped — check with your administrator before attempting.

Chat with a model in the terminal

Run ollama run llama3.1:8b to start a direct chat session in your terminal. Type /bye to exit.

Check what's currently loaded

Run ollama ps to see which models are currently loaded into GPU memory and using VRAM.

Remove a model to free up space

Run ollama rm model-name to delete a model from disk and free storage space.

Works with

💬 Open WebUI 🔗 LangChain 📓 Jupyter ⚡ FastAPI