Nexus One AI ๐Ÿ”” Basic Tier
๐Ÿฆ™
โ† All Tools

Ollama

The engine that runs AI models on your system. Everything else talks to Ollama.
View API Status โ†— Installed Models โ†— ๐Ÿ”Œ http://ai.local:11434
What is it?

In plain terms

Ollama is the software that loads AI models onto your GPU and serves them. Think of it as the engine room โ€” you don't interact with it directly day-to-day, but everything else (Open WebUI, LangChain, your applications) sends requests to Ollama to get AI responses.

What it handles

  • Loading and unloading AI models from GPU memory
  • Processing requests and returning responses
  • Downloading new models from the library
  • Managing multiple models on the same system
How to access it

Access Points

REST API http://localhost:11434
Terminal ollama [command]
Status systemctl status ollama
๐Ÿ’ก Most users never need to open Ollama directly โ€” Open WebUI gives you a friendly chat interface that talks to Ollama automatically.
Common tasks
1

See which models are installed

Open a terminal and run ollama list. You'll see all downloaded models, their size, and when they were last used.

2

Download a new model

Run ollama pull llama3.1:8b to download a model. Replace llama3.1:8b with any model name from the Ollama library. Note: your system may be air-gapped โ€” check with your administrator before attempting.

3

Chat with a model in the terminal

Run ollama run llama3.1:8b to start a direct chat session in your terminal. Type /bye to exit.

4

Check what's currently loaded

Run ollama ps to see which models are currently loaded into GPU memory and using VRAM.

5

Remove a model to free up space

Run ollama rm model-name to delete a model from disk and free storage space.

Works with
๐Ÿ’ฌ Open WebUI ๐Ÿ”— LangChain ๐Ÿ““ Jupyter โšก FastAPI