In plain terms
nvtop is like htop for GPUs. Run it in a terminal and you get a live, colour-coded dashboard showing GPU utilisation, memory usage, temperature, power draw, and the processes currently using each GPU, all updating in real time. It's the fastest way to answer "is the GPU working?" and "what's using memory right now?"
What you can see
- GPU compute utilisation (% busy)
- Video memory used vs. free
- GPU temperature in real time
- Power consumption (watts)
- List of processes using each GPU and their memory
- Scrolling history graphs of utilisation
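For scripts or logs, the same readings can be captured once without the interactive dashboard. The sketch below is an illustration under stated assumptions, not part of nvtop: it assumes NVIDIA GPUs with the nvidia-smi tool installed, and the helper names parse_gpu_line and read_gpus are hypothetical.

```python
import subprocess

# Fields requested from nvidia-smi (assumption: NVIDIA GPUs; nvtop is not scripted here).
FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total",
          "temperature.gpu", "power.draw"]


def _to_float(text):
    """Return a float, or None when the tool reports a non-numeric value such as [N/A]."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_gpu_line(line):
    """Turn one CSV line of nvidia-smi output into a dict of readings."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    index, util, mem_used, mem_total, temp, power = values
    return {
        "gpu": int(index),
        "utilisation_pct": _to_float(util),
        "memory_used_mib": _to_float(mem_used),
        "memory_total_mib": _to_float(mem_total),
        "temperature_c": _to_float(temp),
        "power_w": _to_float(power),
    }


def read_gpus():
    """Query every GPU once and return a list of reading dicts."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

Calling read_gpus() on the server returns one dict per GPU. parse_gpu_line can also be tried on a sample line on a machine with no GPU.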
SSH into the server, then run the nvtop command.
GPU utilisation bar
The top bar per GPU shows compute utilisation as a percentage. Green = in use. A bar at 0% means the GPU is idle: no model is actively processing a request. A bar above 50% means queries are being processed.
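The reading rules for the utilisation bar can be written as a small function. This is a sketch, not nvtop behaviour: the function name describe_utilisation is hypothetical, and the label for readings between 0% and 50% is an assumption, since the text above only covers 0% and above 50%.

```python
def describe_utilisation(percent):
    """Map a GPU utilisation percentage to the plain-language reading above."""
    if not 0 <= percent <= 100:
        raise ValueError("utilisation must be between 0 and 100")
    if percent == 0:
        return "idle: no model is actively processing a request"
    if percent > 50:
        return "busy: queries are being processed"
    # Assumption: the guide does not describe this range.
    return "light or intermittent load"
```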
Memory bar
Shows VRAM usage. A full or mostly full bar is normal: it means a model is loaded and ready. An empty bar means no model is loaded; an unexpectedly full bar with no model could indicate a stuck process.
Process list
The lower half shows which processes are using each GPU. You'll typically see ollama listed with its memory allocation. If you see an unexpected process using large amounts of VRAM, contact your administrator.
Temperature
Shown per GPU. Below 80°C under load is normal. A reading that stays above 85°C is worth flagging to your administrator so they can check cooling and airflow.
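Because a single spike matters less than a sustained reading, the "consistently over 85" rule can be checked over a window of recent samples. This is a minimal sketch, assuming temperatures are sampled at a regular interval. The function name, the window size of 5, and the "watch" label for readings between the two thresholds are illustrative assumptions.

```python
NORMAL_MAX_C = 80   # below this under load is normal (from the guide)
FLAG_ABOVE_C = 85   # consistently above this is worth flagging (from the guide)


def assess_temperature(samples_c, window=5):
    """Classify recent temperature samples (oldest first) for one GPU."""
    if not samples_c:
        raise ValueError("need at least one sample")
    recent = samples_c[-window:]
    # Flag only when the window is full and every sample in it is hot.
    if len(recent) == window and all(t > FLAG_ABOVE_C for t in recent):
        return "flag"
    if recent[-1] < NORMAL_MAX_C:
        return "normal"
    # Assumption: anything else is treated as "keep an eye on it".
    return "watch"
```

A brief spike followed by cooler readings never reaches "flag", which matches the intent of reporting only sustained high temperatures.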