aipackage/README.md

141 lines
5.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Nexus One AI — Installer
## Quick Start
```bash
git clone <cgit-url>
cd cgit
sudo bash install.sh
```
Server reboots automatically after NVIDIA drivers install. Phase 2 runs on its own after reboot.
On the custom ISO, Ubuntu autoinstall now pauses on the installer network screen so the operator can choose the final IP address from the VM console before installation continues.
## Software-Only / Existing Hardware
Run a feasibility scan before quoting or installing on customer-owned hardware:
```bash
bash scripts/cezen-feasibility.sh
```
The checker reports CPU, RAM, disk, NVIDIA GPU/VRAM, tool readiness, available features, and a recommended Cezen profile. It writes JSON to `/opt/cezen/feasibility.json` when possible, otherwise `./feasibility.json`.
Install on existing hardware without the appliance NVIDIA phase:
```bash
sudo bash install.sh --software-only --profile=auto
```
For small systems or slow customer networks, the installer skips default model downloads on lightweight profiles. To force the same behavior manually:
```bash
sudo bash install.sh --software-only --profile=cpu-ai --skip-model-pull
```
Profiles:
| Profile | Use When | Installs |
|---|---|---|
| `core` | no GPU / low RAM | portal, backend, nginx, health/metrics API |
| `cpu-ai` | 32 GB+ RAM, no usable GPU | core + Chroma/Ollama CPU path, model pull optional |
| `gpu-starter` | 24-32 GB VRAM | local AI starter stack, model pull optional |
| `gpu-standard` | 48-96 GB VRAM | standard GPU stack |
| `gpu-pro` | multi/high-VRAM GPU | advanced GPU stack |
| `gpu-max` | multi-node or HGX-class | full stack, custom sizing |
## Sellable v1 Admin APIs
The backend exposes the first productization APIs for software-only and appliance deployments:
| API | Purpose |
|---|---|
| `GET /api/license` | Shows current tier, feature matrix, and whether the tier is locked by Cezen. |
| `GET /api/system/feasibility` | Returns the generated hardware feasibility report or live fallback. |
| `GET /api/system/readiness-report` | Combines license, feasibility, and install readiness into a customer-facing report payload. |
| `GET /api/audit/report?days=7` | Basic audit summary for handover and admin review. |
| `GET /api/system/backups` | Lists local backups. |
| `POST /api/system/backups` | Creates a local backup of Cezen data. |
| `POST /api/system/backups/{name}/restore` | Restores a named local backup and creates a pre-restore safety snapshot. |
CLI backup helper:
```bash
sudo bash scripts/cezen-backup.sh backup
sudo bash scripts/cezen-backup.sh list
sudo bash scripts/cezen-backup.sh restore /opt/cezen/backups/cezen-backup-YYYYmmdd-HHMMSS.zip
```
## What Gets Installed (Entry Tier)
| Service | Port | Notes |
|---|---|---|
| Ollama | 11434 | LLM inference, 2 models pre-loaded |
| Open WebUI | 3001 | Chat interface |
| vLLM | 8000 | OpenAI-compatible API (start manually) |
| JupyterLab | 8888 | Token: `cezen2024` |
| ChromaDB | 8100 | Vector DB for RAG |
| MLflow | 5000 | Experiment tracking |
| MinIO | 9001 | Object storage (user: cezenadmin / Cezen@2024!) |
| Grafana | 3000 | GPU + system monitoring (admin / cezen2024) |
## Testing Without a GPU (Multipass)
```bash
# On your MacBook:
multipass launch 22.04 --name cezen-test --cpus 4 --mem 8G --disk 40G
multipass shell cezen-test
# Inside the VM:
git clone <cgit-url>
sudo bash install.sh
```
NVIDIA driver install will succeed but `nvidia-smi` won't show GPUs — that's expected. All other services will run fine.
## Pull More Models
```bash
bash models/pull-models.sh --tier=starter # phi3:mini + embeddings
bash models/pull-models.sh --tier=basic # llama3.1:8b, mistral:7b, codellama
bash models/pull-models.sh --tier=pro # + llama3.1:70b, mixtral, deepseek-coder
bash models/pull-models.sh --tier=max # + llama3.1:405b, mixtral:8x22b
```
## File Structure
```
cgit/
├── install.sh ← Entry point
├── ansible/
│ ├── phase1_nvidia.yml ← Phase 1: drivers (triggers reboot)
│ ├── starter.yml ← Phase 2: Starter tier (1 GPU, small team)
│ ├── entry.yml ← Phase 2: Basic tier (12 GPU, department)
│ ├── pro.yml ← Phase 2: Pro tier (2+ GPU, multi-team)
│ ├── max.yml ← Phase 2: Max tier (48 GPU, enterprise)
│ └── roles/
│ ├── base/ ← OS, Python, Miniconda, LangChain
│ ├── nvidia/ ← Drivers, CUDA 12.4, cuDNN 9
│ ├── docker/ ← Docker CE + NVIDIA Container Toolkit
│ ├── k3s/ ← Lightweight Kubernetes
│ ├── ollama/ ← Ollama + Open WebUI
│ ├── vllm/ ← vLLM inference server
│ ├── jupyterlab/ ← JupyterLab notebooks
│ ├── chromadb/ ← Vector database
│ ├── mlflow/ ← Experiment tracking
│ ├── minio/ ← Object storage
│ └── monitoring/ ← Grafana + Prometheus + DCGM
└── models/
└── pull-models.sh ← Pull additional models
```
## Change Default Passwords
Before shipping to a customer, update these:
- JupyterLab token: `/opt/cezen/.jupyter/jupyter_lab_config.py`
- MinIO: `/etc/default/minio`
- Grafana: environment vars in monitoring role, or via UI after first login
- MLflow: no auth by default (add reverse proxy if needed)