
QLoRA / Unsloth

Fine-tune a large AI model on your own data efficiently, without needing massive GPU memory.
What is fine-tuning?

In plain terms

Fine-tuning is the process of taking a general-purpose AI model (like Llama 3.1) and continuing to train it on your own data so it learns to behave in a way specific to your organisation: using your terminology, following your formats, reflecting your policies. The result is a model that's significantly better at your specific tasks than the base model.

When fine-tuning makes sense

  • You want the AI to answer in your organisation's voice and style
  • The base model doesn't know your industry's terminology
  • You have hundreds of example question-answer pairs from your domain
  • RAG alone isn't giving accurate enough results
  • You want to teach the model a specific task format (for example, extracting fields from forms)
What QLoRA and Unsloth are

QLoRA

Quantised Low-Rank Adaptation: a technique that dramatically reduces the GPU memory needed for fine-tuning. Instead of updating all the weights in a model (which would need roughly 4–8× the memory of the model weights themselves), QLoRA trains only a small set of adapter layers and keeps the rest of the model frozen in 4-bit compressed format. This makes it possible to fine-tune a 7B or 8B model on the RTX Pro 6000.
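A rough back-of-the-envelope estimate illustrates the saving. The sketch below assumes 16-bit weights and gradients plus two 32-bit optimizer states per trained parameter for full fine-tuning, and 4-bit base weights with adapters of about 1% of the model's size for QLoRA. These byte counts and the adapter fraction are illustrative assumptions, not measurements, and activation memory is ignored entirely.

```python
def full_finetune_gb(params_billion: float) -> float:
    """Rough memory (GB) for full fine-tuning, ignoring activations."""
    # Assumed: 2 bytes weights + 2 bytes gradients + 8 bytes optimizer state
    bytes_per_param = 2 + 2 + 8
    return params_billion * bytes_per_param


def qlora_gb(params_billion: float, adapter_fraction: float = 0.01) -> float:
    """Rough memory (GB) for QLoRA, ignoring activations."""
    base = params_billion * 0.5  # 4-bit weights = half a byte per parameter
    adapter_params = params_billion * adapter_fraction
    adapters = adapter_params * (2 + 2 + 8)  # only the adapters are trained
    return base + adapters


print(f"full: {full_finetune_gb(8):.0f} GB, QLoRA: {qlora_gb(8):.1f} GB")
```

Under these assumptions an 8B model needs about 96 GB for full fine-tuning (six times its 16 GB of 16-bit weights) against roughly 5 GB for QLoRA, before activations are counted.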

Unsloth

Unsloth is a Python library that makes QLoRA fine-tuning 2–4× faster and cuts memory use by 40–70% compared with standard QLoRA implementations. It achieves this through hand-optimised GPU kernels. On the Cezen Entry tier with the RTX Pro 6000, Unsloth is the recommended way to fine-tune: jobs that would otherwise take 10 hours can complete in 3–4 hours.

Tip: Fine-tuning is an advanced task. You'll need training data in the right format and some Python knowledge. Start by exploring Open WebUI and RAG with ChromaDB. For most use cases, RAG gives 80% of the benefit of fine-tuning with 10% of the effort.
How to approach a fine-tuning project
1. Prepare your training data

Fine-tuning needs examples in a structured format, typically a JSONL file where each line is one conversation: a prompt and the ideal response. You need at least 50–100 high-quality examples; 500–2000 is better. Quality matters more than quantity.
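As an illustration, the following sketch writes a small JSONL file and checks that every line parses and has both fields. The field names "prompt" and "response", the file name train.jsonl, and the example texts are assumptions made for this sketch; the notebook template you use may expect a different schema.

```python
import json

# Hypothetical examples; replace with real pairs from your own domain.
examples = [
    {"prompt": "What is our standard response time for support tickets?",
     "response": "We aim to respond to all support tickets within one business day."},
    {"prompt": "Extract the invoice number from: 'Invoice INV-1042 dated 3 May'.",
     "response": "INV-1042"},
]


def write_jsonl(rows, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def validate_jsonl(path, required=("prompt", "response")):
    """Return the number of valid rows; raise ValueError on the first bad line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)  # malformed JSON raises a ValueError subclass
            if not isinstance(row, dict):
                raise ValueError(f"line {line_no}: expected a JSON object")
            for key in required:
                value = row.get(key)
                if not isinstance(value, str) or not value.strip():
                    raise ValueError(f"line {line_no}: missing or empty '{key}'")
            count += 1
    return count


write_jsonl(examples, "train.jsonl")
print(validate_jsonl("train.jsonl"), "valid examples")
```

Running a check like this before training catches malformed lines early, which is cheaper than discovering them partway through a GPU job.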

2. Open Jupyter and set up the training script

Open Jupyter at http://ai.local:8888. Load the Cezen fine-tuning notebook template (provided by your administrator) or start from scratch with the Unsloth documentation examples.

3. Configure and run training

Set your base model, data file path, and training parameters (epochs, learning rate). A typical fine-tuning run for Llama 3.1 8B on 500 examples takes 45–90 minutes on the Entry tier RTX Pro 6000.
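A minimal sketch of how these settings might be grouped is shown below. The TrainingPlan class, all of its default values, and the model ID are illustrative assumptions rather than the contents of the Cezen template. The loading function uses Unsloth's FastLanguageModel and only runs on a machine with a GPU and Unsloth installed; the actual training loop is left to the notebook template or the Unsloth documentation examples.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingPlan:
    # All defaults are assumed starting points, not recommended settings.
    base_model: str = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"  # assumed model ID
    data_path: str = "train.jsonl"                          # assumed file name
    epochs: int = 3
    learning_rate: float = 2e-4
    batch_size: int = 2
    grad_accum_steps: int = 4
    max_seq_length: int = 2048
    lora_rank: int = 16

    def optimizer_steps(self, num_examples: int) -> int:
        """Total number of weight updates over the whole run."""
        effective_batch = self.batch_size * self.grad_accum_steps
        return math.ceil(num_examples / effective_batch) * self.epochs


def load_model_for_qlora(plan: TrainingPlan):
    """Load the base model in 4-bit and attach LoRA adapters (needs a GPU)."""
    from unsloth import FastLanguageModel  # imported lazily: GPU host only

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=plan.base_model,
        max_seq_length=plan.max_seq_length,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=plan.lora_rank,
        lora_alpha=plan.lora_rank,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )
    return model, tokenizer


plan = TrainingPlan()
print(plan.optimizer_steps(500), "optimizer steps for 500 examples")
```

Knowing the step count up front makes it easier to judge whether a run is progressing at the expected pace.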

4. Export to Ollama format

Once training is complete, export the fine-tuned model to GGUF format using Unsloth's export function. Then write a Modelfile whose FROM line points at the exported GGUF file, load it into Ollama with ollama create my-model -f Modelfile, and use it in Open WebUI like any other model.
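The export and Modelfile steps could look roughly like the sketch below. The directory name, the GGUF file name, the system prompt, and the temperature are hypothetical; check the export directory for the file Unsloth actually produced. The q4_k_m quantisation method is one common choice, not a requirement.

```python
from pathlib import Path


def render_modelfile(gguf_path: str, system_prompt: str = "",
                     temperature: float = 0.7) -> str:
    """Build the text of a minimal Ollama Modelfile pointing at a GGUF file."""
    lines = [f"FROM {gguf_path}", f"PARAMETER temperature {temperature}"]
    if system_prompt:
        lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"


def export_gguf(model, tokenizer, out_dir: str = "my-model-gguf"):
    """Export with Unsloth; call this in the training session after training."""
    model.save_pretrained_gguf(out_dir, tokenizer, quantization_method="q4_k_m")


# Hypothetical path: replace with the .gguf file found in the export directory.
text = render_modelfile("./my-model-gguf/model.gguf",
                        system_prompt="You are our support assistant.")
Path("Modelfile").write_text(text, encoding="utf-8")
print(text)
```

With the Modelfile written, running ollama create my-model -f Modelfile in the same directory registers the model under the name my-model.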

5. Test and iterate

Compare your fine-tuned model against the base model on your specific tasks. If it's not where you need it, add more training examples focused on the weak areas and re-run.
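One way to make this comparison repeatable is to score both models on a small held-out set of prompts. The sketch below assumes Ollama is reachable on its default local port and that the models are named llama3.1:8b and my-model; both are assumptions to adjust for your setup. The scoring rule (does the answer contain the expected text?) is deliberately crude and is only a starting point.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local port


def ask(model: str, prompt: str) -> str:
    """Send one prompt to Ollama and return the full (non-streamed) reply."""
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def accuracy(answers, expected):
    """Fraction of answers that contain the expected text (case-insensitive)."""
    hits = sum(e.lower() in a.lower() for a, e in zip(answers, expected))
    return hits / len(expected)


def compare(models, test_cases, ask_fn=ask):
    """Score each model on held-out (prompt, expected) pairs."""
    prompts = [p for p, _ in test_cases]
    expected = [e for _, e in test_cases]
    return {m: accuracy([ask_fn(m, p) for p in prompts], expected)
            for m in models}
```

A call such as compare(["llama3.1:8b", "my-model"], test_cases) returns one score per model; the prompts where the fine-tuned model still misses show where more training examples are needed. Keep the test cases out of the training file so the scores are not inflated.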

Works with
Jupyter · Ollama · Open WebUI · DCGM