Mom-Portal/docs/WHISPER_VM.md
KevinB-T 9517bad3dc feat: add WhisperX diarization and speaker transcript UI
- add WhisperX diarization support to the Whisper VM server
- normalize speaker timestamp segments from Whisper responses
- document Hugging Face/pyannote VM setup and health checks
- show diarized speaker transcript blocks in record and transcript views
- group consecutive segments from the same speaker
- remove duplicate paragraph transcript display when diarized segments exist
- let diarized transcript content expand without an inner scrollbar
2026-05-20 16:34:50 +05:30


Faster-Whisper VM Setup

The sample env expects the Whisper service to run on the VM at 172.16.10.64:8000.

The backend expects a Whisper-compatible HTTP service, configured through these environment variables:

WHISPER_VM_IP=172.16.10.64
WHISPER_VM_PORT=8000
WHISPER_API_URL=http://172.16.10.64:8000
WHISPER_TRANSCRIBE_PATH=/transcribe
WHISPER_HEALTH_PATH=/health
WHISPER_FILE_FIELD=file
WHISPER_ALLOW_MOCK=false
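
As a rough illustration of how these settings fit together, the sketch below composes the health and transcribe URLs from the variables above. The helper name `whisper_urls`, the fallback defaults, and the rule that WHISPER_API_URL takes precedence over the IP and port are assumptions for illustration; the actual backend may resolve them differently.

```python
import os


def whisper_urls(env=None):
    """Build the health and transcribe URLs from WHISPER_* settings.

    Hypothetical helper: the defaults mirror the sample env and are not
    taken from the backend source.
    """
    env = os.environ if env is None else env
    ip = env.get("WHISPER_VM_IP", "172.16.10.64")
    port = env.get("WHISPER_VM_PORT", "8000")
    # Assumption: WHISPER_API_URL, when set, wins over IP and port.
    base = env.get("WHISPER_API_URL") or f"http://{ip}:{port}"
    base = base.rstrip("/")
    health = env.get("WHISPER_HEALTH_PATH", "/health")
    transcribe = env.get("WHISPER_TRANSCRIBE_PATH", "/transcribe")
    return {"health": base + health, "transcribe": base + transcribe}
```

Passing a plain dict instead of the process environment makes it easy to check a candidate configuration before writing it to the env file.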

Expected endpoints:

  • GET /health returns any 2xx status when the VM is ready. For WhisperX diarization, it should also report "whisperx": true and "diarization": true.
  • POST /transcribe accepts multipart audio (form field named by WHISPER_FILE_FIELD) and returns JSON shaped like:
{
  "transcript_text": "Meeting transcript...",
  "language": "en",
  "duration": 123.45,
  "timestamps": [{ "speaker": "Speaker 1", "start": 0, "end": 5, "text": "Hello" }]
}
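
A consumer of this response may want to merge consecutive segments from the same speaker into one block before display. The sketch below shows one way to do that with the `timestamps` array from the example above. The function name `group_by_speaker` and the "Unknown" fallback label are assumptions, not part of the service contract.

```python
def group_by_speaker(timestamps):
    """Merge consecutive segments that share a speaker label."""
    blocks = []
    for seg in timestamps:
        speaker = seg.get("speaker") or "Unknown"  # assumed fallback label
        text = (seg.get("text") or "").strip()
        if blocks and blocks[-1]["speaker"] == speaker:
            # Same speaker as the previous block: extend it.
            last = blocks[-1]
            last["end"] = seg.get("end", last["end"])
            last["text"] = (last["text"] + " " + text).strip()
        else:
            # Speaker changed (or first segment): start a new block.
            blocks.append({
                "speaker": speaker,
                "start": seg.get("start", 0),
                "end": seg.get("end", 0),
                "text": text,
            })
    return blocks
```

Only adjacent segments are merged, so a speaker who talks, is interrupted, and talks again produces two separate blocks in the original order.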

The API retries failed requests, enforces the timeout set by WHISPER_TIMEOUT_MS, and marks jobs as failed when the VM is unavailable.
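
The retry-then-fail behaviour can be sketched roughly as follows. This is a minimal illustration, not the backend's implementation: `call_with_retries`, `max_attempts`, `backoff_s`, and the returned status dictionaries are all hypothetical, and only WHISPER_TIMEOUT_MS comes from the configuration described here.

```python
import time


def call_with_retries(request, timeout_ms, max_attempts=3, backoff_s=0.5,
                      sleep=time.sleep):
    """Call request(timeout_seconds) until it succeeds or attempts run out.

    max_attempts and backoff_s are illustrative values, not documented
    backend settings.
    """
    timeout_s = timeout_ms / 1000.0  # WHISPER_TIMEOUT_MS is in milliseconds
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return {"status": "done", "result": request(timeout_s)}
        except OSError as exc:  # covers connection errors and timeouts
            last_error = exc
            if attempt < max_attempts:
                sleep(backoff_s * attempt)  # simple linear backoff
    # Every attempt failed: report the job as failed rather than raising.
    return {"status": "failed", "error": str(last_error)}
```

Here `request` is any callable that performs the HTTP call with the given timeout in seconds, which keeps the retry logic independent of the HTTP client in use.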

Enable WhisperX diarization on the VM

The systemd unit runs /home/cezen/whisper/server.py inside the existing /home/cezen/whisper/venv. Deploy the updated script without creating a new venv:

scp scripts/whisper_http_server.py cezen@172.16.10.64:/home/cezen/whisper/server.py
scp scripts/orphion-whisper.service cezen@172.16.10.64:/tmp/orphion-whisper.service
ssh cezen@172.16.10.64 'sudo mv /tmp/orphion-whisper.service /etc/systemd/system/orphion-whisper.service'

Create /home/cezen/whisper/.env on the VM with a Hugging Face token that has been granted access to the pyannote model:

HUGGINGFACE_TOKEN=your_token_here
WHISPERX_DIARIZATION=true
WHISPERX_DEVICE=cuda
WHISPERX_COMPUTE_TYPE=float16
WHISPERX_BATCH_SIZE=8
WHISPERX_DIARIZATION_MODEL=pyannote/speaker-diarization-community-1

The Hugging Face account behind the token must be approved for the configured pyannote diarization model. If transcription returns "diarization": "fallback" and the service log mentions a gated repo, visit https://huggingface.co/pyannote/speaker-diarization-community-1 while signed in to that account and accept the model's terms or request access.
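
The check described above could be automated along these lines. This sketch assumes that "diarization" is a top-level field of the transcription response and that a gated-repo failure leaves the word "gated" in the service log. The function name and the returned labels are hypothetical.

```python
def diagnose_diarization(response, log_text=""):
    """Classify a transcription response as ok, gated-model, or fallback-unknown.

    The labels are illustrative; only the "fallback" value and the
    gated-repo log hint come from the setup notes.
    """
    if response.get("diarization") != "fallback":
        return "ok"
    if "gated" in log_text.lower():
        # The token's account likely lacks access to the pyannote model.
        return "gated-model"
    return "fallback-unknown"
```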

Restart and verify:

ssh cezen@172.16.10.64 'sudo systemctl daemon-reload && sudo systemctl restart orphion-whisper'
curl -sS http://172.16.10.64:8000/health

Expected health shape:

{ "status": "ok", "model": "large-v3", "device": "cuda", "whisperx": true, "diarization": true }
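
To check the health response programmatically rather than by eye, something like the following could be used. It is a small sketch that takes the response body as a string. The function name `diarization_ready` is an assumption, and it tests only the two flags named in this document.

```python
import json


def diarization_ready(health_body):
    """Return True when the health JSON reports WhisperX and diarization."""
    try:
        payload = json.loads(health_body)
    except (TypeError, ValueError):
        return False  # not JSON, so the service is not reporting readiness
    if not isinstance(payload, dict):
        return False
    return payload.get("whisperx") is True and payload.get("diarization") is True
```

Feeding it the body returned by the curl command above yields True only when both flags are present and set to true.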