# Faster-Whisper VM Setup

The sample env expects the Whisper service to run on the VM at `172.16.10.64:8000`.

The backend expects a Whisper-compatible HTTP service:

```env
WHISPER_VM_IP=172.16.10.64
WHISPER_VM_PORT=8000
WHISPER_API_URL=http://172.16.10.64:8000
WHISPER_TRANSCRIBE_PATH=/transcribe
WHISPER_HEALTH_PATH=/health
WHISPER_FILE_FIELD=file
WHISPER_ALLOW_MOCK=false
```

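As a rough illustration of how these settings fit together, the sketch below derives the two endpoint URLs from the variables above. The helper name `whisper_endpoints` is hypothetical, and joining `WHISPER_API_URL` with the path variables is an assumption about how the backend combines them, not a description of its actual code.

```python
import os


def whisper_endpoints(env=os.environ):
    """Derive the transcribe and health URLs from the env settings (hypothetical helper)."""
    # Assumption: WHISPER_API_URL is the base and the *_PATH values are appended to it.
    base = env.get("WHISPER_API_URL", "http://172.16.10.64:8000").rstrip("/")
    transcribe_path = env.get("WHISPER_TRANSCRIBE_PATH", "/transcribe").lstrip("/")
    health_path = env.get("WHISPER_HEALTH_PATH", "/health").lstrip("/")
    return {
        "transcribe": f"{base}/{transcribe_path}",
        "health": f"{base}/{health_path}",
        # Name of the multipart form field that carries the audio file.
        "file_field": env.get("WHISPER_FILE_FIELD", "file"),
    }
```

Passing a plain dict instead of `os.environ` makes the helper easy to check without touching the real environment.
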
Expected endpoints:

- `GET /health` returns any 2xx status when the VM is ready. For WhisperX diarization, it should
  also report `"whisperx": true` and `"diarization": true`.
- `POST /transcribe` accepts multipart audio and returns JSON such as:

```json
{
  "transcript_text": "Meeting transcript...",
  "language": "en",
  "duration": 123.45,
  "timestamps": [{ "speaker": "Speaker 1", "start": 0, "end": 5, "text": "Hello" }]
}
```

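To make the response shape concrete, here is a minimal sketch of how a client might turn the `timestamps` array into speaker-labelled lines. The function name `speaker_lines` and the `"Unknown"` fallback label are assumptions for illustration; only the field names come from the JSON above.

```python
def speaker_lines(response):
    """Format each timestamp segment as 'Speaker [start-end]: text' (hypothetical helper)."""
    lines = []
    for seg in response.get("timestamps", []):
        # Assumption: a segment without diarization may lack a speaker label.
        speaker = seg.get("speaker") or "Unknown"
        lines.append(
            f"{speaker} [{seg['start']:.1f}s-{seg['end']:.1f}s]: {seg['text']}"
        )
    return lines
```

With the sample response above, this yields a single line, `Speaker 1 [0.0s-5.0s]: Hello`.
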
The API retries failed requests, applies `WHISPER_TIMEOUT_MS`, and marks jobs as failed when the VM
is unavailable.

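The retry behaviour described above could look roughly like the following sketch. The function name, the attempt count, the default timeout, the backoff, and the `"completed"`/`"failed"` status strings are all assumptions chosen for illustration; the source only states that requests are retried, a timeout from `WHISPER_TIMEOUT_MS` is applied, and jobs are marked failed when the VM is unavailable.

```python
import time


def call_with_retries(send, attempts=3, timeout_ms=30000, backoff_s=0.0):
    """Call send(timeout_seconds) up to `attempts` times and report a job status."""
    timeout_s = timeout_ms / 1000  # WHISPER_TIMEOUT_MS is expressed in milliseconds
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            result = send(timeout_s)
            return {"status": "completed", "result": result, "attempts": attempt}
        except OSError as exc:  # covers connection errors and timeouts
            last_error = exc
            if attempt < attempts:
                time.sleep(backoff_s * attempt)  # simple linear backoff (assumed)
    # Every attempt failed: treat the VM as unavailable and fail the job.
    return {"status": "failed", "error": str(last_error), "attempts": attempts}
```

Here `send` stands in for whatever function performs the HTTP upload; injecting it keeps the retry logic independent of the HTTP client.
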
## Enable WhisperX diarization on the VM

The systemd unit runs `/home/cezen/whisper/server.py` inside the existing
`/home/cezen/whisper/venv`. Deploy the updated script without creating a new venv:

```bash
scp scripts/whisper_http_server.py cezen@172.16.10.64:/home/cezen/whisper/server.py
scp scripts/orphion-whisper.service cezen@172.16.10.64:/tmp/orphion-whisper.service
ssh cezen@172.16.10.64 'sudo mv /tmp/orphion-whisper.service /etc/systemd/system/orphion-whisper.service'
```

Create `/home/cezen/whisper/.env` on the VM with the Hugging Face token accepted by pyannote:

```env
HUGGINGFACE_TOKEN=your_token_here
WHISPERX_DIARIZATION=true
WHISPERX_DEVICE=cuda
WHISPERX_COMPUTE_TYPE=float16
WHISPERX_BATCH_SIZE=8
WHISPERX_DIARIZATION_MODEL=pyannote/speaker-diarization-community-1
```

The Hugging Face account behind the token must be approved for the configured pyannote diarization
model. If transcription returns `"diarization": "fallback"` and the service log mentions a gated
repo, visit `https://huggingface.co/pyannote/speaker-diarization-community-1` while signed in to
that account and accept the model's terms or request access.

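A caller can surface this condition instead of silently showing an undiarized transcript. The sketch below is a hypothetical helper, assuming the transcription response carries the `"diarization"` field at its top level as described above; the hint text is illustrative only.

```python
def diarization_hint(response, model="pyannote/speaker-diarization-community-1"):
    """Return a troubleshooting hint when diarization fell back, else None."""
    # Assumption: "diarization" sits at the top level of the transcription response.
    if response.get("diarization") != "fallback":
        return None
    return (
        "Diarization fell back. If the service log mentions a gated repo, "
        f"check access at https://huggingface.co/{model}"
    )
```
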
Restart and verify:

```bash
ssh cezen@172.16.10.64 'sudo systemctl daemon-reload && sudo systemctl restart orphion-whisper'
curl -sS http://172.16.10.64:8000/health
```

Expected health shape:

```json
{ "status": "ok", "model": "large-v3", "device": "cuda", "whisperx": true, "diarization": true }
```
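
The health contract can be checked programmatically. The following is a minimal sketch, assuming the caller already has the HTTP status code and the parsed JSON body; the function name `assess_health` and the returned keys are hypothetical.

```python
def assess_health(status_code, body):
    """Interpret a /health response: any 2xx means ready; diarization needs both flags."""
    ready = 200 <= status_code < 300
    diarization_ready = (
        ready
        and body.get("whisperx") is True
        and body.get("diarization") is True
    )
    return {"ready": ready, "diarization_ready": diarization_ready}
```

A response with a 2xx status but without both flags set to `true` would count as ready for plain transcription but not for diarization.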