Aligned JSON Workflow

The aligned JSON workflow is the main product flow for combining transcription and diarization.

Run the pipeline

whisper-smith data/sample.m4a --align --output data/sample.aligned.json

Outputs:

data/sample.aligned.json
data/sample.transcript.json
data/sample.diarization.json

Output shape

Each aligned transcript segment contains timestamps, text, and the assigned speaker label:

{
  "segments": [
    {
      "start": 0.0,
      "end": 7.08,
      "text": "Hello world.",
      "speaker": "SPEAKER_01"
    }
  ],
  "text": "Hello world."
}

How speakers are assigned

assign_speakers compares each transcript segment with diarization segments and chooses the speaker with the largest time overlap. If no diarization segment overlaps, the transcript segment keeps its existing speaker value.