Treatment decode

ml/models/whisper::decode

Configuration

⬡ whisper: ml/models/whisper::Whisper

Inputs

⇥ audio: Stream<f32>
⇥ ready: Block<void>

Outputs

↦ transcribed: Stream<string>

Decode a continuous stream of PCM audio samples into text using a Whisper model.

Forwards incoming f32 sample batches to the worker thread as they arrive. The worker accumulates samples and decodes each complete 30-second window (480 000 samples at 16 kHz) into text. It emits each result on transcribed immediately, without waiting for the stream to end. When the audio stream closes, any trailing samples that do not fill a complete window are flushed and decoded as well.
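The windowing behaviour described above can be modelled in a few lines of Python. This is a minimal sketch of the observable behaviour, not Mélodium's actual implementation. The names windows and decode_stream are hypothetical, transcribe stands in for the Whisper model, and samples are assumed to already be at 16 kHz.

```python
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE_HZ = 16_000                            # sample rate assumed by the 30 s / 480 000 figure
WINDOW_SECONDS = 30                                # length of one Whisper window
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_SECONDS   # 480 000 samples


def windows(batches: Iterable[List[float]], window: int = WINDOW_SAMPLES) -> Iterator[List[float]]:
    """Regroup arbitrarily sized sample batches into fixed-size windows (hypothetical helper)."""
    buffer: List[float] = []
    for batch in batches:
        buffer.extend(batch)
        # Emit every complete window as soon as enough samples have arrived.
        while len(buffer) >= window:
            yield buffer[:window]
            buffer = buffer[window:]
    # Input exhausted (stream closed): flush the partial remainder, if any.
    if buffer:
        yield buffer


def decode_stream(
    batches: Iterable[List[float]],
    transcribe: Callable[[List[float]], str],  # stand-in for the Whisper model
    window: int = WINDOW_SAMPLES,
) -> Iterator[str]:
    """Yield one transcription per window, without waiting for the whole stream."""
    for samples in windows(batches, window):
        yield transcribe(samples)
```

With a tiny window for illustration, list(windows([[1, 2, 3], [4, 5, 6], [7]], window=4)) yields one full window, [1, 2, 3, 4], while samples are still arriving. It then yields the flushed remainder, [5, 6, 7], once the input ends.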

ℹ️ load must have completed successfully before any audio is sent; otherwise, that audio is silently discarded.

graph LR
     T("decode()")
     R["〈🟦〉"] -->|ready| T
     A["🟩 🟩 🟩 …"] -->|audio| T
     T -->|transcribed| X["🟩 🟩 …"]

     style R fill:#ffffff,stroke:#ffffff
     style A fill:#ffffff,stroke:#ffffff
     style X fill:#ffffff,stroke:#ffffff

use ml/repos/hf::HfHub
use ml/repos/hf::fetch
use ml/models/whisper::Whisper
use ml/models/whisper::load
use ml/models/whisper::decode
use std/engine/util::startup

treatment example()
  model hub:     HfHub(repo_id = "openai/whisper-tiny")
  model whisper: Whisper()
  input  audio:       Stream<f32>
  output transcribed: Stream<string>
{
    startup()
    fetch[hub=hub]()
    load[whisper=whisper]()
    decode[whisper=whisper]()

    startup.trigger    -> fetch.trigger
    fetch.safetensors  -> load.safetensors
    load.loaded        -> decode.ready
    Self.audio         -> decode.audio
    decode.transcribed -> Self.transcribed
}

Mélodium Standard Reference
