Treatment decode

ml/models/whisper::decode


Configuration

⬡ whisper: ml/models/whisper::Whisper

Inputs

⇥ audio: Stream<f32>
⇥ ready: Block<void>

Outputs

↦ transcribed: Stream<string>


Decode a continuous stream of PCM audio samples into text using a Whisper model.

Incoming batches of f32 samples are forwarded to the worker thread as they arrive. The worker decodes each complete 30-second window (480 000 samples at 16 kHz) and emits the resulting text on transcribed without waiting for the stream to end. When the audio stream closes, any remaining samples that do not fill a whole window are flushed and decoded as a final, shorter window.
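The windowing behaviour described above can be illustrated with a small Python sketch. This is not the treatment's actual implementation: the function name `windows` and the buffering strategy are assumptions made for illustration, and only the window size (30 seconds at 16 kHz) comes from the description. It shows full windows being produced as soon as enough samples have arrived, and the shorter remainder being produced once the input ends.

```python
from typing import Iterable, Iterator, List

SAMPLE_RATE_HZ = 16_000
WINDOW_SECONDS = 30
# 30 s at 16 kHz, the window size stated in the description above.
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_SECONDS


def windows(
    batches: Iterable[List[float]], window: int = WINDOW_SAMPLES
) -> Iterator[List[float]]:
    """Hypothetical helper: regroup arbitrary sample batches into windows.

    Full windows are yielded as soon as they are complete; whatever is
    left when the input ends is yielded as one final, shorter window.
    Yielding nothing for an empty remainder is an assumption.
    """
    buffer: List[float] = []
    for batch in batches:
        buffer.extend(batch)
        # Emit every complete window currently held in the buffer.
        while len(buffer) >= window:
            yield buffer[:window]
            buffer = buffer[window:]
    # Stream closed: flush the remainder, if there is one.
    if buffer:
        yield buffer


if __name__ == "__main__":
    # Tiny window so the regrouping is easy to follow by eye.
    demo = [[0.0] * 4, [0.0] * 3, [0.0] * 5]
    print([len(w) for w in windows(demo, window=5)])
```

With the default window, a 70-second recording would be split into two full windows plus one 10-second remainder, so three pieces of text would be expected on transcribed under this model.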

ℹ️ load must have completed successfully before any audio is sent; audio that arrives earlier is silently discarded.

graph LR
     T("decode()")
     R["〈🟦〉"]     -->|ready|       T
     A["🟩 🟩 🟩 …"] -->|audio|      T
     T              -->|transcribed| X["🟩 🟩 …"]

     style R fill:#ffff,stroke:#ffff
     style A fill:#ffff,stroke:#ffff
     style X fill:#ffff,stroke:#ffff

use ml/repos/hf::HfHub
use ml/repos/hf::fetch
use ml/models/whisper::Whisper
use ml/models/whisper::load
use ml/models/whisper::decode
use std/engine/util::startup

treatment example()
  model hub:     HfHub(repo_id = "openai/whisper-tiny")
  model whisper: Whisper()
  input  audio:       Stream<f32>
  output transcribed: Stream<string>
{
    startup()
    fetch[hub=hub]()
    load[whisper=whisper]()
    decode[whisper=whisper]()

    startup.trigger    -> fetch.trigger
    fetch.safetensors  -> load.safetensors
    load.loaded        -> decode.ready
    Self.audio         -> decode.audio
    decode.transcribed -> Self.transcribed
}