Model Whisper
ml/models/whisper::Whisper
Parameters
↳ const d_model: u64 = 384
↳ const decoder_attention_heads: u64 = 6
↳ const decoder_layers: u64 = 4
↳ const encoder_attention_heads: u64 = 6
↳ const encoder_layers: u64 = 4
↳ const max_source_positions: u64 = 1500
↳ const max_target_positions: u64 = 448
↳ const num_mel_bins: u64 = 80
↳ const vocab_size: u64 = 51865
Whisper automatic speech recognition model configuration.
Holds the architecture parameters for a Whisper model. Weights are not embedded
here; supply them at runtime by using an HfHub model together with fetch and
load, then call decode to transcribe a stream of PCM audio samples.
Architecture parameters (defaults match openai/whisper-tiny):
num_mel_bins: number of mel filter banks.
max_source_positions: maximum number of audio context positions.
d_model: model hidden dimension.
encoder_attention_heads: number of encoder attention heads.
encoder_layers: number of encoder layers.
vocab_size: vocabulary size.
max_target_positions: maximum number of decoder output positions.
decoder_attention_heads: number of decoder attention heads.
decoder_layers: number of decoder layers.
ℹ️ Use Whisper together with HfHub, fetch, load, and decode.
load must complete successfully before decode will produce output.
use ml/repos/hf::HfHub
use ml/repos/hf::fetch
use ml/models/whisper::Whisper
use ml/models/whisper::load
use ml/models/whisper::decode
use std/engine/util::startup

treatment example()
  model hub: HfHub(repo_id = "openai/whisper-tiny")
  model whisper: Whisper()
  input audio: Stream<f32>
  output transcribed: Stream<string>
{
    startup()
    fetch[hub=hub]()
    load[whisper=whisper]()
    decode[whisper=whisper]()

    startup.trigger -> fetch.trigger
    fetch.safetensors -> load.safetensors
    load.loaded -> decode.ready
    Self.audio -> decode.audio
    decode.transcribed -> Self.transcribed
}
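Because the defaults describe openai/whisper-tiny, loading a different checkpoint presumably requires overriding the architecture parameters to match it. The sketch below is an untested illustration for openai/whisper-base, whose published configuration uses a hidden dimension of 512, 8 attention heads, and 6 layers in both encoder and decoder. The treatment name example_base is hypothetical, and it is an assumption that Whisper accepts parameter overrides with the same name = value syntax the example uses for HfHub. The remaining parameters keep their defaults, which also apply to whisper-base.

```
use ml/repos/hf::HfHub
use ml/repos/hf::fetch
use ml/models/whisper::Whisper
use ml/models/whisper::load
use ml/models/whisper::decode
use std/engine/util::startup

// Hypothetical variant of the example above, targeting openai/whisper-base.
// Only the parameters that differ from the whisper-tiny defaults are overridden.
treatment example_base()
  model hub: HfHub(repo_id = "openai/whisper-base")
  model whisper: Whisper(d_model = 512, encoder_attention_heads = 8, encoder_layers = 6, decoder_attention_heads = 8, decoder_layers = 6)
  input audio: Stream<f32>
  output transcribed: Stream<string>
{
    startup()
    fetch[hub=hub]()
    load[whisper=whisper]()
    decode[whisper=whisper]()

    // Same wiring as the whisper-tiny example: fetch, then load, then decode.
    startup.trigger -> fetch.trigger
    fetch.safetensors -> load.safetensors
    load.loaded -> decode.ready
    Self.audio -> decode.audio
    decode.transcribed -> Self.transcribed
}
```

If the parameters do not match the fetched weights, load would be expected to fail, in which case decode produces no output, as noted above.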