Treatment generate

ml/models/mistral::generate

Configuration

⬡ mistral: ml/models/mistral::Mistral

Inputs

prompt: Stream<string>

Outputs

↦ generated: Stream<string>

Generate text from a Mistral model, one token fragment per stream item.

For each string received on prompt, enqueues an inference request and emits the decoded token strings on generated as they arrive, one string per token. Generation for a single prompt ends when the model produces </s> or when max_new_tokens is reached. The next prompt is then dequeued.
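The per-prompt loop described above can be modelled in a few lines of Python. This is only a behavioural sketch, not Mélodium code and not the actual implementation: `generate_tokens`, `scripted_model`, and the `next_token` callback are hypothetical names standing in for the real inference step. It also assumes that `</s>` itself is not emitted on generated, which the text above does not state.

```python
from collections import deque
from typing import Callable, Deque, Iterator, List

EOS = "</s>"  # end-of-sequence marker named in the description above


def generate_tokens(
    prompts: Deque[str],
    next_token: Callable[[List[str]], str],  # hypothetical stand-in for one inference step
    max_new_tokens: int,
) -> Iterator[str]:
    """Yield one decoded token string per item, handling prompts in arrival order."""
    context: List[str] = []  # kept across prompts, standing in for conversation history
    while prompts:
        context.append(prompts.popleft())  # dequeue the next prompt
        for _ in range(max_new_tokens):    # stop after max_new_tokens at the latest
            token = next_token(context)
            if token == EOS:               # assumption: EOS ends the turn and is not emitted
                break
            context.append(token)
            yield token


def scripted_model(script: List[str]) -> Callable[[List[str]], str]:
    """Hypothetical fake model that replays a fixed list of tokens."""
    it = iter(script)
    return lambda context: next(it)


# Two prompts: the first ends on </s>, the second is cut off by max_new_tokens = 3.
demo = generate_tokens(
    deque(["hi", "again"]),
    scripted_model(["a", "b", "</s>", "c", "d", "e", "f"]),
    max_new_tokens=3,
)
print(list(demo))  # ['a', 'b', 'c', 'd', 'e']
```

The second prompt is only dequeued once the first has finished, which mirrors the ordering described above.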

Conversation history is preserved across turns within the same generate instance: each prompt extends the KV cache rather than resetting it. Multiple concurrent generate instances share the single worker thread; KV state is saved and restored on context switches, with no save/restore cost when only one conversation is active.
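The save/restore behaviour can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the real worker: the `Worker` class, its `extend` method, and the use of a plain list as a stand-in for the KV cache are all hypothetical.

```python
from typing import Dict, List, Optional


class Worker:
    """Hypothetical single worker that holds one live KV state at a time."""

    def __init__(self) -> None:
        self.active: Optional[str] = None       # conversation whose state is loaded
        self.kv: List[str] = []                 # stand-in for the live KV cache
        self.saved: Dict[str, List[str]] = {}   # parked state of inactive conversations
        self.switches = 0                       # number of save/restore operations

    def _switch_to(self, conversation: str) -> None:
        if self.active == conversation:
            return  # same conversation as before: nothing to save or restore
        if self.active is not None:
            self.saved[self.active] = self.kv   # save the outgoing conversation
            self.switches += 1
        self.kv = self.saved.pop(conversation, [])  # restore, or start empty
        self.active = conversation

    def extend(self, conversation: str, prompt: str) -> List[str]:
        """Append a prompt to the conversation's history and return a copy of it."""
        self._switch_to(conversation)
        self.kv.append(prompt)
        return list(self.kv)


worker = Worker()
worker.extend("a", "first")
worker.extend("a", "second")          # still conversation "a": no switch
print(worker.switches)                # 0
worker.extend("b", "other")           # switch: state of "a" is parked
print(worker.extend("a", "third"))    # ['first', 'second', 'third']
print(worker.switches)                # 2
```

With a single conversation the switch counter stays at zero, which corresponds to the no-cost case mentioned above.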

ℹ️ load must have completed successfully before any prompt is sent; prompts received before that are silently discarded.

graph LR
     T("generate()")
     P["🟩 🟩 …"] -->|prompt| T
     T -->|generated| G["🟩 🟩 🟩 🟩 …"]

     style P fill:#ffffff,stroke:#ffffff
     style G fill:#ffffff,stroke:#ffffff

use ml/repos/hf::HfHub
use ml/repos/hf::fetch
use ml/models/mistral::Mistral
use ml/models/mistral::load
use ml/models/mistral::generate
use std/engine/util::startup

treatment example()
  model hub:     HfHub(repo_id = "mistralai/Mistral-7B-v0.1")
  model mistral: Mistral(temperature = 0.7, max_new_tokens = 256)
  input  prompt:    Stream<string>
  output generated: Stream<string>
{
    startup()
    fetch[hub=hub]()
    load[mistral=mistral]()
    generate[mistral=mistral]()

    startup.trigger    -> fetch.trigger
    fetch.safetensors  -> load.safetensors
    fetch.tokenizer    -> load.tokenizer
    Self.prompt        -> generate.prompt
    generate.generated -> Self.generated
}

Mélodium Standard Reference
