Model Mistral

ml/models/mistral::Mistral


Parameters

↳ const hidden_size: u64 = 4096
↳ const intermediate_size: u64 = 14336
↳ const max_new_tokens: u64 = 512
↳ const max_position_embeddings: u64 = 32768
↳ const num_attention_heads: u64 = 32
↳ const num_hidden_layers: u64 = 32
↳ const num_key_value_heads: u64 = 8
↳ const repeat_last_n: u64 = 64
↳ const repeat_penalty: f32 = 1.1
↳ const rms_norm_eps: f64 = 0.00001
↳ const rope_theta: f64 = 10000
↳ const sliding_window: u64 = 4096
↳ const temperature: f64 = 0.8
↳ const top_p: f64 = 0.9
↳ const vocab_size: u64 = 32000


Mistral large language model configuration.

Holds the architecture and inference hyper-parameters for a Mistral model. Weights and tokenizer are not embedded here: supply them at runtime with an HfHub model together with fetch and load, then call generate to run inference.

Architecture parameters (defaults match Mistral-7B-v0.1):

  • vocab_size: vocabulary size.
  • hidden_size: hidden dimension.
  • intermediate_size: feed-forward intermediate dimension.
  • num_hidden_layers: number of transformer layers.
  • num_attention_heads: number of attention heads.
  • num_key_value_heads: number of key/value heads (grouped-query attention).
  • max_position_embeddings: maximum sequence length.
  • rms_norm_eps: RMS-norm epsilon.
  • rope_theta: rotary positional embedding theta.
  • sliding_window: sliding-window attention size, in tokens (0 disables it).
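
The architecture defaults above imply a few derived quantities that are useful when sizing memory. The following is a minimal Python sketch, not part of the ml/models/mistral API: the MistralDims class and its methods are hypothetical names, and it assumes the per-head dimension equals hidden_size divided by num_attention_heads (which holds for the Mistral-7B-v0.1 defaults) and that cached keys and values are stored in 2-byte floats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MistralDims:
    """Hypothetical helper mirroring the documented architecture defaults."""
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    num_attention_heads: int = 32
    num_key_value_heads: int = 8
    sliding_window: int = 4096

    @property
    def head_dim(self) -> int:
        # Assumption: each attention head gets an equal slice of hidden_size.
        return self.hidden_size // self.num_attention_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Grouped-query attention: several query heads share one key/value head.
        return self.num_attention_heads // self.num_key_value_heads

    def kv_cache_bytes_per_token(self, bytes_per_value: int = 2) -> int:
        # One key vector and one value vector per layer and per key/value head.
        return (2 * self.num_hidden_layers * self.num_key_value_heads
                * self.head_dim * bytes_per_value)

    def cached_positions(self, seq_len: int) -> int:
        # Assumption: an implementation may cap the cache at the sliding window;
        # a sliding_window of 0 means no cap.
        if self.sliding_window > 0:
            return min(seq_len, self.sliding_window)
        return seq_len


dims = MistralDims()
print(dims.head_dim)                    # 128
print(dims.queries_per_kv_head)         # 4
print(dims.kv_cache_bytes_per_token())  # 131072 (128 KiB per token)
```

With these defaults, 8 key/value heads instead of 32 make the key/value cache four times smaller than it would be with one key/value head per query head.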

Inference parameters:

  • temperature: sampling temperature (0.0 selects the highest-probability token).
  • top_p: nucleus sampling cutoff (0.0 disables nucleus sampling).
  • repeat_penalty: logit penalty applied to recently seen tokens.
  • repeat_last_n: number of past tokens considered for the repeat penalty.
  • max_new_tokens: maximum tokens generated per prompt.
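
To illustrate how the sampling parameters above typically interact, here is a minimal Python sketch of one decoding step. It is an assumption-laden illustration, not the implementation behind generate: the function names are hypothetical, and the repeat penalty uses one common convention (divide non-negative logits by the penalty, multiply negative ones), which this page does not confirm. The handling of temperature = 0.0 and top_p = 0.0 follows the descriptions in the list.

```python
import math
import random


def apply_repeat_penalty(logits, recent_tokens, penalty):
    """Make recently seen tokens less likely (assumed convention, see lead-in)."""
    out = list(logits)
    for t in set(recent_tokens):
        if 0 <= t < len(out):
            out[t] = out[t] / penalty if out[t] >= 0 else out[t] * penalty
    return out


def sample_next(logits, history, temperature=0.8, top_p=0.9,
                repeat_penalty=1.1, repeat_last_n=64, rng=None):
    """Pick the next token id from raw logits and the token history."""
    rng = rng or random.Random()
    # Only the last repeat_last_n tokens are penalized.
    recent = history[-repeat_last_n:] if repeat_last_n > 0 else []
    logits = apply_repeat_penalty(logits, recent, repeat_penalty)

    # temperature 0.0: greedy decoding, take the highest-scoring token.
    if temperature <= 0.0:
        return max(range(len(logits)), key=logits.__getitem__)

    # Temperature-scaled softmax (shifted by the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus sampling: keep the smallest set of top tokens whose cumulative
    # probability reaches top_p; top_p 0.0 keeps the whole vocabulary.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    if top_p > 0.0:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
    else:
        kept = order

    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

In a generation loop, a step like this would run at most max_new_tokens times per prompt, appending each chosen token to history before the next step.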

ℹ️ Use Mistral together with HfHub, fetch, load, and generate. load must complete successfully before generate will produce output.

use ml/repos/hf::HfHub
use ml/repos/hf::fetch
use ml/models/mistral::Mistral
use ml/models/mistral::load
use ml/models/mistral::generate
use std/engine/util::startup

treatment example()
  model hub:     HfHub(repo_id = "mistralai/Mistral-7B-v0.1")
  model mistral: Mistral(temperature = 0.7, max_new_tokens = 256)
  input  prompt:    Stream<string>
  output generated: Stream<string>
{
    startup()
    fetch[hub=hub]()
    load[mistral=mistral]()
    generate[mistral=mistral]()

    startup.trigger    -> fetch.trigger
    fetch.safetensors  -> load.safetensors
    fetch.tokenizer    -> load.tokenizer
    Self.prompt        -> generate.prompt
    generate.generated -> Self.generated
}