Model Mistral
ml/models/mistral::Mistral
Parameters
↳ const hidden_size: u64 = 4096
↳ const intermediate_size: u64 = 14336
↳ const max_new_tokens: u64 = 512
↳ const max_position_embeddings: u64 = 32768
↳ const num_attention_heads: u64 = 32
↳ const num_hidden_layers: u64 = 32
↳ const num_key_value_heads: u64 = 8
↳ const repeat_last_n: u64 = 64
↳ const repeat_penalty: f32 = 1.1
↳ const rms_norm_eps: f64 = 0.00001
↳ const rope_theta: f64 = 10000
↳ const sliding_window: u64 = 4096
↳ const temperature: f64 = 0.8
↳ const top_p: f64 = 0.9
↳ const vocab_size: u64 = 32000
Mistral large language model configuration.
Holds the architecture and inference hyperparameters of a Mistral model. Weights and
the tokenizer are not embedded here. Supply them at runtime by pairing an HfHub model
with fetch and load, then run inference with generate.
Architecture parameters (defaults match Mistral-7B-v0.1):
- vocab_size: vocabulary size.
- hidden_size: hidden dimension.
- intermediate_size: feed-forward intermediate dimension.
- num_hidden_layers: number of transformer layers.
- num_attention_heads: number of attention heads.
- num_key_value_heads: number of key/value heads (grouped-query attention).
- max_position_embeddings: maximum sequence length.
- rms_norm_eps: RMS-norm epsilon.
- rope_theta: rotary positional embedding theta.
- sliding_window: sliding window attention size (0 disables it).
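The architecture values above determine a few derived quantities that matter when sizing a deployment. The Python sketch below computes the per-head width, the grouped-query ratio and the key/value-cache footprint. The `MistralArch` class and its methods are illustrative only and are not part of this library. The sketch assumes keys and values are cached in f16/bf16 (2 bytes each). Its `can_attend` helper shows one common sliding-window mask convention. Implementations can differ by one position in how they count the window, so treat that boundary as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MistralArch:
    """Hypothetical helper: architecture defaults copied from the list above."""
    hidden_size: int = 4096
    num_attention_heads: int = 32
    num_key_value_heads: int = 8
    num_hidden_layers: int = 32
    sliding_window: int = 4096

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the hidden dimension.
        return self.hidden_size // self.num_attention_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Grouped-query attention: this many query heads share one K/V head.
        return self.num_attention_heads // self.num_key_value_heads

    def kv_bytes_per_token(self, bytes_per_value: int = 2) -> int:
        # One key and one value vector per K/V head, in every layer.
        # bytes_per_value=2 assumes f16/bf16 storage.
        return (2 * self.num_hidden_layers * self.num_key_value_heads
                * self.head_dim * bytes_per_value)

    def can_attend(self, i: int, j: int) -> bool:
        # Causal: query position i never sees a later key position j.
        if j > i:
            return False
        # Assumed convention: the window covers the last `sliding_window`
        # positions ending at i; 0 disables the window.
        return self.sliding_window == 0 or i - j < self.sliding_window


arch = MistralArch()
print(arch.head_dim)              # 128
print(arch.queries_per_kv_head)   # 4
print(arch.kv_bytes_per_token())  # 131072 (128 KiB per token)
# A rolling cache that keeps only `sliding_window` tokens is capped at:
print(arch.kv_bytes_per_token() * arch.sliding_window // 2**20, "MiB")  # 512 MiB
```

With these defaults each cached token costs 128 KiB. A cache that keeps only the last 4,096 tokens therefore stays at or below 512 MiB regardless of prompt length, assuming the runtime evicts older entries.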
Inference parameters:
- temperature: sampling temperature (0.0 selects the highest-probability token).
- top_p: nucleus sampling cutoff (0.0 disables nucleus sampling).
- repeat_penalty: logit penalty applied to recently seen tokens.
- repeat_last_n: number of past tokens considered for the repeat penalty.
- max_new_tokens: maximum number of tokens generated per prompt.
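This page does not specify the exact order or formulas used to apply the inference parameters. The sketch below follows a convention common in open-source LLM samplers, in four steps:

1. Penalize the distinct tokens among the last repeat_last_n.
2. Scale the logits by temperature.
3. Restrict sampling to the top_p nucleus.
4. Draw a token.

`apply_repeat_penalty` and `sample_next` are hypothetical names. The caller passes the uniform random draw `u`, which keeps the function deterministic and easy to test.

```python
import math


def apply_repeat_penalty(logits, recent_tokens, penalty):
    # Penalize each distinct recent token once: positive logits are divided
    # by the penalty and negative ones multiplied, so both become less likely.
    out = list(logits)
    for tok in set(recent_tokens):
        out[tok] = out[tok] / penalty if out[tok] >= 0 else out[tok] * penalty
    return out


def sample_next(logits, history, *, temperature=0.8, top_p=0.9,
                repeat_penalty=1.1, repeat_last_n=64, u=0.5):
    """Return the next token id. `u` is a uniform draw in [0, 1)."""
    recent = history[-repeat_last_n:] if repeat_last_n > 0 else []
    logits = apply_repeat_penalty(logits, recent, repeat_penalty)

    # temperature == 0.0: greedy decoding (highest-scoring token).
    if temperature == 0.0:
        return max(range(len(logits)), key=logits.__getitem__)

    # Softmax of temperature-scaled logits, shifted by the max for stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filtering: keep the most likely tokens until their cumulative
    # probability reaches top_p; top_p <= 0 or >= 1 keeps every token.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    if 0.0 < top_p < 1.0:
        kept, cum = [], 0.0
        for idx in order:
            kept.append(idx)
            cum += probs[idx]
            if cum >= top_p:
                break
    else:
        kept = order

    # Inverse-CDF draw over the kept tokens (implicitly renormalized).
    threshold = u * sum(probs[i] for i in kept)
    acc = 0.0
    for idx in kept:
        acc += probs[idx]
        if threshold < acc:
            return idx
    return kept[-1]
```

In practice `u` would come from `random.random()`. Keeping the draw outside the function also makes it easy to seed runs reproducibly.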
ℹ️ Use Mistral together with HfHub, fetch, load, and generate.
load must complete successfully before generate will produce output.
```
use ml/repos/hf::HfHub
use ml/repos/hf::fetch
use ml/models/mistral::Mistral
use ml/models/mistral::load
use ml/models/mistral::generate
use std/engine/util::startup

treatment example()
  model hub: HfHub(repo_id = "mistralai/Mistral-7B-v0.1")
  model mistral: Mistral(temperature = 0.7, max_new_tokens = 256)
  input prompt: Stream<string>
  output generated: Stream<string>
{
    startup()
    fetch[hub=hub]()
    load[mistral=mistral]()
    generate[mistral=mistral]()

    // Start downloading from the Hub as soon as the program starts.
    startup.trigger -> fetch.trigger
    // Pass the downloaded weights and tokenizer to load.
    fetch.safetensors -> load.safetensors
    fetch.tokenizer -> load.tokenizer
    // Prompts flow into generate; generated text flows back out.
    Self.prompt -> generate.prompt
    generate.generated -> Self.generated
}
```
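The internals of generate are not documented on this page. As a rough mental model, decoding a single prompt is a token-by-token loop like the sketch below. `generate_tokens` and its `next_token` callback are hypothetical names; the callback stands in for the model forward pass plus sampling. `eos_token=2` assumes the id of the Mistral tokenizer's `</s>` token. Because max_new_tokens caps the loop, `Mistral(max_new_tokens = 256)` in the example above would yield at most 256 new tokens per prompt.

```python
def generate_tokens(prompt_tokens, next_token, max_new_tokens=512, eos_token=2):
    """Extend `prompt_tokens` one token at a time, stopping at `eos_token`
    or after `max_new_tokens` new tokens. `next_token` receives the full
    history (prompt plus tokens generated so far) and returns the next id."""
    history = list(prompt_tokens)
    generated = []
    for _ in range(max_new_tokens):
        tok = next_token(history)
        if tok == eos_token:
            break  # The end-of-sequence token is not emitted.
        history.append(tok)
        generated.append(tok)
    return generated
```

Passing the whole history to `next_token` mirrors the way repeat_last_n needs recent tokens. A real implementation would also reuse a key/value cache instead of reprocessing the full history at every step.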