Model RemoteLlm

ml/remote/llm::RemoteLlm


Parameters

↳ const api_key: Option<string>
↳ const backend: string = "mistral"
↳ const base_url: string = ""
↳ const max_tokens: Option<u64>
↳ const model: string = ""
↳ const system: string = ""
↳ const temperature: Option<f32>
↳ const timeout: Option<u64>
↳ const top_p: Option<f32>


Remote LLM provider configuration.

Holds connection and inference parameters for a remote large language model service.

  • backend: provider name (default "mistral"); see the table below.
  • api_key: API key for authentication (omit for Ollama or unauthenticated endpoints).
  • base_url: override the provider base URL — required for Ollama, Azure OpenAI, or custom OpenAI-compatible endpoints; see the table below.
  • model: model identifier — see the table below for recommended values per backend.
  • system: system prompt injected at the start of every conversation.
  • max_tokens: maximum tokens to generate per response (omit to use the backend default).
  • temperature: sampling temperature, from 0.0 to 2.0 (omit to use the backend default). Some backends reject requests that set both temperature and top_p.
  • top_p: nucleus sampling cutoff 0.0–1.0 (omit to use the backend default).
  • timeout: request timeout in seconds (omit to use the backend default).

Backends

| backend | api_key | base_url | Example model | Model list |
|---|---|---|---|---|
| "mistral" | Mistral API key | (built-in) | "mistral-small-latest" | https://docs.mistral.ai/getting-started/models/models_overview/ |
| "openai" | OpenAI API key | (built-in) | "gpt-4.1-nano" | https://platform.openai.com/docs/models |
| "anthropic" | Anthropic API key | (built-in) | "claude-sonnet-4-6" | https://docs.anthropic.com/en/docs/about-claude/models/all-models |
| "google" | Google API key | (built-in) | "gemini-2.5-flash" | https://ai.google.dev/gemini-api/docs/models |
| "groq" | Groq API key | (built-in) | "llama-3.3-70b-versatile" | https://console.groq.com/docs/models |
| "deepseek" | DeepSeek API key | (built-in) | "deepseek-chat" | https://api-docs.deepseek.com/quick_start/pricing |
| "xai" | xAI API key | (built-in) | "grok-2-latest" | https://docs.x.ai/docs/models |
| "cohere" | Cohere API key | (built-in) | "command-a-03-2025" | https://docs.cohere.com/docs/models |
| "openrouter" | OpenRouter key | (built-in) | "provider/model-name" | https://openrouter.ai/models |
| "huggingface" | HF token | (built-in) | any HF Inference model ID | https://huggingface.co/models?pipeline_tag=text-generation&inference=warm |
| "ollama" | (omit) | "http://localhost:11434" | "llama3.2" | https://ollama.com/library |
| "azure-openai" | Azure API key | "https://<resource>.openai.azure.com" | "gpt-4o" | https://learn.microsoft.com/azure/ai-services/openai/concepts/models |
| "aws-bedrock" | (AWS env vars) | (built-in) | Bedrock model ARN or ID | https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html |

ℹ️ Use RemoteLlm together with chat, stream, or visionChat.

use ml/remote/llm::RemoteLlm
use ml/remote/llm::stream

treatment example()
  model llm: RemoteLlm(
    backend    = "mistral",
    api_key    = "...",
    model      = "mistral-small-latest",
    system     = "You are a helpful assistant.",
    max_tokens = 2048
  )
  input  prompt: Stream<string>
  output token:  Stream<string>
{
    stream[llm=llm]()
    Self.prompt -> stream.prompt,token -> Self.token
}
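
For a backend that needs base_url and no api_key, the same pattern applies. The following is a minimal sketch for a local Ollama server, assuming it listens on the default address given in the Backends table and that the "llama3.2" model has already been pulled. The treatment name localExample and the timeout value of 120 seconds are illustrative choices, not values taken from the library.

```melodium
use ml/remote/llm::RemoteLlm
use ml/remote/llm::stream

// Hypothetical treatment name; the structure mirrors the example above.
treatment localExample()
  model llm: RemoteLlm(
    backend  = "ollama",
    // api_key is omitted: Ollama does not require authentication.
    base_url = "http://localhost:11434",
    model    = "llama3.2",
    // Assumed value: local models can be slow to answer, so allow more time.
    timeout  = 120
  )
  input  prompt: Stream<string>
  output token:  Stream<string>
{
    stream[llm=llm]()
    Self.prompt -> stream.prompt,token -> Self.token
}
```

Only the model parameters differ from the hosted example; the stream treatment and its connections stay the same, so switching providers should not require changes to the rest of the treatment.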