Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Treatment visionChat

ml/remote/llm::visionChat


Configuration

⬡ llm: ml/remote/llm::RemoteLlm

Inputs

⇥ image: Block<Vec<byte>>
⇥ mime: Block<string>
⇥ prompt: Block<string>

Outputs

↦ error: Block<string>
↦ failed: Block<void>
↦ response: Block<string>


Send an image with a text prompt to a remote vision-capable LLM.

Receives raw image bytes on image, the MIME type on mime ("jpeg", "png", "gif", or "webp"), and a text question on prompt. Sends both to the provider as a two-message chat (image first, then text) and emits the full response on response.

If the provider does not support vision, or if an error occurs, failed and error are emitted instead.

ℹ️ Not all backends support image input. Tested with OpenAI (gpt-4o) and Anthropic (claude-*).

graph LR
     T("visionChat()")
     I["〈🟦〉"] -->|image| T
     M["〈🟨〉"] -->|mime| T
     P["〈🟨〉"] -->|prompt| T
     T -->|response| R["〈🟨〉"]
     T -->|failed| F["〈🟦〉"]
     T -->|error| E["〈🟨〉"]

     style I fill:#ffffff,stroke:#ffffff
     style M fill:#ffffff,stroke:#ffffff
     style P fill:#ffffff,stroke:#ffffff
     style R fill:#ffffff,stroke:#ffffff
     style F fill:#ffffff,stroke:#ffffff
     style E fill:#ffffff,stroke:#ffffff