Treatment visionChat
ml/remote/llm::visionChat
Configuration
⬡ llm: ml/remote/llm::RemoteLlm
Inputs
⇥ image: Block<Vec<byte>>
⇥ mime: Block<string>
⇥ prompt: Block<string>
Outputs
↦ error: Block<string>
↦ failed: Block<void>
↦ response: Block<string>
Send an image with a text prompt to a remote vision-capable LLM.
Receives raw image bytes on image, the MIME type on mime ("jpeg", "png",
"gif", or "webp"), and a text question on prompt. Sends the image and the
prompt to the provider as a two-message chat (image first, then text) and
emits the full response on response.
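The message ordering described above can be sketched in Python. This is only an illustration of the "image first, then text" layout, not the treatment's actual implementation or any provider's wire format: the helper name build_vision_messages, the dictionary keys, and the base64 encoding of the image are all assumptions made for the example.

```python
import base64
from typing import Dict, List

# Formats listed in the description above.
SUPPORTED_FORMATS = {"jpeg", "png", "gif", "webp"}


def build_vision_messages(image: bytes, mime: str, prompt: str) -> List[Dict[str, str]]:
    """Hypothetical helper: build a two-message chat, image first, then text."""
    if mime not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported image format: {mime!r}")
    # Assumption: the image is carried as base64 text; real providers differ.
    image_message = {
        "kind": "image",
        "mime": mime,
        "data": base64.b64encode(image).decode("ascii"),
    }
    text_message = {"kind": "text", "text": prompt}
    return [image_message, text_message]


if __name__ == "__main__":
    messages = build_vision_messages(b"\x89PNG\r\n", "png", "What is in this picture?")
    print([m["kind"] for m in messages])
```

Rejecting an unknown format before building the messages mirrors the restricted set of mime values listed above; whether the treatment itself validates mime this way is not stated in this page.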
If the provider does not support vision, or if any other error occurs, failed
and error are emitted instead of response.
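The either/or relationship between the outputs can be modeled as follows. This is a minimal sketch under stated assumptions: VisionChatOutputs and run_vision_chat are hypothetical names, the void failed signal is represented as a boolean, and the error text is assumed to be the exception message.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VisionChatOutputs:
    """Hypothetical stand-in for the three outputs of the treatment."""
    response: Optional[str] = None
    failed: bool = False  # models the void `failed` signal as a flag
    error: Optional[str] = None


def run_vision_chat(send: Callable[[], str]) -> VisionChatOutputs:
    """Call `send` and route its result to response, or to failed and error."""
    try:
        return VisionChatOutputs(response=send())
    except Exception as exc:
        # On any failure, no response is produced; failed and error are set instead.
        return VisionChatOutputs(failed=True, error=str(exc))


if __name__ == "__main__":
    def unsupported() -> str:
        raise RuntimeError("provider does not support vision")

    print(run_vision_chat(lambda: "A cat on a sofa."))
    print(run_vision_chat(unsupported))
```

A caller wired to this treatment would therefore handle response on one path and failed together with error on the other, never all three at once.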
ℹ️ Not all backends support image input. Tested with OpenAI (gpt-4o) and
Anthropic (claude-*).
graph LR
T("visionChat()")
I["〈🟦〉"] -->|image| T
M["〈🟨〉"] -->|mime| T
P["〈🟨〉"] -->|prompt| T
T -->|response| R["〈🟨〉"]
T -->|failed| F["〈🟦〉"]
T -->|error| E["〈🟨〉"]
style I fill:#ffff,stroke:#ffff
style M fill:#ffff,stroke:#ffff
style P fill:#ffff,stroke:#ffff
style R fill:#ffff,stroke:#ffff
style F fill:#ffff,stroke:#ffff
style E fill:#ffff,stroke:#ffff