Treatment visionChat

ml/remote/llm::visionChat


Configuration

⬡ llm: ml/remote/llm::RemoteLlm

Inputs

⇥ image: Block<Vec<byte>>
⇥ mime: Block<string>
⇥ prompt: Block<string>

Outputs

↦ error: Block<string>
↦ failed: Block<void>
↦ response: Block<string>


Send an image with a text prompt to a remote vision-capable LLM.

Receives raw image bytes on image, the MIME type on mime ("jpeg", "png", "gif", or "webp"), and a text question on prompt. Sends both to the provider as a two-message chat (image first, then text) and emits the full response on response.

If the provider does not support vision, or if an error occurs, failed and error are emitted instead.

ℹ️ Not all backends support image input. Tested with OpenAI (gpt-4o) and Anthropic (claude-*).

graph LR
     T("visionChat()")
     I["〈🟦〉"] -->|image|    T
     M["〈🟨〉"] -->|mime|     T
     P["〈🟨〉"] -->|prompt|   T
     T -->|response| R["〈🟨〉"]
     T -->|failed|   F["〈🟦〉"]
     T -->|error|    E["〈🟨〉"]

     style I fill:#ffff,stroke:#ffff
     style M fill:#ffff,stroke:#ffff
     style P fill:#ffff,stroke:#ffff
     style R fill:#ffff,stroke:#ffff
     style F fill:#ffff,stroke:#ffff
     style E fill:#ffff,stroke:#ffff