Treatment visionChat

ml/remote/llm::visionChat

Inputs

⇥ image: Block<Vec<byte>>
⇥ mime: Block<string>
⇥ prompt: Block<string>

Outputs

↦ error: Block<string>
↦ failed: Block<void>
↦ response: Block<string>

Send an image with a text prompt to a remote vision-capable LLM.

Receives raw image bytes on image, the MIME type on mime ("jpeg", "png", "gif", or "webp"), and a text question on prompt. Sends both to the provider as a two-message chat (image first, then text) and emits the full response on response.

If the provider does not support vision, or if an error occurs, failed and error are emitted instead.

ℹ️ Not all backends support image input. Tested with OpenAI (gpt-4o) and Anthropic (claude-*).

graph LR
     T("visionChat()")
     I["〈🟦〉"] -->|image| T
     M["〈🟨〉"] -->|mime| T
     P["〈🟨〉"] -->|prompt| T
     T -->|response| R["〈🟨〉"]
     T -->|failed| F["〈🟦〉"]
     T -->|error| E["〈🟨〉"]

     style I fill:#ffffff,stroke:#ffffff
     style M fill:#ffffff,stroke:#ffffff
     style P fill:#ffffff,stroke:#ffffff
     style R fill:#ffffff,stroke:#ffffff
     style F fill:#ffffff,stroke:#ffffff
     style E fill:#ffffff,stroke:#ffffff

Mélodium Standard Reference

Treatment visionChat

Configuration

Inputs

Outputs

Keyboard shortcuts

Mélodium Standard Reference

Treatment visionChat

Configuration

Inputs

Outputs