Multimodal

Your agent sees what customers send.

When someone sends a photo — a product, a damaged item, an error screen — Vivollo passes it to a vision model so the agent can recognize it and reply in context. The image joins the same agentic loop, so a snapshot can match a catalog item, start a return or hand off.


Sees photos · in-thread context · agentic · graceful fallback

vision · live

customer photo

Do you have this one?

detected: handbag · quilted
hardware: gold chain
colour: black

That's our Mia Quilted Bag in Black, and it's in stock. Want me to add it to your cart?

Mia Quilted Bag
BAG-MIA-BLK · €189
in stock

The agent sees what your customer sees — and acts on it.

multimodal · in-thread context · agentic loop · graceful fallback

01 · Sees · vision model

Show, don't describe

Customers rarely know the model number — but they can snap a photo. Vivollo passes the image to a vision model so the agent recognizes the product, reads the label or spots what's wrong, instead of asking twenty questions.

product photo: recognized
error screen: read
damaged item: spotted
label + serial: extracted

02 · Acts · same loop

A photo can trigger a tool

What the agent sees feeds the same agentic loop as text. A product photo becomes a catalog search; a damaged item becomes a return; an error screen becomes a fix — or a clean handoff with the image attached.

photo → catalog search
damage → return / warranty
screenshot → fix
context → handoff
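
As an illustration only, the mapping above can be sketched as a small routing table. The tool names (`catalog_search`, `start_return`, `troubleshoot`, `handoff`) and the `ImageKind` categories are hypothetical and are not Vivollo's actual API; the sketch assumes the vision step has already classified the image.

```python
from dataclasses import dataclass
from enum import Enum


class ImageKind(Enum):
    """Hypothetical categories a vision step might report."""
    PRODUCT_PHOTO = "product_photo"
    DAMAGED_ITEM = "damaged_item"
    ERROR_SCREEN = "error_screen"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class ToolCall:
    tool: str           # hypothetical tool name
    attach_image: bool  # whether the image travels with the call


# Assumed routing table: what the agent saw -> which tool runs next.
ROUTES = {
    ImageKind.PRODUCT_PHOTO: ToolCall("catalog_search", attach_image=False),
    ImageKind.DAMAGED_ITEM: ToolCall("start_return", attach_image=True),
    ImageKind.ERROR_SCREEN: ToolCall("troubleshoot", attach_image=False),
}


def route_image(kind: ImageKind) -> ToolCall:
    """Pick a tool for a perceived image.

    Anything without a route falls through to a handoff,
    with the image attached for the human who picks it up.
    """
    return ROUTES.get(kind, ToolCall("handoff", attach_image=True))
```

Keeping the table as plain data means a new image category needs one new entry rather than another branch in the agent logic.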

03 · Resilient · tier-gated

Degrades gracefully

Vision is tier-gated and looks at the most recent images in the thread. If a provider hiccups or an image can't be read, the agent keeps the conversation moving with text instead of breaking the reply.

recent images: in thread
tier-gated: by plan
provider error: graceful
every channel: attachments
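
A minimal sketch of this fallback behaviour, under stated assumptions: `VisionError`, `describe_images`, `build_prompt` and the limit of three recent images are all hypothetical, since the page does not say how many images are considered or how errors surface. The idea is simply that a failed or disabled vision step yields no image notes, and the reply proceeds on text alone.

```python
from typing import Callable, Optional, Sequence

# Assumed limit; the page only says "the most recent images".
MAX_RECENT_IMAGES = 3


class VisionError(Exception):
    """Hypothetical error for a provider failure or an unreadable image."""


def describe_images(
    image_urls: Sequence[str],
    vision_enabled: bool,
    describe: Callable[[Sequence[str]], str],
) -> Optional[str]:
    """Return vision notes for the latest images, or None if unavailable."""
    if not vision_enabled or not image_urls:
        return None  # plan without vision, or nothing to look at
    recent = image_urls[-MAX_RECENT_IMAGES:]
    try:
        return describe(recent)
    except VisionError:
        return None  # degrade to text instead of failing the reply


def build_prompt(text: str, image_notes: Optional[str]) -> str:
    """Combine the customer's text with image notes when they exist."""
    if image_notes is None:
        return text
    return f"{text}\n\n[image context] {image_notes}"
```

Because the failure case collapses to `None`, the rest of the loop never needs to know whether vision was skipped by plan or failed at the provider.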

From photo to resolution

How an image becomes an answer

A picture enters the same loop as a message — perceived, reasoned over and acted on.

01
Receive the image
A customer attaches a photo on any channel; it's captured and shown in the inbox.
02
See it
The image is passed to a vision model that reads objects, text and detail.
03
Reason in context
What it sees joins the thread — the agent reasons over the image and the conversation together.
04
Act on it
Match a product, start a return, fix the error — or hand off with the image attached.
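
The four steps above can be sketched as a single function. This is an assumption-laden outline, not the product's implementation: `Message`, `Thread`, `handle_incoming` and the `see` and `reason` callables are hypothetical stand-ins for the inbox, the vision model and the agent.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    sender: str
    text: str = ""
    image_urls: List[str] = field(default_factory=list)


@dataclass
class Thread:
    messages: List[Message] = field(default_factory=list)


def handle_incoming(
    thread: Thread,
    message: Message,
    see: Callable[[str], str],
    reason: Callable[[List[str]], str],
) -> str:
    """Run one message through receive, see, reason and act."""
    # 1. Receive: the message and its attachments join the thread.
    thread.messages.append(message)

    # 2. See: each attached image is described by the vision step.
    observations = [see(url) for url in message.image_urls]

    # 3. Reason in context: conversation and image notes are combined.
    context = [f"{m.sender}: {m.text}" for m in thread.messages]
    context += [f"[image] {note}" for note in observations]

    # 4. Act: the agent decides the reply (or tool call) from that context.
    return reason(context)
```

Passing `see` and `reason` in as callables keeps the outline independent of any particular vision provider or agent framework.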

Other attachments — video, audio, files — are captured and shown to your team, too.

Works with

Agentic AI
Acts on real data: tools, memory, resolution.

Unified Inbox
Every channel in one inbox, so context follows the customer.

Knowledge Engine
Sourced answers from your real content.

Ready to meet your AI agent?

Book a demo and we'll build a working agent on your real data — across WhatsApp, Instagram and your website. Live in days.

Request a Demo
