Vivollo
Multimodal

Your agent sees what customers send.

When someone sends a photo — a product, a damaged item, an error screen — Vivollo passes it to a vision model so the agent can recognize it and reply in context. The image joins the same agentic loop, so a snapshot can match a catalog item, start a return or hand off.

Sees photos · in-thread context · agentic · graceful fallback

vision · live
Customer photo: "Do you have this one?"
detected: handbag · quilted, hardware · gold chain, colour · black
That's our Mia Quilted Bag in Black, and it's in stock. Want me to add it to your cart?
Mia Quilted Bag · BAG-MIA-BLK · €189 · in stock

The agent sees what your customer sees — and acts on it.

multimodal · in-thread context · agentic loop · graceful fallback
01 · Sees · vision model

Show, don't describe

Customers rarely know the model number — but they can snap a photo. Vivollo passes the image to a vision model so the agent recognizes the product, reads the label or spots what's wrong, instead of asking twenty questions.

  • product photo: recognized
  • error screen: read
  • damaged item: spotted
  • label + serial: extracted
02 · Acts · same loop

A photo can trigger a tool

What the agent sees feeds the same agentic loop as text. A product photo becomes a catalog search; a damaged item becomes a return; an error screen becomes a fix — or a clean handoff with the image attached.

  • photo → catalog search
  • damage → return / warranty
  • screenshot → fix
  • context → handoff
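
The mapping above can be sketched as a small routing table. This is only an illustration, not Vivollo's actual implementation: the `Kind` categories, the `Perception` type and the tool names (`catalog_search`, `start_return`, `troubleshoot`, `handoff`) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    PRODUCT = "product"
    DAMAGE = "damage"
    SCREENSHOT = "screenshot"
    OTHER = "other"


@dataclass
class Perception:
    """Hypothetical summary of what a vision model reported for one image."""
    kind: Kind
    description: str


# Hypothetical tool names; the real tool set is not described on this page.
ROUTES = {
    Kind.PRODUCT: "catalog_search",
    Kind.DAMAGE: "start_return",
    Kind.SCREENSHOT: "troubleshoot",
}


def choose_tool(perception: Perception) -> str:
    """Pick the next tool for an image, falling back to a human handoff."""
    return ROUTES.get(perception.kind, "handoff")
```

In this sketch, anything the agent cannot classify goes to a handoff, which mirrors the "context → handoff" case in the list.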
03 · Resilient · tier-gated

Degrades gracefully

Vision is tier-gated and looks at the most recent images in the thread. If a provider hiccups or an image can't be read, the agent keeps the conversation moving with text instead of breaking the reply.

  • recent images: in thread
  • tier-gated: by plan
  • provider error: graceful
  • every channel: attachments
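
One way to picture this behaviour is a wrapper that only looks at the latest images, checks the plan first, and skips any image the provider fails on. It is a sketch under stated assumptions: the limit of three images, the `vision_enabled` flag and the `call_vision` callable are hypothetical stand-ins, not documented Vivollo settings.

```python
from typing import Callable, Sequence

MAX_RECENT_IMAGES = 3  # assumed limit; the page only says "most recent images"


def describe_images(
    image_urls: Sequence[str],
    vision_enabled: bool,
    call_vision: Callable[[str], str],
) -> list[str]:
    """Return descriptions for the most recent images, skipping any that fail."""
    if not vision_enabled:  # the plan does not include vision
        return []
    descriptions = []
    for url in image_urls[-MAX_RECENT_IMAGES:]:
        try:
            descriptions.append(call_vision(url))
        except Exception:
            # Provider error or unreadable image: carry on with text only.
            continue
    return descriptions
```

An empty result simply means the agent replies from the text of the conversation, so a failed image never blocks the reply.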

From photo to resolution

How an image becomes an answer

A picture enters the same loop as a message — perceived, reasoned over and acted on.

  1. Receive the image

    A customer attaches a photo on any channel; it's captured and shown in the inbox.

  2. See it

    The image is passed to a vision model that reads objects, text and detail.

  3. Reason in context

    What it sees joins the thread: the agent reasons over the image and the conversation together.

  4. Act on it

    Match a product, start a return, fix the error, or hand off with the image attached.
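
As a rough illustration of step 3, the vision model's output can be treated as one more turn in the thread, so the agent reasons over a single combined context. The `Thread` type, the `vision` role label and `build_context` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    """Hypothetical conversation thread: an ordered list of (role, text) turns."""
    turns: list = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))


def build_context(thread: Thread, caption: str, image_description: str) -> str:
    """Record the customer's message and what the vision model saw, then
    return the combined context the agent would reason over."""
    thread.add("customer", caption)
    thread.add("vision", image_description)  # step 2's output joins the thread
    # Step 3: text and image description are read together as one context.
    return "\n".join(f"{role}: {text}" for role, text in thread.turns)
```

Because the description lives in the thread itself, later turns can refer back to the photo without the customer sending it again.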

Other attachments — video, audio, files — are captured and shown to your team, too.

Ready to meet your AI agent?

Book a demo and we'll build a working agent on your real data — across WhatsApp, Instagram and your website. Live in days.