Alibaba Introduces QVQ-Max: An AI That Sees, Understands, and Thinks
Alibaba's new AI model, QVQ-Max, processes images and videos, solves complex tasks, and enhances real-world interactions with visual reasoning.
Alibaba has released a new AI model called QVQ-Max. It is designed to understand photos and videos, then analyse them to provide solutions.

The model is intended to bridge the gap between text-based AI and real-world understanding. Using visual reasoning, it processes images, identifies key details, and provides insights. It is designed for tasks such as illustration design, video script generation, and role-playing.
Unlike text-only AI chatbots, QVQ-Max is built with visual capabilities. It can help solve math and physics problems that involve diagrams, and it can provide cooking guidance based on recipe images.
This is the first version of the model, and Alibaba has shared its plans for improvement. Grounding techniques are expected to improve image recognition accuracy. The model is also expected to handle multi-step tasks and complex problems better, enabling it to operate devices and play games. Future updates will expand its interaction beyond text-based responses to include tool verification and visual generation.