
Alibaba Launches QVQ-72B: Vision-Based AI with Advanced Reasoning

  • 26 Dec 2024 11:19 AM
  • Alibaba, QVQ-72B, AI model, AI in research, AI technology

Alibaba’s Qwen research team has introduced QVQ-72B, an experimental open-source AI model designed for both vision-based analysis and advanced reasoning. The model analyzes visual information in images and answers complex queries about it using reasoning-focused techniques. According to internal benchmarks, QVQ-72B narrowly surpassed OpenAI’s o1 model on the MathVista (mini) benchmark, scoring 71.4% compared to o1’s 71.0%.

The model also scored 70.3% on the Massive Multi-discipline Multimodal Understanding (MMMU) benchmark. Despite these results, QVQ-72B has some limitations, including unexpected language switching and recursive reasoning loops, both of which can degrade output quality. QVQ-72B joins Alibaba’s growing lineup of open-source AI models, which includes QwQ-32B, and represents a significant step toward integrating vision and reasoning in a single AI system for multimodal applications.
