Qwen2-VL 72B Instruct

$0.0016/1k

qwen/qwen-2-vl-72b-instruct

上下文长度: 4,096 text+image->text Qwen 2024-09-18 更新

Qwen2 VL 72B is a multimodal LLM from the Qwen Team with the following key enhancements: SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc. For more details, see this blog post and GitHub repo. Usage of this model is subject to Tongyi Qianwen LICENSE AGREEMENT.

模型参数

架构信息

模态: text+image->text

Tokenizer: Qwen

限制信息

上下文长度: 4,096

Qwen2-VL 72B Instruct

模型参数

架构信息

限制信息

相关模型

Rocinante 12B

Qwen: QwQ 32B Preview

Qwen: QvQ 72B Preview

Qwen2.5 Coder 32B Instruct

Qwen2.5 7B Instruct

Qwen2.5 72B Instruct