Alibaba’s Qwen team releases AI models that can control PCs and phones


Chinese AI lab DeepSeek might be getting the bulk of the tech industry’s attention this week. But one of its top domestic rivals, Alibaba, isn’t sitting idly by.

Alibaba’s Qwen team on Monday released a new family of AI models, Qwen2.5-VL, that can perform a number of text and image analysis tasks. The models can parse files, understand videos, and count objects in images, as well as control a PC, much like the model powering OpenAI’s recently launched Operator.

Per the Qwen team’s benchmarking, the best Qwen2.5-VL model beats OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 2.0 Flash on a range of video understanding, math, document analysis, and question-answering evaluations.

Image Credits: Alibaba

Qwen2.5-VL, which is available to test in Alibaba’s Qwen Chat app and to download from AI dev platform Hugging Face, can analyze charts and graphics, extract data from scans of invoices and forms, and “comprehend” videos several hours long, the Qwen team says. Qwen2.5-VL can also recognize “IPs from film and TV series, as well as a wide variety of products,” per the team, which suggests that the models might’ve been trained in part on copyrighted works.

As AI developed by a Chinese company, Qwen2.5-VL has certain restrictions on the topics it will discuss, at least in Qwen Chat. When I asked the largest and most capable Qwen2.5-VL model, Qwen2.5-VL-72B, to talk about “Xi Jinping’s mistakes,” Qwen Chat threw an error message.

China’s internet regulator benchmarks many models developed in the country to ensure their responses “embody core socialist values.” Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, such as Taiwan’s autonomy.

One of Qwen2.5-VL’s more interesting features is its ability to interact with software, both on PCs and on mobile devices. A video posted on X by Philipp Schmid, a technical lead at Hugging Face, shows Qwen2.5-VL launching the Booking.com app for Android and booking a flight from Chongqing to Beijing.

In another demo video, a Qwen2.5-VL model controls apps on a Linux desktop but doesn’t seem to accomplish much beyond switching tabs. Perhaps tellingly, Qwen’s benchmarking shows Qwen2.5-VL scoring poorly on OSWorld, a benchmark that tries to mimic a real computer environment.

The two smaller, less sophisticated models in the Qwen2.5-VL series, Qwen2.5-VL-3B and Qwen2.5-VL-7B, are available under a permissive license. The flagship Qwen2.5-VL-72B, however, is under Alibaba’s custom license, which requires companies and developers with more than 100 million monthly active users to request permission from Qwen/Alibaba before deploying the model commercially.

Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes health, sport, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.