Gemini Live first look: Better than talking to Siri, but worse than I’d like

Google launched Gemini Live during its Made By Google event in Mountain View, California, on Tuesday. The feature lets you hold a semi-natural spoken conversation, rather than a typed one, with an AI chatbot powered by Google’s latest large language model. TechCrunch was there to test it out firsthand.

Gemini Live is Google’s answer to OpenAI’s Advanced Voice Mode, ChatGPT’s nearly identical feature that’s currently in a limited alpha test. While OpenAI beat Google to the punch by demoing the feature first, Google is the first to roll out the finalized feature.

In my experience, these low-latency verbal features feel much more natural than texting with ChatGPT, or even talking with Siri or Alexa. I found that Gemini Live responded to questions in less than two seconds, and was able to pivot fairly quickly when interrupted. Gemini Live is not perfect, but it’s the best way to use your phone hands-free that I’ve seen yet.

How it works

Before speaking with Gemini Live, the feature lets you choose from 10 voices, compared to just three voices from OpenAI. Google worked with voice actors to create each one. I appreciated the variety there, and found each one to sound very humanlike.

In one example, a Google product manager verbally asked Gemini Live to find family-friendly wineries near Mountain View with outdoor areas and playgrounds nearby, so that kids could potentially come along. That’s a far more complicated task than I’d ask Siri — or Google Search, frankly — but Gemini successfully recommended a spot that met the criteria: Cooper-Garrod Vineyards in Saratoga.

That said, Gemini Live leaves something to be desired. It seemed to hallucinate a nearby playground called Henry Elementary School Playground that is supposedly “10 minutes away” from that vineyard. There are other playgrounds nearby in Saratoga, but the nearest Henry Elementary School is more than a two-hour drive from there. There’s a Henry Ford Elementary School in Redwood City, but it’s 30 minutes away.

Google liked to show off how users can interrupt Gemini Live mid-sentence, and the AI will quickly pivot. The company says this allows users to control the conversation. In practice, this feature doesn’t work perfectly. Sometimes Google’s product managers and Gemini Live were talking over each other, and the AI didn’t seem to pick up on what was said.

Notably, Google is not allowing Gemini Live to sing or mimic any voices outside of the 10 it provides, according to product manager Leland Rechis. The company is likely doing this to avoid run-ins with copyright law. Further, Rechis said Google is not focused on getting Gemini Live to understand emotional intonation in a user’s voice, something OpenAI touted during its demo.

Overall, the feature seems like a great way to dive deeply into a subject more naturally than you would with a simple Google Search. Google notes that Gemini Live is a step along the way to Project Astra, the fully multimodal AI model the company debuted during Google I/O. For now, Gemini Live is only capable of voice conversations; in the future, Google wants to add real-time video understanding.



Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes about health, sports, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
