Why AI can’t spell ‘strawberry’


How many times does the letter R appear in the word “strawberry”? According to formidable AI products like GPT-4o and Claude, the answer is twice.

Large language models can write essays and solve equations in seconds. They can synthesize terabytes of data faster than humans can open up a book. Yet, these seemingly omniscient AIs sometimes fail so spectacularly that the mishap turns into a viral meme, and we all rejoice in relief that maybe, there’s still time before we must bow down to our new AI overlords.

The failure of large language models to understand the concepts of letters and syllables is indicative of a larger truth that we often forget: These things don’t have brains. They do not think like we do. They are not human, nor even particularly humanlike.

Most LLMs are built on transformers, a kind of deep learning architecture. Transformer models break text into tokens, which can be full words, syllables, or letters, depending on the model.

“LLMs are based on this transformer architecture, which notably is not actually reading text. What happens when you input a prompt is that it’s translated into an encoding,” Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. “When it sees the word ‘the,’ it has this one encoding of what ‘the’ means, but it does not know about ‘T,’ ‘H,’ ‘E.’”

This is because transformers cannot take in or output raw text efficiently. Instead, the text is converted into numerical representations, which are then contextualized to help the AI come up with a logical response. In other words, the AI might know that the tokens “straw” and “berry” make up “strawberry,” but it may not understand that “strawberry” is composed of the letters “s,” “t,” “r,” “a,” “w,” “b,” “e,” “r,” “r,” and “y,” in that specific order. Thus, it cannot reliably tell you how many letters, let alone how many “r”s, appear in the word “strawberry.”
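To make the idea concrete, here is a minimal sketch in Python of what a model might receive for the word “strawberry.” The vocabulary, the ID numbers and the `toy_tokenize` helper are all hypothetical, invented for illustration; real tokenizers learn much larger vocabularies and may split the word differently.

```python
# Hypothetical toy vocabulary (an assumption for illustration only).
# Real tokenizers learn tens of thousands of entries from data.
TOY_VOCAB = {"straw": 101, "berry": 102, "the": 103}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split of `text` into token IDs from `vocab`."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining piece first, then shrink it.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos:]!r}")
    return ids


token_ids = toy_tokenize("strawberry")  # what the model receives
letter_count = "strawberry".count("r")  # what a character-level view gives

print(token_ids)     # [101, 102]
print(letter_count)  # 3
```

Nothing in the list `[101, 102]` records which letters the tokens contain, so counting the “r”s requires information the model was never handed directly.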

This isn’t an easy issue to fix, since it’s embedded into the very architecture that makes these LLMs work.

TechCrunch’s Kyle Wiggers dug into this problem last month and spoke to Sheridan Feucht, a PhD student at Northeastern University studying LLM interpretability.

“It’s kind of hard to get around the question of what exactly a ‘word’ should be for a language model, and even if we got human experts to agree on a perfect token vocabulary, models would probably still find it useful to ‘chunk’ things even further,” Feucht told TechCrunch. “My guess would be that there’s no such thing as a perfect tokenizer due to this kind of fuzziness.”

This problem becomes even more complex as an LLM learns more languages. For example, some tokenization methods might assume that a space in a sentence will always precede a new word, but many languages like Chinese, Japanese, Thai, Lao, Korean, Khmer and others do not use spaces to separate words. Google DeepMind AI researcher Yennie Jun found in a 2023 study that some languages need up to ten times as many tokens as English to communicate the same meaning.
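The space assumption is easy to see in a small Python sketch. The `whitespace_tokens` helper below is a hypothetical stand-in for a tokenizer that splits only on spaces; it is not how any particular production tokenizer works.

```python
def whitespace_tokens(sentence):
    """Split a sentence on whitespace, as a naive tokenizer might."""
    return sentence.split()


english = "I like strawberries"
chinese = "我喜欢草莓"  # roughly the same sentence, written without spaces

print(len(whitespace_tokens(english)))  # 3
print(len(whitespace_tokens(chinese)))  # 1
```

The English sentence splits into three words, while the Chinese sentence comes back as a single undivided chunk, so a tokenizer built on that assumption has to find word boundaries some other way.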

“It’s probably best to let models look at characters directly without imposing tokenization, but right now that’s just computationally infeasible for transformers,” Feucht said.

Image generators like Midjourney and DALL-E don’t use the transformer architecture that lies under the hood of text generators like ChatGPT. Instead, image generators usually use diffusion models, which reconstruct an image from noise. Diffusion models are trained on large databases of images, and they’re incentivized to try to recreate something like what they learned from training data.

Image Credits: Adobe Firefly

Asmelash Teka Hadgu, co-founder of Lesan and a fellow at the DAIR Institute, told TechCrunch, “Image generators tend to perform much better on artifacts like cars and people’s faces, and less so on smaller things like fingers and handwriting.”

This could be because these smaller details don’t often appear as prominently in training sets as concepts like how trees usually have green leaves. The problems with diffusion models might be easier to fix than the ones plaguing transformers, though. Some image generators have improved at representing hands, for example, by training on more images of real, human hands.

“Even just last year, all these models were really bad at fingers, and that’s exactly the same problem as text,” Guzdial explained. “They’re getting really good at it locally, so if you look at a hand with six or seven fingers on it, you could say, ‘Oh wow, that looks like a finger.’ Similarly, with the generated text, you could say, that looks like an ‘H,’ and that looks like a ‘P,’ but they’re really bad at structuring these whole things together.”

Image Credits: Microsoft Designer (DALL-E 3)

That’s why, if you ask an AI image generator to create a menu for a Mexican restaurant, you might get normal items like “Tacos,” but you’ll be more likely to find offerings like “Tamilos,” “Enchidaa” and “Burhiltos.”

As these memes about spelling “strawberry” spill across the internet, OpenAI is working on a new AI product code-named Strawberry, which is supposed to be even more adept at reasoning. The growth of LLMs has been limited by the fact that there simply isn’t enough training data in the world to make products like ChatGPT more accurate. But Strawberry can reportedly generate accurate synthetic data to make OpenAI’s LLMs even better. According to The Information, Strawberry can solve the New York Times’ Connections word puzzles, which require creative thinking and pattern recognition to solve, and can solve math equations that it hasn’t seen before.

Meanwhile, Google DeepMind recently unveiled AlphaProof and AlphaGeometry 2, AI systems designed for formal math reasoning. Google says these two systems solved four out of six problems from the International Math Olympiad, which would be a good enough performance to earn a silver medal at the prestigious competition.

It’s a bit of a troll that memes about AI being unable to spell “strawberry” are circulating at the same time as reports on OpenAI’s Strawberry. But OpenAI CEO Sam Altman jumped at the opportunity to show us that he’s got a pretty impressive berry yield in his garden.




Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes health, sport, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
