Google says its new AI model family has a curious feature: the ability to “identify” emotions.
Announced on Thursday, the PaliGemma 2 family of models can analyze images, enabling the AI to generate captions and answer questions about people it “sees” in photos.
“PaliGemma 2 generates detailed, contextually relevant captions for images,” Google wrote in a blog post shared with TechCrunch, “going beyond simple object identification to describe actions, emotions, and the overall narrative of the scene.”
Emotion recognition doesn’t work out of the box, and PaliGemma 2 has to be fine-tuned for the purpose. Nonetheless, experts TechCrunch spoke with were alarmed at the prospect of an openly available emotion detector.
“This is very troubling to me,” Sandra Wachter, a professor in data ethics and AI at the Oxford Internet Institute, told TechCrunch. “I find it problematic to assume that we can ‘read’ people’s emotions. It’s like asking a Magic 8 Ball for advice.”
For years, startups and tech giants alike have tried to build AI that can detect emotions for everything from sales training to accident prevention. Some claim to have succeeded, but the science stands on shaky empirical ground.
The majority of emotion detectors take cues from the early work of Paul Ekman, a psychologist who theorized that humans share six fundamental emotions: anger, surprise, disgust, enjoyment, fear, and sadness. Subsequent studies cast doubt on Ekman’s hypothesis, however, demonstrating that there are major differences in the way people from different backgrounds express how they’re feeling.
“Emotion detection isn’t possible in the general case, because people experience emotion in complex ways,” Mike Cook, a research fellow at Queen Mary University of London specializing in AI, told TechCrunch. “Of course, we do think we can tell what other people are feeling by looking at them, and lots of people over the years have tried, too, like spy agencies or marketing companies. I’m sure it’s absolutely possible to detect some generic signifiers in some cases, but it’s not something we can ever fully ‘solve.’”
The unsurprising consequence is that emotion-detecting systems tend to be unreliable and biased by the assumptions of their designers. In a 2020 MIT study, researchers showed that face-analyzing models could develop unintended preferences for certain expressions, like smiling. More recent work suggests that emotional analysis models assign more negative emotions to Black people’s faces than to white people’s faces.
Google says it conducted “extensive testing” to evaluate demographic biases in PaliGemma 2, and found “low levels of toxicity and profanity” compared to industry benchmarks. But the company didn’t provide the full list of benchmarks it used, nor did it indicate which types of tests were performed.
The only benchmark Google has disclosed is FairFace, a set of tens of thousands of people’s headshots. The company claims that PaliGemma 2 scored well on FairFace. But some researchers have criticized the benchmark as a bias metric, noting that FairFace represents only a handful of race groups.
“Interpreting emotions is quite a subjective matter that extends beyond use of visual aids, and is heavily embedded within a personal and cultural context,” said Heidy Khlaaf, chief AI scientist at the AI Now Institute, a nonprofit that studies the societal implications of artificial intelligence. “AI aside, research has shown that we cannot infer emotions from facial features alone.”
Emotion detection systems have raised the ire of regulators overseas, who’ve sought to limit the use of the technology in high-risk contexts. The AI Act, the major piece of AI legislation in the EU, prohibits schools and employers, but not law enforcement agencies, from deploying emotion detectors.
The biggest apprehension around open models like PaliGemma 2, which is available from a number of hosts including AI dev platform Hugging Face, is that they’ll be abused or misused, which could lead to real-world harm.
“If this so-called ‘emotional identification’ is built on pseudoscientific presumptions, there are significant implications in how this capability may be used to further — and falsely — discriminate against marginalized groups such as in law enforcement, human resourcing, border governance, and so on,” Khlaaf said.
Asked about the dangers of publicly releasing PaliGemma 2, a Google spokesperson said the company stands behind its tests for “representational harms” as they relate to visual question answering and captioning. “We conducted robust evaluations of PaliGemma 2 models concerning ethics and safety, including child safety, content safety,” they added.
Wachter isn’t convinced that’s enough.
“Responsible innovation means that you think about the consequences from the first day you step into your lab and continue to do so throughout the lifecycle of a product,” she said. “I can think of myriad potential issues [with models like this] that can lead to a dystopian future, where your emotions determine if you get the job, a loan, and if you’re admitted to uni.”