A USC study finds that (some people think) AI is as funny as the average person.

July 8, 2024, 3:06pm

A new study out of USC compared comedy writing by humans to comedy writing generated by ChatGPT, and found that “ChatGPT can produce written humor at a quality that exceeds laypeople’s abilities and equals some professional comedy writers.” But their experiments didn’t fully convince me that your next favorite joke will be generated by a program.

The study involved two experiments. The first asked a group of adults to generate punchlines for three different prompts: writing funny phrases to fit a provided acronym; finishing Apples-To-Apples-style phrases like, “A remarkable achievement you probably wouldn’t list on your resume: ________”; and coming up with “roast” punchlines like, “To be honest, listening to you sing was like ________.” ChatGPT completed the same tasks, and then a separate group of people rated the human and computer results. The evaluating group thought the computer did a little better: “ChatGPT’s responses were rated funnier than the human responses, with 69.5% of participants preferring them (26.5% preferred the human responses, and 4.0% thought both were equally funny).”

I don’t doubt that some people found things that a computer generated to be funny, but the experiment seems far too constrained to demonstrate much: playing a game of Mad Libs with pre-existing formats is hardly a demonstration of creativity.

More importantly, I think there’s a more compelling way of reading these study results: the average person off the street isn’t funnier than predictive text. This conclusion tracks for me—I don’t think it’s a particularly hot take to say that most people aren’t funny. Writing comedy is very, very hard and not the kind of thing an average person is going to be able to do well or consistently. Even the most seasoned comedy writer will tell you that joke-writing is a volume game: most of what a funny person writes will not be funny.

The second experiment by the USC scholars involved using 50 Onion headlines to generate 20 new Onion headlines, and then asking a group to rate all 70:

“Those who self-reported seeking out comedy more and reading more satirical news rated the headlines as funnier, independent of whether they were AI-generated or produced by professional writers. Based on mean ratings, 48.8% preferred The Onion’s headlines, 36.9% preferred the headlines generated by ChatGPT, and 14.3% showed no preference.”

Again: sure, maybe some people couldn’t tell the difference, but that’s hardly a demonstration of true comedic creativity. Being able to mimic The Onion’s distinctive voice and style, which a lot of writers worked hard to hone and perfect, doesn’t mean the model is writing jokes. It’s just good at copying off of the A-student sitting next to it. It also doesn’t mean that ChatGPT is “writing comedy.” A dog wearing sunglasses is funny, but that doesn’t mean the dog is doing a bit.

One of the things missing from this academic conception of AI’s comedy writing ability is that it’s discounting the importance of context to successful joke-telling. By starting with funny set-ups and comedy written by professionals, the framing of these experiments is doing a lot of the heavy lifting. Context is so important for a joke to land at all, which is why comedy ages so poorly—jokes work best when they’re responding to a specific time, place, and culture. Take the famous, oldest bar joke in the world from ancient Sumer: “A dog walks into a bar and says, ‘I cannot see a thing. I’ll open this one.’” Without any ancient Sumerians to explain the punchline to us, this joke is meaningless. A language model can’t take the pulse of the culture or read a room, so it won’t be able to generate comedy without a lot of help from a person to give its writing context.

What’s also missing here is the importance of taste. The study dismisses a lot of these concerns: “Our studies suggest that the subjective experience of humor may not be required for the production of good humor—merely knowing the patterns that make up comedy may suffice.” I don’t buy this. The computer can’t know which patterns are funny and which aren’t without someone with enough taste to guide it. When a funny person does an impression in front of an audience or sits down to write a Reductress headline, they’re not regurgitating a pattern; they’re telling you, “this is funny to me.” A joke is an expression of taste, which is something that a language model can’t offer without the help of an editor with a sense of humor. Someone needs to put the sunglasses on the dog.

So no, I don’t think that language models are on the verge of replacing all comedians and late night hosts. Overall, I think the hype around generative tech’s ability to make anything of value is overblown. Even the market seems to be in agreement: a recent analysis by Goldman Sachs found that AI isn’t making much money and is being used by a mere 5% of companies. It’s a problem for an industry that is spending an immense amount of money to create and operate these programs and decimating the environment in the process.

Even the AI fanatics seem to have lost the thread. This Reddit poster frets, “And now I have generated well over 200,000 images and I have no clue what I’m supposed to do with them? There has to be a use for that many images except I wouldn’t know what is.” Without an opinion or taste, who’s to know what’s good and what’s bad?

It shouldn’t be surprising that the corporate hype around AI isn’t panning out: remember when we were all going to be 24/7 living, laughing, and loving in the metaverse? Tech hype and PR have a remarkable ability to capture our imagination, because we, as humans, are good at telling stories, but they’re not often backed up by much more than hot air.

Overall, people promising that they can crunch the numbers and tell you what is funny aren’t to be trusted. It reminds me of a phrase from Connor O’Malley’s recent, very funny special, “Stand-Up Solutions,” where he plays an unhinged salesman for an AI stand-up comedian. O’Malley’s character claims that his AI comedian can deliver “100% accurate comedy,” an absurd and nonsensical promise.

There is one conclusion that I agree with in this study, though I think they’re getting there in the wrong way:

“That ChatGPT can produce written humor at a quality that exceeds laypeople’s abilities and equals some professional comedy writers has important implications for comedy fans and workers in the entertainment industry,” they said. “For professional comedy writers, our results suggest that LLMs [large language models like ChatGPT] can pose a serious employment threat.”

I agree that writers should be worried, but not because of quality. If audiences enjoy jokes that a language model churns out, that’s fine—people are allowed to have whatever taste they want, even bad taste.

What worries me are the executives and purse-string-holders who are reading studies claiming that audiences can’t distinguish between AI and human text, and concluding that they no longer need to pay writers. I’ll keep beating the drum: LLMs are a problem because they’re a labor issue. If generative text is seen as a viable alternative by boardroom denizens, we’re going to see a lot more writers out of work or underpaid. We’re going to see a lot more writers who aren’t paid to think and to tell us what they think is funny, but instead to pick the best jokes that a computer generated, or punch up a first draft that ChatGPT extruded, or copy-edit a novel that an LLM squeezed out. These are jobs, sure, but they will be less abundant and less well-paid.

Without true writing jobs, how will people develop a sense of humor? How can people afford to try things out, to experiment? How will we get anything new, when everything is just a sausage made from pre-existing material? We’ll be left with a smaller, less equal, more top-heavy entertainment industry. And do we really want C-suites to have more creative input on our art?
