Linkup connects LLMs with premium content sources (legally)


If you’ve used ChatGPT Search or Perplexity, you know that being able to search the web and get inline citations greatly improves these AI chatbots. Results are better when they involve timely information, and web search may reduce so-called hallucinations (i.e., cases where a generative AI outputs incorrect information).

That’s why French startup Linkup is building an API that lets developers access web content from premium, trusted sources and hand the results to a large language model (LLM) to enrich its answers. Many AI developers call this workflow Retrieval-Augmented Generation (or RAG).
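To make the RAG workflow concrete, here is a minimal sketch of the retrieve-then-prompt step. It is an illustration only, not Linkup's actual API: the `search_premium_sources` function, the `Source` type and the example URLs are all hypothetical, and the "search" is a toy keyword match over two hard-coded entries.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    url: str
    snippet: str


def search_premium_sources(query: str) -> list[Source]:
    """Hypothetical stand-in for a licensed-content search API (toy keyword match)."""
    corpus = [
        Source("Acme Q3 results", "https://example.com/acme-q3",
               "Acme reported higher revenue in Q3."),
        Source("Widget market outlook", "https://example.com/widgets",
               "Analysts expect widget demand to grow."),
    ]
    words = re.findall(r"\w+", query.lower())
    return [
        s for s in corpus
        if any(w in s.title.lower() or w in s.snippet.lower() for w in words)
    ]


def build_prompt(question: str, sources: list[Source]) -> str:
    """Put the retrieved snippets in front of the question so the LLM can cite them."""
    context = "\n".join(
        f"[{i}] {s.title} ({s.url}): {s.snippet}"
        for i, s in enumerate(sources, start=1)
    )
    return (
        "Answer the question using only the numbered sources, and cite them inline.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


question = "What did Acme report?"
prompt = build_prompt(question, search_premium_sources(question))
print(prompt)
```

The resulting prompt string is what a developer would then send to whichever LLM they use. The model call itself is left out here because it depends on the provider.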

More importantly, the future of scraping bots is uncertain. Where there is no financial agreement between content publishers and the entities scraping their pages, these bots are lifting content from the open web without paying. Many people aren’t happy about that arrangement, which is increasing regulatory scrutiny around AI training.

There are also now high-profile legal cases in play, such as the ongoing lawsuit between OpenAI, the maker of ChatGPT, and the New York Times, so the situation around web scraping could change in the near future. That is why OpenAI has signed multi-year content licensing deals with major publishers such as AP, Axel Springer, Condé Nast, El País, the Financial Times, Le Monde, and others.

“We set up the company around the time when OpenAI was making deals with news sources… for training or inference purposes, to augment the answers from OpenAI models and their products. And we thought: ‘OK, this is great because we finally have AI companies that pay their sources,’” Linkup co-founder and CEO Philippe Mizrahi told TechCrunch, laying out what propelled the founders to set up a business to connect AI devs with content providers for — hopefully — their mutual benefit.

Currently, content publishers face a difficult decision over what to do about GenAI’s thirst for data. They can block web scrapers using the (non-legally binding) robots.txt file, which tells crawlers, including those collecting AI training data, which parts of a website they may access. They can sue AI companies that they believe have breached their copyright. Alternatively, they could let bots index their content freely (er, YOLO?). Or they may be able to license content to AI devs to get some recompense for their intellectual property.
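For readers who haven't seen the robots.txt option in practice, here is a small sketch using Python's standard-library `urllib.robotparser`. The robots.txt content and the `example.com` URL are made up for illustration. GPTBot is the user agent OpenAI publishes for its crawler, and "SomeOtherBot" is a placeholder name.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: turn away one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/1"
gptbot_ok = parser.can_fetch("GPTBot", url)       # rule for GPTBot applies
other_ok = parser.can_fetch("SomeOtherBot", url)  # falls back to the "*" rule

print(gptbot_ok, other_ok)
```

Note that the file only expresses a request. Nothing in it stops a crawler that chooses to ignore it, which is why the article calls it non-legally binding.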

But there are thousands of AI companies (or tech companies using AI) that don’t have the scale and reach of OpenAI. At the same time, what’s great about the web is that there’s a long tail of content publishers. That means a small content publisher usually doesn’t have the financial resources to file a lawsuit. It also means that switching millions of websites from a scraping model to a licensing model will be difficult.

That’s why Linkup isn’t just a technical solution. It’s a marketplace; an intermediary between content publishers and companies that want to augment their LLM answers with web content.

Linkup signs content licensing deals with publishers and integrates with their CMS so that it can fetch content from publishers without any scraping. Linkup then pays content partners based on how often their content is accessed by Linkup clients.
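The article doesn't say how those payments are calculated. As one possible reading of "based on how often their content is accessed," the sketch below splits a revenue pool pro rata by access count. The formula, the `split_pool` function and the publisher names are all assumptions for illustration, not Linkup's actual terms.

```python
from collections import Counter


def split_pool(access_log: list[str], pool_cents: int) -> dict[str, int]:
    """Split a revenue pool (in cents) across publishers, pro rata by access count.

    Assumed formula for illustration only. Floor division means a few cents
    may be left undistributed.
    """
    counts = Counter(access_log)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        publisher: pool_cents * hits // total
        for publisher, hits in counts.items()
    }


# One log entry per time a client's query pulled content from that publisher.
log = ["publisher_a", "publisher_a", "publisher_b", "publisher_c"]
payouts = split_pool(log, pool_cents=1000)
print(payouts)  # {'publisher_a': 500, 'publisher_b': 250, 'publisher_c': 250}
```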

Linkup’s founding team. Image Credits: Linkup

“We’re really targeting applications that are implementing AI in their own products,” said Mizrahi. “So, the typical use case is that I create an AI application using a model from Mistral or OpenAI. I build my own pipeline, but I need to enrich this pipeline with external information.”

As a side note, while ChatGPT can browse the web, GPT models can’t. OpenAI provides both a massively popular application (ChatGPT) and LLMs that developers can use with an API (GPT). But web search is a ChatGPT feature.

“There’s an example I like, which is one of our customers… built an internal application for their sales people,” Mizrahi also told us. “On the one hand, they have listed all the advantages of their own products. And thanks to us, they get fresh, quality information on their prospects and put it into a Mistral LLM. And Mistral’s LLM is going to generate a sort of sales pitch for the sales reps, which they’ll have in front of them when they make the calls with the customer leads.”

At first, Linkup decided to focus on corporate and business information. In addition to news websites, the startup works with knowledge databases — think Statista, Xerfi or other resources in the same vein.

It isn’t the only startup working on bringing premium content to LLMs with licensing contracts behind the scenes. The most visible competitor is ScalePost, a startup that works with Perplexity to speed up its licensing deals with publishers.

Linkup raised a €3 million seed round ($3.2 million at current exchange rates) a few months ago from Axeleo Capital, Motier Ventures, Seedcamp, and a hundred business angels. There are around 10 people working for the startup right now, and it plans to hire another 10 staff over the next year.




Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes health, sport, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
