Monday, April 22, 2024

Google can now train AI with Reddit posts thanks to new Data API access deal


Google and Reddit have “expanded and deepened” their existing partnership in a deal that will give the search giant near-unfettered access to content posted on the site. As well as helping users to find Reddit content via the search engine, the move also means that Redditors’ posts will be used to train Google’s AI models.

Reddit says that giving Google access to its Data API will make it “easier to discover and access the communities and conversations people are looking for”. Both companies are pushing the deal as a way of promoting the “open internet”, but it is proving controversial.

As part of the deal, Reddit will be able to integrate new AI-powered capabilities using Vertex AI. Google says that Reddit intends to use Vertex AI to “enhance search and other capabilities on the Reddit platform”. Reddit describes this slightly differently, saying that its partnership with Google Cloud will enable it to “integrate new AI-powered capabilities to improve Reddit and help achieve our mission of bringing community, belonging, and empowerment to everyone in the world”.

In a post about the deal, Reddit says: “Aligned with our belief that everyone should be able to find the information they need and the experiences they want online, we’ve expanded our partnership with Google to make it easier to discover and access the communities and conversations people are looking for on Reddit”.

It goes on to add:

With this partnership, and via our Data API, we’re ushering in new ways for Reddit content to be displayed across Google products by providing programmatic access to new, constantly evolving, and dynamic public posts, comments, etc., on Reddit. This enhanced collaboration provides Google with an efficient and structured way to access the vast corpus of existing content on Reddit and enables Google to use the Reddit Data API to improve its products and services — including supporting new ways to display Reddit content and providing more efficient ways to train models.

It is the access to data, and how Google will use it, that is causing a good deal of upset among Reddit users. Google says of the new arrangement:

Google now has access to Reddit’s Data API, which delivers real-time, structured, unique content from their large and dynamic platform. With the Reddit Data API, Google will now have efficient and structured access to fresher information, as well as enhanced signals that will help us better understand Reddit content and display, train on, and otherwise use it in the most accurate and relevant ways. This expanded partnership does not change Google’s use of publicly available, crawlable content for indexing, training, or display in Google products.

Reddit is at pains to point out: “This expanded partnership does not change Reddit’s Data API Terms or Developer Terms, which state content accessed through Reddit’s Data API cannot be used for commercial purposes without Reddit’s approval. API access remains free for non-commercial usage under our published threshold”.

This, however, is doing nothing to calm users who are irate at their posts being “stolen” by Google and used in ways they are not comfortable with.

Image credit: MMollaretti / depositphotos
