Reddit sues Anthropic for scraping user data to train AI without permission

Staff Writer
Jun 5
3 min read

Social media major Reddit has joined the chorus of companies and individuals seeking legal action to stop AI firms from using their data to train large language models (LLMs) without permission. On Wednesday, Reddit filed a lawsuit against Amazon-backed Anthropic in a US court in California for using personal data of its users to train their AI models without first obtaining a license.

Reddit claims in the lawsuit that it protested against Anthropic’s use of its data last year, and the AI firm assured that it had blocked its bots from accessing its data. Reddit asserts that not only did Anthropic refuse to remove deleted posts from its systems, but its bots have accessed Reddit’s servers more than 100,000 times since then.

The unauthorized commercial use of its content is harmful not just to users but also the platform, claims Reddit, adding that the licensing agreements, which it has signed with other AI firms, enable it to impose guardrails on the use of its data.

In May 2024, OpenAI signed a deal for access to Reddit’s data APIs, which gives them real-time updates on the social media platform. Reddit had signed a similar deal reportedly worth $60 million with Google in February 2024. These deals also give Reddit access to their AI models to build applications and add AI features.

“Anthropic scraped and used Reddit content to train and power a model that has enriched Anthropic to the tune of billions of dollars. Reddit has been denied any compensation for the content which Anthropic has taken and for its role in creating Anthropic’s AI models,” Reddit said in the lawsuit filing.

Reddit is seeking compensation equivalent to the amount generated by Anthropic through the use of Reddit content.

Founded in 2005, Reddit is one of the biggest social media platforms with more than 100 million daily active users. CEO Steve Huffman claims it is one of the largest open archives of “authentic and relevant human conversations” on the latest topics.

Currently valued at $61 billion, Anthropic is one of the most promising AI startups behind Claude Sonnet AI models and is considered as one of the top rivals of ChatGPT creator OpenAI. Amazon is one of its biggest investors with close to $8 billion of funding.

Data collection practices of AI firms have been widely criticized and also challenged in court by news publishers and writers.

US based startup Perplexity AI was sued last October by The Wall Street Journal and New York Post, while OpenAI is facing a similar infringement lawsuit filed by eight US newspapers including the New York Times.

Microsoft and OpenAI are also facing legal actions over copyright infringement from the Authors’ Guild and several prominent writers including David Baldacci, John Grisham, George R.R. Martin, and Michael Connelly.

In January, Digital News Publishers Association (DNPA), which represents 20 Indian media outlets including Hindustan Times, Indian Express, Network18, The Hindu, India Today, and Dainik Jagran filed an appeal in Delhi High Court to join an ongoing copyright infringement lawsuit filed by ANI against OpenAI.

To avoid copyright lawsuits many of the AI firms have signed data sharing deals with newspaper publishers. For instance, OpenAI has signed deals with several global news publishers including Associated Press, Vox Media, Axel Springer and News Corp, and Financial Times for license to use select news content for training AI models and answering user queries.

Image credit: Pexels