Reddit Doubles Down on Protecting Data from AI
June 27, 2024
Reddit is making it more challenging for AI to access its data.
The popular social news site is taking proactive steps to keep AI companies from harvesting its content. If, like Reddit, you rely on a robots.txt file to stop your content from getting "scraped," it may be time to find a more stringent way to get a grip on your data. Reddit says robots.txt alone isn't cutting it.
Many crawlers simply ignore robots.txt files, allowing some companies (particularly AI startups) to grab Reddit posts to feed their AI tools and train them on stolen content.
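Why is robots.txt so easy to sidestep? The check happens in the crawler's own code, not on the website's server. The minimal Python sketch below uses the standard library's `urllib.robotparser` module to show what a well-behaved crawler does before fetching a page. The robots.txt contents, the "ExampleAIBot" user agent, and the example.com URL are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one made-up AI crawler ("ExampleAIBot")
# from the whole site and allow every other user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def polite_fetch_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch this URL.

    A compliant crawler calls a check like this before every request.
    Nothing forces a crawler to call it, which is why robots.txt is advisory.
    """
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    post_url = "https://example.com/r/some_post"
    print(polite_fetch_allowed(rules, "ExampleAIBot", post_url))  # False
    print(polite_fetch_allowed(rules, "FriendlyBot", post_url))   # True
```

A scraper that skips this check, or simply lies about its user agent, gets the page anyway. That gap is why server-side measures such as rate limiting and bot blocking matter.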
Reddit says it will begin "updating a web standard used by the platform to stop automated data scraping from its website," namely its robots.txt file. This follows reports that AI startups have been bypassing the protocol to gather content for their systems. Reddit's move comes amid a growing controversy over artificial intelligence firms ripping off and plagiarizing content to create AI-generated summaries without giving credit, asking for permission, or paying for it.
Reddit says it will also maintain rate limiting, which caps the number of requests from any one entity, and will "block unknown bots and crawlers from data scraping."
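Reddit has not published how its rate limiting works. As a rough illustration of the general idea, here is a minimal fixed-window limiter in Python that caps how many requests each client may make per time window. The class name, the client ID, the limit of 3, and the 60-second window are hypothetical choices for the example, not Reddit's actual settings.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each `window_seconds` window.

    Hypothetical sketch of generic rate limiting; not Reddit's implementation.
    """

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior is easy to test
        # client_id -> (window_start_time, requests_counted_in_window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, client_id: str) -> bool:
        """Record a request from client_id and return whether it is allowed."""
        now = self.clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            # Over the cap for this window: reject without counting.
            self._windows[client_id] = (start, count)
            return False
        self._windows[client_id] = (start, count + 1)
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = FixedWindowRateLimiter(limit=3, window_seconds=60,
                                     clock=lambda: fake_now[0])
    print([limiter.allow("bot-123") for _ in range(5)])  # [True, True, True, False, False]
    fake_now[0] = 61.0  # move past the end of the first window
    print(limiter.allow("bot-123"))  # True
```

A fixed window is the simplest design, but it lets a client burst up to twice the limit around a window boundary. Sliding-window or token-bucket limiters smooth that out at the cost of a little more bookkeeping.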
The battle over AI and plagiarism has heated up as AI tools keep consuming content and businesses use the results to their advantage. Earlier this month, Forbes accused Perplexity, an AI search startup, of plagiarizing its stories and using them in its generative AI systems without giving credit. Perplexity is now in hot water after an investigation found that it had bypassed Forbes' efforts to block AI scrapers.
That case is just one of many under scrutiny as more companies let AI do the heavy lifting, relying on content that is essentially "lifted" from the internet and used without payment or credit.
The takeaway is that the currently accepted protocol, robots.txt, isn't enough to protect your content on its own, and Reddit's proactive steps show website owners how to take a stand against AI data scraping and plagiarism. Reddit is stepping up, and it's time for others to do the same.