Big Move: News publishers block OpenAI’s web crawler to protect content
Times of India, Hindustan Times, Dainik Bhaskar and The Hindu have safeguarded their websites from OpenAI's web crawler GPTBot, many others to follow suit this week
India’s leading news publishers have started blocking OpenAI’s web scanning tool to stop it from accessing their content to power ChatGPT, e4m has learnt.
ChatGPT is a generative Artificial Intelligence (AI) application developed by Microsoft-backed OpenAI. Over the last three weeks, The Times of India, Hindustan Times, Dainik Bhaskar and The Hindu have blocked access to OpenAI's web crawler ‘GPTBot’ in order to safeguard their content. Top officials at the newspapers confirmed the development to e4m. Some others are planning to follow suit this week.
The move comes days after leading international publications CNN, NY Times, The Guardian, ABC and Reuters took similar measures to thwart GPTBot, a web crawler launched by OpenAI on August 8.
When asked about it, Sujata Gupta, Secretary General, Digital News Publishers Association (DNPA), expressed concern over the challenges posed by web crawlers and automated bots accessing and potentially using content without permission.
“Some of our members have already taken steps to block access to OpenAI's web crawler, GPTBot, in order to safeguard their content. Others are in the process of implementing similar measures or are actively evaluating their options,” said Gupta.
She added, “Most of the publishers are also considering updating their Terms of Service to restrict any use of their content without prior consent for the development of any artificial intelligence (AI) systems or similar programs or models, a move similar to the one taken by a lot of global news publishers already, with some planning to do it as soon as this week itself.”
OpenAI, which does not disclose the data that helped build the model behind ChatGPT, announced in August that it will enable website operators to block its web crawler from accessing their content, although the move does not allow material to be removed from existing training datasets.
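For readers who want to see how such a block works in practice: it is typically declared in a site's robots.txt file, using the crawler's user-agent token, GPTBot. The snippet below is a minimal sketch, not any publisher's actual configuration. The robots.txt text, the example URL and the helper name `is_blocked` are assumptions made for illustration. It uses Python's standard `urllib.robotparser` module to check whether a given set of rules turns a crawler away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks GPTBot site-wide, allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt disallow user_agent from url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URL, not a real story page.
    url = "https://news.example.com/story/123"
    print("GPTBot blocked:", is_blocked(ROBOTS_TXT, "GPTBot", url))
    print("Googlebot blocked:", is_blocked(ROBOTS_TXT, "Googlebot", url))
```

A robots.txt rule is a request that well-behaved crawlers honour, not a technical barrier, and, as noted above, it does not remove material already present in existing training datasets.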
According to plagiarism tracker Originality.ai, over 10 percent of news websites across the globe have blocked the web crawler of OpenAI within weeks of its launch.
Revenue loss
Apart from content piracy, digital publishers allege that ChatGPT is also causing a drop in referral traffic to news websites through Google Search, as people are shifting to AI chatbots for their queries.
Online news publishers’ revenue has been declining for the last few months for a range of reasons, such as a drop in news consumption and a decline in sales of mobile phones. Generative AI has dealt a further blow to their revenues as users have almost stopped clicking on news links, news industry leaders say.
News or other websites earn revenue if users visit their sites and click on ads displayed on their webpages.
“Even Google has brought AI in search now. About 90 percent of news consumers anyway don’t click news links. They just read the headings thrown up by Google Search. With AI addition in Google Search, referral traffic to our websites would completely stop. We would be left with no digital revenue. How will we invest in journalists and news production then?” rues a publisher.
All Generative AI-tools being blocked
A top digital publisher added, “Not just OpenAI, all generative AI tools which are working on ‘Large language models’ (LLMs) are being blocked so that they can’t access our content to further develop their generative AI models.”
Large language models are fed vast amounts of text in order to be taught how to generate plausible sentences.
Generative AI firms are accused of lifting unlicensed content from news websites to create their LLMs. All these firms have become larger than life within a few months.
For instance, OpenAI, which launched ChatGPT only in November 2022, is valued at $30 billion, according to international media reports, although the tech firm has not reported any revenue figures so far.
Publishers across the world feel that OpenAI may earn huge revenue in the coming days by feeding ChatGPT with their content, without sharing a single penny with the news publishers who spend huge sums of money to produce that content. They also feel that journalism itself is in danger due to the advent of ChatGPT and other generative AI tools.
Digital Competition Bill gives hope
Digital publishers are now pinning their hopes on the upcoming Digital Competition legislation, which seeks to regulate tech companies.
The government of India set up a committee in February this year, following a report by the Parliamentary Standing Committee on Finance in December 2022 on anti-competitive practices by big tech companies. The report had mooted a digital competition bill to check such practices. Industry insiders and experts feel the report will have far-reaching implications.
Gupta stated, “Our primary goal is to strike a balance that respects copyright protection, fosters innovation, and maintains a free flow of credible news to the citizens of the country. We are hopeful that upcoming bills of the Government of India on Digital Governance and Competition matters would also factor these recent changes in the domain of technology that would have ramifications on both revenue and copyright matters. A win-win situation needs to exist.”