Websites Blocking Google and OpenAI for AI Training

A significant shift is under way online: top websites are increasingly blocking Google and OpenAI from using their content to train AI models. Here is how that shift is playing out.

The Role of Robots.txt

Robots.txt, a small but powerful plain-text file placed at the root of a site, has long served as the gatekeeper of the web. It tells crawlers, including those run by tech giants like Google, which parts of a site they may access, and honoring it is voluntary on the crawler's side. This arrangement has historically benefited both parties: websites let Google crawl their pages, and Google sends them valuable traffic in return.
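To make the gatekeeping concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. The robots.txt contents, the example.com URLs, and the helper name is_allowed are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /private/,
# while every other crawler is asked to stay out entirely.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://example.com/news/story"),
        ("Googlebot", "https://example.com/private/draft"),
        ("SomeOtherBot", "https://example.com/news/story"),
    ]:
        print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

The file only states the site owner's wishes. Nothing in it technically prevents a crawler from fetching a page, which is why the arrangement depends on crawlers choosing to comply.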

The Emergence of AI Wars

However, the emergence of the AI wars has disrupted this equilibrium. The vast troves of online content needed to train advanced AI models have become contested territory. Companies like OpenAI, Google, Meta, and others rely on this data to power AI products that answer user queries directly, which may reduce the traffic sent back to the original websites.

Google’s Response with Google-Extended

In response to these challenges, Google introduced Google-Extended, a control that websites set in robots.txt to block Google from using their content for AI training. Launched in September 2023, it had been adopted by around 10% of the top 1,000 websites by late March, according to data from Originality.ai.
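As a rough illustration of what such an opt-out looks like, the sketch below builds a robots.txt group for the Google-Extended token and checks it with Python's urllib.robotparser. The helper name opt_out_rules and the example.com URL are hypothetical. Google-Extended is used here as a robots.txt token only; the assumption is that regular search crawling is governed by the separate Googlebot token.

```python
from urllib.robotparser import RobotFileParser


def opt_out_rules(tokens):
    """Build robots.txt groups that disallow the whole site for each token."""
    groups = [f"User-agent: {token}\nDisallow: /\n" for token in tokens]
    return "\n".join(groups)


if __name__ == "__main__":
    rules = opt_out_rules(["Google-Extended"])
    print(rules)

    parser = RobotFileParser()
    parser.parse(rules.splitlines())

    # The AI-training token is blocked site-wide...
    print(parser.can_fetch("Google-Extended", "https://example.com/article"))
    # ...while Googlebot, which has no group of its own here, stays allowed.
    print(parser.can_fetch("Googlebot", "https://example.com/article"))
```

Because the group names only Google-Extended, a site can decline AI training use while leaving its rules for search crawlers untouched.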

The Case of The New York Times

The New York Times (NYT) stands out as a prominent example of a website using Google-Extended to block access for AI model training. NYT has also barred OpenAI from accessing its content, reflecting a broader trend of companies guarding their data against AI training endeavors.

Comparing Google-Extended and Other Data-Blockers

While Google-Extended has been adopted by major websites such as CNN, BBC, Yelp, and Business Insider, fewer of the top 1,000 websites use it than block OpenAI's GPTBot or Common Crawl's CCBot. Originality.ai CEO Jonathan Gillham suggests that websites may fear being excluded from future AI-generated search results if they block Google's AI from accessing their data.
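A comparison like this can be approximated by checking each site's robots.txt against the three tokens and counting the blocks. The sketch below is only an assumption about how such a tally might work, not Originality.ai's actual methodology. The site names and robots.txt contents are invented, and the function names blocked_tokens and block_rates are hypothetical.

```python
from urllib.robotparser import RobotFileParser

AI_TOKENS = ("GPTBot", "CCBot", "Google-Extended")


def blocked_tokens(robots_txt, tokens=AI_TOKENS, probe_url="/"):
    """Return the set of tokens that may not fetch probe_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {t for t in tokens if not parser.can_fetch(t, probe_url)}


def block_rates(robots_by_site, tokens=AI_TOKENS):
    """Return, per token, the fraction of sites that block it."""
    counts = {t: 0 for t in tokens}
    for robots_txt in robots_by_site.values():
        for token in blocked_tokens(robots_txt, tokens):
            counts[token] += 1
    total = len(robots_by_site)
    return {t: (counts[t] / total if total else 0.0) for t in tokens}


# Invented sample data standing in for downloaded robots.txt files.
SAMPLE_SITES = {
    "site-a.example": "User-agent: GPTBot\nUser-agent: CCBot\nDisallow: /\n",
    "site-b.example": (
        "User-agent: GPTBot\nDisallow: /\n\n"
        "User-agent: CCBot\nDisallow: /\n\n"
        "User-agent: Google-Extended\nDisallow: /\n"
    ),
    "site-c.example": "User-agent: *\nDisallow:\n",
    "site-d.example": "User-agent: GPTBot\nDisallow: /\n",
}

if __name__ == "__main__":
    for token, rate in block_rates(SAMPLE_SITES).items():
        print(f"{token}: {rate:.0%}")
```

This version only probes the site root, so it would miss a site that blocks a token from some sections but not others, and it counts a blanket "User-agent: *" disallow as a block for every token.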

The Future of AI-Driven Search

Google’s experimentation with generative AI search, showcased through its Search Generative Experience (SGE), hints at a potential shift in how AI-powered search engines operate. The decisions companies make about AI data access will significantly shape the future of web interactions in this AI-driven era.

As the web navigates these transformative AI dynamics, partnerships like Axel Springer’s global deal with OpenAI underscore the complex interplay between AI training, data access, and media reporting.
