Google announced today that it’s initiating a public discussion on developing new protocols and guidelines for how AI systems access and use content from websites.
In a blog post, Google says it wants to explore “technical and ethical standards to enable web publisher choice and control for emerging AI & research use cases.”
The announcement follows Google’s recent I/O conference, where the company discussed new AI products and its AI principles, which aim to ensure that AI systems are fair, transparent, and accountable.
Google’s blog post reads:
“We believe everyone benefits from a vibrant content ecosystem. Key to that is web publishers having meaningful choice and control over their content, and opportunities to derive value from participating in the web ecosystem.”
Google acknowledges that technical standards like robots.txt were created nearly 30 years ago, long before modern AI technologies that can analyze web data at massive scale.
Robots.txt lets publishers specify how search engines crawl and index their content. However, it offers no mechanism for controlling how AI systems use that data to train models or build new products.
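To illustrate the gap, here is a minimal sketch of how robots.txt directives work, using Python's standard `urllib.robotparser`. The user-agent name `ExampleBot` and the paths are illustrative assumptions, not part of any official AI opt-out mechanism:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: "ExampleBot" is an illustrative crawler name.
rules = """
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# ExampleBot is barred from /private/ but may crawl everything else.
print(parser.can_fetch("ExampleBot", "https://example.com/private/page"))  # False
print(parser.can_fetch("ExampleBot", "https://example.com/blog/post"))     # True
```

Note that the protocol only answers "may this agent fetch this URL?" It says nothing about what happens to the content after it is fetched, such as whether it can be used for AI training, which is the gap Google's discussion aims to address.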
Google is inviting members of the web and AI communities, including web publishers, academics, civil society groups, and its partners, to join a public discussion on developing new protocols and ethical guidelines.
“We want this to be an open process and hope that a wide range of stakeholders will engage to discuss how to balance AI progress with privacy, agency and control over data.”
The discussion reflects an increasing recognition that AI technologies can leverage web data in new ways that raise ethical challenges regarding data use, privacy, and bias.
By initiating an open process, Google aims for a collaborative solution that addresses the interests of technology companies and content creators.
The outcome of these discussions could shape how AI systems interact with and utilize data from websites for years to come.
“The web has enabled so much progress, and AI has the potential to build on that progress,” Google says. “But we have to get it right.”
Criticism Of Google’s Data Collection Methods
Google’s announcement comes as it faces criticism over how much data it has already gathered from across the web to train its AI systems and language models.
Some in the SEO community argue Google’s effort is too little too late.
Barry Adams mocked the announcement on Twitter, saying:
“Now that we’ve already trained our LLMs on all your proprietary and copyrighted content, we will finally start thinking about giving you a way to opt out of any of your future content for being used to make us rich.”
Others argue that Google needs to do more to gather feedback in this process.
Nate Hake, a travel marketer, tweeted:
“’Kicking off a discussion’ requires actually letting the other side SAY something. This is just an email capture form. No field to give feedback. Not even a confirmation message.”
AI Relies On Data—But How Much Is Too Much?
AI systems need large amounts of data to function, improve, and benefit society. However, the more data AI has access to, the greater the risks to personal privacy.
There are difficult trade-offs between enabling AI progress and protecting people’s information.
There’s debate over whether people should be able to opt out of AI using their public social media data. Some say individuals should control their data, while others say this slows AI’s advancement.
Both sides present valid arguments, and we’re far from a consensus on the right policy approach.
Google’s call for discussion is a step in the right direction, but the company needs to follow through on implementing the feedback it receives.
Google isn’t alone in facing these challenges. Every tech company developing AI relies on data gathered from the web. The discussion should involve the whole tech industry, not just Google.
Featured Image: JDres/Shutterstock