OpenAI Collaborates with AP, Establishing a New Precedent for AI Training
OpenAI, the creator of ChatGPT, has struck a deal with the Associated Press (AP), the first major agreement of its kind in the ongoing debate over whether tech companies should pay for the content they use to train artificial intelligence algorithms. Under the partnership, OpenAI gains access to AP’s archive of text stories dating back to 1985. In return, the news organization will be able to use OpenAI’s technology to experiment with ways to improve its journalism.
AP Embraces Automation but Refrains from Using “Generative” AI
Over the years, the AP has used automation to produce local sports reporting and financial earnings reports. However, the news organization has stated that it does not use “generative” AI, such as chatbots like ChatGPT, to write news stories.
Tech Companies’ Use of Web Content Spurs Controversy
OpenAI, Google, and other AI companies have used vast amounts of publicly available text, including news stories, Wikipedia articles, social media comments, and blog posts, to train the large language models that power their chatbots. This practice has sparked a growing debate over whether tech companies should be required to compensate content creators when they scrape work from the web to build AI tools.
Challenges of Obtaining Consent and Constructing a “Clean Database”
Critics argue that using copyrighted content for AI training without consent represents a significant shift in internet dynamics, particularly as AI tools trained on human-made content increasingly replace human workers. Recent weeks have seen a surge in lawsuits filed against OpenAI and Google, with authors, musicians, news organizations, and social media companies voicing concerns about improper data usage. Although deals such as the OpenAI-AP partnership may help create a “clean database,” experts warn that a viable training dataset is so large that obtaining agreements from all the relevant copyright owners could prove challenging.
Chatbot Limitations and Attempts to Address Freshness of Information
Chatbots like ChatGPT rely on a fixed set of information and must be retrained from scratch to incorporate new data, which makes them less suitable for providing real-time news and current information. Tech companies have explored various ways around this limitation, such as allowing chatbots to search the web or to access separate, constantly updated databases. The deal grants OpenAI access to the AP’s archive, which is regularly updated with recent news stories, but it does not provide real-time access to fresh information.
Past Precedents and Legislative Actions Shaping the Landscape
Tech companies have previously paid news sites for direct access to their content, especially for display on platforms like Google and Facebook. Some countries, such as Australia, have enacted laws mandating payment to news publishers, and Canada is set to implement similar legislation. As the debate over the use of content for AI training continues, the OpenAI-AP collaboration sets a precedent for partnerships that benefit both tech companies and content creators.