ChatGPT creator OpenAI Inc. is stealing “vast amounts” of personal information to train its artificial intelligence models in a heedless hunt for profits, a group of anonymous individuals claimed in a lawsuit seeking class action status.
OpenAI has violated privacy laws by secretly scraping 300 billion words from the internet, tapping “books, articles, websites and posts — including personal information obtained without consent,” according to the sprawling, 157-page lawsuit. The complaint doesn’t shy away from sweeping language, accusing the company of risking “civilizational collapse.”
The plaintiffs are described by their occupations or interests but identified only by initials for fear of a backlash against them, the Clarkson Law Firm said in the suit, filed Wednesday in federal court in San Francisco. They cite $3 billion in potential damages, based on a category of harmed individuals they estimate to be in the millions.
‘A Different Approach: Theft’
“Despite established protocols for the purchase and use of personal information, Defendants took a different approach: theft,” they allege. The company’s popular chatbot program ChatGPT and other products are trained on private information taken from what the plaintiffs described as hundreds of millions of internet users, including children, without their permission.
Microsoft Corp., which plans to invest a reported $13 billion in OpenAI, was also named as a defendant.
A spokesperson for OpenAI didn’t immediately respond to a call and email seeking comment on the lawsuit; a Microsoft spokesperson also didn’t immediately respond to an email.
ChatGPT and other generative AI applications have stirred intense interest in the technology’s promise but also sparked a firestorm over privacy and misinformation. Congress is debating the potential and dangers of AI as the products raise questions about the future of creative industries and the ability to tell fact from fiction. OpenAI Chief Executive Officer Sam Altman himself, in testimony on Capitol Hill last month, called for AI regulation. But the lawsuit focuses on how OpenAI got the guts of its products to begin with.
Secret Scraping
OpenAI, which is at the forefront of the burgeoning industry, is accused in the suit of conducting an enormous clandestine web-scraping operation, violating terms of service agreements and state and federal privacy and property laws. One of the laws cited is the Computer Fraud and Abuse Act, a federal anti-hacking statute that has been invoked in scraping disputes before. The suit also includes claims of invasion of privacy, larceny, unjust enrichment and violations of the Electronic Communications Privacy Act.
Misappropriating personal data on a vast scale to win an “AI arms race,” OpenAI illegally accesses private information from individuals’ interactions with its products and from applications that have integrated ChatGPT, the plaintiffs claim. Such integrations allow the company to gather image and location data from Snapchat, music preferences on Spotify, financial information from Stripe and private conversations on Slack and Microsoft Teams, according to the suit.
Chasing profits, OpenAI abandoned its original principle of advancing artificial intelligence “in the way that is most likely to benefit humanity as a whole,” the plaintiffs allege. The suit puts ChatGPT’s expected revenue for 2023 at $200 million.
While seeking to represent the massive class of allegedly harmed individuals, and requesting monetary damages to be determined at trial, the plaintiffs are also asking the court to temporarily freeze commercial access to and further development of OpenAI’s products.
The case is P.M. et al. v. OpenAI LP, 23-cv-03199, US District Court, Northern District of California (San Francisco).
--With assistance from Christopher Brown, Rachel Metz and Dina Bass.