OpenAI Ordered To Produce 20 Million User Conversations To NY Times

Original link: https://www.zerohedge.com/technology/openai-ordered-produce-20-million-user-conversations-ny-times

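A federal judge has ordered OpenAI to turn over 20 million anonymized ChatGPT user logs to the New York Times and other newspapers suing the company for copyright infringement. The newspapers argue that the logs are essential to showing how ChatGPT uses copyrighted news content and generates potentially false outputs. OpenAI initially objected on user-privacy grounds, but the judge found those concerns were offset by a protective order that ensures identifying information is removed. OpenAI must comply by Nov. 14. At the core of the lawsuit is the claim that OpenAI unlawfully used copyrighted material, such as New York Times articles, to train its AI models, thereby avoiding the enormous investment needed to create that content. Together with other cases brought by authors and Getty Images, the case is part of an intensifying legal battle over "fair use" and the right of AI companies to train on publicly available data, a key issue for the fast-moving generative AI field. The outcome could have major implications for the future development of AI and for copyright law.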

Original text

OpenAI has been ordered by a federal judge to turn over 20 million anonymized ChatGPT user logs to the NY Times and other newspapers suing the chat giant over its generative AI model. 

In a Nov. 7 order revealed today, New York Magistrate Judge Ona T. Wang said producing the logs in whole is appropriate - granting the plaintiffs' motion to compel production. The newspapers had demanded the user logs to inspect how ChatGPT is used to create outputs they say infringe their copyrighted works. OpenAI pushed back, citing privacy concerns. 

Wang, however, was not persuaded by OpenAI's argument that consumers' privacy rights were at risk, given that a protective order is in place and identifying information would be removed from the logs (so anyone who's uploaded their tax return or a legal document is safe?).

OpenAI has until Nov. 14 to hand over the data - the latest twist in the hotly contested discovery process in the newspapers' copyright lawsuits against OpenAI, Bloomberg Law reports. 

OpenAI had contested the wholesale production of the 20 million user logs and asked to narrow the sample, saying in an Oct. 30 briefing that the ask was inappropriate and would disclose private user conversations that had nothing to do with the copyright issue in the case.

Newspaper-plaintiffs including New York Times Co., however, pushed back and said without the user logs they couldn’t conduct expert analysis on topics such as how ChatGPT worked to pull news content for its users or how often the AI model hallucinated and generated false outputs attributed to the outlets. -BBG

The fight over user logs dates back to April - before lawsuits against OpenAI by various news outlets were consolidated for pretrial proceedings. In May, Wang issued a preservation order, rejecting OpenAI's argument that the request was "sweeping" and "invasive."

NYT's lawsuit, filed in Dec. 2023, claims that OpenAI and Microsoft violated copyright laws by using Times content to train their AI models, including ChatGPT and Microsoft's Copilot.

"Times journalism is the work of thousands of journalists, whose employment costs hundreds of millions of dollars per year," reads the complaint. "Defendants have effectively avoided spending the billions of dollars that The Times invested in creating that work by taking it without permission or compensation."

The lawsuit has potentially huge implications for 'fair use' of copyrighted materials, a complex legal doctrine that weighs factors such as the purpose of use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the potential market for the copyrighted work.

The legal landscape surrounding generative-AI is unsettled, with the technology still in its early days. There are other lawsuits that could test the rights of AI companies to “scrape” content from the web to train AI tools, including one by several prominent book authors against OpenAI. In February, Getty Images sued the AI art company Stability AI in Delaware, alleging that it had infringed on Getty’s copyrights. Stability AI at the time said it doesn’t comment on pending litigation. -WSJ

According to the NYT, Microsoft and OpenAI have seen their valuations rise significantly thanks to AI tools trained on 'scraped' data.
