Judge dismisses DMCA copyright claim in GitHub Copilot suit

Original link: https://www.theregister.com/2024/07/08/github_copilot_dmca/

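The ongoing lawsuit brought by developers against GitHub, Microsoft, and OpenAI over GitHub Copilot's alleged unauthorized use of their open source code has seen most of its claims dismissed. It began with 22 claims, which the latest ruling has reduced to two: an open source license violation allegation and a breach of contract complaint. The most recently dismissed claim concerned potential infringement under the Digital Millennium Copyright Act (DMCA), specifically section 1202(b). The developers argued that Copilot stripped important copyright information, including author details, when suggesting code snippets, which could violate that provision. However, the judge found that the generated code was not identical to the developers' copyrighted work, which defeated the DMCA claim. The remaining issues concern license compliance and contractual obligations. The parties are also at odds over document production during discovery, each raising concerns about delays and inadequate responses, and each claiming the other has failed to comply with its obligation to disclose relevant evidence. The case continues as both sides seek resolution from the court.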

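This piece discusses the murky application of US copyright law to questions of substantial similarity. It highlights the controversial Zenimax v. Oculus case, in which abstracted portions of code gave rise to copyright infringement claims. The author expresses concern that the legal system is vulnerable to corporate interests influencing copyright decisions, and points to Wikipedia and Ars Technica articles for further reading. In essence, the piece explores the challenges, and the potential for abuse, that copyright law faces when determining substantial similarity between works.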

Original article

Claims by developers that GitHub Copilot was unlawfully copying their code have largely been dismissed, leaving the engineers for now with just two allegations remaining in their lawsuit against the code warehouse.

The class-action suit against GitHub, Microsoft, and OpenAI was filed in America in November 2022, with the plaintiffs claiming the Copilot coding assistant was trained on open source software hosted on GitHub and as such would suggest snippets from those public projects to other programmers without care for licenses – such as providing appropriate credit for the source – thus violating the original creators' intellectual property rights.

Microsoft owns GitHub and uses OpenAI's generative machine-learning technology to power Copilot, which auto-completes source code for engineers as they type out comments, function definitions, and other prompts.

Ergo, the plaintiffs are unhappy that, in their view, portions of their copyrighted open source code might be provided – copied, rather – by Copilot to other programmers to use, without due credit given and other requirements of the original licenses honored.

The case started with 22 claims in all, and over time this has been whittled down as the defending corporations moved to have the accusations thrown out of court, requests that Judge Jon Tigar has mostly granted.

In an order [PDF] unsealed on Friday, July 5, Judge Tigar ruled on yet another batch of the plaintiffs' claims, and overall it was a win for GitHub, Microsoft, and OpenAI. Three claims were dismissed as requested and just one allowed to continue. According to a count by Microsoft and GitHub's lawyers, that leaves just two allegations standing in total.

The most recently dismissed claims were fairly important, with one pertaining to infringement under the Digital Millennium Copyright Act (DMCA), section 1202(b), which basically says you shouldn't remove crucial "copyright management" information without permission, such as, in this context, who wrote the code and the terms of use, as licenses tend to dictate.

It was argued in the class-action suit that Copilot was stripping that info out when offering code snippets from people's projects, which in their view would break 1202(b).

The judge disagreed, however, on the grounds that the code suggested by Copilot was not identical enough to the developers' own copyright-protected work, and thus section 1202(b) did not apply. Indeed, last year GitHub was said to have tuned its programming assistant to generate slight variations of ingested training code to prevent its output from being accused of being an exact copy of licensed software.

The plaintiffs won't be able to offer a new section 1202(b) DMCA copyright claim as Judge Tigar dismissed the allegation with prejudice.

The anonymous programmers have repeatedly insisted Copilot could, and would, generate code identical to what they had written themselves, which is a key pillar of their lawsuit since there is an identicality requirement for their DMCA claim. However, Judge Tigar earlier ruled the plaintiffs hadn't actually demonstrated instances of this happening, which prompted a dismissal of the claim with a chance to amend it.

The amended complaint argued that unlawful code copying was an inevitability if users flipped Copilot's anti-duplication safety switch to off, and also cited a study into AI-generated code in an attempt to back up their position that Copilot would plagiarize source, but once again the judge was not convinced that Microsoft's system was ripping off people's work in a meaningful way.

Specifically, the judge cited the study's observation that Copilot reportedly "rarely emits memorized code in benign situations, and most memorization occurs only when the model has been prompted with long code excerpts that are very similar to the training data."

"Accordingly, plaintiffs’ reliance on a study that, at most, holds that Copilot may theoretically be prompted by a user to generate a match to someone else’s code is unpersuasive," he concluded.

The DMCA argument was, as we said, one of three claims just now tossed out. The other two were claims for unjust enrichment and punitive damages, though not with prejudice, meaning it's possible these claims could be amended and resubmitted. Until then, however, that leaves the standing claims at just two: an open source license violation allegation, and a breach of contract complaint that was previously reintroduced after being dismissed initially.

"We firmly believe AI will transform the way the world builds software, leading to increased productivity and most importantly, happier developers," GitHub said in a statement to The Register.

"We are confident that Copilot adheres to applicable laws and we’ve been committed to innovating responsibly with Copilot from the start. We will continue to invest in and advocate for the AI-powered developer experience of the future."

We also approached all parties in the lawsuit and their legal teams.

Both sides squabble during discovery

Also filed for the case on Friday was a joint case management statement [PDF] chock full of various grievances and complaints each side made against the other over the discovery process, with both saying the other hasn't given up all the documents they were supposed to.

The plaintiffs accuse the defendants of deliberately dragging their feet, saying the documents that have been produced so far were already publicly available or should have been disclosed a long time ago. Much of the focus is on Microsoft and its single submitted document so far, something that the plaintiffs say makes no sense.

"That Microsoft employees were involved in many of these GitHub-sourced conversations demonstrates that Microsoft's production of one document thus far has been a function of delay and obfuscation, and nothing else," the anonymous developers said. "Microsoft has known but failed to disclose that its employees were directly involved in the creation, operation, and management of Copilot and its underlying models."

The lack of documents from the Windows maker is apparently down to "technical difficulties" in collecting Slack messages, something the plaintiffs aren't convinced by. Similarly, the programmers say that OpenAI should have also submitted lots more information by now, pointing out that it had produced tens of thousands of documents as a defendant in the Authors Guild lawsuit.

Microsoft and GitHub, however, counter that the plaintiffs have been asking for way too much info, accusing them of having "failed to pursue relevant discovery of these topics efficiently and in good faith." One of these topics includes Microsoft's 2018 acquisition of GitHub.

Meanwhile, OpenAI says the plaintiffs haven't been following proper procedure with respect to asking for emails, saying it can't (or won't) produce any until it receives a correct request.

The corporate trio also say that the dismissal of the above DMCA copyright claim has fundamentally changed the case and argue that the scope of discovery should now be narrowed. This is something the plaintiffs dispute on the grounds that the open source license violation claim pertains to pretty much the same documents as the DMCA issue should bring up.

GitHub, Microsoft, and OpenAI say the plaintiffs haven't properly responded to their discovery requests, arguing that their documents include "JSON files, a blank HTML file, emails without any metadata, and improperly redacted PNG files of Slack and other messages."

The plaintiffs have asked for more time for discovery, and although the defendants argue this isn't necessary, the three tech titans say they're open to a "reasonable extension." ®
