Over 100k Infected Repos Found on GitHub

Original link: https://apiiro.com/blog/malicious-code-campaign-github-repo-confusion-attack/

In February 2024, researchers found that a malicious repo confusion campaign had resurfaced, affecting more than 100,000 GitHub repositories and potentially millions of people worldwide. The campaign uses tactics similar to earlier dependency confusion attacks, but it exploits human error rather than package manager behavior: unsuspecting victims are infected with a malicious Python loader that steals sensitive information such as usernames, passwords, and cookie data. GitHub has identified and removed most of the infected repositories, but new ones continue to appear regularly. To avoid falling victim to a repo confusion campaign, organizations should report suspicious repos promptly and consider deploying a malicious code detection system that draws on advanced techniques, including LLM-based code analysis, deconstruction of code into a complete execution flow graph, and a heuristics engine.

Without monitoring of the entire codebase, an organization's supply chain remains vulnerable. Attacks of this kind are becoming increasingly prevalent, so securing applications against injected malicious code requires capabilities beyond traditional vulnerability detection. Organizations should monitor every connected codebase and take robust measures against malicious repos and dependency confusion attacks. GitHub continues to remove suspicious repos, but some keep emerging, so developers should stay vigilant, follow best practices, and act promptly when they notice suspicious behavior. Apiiro offers a malicious code detection system built for this purpose; beyond the techniques listed above, it applies dynamic decoding, decryption, and de-obfuscation. Developers should also keep up with developments in software supply chain security, given the critical role of open source projects and the continued spread of confusion campaigns.

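Individuals and companies can consider some additional steps to address ongoing concerns about vulnerable open source libraries and dependencies:

  1. Use a reputation system. Tools that rate libraries by reputation can provide insights and warnings, so that disreputable libraries are identified before they are incorporated into an application.
  2. Monitor source code and patch dependencies. NIST encourages companies and individuals to continuously monitor the source code used in their applications and the patches for its dependencies. Careful monitoring catches newly emerging vulnerabilities early, so that they can be mitigated immediately.
  3. Enforce stricter authentication standards. To counter weak authentication and authorization methods, companies and institutions should adopt higher-security standard protocols for logging in and verifying user identity.
  4. Develop a secure supply chain. Discussion of secure software development tends to focus on the core components that define a product's functionality, but more emphasis must shift to building a high-trust supply chain. By applying security principles and best practices throughout the software lifecycle, companies can make the software they ship less susceptible to cyberweapons, espionage, and trojans.

In addition, recommendations from the U.S. Cybersecurity and Infrastructure Security Agency (CISA) offer further measures and strategies for these challenges, with the aim of strengthening cyber resilience across critical infrastructure sectors, from communications to finance to public health, through coordination and information sharing with trusted industry partners.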

Original article

Our security research and data science teams detected a resurgence of a malicious repo confusion campaign that began in the middle of last year, this time on a much larger scale. The attack affects more than 100,000 GitHub repositories (and presumably millions): unsuspecting developers end up using repositories that resemble known and trusted ones but are, in fact, infected with malicious code.

How do repo confusion attacks happen?

Similar to dependency confusion attacks, malicious actors get their target to download their malicious version instead of the real one. But dependency confusion attacks take advantage of how package managers work, while repo confusion attacks simply rely on humans to mistakenly pick the malicious version over the real one, sometimes employing social engineering techniques as well.

In this case, in order to maximize the chances of infection, the malicious actor is flooding GitHub with malicious repos, following these steps:

  1. Cloning existing repos (for example: TwitterFollowBot, WhatsappBOT, discord-boost-tool, Twitch-Follow-Bot, and hundreds more).
  2. Infecting them with malware loaders.
  3. Uploading them back to GitHub with identical names.
  4. Automatically forking each thousands of times.
  5. Covertly promoting them across the web via forums, Discord, etc.
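
Because the impersonating repos keep the original name (step 3), the owner part of the URL is the main thing that tells them apart from the real one. The following is a minimal sketch of a pre-clone check, assuming you maintain your own allowlist of trusted owner/repo pairs. The `example-org` owner and both function names are hypothetical and do not come from the original report.

```python
from urllib.parse import urlparse


def parse_github_slug(url: str) -> tuple[str, str]:
    """Return (owner, repo), lowercased, from an HTTPS or SSH-style GitHub URL."""
    ssh_prefix = "git@github.com:"
    if url.startswith(ssh_prefix):
        path = url[len(ssh_prefix):]
    else:
        parsed = urlparse(url)
        if parsed.hostname != "github.com":
            raise ValueError(f"not a github.com URL: {url}")
        path = parsed.path
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"URL has no owner/repo part: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner.lower(), repo.lower()


def is_impersonator(url: str, trusted: set[tuple[str, str]]) -> bool:
    """True when the repo name matches a trusted repo but the owner does not."""
    owner, repo = parse_github_slug(url)
    trusted_names = {name for _, name in trusted}
    return repo in trusted_names and (owner, repo) not in trusted


# Hypothetical allowlist: the owner below is a placeholder, not the real maintainer.
TRUSTED = {("example-org", "twitterfollowbot")}

if __name__ == "__main__":
    print(is_impersonator("https://github.com/some-other-user/TwitterFollowBot", TRUSTED))
```

A check like this only helps for repos you already know about; it says nothing about a repo whose name is not on the allowlist.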

What happens when the malicious repos are in use?

Once unsuspecting developers use any of the malicious repos, the hidden payload unpacks seven layers of obfuscation, which also involves pulling malicious Python code and later a binary executable. The malicious code (largely a modified version of BlackCap-Grabber) would then collect login credentials from different apps, browser passwords and cookies, and other confidential data. It then sends it back to the malicious actors’ C&C (command-and-control) server and performs a long series of additional malicious activities.
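
When investigating a suspicious payload, the layers can often be peeled statically, without ever calling `exec`. The sketch below is illustrative only: it assumes the layers are plain base64 and zlib, which is a guess for the sake of the example and not a description of the actual seven layers (the indicators listed later suggest that Fernet encryption is also involved). The function name `peel_layers` is hypothetical.

```python
import base64
import binascii
import zlib


def peel_layers(blob: bytes, max_layers: int = 10) -> tuple[bytes, list[str]]:
    """Repeatedly undo zlib and base64 wrapping without executing anything.

    Returns the innermost bytes and the list of decoders applied, in order.
    """
    applied: list[str] = []
    for _ in range(max_layers):
        # Try zlib first: base64 text never carries a valid zlib header.
        try:
            blob = zlib.decompress(blob)
            applied.append("zlib")
            continue
        except zlib.error:
            pass
        # Strict base64: any character outside the alphabet stops the peeling.
        try:
            decoded = base64.b64decode(blob, validate=True)
        except (binascii.Error, ValueError):
            break
        if not decoded:
            break
        blob = decoded
        applied.append("base64")
    return blob, applied


if __name__ == "__main__":
    inner = b"print('hello')"
    wrapped = base64.b64encode(zlib.compress(base64.b64encode(inner)))
    print(peel_layers(wrapped))
```

One limitation to keep in mind: a plaintext that happens to consist only of base64 characters would be decoded one layer too far, so the result should always be inspected by hand.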

The automation effects on GitHub

Most of the forked repos are quickly removed by GitHub, which identifies the automation. However, the automation detection seems to miss many repos, and the ones that were uploaded manually survive. Because the whole attack chain seems to be mostly automated on a large scale, the 1% that survive still amount to thousands of malicious repos. You can check out a small portion of the current wave yourself by simply searching the following in GitHub: 🔥 2024 language:python.

Counting the removed ones, the number of repos reaches millions. Removal usually happens a few hours after the upload, which makes the repos challenging to document. We know the removal is automated because many of the original ones still exist, and it mainly targets the fork bombs. For example, a repo's summary can show thousands of forks while its details list none.

Because of the operation’s large scope, this campaign has a sort of 2nd-order social engineering network effect when, every now and then, naive users fork the malicious repos without realizing they are spreading malware. Kind of ironic to see it spreading by humans after such heavy reliance on automation.

When did the campaign start?

Here is a brief history of this malicious campaign:

May 2023: As originally reported by Phylum, several malicious packages were uploaded to PyPI containing early parts of the current payload. These packages were spread by 'os.system("pip install package")' calls planted in forks of popular GitHub repos, such as 'chatgpt-api'.

July – August 2023: Several malicious repos were uploaded to GitHub, this time delivering the payload directly instead of through importing PyPI packages. This came after PyPI removed the malicious packages, and the security community increased its focus there. Aliakbar Zahravi and Peter Girnus from Trend Micro published a great technical analysis of it.

November 2023 – Now: We have detected more than 100,000 repos containing similar malicious payloads, and the number keeps growing. This attack approach has several advantages:

  • GitHub is huge, therefore despite the large number of instances, their relative portion is still insignificant and thus hard to detect.
  • Package managers are not involved as before, therefore explicit malicious package names are not mentioned, so that’s one less indicator.
  • The targeted repos are in a small niche and have low popularity, making it easier for unsuspecting developers to make the mistake and clone their malicious impersonators.
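
The planted 'os.system("pip install package")' calls from the May 2023 wave lend themselves to a simple static check. The sketch below walks a file's syntax tree with Python's standard `ast` module and reports `os.system` calls whose literal argument contains "pip install". The function name is hypothetical, and the check is deliberately narrow: it misses aliased imports, `subprocess` calls, and commands built at runtime.

```python
import ast


def find_pip_install_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, command) for os.system("... pip install ...") calls."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        is_os_system = (
            isinstance(func, ast.Attribute)
            and func.attr == "system"
            and isinstance(func.value, ast.Name)
            and func.value.id == "os"
        )
        if not is_os_system:
            continue
        arg = node.args[0]
        # Only literal string arguments are inspected in this sketch.
        if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            if "pip install" in arg.value:
                hits.append((node.lineno, arg.value))
    return sorted(hits)


if __name__ == "__main__":
    sample = 'import os\nos.system("pip install package")\nprint("ok")\n'
    print(find_pip_install_calls(sample))
```

Parsing instead of grepping means the source is never imported or run, and comments or strings that merely mention the command are not flagged.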

The transition of malware from package managers to SCMs

Judging by the many incidents we have observed in package managers and SCM platforms, the transition of this campaign from malicious packages in PyPI to malicious GitHub repos seems to reflect a general trend. It seems that nowadays, the security community puts extra focus on package managers, so that was to be expected.

The ease of automatically generating accounts and repos on GitHub and the like, using convenient APIs and soft rate limits that are easy to bypass, combined with the huge number of repos to hide among, makes it a perfect target for covertly infecting the software supply chain.

This campaign, along with dependency confusion campaigns plaguing package registries and generally malicious code being spread through source control managers, demonstrates how fragile software supply chain security is, despite the abundance of tools and available security mechanisms.

Indicators of compromise (how to know if you are infected)

  1. Search for the following Python patterns and investigate any matches:
    • exec(Fernet
    • exec(requests
    • exec(__import
    • exec(bytes
    • exec("""\nimport
    • exec(compile
    • __import__("builtins").exec(
  2. Check for the local presence of any repositories related to automations of actions on social platforms, bots, and gaming, and remove them. If you must, then reinstall, but this time carefully verify the source, and when in doubt either avoid it or run it in a sandbox.
  3. If you believe there’s a chance a repository of this type was cloned, respond as if the following cookies, credentials and keys were stolen:
    • From browsers: any financial services, any email services, any crypto services, Amazon, eBay, AliExpress, Facebook, Instagram, Twitter, Youtube, Discord, TikTok, Telegram, Twitch, Steam, Yahoo, ExpressVPN, Spotify, and any streaming services.
    • From apps: Exodus, Atomic Wallet, Guarda, Coinomi, Ethereum.
  4. If you would like to verify file checksums, the full list is impractically long, but some of the common ones can be found in this VirusTotal graph.
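
The pattern search in step 1 can be automated over a directory of checked-out code. The following is a minimal sketch that uses only the Python standard library; the function names are hypothetical. It assumes the `\n` in the triple-quote pattern may appear either as a literal backslash-n or as a real line break, and matches both. A match is a reason to investigate, not proof of infection.

```python
import re
from pathlib import Path

# Literal patterns taken from the indicator list above.
_LITERALS = [
    "exec(Fernet",
    "exec(requests",
    "exec(__import",
    "exec(bytes",
    "exec(compile",
    '__import__("builtins").exec(',
]
IOC_PATTERNS = [(text, re.compile(re.escape(text))) for text in _LITERALS]
# Assumption: accept both an escaped "\n" and an actual newline after the quotes.
IOC_PATTERNS.append(
    ('exec("""\\nimport', re.compile(r'exec\("""(?:\\n|\r?\n)import'))
)


def scan_text(text: str) -> list[str]:
    """Return the labels of all indicator patterns found in the text."""
    return [label for label, rx in IOC_PATTERNS if rx.search(text)]


def scan_tree(root: str) -> dict[str, list[str]]:
    """Scan every .py file under root; map file path to matched labels."""
    findings: dict[str, list[str]] = {}
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        hits = scan_text(text)
        if hits:
            findings[str(path)] = hits
    return findings


if __name__ == "__main__":
    for file_path, labels in scan_tree(".").items():
        print(file_path, labels)
```

The scan only reads files as text and never imports them, so it is safe to run against a repo you already suspect.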

Cloudflare was notified and deactivated the DNS records of the malicious addresses found.

How to protect yourself against repo confusions

GitHub was notified, and most of the malicious repos were deleted, but the campaign continues, and attacks that attempt to inject malicious code into the supply chain are becoming increasingly prevalent. There are countless solutions for catching malware at the system and network levels, but the supply chain remains a massive and lucrative attack surface for malicious actors. If you encounter any malicious repo, part of this campaign or not, we encourage you to report it.

At Apiiro, we’ve built a malicious code detection system that monitors any connected codebases. It detects attacks through deep code analysis that combines multiple advanced techniques: LLM-based code analysis, deconstruction of the code into a complete execution flow graph, an elaborate heuristics engine, dynamic decoding, decryption, and de-obfuscation, and more, so it’s pretty hard to fool.

Without monitoring your code for injected malicious payloads, the security of your whole organization depends on impossible conditions: your developers never choosing the wrong, almost identical repo, never having a single CI/CD misconfiguration, having 100% secure 3rd-party code, and so on. That’s why we as an industry need to start going beyond typical vulnerability detection and ingestion to surface the next generation of software supply chain and application risks.
