Finding a VS Code Memory Leak

Original link: https://randomascii.wordpress.com/2025/10/09/finding-a-vs-code-memory-leak/

During a remote pair-programming session in 2021, a developer noticed that the process IDs on a colleague's system were unusually high (seven digits) while the colleague was using VS Code – a tool he himself had never used. That observation kicked off an investigation which uncovered a serious memory leak, one that had grown past 64 GB with no obvious limit. The leak was invisible in the standard Task Manager view because it was a *handle* leak: the application was opening process handles and failing to close them, causing the operating system to retain the associated memory. Recognizing the symptom from past experience, the developer quickly traced the problem to a missing `CloseHandle()` call in the VS Code code, specifically in the `GetProcessMemoryUsage` function. Adding that one line of code fixed the leak. The bug was reported and patched within a few days. The author reflects on how prior experience with handle leaks made the problem easy to find, and suggests resource limits (memory/handles) together with automatic crash dumps as possible measures to catch similar problems in the future.

The Hacker News discussion revolves around the blog post, which describes how a user who doesn't even use VS Code found a memory leak in it. The original author, brucedawson, shared the story of identifying the issue in 2021. Commenters dug into the technical side of resource limits: one point made was that Windows lacks a direct equivalent of Linux's `ulimit` command, while others discussed Windows Job Objects as a potential (though not guaranteed) solution. A related observation was that moving Visual Studio to 64-bit removed a resource ceiling and may have made memory leaks worse, and another user shared a trick for manually triggering garbage collection in Visual Studio. The conversation highlights the challenges of managing memory in modern development environments.

Original article

In 2021 I found a huge memory leak in VS Code, totalling around 64 GB when I first saw it, but with no actual limit on how high it could go. I found this leak despite two obstacles that should have made the discovery impossible:

  1. The memory leak didn’t show up in Task Manager – there was no process whose memory consumption was increasing.
  2. I had never used VS Code. In fact, I have still never used it.

So how did this work? How did I find an invisible memory leak in a tool that I have never used?

This was during lockdown and my whole team was working from home. In order to maintain connection between teammates and in order to continue transferring knowledge from senior developers to junior developers we were doing regular pair-programming sessions. I was watching a coworker use VS Code for… I don’t remember what… and I noticed something strange.

So many of my blog posts start this way. “This doesn’t look right”, or “huh – that’s weird”, or some variation on that theme. In this case I noticed that the process IDs on her system had seven digits.

That was it. And as soon as I saw that I knew that there was a process-handle leak on her system and I was pretty sure that I would find it. Honestly, the rest of this story is pretty boring because it was so easy.

You see, Windows process IDs are just numbers. For obscure technical reasons they are always multiples of four. When a process goes away its ID is eligible for reuse immediately. Even if there is a delay before the process ID (PID) is reused there is no reason for the highest PID to be much more than four times the maximum number of processes that were running at one time. If we assume a system with 2,000 processes running (according to pslist my system currently has 261) then PIDs should be four decimal digits. Five decimal digits would be peculiar. But seven decimal digits? That implies at least a quarter-million processes. The PIDs I was seeing on her system were mostly around four million, which implies a million processes. Nope. I do not believe that there were that many processes.
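For illustration, here is a minimal sketch (not part of the original investigation) that uses EnumProcesses to report the highest PID among the processes currently running – a quick way to check for the same symptom on your own machine:

// pid_check.cpp - report the highest PID among running processes.
// Link against Psapi.lib for EnumProcesses.
#include <windows.h>
#include <psapi.h>
#include <stdio.h>

int main() {
  DWORD pids[4096];            // plenty of room for a typical system
  DWORD bytes_returned = 0;
  if (!EnumProcesses(pids, sizeof(pids), &bytes_returned)) {
    fprintf(stderr, "EnumProcesses failed: %lu\n", GetLastError());
    return 1;
  }
  DWORD count = bytes_returned / sizeof(DWORD);
  DWORD max_pid = 0;
  for (DWORD i = 0; i < count; ++i) {
    if (pids[i] > max_pid)
      max_pid = pids[i];
  }
  // With a few hundred processes the highest PID should normally have
  // four or five digits. Seven digits suggests something is leaking.
  printf("%lu processes, highest PID = %lu\n", count, max_pid);
  return 0;
}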

It turns out that “when a process goes away its ID is eligible for reuse” is not quite right. If somebody still has a handle to that process then its PID will be retained by the OS. Forever. So it was quite obvious what was happening. Somebody was getting a handle to processes and then wasn’t closing them. It was a handle leak.
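To make that concrete, here is a small sketch (again mine, not anything from VS Code) showing that a process object – and therefore its PID – stays alive after the process exits, for as long as someone holds an open handle to it:

// zombie_pid.cpp - show that an open handle keeps a dead process's
// object, and therefore its PID, alive until CloseHandle is called.
#include <windows.h>
#include <stdio.h>

int main() {
  STARTUPINFOA si = { sizeof(si) };
  PROCESS_INFORMATION pi = {};
  char cmd[] = "cmd.exe /c exit";
  // Launch a short-lived child process.
  if (!CreateProcessA(NULL, cmd, NULL, NULL, FALSE, CREATE_NO_WINDOW,
                      NULL, NULL, &si, &pi)) {
    fprintf(stderr, "CreateProcess failed: %lu\n", GetLastError());
    return 1;
  }
  CloseHandle(pi.hThread);
  WaitForSingleObject(pi.hProcess, INFINITE);  // the child has now exited

  // The process is gone, but because we still hold pi.hProcess the kernel
  // keeps the process object alive and its PID stays reserved.
  DWORD exit_code = 0;
  GetExitCodeProcess(pi.hProcess, &exit_code);
  printf("Child exited with code %lu; PID %lu is still reserved\n",
         exit_code, GetProcessId(pi.hProcess));

  // Only now does the PID become eligible for reuse.
  CloseHandle(pi.hProcess);
  return 0;
}

Repeat the open-a-handle-and-never-close-it pattern often enough and you end up with seven-digit PIDs.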

The first time I dealt with a process handle leak it was a complicated investigation as I learned the necessary techniques. That time I only realized that it was a handle leak through pure luck. Since then I’ve shipped tools to find process-handle and thread-handle leaks, and have documented the techniques to investigate handle leaks of all kinds. Therefore this time I just followed my own recipe and had a call stack for the leaking code within the hour (this image stolen from the github issue):

[image: call stack of the leaking code, from the github issue]

The bug was pretty straightforward. A call to OpenProcess was made, and there was no corresponding call to CloseHandle. And because of this a boundless amount of memory – roughly 64 KiB for each missing CloseHandle call – was leaked. A tiny mistake, with consequences that could easily consume all of the memory on a high-end machine.

This is the buggy code (yay open source!):

void GetProcessMemoryUsage(ProcessInfo process_info[1024], uint32_t* process_count) {
  DWORD pid = process_info[*process_count].pid;
  HANDLE hProcess;
  PROCESS_MEMORY_COUNTERS pmc;
  hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, false, pid);
  if (hProcess == NULL) {
    return;
  }
  if (GetProcessMemoryInfo(hProcess, &pmc, sizeof(pmc))) {
    process_info[*process_count].memory = (DWORD)pmc.WorkingSetSize;
  }
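  // BUG: hProcess is never closed, so the process object (and its PID) leaks.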
}

And this is the code with the fix – the `CloseHandle(hProcess)` call at the end is the line that was added to fix the leak:

void GetProcessMemoryUsage(ProcessInfo& process_info) {
  DWORD pid = process_info.pid;
  HANDLE hProcess;
  PROCESS_MEMORY_COUNTERS pmc;
  hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, false, pid);
  if (hProcess == NULL) {
    return;
  }
  if (GetProcessMemoryInfo(hProcess, &pmc, sizeof(pmc))) {
    process_info.memory = (DWORD)pmc.WorkingSetSize;
  }
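  // The one-line fix: close the handle so the process object can be released.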
  CloseHandle(hProcess);
}

That’s it. One missing line of code is all that it takes.

The bug was found back when I still used Twitter so I reported my findings there (broken link) and somebody else then filed a github issue based on my report. I stopped using twitter a couple of years later and then my account got banned (due to not being used?) and then deleted, so now that bug report along with everything else I ever posted is gone. That’s pretty sad actually. Yet another reason for me to dislike the owner of Twitter.

It looks like the bug was fixed within a day or two of the report. Maybe The Great Software Quality Collapse hadn’t quite started then. Or maybe I got lucky.

Anyway, if you don’t want me posting embarrassing stories about your software on my blog or on bsky then be sure to leave the Handles column open in Task Manager and pay attention if you ever see it getting too high in a process that you are responsible for.
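If you would rather automate that check than stare at Task Manager, the handle count is also available programmatically via GetProcessHandleCount. Here is a minimal sketch (the tool name and the 10,000-handle threshold are just examples, echoing the limit suggested below):

// handle_watch.cpp - nag when a process's handle count looks leaky.
// Usage: handle_watch <pid> [limit]   (limit defaults to 10000)
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
  if (argc < 2) {
    fprintf(stderr, "usage: handle_watch <pid> [limit]\n");
    return 1;
  }
  DWORD pid = strtoul(argv[1], NULL, 10);
  DWORD limit = (argc > 2) ? strtoul(argv[2], NULL, 10) : 10000;

  HANDLE process = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid);
  if (process == NULL) {
    fprintf(stderr, "OpenProcess(%lu) failed: %lu\n", pid, GetLastError());
    return 1;
  }
  for (;;) {
    DWORD handle_count = 0;
    if (!GetProcessHandleCount(process, &handle_count))
      break;                        // the process has probably exited
    printf("PID %lu: %lu handles\n", pid, handle_count);
    if (handle_count > limit)
      printf("  suspiciously high - possible handle leak?\n");
    Sleep(10 * 1000);               // poll every ten seconds
  }
  CloseHandle(process);             // practice what this post preaches
  return 0;
}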

Sometimes I think it would be nice to have limits on resources in order to more automatically find mistakes like this. If processes were automatically crashed (with crash dumps) whenever memory or handles exceeded some limit then bugs like this would be found during testing. The limits could be set higher for software that needs it, but 10,000 handles and 4 GiB RAM would be more than enough for most software when operating correctly. The tradeoff would be more crashes in the short term but fewer leaks in the long term. I doubt it will ever happen, but if this mode existed as a per-machine opt-in then I would enable it.
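Windows does offer a partial version of this today in the form of Job Objects (one of the mitigations raised in the Hacker News discussion): a job can cap a process’s committed memory, although as far as I know it cannot cap handle counts, and exceeding the cap makes allocations fail rather than producing a crash dump. A sketch of what that per-process opt-in could look like, assuming a 64-bit build and using notepad.exe purely as an example child process:

// job_limit.cpp - launch a child process inside a Job Object that caps its
// committed memory at 4 GiB. Once the cap is hit, further allocations fail,
// which turns a slow leak into an error you notice during testing.
// Assumes a 64-bit build (ProcessMemoryLimit is a SIZE_T).
#include <windows.h>
#include <stdio.h>

int main() {
  HANDLE job = CreateJobObjectW(NULL, NULL);
  if (job == NULL) {
    fprintf(stderr, "CreateJobObject failed: %lu\n", GetLastError());
    return 1;
  }

  JOBOBJECT_EXTENDED_LIMIT_INFORMATION limits = {};
  limits.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_PROCESS_MEMORY;
  limits.ProcessMemoryLimit = 4ull * 1024 * 1024 * 1024;  // 4 GiB cap
  if (!SetInformationJobObject(job, JobObjectExtendedLimitInformation,
                               &limits, sizeof(limits))) {
    fprintf(stderr, "SetInformationJobObject failed: %lu\n", GetLastError());
    return 1;
  }

  STARTUPINFOW si = { sizeof(si) };
  PROCESS_INFORMATION pi = {};
  wchar_t cmd[] = L"notepad.exe";   // example child; substitute your own
  if (!CreateProcessW(NULL, cmd, NULL, NULL, FALSE, CREATE_SUSPENDED,
                      NULL, NULL, &si, &pi)) {
    fprintf(stderr, "CreateProcess failed: %lu\n", GetLastError());
    return 1;
  }
  // Put the child into the job before it starts running, then let it go.
  AssignProcessToJobObject(job, pi.hProcess);
  ResumeThread(pi.hThread);

  CloseHandle(pi.hThread);
  CloseHandle(pi.hProcess);  // no handle leaks here
  CloseHandle(job);
  return 0;
}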

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048