C++ 中愚蠢的智能指针
Stupid Smart Pointers in C

原始链接: http://blog.kevinalbs.com/stupid_smart_pointers

本文介绍了一个在C语言中实现“智能指针”的概念验证,模仿C++的`unique_ptr`来自动释放分配的内存。C语言的手动内存管理容易出现内存泄漏,尤其是在有多个出口点的函数中容易忘记调用`free`。虽然像`goto`这样的技术可以提供帮助,但C语言缺乏析构函数来自动清理资源。 其核心思想是利用内联汇编劫持调用栈。`free_on_exit`函数拦截调用函数的返回地址,将其替换为自定义的`trampoline`函数。当调用函数返回时,它跳转到`trampoline`,然后调用`do_free`来释放分配的内存。`do_free`然后跳转到原始返回地址,确保程序执行正常继续。 为了处理函数中的多个内存分配,维护了一个跟踪指针的栈。每次`free_on_exit`调用都会将指针添加到与调用者基指针 (EBP) 关联的栈条目中。`do_free`迭代当前栈帧中的指针,在返回之前释放它们。这演示了完全在C语言中通过直接操作调用栈实现的简易智能指针。

Hacker News上的一篇博客文章讨论了在C语言中实现“愚蠢的智能指针”,引发了热烈讨论。评论者强烈反对这种方法,因为它存在可移植性和安全问题,C语言本身就存在内存不安全的问题。经验丰富的C程序员推荐了替代方案: * 使用`__attribute__((cleanup))`(GCC和Clang可用,有望成为C标准的一部分),例如systemd项目中所用。 * 使用内存池分配器,例如Samba的talloc或Apache的APR。 不推荐使用引用计数,因为它可能导致bug,并且与现代CPU架构不兼容。一位评论者指出,讨论的“智能指针”方法可能会导致分支预测错误,因为这与现代CPU处理返回地址的方式有关。另一位评论者指出,栈保护cookie也可能受到影响。

原文

Managing memory in C is difficult and error prone. C++ solves this with smart pointers like std::unique_ptr and std::shared_ptr. This article demonstrates a proof-of-concept (aka stupid) smart pointer in C with very little code. Along the way we'll look at the layout of the 32-bit x86 call stack and write assembly in a C program.

§1 Managing Memory in C

In C, heap memory is allocated with a call to malloc and deallocated with a call to free. It is the programmer's responsibility to free allocated memory when no longer in use. Otherwise, memory leaks grow the program's memory usage, exhausting valuable system resources.

Sometimes knowing where to call free is clear.

char *data = (char *) malloc (100);
// do something with data, don't need it anymore
free (data);

But even simple cases may be difficult to properly free. For example, suppose a function f allocates resources in order and frees them before returning.

void f () {
   char *resource_1 = get_resource ();
   if (resource_1 == NULL) return;

   char *resource_2 = get_resource ();
   if (resource_2 == NULL) {
      free (resource_1);
      return;
   }

   char *resource_3 = get_resource ();
   if (resource_3 == NULL) {
      free (resource_2);
      free (resource_1);
      return;
   }

   // etc.
}

Each return must free everything previously allocated. The list of calls to free grows for every additional resource allocated. There are ways to organize this to reduce some redundancy. But the root of the problem remains: the lifetime of the allocated resource is bound to where f returns. Whenever f returns, we need to guarantee all of these resources are freed.

A nice solution in C is described in Eli Bendersky's article: Using goto for error handling in C. This uses the goto statement and places all free calls at the end of the function.

void f () {
   char *resource_1 = NULL, *resource_2 = NULL, *resource_3 = NULL;
   resource_1 = get_resource ();
   if (resource_1 == NULL) return;

   resource_2 = get_resource ();
   if (resource_2 == NULL) goto free_resource_1;

   resource_3 = get_resource ();
   if (resource_3 == NULL) goto free_resource_2;

// etc.

free_resource_2:
   free (resource_2); // fall through
free_resource_1:
   free (resource_1);
   return;
}

But C++ has an even better solution. Since objects have destructors, we can explicitly bind the lifetime of a pointer to the lifetime of an object.

void f () {
   auto resource_1 = std::unique_ptr<char> (get_resource ());
   if (resource_1.get () == nullptr) return;
   auto resource_2 = std::unique_ptr<char> (get_resource ());
   if (resource_2.get () == nullptr) return;
   auto resource_3 = std::unique_ptr<char> (get_resource ());
   if (resource_3.get () == nullptr) return;
   /* ... */
}

The unique_ptr object wraps around the allocated pointer, and frees it when the unique_ptr goes out of scope.

Unfortunately, C has no destructors we can hook onto, so there are no native smart pointers. But we can create a surprisingly simple approximation.

§2 Implementation

The smart pointer will only consist of one function, free_on_exit, to free the passed pointer when the current function returns. This will allow us to rewrite our above example without any calls to free.

void f () {
   char *resource_1 = free_on_exit (get_resource ());
   if (resource_1 == NULL) return;

   char *resource_2 = free_on_exit (get_resource ());
   if (resource_2 == NULL) return;

   char *resource_3 = free_on_exit (get_resource ());
   if (resource_3 == NULL) return;
}

Wherever f returns, it frees everything allocated before. But how can we possibly implement free_on_exit? How can we know when f returns and free all previous allocations? The trick is to manipulate the call stack. Instead of f returning to its original caller, we can manipulate the stack to have it return to our own custom function.

§2.1 The Call Stack

Let's refresh on what the call stack looks like. The layout of the call stack depends on the architecture. We'll use 32 bit x86 as our target architecture (which has a simpler layout and calling conventions than 64 bit). Eli Bendersky has another great article, Where the top of the stack is on x86, with more depth, but the following is a brief overview.

Here's an example of what the stack looks like when function main calls function sum in 32 bit x86 architecture.

int sum (int x, int y) {
   int z = x + y;
   return z;
}

int main () {
   int value = sum (2, 3);
}
The call stack during a function call.

During a function call, the caller and callee split the responsibilities of what data to push onto the stack. The caller main is responsible for saving the current eip, but the callee f is responsible for saving the current ebp.

§2.2 Hijacking a Return Address

But how can the stack be modified in a C program? One way is to use assembly to obtain stack addresses, and then change the values they point to. The following uses inline assembly to change a function's return address.

#include <stdio.h>
void hijacked () {
   printf ("hijacked\n");
}

void f () {
   printf ("f starts\n");

   int *base = NULL;
   // get the value of ebp.
   __asm__("movl %%ebp, %0 \n"
           : "=r"(base) // output
           );

   // change the return address.
   *(base + 1) = (int) hijacked;

   printf ("f ends\n");
}

int main () {
   printf ("main starts\n");
   f ();
   printf ("main ends\n");
}
hijack.c

To run this program:

$ gcc -O0 hijack.c -m32 -o hijack
$ ./hijack
main starts
f starts
f ends
hijacked
Bus error: 10

We can see that f never returns to main, but instead to the hijacked function! How does this work?

In f, we get the current value of the ebp register in base using the inline assembly function __asm__.

__asm__ (
"movl %%ebp, %0 \n" 
: "=r" (base) // output
);

"movl %%ebp, %0 \n" is the assembly instruction we run. Registers are denoted with %%. The %0 is a placeholder for the first output variable.

: "=r" (base) says use the C variable base as the first output variable. "=r" means store the operand in a register before copying to the output variable.

For more information about __asm__, see the article A Tiny Guide to GCC Inline Assembly by Eric Woroshow.

Once we have the value of ebp in base, we can use it just like any pointer.

*(base + 1) = (int) hijacked;

Since base is of type int* adding one increments the address by the size of an int (4 bytes in this case). Therefore, this line changes the saved eip on the stack from main to the address of the function hijacked.

Note, after we return from hijacked there's an error (yours may be a segmentation fault). Next we'll see how to fix that error.

§2.3 Restoring the Return Address

The example before ended with an error. When hijacked returns, there isn't an address to pop off of the stack, so it jumps to an invalid address.

The caller is responsible for pushing the return address. When we jump directly to hijacked we bypass the usual call convention.

Instead we want hijacked to return back to the original return address in main. To do so we can use a pure assembly function to avoid the typical function call and return sequence of a compiled C function.

.section .text
.globl trampoline
.type trampoline, @function
trampoline:
# call hijacked. This pushes the address of the next instruction.
# when hijacked returns, we jump directly to the address in eax.
# eax contains the returned value of hijacked.
call hijacked
jmp %eax
trampoline.S

This assembly function named trampoline bypasses the usual call sequence generated by compiling a C function. Instead of popping a return address to return to, we jmp directly to the value stored in eax. The value returned by hijacked is stored in eax. We modify hijacked and f as follows:

// forward declare the assembly trampoline.
void trampoline ();
int return_address;

int hijacked () {
   printf ("hijacked\n");
   return return_address;
}

void f () {
   printf ("f starts\n");

   int *base;
   // get the value of the ebp.
   __asm__("movl %%ebp, %0 \n"
           : "=r"(base) // output
           );

   // save the return address.
   return_address = *(base + 1);
   // change the return address.
   *(base + 1) = (int) trampoline;

   printf ("f ends\n");
}

Compile and run with:

$ gcc -o hijack -O0 -m32 hijack.c trampoline.S
$ ./hijack
main starts
f starts
f ends
hijacked 
main ends

Now our hijacked function restores the original return address after executing. We'll use this same technique to implement our smart pointer.

§2.4 One Smart Pointer

We're one small step away from creating a smart pointer. Let's rename hijacked to do_free, and add the function free_on_exit, which now hijacks the caller's return address.

#include <stdio.h>
#include <stdlib.h>
/* forward declare the assembly trampoline. */
void trampoline ();
int return_address;
void *tracked_pointer;

int do_free () {
   free (tracked_pointer);
   return return_address;
}

void *free_on_exit (void *ptr) {
   int *base;
   // get the value of the caller's ebp by dereferencing ebp.
   __asm__("movl (%%ebp), %0 \n"
           : "=r"(base) // output
           );

   // save and change the caller's return address.
   return_address = *(base + 1);
   *(base + 1) = (int) trampoline;
   return ptr;
}

void f () {
   char *resource = free_on_exit (malloc (1));
}

int main () {
   f ();
}

Calling free_on_exit stores the passed pointer and sets the caller's return address to trampoline. After the caller f returns, it automatically frees its malloc'ed byte. We now have a smart pointer!

§2.5 Many Smart Pointers

The free_on_exit above is only a single-use function. If called multiple times, it only frees the pointer passed in the most recent call. Fortunately, it's only another small step to make free_on_exit work with any number of repeated calls.

To do so we can store a list of tracked pointers for each function call. Stack these lists, and each time a new function calls free_on_exit, add a new stack entry. When do_free is called, it frees the list of pointers on the top most entry of the stack.

At the risk of including too much code in this article, here is the full implementation in under one hundred lines of code:

#ifndef _SMART
#define _SMART
void *free_on_exit (void *);
#endif
smart.h
#include "smart.h"

#include <stdlib.h> // free
#include <string.h> // memset

/* these limits are arbitrary. */
#define STACK_SIZE 256
#define MAX_PER_FRAME 32

typedef struct {
   int caller_ebp; /* ebp of the caller. This identifes the frame. */
   int caller_eip; /* the original return eip of the caller. */
   void *tracked_pointers[MAX_PER_FRAME];
   int tail; /* points to one past last entry. */
} tracked_stack_entry_t;

typedef struct {
   tracked_stack_entry_t stack[STACK_SIZE];
   int tail; /* points to one past last entry. */
} tracked_stack_t;

/* forward declare the assembly trampoline. */
void trampoline ();

tracked_stack_t tracked = {0};

int do_free () {
   tracked_stack_entry_t *entry = tracked.stack + (tracked.tail - 1);
   tracked.tail--; /* pop. */
   for (int i = 0; i < MAX_PER_FRAME; i++) {
      if (entry->tracked_pointers[i] == 0) break;
      free (entry->tracked_pointers[i]);
   }
   return entry->caller_eip;
}

void *free_on_exit (void *entry) {
   int ret_addr = 0;
   int do_free_addr = (int) &do_free;
   int *caller_ebp;

   /* get the value of ebp. */
   __asm__("movl (%%ebp), %0 \n"
           : "=r"(caller_ebp) /* output. */
           );

   /* check if there is a pre-existing stack entry for this caller
    * (identified by caller's ebp). */
   tracked_stack_entry_t *tracked_entry;

   if (tracked.tail > 0 &&
       tracked.stack[tracked.tail - 1].caller_ebp == (int) caller_ebp) {
      /* re-use. */
      tracked_entry = tracked.stack + tracked.tail - 1;
   } else {
      /* make a new entry. */
      tracked_entry = tracked.stack + tracked.tail++;
      memset (tracked_entry, 0, sizeof (*tracked_entry));
      tracked_entry->caller_ebp = (int) caller_ebp;
      /* hijack caller's return eip to return to do_free. */
      tracked_entry->caller_eip = *(caller_ebp + 1);
      *(caller_ebp + 1) = (int) trampoline;
   }

   /* push the passed pointer. */
   tracked_entry->tracked_pointers[tracked_entry->tail++] = entry;
   return entry;
}
smart.c
# This can be compiled by itself with `as --32`
# This is GNU assembler syntax (aka AT-T syntax) (src, dest)
.section .text
.globl trampoline 
.type trampoline, @function
trampoline:
call do_free
jmp %eax # jump directly back to the old eip.
trampoline.S

Additionally, the code is in a GitHub repository.

§3 Conclusion

In this article we've shown how to build a simple and incomplete smart pointer on an 32 bit x86 architecture. We've looked at the call stack, hijacked return addresses, and written some assembly in the process.

I recently discovered the implementation of free_on_exit won't work if called directly from main if gcc aligns the stack. In this case, main adds padding between the saved eip and the saved ebp, (example). I think this can be fixed some tweaking, and will update this article when it is fixed.

For more reading, check out the following articles:

联系我们 contact @ memedata.com