Generalizing Printf in C

Original link: https://webb.is-a.dev/articles/generalizedprintf/

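## Simplifying `printf` and state management

The standard C library contains a family of `printf` functions (such as `printf`, `sprintf`, and `vfprintf`) that differ only in how they take input and where they send output. A more efficient implementation can consolidate them into one core function that handles formatting, with wrappers managing the specific input and output. This reduces 12 functions to a manageable few.

The author proposes a general function, `_vfsprintf`, which takes a stream and buffer, a size, a "submit" function pointer, a format string, and a variable argument list. The "submit" function performs the actual output, which gives flexibility: it can write to a file, a buffer, or somewhere else.

To implement this efficiently, particularly for functions that need an output limit (such as `vsnprintf`), state has to be managed. C has no direct support for object-oriented programming, so the state lives in a `struct` (such as `bufinfo`, which holds an index and a length) that is passed to the "submit" function as a `void*`. This allows incremental buffer writes without global variables, effectively emulating object-like behavior.

The approach shows how function pointers and careful state management can achieve modularity and flexibility within the constraints of C, mirroring concepts from object-oriented programming.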

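The Hacker News discussion centers on generalizing the `printf` functions in C. The main point is that on GNU systems (and possibly others, such as the BSDs), `vfprintf` combined with functions such as `fmemopen`, `open_memstream`, and `fopencookie` already offers enough flexibility to redirect output to various targets, including memory buffers, without a custom API.

Users suggest avoiding `FILE*` or size parameters dedicated to generalized printing. Instead, a more general approach takes a `void * context` parameter plus a write callback (`int (*write)(char *data, void *context)`), letting callers handle output redirection in a flexible and customizable way.

Essentially, existing tools already provide enough capability to achieve the desired functionality without complicating the core `printf` interface.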
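The `fmemopen` point from the discussion can be illustrated with a minimal sketch. `fmemopen` is a POSIX.1-2008 function, so this assumes a POSIX system; the names `stream_vsnprintf` and `stream_snprintf` are hypothetical and only serve the example.

```c
#define _POSIX_C_SOURCE 200809L /* exposes fmemopen on POSIX systems */
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: format into buf by pointing a FILE at it,
   so the ordinary vfprintf does all the formatting work. */
int stream_vsnprintf(char *buf, size_t size, const char *format, va_list va) {
    assert(buf != NULL && size > 0);
    FILE *mem = fmemopen(buf, size, "w");
    if (mem == NULL) {
        return -1;
    }
    int written = vfprintf(mem, format, va);
    /* Closing flushes the data into buf and adds a '\0' if there is room. */
    if (fclose(mem) != 0) {
        return -1;
    }
    return written;
}

/* Variadic convenience wrapper around the va_list form. */
int stream_snprintf(char *buf, size_t size, const char *format, ...) {
    va_list va;
    va_start(va, format);
    int written = stream_vsnprintf(buf, size, format, va);
    va_end(va);
    return written;
}
```

This sketch only covers output that fits in the buffer. What happens on overflow depends on the stdio implementation, so it should not be read as a drop-in replacement for `vsnprintf`.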

Original article

In ANSI C89, there are 6 printf functions:

  • printf
  • sprintf
  • fprintf
  • vprintf
  • vsprintf
  • vfprintf

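Later standards and common extensions add six more (snprintf and vsnprintf come from C99, dprintf and vdprintf from POSIX.1-2008, and asprintf and vasprintf are GNU/BSD extensions):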

  • dprintf
  • snprintf
  • asprintf
  • vdprintf
  • vsnprintf
  • vasprintf

The only differences between these functions are the form their arguments take and the destination of the output. It would make sense to generalize this class of functions in an implementation of printf, ideally down to one function that constructs and outputs the formatted string, which the 12 functions can wrap around or be based upon.

Some simple optimizations can be made already. Every variadic printf function can wrap around its va_list form (printf around vprintf, and so on), which cuts the list in half. vprintf and vdprintf can in turn both be thought of as wrappers, which reduces the 6 remaining functions to 4:

  • vsprintf
  • vfprintf
  • vsnprintf
  • vasprintf
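The wrapping described above can be sketched as follows. This is a minimal illustration, not the author's code: the `my_` names are hypothetical, and the standard `vfprintf` and `vsnprintf` stand in for whatever core the implementation provides.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* vprintf is just vfprintf aimed at stdout. */
int my_vprintf(const char *format, va_list va) {
    assert(format != NULL);
    return vfprintf(stdout, format, va);
}

/* A variadic form only packages its arguments into a va_list
   and forwards them to the matching v* function. */
int my_printf(const char *format, ...) {
    va_list va;
    va_start(va, format);
    int written = my_vprintf(format, va);
    va_end(va);
    return written;
}

/* The same pattern for snprintf on top of vsnprintf. */
int my_snprintf(char *s, size_t n, const char *format, ...) {
    va_list va;
    va_start(va, format);
    int written = vsnprintf(s, n, format, va);
    va_end(va);
    return written;
}
```

Each variadic function costs only a few lines once its va_list counterpart exists, which is why only the v* forms need real attention.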

sprintf was a bad idea for the same reason gets was (in fact, GCC will warn about the use of either in many cases). snprintf is theoretically different because it needs an output limit, but the behaviour of sprintf can be replicated by setting that limit to SIZE_MAX or similar, which removes vsprintf from the list.

The 3 functions left are vfprintf, vasprintf, and vsnprintf. There is a straightforward way to reduce these to one function: use vasprintf to build a dynamically allocated string, output it with memcpy or fputs, then free it.

The problem with this approach is that it is inefficient. Even with a vector that doubles in size when it grows, so that only O(log n) reallocations are needed, the allocations are redundant. If the formatted result is especially large, it will take up a large amount of memory. Generalizing the printf functions so that the string is output as it is constructed is a harder and more interesting problem.
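For comparison, the allocate-then-output approach might look like the sketch below. Because vasprintf is a GNU/BSD extension, the sketch uses a hypothetical stand-in, `sketch_vasprintf`, built from C99 `vsnprintf`. All function names here are assumptions made for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for vasprintf: measure the result, allocate, then format. */
static int sketch_vasprintf(char **out, const char *format, va_list va) {
    va_list measure;
    va_copy(measure, va);
    int len = vsnprintf(NULL, 0, format, measure);
    va_end(measure);
    if (len < 0) return -1;
    char *s = malloc((size_t)len + 1);
    if (s == NULL) return -1;
    vsnprintf(s, (size_t)len + 1, format, va);
    *out = s;
    return len;
}

/* vfprintf: build the whole string first, then write it out and free it. */
int alloc_then_output_vfprintf(FILE *stream, const char *format, va_list va) {
    assert(stream != NULL);
    char *s = NULL;
    int len = sketch_vasprintf(&s, format, va);
    if (len < 0) return -1;
    size_t written = fwrite(s, 1, (size_t)len, stream);
    free(s);
    return written == (size_t)len ? len : -1;
}

/* vsnprintf: build the whole string first, then copy what fits. */
int alloc_then_output_vsnprintf(char *dst, size_t n, const char *format, va_list va) {
    char *s = NULL;
    int len = sketch_vasprintf(&s, format, va);
    if (len < 0) return -1;
    if (n > 0) {
        size_t take = (size_t)len < n - 1 ? (size_t)len : n - 1;
        memcpy(dst, s, take);
        dst[take] = '\0';
    }
    free(s);
    return len;
}

/* Variadic wrappers, so the two functions can be called directly. */
int alloc_then_output_fprintf(FILE *stream, const char *format, ...) {
    va_list va;
    va_start(va, format);
    int r = alloc_then_output_vfprintf(stream, format, va);
    va_end(va);
    return r;
}

int alloc_then_output_snprintf(char *dst, size_t n, const char *format, ...) {
    va_list va;
    va_start(va, format);
    int r = alloc_then_output_vsnprintf(dst, n, format, va);
    va_end(va);
    return r;
}
```

Note that the full result exists in memory before a single byte is output, even when the destination buffer is only a few bytes long. That is the cost the streaming design avoids.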

# The art of transmutation and opaqueness

C89 defines qsort as:

void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));

The first 3 arguments describe an array (its base address, element count, and element size); the last is a comparison function that receives pointers to pairs of elements (often a wrapper around strcmp, the comparison operators, or similar).

What is notable about this function is the use of function pointers to modularize code, and the fact that it has no information about the array it is given besides its element count and element size. Because of this:

  • Transmutation is required to use it
  • This makes use of the function much more error-prone, since it is harder to check whether the types being given or interpreted are valid

Consider the two arrays:

long long a[] = {0, 0, 0, 0, 0};
char *b[] = {"b", "a", "c", "d", "e"}; 

On a typical 64-bit platform both contain five 8-byte elements, and both convert to a void pointer when passed into qsort. The type information is lost until the pointer is cast back, on the presumption that it has the desired type. The compiler won't warn you if you sort the first array with a wrapper around strcmp, nor if you give the wrong number of elements or element size.
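One way to contain that risk is to hide each qsort call behind a typed wrapper, so the cast back from void pointer happens in exactly one place per element type. The sketch below uses hypothetical function names and is only an illustration of the idea.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to elements, so for an array of char *
   each comparator argument is really a pointer to a char *. */
static int cmp_strings(const void *x, const void *y) {
    const char *const *sx = x;
    const char *const *sy = y;
    return strcmp(*sx, *sy);
}

static int cmp_long_longs(const void *x, const void *y) {
    long long a = *(const long long *)x;
    long long b = *(const long long *)y;
    return (a > b) - (a < b); /* avoids the overflow that a - b could cause */
}

/* Typed entry points: the compiler can now check the array type. */
void sort_strings(char **items, size_t count) {
    assert(items != NULL || count == 0);
    qsort(items, count, sizeof *items, cmp_strings);
}

void sort_long_longs(long long *items, size_t count) {
    assert(items != NULL || count == 0);
    qsort(items, count, sizeof *items, cmp_long_longs);
}
```

Calling `qsort(a, 5, sizeof *a, cmp_strings)` on the long long array would still compile without complaint. The wrappers only help when callers go through them.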

In our printf implementation, we can pass in both a file and a buffer to write to, along with an output function and the needed format string and list of arguments:

static int _vfsprintf(FILE *stream, char *buffer, size_t size, int (*submit)(char *, char *, FILE *), const char *format, va_list va);

Since the FILE stream, the buffer, and the size are only used inside the submit function, they can be passed as NULL (or 0 for the size) and ignored by output handlers that don't need them.

If you could write a static/dynamic buffer in multiple distributed steps using a function pointer and a pointer to the start of the buffer alone, this and a few wrapper functions would be the end of it. Unfortunately, doing this in C requires state.

# OOP in C

Writing a buffer in multiple steps with a function requires you to continue from where you left off. Keeping track of where you left off is trivial:

struct bufinfo {
	size_t idx;	/* position of the next write */
	size_t len;	/* total size of the buffer */
	char *buffer;
};

This needs to be carried with you the entire time you’re writing to the buffer, which can be done by passing in a void pointer to the state that the output function needs:

static int __submit_vfprintf(FILE *stream, char *str, int (*submit)(char *, FILE *, char *, void *), const char *format, va_list va, void *initstate);

The pointer initstate is passed to the output function, which uses it to write to the buffer then modifies the information inside it for the next time the function is called. This is a way of keeping state that is neither local nor global.

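static int vfp_strapp_submit(char *str, FILE *ignored, char *ignoredalso, void *bufferinfo) {
	struct bufinfo *bi = bufferinfo;
	size_t len = strlen(str);
	/* Space remaining after what has already been written. */
	size_t left = bi->len - bi->idx;
	/* Refuse the write unless str and its terminating '\0' both fit. */
	if (len >= left) {
		return 0;
	}
	memcpy(bi->buffer+bi->idx, str, len);
	bi->idx += len;
	/* idx is now one past the last character, which is where the terminator goes. */
	bi->buffer[bi->idx] = '\0';
	return (int)len;
}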

Thus, an implementation of vsnprintf (and by extension sprintf and snprintf) can be made with a few lines of boilerplate code:

int vsnprintf(char *s, size_t n, const char *format, va_list arg) {
	/* State handed to the submit function on every call. */
	struct bufinfo bufinfo;
	bufinfo.buffer = s;
	bufinfo.len = n;
	bufinfo.idx = 0;
	return __submit_vfprintf(NULL, NULL, vfp_strapp_submit, format, arg, &bufinfo);
}
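The article does not show the body of the core function, so the following is a self-contained toy version of the whole design, offered as a sketch under stated assumptions rather than the author's implementation. Every `sketch_` name is hypothetical, the submit signature is simplified to a chunk, a length, and a state pointer, and the core understands only `%s`, `%d`, and `%%`. Unlike the submit function above, the buffer handler truncates instead of refusing a chunk that does not fit.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical submit signature: one chunk of output plus opaque state. */
typedef void (*sketch_submit)(const char *chunk, size_t n, void *state);

/* State for a bounded buffer, mirroring the article's struct bufinfo. */
struct sketch_buf {
    size_t idx;   /* next write position */
    size_t len;   /* capacity, including room for the terminator */
    char *buffer;
};

/* Append as much of the chunk as fits, always keeping the buffer terminated. */
static void sketch_buf_submit(const char *chunk, size_t n, void *state) {
    struct sketch_buf *b = state;
    if (b->len == 0) return;
    size_t room = b->len - 1 - b->idx;
    size_t take = n < room ? n : room;
    memcpy(b->buffer + b->idx, chunk, take);
    b->idx += take;
    b->buffer[b->idx] = '\0';
}

/* For a stream, the state is simply the FILE pointer. */
static void sketch_file_submit(const char *chunk, size_t n, void *state) {
    fwrite(chunk, 1, n, state);
}

/* Toy core: handles only %s, %d and %%. Returns the untruncated length. */
static int sketch_core(sketch_submit submit, void *state, const char *format, va_list va) {
    assert(submit != NULL && format != NULL);
    int total = 0;
    for (const char *p = format; *p != '\0'; ) {
        char digits[16];
        const char *chunk = p;
        size_t n = strcspn(p, "%");   /* literal run up to the next '%' */
        if (n > 0) {
            p += n;
        } else if (p[1] == 's') {
            chunk = va_arg(va, const char *);
            n = strlen(chunk);
            p += 2;
        } else if (p[1] == 'd') {
            int v = va_arg(va, int);
            unsigned u = v < 0 ? 0u - (unsigned)v : (unsigned)v;
            char *end = digits + sizeof digits, *d = end;
            do { *--d = (char)('0' + u % 10); u /= 10; } while (u != 0);
            if (v < 0) *--d = '-';
            chunk = d;
            n = (size_t)(end - d);
            p += 2;
        } else {
            n = 1;                    /* "%%" or an unknown specifier: emit '%' */
            p += (p[1] == '%') ? 2 : 1;
        }
        submit(chunk, n, state);
        total += (int)n;
    }
    return total;
}

int sketch_vsnprintf(char *s, size_t n, const char *format, va_list va) {
    struct sketch_buf b = { 0, n, s };
    if (n > 0) s[0] = '\0';
    return sketch_core(sketch_buf_submit, &b, format, va);
}

int sketch_snprintf(char *s, size_t n, const char *format, ...) {
    va_list va;
    va_start(va, format);
    int r = sketch_vsnprintf(s, n, format, va);
    va_end(va);
    return r;
}

int sketch_fprintf(FILE *stream, const char *format, ...) {
    va_list va;
    va_start(va, format);
    int r = sketch_core(sketch_file_submit, stream, format, va);
    va_end(va);
    return r;
}
```

With an 8-byte buffer, `sketch_snprintf(buf, sizeof buf, "%d%%-%s", 100, "truncated")` leaves `100%-tr` in the buffer and returns 14, the length the full result would have had. The same core serves the stream case without any change, because everything destination-specific lives in the submit function and its state.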