Weekend projects: getting silly with C

Original link: https://lcamtuf.substack.com/p/weekend-projects-getting-silly-with

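The simplicity and expressiveness of C: a marvel.

For all its warts, C stands out for its simplicity and for being expressive enough to write entire operating systems. Its influence reaches beyond operating systems: its syntax was copied by many mainstream successors, from Java to Go. That terse syntax also gave rise to code obfuscation as an art form. Obfuscation contests, most notably the International Obfuscated C Code Contest (IOCCC), showcase submissions full of confusing preprocessor macros, nonsensical formatting, unhelpful variable names, and simple logic encoded as obtuse arithmetic. Such entries are impressive, but they are hard to study. Instead of dwelling on them, the article covers a few little-known facts about C that confound readers while staying readable.

For example, `switch (...)` looks as though it requires curly brackets, but in this respect it is no different from `if (...)` or `for (...)`. Counterintuitively, it compiles without them:

```c
switch (i) case 1: puts("i = 1"); // runs when 'i' equals 1
```

This shortcut is essentially never used in practice, because without curly brackets only a single statement can be attached to the switch, which defeats its purpose. Even so, knowing that it exists says something about how the language is put together.

Equally surprising, case labels do not have to appear at the top level of their switch block:

```c
switch (i) {
  if (0) case 0: puts("i = 0");
  if (0) case 1: puts("i = 1");
  if (0) case 2: puts("i = 2");
}
```

Here each puts call corresponds to its own case label. No break statements are needed to prevent fallthrough: once the matching puts call has run, every later call is gated behind an if (0) that is never true.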

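C and C++ share the same basic concepts but differ in how undefined behavior interacts with optimization. In C, the presence of undefined behavior does not license optimizations that change observable effects occurring before it; the "as-if" rule still applies to them. So-called time travel, in which earlier effects are altered, is limited to effects that are not observable, and it is not unique to undefined behavior: any transformation that preserves observable behavior can cause it. A common misconception is that a compiler may delete arbitrary code based on an assumed condition such as "A is never true". In the presence of undefined behavior, that assumption does not necessarily hold. Specifically, removing observable effects (such as input/output statements) that precede the undefined behavior is not allowed, because doing so would change observable behavior. Consider a function foo, valid in both C and C++, that prints and flushes its argument and then potentially invokes undefined behavior:

```cpp
#include <stdio.h>

// Definition of foo, valid in both C and C++
int foo(int x) {
  // Statement A: an observable effect before the undefined behavior
  printf("%d\n", x);
  // Statement B: flush the output buffer
  fflush(stdout);
  // Statement C: undefined behavior when x is 0
  return 1 / x;
}
```

Although the compiler may not simply remove the printf or fflush call, it may still reorder surrounding operations during compilation, so it is worth relying on diagnostics or explicit error checks to prevent unintended consequences. Enforcing these constraints keeps the compiled output consistent with the intent of the code.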

Original text

For all its warts, the C language is a marvelous thing. It is remarkably simple, yet somehow expressive enough to allow entire operating systems to be written with ease. Just as curiously, its terse, minimalistic syntax became the way to structure code — copied by nearly all of its mainstream successors, from Java to Go.

Among geeks, the syntax can also be credited for the emergence of code obfuscation as an art form. The IOCCC contest is perhaps the best-known outlet for this craft; a typical IOCCC submission looks like this:

#define			      q [v+a]
#define			     c b[1]
#define			    O 1 q
#define			   o 0 q
#define			  r(v,a\
)v<0&&(			 v*=-1,		a*=-1);
#define			p(v,m,	    s,w)*c==*#v?2 q\
<m?(c++		       ,d=1,3	   q=0,5      q=m,main\
(a+3,b)		      ,o=o*s	 q,O=O*		 w q):0:
static		     d,v[99	];main		  (int a,
char**b		    ){d=7;     if(*c?!		  (p(+,3
,4 q+O*		   3,4)p(			   -,(o?3
:(O=1,6		  )),4 q			  -O*3,4)
p(*,4,3		 ,4)p(/				  ,5,4,3)
p((),d,		0+3,0+				 04)*c==
')'?2 q	       <02?(c				++,0):0
:(o=012	      *o+*c-			      '0',c++
,O=1)):	     2 q?3-			   2:printf(
"%d/%d"	    "\n",o		       ,O))return
1;d=a,r    (o,d)r		     (O,d)3 q
=o<O?(4	  q=o,O)		   :(4 q=O,
	 o);r(d,		 o)a+=3;O?
				 1:(O=1,2
				q=1);while
				(2 q=o%1 q)a++;v[d]/=O;d[
				v+1]/=O;return main(d,b);}

There’s plenty to admire about the winning IOCCC entries, but they’re usually not fun to study: they tend to rely on confusing preprocessor macros, nonsensical formatting, unhelpful variable names, and simple logic encoded as obtuse arithmetic expressions that need to be reverse-engineered back into normal code.

This is unfortunate; the C language can easily confound seasoned developers without being hard to read. To illustrate, consider the humble switch (…) statement:

  switch (i) {
    case 0: puts("i = 0"); break;
    case 1: puts("i = 1"); break;
    case 2: puts("i = 2"); break;
  }

There are very few C developers who realize that switch (…) is no different from if (…) or for (…) in that it doesn’t actually need curly brackets. This will compile just fine:

  switch (i) case 1: puts("i = 1");

Such switch (…) notation is unheard-of and never encountered in real life simply because it defeats the purpose: without curly brackets, you can only have one statement riding on the coattails. In other words, this will not work:

   switch (i)
     case 1: puts("i = 1");
     case 2: puts("i = 2"); ← ERROR: no longer in switch (...)

Oh well!

On a seemingly unrelated note, let’s ponder the actual mechanics of switch (…): in essence, it’s a glorified goto. It jumps to the matching case label, but it doesn’t care about what’s going on in between the curly brackets; it’s a code block like any other:

  switch (i) {
    int a = 123;
    puts("This code is unreachable!");
    default: printf("a = %d\n", a);
  }

The above example prints the value of a, but a is never initialized to 123: the jump to the default label skips the initializer, so reading a is technically undefined behavior. If you don’t believe me, try it out.
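To make the skipped initializer visible without relying on undefined behavior, here is a minimal sketch. The function name read_after_jump is hypothetical and not from the article; the only change from the snippet above is that a is assigned after the label, before it is read.

```c
#include <stdio.h>

/* Hypothetical helper (not from the article): shows that the declaration
   of 'a' is in scope after the jump, while its initializer is skipped. */
int read_after_jump(int i) {
  int result = -1;
  switch (i) {
    int a = 123;  /* in scope below, but this initializer never runs */
    default:
      a = 7;      /* assign first, so the read below is well-defined */
      result = a;
  }
  return result;
}
```

Every value of i lands on the default label, so the function returns 7 and never 123. A compiler may warn that the declaration's initializer is unreachable, which is exactly the point.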

Just as unexpectedly, case labels don’t really need to appear at the top level of their associated switch (…) block. In particular, this code works perfectly fine:

  switch (i) {
    if (0) case 0: puts("i = 0");
    if (0) case 1: puts("i = 1");
    if (0) case 2: puts("i = 2");
  }

Note that in this example, you don’t need break statements to avoid fallthrough; the code unconditionally jumps to the appropriate case label, skipping over the preceding if (0); but once the relevant puts(…) is executed, all subsequent calls are gated behind the remaining, perpetually-false if (0) conditionals.
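The difference from ordinary fallthrough can be checked by counting how many statements run. The sketch below is an illustration only; the names count_gated and count_fallthrough are hypothetical, and a counter increment stands in for each puts(…) call.

```c
/* Hypothetical helpers (not from the article). Each increment stands in
   for one puts() call from the example above. */

/* Labels hidden behind if (0): only the matching statement runs. */
int count_gated(int i) {
  int executed = 0;
  switch (i) {
    if (0) case 0: executed++;
    if (0) case 1: executed++;
    if (0) case 2: executed++;
  }
  return executed;
}

/* Conventional labels without break: execution falls through. */
int count_fallthrough(int i) {
  int executed = 0;
  switch (i) {
    case 0: executed++;
    case 1: executed++;
    case 2: executed++;
  }
  return executed;
}
```

For i = 0, count_gated returns 1 while count_fallthrough returns 3. For a value with no matching label, both return 0, because neither switch has a default label and the whole body is skipped.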

But wait, there’s more! Recall that if can be chained with else — and that syntactically, the entire blob functions as a single top-level statement:

if (one_thing) do_one_thing; else do_another_thing;

So… without further ado, I present to you the following curly-bracket-free monstrosity that combines all the quirks we discussed so far, plus a case range (case 1 ... 10), which is a GNU extension:

#include <stdio.h>

int main() {
 
  int i = 1;

  switch (i)

         if (0) case 0:        puts("i = 0");
    else if (0) case 1 ... 10: puts("i = 1 ... 10");
    else if (0) case 11:       puts("i = 11");
    else if (0) default:       puts("i = something else");

  return 0;

}
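The same trick can be wrapped in a function so that its result is easy to check. This is a sketch under assumptions: the function name classify and its return strings are hypothetical, and it avoids case ranges so that it stays within standard C.

```c
/* Hypothetical helper (not from the article): a brace-free switch whose
   single attached statement is one long if / else if chain. */
const char *classify(int i) {
  const char *label = "unset";

  switch (i)
         if (0) case 0:  label = "zero";
    else if (0) case 1:  label = "one";
    else if (0) default: label = "other";

  return label;
}
```

Jumping to a case label lands inside one branch of the chain. Once that assignment is done, control leaves the whole if / else construct, so no break is needed: classify(0) yields "zero", classify(1) yields "one", and any other value yields "other".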

And who needs switch (…), anyway? The unary && operator is a longstanding GNU extension that lets you take the address of a label; you can then goto that address. Equipped with this knowledge, you can make your own switch (…), with blackjack, et cetera:

#include <stdio.h>

int main() {
 
  int i = 1;

  goto *(void*[]){ &&case_0, &&case_1, &&case_2 }[i];

  if (0) case_0: puts("i = 0");
  if (0) case_1: puts("i = 1");
  if (0) case_2: puts("i = 2");

  return 0;

}
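One thing the snippet above leaves out is a bounds check: an out-of-range i indexes past the end of the label table. A slightly more defensive sketch follows, assuming GCC or Clang, since labels as values are a GNU extension. The function name dispatch and its return values are hypothetical.

```c
/* Hypothetical helper (not from the article). Requires the GNU
   "labels as values" extension, available in GCC and Clang. */
int dispatch(int i) {
  /* Table of label addresses; a static table is built only once. */
  static void *const targets[] = { &&case_0, &&case_1, &&case_2 };
  int result = -1;

  /* Unlike a real switch, nothing stops an out-of-range index. */
  if (i < 0 || i > 2) return result;

  goto *targets[i];

  if (0) case_0: result = 0;
  if (0) case_1: result = 10;
  if (0) case_2: result = 20;

  return result;
}
```

With this table, dispatch(1) jumps straight to case_1 and returns 10, and any index outside 0 to 2 returns -1 instead of jumping to a garbage address.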

Heck, here’s another fantastic use for &&: why bother with for (…) if you can use labels to implement loops directly within variable declarations? Check this out:

#include <stdio.h>

int main() {

  /* Iterate from i = 0 to i = 5: */

  int i = (i = 0) & ({_:0;}) | printf("i = %d\n", i) * 
          (++i > 5) ?: ({goto *&&_;0;});

  return 0;

}
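For comparison, here is a tamer sketch of the same idea: a loop driven by a label address, but written as ordinary statements so that it does not read a variable inside its own initializer. It still depends on the GNU labels-as-values extension (GCC or Clang), and the function name sum_to is hypothetical.

```c
/* Hypothetical helper (not from the article): sums 0..n using a label
   address instead of for (...). GNU C only. */
int sum_to(int n) {
  int i = 0, sum = 0;
  void *top = &&loop;        /* take the address of the label below */

loop:
  sum += i;
  if (++i <= n) goto *top;   /* jump back through the stored address */

  return sum;
}
```

Calling sum_to(5) adds 0 through 5 and returns 15. The body always runs at least once, so the construct behaves like a do … while loop.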

This last snippet is probably not UB-safe and is GCC-specific. But the point stands: you can write completely alien and befuddling code in C without making it unreadable.

