Semantic Compression (2014)

Original link: https://caseymuratori.com/blog_0015

Before adding new features, the author first simplified the underlying code to make it easier to extend. He views programming as having two parts: figuring out *what needs to be done*, and *how to express it efficiently*. True efficiency, however, is not about optimized code speed but about minimizing *human cost* across the entire development lifecycle, including typing, debugging, modifying, and adapting the code.

The author advocates a "compression-oriented" approach, analogous to a dictionary compressor like PKZip. Rather than creating reusable code up front, he focuses first on writing specific solutions. Reusability emerges only *after duplicated code has been identified*, by effectively "compressing" it. This avoids wasting effort on abstractions that may never be needed.

When an opportunity for reuse appears, the code is either used as-is or deliberately modified or layered. The result is concise, readable code that mirrors the natural language of the problem and is easier to maintain and extend. This bottom-up approach, starting from concrete details, avoids the pitfalls of premature architectural planning and ultimately leads to a more robust, more efficient codebase.

A Hacker News discussion centers on Casey Muratori's 2014 article on "semantic compression." The core idea, as commenters echoed, is a pragmatic approach to coding: make things *work* first, then optimize abstractions or eliminate duplication.

Several users agreed with this "under-engineering" philosophy, arguing that it is easier to fix simple, working code than to untangle a complex, prematurely optimized system. One commenter noted that the article's simplicity benefits from the absence of real-world constraints such as databases or business requirements.

Others pointed out that semantic compression is not a drop-in replacement for object-oriented programming (OOP), and that compressing is easier when building a project from scratch than when retrofitting an existing, possibly messy codebase, especially in collaborative settings. Ultimately, the article appears to have influenced developers to prioritize functionality and clarity over abstract design.

Original Text
So, before I started adding lots of new buttons, I already felt like I should spend a little time working on the underlying code to make it simpler to add new things. Why did I feel that way, and how did I know what “simpler” means in this case?
I look at programming as having essentially two parts: figuring out what the processor actually needs to do to get something done, and then figuring out the most efficient way to express that in the language I’m using. Increasingly, it is the latter that accounts for what programmers actually spend their time on: wrangling all those algorithms and all that math into a coherent whole that doesn’t collapse under its own weight. So any experienced programmer who’s any good has had to come up with some way — if even just by intuition — of thinking about what it means to program efficiently.

By “efficiently”, this doesn’t just mean that the code is optimized. Rather, it means that the development of the code is optimized — that the code is structured in such a way so as to minimize the amount of human effort necessary to type it, get it working, modify it, and debug it enough for it to be shippable.

I like to think of efficiency as holistically as possible. If you look at the development process for a piece of code as a whole, you won’t overlook any hidden costs. Given a certain level of performance and quality required by the places the code gets used, beginning at its inception and ending with the last time the code is ever used by anyone for any reason, the goal is to minimize the amount of human effort it cost. This includes the time to type it in. It includes the time to debug it. It includes the time to modify it. It includes the time to adapt it for other uses. It includes any work done to other code to get it to work with this code that perhaps wouldn’t have been necessary if the code were written differently. All work on the code for its entire usable lifetime is included.

When considered in this way, my experience has led me to conclude that the most efficient way to program is to approach your code as if you were a dictionary compressor.
Like, literally, pretend you were a really great version of PKZip, running continuously on your code, looking for ways to make it (semantically) smaller. And just to be clear, I mean semantically smaller, as in less duplicated or similar code, not physically smaller, as in less text, although the two often go hand-in-hand.

This is a very bottom-up programming methodology, a pseudo-variant of which has recently gained the moniker “refactoring”, even though that is a ridiculous term for a number of reasons that are not worth belaboring at the moment. I also think that the formal “refactoring” stuff missed the main point, but that’s also not worth belaboring. Point being, they are sort-of related, and hopefully you will understand the similarities and differences more over the course of this article series.

So what does compression-oriented programming look like, and why is it efficient? Like a good compressor, I don’t reuse anything until I have at least two instances of it occurring. Many programmers don’t understand how important this is, and try to write “reusable” code right off the bat, but that is probably one of the biggest mistakes you can make. My mantra is, “make your code usable before you try to make it reusable”.

I always begin by just typing out exactly what I want to happen in each specific case, without any regard to “correctness” or “abstraction” or any other buzzword, and I get that working. Then, when I find myself doing the same thing a second time somewhere else, that is when I pull out the reusable portion and share it, effectively “compressing” the code. I like “compress” better as an analogy, because it means something useful, as opposed to the often-used “abstracting”, which doesn’t really imply anything useful. Who cares if code is abstract?
Waiting until there are (at least) two examples of a piece of code means I not only save time thinking about how to reuse it until I know I really need to, but it also means I always have at least two different real examples of what the code has to do before I try to make it reusable. This is crucial for efficiency, because if you only have one example, or worse, no examples (in the case of code written preemptively), then you are very likely to make mistakes in the way you write it and end up with code that isn’t conveniently reusable. This leads to even more wasted time once you go to use it, because either it will be cumbersome, or you will have to redo it to make it work the way you need it to. So I try very hard to never make code “prematurely reusable”, to evoke Knuth.

Similarly, like a magical globally optimizing compressor (which sadly PKZip isn’t), when you are presented with new places where a previously reused piece of code could be reused again, you make a decision: if the reusable code is already suitable, you just use it, but if it’s not, you decide whether or not you should modify how it works, or whether you should introduce a new layer on top of or underneath it. Multiresolution entry points are a big part of making code reusable, but I’ll save discussion of that for a later article, since it’s a topic unto itself.

Finally, the underlying assumption in all of this is, if you compress your code to a nice compact form, it is easy to read, because there’s a minimal amount of it, and the semantics tend to mirror the real “language” of the problem, because like a real language, those things that are expressed most often are given their own names and are used consistently. Well-compressed code is also easy to maintain, because all the places in the code that are doing identical things all go through the same paths, but code that is unique is not needlessly complicated or separated from its use.
Finally, well-compressed code is easy to extend, because producing more code that does similar operations is simple, as all the necessary code is there in a nicely recomposable way. These are all things that most programming methodologies claim to do in an abstract fashion (build UML diagrams, make class hierarchies, make systems of objects, etc.), but always fail to achieve, because the hard part of code is getting the details right. Starting from a place where the details don’t exist inevitably means you will forget or overlook something that will cause your plans to fail or lead to suboptimal results. Starting with the details and repeatedly compressing to arrive at the eventual architecture avoids all the pitfalls of trying to conceive the architecture ahead of time.

With all that in mind, let’s take a look at how all this can be applied to the simple Witness UI code.