Why the Most Valuable Things You Know Are Things You Cannot Say

Original link: https://deadneurons.substack.com/p/why-the-most-valuable-things-you

## The Paradox of Expert Knowledge

Although expert judgement is learnable, transferring it is surprisingly difficult. This is no contradiction; it stems from two distinct modes of learning: *instruction* (explicit knowledge transfer through language) and *calibration* (skill built through repeated experience and feedback). Instruction excels at conveying facts and rules, while genuine expertise, especially "street smarts", comes from calibrating complex, high-dimensional models through practice.

Experts do not consciously weigh the many interacting variables; they rely on pattern matching honed over thousands of experiences. This nuanced understanding is too complex to put into words, and language struggles with combinatorial interactions. Attempts to encode such tacit knowledge into rules inevitably lose critical information, producing brittle systems that fail in unpredictable situations.

Formal education prioritises transmissible "book knowledge", while genuine judgement requires non-transmissible experiential learning. Organisations often privilege legible credentials over demonstrated expertise, leading to systemic de-skilling and a cycle of ever-growing rules after each inevitable failure. Ultimately, valuable expertise resists formalisation; it is acquired through individual calibration, not scalable instruction. Smart people can point the way, but skill requires personal experience.

## The Inarticulable Value of Experience

This Hacker News discussion centres on a Substack article exploring why the most valuable knowledge is often impossible to express fully. The core argument is that genuine expertise lies not just in knowing *what* to do but in *how* to do it: a deep, intuitive understanding formed through repeated experience and "calibration" against one's environment.

This calibration builds internal models that become nearly automatic and inseparable from how we perceive and interact with the world. We can describe the *process*, but we cannot transmit the *feel* or the subtle judgement that accompanies embodied knowledge. Attempts to codify such expertise into words tend to fail because words lack the richness and dimensionality of lived experience.

Commenters connected this to fields like 3D modelling (intuitive UIs such as Rhino3D are hard to replicate), sports refereeing, and learning to drive. The discussion touched on the limits of language, the importance of apprenticeship, and the challenge of transferring tacit knowledge, noting why institutions "decay" once their original expertise is lost. Ultimately, the thread suggests that a large share of human knowledge remains undocumented, residing in individuals and built through practice rather than instruction alone.

## Original Article

There is an apparent contradiction at the heart of expertise. Expert judgement is learnable, in the sense that people demonstrably acquire it over time. It is also non-transmissible, in the sense that no expert can transfer their judgement to another person through explanation. If it is learnable, why can it not be taught?

The resolution lies in a distinction between two fundamentally different modes of learning. The first is instruction: the transfer of explicit models, rules, and relationships from one person to another through language. The second is calibration: the development of internal models through repeated exposure to feedback in a specific environment. Judgement is learnable through calibration. It is not transmissible through instruction. These are different processes operating on different substrates, and conflating them is the source of the apparent contradiction.

To see why, we need to be precise about what “high-dimensional” means when applied to expert knowledge, because the concept is doing all the real work.

Consider a simple decision: should I cross the road? A rule-based encoding of this decision might operate on three variables: is a car visible, how fast is it moving, and how far away is it. These three dimensions are sufficient to produce a reasonable crossing decision most of the time. You could write this as an explicit rule, transmit it through language, and a person who had never crossed a road could apply it successfully in straightforward cases.
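The three-variable rule really is simple enough to state directly in code. A minimal sketch (the function name and the eight-second crossing time are illustrative assumptions, not from the article):

```python
def should_cross(car_visible: bool, speed_mps: float, distance_m: float,
                 crossing_time_s: float = 8.0) -> bool:
    """Three-variable crossing rule: cross if no car is visible, or if
    the car will take longer to arrive than the time needed to cross.
    The 8-second default crossing time is an assumed illustrative value."""
    if not car_visible or speed_mps <= 0:
        return True  # no car, or a stationary one
    return (distance_m / speed_mps) > crossing_time_s
```

The point is not that this rule is good, but that it is complete enough to transmit: three inputs, one threshold, and nothing that resists being written down.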

Now consider the actual model that an experienced pedestrian uses. They are integrating: the car’s speed, its acceleration (is it slowing down?), the road surface (wet or dry, affecting stopping distance), the driver’s apparent attentiveness (are they looking at their phone?), the car’s trajectory (drifting within the lane?), the presence of other cars that might obscure the driver’s view, the width of the road, their own walking speed today (are they carrying something heavy, are they injured?), the behaviour of other pedestrians (are they crossing confidently or hesitating?), the sound of the engine (accelerating or decelerating, even before the speed change is visible), the type of vehicle (a truck has different stopping characteristics than a bicycle), the time of day (affecting driver fatigue and visibility), and dozens of other variables they could not enumerate if asked.

This is perhaps thirty to fifty dimensions of input, processed simultaneously, producing a crossing decision in under a second. The experienced pedestrian is not consciously evaluating each variable. They are running a pattern-matching model that was calibrated over thousands of crossings, each of which provided feedback (safe crossing, near miss, honked at, had to run). The model works. It produces better decisions than the three-variable rule. It cannot be articulated.

Language is a serial, low-bandwidth channel. It transmits one proposition at a time, sequentially. Each proposition can relate a small number of variables: “if X and Y, then Z.” Complex conditionals can extend this to perhaps five or six variables before the sentence becomes unparseable: “if X and Y but not Z, unless W and V, then Q.”

The expert’s model does not operate through conditionals of this form. It operates through a continuous, nonlinear mapping from a high-dimensional input space to an output. The interaction effects between variables are where the real information lives. The road being wet matters differently depending on the car’s speed, which matters differently depending on the driver’s attentiveness, which matters differently depending on the time of day. These are not additive effects that can be listed sequentially. They are multiplicative interactions across many variables simultaneously.

The number of possible interaction effects grows combinatorially with the number of input variables. For fifty variables, the pairwise interactions alone number 1,225. Three-way interactions number 19,600. The expert’s model has been calibrated, through experience, to weigh the interactions that actually matter and ignore those that do not. This weighing is the expertise. It cannot be transmitted through language because language would need to enumerate and weigh each relevant interaction explicitly, and the number of relevant interactions exceeds what linguistic description can practically encode.
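These counts follow directly from the binomial coefficient, and can be checked with Python's standard library:

```python
from math import comb

n_vars = 50
pairwise = comb(n_vars, 2)    # 50 choose 2: two-way interactions -> 1,225
three_way = comb(n_vars, 3)   # 50 choose 3: three-way interactions -> 19,600
print(pairwise, three_way)
```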

This is not a claim about human cognitive limitations. It is an information-theoretic claim about the channel capacity of natural language relative to the complexity of the models being transmitted. You could, in principle, write a book of ten thousand pages enumerating every interaction effect an expert pedestrian uses when crossing the road. The book would be unreadable, inapplicable in real time, and still incomplete, because some of the interactions are context-dependent in ways that would require yet another layer of meta-rules to capture. The recursion does not terminate. At some point, the only faithful encoding of the model is the model itself, running on neural hardware that was shaped by direct experience.

This framework captures the book smarts versus street smarts distinction, but it reframes it in a way that strips out the anti-intellectual connotation that usually accompanies the phrase.

“Book smarts” refers to knowledge and models that are low-dimensional enough to be transmitted through language. Formal education excels at transmitting these: mathematical relationships, historical facts, scientific theories, logical frameworks, grammatical rules. These models are powerful. They are also, by definition, the models simple enough to compress into linguistic form. The entire institution of formal education is built on the assumption that the important knowledge in a domain can be transmitted through instruction, which is true for some domains and false for others.

“Street smarts” refers to models that are too high-dimensional for linguistic transmission and were therefore acquired through calibrated experience. The street-smart person cannot explain why they know what they know, which makes them look inarticulate to the book-smart person, which leads the book-smart person to conclude that the street-smart person’s knowledge is inferior. This conclusion is precisely backwards in domains where judgement matters. The inability to articulate the model is not evidence of a crude model. It is evidence of a model too sophisticated for the transmission channel.

The cultural bias toward book smarts is a specific instance of a broader legibility problem. Book knowledge is legible: it can be tested, credentialed, and verified through examination. Experiential judgement is illegible: it cannot be examined, only observed in its outputs over time. Institutions that allocate authority based on legible credentials systematically promote book-smart people over street-smart people, which works in knowledge domains and fails catastrophically in judgement domains.

The tragedy is that the people making this allocation decision are themselves products of the book-smart selection process. They evaluate intelligence through the lens of articulacy and formal reasoning, because those are the dimensions on which they were selected. The experienced operator who makes correct decisions but cannot explain their reasoning in a boardroom-legible format looks unsophisticated, when in fact they are running a more complex model than the articulate strategist who can produce a compelling slide deck but whose actual predictive accuracy is no better than chance.

The deeper question is why calibration requires so many repetitions. If the expert’s model is, in some sense, a function mapping inputs to outputs, why can we not simply describe the function?

The answer has two parts. The first is that the expert does not know the function explicitly. Their model is implemented in neural weight configurations that produce correct outputs without representing the mapping in a form accessible to conscious inspection. This is not mysticism. It is the well-established property of neural networks, both biological and artificial, that they can approximate arbitrarily complex functions without representing those functions symbolically. The network “knows” the mapping in the sense that it produces correct outputs, but the knowledge is distributed across millions of connection weights, none of which individually encodes a meaningful proposition.
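The point about distributed representation can be made concrete with a toy network. The sketch below (pure Python; the architecture, seed, and learning rate are all assumptions chosen for illustration) trains a two-hidden-unit network on XOR by gradient descent. Whatever the network ends up knowing is carried jointly by nine real-valued parameters, none of which individually states anything like "output 1 when the inputs differ":

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# A 2-2-1 network: nine scalar parameters in total.
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [0.0, 0.0]
w2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = 0.0

def forward(x):
    h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(2)]
    y = sigmoid(w2[0] * h[0] + w2[1] * h[1] + b2)
    return y, h

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def total_loss():
    return sum((forward(x)[0] - t) ** 2 for x, t in data)

loss_before = total_loss()

lr = 0.5
for _ in range(5000):
    for x, t in data:
        y, h = forward(x)
        d_out = 2 * (y - t) * y * (1 - y)  # gradient at the output unit
        for j in range(2):
            # hidden-unit gradient uses the pre-update w2[j]
            d_h = d_out * w2[j] * h[j] * (1 - h[j])
            w2[j] -= lr * d_out * h[j]
            for i in range(2):
                w1[j][i] -= lr * d_h * x[i]
            b1[j] -= lr * d_h
        b2 -= lr * d_out

loss_after = total_loss()
```

After training, the loss has fallen, yet inspecting any single weight yields no meaningful proposition about XOR: the mapping lives only in the joint configuration, which is the essay's point in miniature.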

The second, more fundamental reason is that the relevant features are not given in advance. A large part of what the expert learns through experience is which features of the environment matter. The novice does not merely weigh the features incorrectly. They do not perceive the relevant features at all. The experienced driver does not just judge speed and distance better than the novice. They notice the subtle drift of a vehicle within its lane that indicates an inattentive driver, which is a feature the novice does not even encode as an input.

This is the deepest reason why experience cannot be compressed. You cannot transmit a model defined over features that the recipient does not perceive. Teaching someone to cross the road safely is not primarily about transmitting rules for weighing speed and distance. It is about developing the perceptual apparatus that detects lane drift, engine tone, driver gaze direction, and the subtle body language of other pedestrians. This perceptual development requires direct exposure. The features cannot be pointed to linguistically, because the pointing requires the recipient to already perceive the thing being pointed at.

This creates a bootstrapping problem. The expert cannot transmit their model because the recipient lacks the perceptual categories the model operates on. The recipient cannot develop those perceptual categories except through the experience that builds them. Instruction can accelerate the process by directing attention (“watch the driver’s eyes, not the car”), but it cannot replace it, because the instruction only makes sense to someone who has already begun developing the relevant perceptual sensitivity.

This suggests a hierarchy of knowledge types ordered by transmissibility.

At the most transmissible level: facts and explicit rules. “Water boils at 100°C at sea level.” Perfectly compressible into language. Perfectly transmissible through instruction.

Next: formal models and frameworks. “Net present value is the sum of discounted future cash flows.” Requires some background knowledge to receive but is fully transmissible once the prerequisite concepts are in place.

Next: heuristics derived from experience. “Be wary of founders who talk more about their competitors than their customers.” Partially transmissible. Useful as an attention-directing pointer for someone undergoing calibration. Misleading as a standalone rule, because the exceptions are as important as the rule and the exceptions cannot be enumerated.

At the least transmissible level: perceptual calibration itself. The ability to perceive the features that matter, to weigh them appropriately in context, to detect the subtle interactions that distinguish this situation from superficially similar ones. Not transmissible through any linguistic channel. Acquirable only through prolonged, feedback-rich exposure to the domain.

The conventional education system operates almost entirely in the top two levels. Apprenticeship operates primarily at the third level. The fourth level is acquired only through independent practice after apprenticeship, which is why even the best-trained professionals require years of post-training experience before their judgement becomes genuinely expert. The medical profession acknowledges this with residency. Most other domains pretend that graduation from the transmissible levels is sufficient, which is why those domains systematically underperform relative to the quality of their formal training.

The implications for how organisations handle expertise are severe. When organisations attempt to de-skill judgement domains through codification, through frameworks, checklists, decision trees, and process documentation, they are attempting to compress level-four knowledge (perceptual calibration) into level-two knowledge (formal models and frameworks). The compression is lossy in exactly the way that matters: it preserves the transmissible, legible components and discards the non-transmissible components that are the actual source of expert value.

The organisation then staffs the codified function with people trained at levels one and two, who apply the frameworks diligently and produce adequate results in routine cases. The expert who was replaced looks at the framework and sees everything it is missing, every interaction effect it ignores, every perceptual feature it fails to encode. The organisation looks at the framework and sees a successfully de-skilled function running at lower cost with more consistent outputs. Both observations are correct. They are measuring different things. The organisation is measuring performance in the normal cases the framework was designed for. The expert is measuring the fragility that accumulates in the non-routine cases the framework cannot handle.

The resolution only arrives when a non-routine case produces a catastrophic failure. Then, briefly, everyone agrees that expert judgement was undervalued. A review is conducted. Recommendations are made. The recommendations invariably take the form of additional rules and frameworks, because that is the only form of knowledge the organisation can process. The rules are added to the codified system. The system becomes more complex, more brittle, and less capable of handling the next non-routine case, which will be different from the last one in ways the new rules do not anticipate. The cycle repeats.

The correct conclusion is uncomfortable for anyone invested in the project of formalising knowledge. The most valuable forms of human expertise are precisely the forms that resist formalisation. They are learnable but not teachable, acquirable but not transmissible, demonstrable but not articulable. Every attempt to compress them into a transmissible format destroys the information that makes them valuable. The only reliable method of developing them, which is prolonged calibration through direct experience, is the one method that cannot be scaled, standardised, or accelerated beyond a modest degree.

Smart people, in other words, can point you in the right direction. They cannot carry you there. The carrying is something your own nervous system must do, one feedback-rich repetition at a time.
