Bad scientific code beats code following "best practices" (2014)

Original link: https://yosefk.com/blog/why-bad-scientific-code-beats-code-following-best-practices.html

Original article

I've just read "The Low Quality of Scientific Code", which claims that code written by scientists comes out worse than it would if "software engineers" were involved.

I've been working, for more than a decade, in an environment dominated by people with a background in math or physics who often have sparse knowledge of "software engineering".

Invariably, the biggest messes are made by the minority of people who do define themselves as programmers. I will confess to having made at least a couple of large messes myself that are still not cleaned up. There were also a couple of other big messes where the code luckily went down the drain, meaning that the damage to my employer was limited to the money wasted on my own salary, without negative impact on the productivity of others.

I claim to have repented, mostly. I try rather hard to keep things boringly simple, and I don't think that in the last 5-6 years I've done anything that left a lot of people looking at me funny after spending the better part of the day dealing with the products of my misguided cleverness.

And I know a few programmers who have explicitly not repented. And people look at them funny and they think they're right and it's everyone else who is crazy.

Meanwhile, people who "aren't" programmers but are more of a mathematician, physicist, algorithm developer, scientist, you name it, commit sins mostly of the following kinds:

  • Long functions
  • Bad names (m, k, longWindedNameThatYouCantReallyReadBTWProgrammersDoThatALotToo)
  • Access all over the place – globals/singletons, "god objects" etc.
  • Crashes (null pointers, bounds errors), largely mitigated by valgrind/massive testing
  • Complete lack of interest in parallelism bugs (almost fully mitigated by tools)
  • Insufficient reluctance to use libraries written by clever programmers, with overloaded operators and templates and stuff

This I can deal with, you see. If anyone wants me to help debug something, I somehow rarely have a problem figuring out what these guys were trying to do. I mean in the software sense. Algorithmically maybe I don't get them fully. But what variable they want to pass to what function I usually know.

Not so with software engineers, whose sins fall into entirely different categories:

  • Multiple/virtual/high-on-crack inheritance
  • 7 to 14 stack frames composed principally of thin wrappers, some of them function pointers/virtual functions, possibly inside interrupt handlers or what-not
  • Files spread in umpteen directories
  • Lookup using dynamic structures from hell – dictionaries of names where the names are concatenated from various pieces at runtime, etc.
  • Dynamic loading and other grep-defeating techniques
  • A forest of near-identical names along the lines of DriverController, ControllerManager, DriverManager, ManagerController, controlDriver ad infinitum – all calling each other
  • Templates calling overloaded functions with declarations hopefully visible where the template is defined, maybe not
  • Decorators, metaclasses, code generation, etc. etc.

The result is that you don't know who calls what or why, debuggers are of moderate use at best, IDEs & grep die a slow, horrible death, etc. You literally have to give up on ever figuring this thing out before tears start flowing freely from your eyes.
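To make the "dictionaries of names concatenated at runtime" and "grep-defeating" points concrete, here is a minimal Python sketch. The function names are hypothetical, invented for illustration; nothing here is quoted from the original post or from any real codebase. Both dispatchers do the same thing, but only the first leaves a call site that grep or an IDE can find.

```python
# Hypothetical handlers, invented for this illustration only.
def handle_sensor_start():
    return "sensor started"


def handle_sensor_stop():
    return "sensor stopped"


def run_direct(action):
    # Boring style: each callee is named at its call site,
    # so grepping for handle_sensor_start finds this caller.
    if action == "start":
        return handle_sensor_start()
    if action == "stop":
        return handle_sensor_stop()
    raise ValueError(f"unknown action: {action}")


def run_dynamic(device, action):
    # "Clever" style: the callee's name is assembled at runtime,
    # so grepping for handle_sensor_start finds only its definition.
    return globals()["handle_" + device + "_" + action]()


print(run_direct("start"))
print(run_dynamic("sensor", "stop"))
```

The dynamic version is shorter and "extensible", which is exactly its appeal; the price is that the only reliable way to learn who calls `handle_sensor_stop` is to run the program and watch.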

Of course this is a gross caricature. Not everybody is a sinner at all times, and, like, I'm principally a "programmer" rather than a "scientist", and I sincerely believe that my net productivity is positive after all – but you get the idea.

Can scientific code benefit from better "software engineering"? Perhaps, but I wouldn't trust software engineers to deliver those benefits!

Simple-minded, care-free near-incompetence can be better than industrial-strength good intentions paving a superhighway to hell. The "real world" outside the computer is full of such examples.

Oh, and one really mean observation that I'm afraid is too true to be omitted: idleness is the source of much trouble. A scientist has his science to worry about so he doesn't have time to complexify the code needlessly. Many programmers have no real substance in their work – the job is trivial – so they have too much time on their hands, which they use to dwell on "API design" and thus monstrosities are born.

(In fact, when the job is far from trivial technically and/or socially, programmers' horrible training shifts their focus away from their immediate duty – is the goddamn thing actually working, nice to use, efficient/cheap, etc.? – and instead they declare themselves as responsible for nothing but the sacred APIs which they proceed to complexify beyond belief. Meanwhile, functionally the thing barely works.)