Thinking in an array language

Original link: https://github.com/razetime/ngn-k-tutorial/blob/main/12-thinking-in-k.md

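When you work on an existing K codebase and find the functionality you need online or elsewhere, it is often simpler to extract only the relevant part or syntax and adapt it to your requirements than to copy and paste the whole thing. Complications arise, however, when an established algorithm such as matrix multiplication has to be translated directly, because so much of the code must change. One challenge is the large number of nested loops, which leads to repeated assignment of global variables; modification inside several nested loops adds considerable complexity.

To optimize matrix multiplication in K, one approach is to use a fold to replace the inner loop and its bookkeeping variables with a single operation. Removing unnecessary variables and replacing complicated operations with simpler ones, for example transposing matrix B to eliminate intermediate results and replacing explicit loops with implicit matching, can significantly reduce the number of loops. Finally, combining the necessary operations and dropping intermediate results altogether gives the most compact form.

Simplification is essential to writing concise, efficient K. Analyzing each component carefully (which variable serves what purpose, which loops or blocks have simpler alternatives, which variables are redundant) can improve performance considerably. Simplifying code requires familiarity with K's syntax and capabilities and a solid understanding of the algorithm at hand. Alternatives include creating temporary arrays to hold intermediate results, flattening nested arrays, and using transpose for matrix manipulation. Transpose turns rows into columns, which reduces the nested loops needed to match elements between two matrices. Going further, conforming matrix B to the dimensions of matrix A directly, rather than performing a separate transformation, removes the need to transpose B at all and saves additional execution time. Overall, matrix operations in K call for careful consideration of algorithm choice, computation time, and efficiency.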

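At the very least, it seems to add excessive complexity to Fortran code. By contrast, APL's core design philosophy builds in conciseness and ease of use directly. The same applies in Java: given a list of BigDecimal values representing decimal fractions, Java forces me to use string or numeric literals to perform arithmetic between those values unless I write methods or classes that do the math on their behalf. Some libraries, such as Apache Commons Math, provide static methods for this conversion, but creating custom objects that support computing on these values natively is also an option. Moreover, defining operators between matrices is not unique to APL or to array-oriented languages in general, since Fortran already provides intrinsic functions for working with matrices. However, while matrix operations can be written easily in APL with infix notation, prefix or postfix notation can make Fortran code look noticeably ugly, as it tends to rely heavily on complicated function calls that can feel contrived. Intrinsic functions and array-style assignment syntax serve APL well for numerical computing, statistical analysis, scientific modeling, and simulation, giving users more options than simply defining custom classes or data types to represent mathematical constructs. Existing array languages are typically not limited in their data types in the way suggested above for Fortran. Rather, they usually expose native array types and primitives that let the programmer work with data directly through operators or function calls, avoiding the indirection that object-oriented wrappers introduce. Operators such as *, //, **, ^, % and | are examples of shorthand that APL provides for free and that can sometimes replace hundreds of lines of Fortran code that create and invoke wrapper objects to enable the same operations. In Fortran, working with numbers and indices often leads to a mix of integer and pointer indexing techniques that is neither beginner-friendly nor intuitive, whereas in APL integers and indices can themselves serve as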

Original text

You can view the full source code for this chapter at GitHub.

Since you are now properly acquainted with K, let's do some programming. Most K programming happens through the REPL, because it is very useful to iterate upon previous code. ngn/k with rlwrap has history with the up/down arrow keys, and that should be more than enough to begin developing bigger programs in K. Functions are tested in the REPL, and then moved to actual code. Note that ngn/k's prettyprinting always returns valid K data, and you can precompute some things beforehand to speed up your program.

A K script is always executed as if it were typed into the REPL: each line is executed, and its return value is printed unless the line ends with a semicolon. A script also allows multiline definitions, which are convenient for readability. Oftentimes, you may save your work in a script and want to use it in the REPL. To use your stored data and functions, just run \l file.k in the REPL; your file will be executed and its data loaded. You can load a file into the REPL more than once, overwriting older data. The REPL help, accessed with \, lists more useful commands as well.
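As a small illustration of the script behaviour described above, here is a sketch of what a script might contain. The file name file.k and the names sq and r are hypothetical, chosen only for this example.

```k
/ contents of a hypothetical script, file.k
sq:{x*x}   / define a function
sq 3       / no trailing semicolon, so the result 9 is printed
r:sq 4;    / trailing semicolon: nothing is printed, but r now holds 16
```

After loading it in the REPL with \l file.k, both sq and r should be available for further experiments.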

K programming (and array programming in general) is a continuous process of simplifying your patterns. A big, unwieldy pattern has one or more ways to condense to a smaller, more declarative, easy to read pattern. This is discussed in a lot of detail in Patterns and Anti-patterns in APL: Escaping the Beginner's Plateau - Aaron Hsu - Dyalog '17, if you'd like to understand it better.

A common problem most people have in K is the need to translate a well-known algorithm to K, usually taken from a programming website like GeeksforGeeks, or a Wikipedia article. Let us take an example: matrix multiplication.

From this Wikipedia article, the iterative algorithm for matrix multiplication is as follows:

Input: matrices A and B
Let C be a new matrix of the appropriate size
For i from 1 to n:
  For j from 1 to p:
    Let sum = 0
    For k from 1 to m:
      Set sum ← sum + A_ik × B_kj
    Set C_ij ← sum
Return C
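In formula form, assuming A is an n × m matrix and B is an m × p matrix (the sizes implied by the loop bounds), each entry of C is a sum over the shared dimension:

```latex
C_{ij} = \sum_{k=1}^{m} A_{ik} \, B_{kj}, \qquad 1 \le i \le n, \quad 1 \le j \le p
```

The three loop variables i, j and k in the pseudocode correspond directly to the three indices in this sum.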

If you want, you can try translating this to K. A direct translation would be:

matmul: {
  A::x
  B::y
  n::#A
  m::#*A
  p::#*B
  C::(n;p)#0
  i::0
  j::0
  k::0
  sum::0
  {
    i::x
    {
      j::x
      sum::0
      {
        k::x
        sum::sum+A[i;k]*B[k;j]
      }'!m
      C[i;j]::sum
    }'!p
  }'!n
  C}

This is the worst K code I've ever written, because we are trying to write K like an imperative language, and K doesn't work well with that design. The main problems are:

Many, many globals are assigned
Multiple nested loops are used
Lots of modification takes place

Luckily, there are a lot of things we can simplify here, and we can address these problems one by one.

Let us begin at the innermost loop:

sum::0
{
  k::x
  sum::sum+A[i;k]*B[k;j]
}'!m
C[i;j]::sum

The first and simplest fix we can make is summing using a fold (/).

C[i;j]::+/{
  k::x
  A[i;k]*B[k;j]
}'!m
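To see the fold in isolation, here is a minimal sketch on two small vectors. The names a and b are made up for this example and are not part of matmul.

```k
a:1 2 3
b:4 5 6
+/{a[x]*b[x]}'!3   / each produces 4 10 18, and the fold sums them to 32
+/a*b              / the same sum without the index loop: 32
```

The second line mirrors the shape of the code above: an each over indices builds a list of products, and +/ collapses it, so no accumulator variable is needed.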

One global down, 9 more to go.

The next global we can remove is C. Since ' (each) returns an array, C doesn't need to be modified. We can simply return the value of the nested loop.

  {
    i::x
    {
      j::x
      +/{
        k::x
        A[i;k]*B[k;j] }'!m }'!p }'!n}
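The reason C can go is that each collects the results of its function into a list. A quick sketch with throwaway lambdas (not taken from matmul) shows the effect:

```k
{x*x}'!4     / 0 1 4 9: one result per item of !4
{x*!3}'!2    / each result is itself a list, so we get a matrix: (0 0 0;0 1 2)
```

Nesting one each inside another therefore builds a matrix row by row, without assigning into a preallocated array.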

Now, we have three loops with no modification, which makes our job much easier. The main variables to look at now are i, j, and k.

i indexes each row of A.
j indexes each column of B.
k indexes each column of A and row of B.

Basically, k is responsible for pairing each row of A with each column of B, which are then multiplied. Hence, we can eliminate the middle man here, and directly match them without k. This also eliminates one loop, and removes the need for m.

{
  j::x
  +/A[i]*B[;j] }'!p }'!n}
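A short sketch of this step on concrete data may help. The two 2×2 matrices below are example values chosen for illustration.

```k
A:(1 2;3 4)
B:(5 6;7 8)
A[0]           / row 0 of A: 1 2
B[;1]          / column 1 of B: 6 8
+/A[0]*B[;1]   / (1*6)+(2*8), which is 22
```

Multiplying the row and the column as whole vectors pairs up their items, which is exactly the job k was doing one index at a time.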

Next, to remove j, we need to take each column of B and pair it with A[i]. To do this, we transpose B and pair each element with eachright (/:).

matmul: {
  A::x
  B::y
  n::#A
  i::0
  {
    i::x
    A[i]{+/x*y}/:+B}'!n }
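To check what eachright does here, this sketch applies the inner expression to a single row, using the same illustrative 2×2 matrices as example data:

```k
A:(1 2;3 4)
B:(5 6;7 8)
+B                 / transpose: the columns of B become rows, (5 7;6 8)
A[0]{+/x*y}/:+B    / row 0 of A against each column of B: 19 22
```

The left argument stays fixed while /: walks over the items of the right argument, so one row of A yields one full row of the product.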

In order to remove i, we do a similar thing: Use eachleft to pair each row of A with each column of B.

matmul: {
  A::x
  B::y
  A{+/x*y}/:\:+B }

We need no more globals!

matmul: {x{+/x*y}/:\:+y}
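A quick way to convince yourself the condensed version still works is to try it on small inputs in the REPL. The matrices below are example data, not from the original chapter.

```k
matmul:{x{+/x*y}/:\:+y}
A:(1 2;3 4)
B:(5 6;7 8)
matmul[A;B]                           / (19 22;43 50)
(19 22;43 50)~matmul[A;B]             / 1, the result matches the hand-computed product
matmul[(1 2 3;4 5 6);(1 0;0 1;1 1)]   / a 2x3 by 3x2 product: (4 5;10 11)
```

Checking a non-square case as well is worthwhile, since square inputs can hide a mix-up between rows and columns.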

Now that is matrix multiplication in K. This is the most direct algorithmic conversion of matrix multiplication. Now we will look at ways to shorten it, and remove more loops.

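+ (transpose) is costly, and we can remove it. What we are currently doing is naive. Instead of multiplying each row of x with each column of y, we can conform each row of x to the whole of y, doing the same thing implicitly: multiplying a row of x by y scales the k-th row of y by the k-th item of that row, and summing the scaled rows gives the corresponding row of the product.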

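The code for this step is not reproduced here, so the following is a sketch of what the paragraph describes rather than necessarily the author's exact code. It assumes eachleft (\:) is used to pair each row of the left matrix with the whole right matrix; the name mm and the sample data are hypothetical.

```k
B:(5 6;7 8)
1 2*B              / each item of the row scales one row of B: (5 6;14 16)
+/1 2*B            / summing the scaled rows gives a row of the product: 19 22
mm:{x{+/x*y}\:y}   / do that for every row of x, with no transpose
mm[(1 2;3 4);B]    / (19 22;43 50)
```

Because the row and the matrix have the same length, K pairs them item by item on its own, which is the implicit matching that replaces the explicit pairing with columns.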

Now, we have a function which can be easily made tacit. With the rules from Chapter 3, we get our final result:

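The final listing is not reproduced here. One tacit form consistent with the text, offered as an assumption rather than as the author's confirmed code, composes the sum fold with multiplication and applies eachleft to the result. The name mt is hypothetical.

```k
mt:(+/*)\:                  / (+/*) multiplies its two arguments, then sums; \: applies it per row of the left argument
mt[(1 2;3 4);(5 6;7 8)]     / (19 22;43 50)
```

Compared with the lambda version, the arguments x and y no longer appear at all, which is what makes the definition tacit.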

A matrix multiplication function you can be proud of. This process may seem like it has a lot of steps, but condensing code will become much easier and more intuitive as you practice your skill in K.

Matrix multiplication is a simple procedure which works well with K's array support. We will be seeing more algorithms that don't play well with K, and how to handle them in future chapters.
