MOS 6502非法操作码的工作原理——迈克尔·施泰尔
How MOS 6502 Illegal Opcodes Work (2008)

原始链接: https://www.pagetable.com/?p=39

6502处理器,例如在Commodore 64和NES等经典系统中使用的处理器,除了其定义的指令集之外,还包含一些未公开的“非法”操作码。这些操作码的出现源于6502使用可编程逻辑阵列(PLA)解码指令的方式。 PLA的线路根据操作码和时钟周期触发特定的动作。未定义的操作码虽然缺乏专用的PLA线路,但由于其位模式与有效指令相似,可能会触发现有线路。这会导致意想不到的行为,通常会组合多个指令的功能。例如,“LAX”操作码($AF)实际上同时执行LDA和LDX,因为其位模式同时触发了这两种加载操作的PLA线路。 此外,某些未定义的操作码,称为“KIL”操作码,会使CPU停止运行。当PLA未能重置指令周期计数器时,就会发生这种情况,这会阻止任何进一步的指令执行并禁用中断,需要完全复位才能恢复。研究这些非法操作码可以深入了解6502的内部架构。

Hacker News 上的一个帖子讨论了 2008 年一篇关于 6502 非法指令的文章,引发了人们对复古科技,特别是 6502 时代的 renewed interest 的讨论。许多用户 fondly recall 在 Commodore 64 和 Atari 800 等计算机上学习编程的经历,并强调了那些详尽的手册是如何帮助他们深入理解计算机架构和汇编语言的。一些人惋惜现代计算变得过于封装,阻碍了新手进行类似的探索。讨论还涉及到 Ben Eater 的 6502 构建项目以及通过旧系统向孩子们介绍计算的愿望。一位用户提到了 8088、6502 和 8086 汇编语言编程的差异。帖子中还有一个链接指向 Michael 的博客“C64终极指南”。

原文

The original NMOS version of the MOS 6502, used in computers like the Commodore 64, the Apple II and the Nintendo Entertainment System (NES), is well-known for its illegal opcodes: Out of 256 possible opcodes, 151 are defined by the architecture, but many of the remaining 105 undefined opcodes do useful things.

Many articles have been written to test and document these, but I am not aware of any article that tries to explain where exactly they come from. I’ll do this here.

The Block Diagram

Every 6502 data sheet comes with a block diagram, but these are of no use, because they are oversimplified, partially incorrect, and don’t explain how instruction decoding works. The following more detailed diagram is a lot more useful:

(Original from Apple II things)

The Decode ROM (PLA)

There is no need to understand the whole diagram. The important part is on the left: The instruction register, which holds the opcode, and the current clock cycle within the instruction (T0 to T6) get fed into a 130×21 bit decode ROM, i.e. a ROM with 130 lines of 21 bits each. On the die shot, this is the green area on the bottom.

(Original from Molecular Expressions)

While other CPUs from the same era used microcode to interpret the instruction, the 6502 had this 130×21 bit PLA. All lines of the PLA compare the instruction and the current clock cycle, and if they match, the line fires. A little simplified, every line looks like this:

ON bits OFF bits timing
7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 T6 T5 T4 T3 T2 T1

(See the diagrams at http://impulzus.sch.bme.hu/6502/ for details; partial English translation of the website here).

  • “ON bits” specifies, which bits need to be set for this line to fire.
  • “OFF bits” specifies, which bits need to be clear for this line to fire.

The opcode table of the 6502 is laid out in a way that you can find easy rules to generalize the effects of similar opcodes. For example, the branch opcodes are encoded like this:

%aab10000

where “aa” is the condition (00=N, 01=V, 10=C, 11=Z) and “b” decides whether the branch is taken on a set or a clear flag.

So the following line would fire on the first cycle of any branch:

ON bits OFF bits timing
7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 T6 T5 T4 T3 T2 T1
0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1

From now on, let’s write it differently, so that it’s more readable:

mask cycle description
XXX10000 T1 T1 of Bcc: fetch branch offset

If a line fires, it outputs a “1”. The “Random Control Logic” that can seen in the diagram then AND/OR-combines some lines and feeds the result into various components of the CPU: In the case of a branch, this would result in fetching the branch offset, for example.

One line can fire for several opcodes that are similar in their encoding and thus their behavior: For example, “LDA abs”, “ORA abs” and “AND abs” all do the same thing (fetch the low byte of the address) in T1, so there can be a line that matches all these opcodes and causes a memory fetch and a PC increment. Also, multiple lines can fire at the same time for any given cycle within an instruction, which will have the combined effect of the single lines.

LDA and LDX becomes LAX

Now there are many undefined opcodes. The designers of the 6502 have not created any specific PLA lines for them, but since their opcodes are similar to well-defined opcodes, there might be lines that fire nevertheless.

Let’s take opcode $AF for example, which is “LAX absolute”. It loads a value from an absolute address in memory and stores it in A and X at the same time. This is somewhat the combination of opcodes $AD (LDA abs) and $AE (LDX abs).

The instructions “LDA/LDX/LDY abs” ($AC/$AD/$AE) consist of four cycles:

  • The first cycle fetches the low byte of the address.
  • The second cycle fetches the hgh byte of the address.
  • The third cycle fetches the address from memory and stores it in A/X/Y.
  • The fourth cycle fetches the next instruction.

Cycles T1, T2 and T4 are identical for all three of them, and they are encoded smilarly, so the following three PLA lines can be used to detect these instructions and signal the rest of the CPU to carry out the specific tasks:

mask cycle description
101011XX T1 T1 of $AC/$AD/$AE: fetch addr/lo
101011XX T2 T2 of $AC/$AD/$AE: fetch addr/lo
101011XX T4 T4 of $AC/$AD/$AE: fetch next opcode

The mask %101011XX doesn’t only fire for $AC/$AD/$AE, but also for the undefined opcode $AF: So $AF (LAX) behaves the same as LDA/LDX/LDY in T1/T2/T4, i.e. it fetches a 16 bit address and in the end fetches the next opcode.

T3 differs in all three cases, so it has to be handled by one separate line per case:

mask cycle description
10101100 T3 T3 of $AC: read into Y
101011X1 T3 T3 of $AD: read into A
1010111X T3 T3 of $AE: read into X

(Actually, the lines in the actual PLA might be less specific, i.e. contain more X bits, since there are similar instructions like “ORA absolute” that might share this line.)

The line for $AC is only true for the exact value of $AC, but the $AD and $AE lines have one “don’t care” bit each. The bitfield of $AF, which is %10101111, is true for both masks, so in T3 of $AF, both the $AD and the $AE lines fire.

In T3, LDA/LDX/LDY have in common that they all read from memory and put the result onto the internal “SB” bus. “LDA” also sets the “SB->AC” control line to “1”, which will make the accumulator read its value from SB. Likewise, LDX causes “SB->X” to be “1” and makes X to read from the SB bus, and LDY reads SB into the Y register.

Since both the LDA and the LDX lines fire, both the accumulator and the X register will be sent the command to load their values from the SB bus, so $AF is effectively an LAX: Load Accumulator and X.

The KIL Opcodes

There are many “KIL” opcodes, i.e. opcodes that stop the CPU, so that it can only recover using a RESET, and not even an IRQ or an NMI.

In order to understand this, let’s look at the different states an instruction can be in. After the instruction fetch, the CPU is in cycle T1. It will feed the opcode and the cycle number into the PLA and cause the rest of the CPU to carry out whatever has to be done in this cycle, according to the PLA. Then it will shift the T bitfield left by one, so the T2 line will be “1”, then line T3 and so on. There are seven T lines total, T1 to T7. At the end of each instruction, the PLA causes the T bitfield to reset, so that the next instruction starts with T1=1 again.

But what happens if T does not get reset? This can happen if in all seven states of T, no line fires that actually belongs to an instruction that ends at this cycle. T gets shifted left until state T7, in which another shift left will just shift the 1 bit out of T – all bits of T will be zero then, so no PLA line can fire any more.

All interrupt and NMI requests are always delayed until the current instruction is finished, i.e. until T gets reset. But since T never gets reset, all interrupts and NMIs are effectively disabled.

What’s next?

There are many illegal opcodes, some with very weird behavior, and some that have been documented as unstable. Studying all these can reveal many interesting details about the internal design of the 6502.

联系我们 contact @ memedata.com