Why is Python's OrderedDict ordered?

Original link: https://www.piglei.com/articles/en-why-is-python-ordereddict-ordered/

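## How Python's OrderedDict Maintains Order

Although the standard Python dictionary became ordered in Python 3.7, `collections.OrderedDict` remains a valuable tool: it provides backward compatibility as well as distinctive features, such as treating key order as part of equality and offering methods like `move_to_end()`. `OrderedDict` preserves order by combining a standard dictionary (for fast key-value access) with a doubly linked list (for tracking insertion order). New keys are added to both structures: the dictionary stores the key-value pair, while the linked list maintains the order. An auxiliary dictionary maps each key to its linked-list node, enabling O(1) lookups within the list. Operations such as insertion (`__setitem__`) and deletion (`__delitem__`) update both the dictionary and the linked list to keep them consistent. Iteration (`__iter__`) simply walks the linked list, which guarantees a predictable order. The implementation uses `weakref` to avoid reference cycles within the linked list, allowing efficient garbage collection. It also uses `object()` as a unique default value in methods such as `pop()` to reliably distinguish missing keys from present ones. In essence, `OrderedDict` trades a small amount of memory for predictable iteration and specialized features.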

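This Hacker News discussion centers on an article titled "Why is Python's OrderedDict ordered?", but commenters quickly pointed out that the article actually explains *how* it is ordered, not *why*. The core of the debate concerns the history of OrderedDict's behavior. Some argue that order preservation was originally an implementation detail that users came to rely on, which eventually led Python 3.7 to guarantee it officially. Others argue that Python dictionaries always had this property, albeit inconsistently. The conversation broadened into a wider discussion of dictionary ordering in other languages, such as Swift (which uses randomized ordering for security reasons) and Go/Perl. Some commenters expressed frustration with guaranteed insertion order, arguing that randomized iteration is a more efficient and secure approach, while others defended the predictability it provides. One possible explanation offered for the misleading title is automatic translation from Chinese.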

Original article

It is 2025, and debates about whether Python dictionaries preserve order have mostly vanished. Ever since Python 3.7 wrote “dictionaries preserve insertion order” into the language spec in 2018, developers have grown used to ordered dicts. The once unruly, unordered dictionary now feels as distant as Python 2.7—only recalled when veterans wax nostalgic.

Back when dictionaries could not preserve order, what did we use when we needed one that did? The answer was collections.OrderedDict.

However, now that the built-in dictionary keeps insertion order, OrderedDict might feel less essential. Still, as of version 3.14 it remains in the standard-library module collections, for a few reasons:

  1. Backward compatibility keeps legacy code unchanged.
  2. Different behavior: OrderedDict treats key order as part of equality, while the built-in dict does not.
  3. Extra features: OrderedDict offers methods such as move_to_end.

>>> from collections import OrderedDict
>>> d = OrderedDict(a=1, b=2, c=3)
>>> d
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> d.move_to_end('a')
>>> d
OrderedDict([('b', 2), ('c', 3), ('a', 1)])    # 1

  1. move_to_end() moves the selected key to the end of the dictionary.
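
The second reason, order-sensitive equality, is easy to check in an interpreter. The snippet below is a minimal sketch using only the standard library; the variable names are illustrative and not taken from the article.

```python
from collections import OrderedDict

first = OrderedDict([("a", 1), ("b", 2)])
second = OrderedDict([("b", 2), ("a", 1)])

# Two OrderedDicts holding the same items in a different order are not equal.
ordered_equal = first == second

# The same items compared as plain dicts are equal: order is ignored.
plain_equal = dict(first) == dict(second)

# Comparing an OrderedDict with a plain dict is also order-insensitive.
mixed_equal = first == dict(second)

print(ordered_equal, plain_equal, mixed_equal)  # False True True
```

Only a comparison between two OrderedDict instances takes order into account, so code that relies on this behavior cannot switch to the built-in dict without changing its results.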

This article takes a closer look at the inner workings of OrderedDict and explains what it takes to implement an ordered dictionary in Python.

Note: The standard library ships both C and Python implementations of OrderedDict for different runtimes. The two implementations follow similar designs; this piece focuses on the Python version.

A Doubly Linked List and Another Dictionary

OrderedDict is an ordered dictionary: it behaves like a regular dict, but it preserves key order. Its implementation hinges on two ideas:

  1. Inherit from dict: the object itself stores key-value pairs, so self is just a {} with every dictionary operation ready to use.
  2. Add external data structures: maintain an ordered structure that tracks key order without slowing down dictionary operations.

Which structure should we pick to maintain order? A dictionary is a hash-table-based data structure that stores and retrieves entries in O(1) time on average. Whatever structure tracks key order must therefore support its own updates in constant time as well, so that maintaining order does not slow dictionary operations down.

To meet that requirement, OrderedDict combines two structures: a doubly linked list and another dictionary.

  1. Doubly linked list: this ordered structure supports inserting and deleting nodes in O(1) time. Each node stores a key from the OrderedDict.
  2. Another dictionary: searching for a node in a linked list requires scanning nodes one by one, which averages O(n) time. To avoid that cost, OrderedDict keeps a second dictionary that maps keys to their linked-list nodes, enabling O(1) lookups.
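
To make the pairing concrete, here is a minimal standalone sketch of a circular doubly linked list with a key-to-node index. The names `Link`, `KeyOrder`, `append`, and `remove` are hypothetical and chosen for illustration. The sketch also uses ordinary strong references throughout and assumes each key is appended at most once, which is simpler than what the standard library does.

```python
class Link:
    """One node of the doubly linked list; it stores a single key."""
    __slots__ = ("prev", "next", "key")


class KeyOrder:
    """Tracks key order with a circular doubly linked list plus an index."""

    def __init__(self):
        # The sentinel root points at itself while the list is empty.
        self.root = Link()
        self.root.prev = self.root.next = self.root
        self.root.key = None
        self.index = {}  # key -> Link, so a node is found without scanning

    def append(self, key):
        """Link a new node in just before the root, i.e. at the tail."""
        link = Link()
        last = self.root.prev
        link.prev, link.next, link.key = last, self.root, key
        last.next = link
        self.root.prev = link
        self.index[key] = link

    def remove(self, key):
        """Unlink the node for `key` using the index, not a linear search."""
        link = self.index.pop(key)
        link.prev.next = link.next
        link.next.prev = link.prev

    def keys(self):
        """Yield keys by walking the list from head to tail."""
        curr = self.root.next
        while curr is not self.root:
            yield curr.key
            curr = curr.next


order = KeyOrder()
for key in ("a", "b", "c"):
    order.append(key)
order.remove("b")
print(list(order.keys()))  # ['a', 'c']
```

Without `index`, `remove` would have to walk the list to find the node for `"b"`. With it, both operations touch a fixed number of nodes regardless of how many keys are stored.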

The overall structure looks like this:

Figure: The internal structure of OrderedDict contains three parts: self (the dict that stores key-value pairs), self.root... (the doubly linked list), and self.__map (the linked-list index).

Let’s examine the __setitem__ method to see how OrderedDict writes a key-value pair:

    def __setitem__(self, key, value,
                    dict_setitem=dict.__setitem__, proxy=_proxy, Link=_Link):
        'od.__setitem__(i, y) <==> od[i]=y'
        if key not in self:
            self.__map[key] = link = Link()  # 1
            root = self.__root
            last = root.prev
            link.prev, link.next, link.key = last, root, key  # 2
            last.next = link
            root.prev = proxy(link)  # 3
        dict_setitem(self, key, value)  # 4
  1. Create a linked-list node and store it in self.__map, so the key can retrieve the node quickly.
  2. Adjust the new node’s neighbors so it sits before root, effectively adding it to the tail of the list.
  3. Update the surrounding nodes—last (the old tail) and root—to finish the linked-list update.
  4. Mutate the dictionary itself.

If we run d["aa"] = 4, inserting a new entry, the structures change as follows:

Figure: Inserting the key-value pair "aa": 4 updates all three structures inside OrderedDict.

The doubly linked list, its index dictionary, and the dictionary itself all have to handle the new entry "aa": 4.
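
One consequence of the `if key not in self` check is worth seeing in action: assigning to an existing key changes only the stored value, so the key keeps its position. The sketch below uses the real `collections.OrderedDict`; the variable names are illustrative.

```python
from collections import OrderedDict

d = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

d["aa"] = 4   # new key: a node is appended at the tail of the list
d["a"] = 100  # existing key: only the value changes, the position stays

keys_after_update = list(d)
print(keys_after_update)  # ['a', 'b', 'c', 'aa']

# Deleting a key and inserting it again is what moves it to the end.
del d["a"]
d["a"] = 1
keys_after_reinsert = list(d)
print(keys_after_reinsert)  # ['b', 'c', 'aa', 'a']
```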

The __delitem__() method and pop() follow the same pattern: they mutate the dictionary and update the linked list plus its index. The mechanics mirror those in __setitem__, so we will not repeat them here.

To keep iteration ordered, OrderedDict also customizes __iter__:

def __iter__(self):
    'od.__iter__() <==> iter(od)'
    root = self.__root
    curr = root.next
    while curr is not root:
        yield curr.key
        curr = curr.next

Iterating over an OrderedDict is effectively iterating over the linked list. The while loop yields each node’s key in order.
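
Putting the pieces together, the following is a deliberately simplified sketch of the same design, not the standard-library code. `MiniOrderedDict` and `_Link` are hypothetical names, and the attributes use a single underscore instead of name mangling. It supports only item assignment, `del`, and iteration, and it uses strong references where the real implementation uses a weak proxy. The deletion logic is written here as an assumption about how a node is unlinked, following the pattern described above.

```python
class _Link:
    __slots__ = ("prev", "next", "key")


class MiniOrderedDict(dict):
    """Simplified sketch: supports d[k] = v, del d[k], and iteration only."""

    def __init__(self):
        super().__init__()
        self._root = root = _Link()   # sentinel node of the circular list
        root.prev = root.next = root
        root.key = None
        self._map = {}                # key -> _Link

    def __setitem__(self, key, value):
        if key not in self:
            # New key: create a node and link it in just before the root.
            self._map[key] = link = _Link()
            root = self._root
            last = root.prev
            link.prev, link.next, link.key = last, root, key
            last.next = link
            root.prev = link          # strong reference, unlike the real code
        dict.__setitem__(self, key, value)

    def __delitem__(self, key):
        dict.__delitem__(self, key)   # raises KeyError if the key is missing
        link = self._map.pop(key)
        link.prev.next = link.next    # bypass the node in both directions
        link.next.prev = link.prev

    def __iter__(self):
        root = self._root
        curr = root.next
        while curr is not root:
            yield curr.key
            curr = curr.next


d = MiniOrderedDict()
d["a"], d["b"], d["c"] = 1, 2, 3
del d["b"]
d["b"] = 20
print(list(d))  # ['a', 'c', 'b']
```

Methods that this sketch does not override, such as `update()` or `pop()`, would bypass the linked list entirely, which is why the real class customizes many more of them.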

Summary

By layering extra data structures on top of a regular dictionary, OrderedDict preserves key order. The combination of a doubly linked list and a dictionary index keeps insertion and deletion fast, trading a bit of memory for predictable iteration.

Interesting Details

While reading the implementation, a couple of fun details stood out.

1. Using weakref

CPython's memory management relies primarily on reference counting. Reference counting is simple and fast, but it cannot reclaim reference cycles on its own; those are left to a separate cyclic garbage collector that runs only periodically. Consider what happens when we append a node to the tail of our linked list:

  1. Set the new node’s next reference to the root (link.next = root).
  2. Update the root’s prev reference to the new node (root.prev = link).

Those assignments create a reference cycle between link and root. Each keeps the other alive by bumping the reference count, preventing timely reclamation.

To avoid that, OrderedDict uses the weakref module:

link.prev, link.next, link.key = last, root, key  # 1
last.next = link
root.prev = proxy(link)  # 2
  1. link and root establish one direction of the relationship via link.next.
  2. The other direction uses root.prev = proxy(link), where proxy comes from weakref.

Because weak references do not increase reference counts, they prevent the cycle, so nodes are reclaimed as soon as their last strong reference disappears.
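
The effect can be observed directly on CPython. The sketch below is an illustration under stated assumptions, not the library's code: `Node` and `build_pair` are hypothetical names, and the cyclic collector is switched off temporarily so that only reference counting is at work.

```python
import gc
import weakref


class Node:
    # "__weakref__" must be listed so that instances can be weakly referenced.
    __slots__ = ("prev", "next", "__weakref__")


def build_pair(use_proxy):
    """Link two nodes both ways and return a weak observer of `link`."""
    root, link = Node(), Node()
    link.next = root
    root.prev = weakref.proxy(link) if use_proxy else link
    return weakref.ref(link)  # lets us check later whether link still exists


gc.disable()  # rely on reference counting alone for the moment
try:
    strong_observer = build_pair(use_proxy=False)
    weak_observer = build_pair(use_proxy=True)

    # Strong references in both directions form a cycle that survives.
    strong_alive_before_gc = strong_observer() is not None
    # With a proxy in one direction there is no cycle to keep link alive.
    weak_alive_before_gc = weak_observer() is not None

    gc.collect()  # the cyclic collector finally reclaims the strong pair
    strong_alive_after_gc = strong_observer() is not None
finally:
    gc.enable()

print(strong_alive_before_gc, weak_alive_before_gc, strong_alive_after_gc)
# On CPython: True False False
```

The pair built with a proxy is freed the moment `build_pair` returns, while the strongly linked pair lingers until a cyclic collection runs.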

2. Passing object() as a Default Value

Like the built-in dict, OrderedDict supports pop. The method removes a key and returns its value, or returns the provided default if the key does not exist:

>>> d = {"a": 1}
>>> d.pop("a", 42)
1
>>> d.pop("c", 42)
42  # "c" is missing, so return the default value 42.

Inside OrderedDict, pop must both remove the entry from the dictionary and update the doubly linked list. The core logic looks like this:

class OrderedDict(dict):

    __marker = object()

    def pop(self, key, default=__marker):
        marker = self.__marker
        result = dict.pop(self, key, marker)
        if result is not marker:
            # The same as in __delitem__().
            # Linked-list updates omitted ...
            return result
        if default is marker:
            raise KeyError(key)
        return default

Notice that dict.pop(self, key, marker) uses marker as the default. marker is not magic—it is just the object() created when the class is defined.

Why choose object() here? The code needs to distinguish precisely between “key existed” and “key was missing.” A fresh object() is guaranteed not to appear in user data, making it a reliable sentinel.
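
The same sentinel pattern works in ordinary application code. The following is a minimal sketch; `_MISSING` and `pop_setting` are hypothetical names invented for this example.

```python
_MISSING = object()  # hypothetical module-level sentinel, never stored by callers


def pop_setting(settings, name, default=_MISSING):
    """Remove and return settings[name], mirroring dict.pop's default handling."""
    value = settings.pop(name, _MISSING)
    if value is not _MISSING:
        return value          # the key existed, even if its value was None
    if default is _MISSING:
        raise KeyError(name)  # the key was missing and no default was supplied
    return default


settings = {"timeout": None}

# A stored None is returned as a real value, not mistaken for "missing".
print(pop_setting(settings, "timeout", default=30))  # None
# The key is gone now, so the caller's default is used.
print(pop_setting(settings, "timeout", default=30))  # 30
```

Using `None` as the sentinel would make the first call ambiguous, because `None` is a perfectly valid stored value. The identity check with `is` against a private `object()` avoids that ambiguity.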

Update: Changed the title from "Why is Python's OrderedDict ordered?" to "How Does Python’s OrderedDict Maintain Order?"
