为什么不看《黑客帝国4：矩阵重启》(2023)？

为什么不看《黑客帝国4：矩阵重启》(2023)？
Why not Matrix (2023)

原始链接: https://telegra.ph/why-not-matrix-08-07

## 矩阵：对“开放网络”的批判性审视矩阵旨在成为一个去中心化的通信平台，吸引来自 IRC、Discord 和 Slack 的用户，承诺一个永久免费和开放的网络。然而，深入研究显示出显著的潜在问题，引发了对其可靠性和安全性的担忧。虽然被呈现为聊天应用，但矩阵实际上是一个分布式图数据库，其中“房间”本质上是事件日志。这种设计虽然提供了灵活性，但也造成了问题：数据无法被真正删除——只能被“撤销”，且执行不可靠——并且存在历史操纵的可能性。进一步的问题包括容易受到垃圾邮件攻击，可能导致服务器性能瘫痪、消息顺序不一致以及脆弱的端到端加密，容易出现故障和数据泄露。不同服务器实现（如 Synapse 和 Dendrite）之间的技术不一致导致签名验证失败和频繁的“状态重置”——有效地破坏房间并失去版权控制。最后，媒体处理缺乏充分的身份验证，存在未经授权访问的风险、托管非法内容的潜在法律责任以及拒绝服务漏洞。尽管多年来不断有问题的报告，核心开发者似乎反应迟钝，这使得矩阵对于寻求稳定通信平台的新社区或企业来说是一个值得怀疑的选择。

## Hacker News 讨论：为什么不选择 Matrix (2023) 最近 Hacker News 的讨论围绕一篇批评 Matrix 开放网络，用于安全、去中心化通信的博文展开。该文章提出了许多问题，从数据删除的挑战到不同服务器实现中 JSON 处理的不一致性。许多评论者质疑文章的有效性和语气，其中一人指出作者非常规的写作风格（省略大写字母）。Matrix 项目负责人 Arathorn 进行了详细回复，解决了每个问题。他承认了一些有效的问题，例如状态重置和规范 JSON 的歧义，并表示修复正在进行或计划中。 Arathorn 为 Matrix 的设计选择辩护，解释说诸如仅追加图和可选端到端加密等功能是故意的。他还强调了最近的改进，解决了加密脆弱性和媒体身份验证问题。评论中反复出现的主题是，在任何联邦系统中，*保证*数据删除的固有困难，因为服务器最终独立运行。这场讨论引发了关于在去中心化网络中对数据控制的现实期望的争论。

原文

haru

at this point it seems like most of the tech community is familiar with matrix, the "open network for decentralized communication". lots of projects and communities have migrated from a host of other platforms, including irc, discord and slack with the promise that their new spaces will be free forever. i first discovered matrix in 2021 and have dedicated a lot of time trying to understand exactly how it works, as well as trawling through github issues to try and understand whether we should consider matrix safe. i've also evaluated conduit as well as dendrite and tried a number of clients.

first of all, a quick primer on what matrix actually is. even though element market matrix as the foundation of a chat app, it’s far more complicated than that. under the hood, matrix is actually a distributed partially-replicated graph database. a matrix "room" is actually a directed acyclic graph ("dag") made up of events which can contain things like messages, user membership states and bans and other things. servers participating in a room are eventually consistent, that is, they should replicate enough state so that everyone sees roughly the same thing.

in order to “send” something into a room, a server that is participating in a room can also try to append events onto the graph. since every participating server needs to be able to verify what they are receiving from another server actually makes sense, a set of auth rules are defined in the spec to determine when to allow or to drop certain events. events are cryptographically signed by the sending server so that other servers can check those origins too. every server participating in a room needs to perform these checks, in the hope that if all servers agree, that newly created events will be happily accepted by all participating servers.

however, over time, i have collected together quite a list of issues that i consider to be either unsolved or dangerous. without further ado, here is my list of things to consider and reasons why you might not want to use matrix:

the graph is append-only by design and events that get sent into the room can not ever be guaranteed to be deleted, so a matrix room can potentially end up accumulating history endlessly. for some applications this is desirable, but for chat and other applications it removes any possibility of deniability. deleting history is a very hard problem in matrix because dropping whole events can create gaps in the graph which servers either need to complete event auth or may struggle to skip over when backfilling history.
if you do want to delete something, you can send a redaction event which asks other servers very nicely to delete the content of the event, but redactions are advisory and a badly behaving server participating in a room could simply ignore the redaction request, instead holding onto the entire history. it’s basically impossible to know at the point of asking for a deletion whether this has happened or not.
however, servers that choose to ignore redactions, or fail to process them for some other reason, can leak supposedly-deleted data to other servers later on. if a new server joins the room for the first time or wants to backfill history, it can ask any server in the room for help doing so, including the badly-behaved server. nothing requires a server to return the redacted skeleton event, it can opt to return the full unredacted event if it wishes. in fact this might even happen purely by accident.
certain events, like membership changes, bans or pretty much any event that exercises some control over another user can't be deleted ever as they become woven into the "auth chain" of future events, and in order for a server to independently verify each event, they need to be able to prove how the event was allowed to happen, iteratively, back to the beginning of the room. every time a user joins, leaves, changes their name or avatar, gets kicked, banned, kicks or bans someone else or changes critical room settings, traces of these actions become burned into the room history permanently.
as with most places on the internet, spam is inevitable, so another fun way to attack a room is just to join hundreds or thousands of bots to the room, making the room graph very complex and difficult to compute, which in turn means that servers waste lots of cpu time and clients have lots more work to do, especially in encrypted rooms. the only way to discard all of this spam complexity is to recreate the room.
even room history is a best-effort endeavor. while the room graph itself provides some causal ordering, as some events need to follow other authenticating events, it's exceptionally hard to linearize history if you don’t know the entire history of the room partially. tiebreaks used in the server also include fields like depth and the origin timestamp, which can both be forged. so unsurprisingly, different servers can see messages arriving in a different order to each other. most of the time there is nothing you can do about this.
speaking of forging things, it is also somewhat possible to insert messages into history by crafting events in the graph that refer to older ancestor events and, as long as it looks vaguely plausible (like with a sane depth and/or timestamp value), other servers will accept these events without much question and users may not be able to tell the difference if these events eventually get backfilled onto their own server’s copy of the room history.
another thing that is worth noting is that end-to-end encryption in matrix is completely optional. this is pretty much required as public rooms would be extremely heavy and probably completely unusable otherwise, but the only thing trying to ensure that your dms are encrypted are clients and even they don't have to if they don't want to. anything in a federated room that isn’t end-to-end encrypted can end up replicated in plain-text and also end up as part of a semi-permanent history.
the end-to-end encryption is also annoyingly fragile as it depends on device list updates to be delivered between servers reliably. a failure to sync these correctly will result in broken encryption in rooms where others cannot decrypt your messages and this seems to be a constant source of encryption bugs.
sometimes these device list updates updates also leak information about your device, like which matrix client you are using or which operating system/platform you are on. some homeservers seem to have started trying to filter this information recently, but as always, there’s no guarantees.
the entire matrix api surface is http and json. for clients and client developers this makes things nice and simple, but the federation api also uses http and json and also tries to be cryptographically secure. for signing events and requests to work, matrix expects the json to be in canonical form, except the spec doesn’t actually define what the canonical json form is strictly, so there’s every possibility different implementations will end up generating different signatures for the same event.
oh wait, that actually happened. turns out that matrix homeservers written in different languages have json interoperability issues, so you might just get random events or requests that fail signature checks between different types of matrix servers because no one knows exactly how to get it right. synapse, the flagship implementation, simply relies on python’s sort_keys and calls it a day, to hilarious effect, and even other matrix devs working for element haven’t been able to figure out how to match this behavior in other implementations like dendrite.
those signature checks also rely on server signing keys which can be expired and replaced, except the signing key expiry is completely arbitrary so you can simply set an expiry date for your server signing key in the past and watch as new servers are now totally unwilling to authenticate any events from your server, resulting in split-brained rooms. feeding certain key expiry information to certain servers and not others might even make for a novel eclipse attack.
split-brained rooms are actually a common occurrence too, as the distributed nature of matrix requires a kind of consensus algorithm (known as “state resolution”) in order to work out what to do when a server ends up with more than one conflicting set of state. the algorithm is not watertight and relies on some preconditions (like event signatures being accurately checked across servers) so when certain edge-cases occur in this algorithm, which they often do, a “state reset” can occur which can end up kicking users out of rooms or reverting state changes. at this point state resets are an on-going meme with no solution in sight.
worse than that, state resets happen quite a bit more often when servers written in different languages interoperate because of the json interoperability issues, and it’s also possible to force them to happen in some cases by joining a room, making some power actions and then setting your signing key to expire before they happened. some servers will believe the events if they passed the signature check at the time, other servers will drop the events if they learn about the key expiry first, and then chaos ensues as servers try to request state snapshots from each other, eventually breaking the room to the point that those servers may never agree again!
this has happened to amusing effect more than once, as room admins and moderators have lost their powers over public rooms many times due to state resets, leaving them unable to defend themselves and their communities against spam, harassment and other types of attacks. in some cases, the core matrix team have had to try to forcefully shut down rooms on their own instances and recreate them on more than one occasion, hoping people will rejoin. the big bad Matrix HQ room has been the subject of attacks multiple times now and so have other community rooms.
speaking of shutting down rooms, you can’t actually force a room to be shut down across the federation. if there are three servers in a room and one of them wants to shut down the room, there’s no way to stop the other two servers from continuing along just as they were. this is good news if you care about internet freedom, sure. this is bad news if people start to abuse instances with unsavory rooms or content. in certain legislations this could present a serious problem.
speaking of moderation, this is notoriously difficult too, as moderation relies entirely on the functioning of the event auth system and breaks down if state resets happen or if someone abuses their power. this has seemingly resulted in problems with trying to moderate or clean up attacked rooms before. these changes can be extremely hard to revert, especially if there are other servers participating in the room, as the room will eventually try to converge on what the state resolution algorithm thinks the best state is, which isn’t necessarily the state that you want.
pretty much any user on your homeserver can upload media to your server’s media repository at pretty much any time, and media downloads are unauthenticated by default. as a result, you can often just generate a HTTP link to the media. someone might be using your media repository as their personal sharing box at your expense and you may not even realize.
you can also ask someone else’s homeserver to replicate media by asking that homeserver to return a copy of it. based on the content ID and the origin server name in the URL, the homeserver will download the media from the origin and likely cache it locally in the process. that could end up being a fun denial-of-service.
in addition to that, media uploads are unverified by default. if you are running your own matrix instance, there’s a good chance that nothing is scanning that media for unsavory content (like csam or viruses) unless you have also installed a separate content scanner. this is a separate package and is not a core part of the synapse distribution, nor any other homeserver that is available today.
actually the eager replication of media could end up potentially being quite a massive headache, since all it takes is for one of your users to request media from an undesirable room for your homeserver to also serve up copies of it, at which point you could become liable for hosting copies of illegal media like csam or copyrighted material without necessarily knowing it.

i want to note that this list is not completely exhaustive and i may follow up with more items at a later date. however, i have found what i believe to be quite a number of reasons to be skeptical of matrix and i would struggle very much to want to recommend it to a new community or enterprise looking for a communication platform. most of the issues here i've ever witnessed first hand or have been following through a series of github issues on various matrix repositories, issues which the matrix devs seem to have been largely ignoring for literal years now.

为什么不看《黑客帝国4：矩阵重启》(2023)？ Why not Matrix (2023)

为什么不看《黑客帝国4：矩阵重启》(2023)？
Why not Matrix (2023)