LSP: The good, the bad, and the ugly

Original link: https://www.michaelpj.com/blog/2024/09/03/lsp-good-bad-ugly.html


The Language Server Protocol (LSP) for editing tools in software development has proven successful in providing Integrated Development Environment (IDE) functionality to a wide variety of editors at low cost to developers. Criticisms of the LSP mainly concern its design, specifically the balance of presentation versus semantics, backwards compatibility, openness, and the handling of concurrency and causality. LSP's emphasis on presentation, rather than the underlying semantic structure of programs, generally works well, though some semantic elements have seeped in over time. Examples include the purpose of the "type hierarchy" request and certain semantic tags accompanying entries in completion lists, such as marking an item as deprecated. Backwards compatibility within LSP benefits users because it is strictly adhered to, allowing even older or less-maintained language servers and editors to keep working. The "capability" model used to signal what servers and clients support is deemed acceptable. However, criticism arises around cases such as configuration changes, where configuration was initially pushed via the workspace/didChangeConfiguration notification and a later dynamic registration requirement caused difficulties for older servers. The machine-readable specification of LSP's types and methods is appreciated despite its peculiarities, and the ability to generate those types, including their serialization, has greatly streamlined library maintenance. Dynamic registration is considered beneficial, enabling servers to alter their capabilities at runtime and notify clients accordingly when certain features are disabled by configuration changes; although the implementation is cumbersome and complicated, the core concept is sound. Despite not being an openly governed project, LSP remains instrumental to the open-source community, but the lack of openness creates challenges such as limited opportunities for external contributions and little collaboration among implementers. Critical assessment points out that the LSP does not adequately address concurrency, leaving room for improvement in handling multiple tasks running simultaneously, and that it struggles to preserve causality, i.e. the proper ordering and consistency of actions in response to user commands. In terms of state management, the LSP employs numerous mechanisms designed for synchronization purposes, covering features like push and pull updates, incremental changes, filtering, and invalidation.


For a few years now I have been working on the Haskell Language Server (HLS), and the lsp library for the LSP protocol and writing LSP servers. Unsurprisingly, I have developed some opinions about the design of the LSP!

Recently I gave a talk about HLS and LSP at the Haskell Ecosystem Workshop at Zurihac 2024. One slide featured a hastily-written table of “LSP: the good, the bad, and the ugly”. As I gave the talk I realised that there was plenty to say on that topic, hence this post.

Most of what I have to say is about the architecture or design of the protocol. I won’t have much to say about the features that the protocol supports. Other people probably have a lot to say about that (e.g the folks working on languages that use heavy editor integration, like interactive theorem provers). My perspective here is from my time implementing LSP servers, rather than my time using them.

I will repeat this a few times, but I want to be very clear that LSP is great and I am very happy that it exists. While this is going to be a mostly critical post, it is criticism that exists in the context of me being happy to be working on editor tooling that is going to Just Work for a wide spectrum of users!

Finally, I want to also mention the excellent post LSP could have been better, which is the best critical writing that I’ve read on LSP, and which inspired several of the points I’m going to make.

It addresses the problem!

The most important things about the LSP are:

  1. It exists
  2. It is omnipresent
  3. It has a decent feature set
  4. It works well enough

That is, it actually succeeds in significantly addressing the problem of providing IDE tooling to a wide variety of editors at much lower cost to tooling developers. This is huge, and not to be under-appreciated! It is now awful to remember the situation even a few years ago, where most open-source editors had poor and inconsistent support for most programming languages. Now someone can write a new editor and, with a bit of work on a LSP client, come out with best-in-class programming language support. Amazing!

Focus on presentation over semantics

As Alex says, it’s a great choice for the LSP to focus on presentation, i.e. the things that actually appear in the editor, rather than the semantic structure of the program (which is wildly different from language to language).

The presentation-first approach works pretty well, although some semantic elements have crept in over time. For example:

  • There is a “type hierarchy” request. Is this just a widget that represents a tree of arbitrary stuff that you can consider “types”? Or is there some implication that the relationship that the tree shows should be subtyping, making it a bit more specific to languages with inheritance? Unclear.
  • There are various tags that indicate the nature of entries in e.g. completion lists, and these are usually semantic rather than presentational. For example, a completion item is tagged as deprecated, rather than being tagged as non-emphasized or similar.
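As a concrete illustration of the second point, here is a minimal sketch of a completion item as a server might send it (the label is made up, the numeric values are the spec's enum values): the only way to ask the editor to de-emphasize the entry is the semantic tag "deprecated", not a presentational instruction.

```typescript
// Sketch of a CompletionItem as sent by a server (illustrative label).
const item = {
  label: "oldHelper",
  kind: 3,    // CompletionItemKind.Function
  tags: [1],  // CompletionItemTag.Deprecated: a semantic fact, not "render dimly"
};
```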

It’s awkward, since obviously the appropriate editor widgets for an IDE protocol will make some references to programming language constructs! But I think it’s a good direction; I wish they’d committed to it even more.

Backwards compatibility

Perhaps because it is a Microsoft project, the LSP has always hewed to pretty strict backwards compatibility. This is great for users! It means that even older or less-maintained language servers or editors continue to pretty much just work, which is a real blessing. I even think the “capability” model that they chose to indicate what servers and clients do support is fine.

Occasionally backwards compatibility is not handled well. Take for example the messy situation with configuration:

  • Initially, configuration was pushed from the client to the server using the workspace/didChangeConfiguration notification
  • Then, they added the ability for the server to pull configuration using workspace/configuration
  • In order to keep receiving change notifications, you now have to dynamically register for workspace/didChangeConfiguration (see the sketch after this list)
  • This broke old servers, which were not dynamically registering because they didn’t have to before
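Concretely, a modern server that wants to keep receiving configuration change notifications has to send the client something like the following request (registration id and JSON-RPC id are illustrative); an older server that never sends it silently stops hearing about configuration changes on clients that follow the newer behaviour.

```typescript
// Server -> client request to dynamically register for configuration changes.
const registerForConfigChanges = {
  jsonrpc: "2.0",
  id: 1,
  method: "client/registerCapability",
  params: {
    registrations: [
      { id: "config-watch", method: "workspace/didChangeConfiguration" },
    ],
  },
};
```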

However, I think it’s pretty remarkable that this is the only real backwards compatibility break I know of!

Machine-readable specification of types

Thank all that is holy that there is a machine-readable specification of the LSP types and methods.

Is it a bit weird? Yes! Is it written in their own home-rolled format? Yes! Do I care? No!

The LSP is massive. The Haskell implementation of the protocol, which I maintain, used to have all of those types and their serializations defined by hand. This was awful, tedious, and error-prone (especially given the weirdness of the types). It took me quite a long time, but this is now all generated, which has removed 90% of the toil from maintaining that library, and nearly eliminated bugs relating to the JSON serialization of types.

Dynamic registration

I’m just going to briefly disagree with Alex here. Dynamic registration is good, actually. The reason is that the LSP supports changing configuration at runtime, and that means that the server’s capabilities can change at runtime. If the user un-checks “semantic tokens” in their configuration, then the server really wants to say to the client “I can’t do semantic tokens any more!”. Otherwise the client will keep asking, and the server has to either return empty data or errors, neither of which is quite right.

It’s implemented messily and is a pain to work with, but I think there’s a fundamentally good idea there.
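For example, if the user turns semantic tokens off, the server would like to send something like the message below (ids illustrative) so the client stops asking. Note the misspelled unregisterations field, which is what the specification actually mandates; that is part of the messiness.

```typescript
// Server -> client request to withdraw a previously registered capability.
const unregisterSemanticTokens = {
  jsonrpc: "2.0",
  id: 2,
  method: "client/unregisterCapability",
  params: {
    // "unregisterations": the spec keeps the typo for backwards compatibility.
    unregisterations: [
      { id: "semantic-tokens", method: "textDocument/semanticTokens" },
    ],
  },
};
```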

(I know, it’s supposed to be “The bad” next, but I wanted to talk about the really interesting stuff first!)

Not a truly open project

Given how crucial LSP has become to the open-source community, you would hope that the project itself was an open one. Sadly this is not at all the case.

The LSP specification has, as far as I can tell, one committer, Dirk Bäumer, who works for Microsoft (I assume on the VSCode team). There have been many small contributions by outsiders, but nobody else has commit access.

Major changes to the spec are driven by internal forces inside Microsoft. For example, the latest version of the spec adds a bunch of new content for supporting notebooks. That doesn’t look to me like something the community was particularly asking for, but I guess some PM inside Microsoft decided they wanted VSCode to support notebooks, so now it’s in the spec.

There is zero open discussion of features before they are added to the spec. Typically they are implemented in VSCode, and then the specification is updated as a fait accompli to document those changes. Implementers of open-source language servers get very little influence on the development of the specification. There is not even a community space for implementers of language servers to get together and talk about the many tricky corners.

Another consequence of the lack of openness is that there is no forum for agreeing on extensions to the somewhat arbitrary enumerations that the LSP specification has for things like symbol types. In theory the client and the server can agree on what types they support, and then use those. But the way it usually works with other standards is that there is a well-known set of identifiers that is agreed upon outside the main specification process. What happens in the LSP world is that we have no way of agreeing at all, so in practice the set of symbol types that gets used is exactly the one that is in the spec.

This is not really good enough for such an important project, in my opinion. The LSP should be an open standard, like HTTP, with an open committee that represents the large community which is invested in LSP, and can offer their insight in how to evolve it.

Non-acknowledgment of concurrency

Here’s what the specification has to say about concurrency:

Responses to requests should be sent in roughly the same order as the requests appear on the server or client side. …

However, the server may decide to use a parallel execution strategy and may wish to return responses in a different order than the requests were received. The server may do so as long as this reordering doesn’t affect the correctness of the responses. …

This pretty much amounts to “yeah, you’ll want to use concurrency, but if something weird happens that’s your problem”. That’s a pretty disappointing attitude. Working out a way to make everything make sense at the protocol level in the face of concurrency is hard, but it’s really necessary.

In particular, it’s somewhat disingenuous to suggest that concurrent server processing is an unusual approach when the specification itself simply cannot work without it. For example:

  • Requests cannot be cancelled unless the server can handle the cancellation request concurrently with processing the original request (see the sketch after this list).
  • Progress tracking cannot work unless the server can send notifications (and in the case of window/workDoneProgress, send and handle responses to requests!) concurrently with processing a request.
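To make the cancellation point concrete: cancellation is just another message on the same channel, so a server that processes requests strictly one at a time will not even read it until the request it was meant to cancel has already finished. (Request id illustrative.)

```typescript
// Client -> server notification asking to cancel an in-flight request.
// A strictly sequential server only sees this after request 42 is already done.
const cancel = {
  jsonrpc: "2.0",
  method: "$/cancelRequest",
  params: { id: 42 },
};
```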

Missing causality

As Alex points out, the LSP has a problem accounting for causality. In particular, both failure and asynchronous processing lead to situations where we may not be sure of the ordering of events.

Consider:

  1. The client sends the server a document change notification for document D.
  2. The server updates its internal state (e.g. compilation results) to account for the change to D.
  3. The client requests code actions for D, and the server responds.

The question is: when does 2 happen in relation to 3?

  • If the server fails to apply the change entirely, then 2 may not ever happen.
  • If the server processes the change asynchronously but responds to the code action request before it finishes, then 2 may happen after 3.

So the client really has no idea whether the results it is getting are up-to-date. This matters most for applying text edits, which we will discuss shortly, but it’s a general problem. Contra Alex, I don’t think it’s enough to just rely on the causality that you get from message sequencing. If we expect the server to process requests asynchronously, then we are inevitably going to lose this ordering, and we need something stronger.
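You can see the gap directly in the message shapes (URI and ids illustrative): the change notification carries a document version, but the code action request, and its response, carry no version at all, so nothing ties the answer to a particular document state.

```typescript
// Step 1: the client pushes a change, tagged with version 5 of the document.
const didChange = {
  jsonrpc: "2.0",
  method: "textDocument/didChange",
  params: {
    textDocument: { uri: "file:///Example.hs", version: 5 },
    contentChanges: [{ text: "main = pure ()" }],
  },
};

// Step 3: the client asks for code actions. Neither this request nor the
// response mentions a version, so the client cannot tell whether the answer
// was computed against version 5 or an older document.
const codeActionRequest = {
  jsonrpc: "2.0",
  id: 7,
  method: "textDocument/codeAction",
  params: {
    textDocument: { uri: "file:///Example.hs" },
    range: { start: { line: 0, character: 0 }, end: { line: 0, character: 0 } },
    context: { diagnostics: [] },
  },
};
```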

State synchronization

A lot of the core operations of the LSP are state synchronization processes. That is, one of the parties (server or client) has some state, and they want to keep the other party updated about what the state is. Usually this is uni-directional – meaning that one side is the source of truth, and the other side just needs to be updated – but sometimes it is (or could be) bi-directional, meaning that both sides can change the state.

Here’s a big table of the features in the LSP that I think are secretly just state synchronization:

| Feature | Direction | Dependencies | Push/pull | Incremental | Filtering | Invalidation |
| --- | --- | --- | --- | --- | --- | --- |
| Configuration | C to S | (none) | Push (old), pull (new) | ✗ | ✓ | ✗ |
| Text documents | C to S | Config | Push | ✓ | ✗ | ✗ |
| Diagnostics | S to C | Config, documents | Push (old), pull (new) | ✗ | ✓ (pull) | ✗ |
| Symbols | S to C | Config, documents | Pull | ✗ | ✓ | ✗ |
| Semantic tokens | S to C | Config, documents | Pull | ✓ | ✓ | ✓ |
| Progress | S to C | (none) | Push | ✗ | ✗ | ✗ |
| Code actions | S to C | Config, documents | Pull | ✗ | ✓ | ✗ |
| Code lenses | S to C | Config, documents | Pull | ✗ | ✓ | ✓ |
| Inlay hints | S to C | Config, documents | Pull | ✗ | ✓ | ✓ |
| Inline values | S to C | Config, documents | Pull | ✗ | ✓ | ✓ |
| Document link | S to C | Config, documents | Pull | ✗ | ✓ | ✗ |
| Document highlight | S to C | Config, documents | Pull | ✗ | ✓ | ✗ |
| Document colour | S to C | Config, documents | Pull | ✗ | ✓ | ✗ |

Let’s talk a little about each of these columns.

Direction means “which direction do updates go?”. If the server is the source of truth, then updates flow from server to client.

The LSP very much has state on both sides of the protocol. Fortunately, it is almost always synchronized in one direction only. The one exception to this is text document contents, because the server has the ability to change the state of text documents through workspace/applyEdit! This is quite interesting, and causes causality problems: the server needs to track document versions when it sends applyEdit messages so that the client knows whether they apply to its version of the state. Perhaps this ad-hoc version tracking is enough, and we can just tag it on to a primarily uni-directional synchronization. But possibly this indicates that we should be looking at text document synchronization as a truly bi-directional synchronization problem.
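This is the one place where the protocol does carry an explicit version for causality purposes. A sketch of such an edit (URI, version, and ids illustrative): the version pins the edit to a specific document state, and a client that has since moved on can refuse to apply it.

```typescript
// Server -> client request: apply an edit that is only valid against version 5.
const applyEdit = {
  jsonrpc: "2.0",
  id: 3,
  method: "workspace/applyEdit",
  params: {
    edit: {
      documentChanges: [
        {
          textDocument: { uri: "file:///Example.hs", version: 5 },
          edits: [
            {
              range: { start: { line: 0, character: 0 }, end: { line: 0, character: 4 } },
              newText: "main",
            },
          ],
        },
      ],
    },
  },
};
```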

Dependencies lists other pieces of state which this state depends on. The state of the diagnostics managed by the server depends on the state of the text documents managed by the client. A change to the text document state may invalidate the diagnostic state, and to interpret the diagnostic state you need to know what text document state it is based on.

Dependencies complicate the causality story significantly. I don’t know how you handle this gracefully, but I’m pretty sure that you need to.

Push/pull indicates whether updates are pushed from the producer to the consumer, or pulled from the consumer to the producer.

Both methods have advantages and disadvantages:

  • Push
    • Producer can ensure that the consumer gets up-to-date information promptly
    • Producer can send updates as soon as they have computed them
    • Consumer may receive updates frequently or while it is doing something else
  • Pull
    • Consumer can avoid dealing with updates when they don’t care about them
    • Consumer must take responsibility for ensuring they are up to date
    • Producer may need to compute updates for the client at any time

Over time the LSP spec has moved towards having the client be in control (i.e. push client state to the server, pull server state from the server). But in general it makes sense to use either method for any given kind of state.
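Diagnostics show both styles side by side, since the protocol grew a pull variant in version 3.17 alongside the original push notification. A rough sketch of the two (URI and id illustrative):

```typescript
// Push: the server decides when the client hears about diagnostics.
const pushDiagnostics = {
  jsonrpc: "2.0",
  method: "textDocument/publishDiagnostics",
  params: { uri: "file:///Example.hs", diagnostics: [] },
};

// Pull: the client asks for diagnostics when it wants them (LSP 3.17).
const pullDiagnostics = {
  jsonrpc: "2.0",
  id: 9,
  method: "textDocument/diagnostic",
  params: { textDocument: { uri: "file:///Example.hs" } },
};
```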

Incremental indicates whether there is support for sending updates that only include what has changed since the previous update. This is obviously useful when updates are large. Unsurprisingly, the two features that support incremental updates are the ones that involve transferring lots of data: text document contents and semantic tokens.

However, incremental updates are in principle useful for almost any kind of state, if the state gets big enough.
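For text documents, incrementality looks roughly like the following sketch (URI, version, and positions illustrative): instead of re-sending the whole file, the client sends only the changed range and its replacement text.

```typescript
// Incremental text document sync: one small change instead of the full content.
const incrementalChange = {
  jsonrpc: "2.0",
  method: "textDocument/didChange",
  params: {
    textDocument: { uri: "file:///Example.hs", version: 6 },
    contentChanges: [
      {
        range: { start: { line: 2, character: 0 }, end: { line: 2, character: 3 } },
        text: "bar",
      },
    ],
  },
};
```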

Filtering indicates whether or not the synchronized state can be filtered to a subset. Often this is done using a document and range selector to only get the state in that visible region.

Filtering is a natural way to reduce the amount of data being sent. If you don’t need the diagnostics for the whole project, then you don’t have to send (and process) the diagnostics for the whole project. Filtering works naturally with a pull-based model (since you can specify the filter when you pull), but can also work perfectly well in a push-based model: the consumer just needs to keep the producer updated about what subset of the state it is interested in.

Invalidation indicates whether the producer has the means to tell the consumer to invalidate any cached state it has and re-request it. Invalidation is mostly necessary in a pull-based model, since in a push-based model the producer can usually just promptly tell the consumer what has changed. In a pull-based model, the producer needs to be able to push a notification that tells the consumer that they can’t keep using the state they currently have and must re-sync.
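Semantic tokens are the one feature that has incrementality, filtering, and invalidation all at once, so they make a handy illustration of the last two (URI, range, and ids illustrative):

```typescript
// Filtering: only ask for the tokens in the currently visible range.
const visibleTokens = {
  jsonrpc: "2.0",
  id: 11,
  method: "textDocument/semanticTokens/range",
  params: {
    textDocument: { uri: "file:///Example.hs" },
    range: { start: { line: 0, character: 0 }, end: { line: 40, character: 0 } },
  },
};

// Invalidation: the server tells the client that its cached tokens are stale
// and must be re-requested.
const refreshTokens = {
  jsonrpc: "2.0",
  id: 12,
  method: "workspace/semanticTokens/refresh",
  params: null,
};
```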

Whither state synchronization?

Okay, that was a lot of dimensions to consider! There are a whole bunch of problems here:

  1. The implementations of state synchronization are inconsistent between different features.
    • Pretty much every single entry in this table is implemented completely differently.
    • Compare how delta updates are encoded in WorkspaceEdit versus SemanticTokensDelta, and how they are used! (Both are sketched after this list.)
  2. The feature sets are inconsistent.
    • Incrementality is only implemented for text document contents and semantic tokens; if you want it for a different kind of state, you’re out of luck.
  3. Many methods are required.
    • In the JSON-RPC world we need a bunch of requests for each feature in order to handle the different things we want to do.
    • Semantic tokens needs 4!
  4. Dependency tracking is ad-hoc or unimplemented.
    • With a few exceptions (text document versions), information about state dependencies is lost.
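To illustrate the point about delta encodings: a WorkspaceEdit expresses changes as text edits addressed by positions in the document, while a SemanticTokensDelta expresses them as splices into the previous flat integer array. The values below are illustrative.

```typescript
// Delta encoding 1: a WorkspaceEdit, addressed by document positions.
const workspaceEdit = {
  changes: {
    "file:///Example.hs": [
      {
        range: { start: { line: 1, character: 0 }, end: { line: 1, character: 3 } },
        newText: "bar",
      },
    ],
  },
};

// Delta encoding 2: a SemanticTokensDelta, addressed by indices into the
// previous result's flat array of integers.
const semanticTokensDelta = {
  resultId: "2",
  edits: [{ start: 25, deleteCount: 5, data: [1, 0, 3, 2, 0] }],
};
```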

The other lesson is that the problem is quite complex. There are many things we might want to do, and it’s not easy to fit them all together. As usual, I don’t fault the LSP designers here: the complexity clearly emerged over time, and it’s not that surprising that they didn’t manage to design ahead of it. But with the benefit of hindsight, I think we could do better.

Specifically, I think we could have a generic state synchronization protocol as part of the LSP that would allow synchronizing many different kinds of state, and support all of the operations listed above. Then server and client implementers could implement it once, and use it for everything. While I’m not an expert and I wouldn’t want to have to draft such a thing myself, state synchronization is a well-studied problem in the academic literature, so we should be able to benefit from a lot of prior art.
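To be clear about what I mean, here is a purely hypothetical sketch, not anything in the LSP today, of what the parameters of a generic synchronization mechanism might look like, with the kind of state, filtering, incrementality, and dependency tracking handled uniformly:

```typescript
// Hypothetical: NOT part of the LSP. One generic pull request and one generic
// push notification that every kind of state could share.
interface Position { line: number; character: number }
interface Range { start: Position; end: Position }

interface SyncPullParams {
  state: string;                              // e.g. "diagnostics", "semanticTokens"
  filter?: { uri?: string; range?: Range };   // optional subsetting
  previousResultId?: string;                  // ask for a delta against a known result
}

interface SyncPushParams {
  state: string;
  resultId: string;
  dependsOn?: { state: string; resultId: string }[];  // explicit dependency/causality info
  delta?: unknown;                            // incremental payload, or...
  full?: unknown;                             // ...a full snapshot
}
```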

This is just stuff that’s kind of annoying but not a huge fundamental problem. There’s a lot of it, though.

Massive specification

The LSP specification is big. Really big. Last time I checked it had 90 (!) methods and 407 (!!) types. Printing it to a PDF gives you 285 pages (!!!).

This just makes it hard to understand and implement. Now I’m not necessarily saying that there should be fewer features in the spec, but I do believe that what is there could be significantly simplified (see for example the discussion of state synchronization). But it seems unlikely that we are going to get simplification, and instead we will just get an ever-increasing long tail of features.

Backwards compatibility

Didn’t I just list this under the good features? I did, but it’s a double-edged sword. Being backwards compatible means keeping old features and behaviours in the spec. This imposes a cost on implementers because they need to understand and support all variants of behaviour, or risk old language servers not working.

There is no clean solution to this. I think the best approach is to continue trying hard to keep backwards compatibility, and then occasionally do a large break to a new “major version” that is very noticeably different. Of course, this also has costs.

Weird types

Here is the definition of the InitializeParams.workspaceFolders field:

workspaceFolders?: WorkspaceFolder[] | null;

There are no fewer than three empty states here:

  1. The field is absent
  2. The field is present, and the value is the empty list
  3. The field is present, and the value is null

What is the difference between these? Why do we have all of them? How should servers interpret them?

Certainly the spec needs to tolerate missing fields in many cases for backwards compatibility reasons: a server that does not support a feature will not send messages with empty lists, it will send messages with missing fields. But this could be handled uniformly and strictly: such fields should be missing iff the server/client states that it does not support that feature.

A lot of this is just the TypeScript origin of the LSP leaking out, with it being common to allow null in lots of places it doesn’t need to be. At the very least, the specification should say what the different cases mean, or whether it’s okay to treat them equivalently.

This combines badly with the relics left by backwards compatibility. It can be hard to tell if a type is just strange, or whether it is the union of an old form and a new form, which both need to be supported (and are maybe equivalent or maybe not).

Specification is imprecise and inconsistent

The LSP specification is just not very tightly written. It leaves a lot unspecified, which is a real problem.

Importantly, while I earlier praised the LSP for focussing on presentation… the specification usually does not actually specify the presentation.

Consider “code lenses”. The specification for textDocument/codeLens says:

A code lens represents a command that should be shown along with source text, like the number of references, a way to run tests, etc.

What does that mean? “A command that should be shown along with source text”, shown where?
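For reference, everything the server gets to say about a code lens is roughly the following (range and command are illustrative, and the command identifier is made up); nothing in it indicates where or how the lens should be rendered.

```typescript
// A possible item in a textDocument/codeLens response. It says what to show
// and what to run when triggered, but not where it should appear.
const lens = {
  range: { start: { line: 10, character: 0 }, end: { line: 10, character: 12 } },
  command: {
    title: "3 references",
    command: "myServer.showReferences", // illustrative command identifier
    arguments: [],
  },
};
```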

In the absence of clear direction about how a presentation feature should be implemented, most people turn to the de facto reference implementation: VSCode. VSCode implements code lenses by rendering them inline in the buffer, triggerable by clicking.

However, since the specification doesn’t actually say where the code lens is supposed to be displayed, implementations can differ. Emacs’ lsp-mode plugin has an option to display code lenses at the end of lines. This results in odd behaviour for servers that erroneously assumed that VSCode’s implementation was normative.

While that example is arguably not the fault of the specification (it wasn’t offering normative guidance, but I think it could have been clear about that!), the spec is riddled with details that clearly are intended to affect presentation, but it is unclear how.

For example:

  • CompletionItems can have detail, documentation, labelDetails.detail, and labelDetails.description. I challenge you to work out what the effect of setting these various fields is intended to be without trying it out in VSCode.
  • InlayHints can have a paddingLeft boolean field, but it is not specified how much padding to insert, or what the goal of the padding is.

Also, to be a bit petty, there are quite a few small but annoying errors of the sort that I feel would really have been caught if there was more than one person looking at the changes. This one tripped me up recently: server capability fields are usually suffixed with “provider”. But there is exactly one client capability field that is suffixed with “provider”, probably just by mistake: colorProvider.

Configuration model

The configuration model is a particularly big mess. For some reason they have been reluctant to specify what the configuration methods are actually supposed to do, which has led to a lot of confusion. The model is actually very simple (basically just JSON blobs that you can fetch by path prefixes); they just really need to write it down.
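For what it’s worth, the pull side of that model looks roughly like this (the section name is illustrative); the client answers with an array of JSON values, one per requested item.

```typescript
// Server -> client: fetch configuration values by dotted section prefix.
const pullConfiguration = {
  jsonrpc: "2.0",
  id: 4,
  method: "workspace/configuration",
  params: {
    items: [{ section: "myLanguage.formatting" }], // illustrative section name
  },
};
```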

Text encoding

Much ink has been spilled over this already. I don’t have much to add: UTF-16 was a bad choice driven by Windows; it should just have been Unicode code points from the start.
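A two-character example of the difference, for anyone who hasn’t hit it yet: a character outside the Basic Multilingual Plane is one code point but two UTF-16 code units, so the two schemes disagree about every position after it.

```typescript
// "𝜆x": the lambda is one code point but two UTF-16 code units.
const line = "𝜆x";
console.log([...line].length); // 2  (code points)
console.log(line.length);      // 3  (UTF-16 code units)
// Under the LSP's default UTF-16 positions, "x" is at character 2;
// counting code points it would be at character 1.
```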

Impoverished interaction model

If you want to go outside what the LSP has built in, then you pretty much have to do it by offering code actions. But the interaction model for code actions is very basic: the user triggers them, and then they do something. In particular, you can’t really do the kind of multi-step operations that we’re used to from fancy IDEs in the past, or even something as basic as telling the user what you’re going to do and asking them to confirm before doing it.

Even the built-in refactorings have pretty simplistic interaction models, as Alex points out.

JSON-RPC

JSON-RPC is… okay. It’s not the best transport layer, but it’s pretty simple to implement correctly and once you’ve done that once you’re done with it.

The main problem with JSON-RPC is that it enables other problems:

  1. The presence of unacknowledged notifications encourages loss of causality
  2. The fact that some fields can be omitted is just annoying and not used in practice.

It wouldn’t be my choice but I don’t hate it that much.

Realistically, most of the complaints I have are problems for developers of language servers and clients, which is a comparatively small population compared to the number of people who use those tools. So I don’t think it’s really a good idea to do a big re-engineering of the protocol just to make it easier for implementers… and even if we did, a big new protocol version would make things harder for implementers in the short term! Hence I don’t think there’s a good case for a big LSP 2.0, unless it came bundled with some significant improvements for users.

What I would like is for the LSP to transition to a truly open model. I have no idea how that would come about and I don’t have the zeal to pursue it, but if it’s something you’re interested in, maybe drop me a line.

See discussion on Hacker News
