By Daroc Alden
October 31, 2025
The idea of automatic syntax-aware merging in version-control systems goes back to 2005 or earlier, but initial implementations were often language-specific and slow. Mergiraf is a merge-conflict resolver that uses a generic algorithm plus a small amount of language-specific knowledge to solve conflicts that Git's default strategy cannot. The project's contributors have been working on the tool for just under a year, but it already supports 33 languages, including C, Python, Rust, and even SystemVerilog.
Mergiraf was started by Antonin Delpeuch, but several other contributors have stepped up to help, of whom Ada Alakbarova is the most prolific. The project is written in Rust and licensed under version 3 of the GPL.
The default Git merge algorithm ("ort") is primarily line-based. It does include some tree-based logic for merging directories, but changes within a single file are merged on a line-by-line basis. That can lead to situations where two logically separate changes that affect the same line cause a merge conflict.
Consider the following base version:
void callback(int status);
And then suppose that one person makes the function fallible:
int callback(int status);
While someone else changes the argument type:
void callback(long status);
The default merge algorithm can't handle that, because there are conflicting changes to the same line. Syntax-aware merging, however, is based on the syntactical elements of the language, not individual lines. So, for example, Mergiraf can resolve the above conflict like this:
int callback(long status);
From its point of view, the changes don't actually overlap, because the return type and the argument type are treated as separate, non-overlapping regions. This kind of syntax-aware merging has been bandied about for many years, but the complexity of writing a merge algorithm for syntax trees kept it from really being practical for widespread use. Spork, an implementation of the idea for Java, was released in 2023, showing that it was actually feasible. Mergiraf attempts to extend that Java-specific algorithm to programming (and configuration or markup) languages in general.
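To make the difference concrete, here is a minimal sketch of a three-way merge that can run at either granularity. It is not Mergiraf's code: the function names are invented for illustration, whitespace-separated words are a crude stand-in for real syntax-tree nodes, and all three revisions are assumed to contain the same number of units.

```rust
/// Three-way merge over aligned units (whole lines, or finer-grained pieces).
/// Sketch-only assumption: all three revisions contain the same number of units.
fn merge_units(base: &[&str], left: &[&str], right: &[&str]) -> Option<Vec<String>> {
    if base.len() != left.len() || base.len() != right.len() {
        return None; // aligning revisions of different lengths is out of scope here
    }
    let mut merged = Vec::with_capacity(base.len());
    for i in 0..base.len() {
        let (b, l, r) = (base[i], left[i], right[i]);
        if l == r || r == b {
            // Both sides agree, or only the left side changed this unit.
            merged.push(l.to_string());
        } else if l == b {
            // Only the right side changed this unit.
            merged.push(r.to_string());
        } else {
            // Both sides changed the same unit in different ways.
            return None;
        }
    }
    Some(merged)
}

/// Hypothetical stand-in for a parser: split a line into whitespace-separated words.
fn words(line: &str) -> Vec<&str> {
    line.split_whitespace().collect()
}
```

Treating each declaration above as a single unit, merge_units() reports a conflict, since both sides changed the only line. Splitting the same declarations with words() first lets the return-type change and the argument-type change land in different units, and joining the merged words with spaces yields "int callback(long status);".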
The design
Mergiraf relies on the tree-sitter incremental parsing library to convert individual languages into generic syntax trees where each leaf corresponds to a specific token in the file, and each internal node represents a language construct. However, Mergiraf itself needs relatively little information about each language to work. Instead, it uses a non-language-specific tree-matching algorithm to guide conflict resolution, plus a small amount of language knowledge layered on top. This design is part of the reason that the tool has been adapted to so many different languages.
The Mergiraf algorithm starts by doing a regular line-based merge; if that succeeds, as it often does, then the program doesn't need to resort to the more expensive tree-based merging algorithm. Even if a line-based merge fails, however, it often fails only in a few locations. When parsing the different versions of the file being merged, Mergiraf can mark any parts of the syntax tree that were resolved without conflicts by the line-based merge as not needing changes, allowing it to focus only on the conflicting parts. This provides a substantial speedup, especially for large files.
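The two-stage structure can be sketched roughly as follows. This is an illustration under heavy assumptions rather than Mergiraf's implementation: the Region type and both function names are hypothetical, revisions are assumed to have the same number of lines, and the expensive syntax-aware step is represented by a caller-supplied closure.

```rust
/// One aligned line after the cheap line-based pass (hypothetical type).
#[derive(Debug, PartialEq)]
enum Region {
    /// The line-based pass settled this line; the expensive pass can skip it.
    Clean(String),
    /// Both sides changed the line differently; it needs the expensive pass.
    Conflict { base: String, left: String, right: String },
}

/// Cheap pass: three-way merge, line by line.
/// Sketch-only assumption: all revisions have the same number of lines.
fn line_pass(base: &[&str], left: &[&str], right: &[&str]) -> Vec<Region> {
    base.iter()
        .zip(left)
        .zip(right)
        .map(|((b, l), r)| {
            if l == r || r == b {
                Region::Clean(l.to_string())
            } else if l == b {
                Region::Clean(r.to_string())
            } else {
                Region::Conflict {
                    base: b.to_string(),
                    left: l.to_string(),
                    right: r.to_string(),
                }
            }
        })
        .collect()
}

/// Runs the expensive resolver only on regions the cheap pass left open.
/// Returns the merged lines, or None if any conflict remains unresolved.
fn merge(
    base: &[&str],
    left: &[&str],
    right: &[&str],
    resolve: impl Fn(&str, &str, &str) -> Option<String>,
) -> Option<Vec<String>> {
    line_pass(base, left, right)
        .into_iter()
        .map(|region| match region {
            Region::Clean(line) => Some(line),
            Region::Conflict { base: b, left: l, right: r } => {
                resolve(b.as_str(), l.as_str(), r.as_str())
            }
        })
        .collect()
}
```

The point of the split is that the resolver closure is never called for lines that the first pass already settled, which mirrors the idea of marking cleanly merged parts of the tree as not needing changes.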
For the remaining parts, the tool uses the GumTree algorithm to find fuzzy matches between the remaining subtrees. Identifying the matches is enough to produce a diff, but it doesn't provide enough information on its own to resolve any conflicts. Next, Mergiraf flattens the syntax tree into a list of facts about how the nodes in the tree are related to each other. These facts are tagged with whether they came from the base, left, or right revision of the merge (i.e., the most recent common ancestor, the commit being merged into, and the commit being merged). Then a new syntax tree is reconstructed from the merged list of facts. If a fact from the base revision conflicts with another fact, it is discarded. If two facts from the left and right revisions disagree, that indicates an actual conflict that Mergiraf cannot resolve.
The advantage of this approach is that it eliminates the kind of move/edit conflicts that plague the ort algorithm: if one revision edits the internal parts of some part of the program, and the other revision relocates that part of the program, those facts don't contradict one another. On the other hand, if both revisions edit the exact same part of the program, that does represent a real conflict that a human should really look at.
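A much-simplified sketch of merging facts might look like this. It assumes, purely for illustration, that each fact is a key/value pair such as "f.parent" mapping to the node that contains f; Mergiraf's real fact representation is richer, and the names here are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Facts about one revision, as key -> value (a simplification for this sketch).
type Facts<'a> = BTreeMap<&'a str, &'a str>;

/// Merges three sets of facts. Returns the merged facts, or the keys on which
/// the left and right revisions genuinely disagree.
fn merge_facts<'a>(
    base: &Facts<'a>,
    left: &Facts<'a>,
    right: &Facts<'a>,
) -> Result<Facts<'a>, Vec<&'a str>> {
    let keys: BTreeSet<&'a str> = base
        .keys()
        .chain(left.keys())
        .chain(right.keys())
        .copied()
        .collect();
    let mut merged: Facts<'a> = BTreeMap::new();
    let mut conflicts = Vec::new();
    for key in keys {
        let (b, l, r) = (base.get(key), left.get(key), right.get(key));
        let winner = if l == r {
            l // both sides agree
        } else if l == b {
            r // only the right side differs from base: the base fact is discarded
        } else if r == b {
            l // only the left side differs from base: the base fact is discarded
        } else {
            conflicts.push(key); // left and right disagree with each other
            continue;
        };
        // A missing value means the fact was deleted, so nothing is inserted.
        if let Some(value) = winner {
            merged.insert(key, *value);
        }
    }
    if conflicts.is_empty() {
        Ok(merged)
    } else {
        Err(conflicts)
    }
}
```

With this representation, a left revision that changes "f.body" and a right revision that changes "f.parent" merge cleanly, because the two edits touch different facts. If both revisions change "f.body" to different values, the function returns that key as a conflict.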
For edits in some languages, though, Mergiraf can use language-specific knowledge to resolve even conflicts where both revisions edit the same part of the program. For example, consider the following change to a Rust structure:
// Base version
struct Foo {
    field1: Bar,
}
// Left revision
struct Foo {
    field1: Bar,
    new_field_left: Baz,
}
// Right revision
struct Foo {
    field1: Bar,
    new_field_right: Quux,
}
This is a merge conflict because a line-based algorithm couldn't tell which order to add the new lines in, and the order in which lines appear in a program is usually important. In Rust, however, the compiler is allowed to rearrange structure fields as it sees fit (unless the structure is marked #[repr(C)] or one of the other repr settings; Mergiraf not accounting for that exception seems to be a known bug in the current version). Therefore, this merge conflict can be resolved automatically by putting the lines in any order. The resulting merged program has the same behavior either way. On the other hand, that wouldn't be a correct way to resolve the equivalent merge conflict in C, because, in C, the order of members in a structure can affect the correctness of the program.
When a syntactic element's children can be freely reordered without changing the meaning of the program, Mergiraf calls it a "commutative parent". Part of the language-specific information that Mergiraf needs is a list of which parts of the language are commutative parents, if any. A commutative parent isn't a get-out-of-jail-free card for merge conflicts, though: if two revisions add fields with the same name and different types, for example, that would still be a conflict. In such cases, Mergiraf uses an additional piece of language-specific information to put the conflicting lines close together, so that the resulting conflict markers pinpoint the problem as precisely as possible.
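As a rough sketch of the idea (not Mergiraf's implementation), merging the children of a commutative parent can be treated as merging unordered collections. The code below assumes that a structure field can be reduced to a (name, type) pair; the Field alias and the function name are hypothetical, and clashes where one side deletes a field that the other side modifies are not detected.

```rust
/// A structure field, reduced to (name, type) for this sketch.
type Field<'a> = (&'a str, &'a str);

/// Merges the children of a commutative parent, ignoring their order.
/// Returns the merged fields, or the names that both sides added with
/// different types.
fn merge_commutative<'a>(
    base: &[Field<'a>],
    left: &[Field<'a>],
    right: &[Field<'a>],
) -> Result<Vec<Field<'a>>, Vec<&'a str>> {
    // Keep the base children that neither side removed or changed.
    let mut merged: Vec<Field<'a>> = base
        .iter()
        .copied()
        .filter(|f| left.contains(f) && right.contains(f))
        .collect();
    let mut conflicts = Vec::new();
    // Children that are new relative to the base, left side first.
    let added = left.iter().chain(right).copied().filter(|f| !base.contains(f));
    for (name, ty) in added {
        let existing_ty = merged.iter().find(|f| f.0 == name).map(|f| f.1);
        match existing_ty {
            None => merged.push((name, ty)),
            Some(t) if t == ty => {} // both sides made the same addition
            Some(_) => conflicts.push(name), // same name, different types
        }
    }
    if conflicts.is_empty() {
        Ok(merged)
    } else {
        Err(conflicts)
    }
}
```

Applied to the Foo example, this produces field1, new_field_left, and new_field_right with no conflict. If the right revision had instead added new_field_left with type Quux, the function would report new_field_left as a conflict.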
Using it
When I encountered it, Mergiraf's approach sounded promising, but I was curious about how much of a difference it would actually make in real-world use of Git. The Linux kernel repository contains, at the time of writing, 7,415 merge commits that, when replayed using the default merge algorithm, result in conflicts. These are the merge commits that would have had to be fixed by hand, although that count is probably an underestimate of the number of merge conflicts that kernel developers have had to deal with. It doesn't include merge conflicts that would have appeared during rebasing, for example, because information about rebases isn't included in the Git history for analysis.
After extracting a list of every merge conflict in the kernel's Git history, I tried using Mergiraf to resolve them. Of the 7,415 merges, 6,987 still resulted in conflicts, but 428 (about 5.8%) were resolved successfully. A much larger fraction of merge conflicts were at least partially resolved. Should those results generalize, which I think is likely, adopting Mergiraf could eliminate a small share of the merge conflicts that require manual resolution, which could still save valuable maintainer time.
The tool itself has two interfaces: one that can be run by hand on a file with conflict markers (such as those produced by ort) in order to attempt to resolve conflicts, and one that can be used by Git automatically. Running "mergiraf solve <path>" will read the conflict markers in the given file and attempt to resolve them. Adding this snippet to one's Git configuration and setting the driver as the default in .gitattributes will use Mergiraf as the Git merge driver from the beginning:
[merge "mergiraf"]
    name = mergiraf
    driver = mergiraf merge --git %O %A %B -s %S -x %X -y %Y -p %P -l %L
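For illustration, the .gitattributes side of that setup might look like the following fragment. The catch-all pattern is an assumption about how one would want to scope the driver; a narrower pattern such as "*.rs merge=mergiraf" would restrict it to particular file types. Git reads attributes from a repository's .gitattributes file or from a global file named by the core.attributesFile setting.

```
# Hypothetical scope: route every path through the "mergiraf" merge driver
* merge=mergiraf
```

The name after "merge=" has to match the name in the [merge "mergiraf"] section of the Git configuration.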
When Mergiraf is invoked by Git, the user can review the conflicts that it encountered and how it resolved them by running "mergiraf review". For people who don't have a merge conflict handy, Mergiraf has an example repository containing various kinds of conflicts, in order to show how Mergiraf resolves them. The tool also works with Jujutsu, and likely with other version-control systems, as long as they use the same merge-conflict syntax as Git.
Programmers have gotten along just fine without Mergiraf, so it isn't necessarily something that everyone will want to add to their set of programming tools. But few people enjoy running into merge conflicts, and tools that can help intelligently resolve them — especially the ones that are obvious to a human, and therefore a waste of time to deal with — are an attractive prospect.