The thing about how merges are presented seems orthogonal to how to represent history. I also hate the default in git, but that is why I just use p4merge as a merge tool and get a proper 4-pane merge tool (left, right, common base, merged result) which shows everything needed to figure out why there is a conflict and how to resolve it. I don't understand why you need to switch out the VCS to fix that issue.
crote 41 minutes ago [-]
Seconding the use of p4merge for easy-to-use three-pane merging. Just like most other issues with Git, if your merges are painful it's probably due to terrible native UX design - not due to anything conceptually wrong with Git.
TacticalCoder 34 minutes ago [-]
Thirding it except I do it from Emacs. Three side-by-side pane with left / common ancestor / right and then below the merge result. By default it's not like that but then it's Emacs so anything is doable. I hacked some elisp code a great many years ago and I've been using it ever since.
No matter the tool, merges should always be presented like that. It's the only presentation that makes sense.
jwr 13 minutes ago [-]
Isn't that what ediff does?
MarsIronPI 22 minutes ago [-]
What tool do you use? Does Magit support it natively?
skydhash 11 minutes ago [-]
I think you need to enable 3 way merge by default in git's configuration, and both smerge (minor mode for solving conflicts) and ediff (major mode that encompass diff and patch) will pick it up. In the case of the latter you will have 4 panes, one for version A, another for version B, a third for the result C, and the last is the common ancestor of A and B.
radarsat1 1 hours ago [-]
Is it a good thing to have merges that never fail? Often a merge failure indicates a semantic conflict, not just "two changes in the same place". You want to be aware of and forced to manually deal with such cases.
I assume the proposed system addresses it somehow but I don't see it in my quick read of this.
conradludgate 3 minutes ago [-]
My understanding of the way this is presented is that merges don't _block_ the workflow. In git, a merge conflict is a failure to merge, but in this idea a merge conflict is still present but the merge still succeeds. You can commit with conflicts unresolved. This allows you to defer conflict resolution to later. I believe jj does this as well?
Technically you could include conflict markers in your commits but I don't think people like that very much
jwilliams 8 minutes ago [-]
Indeed. And plenty of successful merges end up with code that won't compile.
FWIW I've struggled to get AI tools to handle merge conflicts well (especially rebase) for the same underlying reason.
layer8 6 minutes ago [-]
Code not compiling is still the good case, because you’ll notice before deployment. The dangerous cases are when it does compile.
gojomo 35 minutes ago [-]
Should you be counting on confusion of an underpowered text-merge to catch such problems?
It'll fire on merge issues that aren't code problems under a smarter merge, while also missing all the things that merge OK but introduce deeper issues.
Post-merge syntax checks are better for that purpose.
And imminently: agent-based sanity-checks of preserved intent – operating on a logically-whole result file, without merge-tool cruft. Perhaps at higher intensity when line-overlaps – or even more-meaningful hints of cross-purposes – are present.
skydhash 3 minutes ago [-]
> It'll fire on merge issues that aren't code problems under a smarter merge, while also missing all the things that merge OK but introduce deeper issues.
That has not been my experience at all. The changes you introduced is your responsibility. If you synchronizes your working tree to the source of truth, you need to evaluate your patch again whether it introduces conflict or not. In this case a conflict is a nice signal to know where someone has interacted with files you've touched and possibly change their semantics. The pros are substantial, and it's quite easy to resolve conflicts that's only due to syntastic changes (whitespace, formatting, equivalent statement,...)
hungryhobbit 50 minutes ago [-]
They address this; it's not that they don't fail, in practice...
the key insight is that changes should be flagged as conflicting when they touch each other, giving you informative conflict presentation on top of a system which never actually fails.
recursivecaveat 50 minutes ago [-]
It says that merges that involve overlap get flagged to the user. I don't think that's much more than a defaults difference to git really. You could have a version of git that just warns on conflict and blindly concats the sides.
rectang 38 minutes ago [-]
I agree. Nevertheless I wonder if this approach can help with certain other places where Git sometimes struggles, such as whether or not two commits which have identical diffs but different parents should be considered equivalent.
In the general case, such commits cannot be considered the same — consider a commit which flips a boolean that one branch had flipped in another file. But there are common cases where the commits should be considered equivalent, such as many rebased branches. Can the CRDT approach help with e.g. deciding that `git branch -d BRANCH` should succeed when a rebased version of BRANCH has been merged?
dfhvneoieno 3 minutes ago [-]
[dead]
mikey-k 1 hours ago [-]
This
a-dub 3 minutes ago [-]
doesn't the side by side view in github diff solve this?
conflict free merging sounds cool, but doesn't that just mean that that a human review step is replaced by "changes become intervals rather than collections of lines" and "last set of intervals always wins"? seems like it makes sense when the conflicts are resolved instantaneously during live editing but does it still make sense with one shot code merges over long intervals of time? today's systems are "get the patch right" and then "get the merge right"... can automatic intervalization be trusted?
lasgawe 4 minutes ago [-]
This is a really interesting and well thought out idea, especially the way it turns conflicts into something informative instead of blocking. The improved conflict display alone makes it much easier to understand what actually happened. I think using CRDTs to guarantee merges always succeed while still keeping useful history feels like a strong direction for version control. Looks like a solid concept!
simonw 1 hours ago [-]
This thing is really short. https://github.com/bramcohen/manyana/blob/main/manyana.py is 473 lines of dependency-free Python (that file only imports difflib, itertools and inspect) and of that ~240 lines are implementation and the rest are tests.
zahlman 2 minutes ago [-]
It's really impressive what can be done in a few hundred lines of well-thought-out Python without resorting to brutal hacks. People complain about left-pad incidents etc. in the JS world but I honestly feel like the Python ecosystem could do with more, smaller packages on balance. They just have to be put forward by responsible people who aren't trying to make a point or inflate artificial metrics.
bos 2 hours ago [-]
This is sort of a revival and elaboration of some of Bram’s ideas from Codeville, an earlier effort that dates back to the early 2000s Cambrian explosion of DVCS.
Codeville also used a weave for storage and merge, a concept that originated with SCCS (and thence into Teamware and BitKeeper).
Codeville predates the introduction of CRDTs by almost a decade, and at least on the face of it the two concepts seem like a natural fit.
It was always kind of difficult to argue that weaves produced unambiguously better merge results (and more limited conflicts) than the more heuristically driven approaches of git, Mercurial, et al, because the edit histories required to produce test cases were difficult (at least for me) to reason about.
I like that Bram hasn’t let go of the problem, and is still trying out new ideas in the space.
ZoomZoomZoom 1 hours ago [-]
The key insight in the third sentence?
> ... CRDTs for version control, which is long overdue but hasn’t happened yet
Pijul happened and it has hundreds - perhaps thousands - of hours of real expert developer's toil put in it.
Not that Bram is not one of those, but the post reads like you all know what.
... and of course it is, because Pijul uses Pijul for development, not Git and GitHub!
idoubtit 57 minutes ago [-]
The canonical website is https://pijul.org. The homepage has a link to the pijul source repository.
codethief 53 minutes ago [-]
> I hadn't heard of Pijul
I'm surprised! Pijul has been discussed here on HN many, many times. My impression is that many people here were hoping that Pijul might eventually become a serious Git contender but these days people seem to be more excited about Jujutsu, likely because migration is much easier.
jedberg 1 minutes ago [-]
I too am here all the time and have never heard of it. But it looks interesting.
Interesting idea. While conflicts can be improved, I personally don't see it as a critical challenge with VCS.
What I do think is the critical challenge (particularly with Git) is scalability.
Size of repository & rate of change of repositories are starting to push limits of git, and I think this needs revisited across the server, client & wire protocols.
What exactly, I don't know. :). But I do know that in my current role (mid-size well-known tech company) is hitting these limits today.
rectang 32 minutes ago [-]
[dead]
lemonwaterlime 40 minutes ago [-]
See vim-mergetool[1]. I use it to manage merge conflicts and it's quite intuitive. I've resolved conflicts that other people didn't even want to touch.
I think something like this needs to be born out of analysis of gradations of scales of teams using version control systems.
- What kind of problems do 1 person, 10 person, 100 person, 1k (etc) teams really run into with managing merge conflicts?
- What do teams of 1, 10, 100, 1k, etc care the most about?
- How does the modern "agent explosion" potentially affect this?
For example, my experience working in the 1-100 regime tells me that, for the most part, the kind of merge conflict being presented here is resolved by assigning subtrees of code to specific teams. For the large part, merge conflicts don't happen, because teams coordinate (in sprints) to make orthogonal changes, and long-running stale branches are discouraged.
However, if we start to mix in agents, a 100 person team could quickly jump into a 1000 person team, esp if each person is using subagents making micro commits.
It's an interesting idea definitely, but without real-world data, it kind of feels like this is just delivering a solution without a clear problem to assign it to. Like, yes merge-conflicts are a bummer, but they happen infrequently enough that it doesn't break your heart.
CuriouslyC 34 minutes ago [-]
Team scale doesn't tend to impact this that much, since as teams grow they naturally specialize in parts of the codebase. Shared libs can be hotspots, I've heard horror stories at large orgs about this sort of thing, though usually those shared libs have strong gatekeeping that makes the problem more one of functionality living where it shouldn't to avoid gatekeeping than a shared lib blowing up due to bad change set merges.
mentalgear 23 minutes ago [-]
> [CRDT] This means merges don’t need to find a common ancestor or traverse the DAG. Two states go in, one state comes out, and it’s always correct.
Well, isn't that what the CRDT does in its own data structure ?
Also keep in mind that syntactic correctness doesn't mean functional correctness.
logicprog 1 hours ago [-]
This seems like an excellent idea. I'm sure a lot of us have been idly wondering why CRDTs aren't used for VCS for some time, so it's really cool to see someone take a stab at it! We really do need an improvement over git; the question is how to overcome network effects.
vishvananda 1 hours ago [-]
This is actually a very interesting moment to potentially overcome network effects, because more and more code is going to be written by agents. If a crdt approach is measurably better for merging by agent swarms then there is incentive to make the switch. It also much easier to get an agent to change its workflow than a human. The only tricky part is how much git usage is in the training set so some careful thought would need to be given to create a compatibility layer in the tooling to help agents along.
NetOpWibby 1 hours ago [-]
Overcoming network effects cannot be the goal; otherwise, work will never get done.
The goal should be to build a full spec and then build a code forge and ecosystem around this. If it’s truly great, adoption will come. Microsoft doing a terrible job with GitHub is great for new solutions.
righthand 1 hours ago [-]
Well over half of all people can’t tell you the difference between git and Github. The latter being owned by a corporation that needs the network effect to keep existing.
jFriedensreich 56 minutes ago [-]
starts with “based on the fundamentally sound approach of using CRDTs for version control”. How on earth is crdt a sound base for a version control system? This makes no sense fundamentally, you need to reach a consistent state that is what you intended not what some crdt decided and jj shows you can do that also without blocking on merges but with first level conflicts that need to be resolved. ai and language aware merge drivers are helping so much here i really wonder if the world these “replace version control” projects were made for still exists at all.
miloignis 41 minutes ago [-]
The rest of the article shows exactly how a CRDT is a sound base for a version control system, with "conflicts" and all.
skydhash 16 minutes ago [-]
But the presentation does not show how it resolves conflicts. For the first example, Git has the 3 way-merge that shows the same kind of info. And a conflict is not only to show that two people have worked on a file. More often than not, it highlight a semantic changes that happened differently in two instances and it's a nice signal to pay attention to this area. But a lot of people takes merge conflicts as some kind of nuisance that prevents them from doing their job (more often due to the opinion that their version is the only good one).
WCSTombs 25 minutes ago [-]
For the conflicts, note that in Git you can do
git config --global merge.conflictstyle diff3
to get something like what is shown in the article.
phtrivier 17 minutes ago [-]
A suggestion : is there any info to provide in diffs that is faster to parse than "left" and "right" ? Can the system have enough data to print "bob@foo.bar changed this" ?
alunchbox 22 minutes ago [-]
Jujutsu honestly is the future IMO, it already does what you have outlined but solved in a different way with merges, it'll let you merge but outline you have conflicts that need to be resolved for instance.
It's been amazing watching it grow over the last few years.
skybrian 27 minutes ago [-]
It sounds interesting but the main selling point doesn’t really reasonate:
If you haven’t resolved conflicts then it probably doesn’t compile and of course tests won’t pass, so I don’t see any point in publishing that change? Maybe the commit is useful as a temporary state locally, but that seems of limited use?
Nowadays I’d ask a coding agent to figure out how to rebase a local branch to the latest published version before sending a pull request.
2 minutes ago [-]
lifeformed 1 hours ago [-]
My issue with git is handling non-text files, which is a common issue with game development. git-lfs is okay but it has some tricky quirks, and you end up with lots of bloat, and you can't merge. I don't really have an answer to how to improve it, but it would be nice if there was some innovation in that area too.
rectang 26 minutes ago [-]
Has there ever been a consideration for the git file format to allow storage of binary blobs uncompressed?
When I was screwing around with the Git file format, tricks I would use to save space like hard-linking or memory-mapping couldn't work, because data is always stored compressed after a header.
A general copy-on-write approach to save checkout space is presumably impossible, but I wonder what other people have traveled down similar paths have concluded.
miloignis 42 minutes ago [-]
I really think something like Xet is a better idea to augment Git than LFS, though it seems to pretty much only be used by HuggingFace for ML model storage, and I think their git plugin was deprecated? Too bad if it ends up only serving the HuggingFace niche.
jauntywundrkind 53 minutes ago [-]
In case the name doesn't jump out at you, this is Bram Cohen, inventory of Bittorrent. And Chia proof-of-storage (probably better descriptions available) cryptocurrency. https://en.wikipedia.org/wiki/Bram_Cohen
It's not the same as capturing it, but I would also note that there are a wide wide variety of ways to get 3-way merges / 3 way diffs from git too. One semi-recent submission (2022 discussing a 2017) discussed diff3 and has some excellent comments (https://news.ycombinator.com/item?id=31075608), including a fantastic incredibly wide ranging round up of merge tools (https://www.eseth.org/2020/mergetools.html).
No matter the tool, merges should always be presented like that. It's the only presentation that makes sense.
I assume the proposed system addresses it somehow but I don't see it in my quick read of this.
Technically you could include conflict markers in your commits but I don't think people like that very much
FWIW I've struggled to get AI tools to handle merge conflicts well (especially rebase) for the same underlying reason.
It'll fire on merge issues that aren't code problems under a smarter merge, while also missing all the things that merge OK but introduce deeper issues.
Post-merge syntax checks are better for that purpose.
And imminently: agent-based sanity-checks of preserved intent – operating on a logically-whole result file, without merge-tool cruft. Perhaps at higher intensity when line-overlaps – or even more-meaningful hints of cross-purposes – are present.
That has not been my experience at all. The changes you introduced is your responsibility. If you synchronizes your working tree to the source of truth, you need to evaluate your patch again whether it introduces conflict or not. In this case a conflict is a nice signal to know where someone has interacted with files you've touched and possibly change their semantics. The pros are substantial, and it's quite easy to resolve conflicts that's only due to syntastic changes (whitespace, formatting, equivalent statement,...)
the key insight is that changes should be flagged as conflicting when they touch each other, giving you informative conflict presentation on top of a system which never actually fails.
In the general case, such commits cannot be considered the same — consider a commit which flips a boolean that one branch had flipped in another file. But there are common cases where the commits should be considered equivalent, such as many rebased branches. Can the CRDT approach help with e.g. deciding that `git branch -d BRANCH` should succeed when a rebased version of BRANCH has been merged?
conflict free merging sounds cool, but doesn't that just mean that that a human review step is replaced by "changes become intervals rather than collections of lines" and "last set of intervals always wins"? seems like it makes sense when the conflicts are resolved instantaneously during live editing but does it still make sense with one shot code merges over long intervals of time? today's systems are "get the patch right" and then "get the merge right"... can automatic intervalization be trusted?
Codeville also used a weave for storage and merge, a concept that originated with SCCS (and thence into Teamware and BitKeeper).
Codeville predates the introduction of CRDTs by almost a decade, and at least on the face of it the two concepts seem like a natural fit.
It was always kind of difficult to argue that weaves produced unambiguously better merge results (and more limited conflicts) than the more heuristically driven approaches of git, Mercurial, et al, because the edit histories required to produce test cases were difficult (at least for me) to reason about.
I like that Bram hasn’t let go of the problem, and is still trying out new ideas in the space.
> ... CRDTs for version control, which is long overdue but hasn’t happened yet
Pijul happened and it has hundreds - perhaps thousands - of hours of real expert developer's toil put in it.
Not that Bram is not one of those, but the post reads like you all know what.
... and of course it is, because Pijul uses Pijul for development, not Git and GitHub!
I'm surprised! Pijul has been discussed here on HN many, many times. My impression is that many people here were hoping that Pijul might eventually become a serious Git contender but these days people seem to be more excited about Jujutsu, likely because migration is much easier.
What I do think is the critical challenge (particularly with Git) is scalability.
Size of repository & rate of change of repositories are starting to push limits of git, and I think this needs revisited across the server, client & wire protocols.
What exactly, I don't know. :). But I do know that in my current role (mid-size well-known tech company) is hitting these limits today.
[1]: https://github.com/samoshkin/vim-mergetool
- What kind of problems do 1 person, 10 person, 100 person, 1k (etc) teams really run into with managing merge conflicts?
- What do teams of 1, 10, 100, 1k, etc care the most about?
- How does the modern "agent explosion" potentially affect this?
For example, my experience working in the 1-100 regime tells me that, for the most part, the kind of merge conflict being presented here is resolved by assigning subtrees of code to specific teams. For the large part, merge conflicts don't happen, because teams coordinate (in sprints) to make orthogonal changes, and long-running stale branches are discouraged.
However, if we start to mix in agents, a 100 person team could quickly jump into a 1000 person team, esp if each person is using subagents making micro commits.
It's an interesting idea definitely, but without real-world data, it kind of feels like this is just delivering a solution without a clear problem to assign it to. Like, yes merge-conflicts are a bummer, but they happen infrequently enough that it doesn't break your heart.
Well, isn't that what the CRDT does in its own data structure ?
Also keep in mind that syntactic correctness doesn't mean functional correctness.
The goal should be to build a full spec and then build a code forge and ecosystem around this. If it’s truly great, adoption will come. Microsoft doing a terrible job with GitHub is great for new solutions.
It's been amazing watching it grow over the last few years.
If you haven’t resolved conflicts then it probably doesn’t compile and of course tests won’t pass, so I don’t see any point in publishing that change? Maybe the commit is useful as a temporary state locally, but that seems of limited use?
Nowadays I’d ask a coding agent to figure out how to rebase a local branch to the latest published version before sending a pull request.
When I was screwing around with the Git file format, tricks I would use to save space like hard-linking or memory-mapping couldn't work, because data is always stored compressed after a header.
A general copy-on-write approach to save checkout space is presumably impossible, but I wonder what other people have traveled down similar paths have concluded.
It's not the same as capturing it, but I would also note that there are a wide wide variety of ways to get 3-way merges / 3 way diffs from git too. One semi-recent submission (2022 discussing a 2017) discussed diff3 and has some excellent comments (https://news.ycombinator.com/item?id=31075608), including a fantastic incredibly wide ranging round up of merge tools (https://www.eseth.org/2020/mergetools.html).
However/alas git 2.35's (2022) fabulous zdiff3 doesn't seems to have any big discussions. Other links welcome but perhaps https://neg4n.dev/blog/understanding-zealous-diff3-style-git...? It works excellently for me; enthusiastically recommended!