NHacker Next
  • new
  • past
  • show
  • ask
  • show
  • jobs
  • submit
Patch applies fake diffs from commit messages (samizdat.dev)
LiamPowell 3 days ago [-]
This has come up multiple times before [1], and more generally it's come up hundreds of times with Unix style tools in general. It's always been a stupid idea for every tool to have its own barely documented file format.

This wouldn't be an issue if patches were XML or JSON with a well defined schema, but everything must be a boutique undocumented format in the world of Unix tools.

Maybe the worst part about this is that it can entirely come from a patch being exported by git and then imported straight back in to git. If you can't even handle your own undocumented format then what hope do other tools have that want to work with it?

[1]: https://mas.to/@zekjur/116022397626943871

nyrikki 3 days ago [-]
While patch[0] has problems, the issue here is not that it is undocumented.

Git recently added this doc on roundtripping, and the problem is with git.

     Any line that is of the form:
     * three-dashes and end-of-line, or
     * a line that begins with "diff -", or
     * a line that begins with "Index: "

     is taken as the beginning of a patch, and the commit log message is terminated before the first occurrence of such a line.

The patch isn't even the complicated forms with RCS, ClearCase, Perforce, or SCCS support, it is just doing what the pre-POSIX spec says.

The argument is if git should do input sanitation etc...

But `patch -p1` is doing exactly what was documented, even in the original Larry Wall usenet post of the program.

[0] https://pubs.opengroup.org/onlinepubs/9799919799/utilities/p... [1] https://github.com/git/git/blob/94f057755b7941b321fd11fec1b2...

yjftsjthsd-h 3 days ago [-]
> This wouldn't be an issue if patches were XML or JSON with a well defined schema, but everything must be a boutique undocumented format in the world of Unix tools.

Patch files are readable by humans. Replacing them with XML or JSON would fix this problem, but at the expense of removing a core feature.

phoe-krk 3 days ago [-]
If, by "readable by humans", you mean "it would reliably fool humans as well", I'd say it's an ambiguity bug regardless of whether it's "a core feature" or not. A patch format, human-readable or not, should clearly indicate which part is the commit message and which part is an actual diff; it's not the case here.
yjftsjthsd-h 3 days ago [-]
Alright, allow me to disambiguate in your preferred format.

  <?xml version="1.0" encoding="UTF-8"?> <claims> <claims_I_did_not_make description='Claims that I did not make or defend.'> <claim>Patch is perfect.</claim> <claim>Ambiguity is good.</claim> <claim>There are no better formats for conveying patches.</claim> </claims_I_did_not_make> <claims_I_did_make description='What I actually said.'> <claim>Patch files are readable by humans.</claim> <claim>Being readble by humans is useful.</claim> <claim>XML is painful for humans to read and write.</claim> <claim>JSON is painful for humans to read and write.</claim> <claim caveat='Actually this would require all parties to handle JSON or XML correctly which on further reflection I am not sure about. Still, it is a claim I initially made.'>JSON or XML would actually fix this problem in the format.</claim> </claims_I_did_make> <claims_I_did_not_make_but_am_open_to description='Things that were never specified but that I do not actually disagree with.'> <claim>The patch format could be improved.</claim> <claim>Formats should be unambiguous.</claim> <claim>Separating sections is good.</claim> </claims_I_did_not_make_but_am_open_to> </claims>
em-bee 2 days ago [-]
that's not the preferred format for writing XML, this is:

    <?xml version="1.0" encoding="UTF-8"?>
    <claims>
      <claims_I_did_not_make description='Claims that I did not make or defend.'>
        <claim>Patch is perfect.</claim>
        <claim>Ambiguity is good.</claim>
        <claim>There are no better formats for conveying patches.</claim>
      </claims_I_did_not_make>
      <claims_I_did_make description='What I actually said.'>
        <claim>Patch files are readable by humans.</claim>
        <claim>Being readble by humans is useful.</claim>
        <claim>XML is painful for humans to read and write.</claim>
        <claim>JSON is painful for humans to read and write.</claim>
        <claim caveat='Actually this would require all parties to handle JSON or XML correctly which on further reflection I am not sure about. Still, it is a claim I initially made.'>JSON or XML would actually fix this problem in the format.</claim>
      </claims_I_did_make>
      <claims_I_did_not_make_but_am_open_to description='Things that were never specified but that I do not actually disagree with.'>
        <claim>The patch format could be improved.</claim>
        <claim>Formats should be unambiguous.</claim>
        <claim>Separating sections is good.</claim>
      </claims_I_did_not_make_but_am_open_to>
    </claims>
yjftsjthsd-h 2 days ago [-]
What I posted is valid XML. And even prettified, it's a pain to read.
em-bee 2 days ago [-]
it was valid but not the way XML is written or read by humans which is what we are discussing. how much of a pain it is to read is a matter of taste. i won't deny that. but XML can be made more readable without fail because it is a structured format. i would not have been ale to reformat a patch text the way i reformatted this XML example. XML is also more powerful. it could handle word based changes, as opposed to patch which can only do line based changes. same goes for JSON. patch could potentially be improved, but i don't see how it could handle word based changes without extra syntax to mark line breaks.
Cpoll 3 days ago [-]
That's really not that bad, especially with indentation and color coding. You're kind of cheating by putting it into HN, which is terrible for code.

> XML is painful for humans to read and write.

Speaking of claims no-one made; no-one's talking about writing patch files by hand.

antiframe 3 days ago [-]
If that's good enough to be human readable than patch is even better.

People do write patch files be hand.

Cpoll 2 days ago [-]
If the only thing we're concerned about is human readability, we can do better than patch files with their pesky @@ lines and plusses and minuses. But we're talking about a compromise between readability and parseability/schemas.
antiframe 2 days ago [-]
Yep and the XML goes too far into unreadability. Also parsing XML is heavier weight.
wnoise 3 days ago [-]
More commonly, edit them.
syntheticnature 3 days ago [-]
Haha, good one. Much like Makefiles, patch format precedes a lot of more modern things (by decades!) and is good enough to stick around. Unlike Makefiles, I've never seen tool gain any acceptance at all to replace patch.
tetha 3 days ago [-]
And a lot of these older tools are not meant to be fed untrusted, unvetted input. The patch shown there confused me for quite a bit.

Or, more snarky: tee is also a huge security problem if you pipe untrusted input into `tee -a /etc/passwd`, such as `curl | tee -a /etc/passwd`. Not many things are safe with a `curl |` in front of them. I think yes might be?

Avamander 3 days ago [-]
This is where I kind-of like the idea of PowerShell, it's just that I dislike almost all other aspects of it and around it.
bombcar 3 days ago [-]
Same - psh has one good idea and it’s this. The next evolution of shells needs to include it.
em-bee 2 days ago [-]
can either of you elaborate what you mean? are you talking about support for structured data passing between scripts/programs?
bombcar 2 days ago [-]
Yes - https://devblogs.microsoft.com/scripting/working-with-json-d...

Tons of bugs in scripting in Unix come from the fact that data and metadata are interspersed in the same stream (you can mitigate somewhat with stderr vs stdout but hardly anyone does). Examples include things like trying to handle random filenames from * expansions.

It’s a bit more annoying to deal with sometimes, but for actual scripts it’s much more foolproof.

xargs is one of the programs that is designed to work around the original issue.

Avamander 2 days ago [-]
Yes, structured data between scripts and programs. No xargs, tee, awk, sed, grep mangling. No "argument list too long" errors.

So many problems are avoided, but at the same time the Windows ecosystem is just so far from providing an properly usable terminal experience. Things are still really not designed to be used from PowerShell.

em-bee 2 days ago [-]
right, see my response to the sibling comment.
chrishill89 3 days ago [-]
> Maybe the worst part about this is that it can entirely come from a patch being exported by git and then imported straight back in to git.

No one wants to apply diffs in commit messages. But some people use this technique via email:

    Finally fix it

    ---

    Changes in v2:

    - Proper formatting
    - Remove irrelevant typo fix
They’ve used the `---` commit message delimiter in the commit message itself so that everything after it won’t be applied by git-am(1). So that’s intentional loss of round tripping.

I would personally use Git notes instead though.

    Finally fix it

    ---

    Notes:
        Changes in v2: ...
kevin_thibedeau 3 days ago [-]

  Patch: 1985
  SGML: 1986
thesz 3 days ago [-]
XML: 1996
em-bee 2 days ago [-]
what are you suggesting? XML is a simplified form of SGML. an SGML parser can parse XML so it was already possible to write an XML like document before XML was defined.
pwdisswordfishs 3 days ago [-]
> This wouldn't be an issue if patches were XML or JSON

Or MIME, even.

jolmg 3 days ago [-]
> It matters (to me) because `wget`/`curl` plus `patch` is not some exotic lab setup.

If the point is to be able to do `curl https://...deadbeef.patch | patch -p1`, you can just change the extension provided to Github from `.patch` to `.diff`. That way, it just includes the hunks. E.g.

https://github.com/torvalds/linux/commit/dca922e019dd758b4c1...

I don't see it as a problem with the email format, because I can't imagine someone just patching from an email without looking at the email first.

jolmg 3 days ago [-]
I've changed my mind regarding the email format not being a problem. I was thinking of emailed git commits as this ad-hoc thing, but I forgot that `git-format-patch` and `git-am` exist. It's not just an incompatibility between them and `patch`. If you have a line `---` in your git commit message, `git-format-patch` will not somehow escape it, resulting in git-am truncating your git commit message. The email commit format is kinda bad. Github `.patch` exports are just being compatible with `git-am`, so I don't think it's a bug with them.

This is Re:

> I do not yet know whether the bug belongs to GNU patch, GitHub’s .patch export, or the broader patch-format contract.

I don't think this is a problem with GNU patch or the patch format per se, just the emailed commit format. I think the patch format's good because it allows it to be embedded in other texts and also allows comments or extended syntax between hunks. For example, the lines

  diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
  index 15ba592236e845..725a49a0eee72e 100644
not being part of the hunk and probably being seen as a comment by `patch`.

Simply changing `git-am` to accept commit messages being indented in the email format would allow Github fix the issue of their .patch exports being incompatible with `patch`, in addition to fixing that bug about commit messages being truncated when they have a `---` line.

Groxx 3 days ago [-]
Seems like this would probably be solved if github returned a patch file formatted like `git show` provides, specifically with the commit message indented? I do see that `git format-patch` doesn't do this indentation though.

In any case, agreed that it's not a great "feature" to use in-band signaling of when patch data starts, with no escaping. Confusion and misbehavior is pretty much guaranteed.

thaumasiotes 3 days ago [-]
> Seems like this would probably be solved if github returned a patch file formatted like `git show` provides, specifically with the commit message indented? I do see that `git format-patch` doesn't do this indentation though.

This would be "solved" if the patch file only included the patch. That's pretty straightforward. The file github provides includes fake email headers for no particular reason. The commit message appears to be part of the subject header. The subject header is never terminated, so arguably applying this patch shouldn't do anything. (Because the actual patch data is also part of the email headers.) The other headers aren't terminated either, so actually there is no subject header. This shouldn't really matter, because the patch file isn't email, but it does seem to want to pretend to be.

The usual question to ask here would be "why are you applying patch files from an untrusted source?". If patch(1) was stricter about the format of its input files... applying patches from an untrusted source would still be a good way to get owned. If you think I can get you to patch inappropriate files by writing a fake diff into my commit message... wait until you see what I can do by writing those same changes into the real diff.

quuxplusone 3 days ago [-]
> fake email headers

That's the output you get from `git format-patch --stdout -1 dd28283`. The idea is that it's suitable for emailing to a mailing list for review (hence the subject line beginning with [PATCH], and so on).

If you ask for colorized output with `git format-patch --color=always --stdout -1 dd28283` you'll see that `git format-patch` itself knows which bits are the commit message and which bits are the diff. (Well, of course it does, I guess.)

I suspect that if you sent a patch like this to the mailing list, they'd get mad at you. So `git format-patch` is working OK for its intended use-case. Arguably it's GitHub causing the problem here by "misusing" `git format-patch` as a way to deliver patches that are (these days) expected to be machine-readable — something you can just curl and pipe into `patch`. `git format-patch` doesn't do that.

That said, yeah, it's amusing that (as TFA says)

    git format-patch -1 HEAD --stdout > 0001 ;
    git checkout HEAD~ ;
    git am 0001
isn't a clean round-trip. `git am` applies the fake diff.

> If you think I can get you to patch inappropriate files by writing a fake diff into my commit message... wait until you see what I can do by writing those same changes into the real diff.

Well put. :)

thaumasiotes 2 days ago [-]
> The idea is that it's suitable for emailing to a mailing list for review

It doesn't comply with RFC 5322 ( https://www.rfc-editor.org/rfc/rfc5322#section-2.2 ), which requires that email headers terminate in CRLF.

quuxplusone 2 days ago [-]
jolmg points out that if you use a GitHub URL ending in .diff instead of .patch, you get something much more suitable to feed mechanically into `patch`. (And probably not so exploitable.)

Therefore I retract my claim that this is even a "misuse" of `git format-patch` by GitHub. Seems like GitHub provides both a git-am-able endpoint and a (less exploitable) patch-able endpoint, and the issue is just that OP chose the less suitable one of those two endpoints.

chrishill89 3 days ago [-]
git-am(1) (apply patches) delimits the commit message from the patch/diff by looking for (1) a line `---` or (2) a line that starts with `diff -` or (3) a line that starts with `Index:SP` (SP is space). Only the first rule is necessary for patches generated git-format-patch(1). But git-am(1) is for applying patches, and you are free to bring patches from some other system. That’s why, I suppose, there are multiple options.

This means that it will try to apply any unindented diffs in the commit message. But you’re fine if you indent the diff. (Newschool code fencers will have a worse time here.)

I imagine that this worked fine for changes that were authored by one person and submitted by another person via email, or by their friend, or by someone trying to resurrect a previous attempt at getting something upstreamed. Someone is likely to notice that examples diffs are getting applied. But it won’t work well at all if you are some software distributor who is using patch files to apply modifications to packages.

Recall that git-am(1) will not apply indented diffs. Well have a look at my GNU patch 2.7.6:

    If the entire diff is indented by a consistent amount, if lines end in
    CRLF, or if a diff is encapsulated one or more times by prepending "- "
    to lines starting with "-" as specified by Internet RFC 934, this is
    taken into account.
Some may say that patch(1) should work like a more straightforward importer. But I’ve been itching to point out something else.

    Larry Wall wrote the original version of patch.
Is it surprising if patch(1) is a bit DWIM?
Schnitz 3 days ago [-]
But what problem does this actually introduce? If you are applying a patch you must already trust the source anyways and this isn’t harder to spot than a rogue file anywhere else in the patch as it looks the same.
lostmsu 2 days ago [-]
Field confusion. If you manually reviewed the diff in GitHub but did not pay attention to the commit message (which GitHub already collapses when long or may start doing any day they want), you are screwed.

Imagine the diff only has

  if (someIntParamThatShouldHaveBeenUInt < 0)
    throw new ArgumentOutOfRangeException();
Would you care who wrote it?
d--b 2 days ago [-]
This looks like a feature.

Not sure why somebody thought it was a good idea. There probably was a use case at some point where it seemed smart to allow for things like this. Maybe it’s for CI, but I really can’t think of why.

But I really doubt this could be a bug. I mean why would patch even need to read the commit message if it wasn’t to scan it for diff?

badc0ffee 3 days ago [-]
I like that the site offers your choice of two fonts, but they're both ugly and hard to read. Mono is too narrow, and OpenDyslexic is cartoonish and o v e r k e r n e d.
holotherapper 2 days ago [-]
Diff format being too forgiving and commit messages being too permissive are each fine on their own. The bug is the composition.
jasonmp85 3 days ago [-]
[dead]
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact
Rendered at 14:07:50 GMT+0000 (Coordinated Universal Time) with Vercel.