1668: Diffs not applying cleanly circa 1.0.7 with odd line endings

AaronJ*******@gmai***** (Google Code) (Is this you? Claim this profile.)
chipx86
chipx86
July 18, 2011
What version are you running? 1.0.7


What's the URL of the page containing the problem? N/A


What steps will reproduce the problem?
1. Create a file with mixed DOS and Unix line endings
2. post-review
3. view diffs

What is the expected output? What do you see instead?

The diff page produces an error such as "Error: The patch to '...' didn't
apply cleanly."

What operating system are you using? What browser?

Linux. Any browser.

Please provide any additional information below.

In one case, I found that a user had a file containing line endings that
were solely carriage returns. In one I found the file contained mixed CR/LF
and LF. While I realize that this is not ideal, at least providing the user
with some feedback on what went wrong would help them solve their own
problems (e.g. changing the line endings or marking the file as binary).
chipx86
#1 chipx86
We have a lot of logic in place to handle mixed line endings. I'm sad to see that
something is breaking. I need an example diff that demonstrates this problem,
preferably one that applies against the Review Board repository so that we can test it.
  • +NeedInfo
#2 AaronJ*******@gmai***** (Google Code) (Is this you? Claim this profile.)
I'll see what I can do tonight. I didn't have anything clean that I could hand off 
outside of the company when I filed this, but I think I should be able to create 
something trivially enough. One file that triggered it simply had:

a
b^M
^M
c
d

So reproducing it shouldn't take long.

#3 AaronJ*******@gmai***** (Google Code) (Is this you? Claim this profile.)
Ah, OK, here's the easiest way to reproduce. This worked on a file we had checked in:

Open an *existing* file (my post above was wrong, new files seem not to trigger this).

Modify an existing line, adding ^M at the end of the line (which places it before the
NL).

Now join the following two lines (deleting the newline) and insert a ^M between them.
I'm going to upload a pair of files to which I've done this. This should trigger the
bug on your side when you check in the first file and then try to publish a review
using the second file as the edit.
#4 jtsh****@gmai***** (Google Code) (Is this you? Claim this profile.)
Folks were able to post the review using UI after using this workaround:
$<our p4 wrapper> diff --strip-trailing-cr > your.html
$ upload your html to <RB server>/r/new/

chipx86
#5 chipx86
One thing I'm not sure from the repro case here. Is this the source file (pre-patch)
that has the funky newlines? Or does the patch add them?
chipx86
#6 chipx86
  • -Priority-Medium
    +Priority-Critical
    +Milestone-Release1.0.x
    +Component-DiffViewer
  • +chipx86
chipx86
#7 chipx86
Nevermind, I have a test case now. I just don't have a fix that's working.

The problem is that we do newline normalization so that files with lines ending in
\r, \n, \r\n, or \r\r\n are processed correctly, but this completely breaks with the
patch files.

If you have:

foo
bar^M
foo^Mbaz
foo

And then you add a couple lines, diff will give you:

 foo
+hello
 bar^M
 foo^Mbaz
 foo
+world

See how diff considers the ^M to be part of the string, not a newline.

However, our parser will see that ^M and try to make it a newline, in order to work
around other file formats and oddities. This then breaks us, since we'll get a
normalized diff that looks like:

 foo
+hello
 bar
 foo
baz
 foo
+world


We may have to process diffs separately, but it's going to be hard to do so without
causing new problems.

I'd be really interested in finding out how these files are getting so messed up.
  • -NeedInfo
    +Confirmed
chipx86
#8 chipx86
Of course, if we then change the diff so that the ^M is preserved, it'll break
against the source input where the ^M is not preserved, and there's no great way to
preserve that there. If we do change the ^M to "\n " in order to get a new line in
the diff, the lines referenced in the diff will no longer be correct and it may fail
to apply. Rock and a hard place.
#9 AaronJ*******@gmai***** (Google Code) (Is this you? Claim this profile.)
There's, of course, no question that this is outside of what one would typically 
expect, but as for "how these files are getting so messed up," imagine that you're 
reviewing a change to ReviewBoard itself which addresses this problem. It might well 
include a test case with such a file and subsequent changes to the test case... well, 
you see where this is going.

It is, in fact, purely in a QA context that we're seeing these issues (though that 
might be partially a result of the versioning differences we have between some parts 
of our QA environment and other environments using ReviewBoard.
chipx86
#10 chipx86
Okay. Well, as it currently stands, I don't know that we can do anything about this
without severely breaking other things. As I look into it and think about it, I'm
less and less optimistic. Will keep trying for a while though.
chipx86
#11 chipx86
Let me ask.. Is the intent of the ^M in these files to be newlines, or are they just
ending up this way from broken editors? Are they intended to be there for the sake of
"I want a ^M" ?
#12 AaronJ*******@gmai***** (Google Code) (Is this you? Claim this profile.)
I believe that, yes, these are intentional in some cases. I also believe that they're
unintentional in others. If post-review choked and said "I just can't handle these,
mark them as binary or fix the line-endings" that would not be a great solution, but
it would at least let users recover. Currently, the way it works, the review ends up
with an error on the diffs page after it's been uploaded (and very often, after it's
been published as users rarely check the diffs to see if they showed up correctly).
This leads to two problems:

1) Reviews that focus on the diff failure rather than the changes
2) User confusion because the error doesn't indicate what went wrong

chipx86
#13 chipx86
Okay, so what I can do, potentially, is convert only \r\r\n and \r\n to \n.

However, your case above where the file is mostly \n characters, but with a couple
extra \r characters immediately preceding the \n characters will still break. Those
will be seen as \r\n and converted to \n, which will still break.

So, pick your poison, I guess. :/
#14 AaronJ*******@gmai***** (Google Code) (Is this you? Claim this profile.)
Is there any way to identify the problem at post-review time and let the user decide 
what to do about it?
chipx86
#15 chipx86
I know your question is now a year old, but I missed it. Sorry :/

Basically, yes, you could in theory have post-review ask, but there's no guarantee of getting it right. This is such a tricky problem and every fix I've tried has caused new problems. I spent a great deal of time on this, and researched how other tools solved it. To summarize: They haven't. At least not when I was working on this last. There were bug reports and complaints for other tools, all involving this same sort of issue, with no great solutions.

I think what we need to do is just provide a better error response for broken diffs. Allow the user to see the diff file itself and download it. It may not be obvious that it's the line endings, but it's better than "The patch doesn't apply."

As for this bug, I don't think there's anything we can do about it. People should be careful about line endings :)
  • -Confirmed
    +WontFix