2980: Diff viewer has problems with control characters
- Fixed
- Review Board
pekka.t.*********@gmai***** (Google Code) (Is this you? Claim this profile.) | |
June 1, 2015 |
What version are you running? 1.7.6 What's the URL of the page containing the problem? - What steps will reproduce the problem? 1. Create change in a file that contains Form Feed control character. Make the change BELOW the FF character. 2. Submit review request. What is the expected output? What do you see instead? ReviewBoard diff viewer highlites wrong row. What operating system are you using? What browser? ReviewBoard is running on Linux. Using Windows7 + Chrome for viewing review requests. Our source control system is SVN 1.7.8 (Windows). Please provide any additional information below. Our codebase is ancient. It seems to contain some control characters (besides CR and LF) for example in C source files. ReviewBoard diff viewer seems to struggle with them, I have so far identified at least Form Feed (FF) control character to cause problems. Change highliting gets displaced by one line per FF. When the FFs accumulate, the highlited part can get pretty far from the actual change. See attached picture. I also attached the actual diff.
Hi Team, We are also facing the same issue. Any input on this. The strange thing is it used to work on ReviewBoard 1.6.3. Thanks, Satish
The cause is that splitlines() behaves differently for unicode vs regular strings. For unicode, it considers form-feed a line breaking character and splits on that. But for regular strings it does not. For example: 41 zara:~/> python -c 'print "A\n\f\nB\n".splitlines()' ['A', '\x0c', 'B'] # good 42 zara:~/> python -c 'print u"A\n\f\nB\n".splitlines()' [u'A', u'', u'', u'B'] # ate \f and added extra empty elem References: http://bugs.python.org/issue12855 http://stackoverflow.com/questions/24453713/python-string-splitlines-removes-certain-unicode-control-characters This means the lines will be off by the number of ^L chars above it in the diff and won't match the diff. The diff ends up getting stored in the DB w/o the ^L chars. Sometimes it won't patch, and sometimes it does but renders incorrectly. There are a few places where splitlines() is used on the diff. As an experiment I modified DiffParser, DiffChunkGenerator, TextBasedReviewUI, and filter_interdiff_opcodes to use a hacked splitlines() routine that first swaps ^L with NUL, splits, then swaps back and that makes things work (attached). Not sure that is the best strategy, but it ends up being a simple change. A similar approach would be to cons together a random number or UUID along with "reviewboard" and use that as the swap string, or use some other non-printable. It would be nice to see this fixed, but I'm aware that the group of Lisp-influenced Emacs-lovers who employ ^L in this manner might not be a growing population. However, if the RB developers think this is worth persuing I can spend more time on a fix.
-
+
There are a number of popular open-source repositories and projects with e.g. ^L form-feed in their sources. For example, GCC.
I believe this issue is the root cause of bug 3609.
Any movement on this issue? This is a real bug, even if using ^L in source code is less common than it used to be. In our case, we have no choice, as the code in question is third-party code, not our own. Thanks.
This seems to be fixed in 2.0.13? Briefly tested - looking good now.