4371: Diff move detection incorrectly matches source and target

smacleod
chipx86
chipx86

What version are you running?

2.5.2

What's the URL of the page containing the problem?

https://reviewboard.mozilla.org/r/35163/diff/4#

What steps will reproduce the problem?

  1. Create a new mercurial repository using provided diff-move-issue.bundle
  2. Point Review Board at the new mercurial repository
  3. Create a new review request using provided problem.diff
  4. View the diff in Review Board

What is the expected output? What do you see instead?

Review Board should flag only identical code blocks as moved. Instead, there are some issues with the move blocks, such as:
- pointing at similar but different code
- multiple moved from line blocks on the patched file pointing at the same moved to line block, one of which doesn't match the code.

Particular line numbers to inpect are original:4115, which points to patched:6942, but is also pointed at from patched:6732.

I wouldn't be surprised if there were more issues in this file due to the vast number of move blocks and the confusion already present.

Please provide any additional information below.

Originally filed as https://bugzilla.mozilla.org/show_bug.cgi?id=1251455

# HG changeset patch
# User Test User <test@example.com>
# Date 1458239948 14400
#      Thu Mar 17 14:39:08 2016 -0400
# Node ID e69f3dd970aaa4a8528ea39fc2a8a78a039e808f
# Parent  355e93cf4e01962c993c784a49178491e168e737
Problem modification
diff --git a/property_database.js b/property_database.js
--- a/property_database.js
+++ b/property_database.js
@@ -4106,225 +4106,16 @@ var gCSSProperties = {
   "marker-start": {
     domProp: "markerStart",
     inherited: true,
     type: CSS_TYPE_LONGHAND,
     initial_values: [ "none" ],
     other_values: [ "url(#mysym)" ],
     invalid_values: []
   },
-  "mask": {
-    domProp: "mask",
-    inherited: false,
-    type: CSS_TYPE_SHORTHAND_AND_LONGHAND,
-    /* FIXME: All mask-border-* should be added when we implement them. */
-    subproperties: ["mask-clip", "mask-image", "mask-mode", "mask-origin", "mask-position", "mask-repeat", "mask-size" , "mask-composite"],
-    initial_values: [ "match-source", "none", "repeat", "add", "0% 0%", "to
#1 smacleod
  • -New
    +Confirmed
  • -Priority:Medium
    +Component:DiffViewer
    +OpSys:All
    +Priority:High
#2 zalun

I started to work on this issue.
I've been debugging the example above. It's definitely not finished, but I've been asked to brain dump what I know for now. The interdiff is reproduced using steps from above. Lines numbers might be not exact as in code as I've been adding logging.

Iterating in loop opcode_generator.py line ~394

i_move_cur 6804
<MoveRange(4114, 4186, [((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'whitespace_lines': []}), 1), ((u'delete', 4113, 4322, 4113, 4113, {u’whitesp [...]

i_move_cur 6805, ri 4186 (no changes)

MOVED_TO showed up, 4115: 6732 <---> 4186: 6803
i_move_cur 6812
<MoveRange(4189, 4194, [((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'moved-to': {4115: 6732, 4116: 6733, 4117: 6734, 4118: 6735, 4119: 6736, 4120: 6737, 4121: 6738, 4122: 6739, 4123: 6740, 4124: 6741, 4125: 6742, 4126: 6743, 4127: 6744, 4128: 6745, 4129: 6746, 4130: 6747, 4131: 6748, 4132: 6749, 4133: 6750, 4134: 6751, 4135: 6752, 4136: 6753, 4137: 6754, 4138: 6755, 4139: 6756, 4140: 6757, 4141: 6758, 4142: 6759, 4143: 6760, 4144: 6761, 4145: 6762, 4146: 6763, 4147: 6764, 4148: 6765, 4149: 6766, 4150: 6767, 4151: 6768, 4152: 6769, 4153: 6770, 4154: 6771, 4155: 6772, 4156: 6773, 4157: 6774, 4158: 6775, 4159: 6776, 4160: 6777, 4161: 6778, 4162: 6779, 4163: 6780, 4164: 6781, 4165: 6782, 4166: 6783, 4167: 6784, 4168: 6785, 4169: 6786, 4170: 6787, 4171: 6788, 4172: 6789, 4173: 6790, 4174: 6791, 4175: 6792, 4176: 6793, 4177: 6794, 4178: 6795, 4179: 6796, 4180: 6797, 4181: 6798, 4182: 6799, 4183: 6800, 4184: 6801, 4185: 6802, 4186: 6803}, u'whitespace_lines': []}), 1), ((u'delete', [...]

i_move_cur 6813, ri 4194

i_move_cur 6831
MoveRange(4197, 4213, [((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'moved-to': {4115: 6732, 4116: 6733, 4117: 6734, 4118: 6735, 4119: 6736, 4120: 6737, 4121: 6738, 4122: 6739, 4123: 6740, 4124: 6741, 4125: 6742, 4126: 6743, 4127: 6744, 4128: 6745, 4129: 6746, 4130: 6747, 4131: 6748, 4132: 6749, 4133: 67 [...] (just more numbers)

i_move_cur 6832, ri 4213

i_move_cur 6839
<MoveRange(4216, 4221, [((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'moved-to': {4115: 6732, 4116: 6733, 4117: 6734, 4118: 6735, 4119: 6736, 4120: 6737, 4121: 6738, 4122: 6739, 4123: 6740, 4124: 6741, 4125: 6742, 4126: 6743, 4127: 6744, 4128: 6745, 4129: 6746, 4130: 6747, 4131: 6748, 4132: 6749, 4133: [...] (just more numbers)

i_move_cur 6840

i_move_cur 6847
<MoveRange(4224, 4229, [((u'delete', 4113, 4322, 4113, 4113, {u'whitespace_chunk': False, u'moved-to': {4115: 6732, 4116: 6733, 4117: 6734, 4118: 6735, 4119: 6736, 4120: 6737, 4121: 6738, 4122: 6739, 4123: 6740, 4124: 6741, 4125: 6742, 4126: 6743, 4127: 6744, 4128: 6745, 4129: 6746, 4130: 6747, 4131: 6748, [...] (just more numbers)

[...]
iline: inherited: false,
going deeper for i_move_cur 6943
For ri == 4306 self.removes for iline got finished
i_move_cur -= 1
i_move_cur += 1

452

i_move_cur 6943, updated_range False, ij2 6950
"moved-to section" of r_move_range is 4115: 6732 <--> 4321:6938 after _find_longest_move_range

The same after _determine_move_range

in #491 rmeta is set so MoveRange has line 4115: 6942

This shouldn’t happen as this move is only two lines long and is overwriting a pretty long one

#3 zalun

The problem is within algorithm of detecting a group move.
From opcode_generator.py#~350

# The way we do that is to find each removed line that matches
# this inserted line, and for each of those find out if there's
# an existing move range that the found removed line
# immediately follows. If there is, we update the existing
# range.

If there is a MoveRange which starts with the same (2+) lines of code last choice is prefered.
I think that the solution could work along the following lines:
* create a temp MoveRange
* continue adding to it until the following lines are the same
* if the range is longer than existing one replace them entirely.

#4 zalun

I've written a failing test.
I couldn't make it any shorter I'm afraid, hence JSON files.
It's just for illustrative purpose.

Changes to the code from above example:
* Removed lines from inside of the moved block.
* Removed some of the beginning and ending lines.
* Removed all blocks except of the first one with the issue.

line 4115 becomes 16, 6732 -> 2633, 6943 -> 2669

I couldn't remove lines between new and old code (after old, before new), as the issue is no longer present in test.

  • +
    [
    "    inherited: true,\n",
    "    type: CSS_TYPE_LONGHAND,\n",
    "    initial_values: [ \"none\" ],\n",
    "    other_values: [ \"url(#mysym)\" ],\n",
    "    invalid_values: []\n",
    "  },\n",
    "  \"marker-start\": {\n",
    "    domProp: \"markerStart\",\n",
    "    inherited: true,\n",
    "    type: CSS_TYPE_LONGHAND,\n",
    "    initial_values: [ \"none\" ],\n",
    "    other_values: [ \"url(#mysym)\" ],\n",
    "    invalid_values: []\n",
    "  },\n",
    "  \"mask\": {\n",
    "    domProp: \"mask\",\n",
    "    inherited: false,\n",
    "    type: CSS_TYPE_SHORTHAND_AND_LONGHAND,\n",
    "    /* FIXME: All mask-border-* should be added when we implement them. */\n",
    "    subproperties: [\"mask-clip\", \"mask-image\", \"mask-mode\", \"mask-origin\", \"mask-position\", \"mask-repeat\", \"mask-size\" , \"mask-composite\"],\n",
    "    initial_values: [ \"match-source\", \"none\", \"repeat\", \"add\", \"0% 0%\", \"top left\", \"left top\", \"0% 0% / auto\", \"top left / auto\", \"left top / auto\", \"0% 0% / auto auto\",\n",
    "      \"top 
    +
    [
    "    inherited: true,\n",
    "    type: CSS_TYPE_LONGHAND,\n",
    "    initial_values: [ \"none\" ],\n",
    "    other_values: [ \"url(#mysym)\" ],\n",
    "    invalid_values: []\n",
    "  },\n",
    "  \"marker-start\": {\n",
    "    domProp: \"markerStart\",\n",
    "    inherited: true,\n",
    "    type: CSS_TYPE_LONGHAND,\n",
    "    initial_values: [ \"none\" ],\n",
    "    other_values: [ \"url(#mysym)\" ],\n",
    "    invalid_values: []\n",
    "  },\n",
    "  \"shape-rendering\": {\n",
    "    domProp: \"shapeRendering\",\n",
    "    inherited: true,\n",
    "    type: CSS_TYPE_LONGHAND,\n",
    "    initial_values: [ \"auto\" ],\n",
    "    other_values: [ \"optimizeSpeed\", \"crispEdges\", \"geometricPrecision\" ],\n",
    "    invalid_values: []\n",
    "  },\n",
    "  \"stop-color\": {\n",
    "    domProp: \"stopColor\",\n",
    "    inherited: false,\n",
    "    type: CSS_TYPE_LONGHAND,\n",
    "    prerequisites: { \"color\": \"blue\" },\n",
    "    initial_values: [ \"black\", \"#000\", \"#000000\", \"rgb(0,0,0)\", \"rgba(0,0,0,1)\" ],\n",
    "    other_values
    +
    diff --git a/reviewboard/diffviewer/tests.py b/reviewboard/diffviewer/tests.py
    index b1d8722f0..4b94d8cb4 100644
    --- a/reviewboard/diffviewer/tests.py
    +++ b/reviewboard/diffviewer/tests.py
    @@ -927,6 +927,48 @@ class DiffParserTest(TestCase):
                 ]
             )
     
    +    def test_move_detection_similar_blocks(self):
    +        """Testing diff viewer move detection with similar lines in code.
    +
    +        bug https://hellosplat.com/s/beanbag/tickets/4371/ """
    +        import json
    +        with open('fixturea.json') as f:
    +            a = json.load(f)
    +        with open('fixtureb.json') as f:
    +            b = json.load(f)
    +        self._test_move_detection(a, b, [{
    +            2633: 16, 2634: 17,
    +            2635: 18, 2636: 19, 2637: 20, 2638: 21, 2639: 22, 2640: 23,
    +            2642: 24, 2643: 25, 2644: 26, 2645: 27, 2646: 28, 2649: 31,
    +            2650: 32, 2651: 33, 2652: 34, 2653: 35, 2654: 36, 2655: 37,
    +            2656: 38, 2657: 39, 2658: 40, 2659: 41, 2660: 42, 2661: 43,
    +   
chipx86
#5 chipx86

Thanks for looking into all this, and all the information you've provided. I'm going to go through what you have and refresh my memory on the details of the algorithm.

  • +chipx86
chipx86
#6 chipx86

The unit test was very helpful, thank you for that! I've spent some time today on this, and while I don't have a solution yet, it's becoming more clear. (I've had a lot distracting me today, and I'm a bit groggy from a rough night's sleep, but hey, progress is progress.)

I think I have a handle on what's going on, but want to verify some of my findings before I go into it too much. I'm hoping to have something helpful, maybe a fix, this weekend.

chipx86
#7 chipx86

I have a patch that still needs some additional testing, but if you could give it a try and see if it works for any cases you've hit (and standard cases that appeared to work fine), that would really help :)

I did test it with the review request in the description above, and it seems to produce the right results (I haven't inspected every character of move range but I did look for the obvious problems that were previously hit).

The attached patch contains the test data you supplied as well. However, I noticed that one of the lines in the test case you supplied was still pointing to an incorrect move line (33 -> 2671 should have been 33 -> 2615). I've fixed that in the attached diff.

  • +
    From f84fda71842997046d41b2ea3022397ecf46a99d Mon Sep 17 00:00:00 2001
    From: Christian Hammond <christian@beanbaginc.com>
    Date: Fri, 3 Mar 2017 01:21:53 -0800
    Subject: [PATCH] Fix reuse of previously-used move detection ranges.
    When attempting to detect moved ranges in diffs, it was possible for
    multiple remove groups to be found for a line, and for the wrong group
    to be used. When this happened, an existing move range would also be
    found and used, and this could overshoot and mark incorrect lines as
    moved. Even when this didn't happen, the same block could appear to have
    moved more than once, without any proper cross-referencing on the other
    end of the move.
    This change makes an attempt to fix this by tracking inserted lines that
    have already been used in a move and making them ineligible for future
    moves. That prevents the incorrect existing move ranges from being
    reused and overshooting, and it prevents lines from having multiple move
    blocks they're a part of.
    This still needs ad
#8 zalun

We're definitely closer. Thank you.
Applying the patch made the first issue disappear (I've checked on "live" db on my local machine).
However, the second one - https://bugzilla.mozilla.org/show_bug.cgi?id=1251455#c3 remaines unchanged.
I think it might be a separate issue...

chipx86
#9 chipx86

Awesome :) Glad it seems to work. The more testing we can get on this fix, the better. I think we'll be able to incorporate that into the next 2.0.x and 2.5.x releases.

Yeah, this other bug is a separate issue, and a very silly one. This one is pure JavaScript. It's not differentiating between the left-hand side and right-hand side when finding the correct anchor. It just so happens that there's a line 220 on the right-hand side with a comment flag, which is taking precedence over the line 220 on the left-hand side for the anchor matching. I'll get this fixed.

I really appreciate all your help on this!

chipx86
#10 chipx86

Latest version of this change (with an off-by-two fix) is up at https://reviews.reviewboard.org/r/8795/.

  • -Confirmed
    +PendingReview
  • +Release-2.0.x
    +Release-2.5.x
chipx86
#11 chipx86

The move flag anchor bug should be fixed at https://reviews.reviewboard.org/r/8796/.

chipx86
#12 chipx86

Pushed to release-2.0.x (151e98622de268c38548859569ce2a7715391f55)

  • -PendingReview
    +Fixed
#13 zalun

This doesn't fix all of the issue. I'll create a new test. Do you like this ticket to be updated or should I create a new one?

chipx86
#14 chipx86

There's a series of fixes up. Try the latest changes from release-2.5.x, and if it's still broken, I'll need a new test to work off of.

#15 zalun

Are they in any different file? I checked out the 3.0 and I haven't found anything different.
I'm unable to fully pull 3.0.x into the MozReview. I'm adding these changes manually.

chipx86
#16 chipx86

These are in 2.5.x. There are changes in a few files across I think 4 fairly recent commits. They haven't yet shipped in a release.

#17 zalun

I've implemented the changes placed in chunk_generator.py [1], but the error from bug 1251455#c3 [2] still partially exists. I'm talking about the double "moved from 220" on the right hand side without the corresponding "moved to 220". I will create a test case in the style created here.
I've created a separate bug for the issue in bugzilla [3] as it seems to be a related, but different issue.

[1] https://reviewboard.mozilla.org/r/123930/diff/2
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1251455#c3
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1352340

chipx86
#18 chipx86

The diff you linked to in the most recent Bugzilla isn't running the line number fix we put in. The data-line="..." is used the old format of showing just the line number, rather than moved-to-<linenum> or moved-from-<linenum>. Same with target=. Can you verify the patch was applied? If this is a diff uploaded before the patch went in, then you're probably looking at a stale version in the cache.

  • -Fixed
    +NeedInfo
chipx86
#19 chipx86

That commit is 2f9f515a198deb8f07070f97be24f60fce9f6842, btw.

#20 zalun

When I created the test in plain reviewboard I don't see anything wrong. I might have some change missing as it is applied by copy & paste.

It is not about linking two lines on the same side.
It is about displaying two "moved-from-220" (in line 76 and 103) and none of "moved-to-76" or "moved-to-103".

chipx86
#21 chipx86

I've confirmed the bug locally. I'll aim to get a fix put together next week. Thanks!

chipx86
#22 chipx86

Small thing, but it would help going forward with this bug report and others. Rather than linking to Bugzilla, can you ensure that any information useful to us goes directly here? It's harder to follow when information is on an external bug tracker (particularly for users/admins/contributors), since it sort of splinters the discussion and makes it harder to search/triage/look for duplicates. We've had problems with people being confused about where discussion should go in the past, and important things getting missed.

This actually isn't an uncommon problem for us. We sometimes get people even storing details on a private bug tracker we don't have access to (not a problem here, obviously) and not updating us with any details we can use. We've had to make it policy that all details about a bug involving our components be posted here instead, to keep things sane.

I know you've been providing fantastic information here as well, so I super appreciate that :) Just want to get it all in one place. I hope that makes sense.

Thanks again!

  • -NeedInfo
    +Confirmed
#23 zalun

You're right - just browsed this ticket now and there was no mention of the commit 3 content

#24 zalun

@chipx86.
Please share if/when you've got a failing test.