4620: Diff fragments do not load correctly when the fragment contains non-BMP characters

jhominal
4617

What version are you running?

3.0.0, with the patch described in #4617

What's the URL of the page containing the problem?

/r/{review-request-id}/

What steps will reproduce the problem?

  1. File a review request with a new file with at least two lines containing non-BMP characters (e.g. emoji)
  2. Create a review with two diff comments, one for each line containing non-BMP characters
  3. Publish the review

What is the expected output? What do you see instead?

Expected: The page works as expected.

Actual output: One of the comments does not load at all, and the other comment's html has some missing characters.

What operating system are you using? What browser?

Windows, on Chrome and Firefox.

Please provide any additional information below.

The root cause of this bug is a mismatch in the reported HTML length on the route /r/{review_request_id}/_fragments/diff-comments/{diff_comment_id}/ - that is, with the bugfix for #6217, the written length is a unicode character count, but Javascript indexes strings according to 16-byte units, which means that non-BMP characters have a length of 2. (Sorry for forgetting that in #6217).

My patch for fixing has been to use a jsStringLen function instead of len, with the following implementation:

# I am assuming that data is a unicode object
def jsStringLen(data):
    return (len(data.encode('utf_16')) - 2) / 2
chipx86
#1 chipx86

Moving discussion over to bug #4617, since the bug isn't truly fixed.