3513: Unicode characters in source file prevent review requests from being posted correctly (or at all)

Ray.L.******@gmai***** (Google Code)
Aug. 3, 2014
What version are you running?
2.0.2

What's the URL of the page containing the problem?

https://hopper.rose.hp.com/r/new/ (Page behind firewall)

What steps will reproduce the problem?
1. Find a file with at least one non-ASCII character (e.g. single quote ’ instead of ' in a code comment).
2. Check it into the repository.
3. Try to post a review request for the file in question.

What is the expected output? What do you see instead?

In the browser, instead of creating a new review request, the page gets stuck on the loading animation.
With rbt post, a new review request is created, but the diff upload always fails.

What operating system are you using? What browser?
Confirmed on Ubuntu 14.04 GNU/Linux and Windows 7 with Firefox 31.0 and Chromium 34.0, and with the rbt post command on RHEL 6.5.

Please provide any additional information below.

The issue is related to the *content* of the file and how RB reacts to non-ASCII characters in it. The workaround I found was to:
1. Remove the non-ASCII characters from the files.
2. Check the files into the repository *without* posting any review requests.
3. Now that the non-ASCII chars have been removed, review requests can be successfully posted.
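
Step 1 requires finding the offending characters first. Below is a minimal Python 3 sketch that lists every non-ASCII byte in a file's content, assuming the content was read in binary mode. The name find_non_ascii and the sample line are illustrative only and are not part of RB or RBTools.

```python
def find_non_ascii(data):
    """Return (line, column, byte) tuples for every non-ASCII byte in data.

    Line and column are 1-based; the column counts bytes, not characters.
    """
    hits = []
    for line_no, line in enumerate(data.splitlines(), start=1):
        for col_no, byte in enumerate(line, start=1):
            if byte > 0x7F:  # anything above 0x7F is outside ASCII
                hits.append((line_no, col_no, byte))
    return hits


if __name__ == "__main__":
    # Hypothetical sample: a code comment with a curly quote (U+2019).
    sample = "int x; // don\u2019t touch\n".encode("utf-8")
    for line_no, col_no, byte in find_non_ascii(sample):
        print("line %d, byte %d: 0x%02x" % (line_no, col_no, byte))
```

To check a real file, pass it the result of open(path, 'rb').read().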

This was found when trying to post a review request for a file that had been reviewed and checked in more than a year ago, without incident, using an older version of RB (*likely* in the 1.6.xx series).

Tracking the problem down was particularly time-consuming because RB hits the error whenever it reads or processes the existing content on the repository host. That makes the cause non-obvious: the failure occurs even if the diff you're trying to upload is 100% ASCII, because RB fails on what is already checked in, and nothing makes that clear to the user.

While reviewing the rbt post source code, I did not find any statement where the decode('ascii') method is explicitly invoked on a string or character.
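
For reference, the byte 0xe2 in the error message reported by rbt post matches the first byte of the UTF-8 encoding of ’ (U+2019), which is e2 80 99. The Python 3 sketch below reproduces the same kind of message with an explicit decode. The helper name ascii_error is made up for illustration. On Python 2, which both rbt and the server were running, the same decode can also happen implicitly when a byte string is combined with a unicode string, which might be why no explicit decode('ascii') call turns up in the source.

```python
quote = "\u2019"                 # the curly single quote from the reproduction steps
encoded = quote.encode("utf-8")  # three bytes: e2 80 99


def ascii_error(data):
    """Return the UnicodeDecodeError message for data, or None if it is pure ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(encoded.hex())
    print(ascii_error(encoded))
```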

I've attached a simple Python script I wrote while I was trying to reproduce the error message displayed by rbt post, which is how I was able to better confirm what I suspected. It tries to decode('ascii') each character in a file.


Error displayed in Firefox's JavaScript console after pressing 'OK' to create the request with the selected diff and base directory:

  SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data


Debug output and stack trace of rbt post command on RHEL system:

$ rbt post --diff-filename=$HOME/current.diff -d
>>> RBTools 0.6.2
>>> Python 2.6.6 (r266:84292, Nov 21 2013, 10:50:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
>>> Running on Linux-2.6.32-431.5.1.el6.x86_64-x86_64-with-redhat-6.5-Santiago
>>> Home = /home/ray
>>> Current directory = /home/ray/cti/windjammer/common/lib/Connector
>>> Checking for a Subversion repository...
>>> Running: svn info --non-interactive
>>> Running: diff --version
>>> repository info: Path: https://csvnhou-pro.houston.hp.com:18490/svn/cfe-cti, Base path: /trunk/cti/windjammer/common/lib/Connector, Supports changesets: False
>>> Making HTTP GET request to http://hopper.rose.hp.com/api/
>>> Making HTTP GET request to https://hopper.rose.hp.com/api/review-requests/
>>> Making HTTP POST request to https://hopper.rose.hp.com/api/review-requests/
>>> Making HTTP GET request to https://hopper.rose.hp.com/api/review-requests/8600/diffs/
>>> Making HTTP POST request to https://hopper.rose.hp.com/api/review-requests/8600/diffs/
>>> Got API Error 105 (HTTP code 400): One or more fields had errors
>>> Error data: {u'fields': {u'path': [u"'ascii' codec can't decode byte 0xe2 in position 2154: ordinal not in range(128)"]}, u'stat': u'fail', u'err': {u'msg': u'One or more fields had errors', u'code': 105}}
Traceback (most recent call last):
  File "/usr/bin/rbt", line 9, in <module>
    load_entry_point('RBTools==0.6.2', 'console_scripts', 'rbt')()
  File "/usr/lib/python2.6/site-packages/RBTools-0.6.2-py2.6.egg/rbtools/commands/main.py", line 134, in main
    command.run_from_argv([RB_MAIN, command_name] + args)
  File "/usr/lib/python2.6/site-packages/RBTools-0.6.2-py2.6.egg/rbtools/commands/__init__.py", line 416, in run_from_argv
    exit_code = self.main(*args) or 0
  File "/usr/lib/python2.6/site-packages/RBTools-0.6.2-py2.6.egg/rbtools/commands/post.py", line 784, in main
    submit_as=self.options.submit_as)
  File "/usr/lib/python2.6/site-packages/RBTools-0.6.2-py2.6.egg/rbtools/commands/post.py", line 551, in post_request
    raise CommandError(u'\n'.join(error_msg))
rbtools.commands.CommandError: Error uploading diff


One or more fields had errors (HTTP 400, API Error 105)

    path: 'ascii' codec can't decode byte 0xe2 in position 2154: ordinal not in range(128)

Your review request still exists, but the diff is not attached.

https://hopper.rose.hp.com/r/8600/
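#!/usr/bin/python
# Python 2 script: try to decode every character of a file as ASCII.
# Stops with a UnicodeDecodeError at the first non-ASCII byte.
f = open('<your-filename-here>', 'r')
lines = f.readlines()
f.close()
i = 0  # current line number
j = 0  # character position within the current line
k = 0  # character position within the whole file
for line in lines:
    i += 1
    j = 0
    for c in line:
        j += 1
        k += 1
        s = "Decoding Line " + str(i) + ", Char " + str(j) + ", File Char " + str(k) + ": " + c
        char = c.decode('ascii')  # raises UnicodeDecodeError on bytes >= 0x80
        s += " -> " + char
        print s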
#1 david
I'm pretty sure these are fixed in 2.0.5. Can you try that version?
  • +NeedInfo
#2 Ray.L.******@gmai***** (Google Code)
I'm not sure I'm able to. I'm not the sysadmin for the server, and I'm unlikely to persuade the admin to update the one and only live production environment.

Perhaps you could try it with the steps I provided above?
#3 david
Can you confirm which version control system you're using? That will help me identify whether fixes subsequent to 2.0.2 include this one or not.
#4 Ray.L.******@gmai***** (Google Code)
The VCS is Subversion.
#5 david
There are several fixes for SVN and unicode in 2.0.3 and 2.0.4.
  • -NeedInfo
    +Fixed
#6 Ray.L.******@gmai***** (Google Code)
Just to verify: was this actually confirmed as fixed, or was it simply assumed to be fixed on the basis that there are "several fixes for SVN and unicode"?

If this was tested and verified as fixed, then I'll pass along that info. Otherwise, I think this issue should be re-opened and tested first.
#7 david
There are a hundred moving parts in this particular code, and as part of the aforementioned fixes, we've done a lot of testing. It's theoretically possible that you're encountering something that we haven't seen, in which case, our testing probably wouldn't help.

One thing that would help is some debugging on your end to get the traceback from the server side (there's a UnicodeDecodeError occurring, which then gets translated into an error code on the API).
#8 Ray.L.******@gmai***** (Google Code)
I've included the stack trace from the server's log file. Please see it below:

--------------------------------
2014-07-29 23:26:41,524 - ERROR -  - Exception thrown for user <email-removed> at https://hopper.rose.hp.com/api/validation/diffs/

'ascii' codec can't decode byte 0xe2 in position 2154: ordinal not in range(128)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/Django-1.6.5-py2.7.egg/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Django-1.6.5-py2.7.egg/django/views/decorators/cache.py", line 52, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Django-1.6.5-py2.7.egg/django/views/decorators/vary.py", line 19, in inner_func
    response = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Djblets-0.8.5-py2.7.egg/djblets/webapi/resources.py", line 493, in __call__
    result = view(request, api_format=api_format, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Djblets-0.8.5-py2.7.egg/djblets/webapi/resources.py", line 726, in post
    return self.create(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Djblets-0.8.5-py2.7.egg/djblets/webapi/decorators.py", line 117, in _call
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ReviewBoard-2.0.2-py2.7.egg/reviewboard/webapi/decorators.py", line 110, in _check
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Djblets-0.8.5-py2.7.egg/djblets/webapi/decorators.py", line 117, in _call
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Djblets-0.8.5-py2.7.egg/djblets/webapi/decorators.py", line 138, in _checklogin
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Djblets-0.8.5-py2.7.egg/djblets/webapi/decorators.py", line 117, in _call
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Djblets-0.8.5-py2.7.egg/djblets/webapi/decorators.py", line 117, in _call
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Djblets-0.8.5-py2.7.egg/djblets/webapi/decorators.py", line 287, in _validate
    return view_func(*args, **new_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ReviewBoard-2.0.2-py2.7.egg/reviewboard/webapi/resources/validate_diff.py", line 135, in create
    save=False)
  File "/usr/local/lib/python2.7/dist-packages/ReviewBoard-2.0.2-py2.7.egg/reviewboard/diffviewer/managers.py", line 156, in create_from_upload
    save=save)
  File "/usr/local/lib/python2.7/dist-packages/ReviewBoard-2.0.2-py2.7.egg/reviewboard/diffviewer/managers.py", line 182, in create_from_data
    check_existence=(not parent_diff_file_contents)))
  File "/usr/local/lib/python2.7/dist-packages/ReviewBoard-2.0.2-py2.7.egg/reviewboard/diffviewer/managers.py", line 300, in _process_files
    request=request))):
  File "/usr/local/lib/python2.7/dist-packages/ReviewBoard-2.0.2-py2.7.egg/reviewboard/scmtools/models.py", line 239, in get_file_exists
    base_commit_id, request)
  File "/usr/local/lib/python2.7/dist-packages/ReviewBoard-2.0.2-py2.7.egg/reviewboard/scmtools/models.py", line 434, in _get_file_exists_uncached
    exists = self.get_scmtool().file_exists(path, revision)
  File "/usr/local/lib/python2.7/dist-packages/ReviewBoard-2.0.2-py2.7.egg/reviewboard/scmtools/core.py", line 156, in file_exists
    self.get_file(path, revision)
  File "/usr/local/lib/python2.7/dist-packages/ReviewBoard-2.0.2-py2.7.egg/reviewboard/scmtools/svn/__init__.py", line 117, in get_file
    return self.client.get_file(path, revision)
  File "/usr/local/lib/python2.7/dist-packages/ReviewBoard-2.0.2-py2.7.egg/reviewboard/scmtools/svn/pysvn.py", line 106, in get_file
    return self._do_on_path(self._get_file_data, path, revision)
  File "/usr/local/lib/python2.7/dist-packages/ReviewBoard-2.0.2-py2.7.egg/reviewboard/scmtools/svn/pysvn.py", line 72, in _do_on_path
    return cb(normpath, normrev)
  File "/usr/local/lib/python2.7/dist-packages/ReviewBoard-2.0.2-py2.7.egg/reviewboard/scmtools/svn/pysvn.py", line 100, in _get_file_data
    data = self.collapse_keywords(data, keywords[normpath])
  File "/usr/local/lib/python2.7/dist-packages/ReviewBoard-2.0.2-py2.7.egg/reviewboard/scmtools/svn/base.py", line 118, in collapse_keywords
    repl, data)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2154: ordinal not in range(128)
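
The traceback ends in re.sub, called from collapse_keywords while processing the file fetched from the repository. That is consistent with the failure depending on what is already checked in rather than on the uploaded diff. One way to keep this kind of substitution safe for non-ASCII content is to work on raw bytes from end to end. The Python 3 sketch below illustrates the idea only: KEYWORD_RE, its keyword list, and collapse_keywords_bytes are hypothetical, and this is neither Review Board's implementation nor the fix that shipped in 2.0.3.

```python
import re

# Hypothetical pattern for expanded keywords such as "$Id: foo.c 42 ray $".
# Both the pattern and the data are bytes, so no text decoding takes place.
KEYWORD_RE = re.compile(rb"\$(Id|Rev|Revision|Date|Author):[^$\n]*\$")


def collapse_keywords_bytes(data):
    """Collapse expanded keywords (b"$Id: ... $" -> b"$Id$") in raw bytes."""
    return KEYWORD_RE.sub(lambda m: b"$" + m.group(1) + b"$", data)


if __name__ == "__main__":
    # Illustrative content: a non-ASCII comment next to an expanded keyword.
    source = "// don\u2019t edit $Id: a.c 1 ray $\n".encode("utf-8")
    print(collapse_keywords_bytes(source))
```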
#9 Ray.L.******@gmai***** (Google Code)
Please let me know if this is not what you meant. As for the API Error, it's on the original post:


    One or more fields had errors (HTTP 400, API Error 105)

        path: 'ascii' codec can't decode byte 0xe2 in position 2154: ordinal not in range(128)

    Your review request still exists, but the diff is not attached.
#10 david
That's exactly what I was looking for.

This is the same as bug 3425, which was fixed in 2.0.3.
#11 Ray.L.******@gmai***** (Google Code)
Thanks for the follow-up :) I'll pass this info along.