2047: UnicodeDecodeError: 'utf8' codec can't decode byte with non-utf8 characters in perforce description
- Fixed
- Review Board
m.mil*****@gmai***** (Google Code)
Aug. 23, 2013
2200
What version are you running?
1.5.4.1

What's the URL of the page containing the problem?
https://<server>/api/json/reviewrequests/217417/update_from_changenum/

What steps will reproduce the problem?
1. On Windows or OS X, add a typographic double quote to the Perforce description of the CLN you are attempting to post (e.g. on Windows, Alt+0147).
2. Post the review using post-review <CLN>.

What is the expected output? What do you see instead?
Expected the review to be posted, but got this error message instead:

User Message:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head> <title>500 - Internal Server Error | Review Board</title> </head>
<body>
<h1>Something broke! (Error 500)</h1>
<p> It appears something broke when you tried to go to here. This is either a bug in Review Board or a server configuration error. Please report this to your administrator. </p>
</body>
</title>

Server Message:
....
  File "/build/toolchain/lin32/python-2.6.1/lib/python2.6/json/encoder.py", line 294, in _iterencode
    yield encoder(o)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x93 in position 0: unexpected code byte

What operating system are you using? What browser?
CentOS 5.5 x86_64

Please provide any additional information below.
From what I can tell, Windows and OS X by default encode these characters as a single byte in a legacy 8-bit code page rather than as UTF-8. The 0x93 byte above is the left double quotation mark in Windows-1252; it is not an ASCII character. The Django framework then attempts to decode the description as UTF-8 and fails because the bytes are not encoded as expected. This appears to happen primarily when users have these kinds of characters in the change description on Perforce. scmtool/perforce.py could potentially be patched to replace these characters in the parse_change_desc function (e.g. <str>.decode('utf-8', 'replace')). I haven't given it much thought yet in case there are side effects, but with that change the review at least appears to post successfully.
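The workaround suggested above can be sketched roughly as follows. This is only an illustration of decoding with the 'replace' error handler, not the actual Review Board code: the helper name decode_change_text and the sample description are hypothetical, and the snippet is written for a current Python rather than the 2.6 interpreter in the traceback.

```python
def decode_change_text(raw, encoding="utf-8"):
    """Hypothetical helper: decode a change description without raising.

    Bytes that are not valid in `encoding` become U+FFFD (the Unicode
    replacement character) instead of triggering UnicodeDecodeError.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding, "replace")
    # Already text, nothing to decode.
    return raw


# Sample description with Windows-1252 curly quotes (0x93 and 0x94),
# which are not valid UTF-8 on their own.
raw_description = b"\x93Fix the parser\x94"
decoded = decode_change_text(raw_description)
```

The side effect to be aware of is that the original quote characters are lost: each undecodable byte shows up as a replacement character in the posted description.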
This is also causing issues for me. Users copy and paste descriptions from Word documents, and Word's typographic double quotes trigger this issue with post-review.
Not sure if it's the exact same code path or not, but...

Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/django/core/handlers/base.py", line 178, in get_response
    response = middleware_method(request, response)
  File "/usr/lib/pymodules/python2.6/django/middleware/http.py", line 15, in process_response
    response['Content-Length'] = str(len(response.content))
  File "/usr/lib/pymodules/python2.6/djblets/webapi/core.py", line 276, in _get_content
    content = adapter.encode(self.api_data, request=self.request)
  File "/usr/lib/pymodules/python2.6/djblets/webapi/core.py", line 88, in encode
    return super(JSONEncoderAdapter, self).encode(o)
  File "/usr/lib/pymodules/python2.6/simplejson/encoder.py", line 214, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/pymodules/python2.6/simplejson/encoder.py", line 282, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/pymodules/python2.6/djblets/webapi/core.py", line 96, in default
    result = self.encoder.encode(o, *self.encode_args, **self.encode_kwargs)
  File "/usr/lib/pymodules/python2.6/djblets/webapi/core.py", line 257, in encode
    result = encoder.encode(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/djblets/webapi/encoders.py", line 48, in encode
    return resource.serialize_object(o, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/djblets/webapi/resources.py", line 722, in serialize_object
    'title': unicode(value),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 22: ordinal not in range(128)
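For context on this second traceback: in Python 2, unicode(value) on a byte string decodes with the default ASCII codec, and 0xe2 is the first byte of a UTF-8 encoded curly quote. The sketch below reproduces the same kind of failure with an explicit ASCII decode. The sample title and the try_decode helper are made up for illustration, and the snippet targets a current Python.

```python
# A title containing typographic quotes, encoded as UTF-8.
# The first byte of the encoded left quote is 0xe2.
title_bytes = "\u201cquoted\u201d title".encode("utf-8")


def try_decode(data, codec):
    """Hypothetical helper: decode `data`, reporting failure as a string."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return "failed: %s" % exc.reason


ascii_result = try_decode(title_bytes, "ascii")  # ASCII cannot represent 0xe2
utf8_result = try_decode(title_bytes, "utf-8")   # explicit UTF-8 decode works
```

So this report looks like valid UTF-8 being decoded as ASCII, while the original report is Windows-1252 being decoded as UTF-8. The symptom is the same but the wrong codec differs.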
I will reproduce it and send you the stack trace today. You can try to reproduce it locally by using the backtick ` character inside the review description during post-review. This breaks the JSON library and produces an error 500.
-
+ UnicodeDecodeError: 'utf8' codec can't decode byte with non-utf8 characters in perforce description
As I said in bug 2200, it is not only the backtick: any character not found in the server's current code page seems to trigger the problem.
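If the descriptions can arrive in more than one encoding, one possible approach (not something proposed in this thread) is to try a short list of codecs before falling back to replacement characters. The function name and the choice of UTF-8 followed by Windows-1252 are assumptions for illustration only; the right list depends on what the Perforce clients actually send.

```python
def decode_with_fallback(raw, codecs=("utf-8", "cp1252")):
    """Hypothetical helper: try each codec in order, then give up gracefully.

    Returns the text from the first codec that decodes `raw` cleanly. If
    none does, decodes as UTF-8 with undecodable bytes replaced by U+FFFD.
    """
    for codec in codecs:
        try:
            return raw.decode(codec)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", "replace")


# Windows-1252 curly quotes are recovered instead of being replaced.
from_word = decode_with_fallback(b"\x93quoted\x94")
```

UTF-8 is tried first because text in a single-byte code page rarely happens to be valid UTF-8, whereas almost any byte sequence decodes "successfully" under a single-byte codec.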