2047: UnicodeDecodeError: 'utf8' codec can't decode byte with non-utf8 characters in perforce description

m.mil*****@gmai***** (Google Code)
Aug. 23, 2013
2200
What version are you running?
1.5.4.1

What's the URL of the page containing the problem?
https://<server>/api/json/reviewrequests/217417/update_from_changenum/:


What steps will reproduce the problem?
1. On Windows or OS X, add a typographical double quote to the Perforce description of the CLN you are attempting to post (e.g. on Windows, Alt+0147).

2. Post the review using post-review <CLN>.

What is the expected output? What do you see instead?

Expected the review to be posted, but got an error message instead.

User Message

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>500 - Internal Server Error | Review Board</title>
 </head>
 <body>
  <h1>Something broke! (Error 500)</h1>
  <p>
   It appears something broke when you tried to go to here. This is either
   a bug in Review Board or a server configuration error. Please report
   this to your administrator.
  </p>
 </body>
</title>

Server Message:

....
  File "/build/toolchain/lin32/python-2.6.1/lib/python2.6/json/encoder.py", line 294, in _iterencode
    yield encoder(o)

UnicodeDecodeError: 'utf8' codec can't decode byte 0x93 in position 0: unexpected code byte


What operating system are you using? What browser?

Centos 5.5 x86_64

Please provide any additional information below.

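From what I can tell, Windows and OS X by default encode these types of characters using a single byte in a legacy code page (the byte 0x93 in the error above is the left double quote in Windows-1252; it is not ASCII). The Django framework then attempts to decode the description as UTF-8 and fails because the bytes are not encoded as expected.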
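A small sketch of the failure, written for Python 3 (the server in this report runs Python 2.6, so the exact error wording differs). The sample bytes and the choice of the cp1252 code page are assumptions for illustration; only the byte 0x93 comes from the server message above.

```python
# Description bytes as a Windows client might send them (assumed sample).
raw = b"\x93quoted\x94"

error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x93 cannot start a UTF-8 sequence, so decoding fails at offset 0.
    error = exc

# Decoding with the Windows-1252 code page (assumed) recovers the curly quotes.
text = raw.decode("cp1252")
```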

This primarily appears to happen when users have these types of characters within the change description on Perforce.

It appears that scmtool/perforce.py could potentially be patched to replace these characters within the function parse_change_desc (e.g. <str>.decode('utf-8', 'replace')). I haven't given it much thought yet in case there are any side effects, but at least the review appears to be posted successfully.
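A minimal sketch of that idea, written for Python 3 rather than the Python 2.6 on the server. The helper name decode_description and its fallback parameter are hypothetical and not part of Review Board. Trying a legacy code page before falling back to replacement is an extra assumption here, since a plain 'replace' would turn the quotes into U+FFFD.

```python
def decode_description(raw, fallback="cp1252"):
    """Hypothetical helper: turn change-description bytes into text."""
    if isinstance(raw, str):
        return raw  # already text, nothing to decode
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        # Assumed fallback: the client wrote the text in a legacy code page.
        return raw.decode(fallback)
    except UnicodeDecodeError:
        # Last resort, as suggested above: substitute U+FFFD for bad bytes.
        return raw.decode("utf-8", "replace")


description = decode_description(b"\x93Fix crash\x94 in parser")
```

With this ordering, valid UTF-8 passes through unchanged and only undecodable input reaches the lossy replacement step.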
#1 guney*****@gmai***** (Google Code)
This is also causing issues for me. Users copy and paste descriptions from Word documents, and the double quotes from Word create this issue with post-review.
#2 david
I know it's been a while, but does anyone have a full traceback for this?
  • +NeedInfo
#3 amber*****@gmai***** (Google Code)
Not sure if it's the exact same code path or not, but...

Traceback (most recent call last):

 File "/usr/lib/pymodules/python2.6/django/core/handlers/base.py", line 178, in get_response
   response = middleware_method(request, response)

 File "/usr/lib/pymodules/python2.6/django/middleware/http.py", line 15, in process_response
   response['Content-Length'] = str(len(response.content))

 File "/usr/lib/pymodules/python2.6/djblets/webapi/core.py", line 276, in _get_content
   content = adapter.encode(self.api_data, request=self.request)

 File "/usr/lib/pymodules/python2.6/djblets/webapi/core.py", line 88, in encode
   return super(JSONEncoderAdapter, self).encode(o)

 File "/usr/lib/pymodules/python2.6/simplejson/encoder.py", line 214, in encode
   chunks = self.iterencode(o, _one_shot=True)

 File "/usr/lib/pymodules/python2.6/simplejson/encoder.py", line 282, in iterencode
   return _iterencode(o, 0)

 File "/usr/lib/pymodules/python2.6/djblets/webapi/core.py", line 96, in default
   result = self.encoder.encode(o, *self.encode_args, **self.encode_kwargs)

 File "/usr/lib/pymodules/python2.6/djblets/webapi/core.py", line 257, in encode
   result = encoder.encode(*args, **kwargs)

 File "/usr/lib/pymodules/python2.6/djblets/webapi/encoders.py", line 48, in encode
   return resource.serialize_object(o, *args, **kwargs)

 File "/usr/lib/pymodules/python2.6/djblets/webapi/resources.py", line 722, in serialize_object
   'title': unicode(value),

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 22: ordinal not in range(128)
#4 guney*****@gmai***** (Google Code)
I will reproduce it and send you the stack trace today.

You can try to reproduce it locally by using the backtick `
character inside the review description during post-review. This
breaks the JSON library and produces a 500 error.
#5 david
  • -NeedInfo
    +New
#6 david
  • +UnicodeDecodeError: 'utf8' codec can't decode byte with non-utf8 characters in perforce description
#8 Faller******@gmai***** (Google Code)
As I said in error 2200, it is not only the backtick: any character not found in the server's current code page seems to trigger the problem.
#9 david
  • +Component-API
    +Component-SCMTools
#10 david
WebAPI encoding has changed quite a bit. I haven't seen this recently with 1.7.x versions, except in the case of database tables with incorrect encodings.
  • -New
    +Fixed