classification
Title: test_xmlrpc fails with non-ascii path
Type: behavior
Stage: needs patch
Components: Library (Lib), Tests
Versions: Python 3.1, Python 3.2
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: flox, loewis, pitrou, r.david.murray, vstinner
Priority: normal
Keywords: patch

Created on 2009-12-30 21:46 by pitrou, last changed 2010-04-17 00:35 by vstinner. This issue is now closed.

Files
File name Uploaded Description
xmlrpc_server_ascii_traceback.patch vstinner, 2010-01-31 02:31
Messages (13)
msg97063 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-12-30 21:46
I configured my buildbot to use a non-ascii path to the interpreter and
test_xmlrpc fails as follows:

----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 59091)
Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py", line 448, in do_POST
    size_remaining = int(self.headers["content-length"])
ValueError: invalid literal for int() with base 10: 'I am broken'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 281, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 307, in process_request
    self.finish_request(request, client_address)
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 320, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 614, in __init__
    self.handle()
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 352, in handle
    self.handle_one_request()
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 346, in handle_one_request
    method()
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py", line 472, in do_POST
    self.send_header("X-traceback", traceback.format_exc())
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 410, in send_header
    self.wfile.write(("%s: %s\r\n" % (keyword, value)).encode('ASCII', 'strict'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 93: ordinal not in range(128)
----------------------------------------

======================================================================
FAIL: test_fail_with_info (test.test_xmlrpc.FailingServerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py", line 555, in test_fail_with_info
    p.pow(6,8)
xmlrpc.client.ProtocolError: <ProtocolError for 127.0.0.1:57828/RPC2: 500 Internal Server Error>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py", line 562, in test_fail_with_info
    self.assertTrue(e.headers.get("X-traceback") is not None)
AssertionError: False is not True

----------------------------------------------------------------------
msg97064 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-12-30 22:03
>     self.send_header("X-traceback", traceback.format_exc())

That's fairly tricky. send_header expects two strings (bytes are
not acceptable), and also requires these strings to be ASCII.
This is why it breaks: format_exc returns a non-ASCII string.

I see two options:
a) allow non-Unicode values for keyword and value in send_header,
   and have xmlrpc.server encode the header itself, or
b) properly MIME-encode value if it contains non-ASCII characters
   (keyword really must be ASCII, I think).

Not sure whether there is any precedent for UTF-8 in HTTP
headers.
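
The failure described above can be reproduced outside the server. This is a minimal sketch: `build_header_line` is a hypothetical helper that copies the formatting and strict ASCII encoding from the `send_header` line in the traceback, and the path is a shortened, illustrative stand-in for the buildbot path.

```python
def build_header_line(keyword, value):
    # Same formatting and strict ASCII encoding as the send_header
    # line shown in the traceback.
    return ("%s: %s\r\n" % (keyword, value)).encode("ASCII", "strict")


# A pure-ASCII value is encoded without trouble.
ok = build_header_line("X-traceback", "plain ASCII text")

# A value containing the euro sign cannot be encoded with 'strict'.
try:
    build_header_line("X-traceback", "/home/buildbot/cpython-ucs4-nonascii-€/build")
    failed = False
except UnicodeEncodeError:
    failed = True
```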
msg97068 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-12-30 23:30
A little googling came up with this page:

http://publib.boulder.ibm.com/infocenter/tivihelp/v2r1/topic/com.ibm.itame.doc/am61_webseal_admin570.htm

Their solution is to URI-encode the UTF-8-encoded data.

However, this article references the RFCs, which look like they call for
RFC 2047 (MIME) encoded words:

http://stackoverflow.com/questions/324470/http-headers-encoding-decoding-in-java
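
For comparison, here is a rough sketch of the two encodings mentioned in this message, using only the standard library. The sample value is illustrative, and neither form is what `send_header` does; both simply produce ASCII-only text that can be decoded again.

```python
from email.header import Header, decode_header, make_header
from urllib.parse import quote, unquote

value = "nonascii-€"  # illustrative value, not taken from the patch

# URI-encoding: percent-escape the UTF-8 bytes of the value.
uri_encoded = quote(value)

# RFC 2047: wrap the value in a MIME encoded-word.
mime_encoded = Header(value, "utf-8").encode()

# Both forms round-trip back to the original text.
uri_decoded = unquote(uri_encoded)
mime_decoded = str(make_header(decode_header(mime_encoded)))
```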
msg97069 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-12-30 23:38
If it's only about transmitting the string representation of the
traceback, perhaps we can simply use "replace" or "ignore" as the error
handler?
msg97071 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-12-30 23:49
David: I think it's a little bit more complicated. RFC 2616 says that
the value of a header is *TEXT, which is defined as

   The TEXT rule is only used for descriptive field contents and values 
   that are not intended to be interpreted by the message parser. Words 
   of *TEXT MAY contain characters from character sets other than 
   ISO-8859-1 only when encoded according to the rules of RFC 2047

So I think send_header should change in the following way:

a) if isinstance(value, bytes): send value as-is
b) if value can be encoded in latin-1: encode in latin-1, then send as-is
c) otherwise: MIME-encode as UTF-8, using the following algorithm
   1. count the number of non-ascii characters, by encoding with
      ascii, ignore, and comparing result lengths
   2. if fewer than 10% of the characters are non-ascii, use the Q encoding
   3. otherwise, use the B encoding

The purpose of the algorithm in c) would be that text containing a few
non-latin characters still comes out right even if the receiver fails to
decode the header.
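
The sending steps a) to c) could be sketched roughly as follows. This is only an illustration of the proposal, not the code that was committed: `encode_header_value` is a hypothetical name, it relies on `email.charset.Charset` to build the encoded word, and it ignores the RFC 2047 length limit on encoded words.

```python
from email.charset import BASE64, QP, Charset


def encode_header_value(value):
    """Hypothetical helper following steps a) to c) of the proposal."""
    # a) bytes are sent as-is.
    if isinstance(value, bytes):
        return value
    # b) text that fits in latin-1 is encoded as latin-1.
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError:
        pass
    # c) otherwise MIME-encode as UTF-8.
    # 1. count non-ASCII characters by comparing lengths.
    non_ascii = len(value) - len(value.encode("ascii", "ignore"))
    charset = Charset("utf-8")
    # 2. fewer than 10% non-ASCII: Q encoding; 3. otherwise: B encoding.
    charset.header_encoding = QP if non_ascii * 10 < len(value) else BASE64
    return charset.header_encode(value).encode("ascii")


mostly_ascii = encode_header_value("nonascii-€/build")  # 1 of 16 characters
mostly_non_ascii = encode_header_value("€€€")
```

With the Q encoding the ASCII part of the path stays readable in the raw header, which is the point of the 10% threshold.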

The same change would also apply to the client-side of sending headers.
On the receiving side, we should offer an option to decode headers (both
for client and server); this should be an option because senders may not
comply with RFC 2616. Reading should then proceed as follows:
1. check whether there are MIME markers in the text
2. if so, MIME-decode
3. if not, decode as latin-1
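
The reading steps might look like the sketch below. `decode_header_value` is a hypothetical name, the marker check is a deliberately simple substring test, and values that mix raw non-ASCII bytes with encoded words are not handled.

```python
from email.header import decode_header, make_header


def decode_header_value(raw):
    """Hypothetical helper following the three reading steps."""
    # latin-1 maps every byte to a character, so this never fails.
    text = raw.decode("latin-1")
    # 1. check whether there are MIME markers in the text.
    if "=?" in text and "?=" in text:
        # 2. if so, MIME-decode.
        return str(make_header(decode_header(text)))
    # 3. if not, keep the latin-1 decoding.
    return text


plain = decode_header_value(b"caf\xe9")
mime = decode_header_value(b"=?utf-8?b?4oKs?=")
```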
msg97072 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-12-30 23:51
Antoine: sure, to fix the issue at hand, we can work around it.

However, the issue of sending non-ASCII headers in HTTP remains, and
should also be fixed.
msg98593 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-01-31 02:31
#7608 was a duplicate issue. Copy of my message (msg98091):
-----
SimpleXMLRPCRequestHandler.do_POST() writes the traceback in the HTTP header "X-traceback". But an HTTP header value is ASCII only, whereas a traceback can contain any character (e.g. a non-ASCII character from a directory name for this issue).

A simple fix would be to use the ASCII charset with the backslashreplace error handler. Attached patch uses:

   trace = str(trace.encode('ASCII', 'backslashreplace'), 'ASCII')

Is there an easier method to escape non-ASCII characters without double conversion (unicode->bytes and bytes->unicode)?
-----
I also copied my patch to this issue.
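
As a small illustration of the conversion quoted above: `ascii_safe` is a hypothetical wrapper around the same expression, and the sample text is made up. The result survives the strict ASCII encoding that `send_header` applies.

```python
def ascii_safe(trace):
    # Same double conversion as in the patch: str -> bytes -> str,
    # escaping every non-ASCII character as a backslash sequence.
    return str(trace.encode("ASCII", "backslashreplace"), "ASCII")


trace = 'File "/home/buildbot/nonascii-€/server.py", line 448'  # illustrative
safe = ascii_safe(trace)

# The escaped text can now be encoded strictly, as send_header does.
header_line = ("%s: %s\r\n" % ("X-traceback", safe)).encode("ASCII", "strict")
```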
msg98594 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-01-31 02:39
pitrou> If it's only about transmitting the string representation of the
pitrou> traceback, perhaps we can simply use "replace" or "ignore" as the error
pitrou> handler?

Both replace and ignore lose information. My patch keeps all information by using backslashreplace. It's consistent with Python behaviour: Python writes a backtrace to stderr, which also uses the backslashreplace error handler.
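
The difference between the three error handlers can be seen on a short, illustrative string:

```python
path = "nonascii-€"  # illustrative value

# Encode to ASCII with each handler, then decode back for display.
results = {
    handler: path.encode("ascii", handler).decode("ascii")
    for handler in ("replace", "ignore", "backslashreplace")
}

# results["replace"]          -> 'nonascii-?'       (character replaced)
# results["ignore"]           -> 'nonascii-'        (character dropped)
# results["backslashreplace"] -> 'nonascii-\\u20ac' (code point preserved)
```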
msg103275 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-04-15 23:20
What do you think about my solution (convert the traceback to ASCII to avoid the encoding issue)? If you would like to support non-ASCII characters in HTTP headers, you should open a new issue. For compatibility, I prefer to use pure ASCII headers because I fear that third-party programs don't support non-ASCII headers.
msg103322 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-16 13:27
> What do you think about my solution (convert the traceback to ASCII to
> avoid the encoding issue)?

It's fine for me. Perhaps you should add a comment to explain why this is necessary.
msg103323 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-04-16 13:28
Committed: r80112 (py3k). Waiting for the buildbots before the backport to 3.1.
msg103335 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-04-16 15:48
> Committed: r80112 (py3k)

Looks good: r80118 (3.1).
msg103382 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-04-17 00:35
If anyone would like to work on non-ASCII HTTP headers, please open a new issue with a pointer to this one.
History
Date                 User            Action  Args
2010-04-17 00:35:53  vstinner        set     messages: + msg103382
2010-04-16 15:48:46  vstinner        set     status: open -> closed; resolution: fixed; messages: + msg103335
2010-04-16 13:28:37  vstinner        set     messages: + msg103323
2010-04-16 13:27:50  pitrou          set     messages: + msg103322
2010-04-15 23:20:24  vstinner        set     messages: + msg103275
2010-04-13 23:37:47  vstinner        link    issue8242 dependencies
2010-02-27 14:43:50  flox            set     nosy: + flox
2010-01-31 02:39:27  vstinner        set     messages: + msg98594
2010-01-31 02:31:06  vstinner        set     files: + xmlrpc_server_ascii_traceback.patch; nosy: + vstinner; messages: + msg98593; keywords: + patch
2009-12-30 23:51:05  loewis          set     messages: + msg97072
2009-12-30 23:49:24  loewis          set     messages: + msg97071
2009-12-30 23:38:03  pitrou          set     messages: + msg97069
2009-12-30 23:30:32  r.david.murray  set     nosy: + r.david.murray; messages: + msg97068
2009-12-30 22:03:05  loewis          set     messages: + msg97064
2009-12-30 21:46:35  pitrou          create