Issue7606
Created on 2009-12-30 21:46 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files |||
---|---|---|
File name | Uploaded | Description |
xmlrpc_server_ascii_traceback.patch | vstinner, 2010-01-31 02:31 | |
Messages (13) | |||
---|---|---|---|
msg97063 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2009-12-30 21:46 | |
I configured my buildbot to use a non-ASCII path to the interpreter and test_xmlrpc fails as follows:

----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 59091)
Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py", line 448, in do_POST
    size_remaining = int(self.headers["content-length"])
ValueError: invalid literal for int() with base 10: 'I am broken'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 281, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 307, in process_request
    self.finish_request(request, client_address)
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 320, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 614, in __init__
    self.handle()
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 352, in handle
    self.handle_one_request()
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 346, in handle_one_request
    method()
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py", line 472, in do_POST
    self.send_header("X-traceback", traceback.format_exc())
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 410, in send_header
    self.wfile.write(("%s: %s\r\n" % (keyword, value)).encode('ASCII', 'strict'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 93: ordinal not in range(128)
----------------------------------------

======================================================================
FAIL: test_fail_with_info (test.test_xmlrpc.FailingServerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py", line 555, in test_fail_with_info
    p.pow(6,8)
xmlrpc.client.ProtocolError: <ProtocolError for 127.0.0.1:57828/RPC2: 500 Internal Server Error>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py", line 562, in test_fail_with_info
    self.assertTrue(e.headers.get("X-traceback") is not None)
AssertionError: False is not True

---------------------------------------------------------------------- |
|||
msg97064 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-12-30 22:03 | |
> self.send_header("X-traceback", traceback.format_exc())

That's fairly tricky. send_header expects two strings (bytes are not acceptable), and also requires these strings to be ASCII. This is why it breaks: format_exc returns a non-ASCII string. I see two options: a) allow non-Unicode values for keyword and value in send_header, and have xmlrpc.server encode the header itself, or b) properly MIME-encode value if it contains non-ASCII characters (keyword really must be ASCII, I think). Not sure whether there is any precedent for UTF-8 in HTTP headers. |
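The failure described above can be reproduced outside the server. This is a minimal sketch, not the library code: `format_header_line` is a hypothetical helper that reuses the formatting expression shown in the `send_header` frame of the traceback, and the path is a made-up example.

```python
def format_header_line(keyword, value):
    # Same expression as in the send_header frame of the traceback above.
    return ("%s: %s\r\n" % (keyword, value)).encode("ASCII", "strict")


# A pure-ASCII value encodes without trouble.
ok = format_header_line("X-traceback", "plain ascii")

# A value containing a non-ASCII character (hypothetical path) does not.
try:
    format_header_line("X-traceback", 'File "/tmp/nonascii-€/server.py"')
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    bad_char = exc.object[exc.start]  # the character that could not be encoded
```

The strict error handler is what turns a single non-ASCII character in the traceback text into a failed response.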
|||
msg97068 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2009-12-30 23:30 | |
A little googling came up with this page: http://publib.boulder.ibm.com/infocenter/tivihelp/v2r1/topic/com.ibm.itame.doc/am61_webseal_admin570.htm Their solution is to URI-encode the UTF-8-encoded data. However, this article references the RFCs, which appear to call for RFC 2047 (MIME) encoded words: http://stackoverflow.com/questions/324470/http-headers-encoding-decoding-in-java |
|||
msg97069 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2009-12-30 23:38 | |
If it's only about transmitting the string representation of the traceback, perhaps we can simply use "replace" or "ignore" as the error handler? |
|||
msg97071 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-12-30 23:49 | |
David: I think it's a little bit more complicated. RFC 2616 says that the value of a header is *TEXT, which is defined as

> The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 only when encoded according to the rules of RFC 2047

So I think send_header should change in the following way:

a) if isinstance(value, bytes): send value as-is
b) if value can be encoded in latin-1: encode in latin-1, then send as-is
c) otherwise: MIME-encode as UTF-8, using the following algorithm
   1. count the number of non-ASCII characters, by encoding with ascii, ignore, and comparing result lengths
   2. if fewer than 10% of the characters are non-ASCII, use the Q encoding
   3. otherwise, use the B encoding

The purpose of the algorithm in c) would be that text containing a few non-latin characters still comes out right even if the receiver fails to decode the header. The same change would also apply to the client side of sending headers.

On the receiving side, we should offer an option to decode headers (both for client and server); this should be an option because senders may not comply with RFC 2616. Reading should then proceed as follows:

1. check whether there are MIME markers in the text
2. if so, MIME-decode
3. if not, decode as latin-1 |
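The proposal above can be sketched roughly as follows. This is only an illustration of the described steps, not code from any patch: `encode_header_value`, `decode_header_value` and `_q_encode` are hypothetical names, the Q encoder is deliberately conservative (it escapes everything except ASCII letters and digits), and the RFC 2047 limit on encoded-word length is ignored.

```python
import base64
from email.header import decode_header


def _q_encode(raw):
    """Conservative RFC 2047 Q encoding of a bytes object."""
    out = []
    for byte in raw:
        char = chr(byte)
        if byte < 128 and char.isalnum():
            out.append(char)            # ASCII letters and digits pass through
        elif byte == 0x20:
            out.append("_")             # Q encoding writes a space as "_"
        else:
            out.append("=%02X" % byte)  # everything else is hex-escaped
    return "".join(out)


def encode_header_value(value):
    """Sending side, steps a) to c) of the proposal."""
    if isinstance(value, bytes):
        return value                    # a) bytes are sent as-is
    try:
        return value.encode("latin-1")  # b) latin-1 representable text
    except UnicodeEncodeError:
        pass
    # c) count non-ASCII characters by comparing lengths after "ignore"
    non_ascii = len(value) - len(value.encode("ascii", "ignore"))
    raw = value.encode("utf-8")
    if non_ascii * 10 < len(value):     # fewer than 10% non-ASCII: Q encoding
        return ("=?UTF-8?Q?%s?=" % _q_encode(raw)).encode("ascii")
    b64 = base64.b64encode(raw).decode("ascii")
    return ("=?UTF-8?B?%s?=" % b64).encode("ascii")


def decode_header_value(raw):
    """Receiving side: MIME-decode if markers are present, else latin-1."""
    text = raw.decode("latin-1")
    if "=?" in text and "?=" in text:
        return "".join(
            part.decode(charset or "latin-1") if isinstance(part, bytes) else part
            for part, charset in decode_header(text))
    return text
```

With this sketch, a mostly-ASCII value such as `"price: 10 €"` takes the Q branch and stays partly readable on the wire, while `"€€"` takes the B branch.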
|||
msg97072 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2009-12-30 23:51 | |
Antoine: sure, to fix the issue at hand, we can work around it. However, the issue of sending non-ASCII headers in HTTP remains, and should also be fixed. |
|||
msg98593 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-01-31 02:31 | |
#7608 was a duplicate issue. Copy of my message (msg98091): ----- SimpleXMLRPCRequestHandler.do_POST() writes the traceback in the HTTP header "X-traceback". But an HTTP header value is ASCII only, whereas a traceback can contain any character (e.g. a non-ASCII character from a directory name, as in this issue). A simple fix would be to use the ASCII charset with the backslashreplace error handler. Attached patch uses: trace = str(trace.encode('ASCII', 'backslashreplace'), 'ASCII') Is there an easier method to escape non-ASCII characters without double conversion (unicode->bytes and bytes->unicode)? ----- I also copied my patch to this issue. |
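As an illustration of the double conversion quoted from the patch, the following sketch wraps that expression in a hypothetical helper, `ascii_safe`, and applies it to a made-up traceback line. The result can then pass through a strict ASCII encode such as the one in `send_header`.

```python
def ascii_safe(text):
    # Expression taken from the patch: str -> ASCII bytes with escapes -> str.
    return str(text.encode("ASCII", "backslashreplace"), "ASCII")


# Hypothetical traceback line containing a non-ASCII directory name.
trace = 'File "/tmp/nonascii-€/server.py", line 1'
safe = ascii_safe(trace)

# The escaped text now survives a strict ASCII encode.
header_bytes = ("X-traceback: %s\r\n" % safe).encode("ASCII", "strict")
```

The euro sign ends up as the six ASCII characters `\u20ac`, so the original character can still be identified by whoever reads the header.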
|||
msg98594 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-01-31 02:39 | |
pitrou> If it's only about transmitting the string representation of the
pitrou> traceback, perhaps we can simply use "replace" or "ignore" as the error
pitrou> handler?

Both replace and ignore lose information. My patch keeps all information by using backslashreplace. It's consistent with Python's behaviour: Python writes a traceback to stderr, which also uses the backslashreplace error handler. |
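The difference between the three error handlers can be seen on a small sample. This is a sketch with a made-up string, using only the standard `str.encode` error handlers named in the discussion.

```python
sample = "nonascii-€"  # hypothetical fragment of a path

results = {
    handler: sample.encode("ascii", handler)
    for handler in ("replace", "ignore", "backslashreplace")
}

# "replace" substitutes a question mark, "ignore" drops the character,
# and "backslashreplace" keeps its code point as an escape sequence.
for handler, encoded in results.items():
    print(handler, encoded)
```

Only the backslashreplace output still records which character was present in the original text.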
|||
msg103275 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-15 23:20 | |
What do you think about my solution (convert the traceback to ASCII to avoid the encoding issue)? If you would like to support non-ASCII characters in HTTP headers, you should open a new issue. For compatibility, I prefer to use pure ASCII headers because I fear that third-party programs don't support non-ASCII headers. |
|||
msg103322 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-04-16 13:27 | |
> What do you think about my solution (convert the traceback to ASCII to
> avoid the encoding issue)?

It's fine with me. Perhaps you should add a comment to explain why this is necessary. |
|||
msg103323 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-16 13:28 | |
Committed: r80112 (py3k). Waiting for the buildbots before the backport to 3.1. |
|||
msg103335 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-16 15:48 | |
> Committed: r80112 (py3k)

Looks good: r80118 (3.1). |
|||
msg103382 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-04-17 00:35 | |
If anyone would like to work on non-ASCII HTTP headers, please open a new issue with a pointer to this one. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:55 | admin | set | github: 51855 |
2010-04-17 00:35:53 | vstinner | set | messages: + msg103382 |
2010-04-16 15:48:46 | vstinner | set | status: open -> closed resolution: fixed messages: + msg103335 |
2010-04-16 13:28:37 | vstinner | set | messages: + msg103323 |
2010-04-16 13:27:50 | pitrou | set | messages: + msg103322 |
2010-04-15 23:20:24 | vstinner | set | messages: + msg103275 |
2010-04-13 23:37:47 | vstinner | link | issue8242 dependencies |
2010-02-27 14:43:50 | flox | set | nosy: + flox |
2010-01-31 02:39:27 | vstinner | set | messages: + msg98594 |
2010-01-31 02:31:06 | vstinner | set | files: + xmlrpc_server_ascii_traceback.patch nosy: + vstinner messages: + msg98593 keywords: + patch |
2009-12-30 23:51:05 | loewis | set | messages: + msg97072 |
2009-12-30 23:49:24 | loewis | set | messages: + msg97071 |
2009-12-30 23:38:03 | pitrou | set | messages: + msg97069 |
2009-12-30 23:30:32 | r.david.murray | set | nosy: + r.david.murray messages: + msg97068 |
2009-12-30 22:03:05 | loewis | set | messages: + msg97064 |
2009-12-30 21:46:35 | pitrou | create |