test_xmlrpc fails with non-ascii path #51855

Closed
pitrou opened this issue Dec 30, 2009 · 13 comments
Labels
stdlib Python modules in the Lib dir tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@pitrou
Member

pitrou commented Dec 30, 2009

BPO 7606
Nosy @loewis, @pitrou, @vstinner, @bitdancer, @florentx
Files
  • xmlrpc_server_ascii_traceback.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2010-04-16.15:48:46.218>
    created_at = <Date 2009-12-30.21:46:35.787>
    labels = ['tests', 'type-bug', 'library']
    title = 'test_xmlrpc fails with non-ascii path'
    updated_at = <Date 2010-04-17.00:35:53.724>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2010-04-17.00:35:53.724>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-04-16.15:48:46.218>
    closer = 'vstinner'
    components = ['Library (Lib)', 'Tests']
    creation = <Date 2009-12-30.21:46:35.787>
    creator = 'pitrou'
    dependencies = []
    files = ['16063']
    hgrepos = []
    issue_num = 7606
    keywords = ['patch']
    message_count = 13.0
    messages = ['97063', '97064', '97068', '97069', '97071', '97072', '98593', '98594', '103275', '103322', '103323', '103335', '103382']
    nosy_count = 5.0
    nosy_names = ['loewis', 'pitrou', 'vstinner', 'r.david.murray', 'flox']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue7606'
    versions = ['Python 3.1', 'Python 3.2']

    @pitrou
    Member Author

    pitrou commented Dec 30, 2009

    I configured my buildbot to use a non-ascii path to the interpreter and
    test_xmlrpc fails as follows:

    ----------------------------------------

    Exception happened during processing of request from ('127.0.0.1', 59091)
    Traceback (most recent call last):
      File
    "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py",
    line 448, in do_POST
        size_remaining = int(self.headers["content-length"])
    ValueError: invalid literal for int() with base 10: 'I am broken'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File
    "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py",
    line 281, in _handle_request_noblock
        self.process_request(request, client_address)
      File
    "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py",
    line 307, in process_request
        self.finish_request(request, client_address)
      File
    "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py",
    line 320, in finish_request
        self.RequestHandlerClass(request, client_address, self)
      File
    "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py",
    line 614, in __init__
        self.handle()
      File
    "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py",
    line 352, in handle
        self.handle_one_request()
      File
    "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py",
    line 346, in handle_one_request
        method()
      File
    "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py",
    line 472, in do_POST
        self.send_header("X-traceback", traceback.format_exc())
      File
    "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py",
    line 410, in send_header
        self.wfile.write(("%s: %s\r\n" % (keyword, value)).encode('ASCII',
    'strict'))
    UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in
    position 93: ordinal not in range(128)
    ----------------------------------------
    
    

    ======================================================================
    FAIL: test_fail_with_info (test.test_xmlrpc.FailingServerTestCase)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File
    "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py",
    line 555, in test_fail_with_info
        p.pow(6,8)
    xmlrpc.client.ProtocolError: <ProtocolError for 127.0.0.1:57828/RPC2:
    500 Internal Server Error>
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File
    "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py",
    line 562, in test_fail_with_info
        self.assertTrue(e.headers.get("X-traceback") is not None)
    AssertionError: False is not True

    @pitrou pitrou added stdlib Python modules in the Lib dir tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error labels Dec 30, 2009
    @loewis
    Mannequin

    loewis mannequin commented Dec 30, 2009

    self.send_header("X-traceback", traceback.format_exc())
    

    That's fairly tricky. send_header expects two strings (bytes are
    not acceptable), and also requires these strings to be ASCII.
    This is why it breaks: format_exc returns a non-ASCII string.
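A minimal sketch of the failure described above, assuming only the `encode('ASCII', 'strict')` expression visible in the traceback. The helper name `encode_header_line` and the shortened path are hypothetical, chosen for illustration.

```python
def encode_header_line(keyword, value):
    # Same expression as the send_header line shown in the traceback.
    return ("%s: %s\r\n" % (keyword, value)).encode("ASCII", "strict")

# Hypothetical traceback fragment containing the euro sign (U+20AC).
value = 'File "/tmp/nonascii-\u20ac/server.py", line 1'

try:
    encode_header_line("X-traceback", value)
    failed = False
except UnicodeEncodeError as exc:
    # The strict handler rejects any character outside the ASCII range.
    failed = True
    print("cannot encode:", repr(exc.object[exc.start:exc.end]))
```

A pure-ASCII value passes through the same helper unchanged, which is why the test only fails when the build path contains a non-ASCII character.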

    I see two options:
    a) allow non-Unicode values for keyword and value in send_header,
    and have xmlrpc.server encode the header itself, or
    b) properly MIME-encode value if it contains non-ASCII characters
    (keyword really must be ASCII, I think).

    Not sure whether there is any precedent for UTF-8 in HTTP
    headers.

    @bitdancer
    Member

    A little googling came up with this page:

    http://publib.boulder.ibm.com/infocenter/tivihelp/v2r1/topic/com.ibm.itame.doc/am61_webseal_admin570.htm

    Their solution is to URI-encode the UTF-8-encoded data.

    However, this article references the RFCs, which look like they call for
    RFC 2047 (MIME) encoded words:

    http://stackoverflow.com/questions/324470/http-headers-encoding-decoding-in-java
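The two approaches mentioned above can be compared with standard-library calls. This is only an illustration, assuming a made-up sample string; it does not reflect what `http.server` does.

```python
from urllib.parse import quote, unquote
from email.header import Header, decode_header

text = "nonascii-\u20ac"  # sample value, chosen for illustration

# Approach 1: percent-encode the UTF-8 bytes (URI encoding).
uri_encoded = quote(text, safe="")
print(uri_encoded)

# Approach 2: RFC 2047 encoded word, produced by the email package.
mime_encoded = Header(text, charset="utf-8").encode()
print(mime_encoded)

# Both forms are pure ASCII and can be reversed by the receiver.
uri_decoded = unquote(uri_encoded)
mime_decoded = "".join(
    part.decode(charset or "ascii") if isinstance(part, bytes) else part
    for part, charset in decode_header(mime_encoded)
)
```

The receiver has to know which scheme was used: URI encoding is a convention between sender and receiver, while an encoded word announces itself through its `=?charset?...?=` markers.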

    @pitrou
    Member Author

    pitrou commented Dec 30, 2009

    If it's only about transmitting the string representation of the
    traceback, perhaps we can simply use "replace" or "ignore" as the error
    handler?

    @loewis
    Mannequin

    loewis mannequin commented Dec 30, 2009

    David: I think it's a little bit more complicated. RFC 2616 says that
    the value of a header is *TEXT, which is defined as

    The TEXT rule is only used for descriptive field contents and values
    that are not intended to be interpreted by the message parser. Words
    of *TEXT MAY contain characters from character sets other than
    ISO-8859-1 only when encoded according to the rules of RFC 2047

    So I think send_header should change in the following way:

    a) if isinstance(value, bytes): send value as-is
    b) if value can be encoded in latin-1: encode in latin-1, then send as-is
    c) otherwise: MIME-encode as UTF-8, using the following algorithm

    1. count the number of non-ASCII characters, by encoding with the
      ascii codec and the ignore error handler, and comparing result lengths
    2. if fewer than 10% of the characters are non-ASCII, use the Q encoding
    3. otherwise, use the B encoding

    The purpose of the algorithm in c) would be that text containing a few
    non-latin characters still comes out right even if the receiver fails to
    decode the header.
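The sending rules a) to c) could be sketched roughly as follows. This is not the `send_header` implementation; the function names are hypothetical, and the sketch ignores the RFC 2047 limit on the length of an encoded word.

```python
import base64

# Bytes that RFC 2047 allows unescaped in a Q-encoded word.
_Q_SAFE = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!*+-/"
)

def q_encode(raw):
    """Q-encode bytes: safe bytes as-is, space as '_', the rest as =XX."""
    out = []
    for byte in raw:
        if byte in _Q_SAFE:
            out.append(chr(byte))
        elif byte == 0x20:
            out.append("_")
        else:
            out.append("=%02X" % byte)
    return "".join(out).encode("ascii")

def encode_header_value(value):
    # a) bytes are sent as-is
    if isinstance(value, bytes):
        return value
    # b) anything that fits in latin-1 is encoded as latin-1
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError:
        pass
    # c) otherwise MIME-encode as UTF-8; count non-ASCII characters by
    #    comparing lengths after encoding with the 'ignore' handler
    non_ascii = len(value) - len(value.encode("ascii", "ignore"))
    raw = value.encode("utf-8")
    if non_ascii * 10 < len(value):  # fewer than 10% non-ASCII: Q encoding
        return b"=?utf-8?q?" + q_encode(raw) + b"?="
    return b"=?utf-8?b?" + base64.b64encode(raw) + b"?="  # else B encoding
```

With the Q encoding, a mostly-ASCII value such as a traceback stays largely readable on the wire even if the receiver never decodes it, which is the stated purpose of the 10% threshold.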

    The same change would also apply to the client-side of sending headers.
    On the receiving side, we should offer an option to decode headers (both
    for client and server); this should be an option because senders may not
    comply with RFC 2616. Reading should then proceed as follows:

    1. check whether there are MIME markers in the text
    2. if so, MIME-decode
    3. if not, decode as latin-1
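The three reading steps above might look like this on the receiving side. The function name and the marker regex are assumptions made for this sketch; only `email.header.decode_header` is an existing standard-library call.

```python
import re
from email.header import decode_header

# Rough pattern for an RFC 2047 encoded word: =?charset?B-or-Q?payload?=
_ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")

def decode_header_value(raw):
    """Decode raw header bytes following the three steps listed above."""
    text = raw.decode("latin-1")  # latin-1 never fails on arbitrary bytes
    # 1. check whether there are MIME markers in the text
    if not _ENCODED_WORD.search(text):
        # 3. if not, the latin-1 decoding is the result
        return text
    # 2. if so, MIME-decode each part with its declared charset
    parts = []
    for chunk, charset in decode_header(text):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or "latin-1"))
        else:
            parts.append(chunk)
    return "".join(parts)
```

A value without markers falls straight through to latin-1, so headers from senders that ignore RFC 2047 are still readable, if not always correctly interpreted.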

    @loewis
    Mannequin

    loewis mannequin commented Dec 30, 2009

    Antoine: sure, to fix the issue at hand, we can work around it.

    However, the issue of sending non-ASCII headers in HTTP remains, and
    should also be fixed.

    @vstinner
    Member

    bpo-7608 was a duplicate issue. Copy of my message (msg98091):
    -----
    SimpleXMLRPCRequestHandler.do_POST() writes the traceback in the HTTP header "X-traceback". But an HTTP header value is ASCII only, whereas a traceback can contain any character (e.g. a non-ASCII character from a directory name, as in this issue).

    A simple fix would be to use the ASCII charset with the backslashreplace error handler. Attached patch uses:

       trace = str(trace.encode('ASCII', 'backslashreplace'), 'ASCII')

    Is there an easier method to escape non-ASCII characters without double conversion (unicode->bytes and bytes->unicode)?
    -----
    I also copied my patch to this issue.
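A small demonstration of the conversion quoted from the patch. The wrapper name `to_ascii_header_value` and the sample path are hypothetical; the expression inside is the one shown above.

```python
def to_ascii_header_value(trace):
    # str -> bytes with backslashreplace, then bytes -> str as ASCII:
    # non-ASCII characters become \uXXXX escapes, nothing is dropped.
    return str(trace.encode('ASCII', 'backslashreplace'), 'ASCII')

# Hypothetical traceback fragment with a euro sign in the path.
trace = 'File "/tmp/nonascii-\u20ac/server.py", line 1'
safe = to_ascii_header_value(trace)
print(safe)

# The escaped value now survives the strict ASCII encoding in send_header.
header_line = ("%s: %s\r\n" % ("X-traceback", safe)).encode('ASCII', 'strict')
```

The result is a `str` again, so it can be passed to `send_header` without changing that method's interface.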

    @vstinner
    Member

    pitrou> If it's only about transmitting the string representation of the
    pitrou> traceback, perhaps we can simply use "replace" or "ignore" as the error
    pitrou> handler?

    Both replace and ignore lose information. My patch keeps all information by using backslashreplace. It's consistent with Python's behaviour: Python writes a traceback to stderr, which also uses the backslashreplace error handler.
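The difference between the three error handlers is easy to see on a short sample string (chosen here for illustration):

```python
text = "nonascii-\u20ac"  # sample value containing a euro sign

results = {}
for handler in ("replace", "ignore", "backslashreplace"):
    # Encode to ASCII with the given handler, then back to str for display.
    results[handler] = text.encode("ascii", handler).decode("ascii")
    print("%-16s %s" % (handler, results[handler]))
```

Only `backslashreplace` leaves enough in the output to recover the original character.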

    @vstinner
    Member

    What do you think about my solution (convert the traceback to ASCII to avoid the encoding issue)? If you would like to support non-ASCII characters in HTTP headers, you should open a new issue. For compatibility, I prefer to use pure ASCII headers because I fear that third-party programs don't support non-ASCII headers.

    @pitrou
    Member Author

    pitrou commented Apr 16, 2010

    vstinner> What do you think about my solution (convert the traceback to ASCII to
    vstinner> avoid the encoding issue)?

    It's fine for me. Perhaps you should add a comment to explain why this is necessary.

    @vstinner
    Member

    Committed: r80112 (py3k). Waiting for the buildbots before the backport to 3.1.

    @vstinner
    Member

    Committed: r80112 (py3k)

    Looks good: r80118 (3.1).

    @vstinner
    Member

    If anyone would like to work on non-ASCII HTTP headers, please open a new issue with a pointer to this one.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022