Author nadeem.vawda
Recipients nadeem.vawda
Date 2012-01-27.10:23:37
SpamBayes Score 0.0
Marked as misclassified No
Message-id <1327659819.71.0.475160045664.issue13886@psf.upfronthosting.co.za>
In-reply-to
Content
I've recently come across a strange failure in the tests for the input()
built-in function:

    $ ./python -E -m test -v test_readline test_builtin

    [... snip ...]

    ======================================================================
    FAIL: test_input_tty_non_ascii (test.test_builtin.BuiltinTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1079, in test_input_tty_non_ascii
        self.check_input_tty("prompté", b"quux\xe9", "utf-8")
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1070, in check_input_tty
        self.assertEqual(input_result, expected)
    AssertionError: 'quux' != 'quux\udce9'
    - quux
    + quux\udce9
    ?     +


    ======================================================================
    FAIL: test_input_tty_non_ascii_unicode_errors (test.test_builtin.BuiltinTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1083, in test_input_tty_non_ascii_unicode_errors
        self.check_input_tty("prompté", b"quux\xe9", "ascii")
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1070, in check_input_tty
        self.assertEqual(input_result, expected)
    AssertionError: 'quux' != 'quux\udce9'
    - quux
    + quux\udce9
    ?     +

The failure only manifests itself if the readline module is loaded before
test_builtin runs (hence the presence of test_readline above). It will
not occur if regrtest is run with either of the -j or -W flags (which is
why it hasn't been seen on the buildbots).

The problem seems to be that readline assumes that its input should use
the locale encoding, and silently strips out any undecodable chars. This
breaks the tests mentioned above, since they set up sys.stdin to use the
surrogateescape error handler, expecting invalid characters to be escaped
rather than discarded.

This problem doesn't crop up if readline is *not* loaded, because in that
case PyOS_Readline() falls back to a stdio-based implementation
(PyOS_StdioReadline()) that preserves invalid characters, allowing them
to be handled properly by sys.stdin's encoding and error handler.

I have been able to fix the test failures with the attached patch, which
stops readline from eating invalid characters, making it consistent with
the stdio-based fallback. Can someone with more knowledge of readline
and/or locale issues advise whether the change is a good idea?
History
Date User Action Args
2012-01-27 10:23:39nadeem.vawdasetrecipients: + nadeem.vawda
2012-01-27 10:23:39nadeem.vawdasetmessageid: <1327659819.71.0.475160045664.issue13886@psf.upfronthosting.co.za>
2012-01-27 10:23:39nadeem.vawdalinkissue13886 messages
2012-01-27 10:23:37nadeem.vawdacreate