classification
Title: readline-related test_builtin failure
Type: behavior
Stage: patch review
Components: Extension Modules
Versions: Python 3.3, Python 3.2
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: nadeem.vawda
Priority: normal
Keywords: patch

Created on 2012-01-27 10:23 by nadeem.vawda, last changed 2012-01-27 10:34 by nadeem.vawda.

Files
File name | Uploaded | Description
rl-locale.diff | nadeem.vawda, 2012-01-27 10:23 | Fix readline to not discard chars that can't be decoded with sys.stdin.encoding
rl-test.diff | nadeem.vawda, 2012-01-27 10:34 | Ensure that input() tty tests always use readline
Messages (2)
msg152080 - Author: Nadeem Vawda (nadeem.vawda) (Python committer) Date: 2012-01-27 10:23
I've recently come across a strange failure in the tests for the input()
built-in function:

    $ ./python -E -m test -v test_readline test_builtin

    [... snip ...]

    ======================================================================
    FAIL: test_input_tty_non_ascii (test.test_builtin.BuiltinTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1079, in test_input_tty_non_ascii
        self.check_input_tty("prompté", b"quux\xe9", "utf-8")
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1070, in check_input_tty
        self.assertEqual(input_result, expected)
    AssertionError: 'quux' != 'quux\udce9'
    - quux
    + quux\udce9
    ?     +


    ======================================================================
    FAIL: test_input_tty_non_ascii_unicode_errors (test.test_builtin.BuiltinTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1083, in test_input_tty_non_ascii_unicode_errors
        self.check_input_tty("prompté", b"quux\xe9", "ascii")
      File "/home/nadeem/src/cpython/def/Lib/test/test_builtin.py", line 1070, in check_input_tty
        self.assertEqual(input_result, expected)
    AssertionError: 'quux' != 'quux\udce9'
    - quux
    + quux\udce9
    ?     +

The failure appears only if the readline module is loaded before
test_builtin runs (hence the presence of test_readline in the command
above). It does not occur if regrtest is run with either the -j or the
-W flag, which is why it hasn't been seen on the buildbots.

The problem seems to be that readline assumes its input uses the locale
encoding, and silently strips out any bytes it cannot decode. This
breaks the tests mentioned above, since they set up sys.stdin to use the
surrogateescape error handler and expect undecodable bytes to be escaped
rather than discarded.
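
To make the mismatch concrete, here is a minimal sketch that reproduces the two
strings from the assertion error using plain bytes.decode(). The "ignore" error
handler is only a stand-in (my assumption) for readline dropping the byte; it
is not what readline actually calls internally.

```python
raw = b"quux\xe9"  # the bytes the failing tests feed to input()

# surrogateescape maps each undecodable byte to a lone surrogate
# (0xE9 -> U+DCE9), for both encodings used by the tests.
escaped_utf8 = raw.decode("utf-8", errors="surrogateescape")
escaped_ascii = raw.decode("ascii", errors="surrogateescape")

# Assumption: "ignore" approximates readline discarding the byte.
stripped = raw.decode("utf-8", errors="ignore")

print(ascii(escaped_utf8))  # 'quux\udce9'  (what the tests expect)
print(ascii(stripped))      # 'quux'        (what they get via readline)

# The escaped form is lossless: encoding it back restores the raw bytes.
restored = escaped_utf8.encode("utf-8", errors="surrogateescape")
```

Because the escaped string round-trips to the original bytes, nothing is lost
as long as the byte reaches sys.stdin's error handler intact.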

This problem doesn't crop up if readline is *not* loaded, because in that
case PyOS_Readline() falls back to a stdio-based implementation
(PyOS_StdioReadline()) that preserves invalid characters, allowing them
to be handled properly by sys.stdin's encoding and error handler.

I have been able to fix the test failures with the attached patch, which
stops readline from eating invalid characters, making it consistent with
the stdio-based fallback. Can someone with more knowledge of readline
and/or locale issues advise whether the change is a good idea?
msg152082 - Author: Nadeem Vawda (nadeem.vawda) (Python committer) Date: 2012-01-27 10:34
Here's another patch that ensures the test always exercises the GNU
readline code path (rather than the stdio fallback). This will cause the
failure to occur when running just test_builtin (no need to also run
test_readline before it).

Ideally we'd want to test both code paths, but I'm not sure how to
accomplish that reliably, short of running the test in a subprocess.
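
For what it's worth, the subprocess idea could look roughly like the sketch
below. run_isolated() and its preload_readline parameter are hypothetical names
of mine, not anything in the attached patches, and the sketch only shows the
isolation part: a real test would still need to give the child a pty, as
check_input_tty does.

```python
import subprocess
import sys


def run_isolated(snippet, preload_readline=False):
    """Run `snippet` in a fresh interpreter and return its stripped stdout.

    Hypothetical helper: with preload_readline=True the child imports
    readline before running the snippet; otherwise it does not import it
    explicitly.
    """
    prelude = "import readline\n" if preload_readline else ""
    result = subprocess.run(
        [sys.executable, "-E", "-c", prelude + snippet],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Ask a child interpreter whether readline ended up loaded.
report = "import sys; print('readline' in sys.modules)"
print(run_isolated(report))
```

Each child starts with its own sys.modules, so whether readline was loaded
earlier in the parent test run no longer affects the result.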
History
Date | User | Action | Args
2012-01-27 10:34:45 | nadeem.vawda | set | files: + rl-test.diff; messages: + msg152082
2012-01-27 10:23:39 | nadeem.vawda | create |