classification
Title: test_readline.test_nonascii fails on Android
Type: behavior Stage: patch review
Components: Tests Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: martin.panter, xdegaye
Priority: normal Keywords: patch

Created on 2016-12-17 09:02 by xdegaye, last changed 2017-01-18 15:24 by xdegaye.

Files
File name Uploaded Description Edit
tracing_the_completer.patch xdegaye, 2016-12-18 15:27 review
tracing_the_completer_2.patch xdegaye, 2016-12-18 17:54 review
readline_multibyte.patch xdegaye, 2016-12-20 19:36 review
readline_multibyte.v2.patch martin.panter, 2016-12-23 22:51 review
readline.patch xdegaye, 2017-01-10 10:15 review
Messages (13)
msg283476 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2016-12-17 09:02
test_nonascii has been implemented in issue 16182.

The error message:
======================================================================
FAIL: test_nonascii (test.test_readline.TestReadline)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/sdcard/org.bitbucket.pyona/lib/python3.7/test/test_readline.py", line 203, in test_nonascii
    self.assertIn(b"text 't\\xeb'\r\n", output)
AssertionError: b"text 't\\xeb'\r\n" not found in bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\\303\\257nserted]|t\x07\x08\x08\x08\x08\x08\x08\x08\x07\x07xrted]|t\x08\x08\x08\x08\x08\x08\x08\x07\r\nresult \'[\\xefnsexrted]|t\'\r\nhistory \'[\\xefnsexrted]|t\'\r\n")

----------------------------------------------------------------------
msg283544 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-12-18 07:24
This is a breakdown of running the test script on my Linux setup (UTF-8 locale):

^A^B^B^B^B^B^B^B\t\tx\t\r\n
    Input echoed back (before Readline disables echo)
[\xc3\xafnserted]
    Inserted by pre_input_hook()
|t\xc3\xab[after]
    Inserted by the Ctrl+A macro
\x08\x08\x08\x08\x08\x08\x08
    Response to moving the cursor left of [after]
text \'t\\xeb\'\r\n
    Encoded completer(text=...) parameter for testing
line \'[\\xefnserted]|t\\xeb[after]\'\r\n
    get_line_buffer() result for testing
indexes 11 13\r\n
    get_begidx(), get_endidx()
\x07
    Rings terminal bell due to multiple possible completions
text . . . line . . . indexes . . .\r\n
    From second round of completer() calls due to second Tab input
substitution \'t\\xeb\'\r\n
matches [\'t\\xebnt\', \'t\\xebxt\']\r\n
    display() parameters
x[after]\x08\x08\x08\x08\x08\x08\x08
    Response to inserting “x”
t[after]\x08\x08\x08\x08\x08\x08\x08
    Completion of "t\xEBxt"
\r\n
    Response to inputting Return
result \'[\\xefnserted]|t\\xebxt[after]\'\r\n
    input() function call return value
history \'[\\xefnserted]|t\\xebxt[after]\'\r\n
    Line as retrieved from history

The problem is that the Ctrl+A macro seems to have only inserted the two ASCII characters "|t" and has stopped at the non-ASCII character "\xEB". Everthing else relies on the macro working properly to get the right cursor positioning and completion text.

Xavier: If you run the following code in an interactive Python session, what does pressing $ print out?

import readline
readline.parse_and_bind('Control-a: "|t\xEB[after]"')
readline.parse_and_bind('"$": dump-macros')

For me, it prints this: (includes “te” with an umlaut)

\C-a outputs |të[after]

What locale encoding does Python use for you? The parse_and_bind() call should be encoding "\xEB" with PyUnicode_EncodeLocale(), and passing the resulting string to Readline. Then Readline is supposed to insert the string when we invoke the macro. Somewhere the string is getting truncated.
msg283555 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2016-12-18 15:27
> For me, it prints this: (includes “te” with an umlaut)
I got this same result when testing on Android with the interactive interpreter.

See the content of the 'completer' list below, when run with tracing_the_completer.patch:
On linux:
test_nonascii (test.test_readline.TestReadline) ... bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\xc3\xafnserted]|t\xc3\xab[after]\x08\x08\x08\x08\x08\x08\x08text \'t\\xeb\'\r\nline \'[\\xefnserted]|t\\xeb[after]\'\r\nindexes 11 13\r\ntext \'t\\xeb\'\r\nline \'[\\xefnserted]|t\\xeb[after]\'\r\nindexes 1113\r\nsubstitution \'t\\xeb\'\r\nmatches [\'t\\xebnt\', \'t\\xebxt\']\r\nx[after]\x08\x08\x08\x08\x08\x08\x08t[after]\x08\x08\x08\x08\x08\x08\x08\r\nresult \'[\\xefnserted]|t\\xebxt[after]\'\r\nhistory \'[\\xefnserted]|t\\xebxt[after]\'\r\ncompleter [\'t\xc3\xab-0\', \'GOT 1\', \'t\xc3\xab-1\', \'GOT 2\', \'t\xc3\xab-2\', \'t\xc3\xab-0\', \'GOT 1\', \'t\xc3\xab-1\', \'GOT 2\', \'t\xc3\xab-2\', \'t\xc3\xabx-0\', \'GOT 3\', \'t\xc3\xabx-1\']\r\n")

On Android:
AssertionError: b"text 't\\xeb'\r\n" not found in bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\\303\\257nserted]|t\x07\x08\x08\x08\x08\x08\x08\x08\x07\x07xrted]|t\x08\x08\x08\x08\x08\x08\x08\x07\r\nresult \'[\\xefnsexrted]|t\'\r\nhistory \'[\\xefnsexrted]|t\'\r\ncompleter [\'\xc3\xafnse-0\', \'\xc3\xafnse-0\', \'\xc3\xafnsex-0\']\r\n")

b'\xc3\xab' is the UTF-8 code for '\xEB'
b'\xc3\xaf' is the UTF-8 code for '\xEF'
It seems that the completer is triggered by the wrong key.
msg283563 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2016-12-18 17:38
FWIW on Android we have:

>>> readline.__doc__ and "libedit" in readline.__doc__
False
>>> getattr(readline, "set_pre_input_hook", None)
<built-in function set_pre_input_hook>
msg283564 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2016-12-18 17:54
Checking that the the problem is indeed an encoding problem on Android with readline.set_pre_input_hook(): with tracing_the_completer_2.patch that replaces '\xEB' with 'A', the test now succeeds and prints:

test_nonascii (test.test_readline.TestReadline) ... bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\\303\\257nserted]|tA[after]\x08\x08\x08\x08\x08\x08\x08text \'tA\'\r\nline \'[\\xefnserted]|tA[after]\'\r\nindexes 11 13\r\n\x07text \'tA\'\r\nline \'[\\xefnserted]|tA[after]\'\r\nindexes 11 13\r\nsubstitution \'tA\'\r\nmatches [\'tAnt\', \'tAxt\']\r\nx[after]\x08\x08\x08\x08\x08\x08\x08t[after]\x08\x08\x08\x08\x08\x08\x08\r\nresult \'[\\xefnserted]|tAxt[after]\'\r\nhistory \'[\\xefnserted]|tAxt[after]\'\r\ncompleter [\'tA-0\', \'GOT 1\', \'tA-1\', \'GOT 2\', \'tA-2\', \'tA-0\', \'GOT 1\', \'tA-1\', \'GOT 2\', \'tA-2\', \'tAx-0\', \'GOT 3\', \'tAx-1\']\r\n")
ok
msg283708 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2016-12-20 19:36
The test is ok on Android when LANG=en_US.UTF-8.
When LANG is not set, the check made by the first statement in test_nonascii() should skip the test but fails to skip it on Android as Py_EncodeLocale() always encode to utf8 whatever the locale (same as with darwin).
This patch adds a second check to verify that readline does handle multi bytes characters and does skip the test if not. So on Android the test is skipped now when LANG is not set.  Another solution would have been to set LANG=en_US.UTF-8 in the environment of the process forked by run_pty(), but it does not sound robust as there may be other setups where readline does not handle multi bytes characters.
msg283906 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-12-23 22:51
The basic idea of your patch may be reasonable, but something is not right. Imagine the locale is something other than UTF-8. The input code will now contain mojibake print("\xC3\xAB"), although the decode() call will translate the result back to the expected "\xEB".

I suggest this 2nd version of the patch. I used io.TextIOWrapper to use the locale encoding, and incorporated the other test character "\xEF". I also changed the preliminary test to call input() instead of relying on the interactive interpreter, quiet mode, etc.

Just to clarify, is the problem that Python (correctly) assumes UTF-8 encoding on Android, but Readline does not unless you tweak the environment variable? I.e. is Readline assuming ASCII or something and ignoring the test characters? If so, it sounds like this may be a more general problem with Gnu tools and libraries on Android.
msg283953 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2016-12-24 14:04
With your patch Martin, test_nonascii is skipped on Android when LANG is not set and the issue is fixed.

> Just to clarify, is the problem that Python (correctly) assumes UTF-8 encoding on Android, but Readline does not unless you tweak the environment variable?

Readline (correctly) uses the locale environment variables to set eight-bit mode (see function _rl_init_eightbit() in Readline nls.c) and when the LC_CTYPE category is C or POSIX, eight-bit characters are discarded on input and a bell character ('\x07') is echoed instead.
msg284976 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-01-08 11:13
.
Thanks for the explanation. It sounds like the Readline library assumes an ASCII-only locale and sets its “convert-meta” variable to “on”. But Python assumes UTF-8 and inputs b"\xC3\xAB" to the terminal. Readline converts the input to two escape sequences: "\N{ESC}\x43" == "\N{ESC}C" (Alt + Capital C), which probably runs the “capitalize-word” command, and "\N{ESC}\x2B" == "\N{ESC}+" (Alt + Plus), which presumably generates the bell character.

I don’t understand why you say Readline is “correctly” using the C or Posix locale (ASCII), while my understanding is Python on Android always uses UTF-8 as the locale encoding. It seems there is an inconsistency with the locale or encodings being used.

Or is this just an obscure case that you choose not to support on Android, and therefore skip the test?
msg285103 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2017-01-10 10:15
Trying to track the 'char * text' pointer that is returned to Python by readline in rlhandler() and using gdb 'set follow-fork-mode child' and 'set detach-on-fork off' to stop (with a breakpoint in rlhandler) in the child spawned by run_pty() in test_nonascii, unfortunately this fails with gdb crashing in a debug session on Android. So I use the attached readline.patch with good old printf instead. The results:

(1) UTF-8 locale:
    command: LANG=en_US.UTF-8 python -m test -v -m test_nonascii test_readline
    stderr output:
        char *text: "[ïnserted]|tëxt[after]"
(2) C locale
    command: python -m test -v -m test_nonascii test_readline
    stderr output:
        char *text: "[ïnsexrted]|t"
(3) C locale and a change in 'script' to not use "\xEB" in 'macro':
    s/macro = "|t\xEB[after]"/macro = "|te[after]"/
    command: python -m test -v -m test_nonascii test_readline
    stderr output:
        char *text: "[ïnserted]|tex[after]"

Case (2) confirms your comment in msg283544: "the Ctrl+A macro seems to have only inserted the two ASCII characters "|t" and has stopped at the non-ASCII character "\xEB".
So readline with the C locale on Android:
* Accepts eight-bits characters in rl_pre_input_hook().
* Discards all the characters after the first eight-bits character in macro expansion.
* Discards eight-bits characters on input.
It seems reasonable to consider that readline is broken with the C locale on Android and to skip the test in that case whether using your last patch Martin, or when test.support.is_android is True.


I don't know if this is relevant, but on archlinux (not on debian), when I run bash and therefore readline, with the C locale (on xterm or on the linux console, using my french keyboard) I have the following results:

[xavier@bilboquet ~]$ LANG= bash
[xavier@bilboquet ~]$                   # first prompt
bash: $'\303': command not found
[xavier@bilboquet ~]$ A                 # second prompt
bash: $'\303A\251': command not found
[xavier@bilboquet                       # third prompt
[xavier@bilboquet ~]$


On the first prompt I type 'é' followed by the <backspace> key and the <return> key.
On the second prompt I type 'é' followed by <left-arrow>, followed by 'A' and <return>.
On the third prompt I type 4 times 'é' followed by 8 times <backspace> and <return>, note how the prompt has been eaten by the backspaces.
msg285533 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-01-16 02:07
So the problem seems to be that Python assumes Readline’s encoding is UTF-8, but Readline actually uses ASCII (depending on locale variables). The code at the start of the test is supposed to catch when add_history() calls PyUnicode_EncodeLocale() and fails.

I don’t understand the details of UTF-8 vs locale on Android, but maybe we could adjust the encode() and decode() implementations in Modules/readline.c, to account for the Readline library’s idea of the locale encoding. Or maybe we could adjust the temporary setlocale() calls in Modules/readline.c.

If you are happy to declare the Readline library is broken on Android, I now think I would prefer to skip the test based on support.is_android, rather than the previous patches. Otherwise, we risk masking genuine test failures on other platforms. Something like:

@unittest.skipIf(is_android,
    "Gnu Readline disagrees about the locale encoding on Android")
def test_nonascii(self):
    try:
        readline.add_history("\xEB\xEF")
    ...

When you run “LANG= bash”, it is only Bash and Readline that gets the C locale; the terminal is unchanged. I presume the terminal inputs é as two UTF-8 bytes, but Readline with the C locale is not aware of UTF-8, and assumes the two bytes are two separate characters.
msg285728 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2017-01-18 14:38
PEP 538 [1] coerces the C locale to UTF-8 by setting the locale environment variables (LC_ALL, LANG).  The PEP has an implementation at issue 28180 as pep538_coerce_legacy_c_locale_v3.diff, and the patch fixes test_nonascii when run on the Android emulators (api 21 and 24).  This is as expected as now both Python and Readline are in agreement and use the same encoding.
I think we should wait for the resolution of PEP 538 and of issue 28684. If this does not happen, I agree we should skip the test with the message you are proposing Martin.

[1] https://www.python.org/dev/peps/pep-0538
msg285736 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2017-01-18 15:24
> I think we should wait for the resolution of PEP 538 and of issue 28684

s/issue 28684/issue 28180/
History
Date User Action Args
2017-01-18 15:24:02xdegayesetmessages: + msg285736
2017-01-18 14:38:11xdegayesetmessages: + msg285728
2017-01-16 02:07:48martin.pantersetmessages: + msg285533
2017-01-10 10:21:20xdegayesetassignee: xdegaye ->
2017-01-10 10:15:21xdegayesetfiles: + readline.patch

messages: + msg285103
2017-01-08 11:13:45martin.pantersetmessages: + msg284976
2016-12-24 14:04:57xdegayesetmessages: + msg283953
2016-12-23 22:51:46martin.pantersetfiles: + readline_multibyte.v2.patch

messages: + msg283906
2016-12-20 19:37:02xdegayesetstage: needs patch -> patch review
2016-12-20 19:36:42xdegayesetfiles: + readline_multibyte.patch

messages: + msg283708
2016-12-18 17:54:43xdegayesetfiles: + tracing_the_completer_2.patch

messages: + msg283564
2016-12-18 17:38:29xdegayesetmessages: + msg283563
2016-12-18 15:27:49xdegayesetfiles: + tracing_the_completer.patch
keywords: + patch
messages: + msg283555
2016-12-18 07:24:25martin.pantersetnosy: + martin.panter
messages: + msg283544
2016-12-17 09:06:07xdegayelinkissue26865 dependencies
2016-12-17 09:02:49xdegayecreate