
classification
Title: readline interferes with characters beginning with byte \xe9
Type: behavior
Stage: needs patch
Components: Unicode
Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.araujo, petri.lehtinen, serhiy.storchaka, takluyver, vstinner
Priority: normal
Keywords:

Created on 2011-03-26 00:28 by takluyver, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg132192 - Author: Thomas Kluyver (takluyver) Date: 2011-03-26 00:28
To replicate, in Python 3.1 on Linux (utf-8 console):

>>> print(chr(0x9000))
退

Copy and paste this character into the prompt. It appears correctly (as a Chinese character). Then:

>>> import readline
>>> readline.parse_and_bind('"\M-i":"    "')

Now try to paste the character again: it appears as "    ��" (four spaces, two unknown character symbols), and if you press return, you get a SyntaxError.

This happens with all characters whose encoding begins with the byte \xe9. In UTF-8, that's the code points U+9000 to U+9FFF. If the terminal encoding is changed to cp1252, I'm told that the same thing happens with é, which is encoded as \xe9 there.
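The range quoted above can be checked by brute force. This is only a sketch: it walks every code point, skips the surrogates (which cannot be encoded), and keeps those whose UTF-8 encoding starts with 0xE9. The variable names are illustrative.

```python
LEAD_BYTE = 0xE9

# Every code point whose UTF-8 encoding starts with byte 0xE9.
matches = [
    cp
    for cp in range(0x110000)
    if not 0xD800 <= cp <= 0xDFFF  # surrogates cannot be encoded to UTF-8
    and chr(cp).encode("utf-8")[0] == LEAD_BYTE
]

first, last = matches[0], matches[-1]
print(hex(first), hex(last), len(matches))  # 0x9000 0x9fff 4096

# In cp1252 the single character é is encoded as the same byte.
print("é".encode("cp1252"))  # b'\xe9'
```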
msg141435 - Author: Petri Lehtinen (petri.lehtinen) (Python committer) Date: 2011-07-30 10:41
You're binding the M-i keyboard sequence. Could it be that the \xe9 byte is translated by the terminal to M-i, and that causes the interference? In this case, it's not really a bug.
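The arithmetic behind this guess can be sketched as follows, assuming the traditional convention that Meta sets the eighth bit (0x80) of the key's ASCII code. The helper name meta_byte is hypothetical and used only for illustration.

```python
META_BIT = 0x80  # eighth bit, traditionally used to mark a Meta-modified key


def meta_byte(key: str) -> int:
    """Return the byte value of Meta+key under the eighth-bit convention."""
    return ord(key) | META_BIT


m_i = meta_byte("i")
encoded = chr(0x9000).encode("utf-8")

print(hex(m_i))           # 0xe9
print(encoded[0] == m_i)  # True: the first byte of the character equals M-i
print(encoded[1:])        # b'\x80\x80': the two bytes left over
```

Under that assumption, the first byte of the pasted character matches the M-i binding and the two leftover bytes are not valid UTF-8 on their own, which is consistent with the reported four spaces followed by two unknown character symbols.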
msg175836 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2012-11-18 00:15
Original bug report: https://github.com/ipython/ipython/issues/58
msg175837 - Author: STINNER Victor (vstinner) (Python committer) Date: 2012-11-18 00:26
I confirm that the issue exists, but I don't think that it comes from Python. I bet that the readline library uses *byte* strings, not *character* strings, and so is unable to correctly handle multibyte characters like the Chinese character U+9000.
msg175854 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2012-11-18 10:06
Yes, this is a readline issue.

Add the line '"\M-i":"    "' to ~/.inputrc, run the 'rlwrap cat' command, paste this multibyte character, and you get the same result.

This is not a Python bug.
msg175954 - Author: Thomas Kluyver (takluyver) Date: 2012-11-19 10:45
OK, thanks, and sorry for the noise. I've closed this issue.

Looking at the readline manual, this seems to be tied up with the options input-meta, output-meta and convert-meta. Fiddling around with .inputrc hasn't clarified exactly what they do, but it seems that the terminal can handle either Unicode or shortcuts involving meta (alt), but not both.
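For reference, a possible way to experiment with those three options from Python is sketched below. It is not a confirmed fix for this issue. It assumes the readline module is backed by GNU readline, and it swaps the "\M-i" binding for "\ei" (ESC followed by i), on the untested assumption that the terminal sends ESC i for Alt+i. The names INPUTRC_LINES and apply_bindings are hypothetical.

```python
# Init lines in inputrc syntax; the option names are those listed in the readline manual.
INPUTRC_LINES = [
    "set input-meta on",     # do not strip the eighth bit from input bytes
    "set output-meta on",    # display eight-bit characters directly
    "set convert-meta off",  # do not turn eight-bit bytes into ESC-prefixed sequences
    r'"\ei": "    "',        # bind the two-key sequence ESC i, not the single byte 0xE9
]


def apply_bindings(lines=INPUTRC_LINES):
    """Pass each init line to readline.parse_and_bind()."""
    import readline  # imported lazily: not available on every platform

    for line in lines:
        readline.parse_and_bind(line)
```

Calling apply_bindings() in an interactive session and then pasting the character again would show whether the collision goes away on a given terminal.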
History
Date                 User              Action  Args
2022-04-11 14:57:15  admin             set     github: 55888
2012-11-19 10:45:14  takluyver         set     status: open -> closed
                                               resolution: not a bug
                                               messages: + msg175954
2012-11-18 10:06:13  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg175854
2012-11-18 00:26:40  vstinner          set     messages: + msg175837
2012-11-18 00:15:16  eric.araujo       set     versions: + Python 3.4
                                               nosy: + eric.araujo
                                               messages: + msg175836
                                               stage: needs patch
2011-07-30 10:41:45  petri.lehtinen    set     nosy: + petri.lehtinen
                                               messages: + msg141435
                                               versions: + Python 2.7, Python 3.2, Python 3.3, - Python 2.6, Python 3.1
2011-03-26 00:32:55  pitrou            set     nosy: + vstinner
2011-03-26 00:28:38  takluyver         create