This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Pasting the U00FF character into Python REPL misinterprets character
Type: behavior
Stage: resolved
Components:
Versions: Python 3.10

process
Status: closed
Resolution: third party
Dependencies:
Superseder:
Assigned To:
Nosy List: gwk, ned.deily
Priority: normal
Keywords:

Created on 2021-11-26 16:28 by gwk, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg407065 - Author: George King (gwk) Date: 2021-11-26 16:28
Using macOS 11.6 Terminal.app with Python 3.10.0 installed directly from python.org.

I open the REPL. If I enter `char(0xff)` I get back 'ÿ' as expected (U00FF LATIN SMALL LETTER Y WITH DIAERESIS).

However, if I copy this character with surrounding quotes and then paste it into the REPL, it pastes as '' and evaluates to the empty string.

If I copy it without quotes and then paste into the REPL, I see nothing. When I hit return, the prompt renders as `>>> ^M>>>`. This suggests that the character is getting misinterpreted as a control character or something.

If I paste it into the terminal shell when the Python REPL is not running, it appears as the latin1 letter that I expect.

If I run `python3 -c 'print("ÿ")'` the character prints fine.

It seems to me that the Python REPL is setting some terminal mode that fails on this particular character. Perhaps this is a problem with the macOS readline/libedit implementation?

It seems that only U00FF is problematic; U00FE and U01000 both paste in just fine.

I verified that my terminal profile is set to UTF-8 encoding. I also repeated this experiment in the Kitty terminal emulator, and got identical results.
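
As a sanity check on the characters themselves, a minimal sketch like the one below prints the code point, Unicode name, and UTF-8 bytes for U+00FE and U+00FF. The helper name `describe` is hypothetical and only for illustration. It shows what a UTF-8 terminal would send when each character is pasted; it does not by itself explain why the REPL mishandles one of them.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, UTF-8 bytes) for a single character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8"))


# Compare the character that pastes correctly with the one that does not.
for ch in (chr(0xFE), chr(0xFF)):
    print(describe(ch))
```

Both characters encode to two bytes that differ only in the final byte, so any difference in behavior most likely arises after the terminal has delivered the bytes.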


Here is the readline version:
>>> import readline
>>> readline._READLINE_LIBRARY_VERSION
'EditLine wrapper'
>>> readline._READLINE_RUNTIME_VERSION
1026
>>> readline._READLINE_VERSION
1026
msg407066 - Author: George King (gwk) Date: 2021-11-26 16:29
Edit: `chr(0xff)`
msg407190 - Author: Ned Deily (ned.deily) (Python committer) Date: 2021-11-28 06:28
Thanks for the report. macOS does not ship with the GNU readline library due to its GPL licensing and instead relies on the BSD editline library, libedit, which, while providing similar functionality, has a different API than GNU readline. However, editline does provide a compatibility layer that provides much, but not all, of the GNU readline API. Third-party programs like Python have linked with that compatibility layer for many years but there are some shortcomings with it, like when trying to use full Unicode in the REPL as in your case.
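
To tell which library a given Python's `readline` module is linked against, something like the following sketch may help. The function name `uses_libedit` is hypothetical. It assumes the `readline.backend` attribute, available in Python 3.13 and later, and otherwise falls back to the documented check for the string "libedit" in the module docstring.

```python
def uses_libedit():
    """Return True if readline is backed by libedit, False if by GNU readline,
    or None if the readline module is not available at all."""
    try:
        import readline
    except ImportError:
        return None
    # Python 3.13+ exposes the backend name directly ("readline" or "editline").
    backend = getattr(readline, "backend", None)
    if backend is not None:
        return backend == "editline"
    # Older versions: the module docstring mentions libedit when it is in use.
    return "libedit" in (readline.__doc__ or "")


print(uses_libedit())
```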

There have been many reports over the years of similar problems in Python and in other products that use the readline compatibility layer of libedit. If this behavior is unacceptable, the standard recommendation on the webs has been to use a version of the product that is linked with GNU readline rather than with libedit's readline layer. (Alas, Python does not support directly linking with libedit's native API which would likely avoid these issues.) The potential drawback to using GNU readline is that it is licensed under GPL v3 which may be unacceptable for some users.

There is a third-party package on PyPI called gnureadline which allows replacing the Python readline module with one linked with GNU readline; I'm not sure what its status is as it doesn't appear to have been updated recently. Alternatively, there are Python distributions from other sources (like Homebrew and MacPorts) that optionally provide GNU readline for Python.
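
For scripts that want GNU readline when it is available, one possible pattern is to try the third-party module first and fall back to the standard one. This is only a sketch: it assumes the `gnureadline` package from PyPI is installed and importable under that name, and it does not change what the interactive REPL itself loads at startup.

```python
try:
    # Third-party module linked against GNU readline (assumed to be installed).
    import gnureadline as readline
except ImportError:
    try:
        # Standard library module; on macOS this is typically libedit-backed.
        import readline
    except ImportError:
        # Platforms without any readline support.
        readline = None

print(getattr(readline, "__name__", None))
```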

Hope that helps!
History
Date | User | Action | Args
2022-04-11 14:59:52 | admin | set | github: 90062
2021-11-28 06:28:19 | ned.deily | set | status: open -> closed; nosy: + ned.deily; messages: + msg407190; resolution: third party; stage: resolved
2021-11-26 16:29:21 | gwk | set | messages: + msg407066
2021-11-26 16:28:02 | gwk | create |