Unicode character ends interactive session #67612

Closed
AGrzes mannequin opened this issue Feb 9, 2015 · 3 comments
Labels
OS-windows topic-unicode type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@AGrzes
Mannequin

AGrzes mannequin commented Feb 9, 2015

BPO 23424
Nosy @vstinner, @tjguk, @ezio-melotti, @zware, @eryksun, @zooba
Superseder
  • bpo-1602: windows console doesn't print or input Unicode
Files
  • -v.txt
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2015-02-13.21:42:39.440>
    created_at = <Date 2015-02-09.20:37:19.694>
    labels = ['expert-unicode', 'OS-windows', 'type-crash']
    title = 'Unicode character ends interactive session'
    updated_at = <Date 2015-02-13.21:42:39.440>
    user = 'https://bugs.python.org/AGrzes'

    bugs.python.org fields:

    activity = <Date 2015-02-13.21:42:39.440>
    actor = 'terry.reedy'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-02-13.21:42:39.440>
    closer = 'terry.reedy'
    components = ['Unicode', 'Windows']
    creation = <Date 2015-02-09.20:37:19.694>
    creator = 'AGrzes'
    dependencies = []
    files = ['38061']
    hgrepos = []
    issue_num = 23424
    keywords = []
    message_count = 3.0
    messages = ['235629', '235644', '235655']
    nosy_count = 7.0
    nosy_names = ['vstinner', 'tim.golden', 'ezio.melotti', 'zach.ware', 'eryksun', 'steve.dower', 'AGrzes']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '1602'
    type = 'crash'
    url = 'https://bugs.python.org/issue23424'
    versions = ['Python 3.4']

    @AGrzes
    Mannequin Author

    AGrzes mannequin commented Feb 9, 2015

    Inputting some Unicode characters (such as 'łąśćńó...') causes the interactive session to abort.

    When the console session is set to use the UTF-8 code page (65001), the session ends abruptly as soon as a character with a diacritic appears in a string. The debug output suggests that some cleanup is performed, but there are no error messages indicating what caused the problem.

    The problem was spotted on Windows 10 (Technical Preview), but I may try to reproduce it on a released operating system.

    ---
    C:\>chcp 1250
    Active code page: 1250

    C:\>python -i
    Python 3.4.2 (v3.4.2:ab2c023a9432, Oct  6 2014, 22:15:05) [MSC v.1600 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 'ł'
    'ł'
    >>> exit()

    C:\>chcp 65001
    Active code page: 65001

    C:\>python -i
    Python 3.4.2 (v3.4.2:ab2c023a9432, Oct  6 2014, 22:15:05) [MSC v.1600 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 'ł'

    C:\

    @AGrzes AGrzes mannequin added topic-unicode OS-windows type-crash A hard crash of the interpreter, possibly with a core dump labels Feb 9, 2015
    @vstinner
    Member

    vstinner commented Feb 9, 2015

    This issue looks like a duplicate of bpo-1602: windows console doesn't print or input Unicode. It's a limitation of Windows, not of Python itself. Python supports any Unicode character if the output is written to a file (encoded in UTF-8).

    Workaround: use IDLE or another Python REPL (interactive interpreter) that has better Unicode support.
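A minimal sketch of the point about files, assuming only the standard library: the helper name `roundtrip_through_file` and the file name `out.txt` are illustrative, not part of the original report. It writes the reporter's characters to a UTF-8 file and reads them back, which does not involve the console at all.

```python
import os
import tempfile


def roundtrip_through_file(s: str) -> str:
    """Write s to a temporary UTF-8 encoded file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")  # hypothetical file name
        # An explicit encoding avoids depending on the locale or code page.
        with open(path, "w", encoding="utf-8") as f:
            f.write(s)
        with open(path, "r", encoding="utf-8") as f:
            return f.read()


text = "łąśćńó"
result = roundtrip_through_file(text)
```

Passing `encoding="utf-8"` explicitly is the design choice that matters here: without it, `open()` may fall back to a locale-dependent encoding.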

    @eryksun
    Contributor

    eryksun commented Feb 10, 2015

    This isn't a Python bug. The Windows console doesn't properly support UTF-8. See bpo-1602 and Drekin's win-unicode-console, an alternative REPL based on the wide-character (UCS-2) console API.

    FWIW, I attached a debugger to conhost.exe under Windows 7 to inspect what's happening here. In the client, the CRT's read() function calls WinAPI ReadFile. For a console handle this calls either ReadConsoleA or (in Windows 8+) NtReadFile. Either way, most of the action happens in the server process, conhost.exe.

    The server's input buffer is Unicode, which gets encoded to CP 65001 (UTF-8) by calling WideCharToMultiByte. However, the server incorrectly assumes the current codepage is a Windows ANSI codepage with a one-to-one mapping, i.e. that each 16-bit wchar_t maps to an 8-bit char in the current codepage. Since 'ł' gets UTF-8 encoded as the two-byte string b'\xc5\x82', the allocated buffer is too small by a byte. The server doesn't recover from this failure by allocating a larger buffer. It just reports back to the client process that it read 0 bytes. The CRT in turn sets the end-of-file (EOF) flag on the stdin FILE stream, which causes Python to exit 'normally'.
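The size mismatch described above can be modeled in a few lines of Python. This is only a sketch of the described behavior, not conhost.exe code: the function names `encoded_size_mismatch` and `simulated_console_read` are hypothetical, and the "buffer" is just a byte count equal to the number of 16-bit code units in the input.

```python
def utf16_units(text: str) -> int:
    """Number of 16-bit code units (wchar_t on Windows) in text."""
    return len(text.encode("utf-16-le")) // 2


def encoded_size_mismatch(text: str, codepage: str) -> int:
    """Bytes needed beyond a buffer sized at one byte per 16-bit unit."""
    assumed_buffer = utf16_units(text)  # the faulty one-to-one assumption
    needed = len(text.encode(codepage))
    return needed - assumed_buffer


def simulated_console_read(text: str, codepage: str) -> bytes:
    """Hypothetical model of the read: return b'' when the buffer is too small.

    A zero-byte result is what the client's CRT would interpret as EOF.
    """
    data = text.encode(codepage)
    if len(data) > utf16_units(text):
        return b""  # server gives up and reports 0 bytes read
    return data


line = "'ł'\r\n"  # what the user typed at the prompt, plus the line ending
ansi_result = simulated_console_read(line, "cp1250")
utf8_result = simulated_console_read(line, "utf-8")
```

Under this model the read in code page 1250 returns the full line, while the read in UTF-8 returns nothing, matching the two sessions in the report. Pure ASCII input still fits in UTF-8, which is consistent with the session ending only once a non-ASCII character is entered.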

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022