Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

input() doesn't catch _PyUnicode_AsString() exception; io.StringIO().encoding is None #52503

Closed
dangyogi mannequin opened this issue Mar 28, 2010 · 32 comments
Closed

input() doesn't catch _PyUnicode_AsString() exception; io.StringIO().encoding is None #52503

dangyogi mannequin opened this issue Mar 28, 2010 · 32 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir topic-IO type-bug An unexpected behavior, bug, or error

Comments

@dangyogi
Copy link
Mannequin

dangyogi mannequin commented Mar 28, 2010

BPO 8256
Nosy @amauryfa, @abalkin, @pitrou, @vstinner, @bitdancer, @florentx, @embray, @vadmium, @serhiy-storchaka
PRs
  • bpo-8256: Fixed possible failing or crashing input() #517
  • bpo-8256: Fixed possible failing or crashing input() (#517) #640
  • [3.6] bpo-8256: Fixed possible failing or crashing input() #641
  • [3.5] bpo-8256: Fixed possible failing or crashing input() #642
  • [Do Not Merge] Sample of CPython life with blurb. #703
  • Dependencies
  • bpo-24402: input() uses sys.stdout instead of sys.stdout for prompt
  • Files
  • bug.py: small bug demo program
  • 8256_3.patch
  • input_stdout_encoding.patch
  • input_stdout_none_encoding.patch
  • p1345978092.diff
  • input_fallback.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-03-17.21:11:46.505>
    created_at = <Date 2010-03-28.20:15:14.678>
    labels = ['3.7', 'type-bug', 'library', 'expert-IO']
    title = "input() doesn't catch _PyUnicode_AsString() exception; io.StringIO().encoding is None"
    updated_at = <Date 2017-03-24.22:23:04.140>
    user = 'https://bugs.python.org/dangyogi'

    bugs.python.org fields:

    activity = <Date 2017-03-24.22:23:04.140>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-03-17.21:11:46.505>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)', 'IO']
    creation = <Date 2010-03-28.20:15:14.678>
    creator = 'dangyogi'
    dependencies = ['24402']
    files = ['16681', '16700', '17324', '23608', '27006', '40704']
    hgrepos = []
    issue_num = 8256
    keywords = ['patch']
    message_count = 32.0
    messages = ['101874', '101876', '101879', '101880', '101886', '101888', '101900', '101904', '101956', '105422', '105435', '105436', '105439', '105555', '105616', '105676', '105699', '105700', '146590', '146969', '146973', '168873', '168880', '169166', '252421', '252422', '252437', '255217', '255220', '290194', '290195', '290200']
    nosy_count = 12.0
    nosy_names = ['amaury.forgeotdarc', 'belopolsky', 'pitrou', 'vstinner', 'dangyogi', 'r.david.murray', 'gruszczy', 'flox', 'erik.bray', 'aliles', 'martin.panter', 'serhiy.storchaka']
    pr_nums = ['517', '640', '641', '642', '703']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue8256'
    versions = ['Python 3.5', 'Python 3.6', 'Python 3.7']

    @dangyogi
    Copy link
    Mannequin Author

    dangyogi mannequin commented Mar 28, 2010

    I'm getting a "TypeError: bad argument type for built-in operation" on a print() with no arguments. This seems to be a problem in both 3.1 and 3.1.2 (haven't tried 3.1.1).

    I've narrowed the problem down in a very small demo program that you can run to reproduce the bug. Just do "python3.1 bug.py" and hit <ENTER> at the "prompt:".

    Removing the doctest call (and calling "foo" directly) doesn't get the error. Also removing the "input" call (and leaving the doctest call in) doesn't get the error.

    The startup banner on my python3.1 is:
    Python 3.1.2 (r312:79147, Mar 26 2010, 16:55:44)
    [GCC 4.3.3] on linux2

    I compiled python 3.1.2 with ./configure, make, make altinstall without any options. I'm running ubuntu 9.04 with the 2.6.28-18-generic (32-bit) kernel.

    @dangyogi dangyogi mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir topic-IO type-bug An unexpected behavior, bug, or error labels Mar 28, 2010
    @florentx
    Copy link
    Mannequin

    florentx mannequin commented Mar 28, 2010

    Confirmed.

    There's something wrong around the doctest._SpoofOut class.
    This script triggers the same bug (both 3.x and 3.1).

    Output:

    $ ./python issue8256_case.py 
    prompt:
    Traceback (most recent call last):
      File "issue8256_case.py", line 13, in <module>
        foo()
      File "issue8256_case.py", line 7, in foo
        print()
    TypeError: bad argument type for built-in operation

    @florentx florentx mannequin removed the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Mar 28, 2010
    @gruszczy
    Copy link
    Mannequin

    gruszczy mannequin commented Mar 28, 2010

    The bug is triggered by input, not by print. The exact place is _PyUnicode_AsStringAndSize, where unicode check happens. Then print checks PyError_Occured and catches this error. Either this error should not be raised or should be cleared input finishes.

    I'd love to provide a patch, but I have no idea, what should be corrected and how. If some would tutor me a little, I would be very happy to learn and code this.

    @florentx
    Copy link
    Mannequin

    florentx mannequin commented Mar 28, 2010

    Right. It does not involve doctest.

    #
    import io, sys
    
    original_stdout = sys.stdout

    try:
    sys.stdout = io.StringIO()
    input("prompt:")
    print()
    finally:
    sys.stdout = original_stdout

    @gruszczy
    Copy link
    Mannequin

    gruszczy mannequin commented Mar 29, 2010

    The problem occurs in line in bltinmodule.c:

    		po = PyUnicode_AsEncodedString(stringpo,
    			_PyUnicode_AsString(stdout_encoding), NULL);

    Where _PyUnicode_AsString returns NULL, since stdout_encoding is Py_None and that won't pass PyUnicode_Check in _PyUnicode_AsStringAndSize. To what object can _PyUnicode_AsString be turned and then passed to _PyUnicode_AsStringAndSize? Is there some default 'utf-8' encoding object?

    @bitdancer
    Copy link
    Member

    Whatever the solution to this issue is, it certainly looks like a bug that the return value of that function isn't being checked for errors.

    @gruszczy
    Copy link
    Mannequin

    gruszczy mannequin commented Mar 29, 2010

    I have written a small patch, that solves the problem, but is disgusting. Could anyone tell me, how I can get some default encoding from Python internals (I have no idea where to look) and return it inside _PyUnicode_AsStringAndSize? Anyway, now when the error happens inside input, it raises an Exception properly. So now I only need to know, how to correct the bug in an elegant fashion.

    @gruszczy
    Copy link
    Mannequin

    gruszczy mannequin commented Mar 29, 2010

    Ok, I have found Py_FileDefaultSystemEncoding and use it, however I had to cast it to (char *), because it's a const char *. Maybe I could do it better?

    @gruszczy
    Copy link
    Mannequin

    gruszczy mannequin commented Mar 30, 2010

    I have read, that I shouldn't directly use Py_FileSystemDefaultEncoding and rather use PyUnicode_GetDefaultEncoding, so I have changed the code a little.

    @gruszczy
    Copy link
    Mannequin

    gruszczy mannequin commented May 9, 2010

    Bump! Is there anything happening about this bug? Is my patch any good or should I try to work on something different?

    @bitdancer
    Copy link
    Member

    Victor, you've been dealing with Python's default encoding lately, care to render an opinion on the correct fix for this bug?

    @filip: the patch will need a unit test, which will also help with assessing the validity of the fix.

    @gruszczy
    Copy link
    Mannequin

    gruszczy mannequin commented May 10, 2010

    I'll try to code a small test this evening.

    @amauryfa
    Copy link
    Member

    The patch is wrong: _PyUnicode_AsString(Py_None) should not return "utf8"!

    I suggest that since PyOS_Readline() write the prompt to stderr, the conversion uses the encoding of stderr.

    @gruszczy
    Copy link
    Mannequin

    gruszczy mannequin commented May 11, 2010

    Amaury, could you elaborate a little more on this? I am pretty new to all this and I would happily write the patch, if only you could give me some clue on how I should approach this.

    @vstinner
    Copy link
    Member

    This issue is directly related to issue bpo-6697. The first problem is that the builtin input() function doesn't check that _PyUnicode_AsString() result is not NULL.

    The second problem is that io.StringIO().encoding is None. I don't understand why it is None whereas it uses utf8 (it calls TextIOWrapper constructor with encodings="utf8" and errors="strict").

    I will be difficult to write an unit test because the issue only occurs if stdin and stdout are TTY: input() calls PyOS_Readline(stdin, stdout, prompt).

    --

    @gruszczy: You're patch is just a workaround, not the right fix. The problem should be fixed in input(), not in PyUnicode methods. _PyUnicode_AsString() expects an unicode argument, it should raise an error if the argument is None (and not return a magical value).

    @vstinner vstinner changed the title TypeError: bad argument type for built-in operation input() doesn't catch _PyUnicode_AsString() exception; io.StringIO().encoding is None May 13, 2010
    @vstinner
    Copy link
    Member

    Here is a patch catching the _PyUnicode_AsString() error.

    input() uses sys.stdout.encoding to encode the prompt to a byte string, but
    PyOS_StdioReadline() writes the prompt to stderr (it should use sys_stdout).

    I don't know which encoding should be used if sys.stdout.encoding is None (eg.
    if sys.stdout is a StringIO() object).

    StringIO() of _io module has no encoding because it stores unicode characters,
    not bytes. StringIO() of _pyio module is based on BytesIO() and use utf8
    encoding, but the reference implementation is now _io.

    @amauryfa
    Copy link
    Member

    since the prompt is written to stderr, why is sys.stdout.encoding used instead of sys.stderr.encoding?

    @vstinner
    Copy link
    Member

    amaury> since the prompt is written to stderr, why is sys.stdout.encoding
    amaury> used instead of sys.stderr.encoding?

    input() calls PyOS_Readline() but PyOS_Readline() has multiple
    implementations:

    • PyOS_StdioReadline() if sys_stdin or sys_stdout is not a TTY
    • or PyOS_ReadlineFunctionPointer callback:
      • vms__StdioReadline() (VMS only)
      • PyOS_StdioReadline()
      • call_readline() when readline module is loaded

    call_readline() calls rl_callback_handler_install() with the prompt which
    writes the prompt to *stdout* (try ./python 2>/dev/null).

    I don't think that it really matters that the prompt is written to stderr with
    stdout encoding, because both outputs always use the same encoding.

    @florentx
    Copy link
    Mannequin

    florentx mannequin commented Oct 29, 2011

    Confirmed in 3.3.
    The patch does not apply cleanly on trunk.

    @vstinner
    Copy link
    Member

    vstinner commented Nov 3, 2011

    A patch similar to input_stdout_encoding.patch has been applied to 3.2 and 3.3 for the issue bpo-6697: see changeset 846866aa0eb6.

    @vstinner
    Copy link
    Member

    vstinner commented Nov 3, 2011

    input_stdout_none_encoding.patch uses UTF-8 if sys.stdout.encoding is None.

    @aliles
    Copy link
    Mannequin

    aliles mannequin commented Aug 22, 2012

    Replicated this issue on Python 3.3b2. The cause is the 'encoding' and 'errors' attributes on io.StringIO() being None. Doctest replaces sys.stdout with a StringIO subclass. The exception raised is still a TypeError.

    At this point I'm unsure what the fix should be:

    1. Should the exception raised be more descriptive of the problem?
    2. Should io.StringIO have real values for encoding and errors?
    3. Should Doctest's StingIO class provide encoding and errors?

    @pitrou
    Copy link
    Member

    pitrou commented Aug 22, 2012

    I suggest that since PyOS_Readline() write the prompt to stderr, the
    conversion uses the encoding of stderr.

    Agreed with Amaury.

    @aliles
    Copy link
    Mannequin

    aliles mannequin commented Aug 26, 2012

    Upload new patch that uses encoding and errors from stderr if stdout values are invalid unicode. Includes unit test in test_builtin.py.

    With this patch I am no longer able to replicate this issue.

    @serhiy-storchaka
    Copy link
    Member

    I would fallback to PyFile_WriteObject(prompt, fout, Py_PRINT_RAW) if the stdout has no the encoding attribute or it is not a string.

    @serhiy-storchaka
    Copy link
    Member

    Here is a patch.

    @vadmium
    Copy link
    Member

    vadmium commented Oct 6, 2015

    Serhiy, your patch looks like a worthwhile improvement because it adds proper error checking and handling. However I suspect this original bug is actually a side effect of bpo-24402. The code in question shouldn’t even be running, because sys.stdout is not the original output file descriptor, and is not a terminal.

    @embray
    Copy link
    Contributor

    embray commented Nov 23, 2015

    I just recently discovered this myself. In the process of debugging the issue I also noticed the same bug that is now fixed via bpo-24402.

    While I agree that bpo-24402 mostly mitigates the issue I think this patch is still worthwhile, as the current behavior still leads to cryptic, hard to debug errors. For example (although this is not great code, bear with me...) one could write a stdout wrapper like:

    >>> class WrappedStream:
    ...     encoding = 'utf8'
    ...     errors = None
    ...     def __getattr__(self, attr):
    ...         return getattr(sys.__stdout__, attr)
    ... 
    >>> sys.stdout = WrappedStream()
    >>> sys.stdout.fileno()
    1
    >>> sys.stdout.isatty()
    True
    >>> sys.stdout.errors
    >>> input('Prompt: ')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: bad argument type for built-in operation

    This still goes down the path for ttys, but because the 'errors' attribute does not defer to the underlying stream it still leads to a hard to debug exception. To be clear, I think the above code *should* break, just not as cryptically.

    @embray
    Copy link
    Contributor

    embray commented Nov 23, 2015

    I think the above code *should* break

    Actually, I see now that Serhiy's patch would allow this example to just pass through to the non-interactive fallback. So I take it back that my example should break--I think using the fallback would also be fine.

    @serhiy-storchaka
    Copy link
    Member

    New changeset a16894e by Serhiy Storchaka in branch '3.5':
    [3.5] bpo-8256: Fixed possible failing or crashing input() (#642)
    a16894e

    @serhiy-storchaka
    Copy link
    Member

    New changeset aac875f by Serhiy Storchaka in branch '3.6':
    [3.6] bpo-8256: Fixed possible failing or crashing input() (#641)
    aac875f

    @serhiy-storchaka
    Copy link
    Member

    New changeset c2cf128 by Serhiy Storchaka in branch 'master':
    bpo-8256: Fixed possible failing or crashing input() (#517)
    c2cf128

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life stdlib Python modules in the Lib dir topic-IO type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants