Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyUnicode_AsDecodedObject can only return unicode now #72612

Closed
zhangyangyu opened this issue Oct 13, 2016 · 17 comments
Closed

PyUnicode_AsDecodedObject can only return unicode now #72612

zhangyangyu opened this issue Oct 13, 2016 · 17 comments
Assignees
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@zhangyangyu
Copy link
Member

BPO 28426
Nosy @malemburg, @ncoghlan, @vstinner, @larryhastings, @benjaminp, @ezio-melotti, @serhiy-storchaka, @zhangyangyu
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
  • Dependencies
  • bpo-28526: Use PyUnicode_AsEncodedString instead of PyUnicode_AsEncodedObject
  • Files
  • PyUnicode_AsDecodedObject-crash.patch
  • deprecate-str-to-str-coding-unicode-api.patch
  • deprecate-str-to-str-coding-unicode-api-2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-10-27.18:17:34.417>
    created_at = <Date 2016-10-13.04:30:06.104>
    labels = ['interpreter-core', 'type-bug', '3.7']
    title = 'PyUnicode_AsDecodedObject can only return unicode now'
    updated_at = <Date 2017-03-31.16:36:09.529>
    user = 'https://github.com/zhangyangyu'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:09.529>
    actor = 'dstufft'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-10-27.18:17:34.417>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2016-10-13.04:30:06.104>
    creator = 'xiang.zhang'
    dependencies = ['28526']
    files = ['45070', '45071', '45211']
    hgrepos = []
    issue_num = 28426
    keywords = ['patch']
    message_count = 17.0
    messages = ['278545', '278546', '278550', '278551', '278552', '278554', '278628', '279133', '279337', '279344', '279363', '279368', '279375', '279378', '279557', '279559', '279718']
    nosy_count = 10.0
    nosy_names = ['lemburg', 'ncoghlan', 'vstinner', 'larry', 'benjamin.peterson', 'ezio.melotti', 'Arfrever', 'python-dev', 'serhiy.storchaka', 'xiang.zhang']
    pr_nums = ['552']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue28426'
    versions = ['Python 3.6', 'Python 3.7']

    @zhangyangyu
    Copy link
    Member Author

    PyUnicode_AsDecodedObject was added in f46d49e2e0f0 and became an API in 2284fa89ab08. It seems its intention is to return a Python object. But during evolution, with commits 5f11621a6f51 and 123f2dc08b3e, it can only return unicode now, becoming another version of PyUnicode_AsDecodedUnicode. Is this the wanted behaviour?

    @zhangyangyu zhangyangyu added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Oct 13, 2016
    @serhiy-storchaka
    Copy link
    Member

    Indeed. The only difference is that PyUnicode_AsDecodedUnicode fails for most encodings (except rot13), but PyUnicode_AsDecodedObject just crashes in debug build. It seems to me that these functions (as well as PyUnicode_AsEncodedUnicode) shouldn't exist it Python 3. None of these functions are documented. PyUnicode_AsDecodedObject emits Py3k warning in 2.7. PyUnicode_AsDecodedUnicode and PyUnicode_AsEncodedUnicode were added in Python 3 (2284fa89ab08), and the purpose of this is not clear to me. They work only with rot13, but general PyCodec_Decode and PyCodec_Encode can be used instead. Could you please explain Marc-André?

    @malemburg
    Copy link
    Member

    PyUnicode_AsDecodedObject() and PyUnicode_AsEncodedObject() were meant as C API implementations of the unicode.decode() and unicode.encode() methods in Python2. Not having PyUnicode_AsDecodedObject() documented was likely an oversight on my part.

    In Python2, unicode.decode() and unicode.encode() were more or less direct interfaces to the codec registry. In Python 2.7 this was changed to issue a warning for porting to Python 3.

    In Python3, the methods were changed to only return unicode objects and to reflect this change without breaking the C API, the new PyUnicode_AsDecodedUnicode() and PyUnicode_AsEncodedUnicode() were added.

    I guess the more recent changes simply didn't pay attention to this difference anymore and put restrictions on the output of PyUnicode_AsDecodedObject() and PyUnicode_AsEncodedObject() which were not originally intended, hence the crash you are seeing, Serhiy.

    Going forward, C extensions in Python3 could indeed use the PyCodec_*() APIs directly.

    @serhiy-storchaka
    Copy link
    Member

    Shouldn't all these function be deprecated in favour of PyCodec_*() APIs?

    @malemburg
    Copy link
    Member

    On 13.10.2016 10:10, Serhiy Storchaka wrote:

    Shouldn't all these function be deprecated in favour of PyCodec_*() APIs?

    Not all of them, since you still want to have a C API for
    unicode.encode(). PyUnicode_AsEncodedUnicode() would have
    to stay.

    As for the others, I don't know how much use they get.

    @serhiy-storchaka
    Copy link
    Member

    I propose following:

    1. Fix a crash in PyUnicode_AsDecodedObject by removing unicode_result() in all maintained 3.x versions (starting from 3.4? or 3.3?).

    2. Deprecate PyUnicode_AsDecodedObject, PyUnicode_AsDecodedUnicode and PyUnicode_AsEncodedUnicode in 3.6, make they always failing in 3.7 and remove them in future versions. They shouldn't be widely used since they are not documented, PyUnicode_AsDecodedObject already is deprecated in 2.7, and the only supported standard encoding is rot13.

    @zhangyangyu
    Copy link
    Member Author

    +1 for 2). Patch looks good.

    @serhiy-storchaka
    Copy link
    Member

    Marc-Andre, what are your thoughts about this?

    @malemburg
    Copy link
    Member

    As I already mentioned, PyUnicode_AsEncodedUnicode() needs to stay, since it's the C API for unicode.encode(). The others can be deprecated.

    @zhangyangyu
    Copy link
    Member Author

    Marc-Andre, shouldn't the C API of unicode.encode() be PyUnicode_AsEncodedString instead of PyUnicode_AsEncodedUnicode now?

    BTW Serhiy, how about PyUnicode_AsEncodedObject? Not see it in your deprecate list.

    @malemburg
    Copy link
    Member

    On 25.10.2016 04:16, Xiang Zhang wrote:

    Marc-Andre, shouldn't the C API of unicode.encode() be PyUnicode_AsEncodedString instead of PyUnicode_AsEncodedUnicode now?

    You're right. I got confused with all the slight variations.

    BTW Serhiy, how about PyUnicode_AsEncodedObject? Not see it in your deprecate list.

    Let's see what we have:

    PyUnicode_AsEncodedString(): encode to bytes (unicode.encode())
    PyUnicode_AsEncodedUnicode(): encode to unicode (for e.g. rot13)
    PyUnicode_AsEncodedObject(): encode to whatever the codec returns

    codecs.encode() can be used for the last two.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 25, 2016

    New changeset 71dce630dc02 by Serhiy Storchaka in branch '3.4':
    Issue bpo-28426: Fixed potential crash in PyUnicode_AsDecodedObject() in debug build.
    https://hg.python.org/cpython/rev/71dce630dc02

    New changeset 74569ecd67e4 by Serhiy Storchaka in branch '3.5':
    Issue bpo-28426: Fixed potential crash in PyUnicode_AsDecodedObject() in debug build.
    https://hg.python.org/cpython/rev/74569ecd67e4

    New changeset 6af1a26e655f by Serhiy Storchaka in branch '3.6':
    Issue bpo-28426: Fixed potential crash in PyUnicode_AsDecodedObject() in debug build.
    https://hg.python.org/cpython/rev/6af1a26e655f

    New changeset 1ab1fd00e9d6 by Serhiy Storchaka in branch 'default':
    Issue bpo-28426: Fixed potential crash in PyUnicode_AsDecodedObject() in debug build.
    https://hg.python.org/cpython/rev/1ab1fd00e9d6

    @serhiy-storchaka
    Copy link
    Member

    BTW Serhiy, how about PyUnicode_AsEncodedObject? Not see it in your deprecate list.

    Indeed. It is not such useless as other functions (that support only the rot13 encoding), but it can be replaced with either PyUnicode_AsEncodedString or PyCodec_Encode (both exists in 2.7).

    Updated patch deprecates PyUnicode_AsEncodedObject too. Please make a review of warning messages, I'm not sure about the wording.

    @serhiy-storchaka serhiy-storchaka self-assigned this Oct 25, 2016
    @zhangyangyu
    Copy link
    Member Author

    LGTM. I can understand the wording.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 27, 2016

    New changeset 15a494886c5a by Serhiy Storchaka in branch '3.6':
    Issue bpo-28426: Deprecated undocumented functions PyUnicode_AsEncodedObject(),
    https://hg.python.org/cpython/rev/15a494886c5a

    New changeset 50c28727d91c by Serhiy Storchaka in branch 'default':
    Issue bpo-28426: Deprecated undocumented functions PyUnicode_AsEncodedObject(),
    https://hg.python.org/cpython/rev/50c28727d91c

    @serhiy-storchaka
    Copy link
    Member

    Thanks Xiang.

    @Arfrever
    Copy link
    Mannequin

    Arfrever mannequin commented Oct 30, 2016

    New changeset 15a494886c5a by Serhiy Storchaka in branch '3.6':
    Issue bpo-28426: Deprecated undocumented functions PyUnicode_AsEncodedObject(),
    https://hg.python.org/cpython/rev/15a494886c5a

    s/that encode form str/that encode from str/ in Include/unicodeobject.h

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants