Remove unicode_internal codec #80478

Closed
methane opened this issue Mar 15, 2019 · 9 comments
Labels
3.8 only security fixes topic-unicode

Comments

@methane
Member

methane commented Mar 15, 2019

BPO 36297
Nosy @malemburg, @vstinner, @ezio-melotti, @methane, @serhiy-storchaka
PRs
  • bpo-36297: remove "unicode_internal" codec #12342
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-03-18.06:44:28.104>
    created_at = <Date 2019-03-15.05:32:43.113>
    labels = ['3.8', 'expert-unicode']
    title = 'Remove unicode_internal codec'
    updated_at = <Date 2019-03-18.10:08:24.371>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2019-03-18.10:08:24.371>
    actor = 'methane'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-03-18.06:44:28.104>
    closer = 'methane'
    components = ['Unicode']
    creation = <Date 2019-03-15.05:32:43.113>
    creator = 'methane'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36297
    keywords = ['patch']
    message_count = 9.0
    messages = ['337965', '337976', '338000', '338005', '338006', '338009', '338164', '338184', '338190']
    nosy_count = 5.0
    nosy_names = ['lemburg', 'vstinner', 'ezio.melotti', 'methane', 'serhiy.storchaka']
    pr_nums = ['12342']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue36297'
    versions = ['Python 3.8']

    @methane
    Member Author

    methane commented Mar 15, 2019

    The unicode_internal codec has been deprecated since Python 3.3,
    and using it has emitted a DeprecationWarning since then.

    >>> "hello".encode('unicode_internal')
    __main__:1: DeprecationWarning: unicode_internal codec has been deprecated
    b'h\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00'

    May I remove it in 3.8?
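    For code that relied on this output, a rough stand-in after the removal is to encode with the fixed-width UTF codec that matches the platform's wchar_t. The sketch below assumes (this is not stated in the thread) that since 3.3 the codec emitted the wchar_t form in native byte order; the helper name `encode_like_unicode_internal` is hypothetical.

```python
import ctypes
import sys


def encode_like_unicode_internal(s: str) -> bytes:
    """Approximate the bytes the removed unicode_internal codec produced.

    Assumption: the codec exposed the wchar_t representation in native
    byte order, so pick UTF-32 for a 4-byte wchar_t and UTF-16 for a 2-byte one.
    """
    wide = ctypes.sizeof(ctypes.c_wchar)  # typically 4 on Linux/macOS, 2 on Windows
    order = "le" if sys.byteorder == "little" else "be"
    codec = f"utf-32-{order}" if wide == 4 else f"utf-16-{order}"
    # surrogatepass keeps lone surrogates instead of raising UnicodeEncodeError.
    return s.encode(codec, errors="surrogatepass")


print(encode_like_unicode_internal("hello"))
```

    On a little-endian machine with a 4-byte wchar_t (typical Linux x86-64), `encode_like_unicode_internal("hello")` should give the same bytes as the REPL session above.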

    @methane methane added 3.8 only security fixes topic-unicode labels Mar 15, 2019
    @vstinner
    Member

    I found:

    Files which contain "unicode_internal":

    Doc/library/codecs.rst
    Doc/whatsnew/3.3.rst
    Lib/encodings/unicode_internal.py
    Lib/test/test_codeccallbacks.py
    Lib/test/test_codecs.py
    Lib/test/test_unicode.py
    Misc/HISTORY
    Modules/_codecsmodule.c
    Modules/clinic/_codecsmodule.c.h
    Objects/unicodeobject.c
    PCbuild/lib.pyproj
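    A list like the one above can be reproduced from a CPython checkout with a recursive text search (for example `git grep -l unicode_internal`). The Python sketch below does the same without git; the helper name `files_containing` is hypothetical. It skips the `.git` directory and reads files leniently, since a source tree mixes text and binary files.

```python
from pathlib import Path
from typing import List


def files_containing(root: str, needle: str) -> List[str]:
    """Return sorted POSIX-style paths (relative to root) of files containing needle."""
    base = Path(root)
    hits = []
    for path in base.rglob("*"):
        # Skip directories and anything inside the git metadata directory.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            # errors="ignore" lets binary files pass through without raising.
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        if needle in text:
            hits.append(path.relative_to(base).as_posix())
    return sorted(hits)
```

    Running `files_containing("path/to/cpython", "unicode_internal")` on a pre-removal checkout should list files such as `Lib/encodings/unicode_internal.py`.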

    > May I remove it in 3.8?

    Since using the codec emits a DeprecationWarning at runtime, I think that it's safe to remove it.
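    Because the codec only warned before 3.8, one way to catch remaining callers is to escalate the DeprecationWarning to an error, for example in a test suite. The sketch below (helper name `probe_codec` is hypothetical) classifies a codec name on the running interpreter: on 3.3 to 3.7, `unicode_internal` should report `deprecated`, and after the removal the lookup raises `LookupError`, reported here as `unavailable`.

```python
import warnings


def probe_codec(name: str) -> str:
    """Classify a text codec as 'ok', 'deprecated' or 'unavailable' here."""
    with warnings.catch_warnings():
        # Turn DeprecationWarning into an exception so it cannot slip by.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            "hello".encode(name)
        except DeprecationWarning:
            return "deprecated"
        except LookupError:
            # Unknown codec name, or a codec that is not a text encoding.
            return "unavailable"
    return "ok"


print(probe_codec("unicode_internal"))
```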

    @serhiy-storchaka
    Member

    What is the purpose of the unicode-internal codec in the first place?

    @malemburg
    Member

    On 15.03.2019 17:35, Serhiy Storchaka wrote:

    > What is the purpose of the unicode-internal codec in the first place?

    It provides fast, direct access from the outside world to the
    internal representation of Unicode used in Python.

    @serhiy-storchaka
    Member

    Is it for debugging only?

    @malemburg
    Member

    On 15.03.2019 17:55, Serhiy Storchaka wrote:

    > Is it for debugging only?

    No, you can use it to store Unicode objects as-is without any
    encoding/decoding, but after the recent changes to the internals
    of the Unicode implementation it's not all that useful anymore,
    since we now have per-object state which is not reflected by the
    codec.
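    To make the "per-object state" point concrete: under PEP 393 (Python 3.3+), CPython stores each str with 1, 2 or 4 bytes per code point depending on its widest character, while unicode_internal always exposed the wchar_t form. The CPython-specific sketch below (helper name `bytes_per_char` is hypothetical) measures that per-object width with `sys.getsizeof`; other implementations such as PyPy may report different sizes.

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Estimate CPython's per-code-point storage width for strings built from ch.

    Going from 10 to 20 repetitions adds 10 code points, so the size
    difference divided by 10 is the per-character width (1, 2 or 4).
    """
    return (sys.getsizeof(ch * 20) - sys.getsizeof(ch * 10)) // 10


for sample in ("a", "\xe9", "\u20ac", "\U0001F600"):
    # ASCII and Latin-1 use 1 byte, other BMP characters 2, astral characters 4.
    print(repr(sample), bytes_per_char(sample))
```

    All four strings would come out of unicode_internal at the same width (4 bytes per character on a typical Linux build), which is why the codec no longer reflects how a given string is actually stored.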

    @methane
    Member Author

    methane commented Mar 18, 2019

    New changeset 6a16b18 by Inada Naoki in branch 'master':
    bpo-36297: remove "unicode_internal" codec (GH-12342)

    @methane methane closed this as completed Mar 18, 2019
    @vstinner
    Member

    Thanks INADA-san. IMHO Python has too many codecs, and they are painful to maintain, so it's nice to see deprecated ones being removed.

    Next step: remove all deprecated APIs using Py_UNICODE* :-D (I know that Serhiy is working on that.)

    @methane
    Member Author

    methane commented Mar 18, 2019

    I tried removing all of the legacy APIs and the wchar_t cache from unicodeobject. This is an experimental branch:
    https://github.com/methane/cpython/pull/18/files

    I'm thinking about adding a configure option, starting in 3.8, that removes them.

    • It may help people find third-party extensions that use the legacy API.
    • Projects that don't use such third-party extensions can use this option to reduce memory usage (8 bytes per Unicode object).
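    The configure option would surface legacy API users at build time. A cruder, complementary approach (not what is proposed here) is a static scan of an extension's C sources for the deprecated Py_UNICODE-based names. The sketch below assumes a plain-text source tree; the name list is illustrative rather than exhaustive, and `scan_sources` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Dict, List

# Deprecated Py_UNICODE-based C API names (illustrative, not exhaustive).
LEGACY_NAMES = (
    "PyUnicode_AS_UNICODE",
    "PyUnicode_AsUnicode",
    "PyUnicode_AsUnicodeAndSize",
    "PyUnicode_FromUnicode",
    "PyUnicode_GET_SIZE",
    "Py_UNICODE",
)
# Word boundaries stop "Py_UNICODE" from matching inside "Py_UNICODE_ISSPACE".
PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, LEGACY_NAMES)) + r")\b")
SOURCE_SUFFIXES = {".c", ".h", ".cc", ".cpp", ".pyx"}


def scan_sources(root: str) -> Dict[str, List[str]]:
    """Map each C-like source file under root to the legacy names it mentions."""
    base = Path(root)
    found = {}
    for path in sorted(base.rglob("*")):
        if path.suffix not in SOURCE_SUFFIXES or not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        names = sorted(set(PATTERN.findall(text)))
        if names:
            found[path.relative_to(base).as_posix()] = names
    return found
```

    A static scan can miss names hidden behind macros or generated code, so it complements, rather than replaces, a build that actually removes the APIs.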

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    4 participants