Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add codecs.unregister() to unregister a codec search function #86008

Closed
vstinner opened this issue Sep 23, 2020 · 15 comments
Closed

Add codecs.unregister() to unregister a codec search function #86008

vstinner opened this issue Sep 23, 2020 · 15 comments
Labels
3.10 only security fixes easy stdlib Python modules in the Lib dir topic-unicode

Comments

@vstinner
Copy link
Member

BPO 41842
Nosy @malemburg, @vstinner, @ezio-melotti, @serhiy-storchaka, @shihai1991
PRs
  • bpo-41842: Add a unregister function in _codecs module #22360
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-09-28.22:28:43.585>
    created_at = <Date 2020-09-23.12:56:58.654>
    labels = ['easy', 'library', '3.10', 'expert-unicode']
    title = 'Add codecs.unregister() to unregister a codec search function'
    updated_at = <Date 2020-09-28.22:28:43.585>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2020-09-28.22:28:43.585>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-09-28.22:28:43.585>
    closer = 'vstinner'
    components = ['Library (Lib)', 'Unicode']
    creation = <Date 2020-09-23.12:56:58.654>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 41842
    keywords = ['easy (C)']
    message_count = 15.0
    messages = ['377377', '377378', '377379', '377380', '377381', '377392', '377408', '377572', '377596', '377597', '377600', '377630', '377639', '377641', '377642']
    nosy_count = 5.0
    nosy_names = ['lemburg', 'vstinner', 'ezio.melotti', 'serhiy.storchaka', 'shihai1991']
    pr_nums = ['22360']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue41842'
    versions = ['Python 3.10']

    @vstinner
    Copy link
    Member Author

    Writing an unit test on the Python codecs machinery is facing a practical problem: there is no C nor Python API to unregister a codec search function.

    It's even documented in a note of the codecs.register() function:

    "Note: Search function registration is not currently reversible, which may cause problems in some cases, such as unit testing or module reloading."

    https://docs.python.org/dev/library/codecs.html#codecs.register

    test_codecs contains a long comment about that:

        # There's no way to unregister a codec search function, so we just
        # ensure we render this one fairly harmless after the test
        # case finishes by using the test case repr as the codec name
        # The codecs module normalizes codec names, although this doesn't
        # appear to be formally documented...
        # We also make sure we use a truly unique id for the custom codec
        # to avoid issues with the codec cache when running these tests
        # multiple times (e.g. when hunting for refleaks)
    

    See bpo-22166 which fixed memory leaks in test_codecs.

    In 2011, a Python user requested the function
    https://mail.python.org/pipermail/python-dev/2011-September/113588.html

    Marc-Andre Lemburg explained:

    "There is no API to unregister a codec search function, since deregistration
    would break the codec cache used by the registry to speedup codec
    lookup."

    One simple solution would be to clear the cache (PyInterpreterState.codec_search_cache) when codecs.unregister() removes a search function. I expect that calling unregister() is an uncommon operation, so the performance is not a blocker issue.

    @vstinner vstinner added 3.10 only security fixes stdlib Python modules in the Lib dir topic-unicode labels Sep 23, 2020
    @vstinner
    Copy link
    Member Author

    Writing an unit test on the Python codecs machinery is facing a practical problem: there is no C nor Python API to unregister a codec search function.

    For example, see PR 19069 of bpo-39337.

    @vstinner
    Copy link
    Member Author

    Hai Shi wrote PR 22360 to implement codecs.unregister().

    @malemburg
    Copy link
    Member

    On 23.09.2020 14:56, STINNER Victor wrote:

    Marc-Andre Lemburg explained:

    "There is no API to unregister a codec search function, since deregistration
    would break the codec cache used by the registry to speedup codec
    lookup."

    One simple solution would be to clear the cache (PyInterpreterState.codec_search_cache) when codecs.unregister() removes a search function. I expect that calling unregister() is an uncommon operation, so the performance is not a blocker issue.

    +1

    BTW: While you're at it, having a way to access the search function
    list from Python would be nice as well, since this would then open
    up the possibility to reorder search functions.

    --
    Marc-Andre Lemburg
    eGenix.com

    Professional Python Services directly from the Experts (#1, Sep 23 2020)
    >>> Python Projects, Coaching and Support ...    https://www.egenix.com/
    >>> Python Product Development ...        https://consulting.egenix.com/
    ________________________________________________________________________

    ::: We implement business ideas - efficiently in both time and costs :::

    eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
    Registered at Amtsgericht Duesseldorf: HRB 46611
    https://www.egenix.com/company/contact/
    https://www.malemburg.com/

    @malemburg
    Copy link
    Member

    Just found an internal API which already takes care of
    unregistering a search function: _PyCodec_Forget().

    All that needs to be done is to expose this as codecs.unregister()
    and add the clearing of the lookup cache.

    --
    Marc-Andre Lemburg
    eGenix.com

    Professional Python Services directly from the Experts (#1, Sep 23 2020)
    >>> Python Projects, Coaching and Support ...    https://www.egenix.com/
    >>> Python Product Development ...        https://consulting.egenix.com/
    ________________________________________________________________________

    ::: We implement business ideas - efficiently in both time and costs :::

    eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
    Registered at Amtsgericht Duesseldorf: HRB 46611
    https://www.egenix.com/company/contact/
    https://www.malemburg.com/

    @vstinner
    Copy link
    Member Author

    Just found an internal API which already takes care of
    unregistering a search function: _PyCodec_Forget().

    All that needs to be done is to expose this as codecs.unregister()
    and add the clearing of the lookup cache.

    Yeah, I saw this function, but it's related to the cache, not to the list of search functions.

    BTW: While you're at it, having a way to access the search function
    list from Python would be nice as well, since this would then open
    up the possibility to reorder search functions.

    I didn't hear anyone (ok, apart you) who requested to order search functions.

    I dislike the idea of exposing it, since it introduces the risk that someone "unregisters" a search function simply by removing it from the list, without invalidating the cache.

    I prefer to hide the internals to ensure that the cache remains consistent.

    @malemburg
    Copy link
    Member

    On 23.09.2020 16:49, STINNER Victor wrote:

    STINNER Victor <vstinner@python.org> added the comment:

    > Just found an internal API which already takes care of
    > unregistering a search function: _PyCodec_Forget().
    >
    > All that needs to be done is to expose this as codecs.unregister()
    > and add the clearing of the lookup cache.

    Yeah, I saw this function, but it's related to the cache, not to the list of search functions.

    Ah, right. I just looked at the first occurance of codec_search_path :-)

    > BTW: While you're at it, having a way to access the search function
    > list from Python would be nice as well, since this would then open
    > up the possibility to reorder search functions.

    I didn't hear anyone (ok, apart you) who requested to order search functions.

    This has come up in the past from people who wanted to override
    builtin codecs with their own versions.

    I dislike the idea of exposing it, since it introduces the risk that someone "unregisters" a search function simply by removing it from the list, without invalidating the cache.

    I prefer to hide the internals to ensure that the cache remains consistent.

    Sure, a function would merely return a tuple with the entries,
    not the list itself, e.g. in pseudo code:

    def get_search_path():
        return tuple(interp->codec_search_path)

    For replacing the vanilla setup, this is not needed, since only
    one search function gets registered (the builtin one), so rather
    low priority, I guess.

    --
    Marc-Andre Lemburg
    eGenix.com

    Professional Python Services directly from the Experts (#1, Sep 23 2020)
    >>> Python Projects, Coaching and Support ...    https://www.egenix.com/
    >>> Python Product Development ...        https://consulting.egenix.com/
    ________________________________________________________________________

    ::: We implement business ideas - efficiently in both time and costs :::

    eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
    Registered at Amtsgericht Duesseldorf: HRB 46611
    https://www.egenix.com/company/contact/
    https://www.malemburg.com/

    @serhiy-storchaka
    Copy link
    Member

    If add a function for unregistering a codec search function, it would be worth to add also a function for unregistering an error handler.

    @shihai1991
    Copy link
    Member

    If add a function for unregistering a codec search function, it would be worth to add also a function for unregistering an error handler.

    Registering an error handler have no refleaks when registering multiple search functions.
    +1 if end user need this function.

    @shihai1991
    Copy link
    Member

    oh, sorry, typo error. multiple search functions-->multiple error handler.

    @serhiy-storchaka
    Copy link
    Member

    Although unregistering an error handler may be not so easy, so it is better to open a separate issue for it.

    @vstinner
    Copy link
    Member Author

    New changeset d332e7b by Hai Shi in branch 'master':
    bpo-41842: Add codecs.unregister() function (GH-22360)
    d332e7b

    @vstinner
    Copy link
    Member Author

    PR 22360 required 15 iterations (15 commits) and it changes a lot of code. I wouldn't say that it's an "easy (C)" issue.

    @vstinner
    Copy link
    Member Author

    codecs.unregister() is needed to write unit tests: see PR 19069 of bpo-39337.

    I don't see a strong need to add a new codecs.unregister_error() function, excpet for completeness. I don't think that completeness is enough to justify to add the function, but feel free to open a new issue if you consider that it's really needed.

    @vstinner
    Copy link
    Member Author

    The function is added, I close the issue. Thanks Hai Shi.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes easy stdlib Python modules in the Lib dir topic-unicode
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants