classification
Title: Add codecs.unregister() to unregister a codec search function
Type: Stage: resolved
Components: Library (Lib), Unicode Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, lemburg, serhiy.storchaka, shihai1991, vstinner
Priority: normal Keywords: easy (C)

Created on 2020-09-23 12:56 by vstinner, last changed 2020-09-28 22:28 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 22360 merged shihai1991, 2020-09-23 15:05
Messages (15)
msg377377 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-23 12:56
Writing an unit test on the Python codecs machinery is facing a practical problem: there is no C nor Python API to unregister a codec search function.

It's even documented in a note of the codecs.register() function:

"Note: Search function registration is not currently reversible, which may cause problems in some cases, such as unit testing or module reloading."

https://docs.python.org/dev/library/codecs.html#codecs.register

test_codecs contains a long comment about that:

        # There's no way to unregister a codec search function, so we just
        # ensure we render this one fairly harmless after the test
        # case finishes by using the test case repr as the codec name
        # The codecs module normalizes codec names, although this doesn't
        # appear to be formally documented...
        # We also make sure we use a truly unique id for the custom codec
        # to avoid issues with the codec cache when running these tests
        # multiple times (e.g. when hunting for refleaks)

See bpo-22166 which fixed memory leaks in test_codecs.

In 2011, a Python user requested the function
https://mail.python.org/pipermail/python-dev/2011-September/113588.html

Marc-Andre Lemburg explained:

"There is no API to unregister a codec search function, since deregistration
would break the codec cache used by the registry to speedup codec
lookup."

One simple solution would be to clear the cache (PyInterpreterState.codec_search_cache) when codecs.unregister() removes a search function. I expect that calling unregister() is an uncommon operation, so the performance is not a blocker issue.
msg377378 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-23 12:57
> Writing an unit test on the Python codecs machinery is facing a practical problem: there is no C nor Python API to unregister a codec search function.

For example, see PR 19069 of bpo-39337.
msg377379 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-23 12:58
Hai Shi wrote PR 22360 to implement codecs.unregister().
msg377380 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2020-09-23 13:04
On 23.09.2020 14:56, STINNER Victor wrote:
> Marc-Andre Lemburg explained:
> 
> "There is no API to unregister a codec search function, since deregistration
> would break the codec cache used by the registry to speedup codec
> lookup."
> 
> One simple solution would be to clear the cache (PyInterpreterState.codec_search_cache) when codecs.unregister() removes a search function. I expect that calling unregister() is an uncommon operation, so the performance is not a blocker issue.

+1

BTW: While you're at it, having a way to access the search function
list from Python would be nice as well, since this would then open
up the possibility to reorder search functions.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Sep 23 2020)
>>> Python Projects, Coaching and Support ...    https://www.egenix.com/
>>> Python Product Development ...        https://consulting.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               https://www.egenix.com/company/contact/
                     https://www.malemburg.com/
msg377381 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2020-09-23 13:06
Just found an internal API which already takes care of
unregistering a search function: _PyCodec_Forget().

All that needs to be done is to expose this as codecs.unregister()
and add the clearing of the lookup cache.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Sep 23 2020)
>>> Python Projects, Coaching and Support ...    https://www.egenix.com/
>>> Python Product Development ...        https://consulting.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               https://www.egenix.com/company/contact/
                     https://www.malemburg.com/
msg377392 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-23 14:49
> Just found an internal API which already takes care of
> unregistering a search function: _PyCodec_Forget().
>
> All that needs to be done is to expose this as codecs.unregister()
> and add the clearing of the lookup cache.

Yeah, I saw this function, but it's related to the cache, not to the list of search functions.


> BTW: While you're at it, having a way to access the search function
> list from Python would be nice as well, since this would then open
> up the possibility to reorder search functions.

I didn't hear anyone (ok, apart you) who requested to order search functions.

I dislike the idea of exposing it, since it introduces the risk that someone "unregisters" a search function simply by removing it from the list, without invalidating the cache.

I prefer to hide the internals to ensure that the cache remains consistent.
msg377408 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2020-09-23 16:51
On 23.09.2020 16:49, STINNER Victor wrote:
> 
> STINNER Victor <vstinner@python.org> added the comment:
> 
>> Just found an internal API which already takes care of
>> unregistering a search function: _PyCodec_Forget().
>>
>> All that needs to be done is to expose this as codecs.unregister()
>> and add the clearing of the lookup cache.
> 
> Yeah, I saw this function, but it's related to the cache, not to the list of search functions.

Ah, right. I just looked at the first occurance of codec_search_path :-)

>> BTW: While you're at it, having a way to access the search function
>> list from Python would be nice as well, since this would then open
>> up the possibility to reorder search functions.
> 
> I didn't hear anyone (ok, apart you) who requested to order search functions.

This has come up in the past from people who wanted to override
builtin codecs with their own versions.

> I dislike the idea of exposing it, since it introduces the risk that someone "unregisters" a search function simply by removing it from the list, without invalidating the cache.
> 
> I prefer to hide the internals to ensure that the cache remains consistent.

Sure, a function would merely return a tuple with the entries,
not the list itself, e.g. in pseudo code:

def get_search_path():
    return tuple(interp->codec_search_path)

For replacing the vanilla setup, this is not needed, since only
one search function gets registered (the builtin one), so rather
low priority, I guess.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Sep 23 2020)
>>> Python Projects, Coaching and Support ...    https://www.egenix.com/
>>> Python Product Development ...        https://consulting.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               https://www.egenix.com/company/contact/
                     https://www.malemburg.com/
msg377572 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-27 14:47
If add a function for unregistering a codec search function, it would be worth to add also a function for unregistering an error handler.
msg377596 - (view) Author: hai shi (shihai1991) * Date: 2020-09-28 11:27
> If add a function for unregistering a codec search function, it would be worth to add also a function for unregistering an error handler.

Registering an error handler have no refleaks when registering multiple search functions.
+1 if end user need this function.
msg377597 - (view) Author: hai shi (shihai1991) * Date: 2020-09-28 11:29
oh, sorry, typo error. multiple search functions-->multiple error handler.
msg377600 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-28 15:08
Although unregistering an error handler may be not so easy, so it is better to open a separate issue for it.
msg377630 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-28 21:41
New changeset d332e7b8164c3c9c885b9e631f33d9517b628b75 by Hai Shi in branch 'master':
bpo-41842: Add codecs.unregister() function (GH-22360)
https://github.com/python/cpython/commit/d332e7b8164c3c9c885b9e631f33d9517b628b75
msg377639 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-28 22:25
PR 22360 required 15 iterations (15 commits) and it changes a lot of code. I wouldn't say that it's an "easy (C)" issue.
msg377641 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-28 22:27
codecs.unregister() is needed to write unit tests: see PR 19069 of bpo-39337.

I don't see a strong need to add a new codecs.unregister_error() function, excpet for completeness. I don't think that completeness is enough to justify to add the function, but feel free to open a new issue if you consider that it's really needed.
msg377642 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-28 22:27
The function is added, I close the issue. Thanks Hai Shi.
History
Date User Action Args
2020-09-28 22:28:43vstinnersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2020-09-28 22:27:54vstinnersetmessages: + msg377642
2020-09-28 22:27:25vstinnersetmessages: + msg377641
2020-09-28 22:25:13vstinnersetmessages: + msg377639
2020-09-28 21:41:18vstinnersetmessages: + msg377630
2020-09-28 15:08:43serhiy.storchakasetkeywords: + easy (C), - patch

messages: + msg377600
2020-09-28 11:29:41shihai1991setmessages: + msg377597
2020-09-28 11:27:25shihai1991setmessages: + msg377596
2020-09-27 14:47:15serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg377572
2020-09-23 16:51:36lemburgsetmessages: + msg377408
2020-09-23 15:05:20shihai1991setkeywords: + patch
nosy: + shihai1991

pull_requests: + pull_request21424
stage: patch review
2020-09-23 14:49:54vstinnersetmessages: + msg377392
2020-09-23 13:06:03lemburgsetmessages: + msg377381
2020-09-23 13:04:00lemburgsetnosy: + lemburg
messages: + msg377380
2020-09-23 12:58:12vstinnersetmessages: + msg377379
2020-09-23 12:57:47vstinnersetmessages: + msg377378
2020-09-23 12:56:58vstinnercreate