Issue41842
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2020-09-23 12:56 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 22360 | merged | shihai1991, 2020-09-23 15:05 |
Messages (15) | |||
---|---|---|---|
msg377377 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-09-23 12:56 | |
Writing an unit test on the Python codecs machinery is facing a practical problem: there is no C nor Python API to unregister a codec search function. It's even documented in a note of the codecs.register() function: "Note: Search function registration is not currently reversible, which may cause problems in some cases, such as unit testing or module reloading." https://docs.python.org/dev/library/codecs.html#codecs.register test_codecs contains a long comment about that: # There's no way to unregister a codec search function, so we just # ensure we render this one fairly harmless after the test # case finishes by using the test case repr as the codec name # The codecs module normalizes codec names, although this doesn't # appear to be formally documented... # We also make sure we use a truly unique id for the custom codec # to avoid issues with the codec cache when running these tests # multiple times (e.g. when hunting for refleaks) See bpo-22166 which fixed memory leaks in test_codecs. In 2011, a Python user requested the function https://mail.python.org/pipermail/python-dev/2011-September/113588.html Marc-Andre Lemburg explained: "There is no API to unregister a codec search function, since deregistration would break the codec cache used by the registry to speedup codec lookup." One simple solution would be to clear the cache (PyInterpreterState.codec_search_cache) when codecs.unregister() removes a search function. I expect that calling unregister() is an uncommon operation, so the performance is not a blocker issue. |
|||
msg377378 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-09-23 12:57 | |
> Writing an unit test on the Python codecs machinery is facing a practical problem: there is no C nor Python API to unregister a codec search function. For example, see PR 19069 of bpo-39337. |
|||
msg377379 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-09-23 12:58 | |
Hai Shi wrote PR 22360 to implement codecs.unregister(). |
|||
msg377380 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2020-09-23 13:04 | |
On 23.09.2020 14:56, STINNER Victor wrote: > Marc-Andre Lemburg explained: > > "There is no API to unregister a codec search function, since deregistration > would break the codec cache used by the registry to speedup codec > lookup." > > One simple solution would be to clear the cache (PyInterpreterState.codec_search_cache) when codecs.unregister() removes a search function. I expect that calling unregister() is an uncommon operation, so the performance is not a blocker issue. +1 BTW: While you're at it, having a way to access the search function list from Python would be nice as well, since this would then open up the possibility to reorder search functions. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 23 2020) >>> Python Projects, Coaching and Support ... https://www.egenix.com/ >>> Python Product Development ... https://consulting.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/ |
|||
msg377381 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2020-09-23 13:06 | |
Just found an internal API which already takes care of unregistering a search function: _PyCodec_Forget(). All that needs to be done is to expose this as codecs.unregister() and add the clearing of the lookup cache. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 23 2020) >>> Python Projects, Coaching and Support ... https://www.egenix.com/ >>> Python Product Development ... https://consulting.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/ |
|||
msg377392 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-09-23 14:49 | |
> Just found an internal API which already takes care of > unregistering a search function: _PyCodec_Forget(). > > All that needs to be done is to expose this as codecs.unregister() > and add the clearing of the lookup cache. Yeah, I saw this function, but it's related to the cache, not to the list of search functions. > BTW: While you're at it, having a way to access the search function > list from Python would be nice as well, since this would then open > up the possibility to reorder search functions. I didn't hear anyone (ok, apart you) who requested to order search functions. I dislike the idea of exposing it, since it introduces the risk that someone "unregisters" a search function simply by removing it from the list, without invalidating the cache. I prefer to hide the internals to ensure that the cache remains consistent. |
|||
msg377408 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2020-09-23 16:51 | |
On 23.09.2020 16:49, STINNER Victor wrote: > > STINNER Victor <vstinner@python.org> added the comment: > >> Just found an internal API which already takes care of >> unregistering a search function: _PyCodec_Forget(). >> >> All that needs to be done is to expose this as codecs.unregister() >> and add the clearing of the lookup cache. > > Yeah, I saw this function, but it's related to the cache, not to the list of search functions. Ah, right. I just looked at the first occurance of codec_search_path :-) >> BTW: While you're at it, having a way to access the search function >> list from Python would be nice as well, since this would then open >> up the possibility to reorder search functions. > > I didn't hear anyone (ok, apart you) who requested to order search functions. This has come up in the past from people who wanted to override builtin codecs with their own versions. > I dislike the idea of exposing it, since it introduces the risk that someone "unregisters" a search function simply by removing it from the list, without invalidating the cache. > > I prefer to hide the internals to ensure that the cache remains consistent. Sure, a function would merely return a tuple with the entries, not the list itself, e.g. in pseudo code: def get_search_path(): return tuple(interp->codec_search_path) For replacing the vanilla setup, this is not needed, since only one search function gets registered (the builtin one), so rather low priority, I guess. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 23 2020) >>> Python Projects, Coaching and Support ... https://www.egenix.com/ >>> Python Product Development ... https://consulting.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/ |
|||
msg377572 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2020-09-27 14:47 | |
If add a function for unregistering a codec search function, it would be worth to add also a function for unregistering an error handler. |
|||
msg377596 - (view) | Author: Hai Shi (shihai1991) * ![]() |
Date: 2020-09-28 11:27 | |
> If add a function for unregistering a codec search function, it would be worth to add also a function for unregistering an error handler. Registering an error handler have no refleaks when registering multiple search functions. +1 if end user need this function. |
|||
msg377597 - (view) | Author: Hai Shi (shihai1991) * ![]() |
Date: 2020-09-28 11:29 | |
oh, sorry, typo error. multiple search functions-->multiple error handler. |
|||
msg377600 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2020-09-28 15:08 | |
Although unregistering an error handler may be not so easy, so it is better to open a separate issue for it. |
|||
msg377630 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-09-28 21:41 | |
New changeset d332e7b8164c3c9c885b9e631f33d9517b628b75 by Hai Shi in branch 'master': bpo-41842: Add codecs.unregister() function (GH-22360) https://github.com/python/cpython/commit/d332e7b8164c3c9c885b9e631f33d9517b628b75 |
|||
msg377639 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-09-28 22:25 | |
PR 22360 required 15 iterations (15 commits) and it changes a lot of code. I wouldn't say that it's an "easy (C)" issue. |
|||
msg377641 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-09-28 22:27 | |
codecs.unregister() is needed to write unit tests: see PR 19069 of bpo-39337. I don't see a strong need to add a new codecs.unregister_error() function, excpet for completeness. I don't think that completeness is enough to justify to add the function, but feel free to open a new issue if you consider that it's really needed. |
|||
msg377642 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2020-09-28 22:27 | |
The function is added, I close the issue. Thanks Hai Shi. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:35 | admin | set | github: 86008 |
2020-09-28 22:28:43 | vstinner | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2020-09-28 22:27:54 | vstinner | set | messages: + msg377642 |
2020-09-28 22:27:25 | vstinner | set | messages: + msg377641 |
2020-09-28 22:25:13 | vstinner | set | messages: + msg377639 |
2020-09-28 21:41:18 | vstinner | set | messages: + msg377630 |
2020-09-28 15:08:43 | serhiy.storchaka | set | keywords:
+ easy (C), - patch messages: + msg377600 |
2020-09-28 11:29:41 | shihai1991 | set | messages: + msg377597 |
2020-09-28 11:27:25 | shihai1991 | set | messages: + msg377596 |
2020-09-27 14:47:15 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg377572 |
2020-09-23 16:51:36 | lemburg | set | messages: + msg377408 |
2020-09-23 15:05:20 | shihai1991 | set | keywords:
+ patch nosy: + shihai1991 pull_requests: + pull_request21424 stage: patch review |
2020-09-23 14:49:54 | vstinner | set | messages: + msg377392 |
2020-09-23 13:06:03 | lemburg | set | messages: + msg377381 |
2020-09-23 13:04:00 | lemburg | set | nosy:
+ lemburg messages: + msg377380 |
2020-09-23 12:58:12 | vstinner | set | messages: + msg377379 |
2020-09-23 12:57:47 | vstinner | set | messages: + msg377378 |
2020-09-23 12:56:58 | vstinner | create |