classification
Title: Remove unicode_internal codec
Type: Stage: resolved
Components: Unicode Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, inada.naoki, lemburg, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2019-03-15 05:32 by inada.naoki, last changed 2019-03-18 10:08 by inada.naoki. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 12342 merged inada.naoki, 2019-03-15 12:42
Messages (9)
msg337965 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-03-15 05:32
The unicode_internal codec has been deprecated since Python 3.3.
Using it has raised a DeprecationWarning since 3.3.

>>> "hello".encode('unicode_internal')
__main__:1: DeprecationWarning: unicode_internal codec has been deprecated
b'h\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00'

May I remove it in 3.8?
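
For code that has to run both before and after the removal, here is a minimal sketch of a guard. It assumes the caller only needs the 4-bytes-per-code-point layout shown in the output above, which on that build matches little-endian UTF-32. `has_codec` and `encode_fixed_width` are hypothetical helper names for illustration, not part of CPython.

```python
import codecs


def has_codec(name):
    """Return True if the running interpreter can look up the named codec."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def encode_fixed_width(text):
    """Hypothetical helper: use an explicit, documented codec
    instead of relying on unicode_internal."""
    return text.encode("utf-32-le")


# Reports whether this interpreter still ships the deprecated codec.
print(has_codec("unicode_internal"))
print(encode_fixed_width("hello"))
```

Choosing an explicit codec such as utf-32-le (or utf-16-le where a 2-byte layout was expected) also pins the byte order and width, which unicode_internal left to the build.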
msg337976 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-15 09:26
I found:

* _PyUnicode_DecodeUnicodeInternal()
* _codecs.unicode_internal_decode()
* _codecs.unicode_internal_encode()
* Lib/encodings/unicode_internal.py

Files which contain "unicode_internal":

Doc/library/codecs.rst
Doc/whatsnew/3.3.rst
Lib/encodings/unicode_internal.py
Lib/test/test_codeccallbacks.py
Lib/test/test_codecs.py
Lib/test/test_unicode.py
Misc/HISTORY
Modules/_codecsmodule.c
Modules/clinic/_codecsmodule.c.h
Objects/unicodeobject.c
PCbuild/lib.pyproj
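
A list like the one above can be reproduced with a plain substring scan over a source checkout. The following is only a sketch of one way to do it, not necessarily how the list was produced; `files_containing` is a hypothetical helper name.

```python
import os


def files_containing(root, needle):
    """Return sorted paths (relative to root) of files whose bytes contain needle."""
    needle_bytes = needle.encode("utf-8")
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # skip files that cannot be read
            if needle_bytes in data:
                matches.append(os.path.relpath(path, root))
    return sorted(matches)
```

Running `files_containing(".", "unicode_internal")` from the top of a checkout would list the matching files; reading in binary mode avoids decoding errors on non-text files.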


> May I remove it in 3.8?

Since using the codec emits a DeprecationWarning at runtime, I think that it's safe to remove it.
msg338000 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-03-15 16:35
What is the purpose of the unicode-internal codec in the first place?
msg338005 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2019-03-15 16:51
On 15.03.2019 17:35, Serhiy Storchaka wrote:
> 
> What is the purpose of the unicode-internal codec in the first place?

It gives the outside world fast, direct access to the internal
representation of Unicode used in Python.
msg338006 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-03-15 16:55
Is it for debugging only?
msg338009 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2019-03-15 17:05
On 15.03.2019 17:55, Serhiy Storchaka wrote:
> Is it for debugging only?

No, you can use it to store a Unicode object as-is without any
encoding/decoding, but after the recent changes to the internals
of the Unicode implementation it's not all that useful anymore,
since we now have per-object state which is not reflected by the
codec.
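
To illustrate the per-object state mentioned above: since Python 3.3, CPython picks a storage width for each string based on its widest code point, so no single fixed-width dump describes every string. The sketch below estimates that width from `sys.getsizeof`. It assumes CPython 3.3 or later (other implementations may report different sizes), and `bytes_per_char` is a hypothetical helper name.

```python
import sys


def bytes_per_char(ch):
    """Estimate per-character storage by comparing a short and a long
    string made of the same repeated character (assumes CPython 3.3+)."""
    short = ch * 1
    long_ = ch * 1001
    return (sys.getsizeof(long_) - sys.getsizeof(short)) // 1000


for ch in ("a", "\u00e9", "\u20ac", "\U0001f600"):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
```

On CPython the four samples should report 1, 1, 2 and 4 bytes per character respectively, which is the kind of variation a single fixed-width codec cannot express.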
msg338164 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-03-18 06:44
New changeset 6a16b18224fa98f6d192aa5014affeccc0376eb3 by Inada Naoki in branch 'master':
bpo-36297: remove "unicode_internal" codec (GH-12342)
https://github.com/python/cpython/commit/6a16b18224fa98f6d192aa5014affeccc0376eb3
msg338184 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-18 09:34
Thanks INADA-san. IMHO Python has too many codecs, and it's painful to maintain them. So it's nice to see deprecated ones removed.

Next step: remove all deprecated APIs using Py_UNICODE* :-D (I know that Serhiy is working on that.)
msg338190 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-03-18 10:08
I tried to remove all the legacy APIs and the wchar_t cache in unicodeobject.  This is an experimental branch.
https://github.com/methane/cpython/pull/18/files


I'm thinking about adding a configure option to remove them from 3.8.

* It may help people find third-party extensions that use the legacy API.
* Projects that don't use such third-party extensions can use this option to reduce memory usage (8 bytes per Unicode object).
History
Date User Action Args
2019-03-18 10:08:24 inada.naoki set messages: + msg338190
2019-03-18 09:34:20 vstinner set messages: + msg338184
2019-03-18 06:44:28 inada.naoki set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2019-03-18 06:44:13 inada.naoki set messages: + msg338164
2019-03-15 17:05:40 lemburg set messages: + msg338009
2019-03-15 16:55:22 serhiy.storchaka set messages: + msg338006
2019-03-15 16:51:45 lemburg set nosy: + lemburg; messages: + msg338005
2019-03-15 16:35:33 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg338000
2019-03-15 12:42:35 inada.naoki set keywords: + patch; stage: patch review; pull_requests: + pull_request12309
2019-03-15 09:26:11 vstinner set messages: + msg337976
2019-03-15 05:32:43 inada.naoki create