classification
Title: Deprecate codecs.open()
Type: behavior
Stage: needs patch
Components: Library (Lib), Unicode
Versions: Python 3.4
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: brett.cannon, eric.araujo, ezio.melotti, flox, haypo, lemburg, loewis, meatballhat, petri.lehtinen, pitrou, python-dev, rhettinger, vadmium
Priority: normal
Keywords: patch

Created on 2010-05-23 19:27 by haypo, last changed 2014-01-05 13:49 by vadmium.

Files
File name Uploaded Description
deprecate_codecs.patch haypo, 2011-05-23 14:33
Messages (20)
msg106339 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-05-23 19:27
The codecs module (and the codecs.open() function) was added in Python 2.0. codecs.open() creates a StreamReaderWriter object, which uses two other objects: StreamReader and StreamWriter.

Python 2.6 and 3.0 have a new API: the io module. io.open() creates a TextIOWrapper object which is fully compatible with the file object API (it *is* the (text) file object API :-)). TextIOWrapper supports universal newlines and supports reading+writing better than StreamReaderWriter. TextIOWrapper has a better test suite and has been used by default to read and write text files since Python 3.0. The io module has an *optimized* design and was rewritten in C (in Python 2.7 and 3.1).

codecs.open() should be deprecated in Python 3.2 and removed in Python 3.3 (not in Python 2.7). Maybe also StreamReader, StreamWriter and StreamReaderWriter: I don't know if any program uses these classes directly, but I think that TextIOWrapper can be used instead.
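To make the comparison above concrete, here is a minimal sketch that reads the same bytes through both APIs. The helper name read_both() and the sample bytes are hypothetical and not part of the issue. It assumes a Python 3 interpreter where codecs.open() still exists; the warnings filter is only there because newer versions may emit a DeprecationWarning.

```python
import codecs
import io
import os
import tempfile
import warnings


def read_both(raw_bytes, encoding="utf-8"):
    """Read the same bytes via codecs.open() and io.open().

    Returns two (type name, text) pairs, one per API.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw_bytes)
        with warnings.catch_warnings():
            # Newer Python versions may warn that codecs.open() is deprecated.
            warnings.simplefilter("ignore", DeprecationWarning)
            with codecs.open(path, "r", encoding=encoding) as f:
                codecs_result = (type(f).__name__, f.read())
        with io.open(path, "r", encoding=encoding) as f:
            io_result = (type(f).__name__, f.read())
    finally:
        os.remove(path)
    return codecs_result, io_result


# A UTF-8 line ending in CRLF, to exercise newline handling.
codecs_result, io_result = read_both(b"caf\xc3\xa9\r\n")
print(codecs_result)
print(io_result)
```

codecs.open() opens the underlying file in binary mode, so the text it returns keeps the "\r\n" sequence, while io.open() applies universal newline translation by default and returns "\n".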
msg106479 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-05-25 21:43
That deprecation is way too fast. If someone wants to write code that works in Python 2.5 or older *and* Python 3 then codecs.open will most likely be how they keep compatibility for reading in encoded files.

But yes, overall it should get deprecated. Probably a PendingDeprecationWarning to start is good and then eventually switch to a DeprecationWarning once most Linux distributions have moved to Python 2.6.
msg106480 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-05-25 21:45
> If someone wants to write code that works in Python 2.5 
> or older *and* Python 3 then codecs.open will most likely
> be how they keep compatibility for reading in encoded files.

Can't 2to3 do the conversion? (codecs.open => open)
msg106481 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-05-25 21:49
I'm not talking about those people who use 2to3, I'm talking about those who want source-compatibility between Python 2 and Python 3. So they don't run 2to3 as it just works in Python 3 without modification.
msg116286 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-09-13 07:46
We can reconsider this at some later time, when Python 2.x is not really used much anymore.
msg136199 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-17 23:56
Python 3.2 has been published. Can we start deprecating StreamWriter and StreamReader in Python 3.3 (to remove them from Python 3.4)? The doc should explain how to convert code using codecs into code using the io module (it should be simple), and using a StreamReader/StreamWriter should emit a warning.

--

codecs.StreamWriter writes twice the BOM of UTF-8-SIG, UTF-16, UTF-32 encodings if the file is opened in append mode or after a seek(0). Bug fixed in io.TextIOWrapper (issue #5006). io.TextIOWrapper calls also encoder.setstate(0) on a seek different than seek(0), whereas codecs.StreamWriter doesn't (it is not an incremental encoder, it doesn't have the setstate method).

codecs.StreamReader doesn't ignore the BOM of UTF-8-SIG, UTF-16 or UTF-32 encodings after seek(0). Bug fixed in io.TextIOWrapper (issue #4862).

These bugs should maybe be mentioned in the codecs doc, with a pointer to the io module saying that the io module handles these encodings correctly.
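The append-mode BOM behaviour described above can be checked with a short sketch. The helper count_boms_after_append() is a hypothetical name introduced for illustration; it assumes codecs.open() is still available and silences a possible DeprecationWarning on newer versions.

```python
import codecs
import io
import os
import tempfile
import warnings


def count_boms_after_append(opener, encoding="utf-8-sig"):
    """Write once in 'w' mode, once in 'a' mode, then count UTF-8 BOMs."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with warnings.catch_warnings():
            # Newer Python versions may warn that codecs.open() is deprecated.
            warnings.simplefilter("ignore", DeprecationWarning)
            for mode, text in (("w", "first\n"), ("a", "second\n")):
                with opener(path, mode, encoding=encoding) as f:
                    f.write(text)
        with open(path, "rb") as f:
            return f.read().count(codecs.BOM_UTF8)
    finally:
        os.remove(path)


codecs_boms = count_boms_after_append(codecs.open)
io_boms = count_boms_after_append(io.open)
print("codecs.open:", codecs_boms, "BOM(s)")
print("io.open:", io_boms, "BOM(s)")
```

With the stream writer, every newly opened file object is expected to emit its own BOM, so the appended file ends up with a second BOM in the middle. TextIOWrapper checks the stream position when it is created and skips the BOM when the position is not zero.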
msg136200 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-18 00:00
> ... once most Linux distributions have moved to Python 2.6

Debian has used Python 2.6 by default since its last stable release (Squeeze). I think that it was the last distro using Python 2.5 by default.
msg136212 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-18 08:25
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> Python 3.2 has been published. Can we start deprecating StreamWriter and StreamReader in Python 3.3 (to remove them from Python 3.4)? The doc should explain how to convert code using codecs into code using the io module (it should be simple), and using a StreamReader/StreamWriter should emit a warning.

This ticket is about deprecating codecs.open(), not about
StreamWriter and StreamReader.

The arguments mentioned here against doing that anytime soon
still stand.

I'm -1 on deprecating StreamWriter and StreamReader as they provide
different mechanisms than the io layer which has a specific focus
on files and buffers.

> --
> 
> codecs.StreamWriter writes twice the BOM of UTF-8-SIG, UTF-16, UTF-32 encodings if the file is opened in append mode or after a seek(0). Bug fixed in io.TextIOWrapper (issue #5006). io.TextIOWrapper calls also encoder.setstate(0) on a seek different than seek(0), whereas codecs.StreamWriter doesn't (it is not an incremental encoder, it doesn't have the setstate method).
> 
> codecs.StreamReader doesn't ignore the BOM of UTF-8-SIG, UTF-16 or UTF-32 encodings after seek(0). Bug fixed in io.TextIOWrapper (issue #4862).
> 
> These bugs should maybe be mentioned in the codecs doc, with a pointer to the io module saying that the io module handles these encodings correctly.

Those are not bugs of the generic codecs.StreamWriter/StreamReader
implementations or their concept. They are bugs in those specific
codecs.

The codecs StreamWriter and StreamReader concept was explicitly
designed to be able to have state. However, the generic implementation
does not make use of such state for the purpose of writing special
beginning-of-file markers - that's just way too specific for general
purpose implementations. They do use state to implement buffered
reads.

It would certainly be possible to make the implementations of
the codecs you mentioned smarter to handle writing BOMs correctly,
e.g. by making use of the incremental encoder/decoders, if there's
interest.
msg136216 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-18 09:44
> This ticket is about deprecating codecs.open(), not about
> StreamWriter and StreamReader.

Right. I may open a different issue.

Can we start by modifying codecs.open() to use the builtin open() (to reuse TextIOWrapper)?

> I'm -1 on deprecating StreamWriter and StreamReader as they provide
> different mechanisms than the io layer which has a specific focus
> on files and buffers.

What are the use cases of StreamReader and StreamWriter that are not covered by TextIOWrapper?

TextIOWrapper is used in Python for:

 - files (e.g. open)
 - processes (e.g. os.popen)
 - emails (e.g. mailbox.Message)
 - sockets (e.g. socket.makefile)
 - and maybe other things

StreamReader and StreamWriter are used for:

 - read/write files in Sphinx 1.0.7 (written for Python 2)
 - write the output in Pygments 1.3.1 (written for Python 2)
 - but not in the Python interpreter or standard library

*Quick* search of other usages of StreamReader and StreamWriter on the WWW:

 - twisted/mail/imap4.py
 - feeds2imap implements a 'mod-utf-7' codec, pyflag implements a 'ms-pst' codec, pygsm implements a 'gsm0338' codec, so they have StreamReader and StreamWriter classes (but I don't know if these classes are used)

> It would certainly be possible to make the implementations of
> the codecs you mentioned smarter to handle writing BOMs correctly,
> e.g. by making use of the incremental encoder/decoders, if there's
> interest.

Yes, it is possible to fix the StreamReader and StreamWriter classes of the mentioned codecs, but it's not possible to write a generic fix in codecs.py. This is exactly why I dislike StreamReader and StreamWriter: they are not incremental and so don't have reset() or setstate() methods. When you implement a StreamReader or StreamWriter class, you have to reimplement a pseudo-incremental encoder. Compare for example the IncrementalEncoder and StreamWriter classes of UTF-16: most of the code is duplicated.

Because StreamReader and StreamWriter are not incremental, they are not efficient, and it's difficult to handle some issues like the BOM, which require handling the codec state.

TextIOWrapper "simply" reuses incremental encoders and decoders, and so uses their reset() and setstate() methods.
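As an illustration of the codec state mentioned above, the following sketch drives the incremental encoder of UTF-8-SIG directly. It only uses documented codecs calls (getincrementalencoder(), encode(), reset(), setstate()); the variable names are arbitrary.

```python
import codecs

make_encoder = codecs.getincrementalencoder("utf-8-sig")

encoder = make_encoder()
first = encoder.encode("a")        # first chunk: BOM is emitted
second = encoder.encode("b")       # later chunk: no BOM
encoder.reset()                    # back to the initial state
after_reset = encoder.encode("c")  # BOM is emitted again

# A caller that knows it is not at the start of the stream
# (append mode, or after a seek) can tell the encoder so.
resumed_encoder = make_encoder()
resumed_encoder.setstate(0)
resumed = resumed_encoder.encode("d")  # no BOM

for name, chunk in (("first", first), ("second", second),
                    ("after_reset", after_reset), ("resumed", resumed)):
    print(name, chunk)
```

This is roughly the hook a wrapper needs in order to avoid a duplicate BOM: the state is explicit and can be reset or restored from outside the codec.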
msg136617 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-23 11:59
If there are use cases of Stream{Reader,Writer} which are not covered by TextIOWrapper, it would be nice to know so that we can improve TextIOWrapper. After all, there should be one obvious way to do it ;)

By the way, something interesting (probably unintended):

>>> codecs.open("LICENSE", "r")
<_io.TextIOWrapper name='LICENSE' mode='r' encoding='UTF-8'>
>>> codecs.open("LICENSE", "r", encoding="utf-8")
<codecs.StreamReaderWriter object at 0x7f71846ac840>
msg136649 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-23 14:33
deprecate_codecs.patch: "Deprecate open(), StreamReader, StreamWriter, StreamReaderWriter, StreamRecord and EncodedFile() of the codec module. Use the builtin open() function or io.TextIOWrapper instead."

EncodedFile() and StreamRecord cannot be replaced easily by open() or TextIOWrapper. But do we still need this function? In 2002, Martin von Loewis wrote "I never found this class useful."
http://mail.python.org/pipermail/python-dev/2002-August/027491.html

It may no longer be useful with Python 3, which processes all text data as Unicode. Copy/paste from the mail thread:
------------
> In a well-designed application, you should not need to say
> this. The inside world should use Unicode objects.

Agreed, but if you want to port an existing application to
the Unicode world, it sometimes helps.
------------

Deprecated in Python 3.3, the related code will be removed in Python 3.4.
msg136666 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-23 16:11
Closing the ticket again.

We still need codecs.open() to support applications that target Python 2.x and 3.x.

You can reopen it after Python 2.x has been end-of-life'd.
msg136671 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-23 16:20
On Monday, 23 May 2011 at 16:11 +0000, Marc-Andre Lemburg wrote:
> We still need codecs.open() to support applications that target Python 2.x and 3.x.

io.TextIOWrapper exists in Python 2.6 and 2.7, and 2to3 can simply
replace codecs.open() by open().
msg136672 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-23 16:21
Correcting the title: this ticket is about codecs.open(), not StreamReader and StreamWriter, both of which are essential parts of the Python codec machinery and are needed to be able to implement per-codec implementations of codecs which read from and write to streams.

TextIOWrapper() is conceptually something completely different. It's more something like StreamReaderWriter().

The point about having them use incremental codecs for encoding and decoding is a good one and would need to be investigated. If possible, we could use incremental encoders/decoders for the standard StreamReader/Writer base classes or add new IncrementalStreamReader/Writer classes which then use the IncrementalEncoder/Decoder by default.

Please open a new ticket for this.

Thanks.
msg136698 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-23 21:23
> TextIOWrapper() is conceptually something completely different. It's
> more something like StreamReaderWriter().

That's a rather strange assertion. Can you expand?
TextIOWrapper supports read-only, write-only, read-write, unseekable and
seekable streams.
msg136700 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-23 22:02
Antoine Pitrou wrote:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
>> TextIOWrapper() is conceptually something completely different. It's
>> more something like StreamReaderWriter().
> 
> That's a rather strange assertion. Can you expand?
> TextIOWrapper supports read-only, write-only, read-write, unseekable and
> seekable streams.

StreamReader and StreamWriter classes provide the base codec
implementations for stateful interaction with streams. They
define the interface and provide a working implementation for
those codecs that choose not to implement their own variants.

Each codec can, however, implement variants which are optimized
for the specific encoding or intercept certain stream methods
to add functionality or improve the encoding/decoding
performance.

Both are essential parts of the codec interface.

TextIOWrapper and StreamReaderWriter are merely wrappers
around streams that make use of the codecs. They don't
provide any codec logic themselves. That's the conceptual
difference.
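The distinction drawn above can be seen from the codec registry. In this sketch the sample text and variable names are made up for illustration; it decodes the same in-memory byte stream once through the codec's own StreamReader class and once through io.TextIOWrapper.

```python
import codecs
import io

data = "h\u00e9llo\nw\u00f6rld\n".encode("utf-8")

# The per-codec stream classes are published through the codec registry.
info = codecs.lookup("utf-8")
is_stream_reader = issubclass(info.streamreader, codecs.StreamReader)

# Codec-provided reader wrapped around a byte stream
# (codecs.getreader("utf-8") returns the same class).
stream_reader = info.streamreader(io.BytesIO(data))
lines_codecs = stream_reader.readlines()

# io wrapper around an equivalent byte stream; it looks up the codec by name.
wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
lines_io = wrapper.readlines()

print(info.streamreader, is_stream_reader)
print(lines_codecs)
print(lines_io)
```

For this simple input both routes yield the same lines. The difference is where the decoding logic lives: in a class supplied by the codec itself, or in a generic wrapper that delegates to the codec.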
msg137017 - (view) Author: Roundup Robot (python-dev) Date: 2011-05-26 23:54
New changeset 3555cf6f9c98 by Victor Stinner in branch 'default':
Issue #8796: codecs.open() calls the builtin open() function instead of using
http://hg.python.org/cpython/rev/3555cf6f9c98
msg137031 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-05-27 07:39
Roundup Robot wrote:
> 
> Roundup Robot <devnull@devnull> added the comment:
> 
> New changeset 3555cf6f9c98 by Victor Stinner in branch 'default':
> Issue #8796: codecs.open() calls the builtin open() function instead of using
> http://hg.python.org/cpython/rev/3555cf6f9c98

Victor, could you please back out this change again?

I am -1 on deprecating the StreamReader/Writer parts of the codec API
as I've mentioned numerous times and *don't* want to see these
deprecated in the code or the documentation.

I'm -0 on the change to codecs.open(). Have you checked whether the
returned objects are compatible ?

Thanks,
-- 
Marc-Andre Lemburg
msg137058 - (view) Author: Roundup Robot (python-dev) Date: 2011-05-27 14:50
New changeset 4d2ddd86b531 by Victor Stinner in branch 'default':
Revert my commit 3555cf6f9c98: "Issue #8796: codecs.open() calls the builtin
http://hg.python.org/cpython/rev/4d2ddd86b531
msg185126 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-03-24 10:38
I suggest deprecating codecs.open() in 3.4, and possibly removing it in a later release.  The implementation shouldn't be changed to use the builtin open(), but the deprecation note should point to it, and possibly mention the shortcomings of codecs.open().
History
Date User Action Args
2014-01-05 13:49:32 vadmium set nosy: + vadmium
2013-03-24 13:48:51 flox set nosy: + flox
2013-03-24 10:38:13 ezio.melotti set status: closed -> open
type: behavior
versions: + Python 3.4, - Python 3.2, Python 3.3
nosy: + ezio.melotti
messages: + msg185126
resolution: postponed ->
stage: needs patch
2011-05-27 14:50:50 python-dev set messages: + msg137058
2011-05-27 07:39:49 lemburg set messages: + msg137031
2011-05-26 23:54:03 python-dev set nosy: + python-dev
messages: + msg137017
2011-05-24 07:41:45 lemburg set title: Deprecate codecs.open(), codecs.StreamReader and codecs.StreamWriter -> Deprecate codecs.open()
2011-05-24 06:53:39 petri.lehtinen set nosy: + petri.lehtinen
2011-05-23 22:02:04 lemburg set messages: + msg136700
title: Deprecate codecs.open(), codecs.StreamReader and codecs.StreamWriter -> Deprecate codecs.open(), codecs.StreamReader and codecs.StreamWriter
2011-05-23 21:23:30 pitrou set messages: + msg136698
2011-05-23 16:21:20 lemburg set messages: + msg136672
2011-05-23 16:20:43 haypo set messages: + msg136671
2011-05-23 16:11:58 lemburg set status: open -> closed
resolution: postponed
messages: + msg136666
2011-05-23 14:35:26 haypo set status: open
2011-05-23 14:34:57 haypo set title: Deprecate codecs.open() -> Deprecate codecs.open(), codecs.StreamReader and codecs.StreamWriter
2011-05-23 14:33:46 haypo set files: + deprecate_codecs.patch
keywords: + patch
messages: + msg136649
2011-05-23 11:59:49 pitrou set messages: + msg136617
2011-05-18 09:44:03 haypo set messages: + msg136216
2011-05-18 08:25:55 lemburg set messages: + msg136212
2011-05-18 02:11:38 rhettinger set nosy: + rhettinger
2011-05-18 00:00:27 haypo set messages: + msg136200
2011-05-17 23:56:51 haypo set status: closed -> (no value)
2011-05-17 23:56:45 haypo set resolution: postponed -> (no value)
messages: + msg136199
2010-09-13 07:46:52 lemburg set status: open -> closed
resolution: postponed
messages: + msg116286
2010-09-13 01:32:02 eric.araujo set nosy: + eric.araujo
2010-05-25 21:49:57 brett.cannon set messages: + msg106481
2010-05-25 21:45:22 haypo set messages: + msg106480
2010-05-25 21:43:27 brett.cannon set nosy: + brett.cannon
messages: + msg106479
2010-05-25 00:49:07 meatballhat set nosy: + meatballhat
2010-05-23 19:40:31 pitrou set nosy: + lemburg, loewis
2010-05-23 19:27:53 haypo create