msg106339 |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-23 19:27 |
The codecs module (and the codecs.open() function) was added in Python 2.0. codecs.open() creates a StreamReaderWriter object, which uses two other objects: StreamReader and StreamWriter.
Python 2.6 and 3.0 have a new API: the io module. io.open() creates a TextIOWrapper object, which is fully compatible with the file object API (it *is* the (text) file object API :-)). TextIOWrapper supports universal newlines and supports reading+writing better than StreamReaderWriter does. TextIOWrapper has a better test suite and has been used by default to read and write text files in Python 3 since Python 3.0. The io module has an *optimized* design and was rewritten in C (in Python 2.7 and 3.1).
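A minimal sketch of the universal-newline difference, assuming Python 3 and an arbitrary scratch file. codecs.open() opens the underlying file in binary mode, so it returns line endings untranslated, while io.open() (the builtin open() in Python 3) returns a TextIOWrapper that normalizes them:

```python
import codecs
import io
import os
import tempfile

# Write raw bytes with Windows-style line endings to a scratch file.
path = os.path.join(tempfile.mkdtemp(), "crlf.txt")
with io.open(path, "wb") as f:
    f.write(b"line1\r\nline2\r\n")

# codecs.open() wraps a binary file in a codecs.StreamReaderWriter:
# no newline translation is performed.
with codecs.open(path, "r", encoding="utf-8") as f:
    codecs_type = type(f).__name__
    codecs_text = f.read()

# io.open() returns a TextIOWrapper with universal newlines enabled.
with io.open(path, "r", encoding="utf-8") as f:
    io_type = type(f).__name__
    io_text = f.read()

print(codecs_type, repr(codecs_text))
print(io_type, repr(io_text))
```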
codecs.open() should be deprecated in Python 3.2 and removed in Python 3.3 (not in Python 2.7). Maybe StreamReader, StreamWriter and StreamReaderWriter too: I don't know if any program uses these classes directly, but I think that TextIOWrapper can be used instead.
|
msg106479 |
Author: Brett Cannon (brett.cannon) *  |
Date: 2010-05-25 21:43 |
That deprecation is way too fast. If someone wants to write code that works in Python 2.5 or older *and* Python 3, then codecs.open will most likely be how they keep compatibility for reading encoded files.
But yes, overall it should get deprecated. Starting with a PendingDeprecationWarning is probably good, then eventually switching to a DeprecationWarning once most Linux distributions have moved to Python 2.6.
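A rough sketch of the staged deprecation suggested here. `open_with_warning` is a hypothetical shim, not part of the codecs module. It emits a warning of the given category and then delegates to codecs.open(). Moving from PendingDeprecationWarning to DeprecationWarning would only change the `category` argument:

```python
import codecs
import warnings


def open_with_warning(filename, mode="r", encoding=None, errors="strict",
                      buffering=-1, category=PendingDeprecationWarning):
    """Hypothetical deprecation shim around codecs.open()."""
    warnings.warn(
        "codecs.open() is deprecated; use the builtin open() instead",
        category,
        stacklevel=2,  # attribute the warning to the caller, not this shim
    )
    return codecs.open(filename, mode, encoding, errors, buffering)
```

PendingDeprecationWarning is ignored by the default warning filters, so end users would not see it unless warnings are enabled (for example with `-W default`).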
|
msg106480 |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-25 21:45 |
> If someone wants to write code that works in Python 2.5
> or older *and* Python 3 then codecs.open will most likely
> be how they keep compatibility for reading in encoded files.
Can't 2to3 do the conversion? (codecs.open => open)
|
msg106481 |
Author: Brett Cannon (brett.cannon) *  |
Date: 2010-05-25 21:49 |
I'm not talking about people who use 2to3; I'm talking about those who want source compatibility between Python 2 and Python 3. They don't run 2to3, because their code just works in Python 3 without modification.
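For single-source code, one common pattern (shown here as a sketch, not taken from this thread) is to prefer io.open() where it exists and fall back to codecs.open() on older interpreters. `open_text` is an illustrative name:

```python
try:
    # Python 2.6+ and 3.x: io.open() returns a TextIOWrapper.
    from io import open as open_text
except ImportError:
    # Python 2.5 and older have no io module: fall back to codecs.open().
    from codecs import open as open_text
```

Only arguments that both functions understand should be passed, and `encoding` should be passed by keyword, because the positional order differs: codecs.open(filename, mode, encoding, ...) versus io.open(file, mode, buffering, encoding, ...). On Python 2.6 and 2.7, the io version also requires unicode text for writing.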
|
msg116286 |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2010-09-13 07:46 |
We can reconsider this at some later time, when Python 2.x is not really used much anymore.
|
msg136199 |
Author: STINNER Victor (vstinner) *  |
Date: 2011-05-17 23:56 |
Python 3.2 has been published. Can we start deprecating StreamWriter and StreamReader in Python 3.3 (to remove them from Python 3.4)? The doc should explain how to convert code using codecs into code using the io module (it should be simple), and using a StreamReader/StreamWriter should emit a warning.
--
codecs.StreamWriter writes the BOM of the UTF-8-SIG, UTF-16 and UTF-32 encodings twice if the file is opened in append mode or after a seek(0). This bug is fixed in io.TextIOWrapper (issue #5006). io.TextIOWrapper also calls encoder.setstate(0) on a seek other than seek(0), whereas codecs.StreamWriter doesn't (it is not an incremental encoder and has no setstate() method).
codecs.StreamReader doesn't ignore the BOM of the UTF-8-SIG, UTF-16 or UTF-32 encodings after seek(0). This bug is fixed in io.TextIOWrapper (issue #4862).
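A small check of the append-mode case for io.TextIOWrapper, assuming Python 3. The file name is arbitrary. It only exercises TextIOWrapper and does not test codecs.StreamWriter:

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "utf16.txt")

# First write: the UTF-16 encoder emits a BOM at the start of the file.
with open(path, "w", encoding="utf-16") as f:
    f.write("abc")

# Append: TextIOWrapper sees a non-zero starting position and calls
# encoder.setstate(0), so no second BOM is written.
with open(path, "a", encoding="utf-16") as f:
    f.write("def")

with open(path, "rb") as f:
    data = f.read()

print(data.count(codecs.BOM_UTF16), data.decode("utf-16"))
```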
These bugs should maybe be mentioned in the codecs doc, with a pointer to the io module saying that the io module handles these encodings correctly.
|
msg136200 |
Author: STINNER Victor (vstinner) *  |
Date: 2011-05-18 00:00 |
> ... once most Linux distributions have moved to Python 2.6
Debian has used Python 2.6 by default since its last stable release (Squeeze). I think it was the last distro using Python 2.5 by default.
|
msg136212 |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2011-05-18 08:25 |
STINNER Victor wrote:
> Python 3.2 has been published. Can we start deprecating StreamWriter and StreamReader in Python 3.3 (to remove them from Python 3.4)? The doc should explain how to convert code using codecs into code using the io module (it should be simple), and using a StreamReader/StreamWriter should emit a warning.
This ticket is about deprecating codecs.open(), not about
StreamWriter and StreamReader.
The arguments mentioned here against doing that anytime soon
still stand.
I'm -1 on deprecating StreamWriter and StreamReader as they provide
different mechanisms than the io layer which has a specific focus
on files and buffers.
> --
>
> codecs.StreamWriter writes twice the BOM of UTF-8-SIG, UTF-16, UTF-32 encodings if the file is opened in append mode or after a seek(0). Bug fixed in io.TextIOWrapper (issue #5006). io.TextIOWrapper calls also encoder.setstate(0) on a seek different than seek(0), whereas codecs.StreamWriter doesn't (it is not an incremental encoder, it doesn't have the setstate method).
>
> codecs.StreamReader doesn't ignore the BOM of UTF-8-SIG, UTF-16 or UTF-32 encodings after seek(0). Bug fixed in io.TextIOWrapper (issue #4862).
>
> These bugs should maybe be mentioned in the codecs doc, with a pointer to the io module saying that the io module handles these encodings correctly.
Those are not bugs of the generic codecs.StreamWriter/StreamReader
implementations or their concept. They are bugs in those specific
codecs.
The codecs StreamWriter and StreamReader concept was explicitly
designed to be able to have state. However, the generic implementation
does not make use of such state for the purpose of writing special
beginning-of-file markers - that's just way too specific for general
purpose implementations. They do use state to implement buffered
reads.
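A minimal sketch of the buffered-read state described above, assuming Python 3. A UTF-8 StreamReader obtained via codecs.getreader() is asked for one byte at a time, and the incomplete two-byte sequence for "é" is kept between reads until it can be decoded:

```python
import codecs
import io

# "é" is encoded as two bytes in UTF-8 (b"\xc3\xa9").
raw = io.BytesIO("h\u00e9llo".encode("utf-8"))
reader = codecs.getreader("utf-8")(raw)

# read(1) pulls one byte per stream read; when a multi-byte sequence is
# split, the leftover bytes stay in reader.bytebuffer until the next
# chunk completes the character.
chars = []
while True:
    ch = reader.read(1)
    if not ch:
        break
    chars.append(ch)

print(chars)
```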
It would certainly be possible to make the implementations of
the codecs you mentioned smarter to handle writing BOMs correctly,
e.g. by making use of the incremental encoder/decoders, if there's
interest.
|
msg136216 |
Author: STINNER Victor (vstinner) *  |
Date: 2011-05-18 09:44 |
> This ticket is about deprecating codecs.open(), not about
> StreamWriter and StreamReader.
Right. I may open a different issue.
Can we start by modifying codecs.open() to use the builtin open() (to reuse TextIOWrapper)?
> I'm -1 on deprecating StreamWriter and StreamReader as they provide
> different mechanisms than the io layer which has a specific focus
> on files and buffers.
What are the use cases of StreamReader and StreamWriter that are not covered by TextIOWrapper?
TextIOWrapper is used in Python for:
- files (e.g. open)
- processes (e.g. os.popen)
- emails (e.g. mailbox.Message)
- sockets (e.g. socket.makefile)
- and maybe other things
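A short sketch of the file case and the "other things" case, assuming Python 3. The builtin open() stacks a TextIOWrapper on a BufferedReader on a FileIO, and the same TextIOWrapper class can wrap any binary buffer, such as an in-memory io.BytesIO:

```python
import io
import os
import tempfile

# A real file: open() builds FileIO -> BufferedReader -> TextIOWrapper.
path = os.path.join(tempfile.mkdtemp(), "layers.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("caf\u00e9\n")
with open(path, "r", encoding="utf-8") as f:
    layers = [type(f).__name__, type(f.buffer).__name__, type(f.buffer.raw).__name__]
    from_file = f.read()

# The same class wraps an in-memory binary buffer.
wrapper = io.TextIOWrapper(io.BytesIO("caf\u00e9\n".encode("utf-8")), encoding="utf-8")
from_memory = wrapper.read()

print(layers)
```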
StreamReader and StreamWriter are used for:
- read/write files in Sphinx 1.0.7 (written for Python 2)
- write the output in Pygments 1.3.1 (written for Python 2)
- but not in the Python interpreter or standard library
*Quick* search of other usages of StreamReader and StreamWriter on the WWW:
- twisted/mail/imap4.py
- feeds2imap implements a 'mod-utf-7' codec, pyflag implements a 'ms-pst' codec, pygsm implements a 'gsm0338' codec, so they have StreamReader and StreamWriter classes (but I don't know if these classes are used)
> It would certainly be possible to make the implementations of
> the codecs you mentioned smarter to handle writing BOMs correctly,
> e.g. by making use of the incremental encoder/decoders, if there's
> interest.
Yes, it is possible to fix the StreamReader and StreamWriter classes of the mentioned codecs, but it's not possible to write a generic fix in codecs.py. This is exactly why I dislike StreamReader and StreamWriter: they are not incremental and so don't have a setstate() method. When you implement a StreamReader or StreamWriter class, you have to reimplement a pseudo-incremental encoder. Compare, for example, the IncrementalEncoder and StreamWriter classes of UTF-16: most of the code is duplicated.
Because StreamReader and StreamWriter are not incremental, they are not efficient, and it is difficult to handle issues such as the BOM, which require handling the codec state.
TextIOWrapper "simply" reuses incremental encoders and decoders, and so uses their reset() and setstate() methods.
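A sketch of the incremental codec API referred to here, assuming Python 3's built-in UTF-8 and UTF-16 codecs. The decoder buffers an incomplete byte sequence between calls, and reset() clears that state. On the UTF-16 encoder, reset() re-enables the BOM while setstate(0) suppresses it. These two calls are what TextIOWrapper uses on seek(0) and on other seeks, respectively:

```python
import codecs

# An incremental decoder keeps undecoded bytes between calls.
decoder = codecs.getincrementaldecoder("utf-8")()
part1 = decoder.decode(b"caf\xc3")            # "caf"; the lone b"\xc3" is buffered
state = decoder.getstate()                    # (buffered bytes, flag)
part2 = decoder.decode(b"\xa9", final=True)   # completes "\u00e9"

# reset() discards any buffered bytes.
decoder.decode(b"\xc3")
decoder.reset()
state_after_reset = decoder.getstate()

# An incremental UTF-16 encoder emits the BOM only on its first call.
encoder = codecs.getincrementalencoder("utf-16")()
first = encoder.encode("a")
second = encoder.encode("b")

# reset(): the next call emits a BOM again (TextIOWrapper on seek(0)).
encoder.reset()
after_reset = encoder.encode("c")

# setstate(0): continue without a BOM (TextIOWrapper on other seeks).
encoder.reset()
encoder.setstate(0)
after_setstate = encoder.encode("d")
```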
|
msg136617 |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-05-23 11:59 |
If there are use cases of Stream{Reader,Writer} which are not covered by TextIOWrapper, it would be nice to know so that we can improve TextIOWrapper. After all, there should be one obvious way to do it ;)
By the way, something interesting (probably unintended):
>>> codecs.open("LICENSE", "r")
<_io.TextIOWrapper name='LICENSE' mode='r' encoding='UTF-8'>
>>> codecs.open("LICENSE", "r", encoding="utf-8")
<codecs.StreamReaderWriter object at 0x7f71846ac840>
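Antoine's session is consistent with the Python 3 implementation of codecs.open(). Going by the CPython source, it appends "b" to the mode only when an encoding is given and otherwise returns the builtin file object unchanged. A minimal sketch, using an arbitrary scratch file:

```python
import codecs
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello\n")

# Without an encoding, codecs.open() hands back the builtin text file.
with codecs.open(path, "r") as f:
    no_encoding = isinstance(f, io.TextIOWrapper)

# With an encoding, the file is opened in binary mode and wrapped.
with codecs.open(path, "r", encoding="utf-8") as f:
    with_encoding = isinstance(f, codecs.StreamReaderWriter)
    underlying_mode = f.mode  # attribute lookup falls through to the wrapped stream
```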
|
msg136649 |
Author: STINNER Victor (vstinner) *  |
Date: 2011-05-23 14:33 |
deprecate_codecs.patch: "Deprecate open(), StreamReader, StreamWriter, StreamReaderWriter, StreamRecoder and EncodedFile() of the codecs module. Use the builtin open() function or io.TextIOWrapper instead."
EncodedFile() and StreamRecoder cannot be replaced easily by open() or TextIOWrapper. But do we still need this function? In 2002, Martin von Loewis wrote "I never found this class useful."
http://mail.python.org/pipermail/python-dev/2002-August/027491.html
It is perhaps no longer useful with Python 3, which processes all text data as Unicode. Copy/paste of the mail thread:
------------
> In a well-designed application, you should not need to say
> this. The inside world should use Unicode objects.
Agreed, but if you want to port an existing application to
the Unicode world, it sometimes helps.
------------
Deprecated in Python 3.3; the related code will be removed in Python 3.4.
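A minimal sketch of what EncodedFile() does, assuming Python 3. It returns a StreamRecoder that accepts and returns bytes in one encoding while storing another, which is why it has no direct open() or TextIOWrapper equivalent. The encodings and data are arbitrary:

```python
import codecs
import io

# The underlying stream stores Latin-1 bytes.
storage = io.BytesIO()

# Callers write and read UTF-8 bytes; the recoder converts to and from Latin-1.
recoder = codecs.EncodedFile(storage, data_encoding="utf-8", file_encoding="latin-1")
recoder.write("caf\u00e9".encode("utf-8"))

stored = storage.getvalue()   # Latin-1 bytes on the underlying stream

storage.seek(0)
round_trip = recoder.read()   # UTF-8 bytes handed back to the caller
```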
|
msg136666 |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2011-05-23 16:11 |
Closing the ticket again.
We still need codecs.open() to support applications that target Python 2.x and 3.x.
You can reopen it after Python 2.x has been end-of-life'd.
|
msg136671 |
Author: STINNER Victor (vstinner) *  |
Date: 2011-05-23 16:20 |
On Monday, May 23, 2011 at 16:11 +0000, Marc-Andre Lemburg wrote:
> We still need codecs.open() to support applications that target Python 2.x and 3.x.
io.TextIOWrapper exists in Python 2.6 and 2.7, and 2to3 can simply
replace codecs.open() by open().
|
msg136672 |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2011-05-23 16:21 |
Correcting the title: this ticket is about codecs.open(), not StreamReader and StreamWriter, both of which are essential parts of the Python codec machinery and are needed for per-codec implementations of codecs that read from and write to streams.
TextIOWrapper() is conceptually something completely different. It's more something like StreamReaderWriter().
The point about having them use incremental codecs for encoding and decoding is a good one and would need to be investigated. If possible, we could use incremental encoders/decoders for the standard StreamReader/Writer base classes, or add new IncrementalStreamReader/Writer classes which then use the IncrementalEncoder/Decoder by default.
Please open a new ticket for this.
Thanks.
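A rough sketch of the IncrementalStreamWriter idea. The class below is hypothetical (no such class exists in the codecs module). It wraps a binary stream, drives a codec's incremental encoder, and borrows TextIOWrapper's BOM rules: setstate(0) when starting mid-stream or after a seek to a non-zero position, and reset() after seek(0):

```python
import codecs
import io


class IncrementalStreamWriter:
    """Hypothetical StreamWriter variant built on an incremental encoder."""

    def __init__(self, stream, encoding, errors="strict"):
        self.stream = stream
        self.encoder = codecs.getincrementalencoder(encoding)(errors)
        # Appending to a non-empty seekable stream: skip the BOM,
        # as io.TextIOWrapper does.
        if stream.seekable() and stream.tell() != 0:
            self.encoder.setstate(0)

    def write(self, text):
        self.stream.write(self.encoder.encode(text))

    def seek(self, offset, whence=io.SEEK_SET):
        position = self.stream.seek(offset, whence)
        if position == 0:
            self.encoder.reset()      # the next write starts with a BOM again
        else:
            self.encoder.setstate(0)  # continue mid-stream without a BOM
        return position
```

With UTF-16, several writes produce a single BOM, and a writer created on a stream positioned at its end appends without a second BOM.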
|
msg136698 |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-05-23 21:23 |
> TextIOWrapper() is conceptually something completely different. It's
> more something like StreamReaderWriter().
That's a rather strange assertion. Can you expand?
TextIOWrapper supports read-only, write-only, read-write, unseekable and
seekable streams.
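A minimal sketch of two of those modes, assuming Python 3. The first is an unseekable read-only stream (`OneWayStream` is a hypothetical stand-in for something like a pipe). The second is a seekable read-write io.BytesIO:

```python
import io


class OneWayStream(io.RawIOBase):
    """Hypothetical readable, unseekable raw stream over a bytes object."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, buffer):
        chunk = self._data[self._pos:self._pos + len(buffer)]
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


# Unseekable, read-only.
raw = OneWayStream("h\u00e9llo\n".encode("utf-8"))
reader = io.TextIOWrapper(io.BufferedReader(raw), encoding="utf-8")
unseekable_text = reader.read()
unseekable_flag = reader.seekable()

# Seekable, read-write.
rw = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
rw.write("h\u00e9llo\n")
rw.seek(0)
read_back = rw.read()
```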
|
msg136700 |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2011-05-23 22:02 |
Antoine Pitrou wrote:
>> TextIOWrapper() is conceptually something completely different. It's
>> more something like StreamReaderWriter().
>
> That's a rather strange assertion. Can you expand?
> TextIOWrapper supports read-only, write-only, read-write, unseekable and
> seekable streams.
StreamReader and StreamWriter classes provide the base codec
implementations for stateful interaction with streams. They
define the interface and provide a working implementation for
those codecs that choose not to implement their own variants.
Each codec can, however, implement variants which are optimized
for the specific encoding or intercept certain stream methods
to add functionality or improve the encoding/decoding
performance.
Both are essential parts of the codec interface.
TextIOWrapper and StreamReaderWriter are merely wrappers
around streams that make use of the codecs. They don't
provide any codec logic themselves. That's the conceptual
difference.
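A sketch of that distinction using the codec registry, assuming Python 3. codecs.lookup() returns a CodecInfo carrying the per-codec StreamReader/StreamWriter classes as well as the incremental encoder/decoder classes. codecs.StreamReaderWriter only combines the former around a byte stream, while TextIOWrapper wraps a stream using the latter:

```python
import codecs
import io

info = codecs.lookup("utf-8")

# The registry exposes both families of per-codec classes.
print(info.streamreader, info.streamwriter)
print(info.incrementalencoder, info.incrementaldecoder)

# StreamReaderWriter just pairs the codec's reader and writer around a
# byte stream; the encoding logic lives in those classes.
buf = io.BytesIO()
pair = codecs.StreamReaderWriter(buf, info.streamreader, info.streamwriter)
pair.write("h\u00e9llo")
written = buf.getvalue()

# TextIOWrapper plays the same wrapping role but drives the codec's
# incremental decoder instead.
text = io.TextIOWrapper(io.BytesIO(written), encoding="utf-8")
decoded = text.read()
```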
|
msg137017 |
Author: Roundup Robot (python-dev)  |
Date: 2011-05-26 23:54 |
New changeset 3555cf6f9c98 by Victor Stinner in branch 'default':
Issue #8796: codecs.open() calls the builtin open() function instead of using
http://hg.python.org/cpython/rev/3555cf6f9c98
|
msg137031 |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2011-05-27 07:39 |
Roundup Robot wrote:
> New changeset 3555cf6f9c98 by Victor Stinner in branch 'default':
> Issue #8796: codecs.open() calls the builtin open() function instead of using
> http://hg.python.org/cpython/rev/3555cf6f9c98
Victor, could you please back out this change again.
I am -1 on deprecating the StreamReader/Writer parts of the codec API
as I've mentioned numerous times and *don't* want to see these
deprecated in the code or the documentation.
I'm -0 on the change to codecs.open(). Have you checked whether the
returned objects are compatible?
Thanks,
--
Marc-Andre Lemburg
eGenix.com
|
msg137058 |
Author: Roundup Robot (python-dev)  |
Date: 2011-05-27 14:50 |
New changeset 4d2ddd86b531 by Victor Stinner in branch 'default':
Revert my commit 3555cf6f9c98: "Issue #8796: codecs.open() calls the builtin
http://hg.python.org/cpython/rev/4d2ddd86b531
|
msg185126 |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2013-03-24 10:38 |
I suggest deprecating codecs.open() in 3.4, and possibly removing it in a later release. The implementation shouldn't be changed to use the builtin open(), but the deprecation note should point to it, and possibly mention the shortcomings of codecs.open().
|
msg297124 |
Author: STINNER Victor (vstinner) *  |
Date: 2017-06-28 01:36 |
I have proposed this idea multiple times, but it is backward incompatible and is more generally seen as a bad idea, since there are very specific use cases for codecs.open(). So I am closing the issue.
|
Date | User | Action | Args
2022-04-11 14:57:01 | admin | set | github: 53042
2018-02-10 14:07:34 | THRlWiTi | set | nosy: + THRlWiTi
2017-06-28 01:36:05 | vstinner | set | status: open -> closed; resolution: rejected; messages: + msg297124; stage: needs patch -> resolved
2015-01-26 04:31:41 | berker.peksag | set | nosy: + berker.peksag
2014-01-05 13:49:32 | martin.panter | set | nosy: + martin.panter
2013-03-24 13:48:51 | flox | set | nosy: + flox
2013-03-24 10:38:13 | ezio.melotti | set | status: closed -> open; type: behavior; versions: + Python 3.4, - Python 3.2, Python 3.3; nosy: + ezio.melotti; messages: + msg185126; resolution: postponed -> (no value); stage: needs patch
2011-05-27 14:50:50 | python-dev | set | messages: + msg137058
2011-05-27 07:39:49 | lemburg | set | messages: + msg137031
2011-05-26 23:54:03 | python-dev | set | nosy: + python-dev; messages: + msg137017
2011-05-24 07:41:45 | lemburg | set | title: Deprecate codecs.open(), codecs.StreamReader and codecs.StreamWriter -> Deprecate codecs.open()
2011-05-24 06:53:39 | petri.lehtinen | set | nosy: + petri.lehtinen
2011-05-23 22:02:04 | lemburg | set | messages: + msg136700; title: Deprecate codecs.open(), codecs.StreamReader and codecs.StreamWriter -> Deprecate codecs.open(), codecs.StreamReader and codecs.StreamWriter
2011-05-23 21:23:30 | pitrou | set | messages: + msg136698
2011-05-23 16:21:20 | lemburg | set | messages: + msg136672
2011-05-23 16:20:43 | vstinner | set | messages: + msg136671
2011-05-23 16:11:58 | lemburg | set | status: open -> closed; resolution: postponed; messages: + msg136666
2011-05-23 14:35:26 | vstinner | set | status: open
2011-05-23 14:34:57 | vstinner | set | title: Deprecate codecs.open() -> Deprecate codecs.open(), codecs.StreamReader and codecs.StreamWriter
2011-05-23 14:33:46 | vstinner | set | files: + deprecate_codecs.patch; keywords: + patch; messages: + msg136649
2011-05-23 11:59:49 | pitrou | set | messages: + msg136617
2011-05-18 09:44:03 | vstinner | set | messages: + msg136216
2011-05-18 08:25:55 | lemburg | set | messages: + msg136212
2011-05-18 02:11:38 | rhettinger | set | nosy: + rhettinger
2011-05-18 00:00:27 | vstinner | set | messages: + msg136200
2011-05-17 23:56:51 | vstinner | set | status: closed -> (no value)
2011-05-17 23:56:45 | vstinner | set | resolution: postponed -> (no value); messages: + msg136199
2010-09-13 07:46:52 | lemburg | set | status: open -> closed; resolution: postponed; messages: + msg116286
2010-09-13 01:32:02 | eric.araujo | set | nosy: + eric.araujo
2010-05-25 21:49:57 | brett.cannon | set | messages: + msg106481
2010-05-25 21:45:22 | vstinner | set | messages: + msg106480
2010-05-25 21:43:27 | brett.cannon | set | nosy: + brett.cannon; messages: + msg106479
2010-05-25 00:49:07 | meatballhat | set | nosy: + meatballhat
2010-05-23 19:40:31 | pitrou | set | nosy: + lemburg, loewis
2010-05-23 19:27:53 | vstinner | create |