Issue 7475: codecs missing: base64 bz2 hex zlib hex_codec ...

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/51724

classification

Title:	codecs missing: base64 bz2 hex zlib hex_codec ...
Type:	enhancement	Stage:	resolved
Components:	Library (Lib), Unicode	Versions:	Python 3.4

process

Status:	closed	Resolution:	fixed
Dependencies:	17828 17839 17844	Superseder:
Assigned To:	ncoghlan	Nosy List:	barry, belopolsky, benjamin.peterson, cben, eric.araujo, ezio.melotti, flox, georg.brandl, gregory.p.smith, isoschiz, jcea, jwilk, lemburg, loewis, ncoghlan, pconnell, petri.lehtinen, python-dev, r.david.murray, serhiy.storchaka, ssbarnea, vstinner
Priority:	normal	Keywords:	patch

Created on 2009-12-10 22:27 by flox, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
issue7475_warning.diff	flox, 2009-12-11 09:26	Patch for documentation and warnings in 2.7
issue7475_missing_codecs_py3k.diff	flox, 2009-12-11 17:05	Patch, apply to trunk
issue7475_restore_codec_aliases_in_py34.diff	ncoghlan, 2013-11-17 07:41	Patch to restore the transform aliases.

Messages (95)
msg96218	Author: Florent Xicluna (flox) *	Date: 2009-12-10 22:27
AFAIK these codecs were not ported to Python 3.
1. I found no hint in the documentation on this matter.
2. Is it possible to contribute some of them, or is there a good reason to look elsewhere?
msg96223	Author: Martin v. Löwis (loewis) *	Date: 2009-12-10 23:15
These are not encodings, in that they don't convert characters to bytes. It was a mistake that they were integrated into the codecs interfaces in Python 2.x; this mistake is corrected in 3.x.
msg96226	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-12-10 23:25
Martin v. Löwis wrote:
> These are not encodings, in that they don't convert characters to bytes.
> It was a mistake that they were integrated into the codecs interfaces in
> Python 2.x; this mistake is corrected in 3.x.
Martin, I beg your pardon, but these codecs indeed implement valid encodings and the fact that these codecs were removed was a mistake. They should be re-added to Python 3.x.
Note that the fact that a codec converts between types other than bytes and characters doesn't make it wrong in any way. The codec architecture in Python is designed to support same type encodings just as well as ones between bytes and characters.
msg96227	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-12-10 23:26
Reopening the ticket.
msg96228	Author: Martin v. Löwis (loewis) *	Date: 2009-12-10 23:28
It's not possible to add these codecs back. Bytes objects (correctly) don't have an encode method, and string objects (correctly) don't have a decode method. The codec architecture of Python 3.x just doesn't support this kind of application; the codec architecture of 2.x was flawed.
msg96232	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2009-12-11 02:09
I agree with Martin. gzip and bz2 convert bytes to bytes. Encodings deal strictly with unicode -> bytes.
msg96236	Author: Florent Xicluna (flox) *	Date: 2009-12-11 08:21
«Everything you thought you knew about binary data and Unicode has changed.»
Reopening for the documentation part.
This "mistake" deserves some words in the documentation: docs.python.org/dev/py3k/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
And the conversion could perhaps be automated with 2to3.
msg96237	Author: Florent Xicluna (flox) *	Date: 2009-12-11 08:31
Is it possible to add a "DeprecationWarning" for these codecs when using "python -3"?
>>> {}.has_key('a')
__main__:1: DeprecationWarning: dict.has_key() not supported in 3.x; use the in operator
False
>>> print `123`
<stdin>:1: SyntaxWarning: backquote not supported in 3.x; use repr()
123
>>> 'abc'.encode('base64')
'YWJj\n'
msg96240	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-12-11 09:46
Martin v. Löwis wrote:
> It's not possible to add these codecs back. Bytes objects (correctly)
> don't have an encode method, and string objects (correctly) don't have a
> decode method. The codec architecture of Python 3.x just doesn't support
> this kind of application; the codec architecture of 2.x was flawed.
Of course it does support these kinds of codecs. The codec architecture hasn't changed between 2.x and 3.x, just the way a few methods work.
All we agreed to is that unicode.encode() will only return bytes, while bytes.decode() will only return unicode. So the methods won't support same type conversions, because Guido didn't want to have methods that return different types based on the chosen parameter (the codec name in this case).
However, you can still use codecs.encode() and codecs.decode() to work with codecs that return different combinations of types. I explicitly added that support back to 3.0.
You can't argue that just because two methods don't support a certain type combination, the whole architecture doesn't support this anymore.
Also note that codecs allow a much more far-reaching use than just through the unicode and bytes methods: you can use them as seamless wrappers for streams, subclass from them, use their methods directly, etc. etc.
So the fact that the two methods don't support these codecs anymore is just not good enough to warrant their removal.
msg96242	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-12-11 09:56
Benjamin Peterson wrote:
> I agree with Martin. gzip and bz2 convert bytes to bytes. Encodings deal
> strictly with unicode -> bytes.
Sorry, Benjamin, but that's simply not true. Codecs can work with arbitrary types; it's just that the helper methods on unicode and bytes objects only support one combination of types in Python 3.x.
codecs.encode()/.decode() provide access to all codecs, regardless of their supported type combinations, and of course you can use them directly via the codec registry, subclass from them, etc.
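A minimal sketch of MAL's point, assuming Python 3.4 or later (where the short aliases such as 'base64', 'hex' and 'rot13' were restored as the outcome of this issue): the module-level codecs.encode() and codecs.decode() functions accept bytes-to-bytes and str-to-str codecs, even though str.encode() and bytes.decode() do not.

```python
import codecs

# bytes -> bytes transforms go through the module-level functions
encoded = codecs.encode(b"abc", "base64")   # trailing newline, like base64.encodebytes
hexed = codecs.encode(b"abc", "hex")

# str -> str transform
rotated = codecs.encode("abc", "rot13")

# decoding reverses each transform
assert codecs.decode(encoded, "base64") == b"abc"
assert codecs.decode(hexed, "hex") == b"abc"
assert codecs.decode(rotated, "rot13") == "abc"
```

The codec registry itself does not care that 'base64' maps bytes to bytes; only the str/bytes convenience methods restrict the type combination.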
msg96243	Author: Florent Xicluna (flox) *	Date: 2009-12-11 10:22
Thinking about it, I am +1 to reimplement the codecs.
We could implement new methods to replace the old ones (similar to base64.encodebytes and base64.decodebytes):
>>> b'abc'.encodebytes('base64')
b'YWJj\n'
>>> b'abc'.encodebytes('zlib').encodebytes('base64')
b'eJxLTEoGAAJNASc=\n'
>>> b'UHl0aG9u'.decodebytes('base64').decode('utf-8')
'Python'
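The bytes.encodebytes()/decodebytes() methods above are only a proposal and were never added to bytes. Assuming Python 3.4 or later, the same chains can be sketched with nested codecs.encode()/codecs.decode() calls; note that decoding applies the transforms in reverse order.

```python
import codecs

# Equivalent of the proposed b'abc'.encodebytes('zlib').encodebytes('base64')
packed = codecs.encode(codecs.encode(b"abc", "zlib"), "base64")

# Reverse order on the way back: base64 first, then zlib
unpacked = codecs.decode(codecs.decode(packed, "base64"), "zlib")

# Base64-decode, then decode the resulting bytes as UTF-8 text
text = codecs.decode(b"UHl0aG9u", "base64").decode("utf-8")
```

The round trip is checked rather than the exact compressed bytes, since the zlib output is produced by the underlying zlib library.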
msg96251	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2009-12-11 12:54
2009/12/11 Marc-Andre Lemburg <report@bugs.python.org>:
> codecs.encode()/.decode() provide access to all codecs, regardless
> of their supported type combinations and of course, you can use
> them directly via the codec registry, subclass from them, etc.
Didn't you have a proposal for bytes.transform/untransform for operations like this?
msg96253	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-12-11 13:13
Benjamin Peterson wrote:
> 2009/12/11 Marc-Andre Lemburg <report@bugs.python.org>:
>> codecs.encode()/.decode() provide access to all codecs, regardless
>> of their supported type combinations and of course, you can use
>> them directly via the codec registry, subclass from them, etc.
>
> Didn't you have a proposal for bytes.transform/untransform for
> operations like this?
Yes. At the time it was postponed, since I brought it up late in the 3.0 release process. Perhaps I should bring it up again.
Note that those methods are just convenient helpers to access the codecs and as such only provide limited functionality. The full machinery itself is accessible via the codecs module and the code in the encodings package.
Any decision to include a codec or not needs to be based on whether it fits the framework in those modules/packages, not the functionality we expose on unicode and bytes objects.
msg96265	Author: Florent Xicluna (flox) *	Date: 2009-12-11 17:05
I've ported the codecs from Py2: base64, bytes_escape, bz2, hex, quopri, rot13, uu and zlib.
It's not a big deal. Basically:
- StringIO.StringIO --> io.BytesIO
- 'string_escape' --> 'bytes_escape'
Will add documentation if we agree on the feature.
msg96277	Author: Martin v. Löwis (loewis) *	Date: 2009-12-11 23:09
> codecs.encode()/.decode() provide access to all codecs, regardless
> of their supported type combinations and of course, you can use
> them directly via the codec registry, subclass from them, etc.
I presume that the OP didn't talk about codecs.encode, but about the methods on string objects. flox, can you clarify what precisely it is that you miss?
msg96295	Author: Florent Xicluna (flox) *	Date: 2009-12-12 15:40
Martin,
actually, I was trying to convert some piece of code from Python 2 to Python 3. And this statement was not converted by 2to3:
"x.decode('base64').decode('zlib')"
So I read the official documentation, and found no hint about the removal of these codecs.
For my specific use case, I can use "zlib.decompress" and "base64.decodebytes", but I find that the ".encode()" and ".decode()" helpers were useful in Python 2.
I don't know all the background of the removal of these codecs. But I try to contribute to Python, and help Python 3 become at least as featureful, and useful, as Python 2.
So, after reading the above comments, I think we may end up with the following changes:
* restore the "bytes-to-bytes" codecs in the "encodings" package
* then create new helpers on bytes objects (either ".transform()/.untransform()" or ".encodebytes()/.decodebytes()")
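For the specific statement Florent mentions, a hand-written Python 3 port can use the two module functions he names. This is a minimal sketch; decode_payload() and encode_payload() are illustrative names, not part of any library.

```python
import base64
import zlib


def decode_payload(x: bytes) -> bytes:
    """Python 3 replacement for Python 2's x.decode('base64').decode('zlib')."""
    # Undo the outer transform (base64) first, then the inner one (zlib).
    return zlib.decompress(base64.decodebytes(x))


def encode_payload(data: bytes) -> bytes:
    """Inverse operation, i.e. Python 2's data.encode('zlib').encode('base64')."""
    return base64.encodebytes(zlib.compress(data))


blob = encode_payload(b"hello world")
```

base64.encodebytes/decodebytes are used rather than b64encode/b64decode because they match the line-wrapped output of the Python 2 'base64' codec.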
msg96296	Author: Florent Xicluna (flox) *	Date: 2009-12-12 15:44
> And this statement was not converted s/this statement/this method call/
msg96301	Author: Martin v. Löwis (loewis) *	Date: 2009-12-12 19:25
> So, after reading the above comments, I think we may end up with
> following changes:
> * restore the "bytes-to-bytes" codecs in the "encodings" package
> * then create new helpers on bytes objects (either
> ".transform()/.untransform()" or ".encodebytes()/.decodebytes")
I would still be opposed to such a change, and I think it needs a PEP.
If the codecs are restored, one half of them becomes available to .encode/.decode methods, since the codec registry cannot tell which ones implement real character encodings, and which ones are other conversion methods. So adding them would be really confusing.
I also wonder why you are opposed to the import statement. My recommendation is indeed that you use the official API for these libraries (and indeed, there is an official API for each of them, unlike real codecs, which don't have any other documented API).
msg96374	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-12-14 10:30
Martin v. Löwis wrote:
>> So, after reading the above comments, I think we may end up with
>> following changes:
>> * restore the "bytes-to-bytes" codecs in the "encodings" package
+1
>> * then create new helpers on bytes objects (either
>> ".transform()/.untransform()" or ".encodebytes()/.decodebytes")
+1 - the names are still up for debate, IIRC.
> I would still be opposed to such a change, and I think it needs a PEP.
All this has already been discussed and the only reason it didn't go in earlier was timing. No need for a PEP.
> If the codecs are restored, one half of them becomes available to
> .encode/.decode methods, since the codec registry cannot tell which
> ones implement real character encodings, and which ones are other
> conversion methods. So adding them would be really confusing.
Not at all. The helper methods check the return types and raise an exception if the types don't match the expected types.
The codecs registry itself doesn't need to know about the possible input/output types of codecs, since this information is not required to match a name to an implementation.
What we could do is add that information to the CodecInfo object used for registering the codec. codecs.lookup() would then return the information to the application. E.g.
.encode_input_types = (str,)
.encode_output_types = (bytes,)
.decode_input_types = (bytes,)
.decode_output_types = (str,)
Codecs not supporting these CodecInfo attributes would simply return None.
> I also wonder why you are opposed to the import statement. My
> recommendation is indeed that you use the official API for these
> libraries (and indeed, there is an official API for each of them,
> unlike real codecs, which don't have any other documented API).
That's not the point. The codec API provides a standardized API for all these encodings. The hex, zlib, bz2, etc. codecs are just adapters of the different pre-existing APIs to the codec API.
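MAL's first argument is that a convenience method can stay type-safe by checking what the codec returned. A pure-Python approximation of that check is sketched below; checked_encode() is a hypothetical helper, not the actual C implementation of str.encode(), and it assumes Python 3.4+ for the 'rot13' alias.

```python
import codecs


def checked_encode(text: str, encoding: str) -> bytes:
    """Hypothetical helper: run any codec, then insist on a bytes result."""
    result = codecs.encode(text, encoding)
    if not isinstance(result, bytes):
        raise TypeError(
            f"encoder {encoding!r} returned {type(result).__name__}, expected bytes"
        )
    return result


try:
    checked_encode("abc", "rot13")  # rot13 maps str -> str
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

This only detects a mismatch after the codec has run; the CodecInfo type attributes MAL proposes would let a caller reject an unsuitable codec before calling it.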
msg96632	Author: Georg Brandl (georg.brandl) *	Date: 2009-12-19 18:09
I also seem to recall that adding .transform()/.untransform() was already accepted at some point.
msg106669	Author: STINNER Victor (vstinner) *	Date: 2010-05-28 13:45
I agree with Martin: codecs chose the wrong direction in Python 2, and it's fixed in Python 3. The codecs module is related to charsets (encodings): it should encode str to bytes, and should decode bytes (or any read buffer) to str.
E.g. rot13 "encodes" str to str.
"base64 bz2 hex zlib ...": use the base64, bz2, binascii and zlib modules for that.
The documentation should be fixed (explain how to port code from Python 2 to Python 3).
It may be possible to write some 2to3 fixers for the following examples:
"...".encode("base64") => base64.b64encode("...")
"...".encode("rot13") => do nothing (but display a warning?)
"...".encode("zlib") => zlib.compress("...")
"...".encode("hex") => base64.b16encode("...")
"...".encode("bz2") => bz2.compress("...")
"...".decode("base64") => base64.b64decode("...")
"...".decode("rot13") => do nothing (but display a warning?)
"...".decode("zlib") => zlib.decompress("...")
"...".decode("hex") => base64.b16decode("...")
"...".decode("bz2") => bz2.decompress("...")
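A mechanical fixer based on this mapping would need to account for small output differences. The sketch below, assuming Python 3.4+ (where the codec aliases exist again), compares the codec output with the suggested module functions: the 'base64' codec matches base64.encodebytes (with a trailing newline) rather than b64encode, and the 'hex' codec emits lowercase digits like binascii.hexlify, whereas base64.b16encode emits uppercase.

```python
import base64
import binascii
import bz2
import codecs
import zlib

data = b"\xff\x00abc"

# 'base64' codec == base64.encodebytes (adds a trailing newline);
# base64.b64encode produces the same characters without it.
assert codecs.encode(data, "base64") == base64.encodebytes(data)
assert base64.encodebytes(data).rstrip(b"\n") == base64.b64encode(data)

# 'hex' codec is lowercase, like binascii.hexlify; b16encode is uppercase.
assert codecs.encode(data, "hex") == binascii.hexlify(data)
assert base64.b16encode(data) == b"FF00616263"

# 'zlib' and 'bz2' codecs wrap the compression modules directly.
assert codecs.decode(zlib.compress(data), "zlib") == data
assert codecs.decode(bz2.compress(data), "bz2") == data
```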
msg106670	Author: STINNER Victor (vstinner) *	Date: 2010-05-28 13:48
Explanation of the change in Python 3 by Guido:
"We are adopting a slightly different approach to codecs: while in Python 2, codecs can accept either Unicode or 8-bits as input and produce either as output, in Py3k, encoding is always a translation from a Unicode (text) string to an array of bytes, and decoding always goes the opposite direction. This means that we had to drop a few codecs that don't fit in this model, for example rot13, base64 and bz2 (those conversions are still supported, just not through the encode/decode API)."
http://www.artima.com/weblogs/viewpost.jsp?thread=208549
--
See also issue #8838.
msg106674	Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-05-28 14:17
STINNER Victor wrote:
> I agree with Martin: codecs choosed the wrong direction in Python2, and it's fixed in Python3. The codecs module is related to charsets (encodings), should encode str to bytes, and should decode bytes (or any read buffer) to str.
No, that's just not right: the codec system in Python does not mandate the types used or accepted by the codecs.
The only change that was applied in Python3 was to make sure that the str.encode() and bytes.decode() methods always return the same type to assure type-safety. Python2 does not apply that check, but instead provides a direct interface to codecs.encode() and codecs.decode().
Please don't mix the helper methods on those objects with what the codec system was designed for. The helper methods apply a strategy that's more constrained than the codec system.
The addition of .transform() and .untransform() for same type conversions was discussed in 2008, but didn't make it into 3.0 since I hadn't had time to add the methods:
http://mail.python.org/pipermail/python-3000/2008-August/014533.html
http://mail.python.org/pipermail/python-3000/2008-August/014534.html
The removed codecs don't rely on the helper methods in any way. They are easily usable via codecs.encode() and codecs.decode() even without .transform() and .untransform().
The hex codec in particular is very handy, and is in widespread use at least in our eGenix code base.
Using a single well-defined interface to such encodings is just much more user friendly than having to research the different APIs for each of them.
msg107057	Author: Éric Araujo (eric.araujo) *	Date: 2010-06-04 14:12
Related: bytes vs. str for base64 encoding in email, #8896
msg107794	Author: Sorin Sbarnea (ssbarnea) *	Date: 2010-06-14 15:35
I would like to know what happened to hex_codec and what its Python 3 replacement is.
Also, it would be really helpful to see DeprecationWarnings for all these codecs in Python 2.x and to include a note in the Python 3 changelog. The official Python documentation at http://docs.python.org/library/codecs.html lists them as valid without any sign of them being dropped or replaced.
msg109872	Author: Martin v. Löwis (loewis) *	Date: 2010-07-10 14:24
> I would like to know what happened with hex_codec and what is the new py3 for this.
If you had read this bug report, you'd know that the codec was removed in Python 3. Use binascii.hexlify/binascii.unhexlify instead (as you should in 2.x, also).
msg109876	Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-07-10 15:24
Martin v. Löwis wrote:
>> I would like to know what happened with hex_codec and what is the new py3 for this.
>
> If you had read this bug report, you'd know that the codec was removed
> in Python 3. Use binascii.hexlify/binascii.unhexlify instead (as you
> should in 2.x, also).
... or wait for Python 3.2, which will readd them :-)
msg109879	Author: Georg Brandl (georg.brandl) *	Date: 2010-07-10 15:36
... but don't wait too long to add them!
msg109894	Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-07-10 17:06
Georg Brandl wrote:
> ... but don't wait to long to add them!
I plan to work on that after EuroPython. Florent already provided the patch for the codecs, so what's left is adding the .transform()/.untransform() methods, and perhaps tweaking the codec input/output types in a couple of cases.
msg109904	Author: Éric Araujo (eric.araujo) *	Date: 2010-07-10 18:14
I am confused by MvL’s reply. From the first paragraph of the documentation for binascii: “Normally, you will not use these functions directly but use wrapper modules like uu, base64, or binhex instead. The binascii module contains low-level functions written in C for greater speed that are used by the higher-level modules.”
Is the doc not accurate?
Also, can someone who is sure about the status of this report edit the type, stage, component and resolution? It would be helpful.
msg109905	Author: Martin v. Löwis (loewis) *	Date: 2010-07-10 18:35
> I am confused by MvL’s reply. From the first paragraph documentation
> for binascii: “Normally, you will not use these functions directly
> but use wrapper modules like uu, base64, or binhex instead. The
> binascii module contains low-level functions written in C for greater
> speed that are used by the higher-level modules.”
>
> Is the doc not accurate?
It is correct. So use base64.b16encode/b16decode then. It's just that I personally prefer hexlify/unhexlify, because I can memorize the function name better.
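The two pairs Martin mentions are not fully interchangeable: they differ in letter case, which matters on the decoding side. A short sketch using only the standard binascii and base64 modules:

```python
import base64
import binascii

raw = b"\xde\xad\xbe\xef"

# Encoding: same digits, different case
assert binascii.hexlify(raw) == b"deadbeef"
assert base64.b16encode(raw) == b"DEADBEEF"

# Decoding: unhexlify accepts either case ...
assert binascii.unhexlify(b"deadbeef") == raw
assert binascii.unhexlify(b"DEADBEEF") == raw

# ... but b16decode rejects lowercase input unless casefold=True
try:
    base64.b16decode(b"deadbeef")
    lowercase_rejected = False
except binascii.Error:
    lowercase_rejected = True
assert base64.b16decode(b"deadbeef", casefold=True) == raw
```

So replacing hexlify/unhexlify with b16encode/b16decode (or vice versa) is safe only when the case of the hex text is known or casefold=True is passed.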
msg123090	Author: Georg Brandl (georg.brandl) *	Date: 2010-12-02 18:08
Codecs brought back and (un)transform implemented in r86934.
msg123154	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-03 01:40
I am probably a bit late to this discussion, but why should these things be called "codecs", and why should they share the registry with the encodings? It looks like the proper term would be "transformations" or "transforms".
msg123206	Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-12-03 08:46
Alexander Belopolsky wrote:
> I am probably a bit late to this discussion, but why these things should be called "codecs" and why should they share the registry with the encodings? It looks like the proper term would be "transformations" or "transforms".
.transform() is just the name of the method. The codecs are still just that: codecs, i.e. objects that encode and decode data. The types they support are defined by the codecs, not by the helper methods.
In Python3, the str and bytes methods .encode() and .decode() will only support str->bytes->str conversions. The new str and bytes .transform() method adds back str->str and bytes->bytes.
The codec subsystem does not impose restrictions on the type combinations a codec can support, and that's per design.
msg123435	Author: Martin v. Löwis (loewis) *	Date: 2010-12-05 19:04
As per http://mail.python.org/pipermail/python-dev/2010-December/106374.html I think this checkin should be reverted, as it's breaking the language moratorium.
msg123436	Author: Georg Brandl (georg.brandl) *	Date: 2010-12-05 19:12
I leave this to MAL, on whose behalf I finished this to be in time for beta.
msg123462	Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-12-06 11:49
Martin v. Löwis wrote:
> As per
>
> http://mail.python.org/pipermail/python-dev/2010-December/106374.html
>
> I think this checkin should be reverted, as it's breaking the language moratorium.
I've asked Guido.
We may have to revert the addition of the new methods and then readd them for 3.3, but I don't really see them as difficult to implement for the other Python implementations, since they are just interfaces to the codec sub-system.
The readdition of the codecs and changes to support them in the codec system do not fall under the moratorium, since they are stdlib changes.
msg123693	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-09 18:43
With Georg's approval, I am reopening this issue until a decision is made on whether {str,bytes,bytearray}.{transform,untransform} methods should go into 3.2. I am adding Guido to "nosy" because the decision may turn on the interpretation of his post. [1] I also started a python-dev thread on this issue. [2]
[1] http://mail.python.org/pipermail/python-dev/2010-December/106374.html
[2] http://mail.python.org/pipermail/python-dev/2010-December/106617.html
msg125073	Author: STINNER Victor (vstinner) *	Date: 2011-01-02 19:01
See issue #10807: 'base64' can be used with bytes.decode() (and str.encode()), but it raises a confusing exception (TypeError: expected bytes, not memoryview).
msg145246	Author: Éric Araujo (eric.araujo) *	Date: 2011-10-09 09:18
So. This was reverted before 3.2 was out, right? What is the status for 3.3?
msg145656	Author: STINNER Victor (vstinner) *	Date: 2011-10-17 00:53
What is the status of this issue?
rot13 codecs & friends were added back to Python 3.2 with {bytes,str}.(un)transform() methods: commit 7e4833764c88. Codecs were disabled because of surprising error messages before the release of Python 3.2 final: issue #10807, commit ff1261a14573. transform() and untransform() methods were also removed; I don't remember why/how exactly, maybe because new codecs were disabled.
So we have rot13 & friends in Python 3.2 and 3.3, but they cannot be used with the regular str.encode('rot13'); you have to write (for example):
>>> codecs.getdecoder('rot_13')('rot13')
('ebg13', 5)
>>> codecs.getencoder('rot_13')('ebg13')
('rot13', 5)
The major issue with {bytes,str}.(un)transform() is that we have only one registry for all codecs, and the registry was changed in Python 3 to ensure:
* encode: str->bytes
* decode: bytes->str
To implement str.transform(), we need another registry. Marc-Andre suggested (msg96374) adding tags to codecs:
"""
.encode_input_types = (str,)
.encode_output_types = (bytes,)
.decode_input_types = (bytes,)
.decode_output_types = (str,)
"""
I'm still opposed to str->str (rot13) and bytes->bytes (hex, gzip, ...) operations using the codecs API. Developers have to use the right module. If the API of these modules is too complex, we should add helpers to these modules, but not to builtin types. Builtin types have to be and stay simple and well defined.
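Victor's interpreter session can be checked directly. The getencoder()/getdecoder() form returns an (output, length consumed) tuple; in Python 3.4 and later, codecs.encode() and codecs.decode() also accept rot_13 (and the 'rot13' alias) and return only the output.

```python
import codecs

# The getencoder/getdecoder form quoted above: (output, length consumed)
decoded_pair = codecs.getdecoder("rot_13")("rot13")
encoded_pair = codecs.getencoder("rot_13")("ebg13")

# The module-level helpers return just the output
rotated = codecs.encode("rot13", "rot_13")
restored = codecs.decode(rotated, "rot13")
```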
msg145693	Author: Éric Araujo (eric.araujo) *	Date: 2011-10-17 13:38
> transform() and untransform() methods were also removed, I don't remember why/how exactly,
I don’t remember either; maybe it was too late in the release process, or we lacked enough consensus.
> So we have rot13 & friends in Python 3.2 and 3.3, but they cannot be used with the regular
> str.encode('rot13'), you have to write (for example): codecs.getdecoder('rot_13')
Ah, great, I thought they were not available at all!
> The major issue with {bytes,str}.(un)transform() is that we have only one registry for all
> codecs, and the registry was changed in Python 3 [...] To implement str.transform(), we need
> another register. Marc-Andre suggested (msg96374) to add tags to codecs
I’m confused: does the tags idea replace the idea of adding another registry?
> I'm still opposed to str->str (rot13) and bytes->bytes (hex, gzip, ...) operations using the
> codecs API. Developers have to use the right module.
Well, here I disagree with you and agree with MAL: str.encode and bytes.decode are strict, but the codec API in general is not restricted to str→bytes and bytes→str directions. Using the zlib or base64 modules vs. the codecs is a matter of style; sometimes you think it looks hacky, sometimes you think it’s very handy. And rot13 only exists as a codec!
msg145897	Author: Nick Coghlan (ncoghlan) *	Date: 2011-10-19 11:35
They were removed because adding new methods to builtin types violated the language moratorium. Now that the language moratorium is over, the transform/untransform convenience APIs should be added again for 3.3. It's an approved change, the original timing was just wrong.
msg145900	Author: Nick Coghlan (ncoghlan) *	Date: 2011-10-19 11:58
Sorry, I meant to state my rationale for the unassignment - I'm assuming this issue is covered by MAL's recent decision to step away from Unicode and codec maintenance issues. If that's incorrect, MAL can reclaim the issue, otherwise unassigning leaves it open for whoever wants to move it forward.
msg145979	Author: Nick Coghlan (ncoghlan) *	Date: 2011-10-19 22:09
Some further comments after getting back up to speed with the actual status of this problem (i.e. that we had issues with the error checking and reporting in the original 3.2 commit).
1. I agree with the position that the codecs module itself is intended to be a type neutral codec registry. It encodes and decodes things, but shouldn't actually care about the types involved. If that is currently not the case in 3.x, it needs to be fixed.
This type neutrality was blurred in 2.x by the fact that it only implemented str->str translations, and even further obscured by the coupling to the .encode() and .decode() convenience APIs. The fact that the type neutrality of the registry itself is currently broken in 3.x is a regression, not an improvement. (The convenience APIs, on the other hand, are definitely not type neutral, and aren't intended to be.)
2. To assist in producing nice error messages, and to allow restrictions to be enforced on type-specific convenience APIs, the CodecInfo objects should grow additional state as MAL suggests. To avoid redundancy (and inaccurate overspecification), my suggested colour for that particular bikeshed is:
Character encoding codec:
.decoded_format = 'text'
.encoded_format = 'binary'
Binary transform codec:
.decoded_format = 'binary'
.encoded_format = 'binary'
Text transform codec:
.decoded_format = 'text'
.encoded_format = 'text'
I suggest using the fuzzy format labels mainly due to the existence of the buffer API - most codec operations that consume binary data will accept anything that implements the buffer API, so referring specifically to 'bytes' in error messages would be inaccurate.
The convenience APIs can then emit errors like:
'a'.encode('rot_13') ==> CodecLookupError: text <-> binary codec expected ('rot_13' is text <-> text)
'a'.decode('rot_13') ==> CodecLookupError: text <-> binary codec expected ('rot_13' is text <-> text)
'a'.transform('bz2') ==> CodecLookupError: text <-> text codec expected ('bz2' is binary <-> binary)
'a'.transform('ascii') ==> CodecLookupError: text <-> text codec expected ('ascii' is text <-> binary)
b'a'.transform('ascii') ==> CodecLookupError: binary <-> binary codec expected ('ascii' is text <-> binary)
For backwards compatibility with 3.2, codecs that do not specify their formats should be treated as character encoding codecs (i.e. decoded format is 'text', encoded format is 'binary').
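The tagging idea can be sketched in pure Python to show how the proposed messages would be derived from the two format labels. Everything here is hypothetical: the FORMATS table, CodecLookupError, format_error() and check_formats() only illustrate the proposal and are not part of the codecs module. (The behaviour that eventually shipped in Python 3.4 raises a plain LookupError from, e.g., 'a'.encode('rot13') instead.)

```python
# Hypothetical format tags following the proposal above (illustrative only;
# CPython's codecs module has no such table or attributes).
FORMATS = {
    "ascii": ("text", "binary"),     # character encoding
    "utf-8": ("text", "binary"),
    "base64": ("binary", "binary"),  # binary transform
    "bz2": ("binary", "binary"),
    "rot_13": ("text", "text"),      # text transform
}


class CodecLookupError(LookupError):
    """Hypothetical exception named in the proposal."""


def format_error(name, decoded_format, encoded_format):
    """Return the proposed error message, or None if the codec's formats match."""
    # Untagged codecs default to text <-> binary for backwards compatibility.
    actual = FORMATS.get(name, ("text", "binary"))
    if actual == (decoded_format, encoded_format):
        return None
    return (f"{decoded_format} <-> {encoded_format} codec expected "
            f"({name!r} is {actual[0]} <-> {actual[1]})")


def check_formats(name, decoded_format, encoded_format):
    """Raise the hypothetical CodecLookupError when the formats do not match."""
    message = format_error(name, decoded_format, encoded_format)
    if message is not None:
        raise CodecLookupError(message)


# str.encode() would require text <-> binary; str.transform() text <-> text.
print(format_error("rot_13", "text", "binary"))
# text <-> binary codec expected ('rot_13' is text <-> text)
print(format_error("bz2", "text", "text"))
# text <-> text codec expected ('bz2' is binary <-> binary)
```

Making CodecLookupError a LookupError subclass keeps existing `except LookupError` handlers working, which matches the backwards-compatibility concern in the proposal.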
msg145980	Author: Nick Coghlan (ncoghlan) *	Date: 2011-10-19 22:12
Oops, typo in my second error example. The command should be: b'a'.decode('rot_13') (Since str objects don't offer a decode() method any more)
msg145982	Author: STINNER Victor (vstinner) *	Date: 2011-10-19 22:34
> .encode('rot_13') ==> CodecLookupError
I like the idea of raising a lookup error on .encode/.decode if the codec is not a classic text codec (like ASCII or UTF-8).
> .transform('ascii') ==> CodecLookupError
Same comment.
> str.transform('bz2') ==> CodecLookupError
A lookup error is surprising here. It may be a TypeError instead. The bz2 codec can be used with .transform, but not on str. So:
- Lookup error if the codec cannot be used with encode/decode or transform/untransform
- Type error if the value type is invalid
(CodecLookupError doesn't exist; do you propose to define a new exception that inherits from LookupError?)
msg145986	Author: Nick Coghlan (ncoghlan) *	Date: 2011-10-19 22:54
On Thu, Oct 20, 2011 at 8:34 AM, STINNER Victor <report@bugs.python.org> wrote:
>> str.transform('bz2') ==> CodecLookupError
>
> A lookup error is surprising here. It may be a TypeError instead. The bz2 can be used with .transform, but not on str. So:
No, it's the same concept as the other cases - we found a codec with the requested name, but it's not the kind of codec we wanted in the current context (i.e. str.transform). It may be that the problem is the user has a str when they expected to have a bytearray or a bytes object, but there's no way for the codec lookup process to know that.
> - Lookup error if the codec cannot be used with encode/decode or transform/untransform
> - Type error if the value type is invalid
There's no way for str.transform to tell the difference between "I asked for the wrong codec" and "I expected to have a bytes object here, not a str object". That's why I think we need to think in terms of format checks rather than type checks.
> (CodecLookupError doesn't exist, you propose to define a new exception who inherits from LookupError?)
Yeah, and I'd get that to handle the process of creating the nice error messages.
I think it may even make sense to build the filtering options into codecs.lookup() itself:
    def lookup(encoding, decoded_format=None, encoded_format=None):
        info = _lookup(encoding) # The existing codec lookup algorithm
        if ((decoded_format is not None and decoded_format != info.decoded_format) or
                (encoded_format is not None and encoded_format != info.encoded_format)):
            raise CodecLookupError(info, decoded_format, encoded_format)
Then the various encode, decode and transform methods can just pass the appropriate arguments to 'codecs.lookup' without all having to reimplement the format checking logic.
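A runnable variant of this proposed lookup can be layered on top of the real codecs.lookup(). This is a sketch under stated assumptions: lookup_with_formats(), the _FORMATS table and CodecLookupError are hypothetical stand-ins for the proposed CodecInfo attributes, and the codec names assume Python 3.4+ aliases. Unlike the quoted snippet, it returns the CodecInfo when the formats match.

```python
import codecs

# Hypothetical format tags keyed by normalized codec name (illustrative only).
_FORMATS = {
    "base64": ("binary", "binary"),
    "bz2": ("binary", "binary"),
    "hex": ("binary", "binary"),
    "zlib": ("binary", "binary"),
    "rot13": ("text", "text"),
    "rot_13": ("text", "text"),
}


class CodecLookupError(LookupError):
    """Hypothetical exception from the proposal; not part of the codecs module."""


def lookup_with_formats(encoding, decoded_format=None, encoded_format=None):
    """Sketch of codecs.lookup() with the proposed optional format filtering."""
    info = codecs.lookup(encoding)  # existing lookup; raises LookupError if unknown
    key = encoding.lower().replace("-", "_")
    # Untagged codecs are treated as text <-> binary character encodings.
    actual_decoded, actual_encoded = _FORMATS.get(key, ("text", "binary"))
    if ((decoded_format is not None and decoded_format != actual_decoded) or
            (encoded_format is not None and encoded_format != actual_encoded)):
        raise CodecLookupError(
            f"{encoding!r} is {actual_decoded} <-> {actual_encoded}")
    return info


text_codec = lookup_with_formats("utf-8", decoded_format="text", encoded_format="binary")
any_codec = lookup_with_formats("bz2")  # no filtering requested

try:
    lookup_with_formats("rot_13", decoded_format="text", encoded_format="binary")
    rejected = False
except CodecLookupError:
    rejected = True
```

With None defaults, a plain lookup returns any registered codec. Victor's reply argues for defaults of 'text' and 'binary' instead, so that a bare lookup keeps rejecting transforms; in this sketch that would only mean changing the two default values.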
msg145991 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-10-19 23:10
> I think it may even make sense to build the filtering
> options into codecs.lookup() itself:
>
>     def lookup(encoding, decoded_format=None, encoded_format=None):
>         info = _lookup(encoding) # The existing codec lookup algorithm
>         if ((decoded_format is not None and decoded_format != info.decoded_format) or
>                 (encoded_format is not None and encoded_format != info.encoded_format)):
>             raise CodecLookupError(info, decoded_format, encoded_format)

lookup('rot13') should fail with a lookup error to keep backward compatibility. You can just change the default values to:

    def lookup(encoding, decoded_format='text', encoded_format='binary'):
        ...

If you patch lookup, what about the following functions?

- getencoder()
- getdecoder()
- getincrementalencoder()
- getincrementaldecoder()
- getreader()
- getwriter()
- iterencode()
msg145998 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2011-10-20 01:53
I'm fine with people needing to drop down to the lower level lookup() API if they want the filtering functionality in Python code. For most purposes, constraining the expected codec input and output formats really isn't a major issue - we just need it in the core in order to emit sane error messages when people misuse the convenience APIs based on things that used to work in 2.x (like 'a'.encode('base64')). At the C level, I'd adjust _PyCodec_Lookup to accept the two extra arguments and add _PyCodec_EncodeText, _PyCodec_DecodeBinary, _PyCodec_TransformText and _PyCodec_TransformBinary to support the convenience APIs (rather than needing the individual objects to know about the details of the codec tagging mechanism). Making new codecs available isn't a backwards compatibility problem - anyone relying on a particular key being absent from an extensible registry is clearly doing the wrong thing. Regarding the particular formats, I'd suggest that hex, base64, quopri, uu, bz2 and zlib all be flagged as binary transforms, but rot13 be implemented as a text transform (Florent's patch has rot13 as another binary transform, but it makes more sense in the text domain - this should just be a matter of adjusting some of the data types in the implementation from bytes to str)
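The proposed split can be checked against the standard library codecs from Python 3.4 onward (where the short aliases exist). This sketch uses only `codecs.encode`/`codecs.decode`; the helper name `check_transform` is illustrative, and quopri and uu are left out to keep it short.

```python
import codecs

# Binary transforms from the proposal: bytes in, bytes out.
BINARY_TRANSFORMS = ["hex", "base64", "bz2", "zlib"]
# Text transform: str in, str out.
TEXT_TRANSFORMS = ["rot13"]


def check_transform(name, sample):
    """Encode `sample` with codec `name`, check that the type is kept
    and that decoding restores the input; return the encoded value."""
    encoded = codecs.encode(sample, name)
    # A transform keeps the type: bytes -> bytes or str -> str.
    assert type(encoded) is type(sample), (name, type(encoded))
    assert codecs.decode(encoded, name) == sample
    return encoded


for name in BINARY_TRANSFORMS:
    check_transform(name, b"hello")
for name in TEXT_TRANSFORMS:
    check_transform(name, "hello")
```

For example, `check_transform("hex", b"hello")` returns `b'68656c6c6f'` and `check_transform("rot13", "hello")` returns `'uryyb'`.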
msg149439 - (view)	Author: Petri Lehtinen (petri.lehtinen) *	Date: 2011-12-14 10:51
Issue 13600 has been marked as a duplicate of this issue. FTR, +1 to the idea of adding encoded_format and decoded_format attributes to CodecInfo, and also to adding {str,bytes}.{transform,untransform} back.
msg153304 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-02-13 21:17
What is the status of this issue? Is there still a fan of this issue motivated to write a PEP, a patch or something like that?
msg153317 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-02-14 03:25
It's still on my radar to come back and have a look at it. Feedback from the web folks doing Python 3 migrations is that it would have helped them in quite a few cases. I want to get a couple of other open PEPs out of the way first, though (mainly 394 and 409)
msg164224 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-06-28 07:13
My current opinion is that this should be a PEP for 3.4, to make sure we flush out all the corner cases and other details correctly.
msg164226 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-06-28 07:26
For that matter, with the relevant codecs restored in 3.2, a transform() helper could probably be added to six (or a new project on PyPI) to prototype the approach.
msg164237 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-06-28 10:41
Setting as a release blocker for 3.4 - this is important.
msg165435 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-07-14 07:36
FWIW, I've been thinking further about this recently and I think implementing this feature as builtin methods is the wrong way to approach it. Instead, I propose the addition of codecs.encode and codecs.decode functions that are type neutral (leaving any type checks entirely up to the codecs themselves), while the str.encode and bytes.decode methods retain their current strict text model related type restrictions.

Also, I now think my previous proposal for nice error messages was massively over-engineered. A much simpler approach is to just replace the status quo:

>>> "".encode("bz2_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ncoghlan/devel/py3k/Lib/encodings/bz2_codec.py", line 17, in bz2_encode
    return (bz2.compress(input), len(input))
  File "/home/ncoghlan/devel/py3k/Lib/bz2.py", line 443, in compress
    return comp.compress(data) + comp.flush()
TypeError: 'str' does not support the buffer interface

with a better error with more context like:

UnicodeEncodeError: encoding='bz2_codec', errors='strict', codec_error="TypeError: 'str' does not support the buffer interface"

A similar change would be straightforward on the decoding side. This would be a good use case for __cause__, but the codec error should still be included in the string representation.
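A minimal sketch of the "more context" idea, assuming a hypothetical wrapper `encode_with_context` around the real `codecs.encode()`: it re-raises a codec's `TypeError` with the codec name and error mode in the message, and uses `raise ... from` so the original error is kept as `__cause__`. It is only an illustration, not the error handling CPython eventually adopted.

```python
import codecs


def encode_with_context(obj, encoding, errors="strict"):
    """Hypothetical helper: run a codec and, if it raises TypeError,
    re-raise with the codec name and error mode in the message.
    The original exception is chained as __cause__."""
    try:
        return codecs.encode(obj, encoding, errors)
    except TypeError as exc:
        raise TypeError(
            f"invalid input type for {encoding!r} codec "
            f"(errors={errors!r}): {type(exc).__name__}: {exc}"
        ) from exc
```

Passing a `str` to the bz2 codec, e.g. `encode_with_context("hello", "bz2_codec")`, then produces a `TypeError` that names `'bz2_codec'`, with the underlying buffer-interface error still reachable via `__cause__` and shown in the chained traceback.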
msg170414 - (view)	Author: Uzume (uzume)	Date: 2012-09-12 19:09
Many have chimed in on this topic, but I thought I would lend my stance, for whatever it is worth. I also believe most of these do not fit the concept of a character codec, and some sort of transforms would likely be useful; however, most are rather specialized (e.g., there should probably be a generalized compression library interface à la hashlib):

- rot13: an (albeit simplistic) text cipher (str to str, though bytes to bytes could be argued, since many crypto functions do that)
- zlib, bz2, etc. (lzma/xz should also be here): all bytes-to-bytes compression transforms
- hex(adecimal), uu, base64, etc.: these more or less fit the description of a character codec, as they map between bytes and str. However, I am not sure they are really the same thing: they basically perform a radix transformation to character symbols, and the mapping is not strictly from bytes to a single character and back, as a true character codec seems to imply. As evidenced by int(), format(), bytes.fromhex(), float.hex(), float.fromhex(), etc., these are more generalized conversions for serializing strings of bits into a textual representation (possibly for human consumption).

I personally feel any <type/class>.hex(), etc. method would be better off as a format()-style formatter, if such methods are to exist in that space at all (i.e., rather than as part of some more generalized conversion library, which we have, but which since 3.x could probably use some updating and cleanup).
msg187630 - (view)	Author: Florent Xicluna (flox) *	Date: 2013-04-23 12:05
Another rant, because it matters to many of us: http://lucumr.pocoo.org/2012/8/11/codec-confusion/ IMHO, the solution to restore str.decode and bytes.encode and return TypeError for improper use is probably the most obvious for the average user.
msg187631 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2013-04-23 12:15
-1

I see encoding as the process to go from text to bytes, and decoding the process to go from bytes to text, so (ab)using these terms for other kind of conversions is not an option IMHO.

Anyway I think someone should write a PEP and list the possible options and their pro and cons, and then a decision can be taken on python-dev.

FTR in Python 2 you can use decode for bytes->text, text->text, bytes->bytes, and even text->bytes:

>>> u'DEADBEEF'.decode('hex')
'\xde\xad\xbe\xef'
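For comparison, Python 3 offers several explicit spellings of that Python 2 example, each converting hex digits to bytes. A sketch; the `hex` alias used with `codecs.decode` needs Python 3.4 or later.

```python
import binascii
import codecs

hex_text = "DEADBEEF"

# bytes.fromhex: str of hex digits -> bytes
via_fromhex = bytes.fromhex(hex_text)
# binascii.unhexlify accepts ASCII-only str as well as bytes
via_binascii = binascii.unhexlify(hex_text)
# The hex codec through the type-neutral codecs API
via_codec = codecs.decode(hex_text.encode("ascii"), "hex")

assert via_fromhex == via_binascii == via_codec == b"\xde\xad\xbe\xef"
```

Unlike the Python 2 method, none of these lets the codec name silently decide whether the result is text or bytes.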
msg187634 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2013-04-23 12:42
transform/untransform has approval-in-principle, adding encode/decode to the type that doesn't have them has been explicitly (and repeatedly :) rejected. (I don't know about anybody else, but at this point I have written code that assumes that if an object has an 'encode' method, calling it will get me a bytes, and vice versa with 'decode'...an assumption I know is not "safe", but that I feel is useful duck typing in the contexts in which I used it.) Nick wants a PEP, other people have said a PEP isn't necessary. What is certainly necessary is for someone to pick up the ball and run with it.
msg187636 - (view)	Author: Florent Xicluna (flox) *	Date: 2013-04-23 12:54
I am not a native english speaker, but it seems that the common usage of encode/decode is wider than the restricted definition applied for Python 3.3: Some examples: * RFC 4648 specifies "Base16, Base32, and Base64 Data Encodings" http://tools.ietf.org/html/rfc4648 * About rot13: "the same code can be used for encoding and decoding" http://www.catb.org/~esr/jargon/html/R/rot13.html * The Huffman coding is "an entropy encoding algorithm" (used for DEFLATE) http://en.wikipedia.org/wiki/Huffman_coding * RFC 2616 lists (zlib's) deflate or gzip as "encoding transformations" http://tools.ietf.org/html/rfc2616#section-3.5 However, I acknowledge that there are valid reasons to choose a different verb too.
msg187638 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2013-04-23 12:59
While not strictly necessary, a PEP would certainly be useful and would help reach a consensus. The PEP should provide a summary of the available options (transform/untransforms, reintroducing encode/decode for bytes/str, maybe others), their intended behavior (e.g. is type(x.transform()) == type(x) always true?), and possible issues (e.g. Should some transformations be limited to str or bytes? Should rot13 work with both transform and untransform?). Even if we all agreed on a solution, such a document would still be useful IMHO.
msg187644 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-04-23 13:46
+1 for someone stepping up to write a PEP on this if they would like to see the situation improved in 3.4. transform/untransform has at least one core developer with an explicit -1 on the proposal at the moment (me). We definitely need a generic object->object convenience API in the codecs module (codecs.decode, codecs.encode). I even accept that those two functions could be worthy of elevation to be new builtin functions. I'm far from convinced that awkwardly named methods that only handle str->object, bytes->object and bytearray->object are a good idea. Should memoryview gain transform/untransform methods as well? transform/untransform as proposed aren't even inverse operations, since they don't swap the valid input and output types (that is, transform is str/bytes/bytearray to arbitrary objects, while untransform is also str/bytes/bytearray to arbitrary objects. Inverses can't have a domain/range mismatch like that). Those names are also ambiguous about which one corresponds to "encoding" and which to "decoding". encode() and decode(), whether as functions in the codecs module or as builtins, have no such issue. Personally, the more I think about it, the more I'm in favour of adding encode and decode as builtin functions for 3.4. If you want arbitrary object->object conversions, use the builtins, if you want strict str->bytes or bytes/bytearray->str use the methods. Python 3 has been around long enough now, and Python 3.2 and 3.3 are sufficiently well known that I think we can add the full power builtins without people getting confused.
msg187649 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2013-04-23 14:41
I was visualizing transform/untransform as being restricted to buffertype->bytes and stringtype->string, which at least for binascii-type transforms is all the modules support. After all, you don't get to choose what type of object you get back from encode or decode. A more generalized transformation (encode/decode) utility is also interesting, but how many non-string non-bytes transformations do we actually support?
msg187651 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-04-23 14:55
If transform is a method, how do you plan to accept arbitrary buffer supporting types as input? This is why I mentioned memoryview: it doesn't provide decode(), but there's no good reason you should have to copy the data from the view before decoding it. Similarly, you shouldn't have to make an unaltered copy before creating a compressed (or decompressed) copy. With codecs.encode and codecs.decode as functions, supporting memoryview as an input for bytes->str decoding, binary->bytes encoding (e.g. gzip compression) and binary->bytes decoding (e.g. gzip decompression) is trivial. Ditto for array.array and anything else that supports the buffer protocol. With transform/untransform as methods? No such luck. And once you're using functions rather than methods, it's best to define the API as object -> object, and leave any type constraints up to the individual codecs (with the error handling improved to provide more context and a more meaningful exception type, as I described earlier in the thread)
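The buffer-protocol argument can be illustrated with the type-neutral functions: a `memoryview` or `array.array` goes straight into the zlib and base64 codecs without the caller first copying it into `bytes`. This sketch assumes Python 3.4+ (for the short aliases, and for base64 accepting arbitrary bytes-like input after issue 17839).

```python
import array
import codecs

payload = b"hello world" * 10

# A memoryview is passed directly; no bytes copy is made by the caller.
view = memoryview(payload)
compressed = codecs.encode(view, "zlib")
assert codecs.decode(compressed, "zlib") == payload

# array.array also exposes the buffer protocol, so it works the same way.
arr = array.array("B", payload)
assert codecs.encode(arr, "zlib") == compressed

# Decoding base64 straight from a memoryview.
encoded_view = memoryview(b"ZXhhbXBsZQ==\n")
assert codecs.decode(encoded_view, "base64") == b"example"
```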
msg187652 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2013-04-23 15:02
I agree with you. transform/untransform are parallel to encode/decode, and I wouldn't expect them to exist on any type that didn't support either encode or decode. They are convenience methods, just as encode/decode are. I am also probably not invested enough in it to write the PEP :)
msg187653 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-04-23 15:42
str.decode() and bytes.encode() are not coming back. Any proposal had better take into account the API design rule that the type of a method's return value should not depend on the value of one of the arguments. (The Python 2 design failed this test, and that's why we changed it.) It is however fine to let the return type depend on one of the argument types. So e.g. bytes.transform(enc) -> bytes and str.transform(enc) -> str are fine. And so are e.g. transform(bytes, enc) -> bytes and transform(str, enc) -> str. But a transform() taking bytes that can return either str or bytes depending on the encoding name would be a problem. Personally I don't think transformations are so important or ubiquitous as to deserve being made new bytes/str methods. I'd be happy with a convenience function, for example transform(input, codecname), that would have to be imported from somewhere (maybe the codecs module). My guess is that in almost all cases where people are demanding to say e.g. x = y.transform('rot13') the codec name is a fixed literal, and they are really after minimizing the number of imports. Personally, disregarding the extra import line, I think x = rot13.transform(y) looks better though. Such custom APIs also give the API designer (of the transformation) more freedom to take additional optional parameters affecting the transformation, offer a set of variants, or a richer API.
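Guido's rule can be expressed as a small wrapper. The sketch below is a hypothetical `transform(value, codec_name)` built on the real `codecs.encode()`; it refuses any codec whose result type differs from the input type, so the return type depends only on the argument's type, never on the codec name.

```python
import codecs


def transform(value, codec_name):
    """Hypothetical convenience function: apply a codec, but only
    allow same-type results (bytes -> bytes, str -> str)."""
    result = codecs.encode(value, codec_name)
    if type(result) is not type(value):
        raise TypeError(
            f"{codec_name!r} turned {type(value).__name__} into "
            f"{type(result).__name__}; not a same-type transform"
        )
    return result
```

So `transform("hello", "rot13")` gives `'uryyb'`, while `transform("hello", "utf-8")` raises `TypeError`. Note the trade-off raised elsewhere in the thread: this strict check also rejects a `memoryview` passed to a binary transform, because the codecs return `bytes`.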
msg187660 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2013-04-23 17:38
FWIW, I'm not interested in seeing this added anymore.
msg187668 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2013-04-23 19:26
consensus here appears to be "bad idea... don't do this."
msg187670 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-04-23 21:46
No, transform/untransform as methods are a bad idea, but these codecs should definitely come back. The minimal change needed for that to be feasible is to give errors raised during encoding and decoding more context information (at least the codec name and error mode, and switching to the right kind of error). MAL also stated on python-dev that codecs.encode and codecs.decode already exist, so it should just be a matter of documenting them properly.
msg187673 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2013-04-23 22:19
okay, but i don't personally find any of these to be good ideas as "codecs" given they don't have anything to do with translating between bytes<->unicode.
msg187676 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-04-23 23:07
The codecs module is generic, text encodings are just the most common use case (hence the associated method API).
msg187695 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2013-04-24 11:45
I don't see any point in merely bringing the codecs back, without any convenience API to use them. If I need to do

    import codecs
    result = codecs.getencoder("base64").encode(data)

I don't think people would actually prefer this over

    import base64
    result = base64.encodebytes(data)

It's (IMO) only the convenience method (.encode) that made people love these codecs.
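For reference, the lower-level spellings work as follows (Python 3.4+ for the `base64` alias). `codecs.getencoder()` returns the encoder function itself, which returns an `(output, length_consumed)` tuple; the function has no `.encode` attribute, so the chained spelling in the message above would fail as written, while `codecs.lookup(name).encode` is that same function. A sketch comparing these with `codecs.encode()` and the `base64` module:

```python
import base64
import codecs

data = b"example"

# getencoder() returns the codec's encoder function; calling it
# gives an (output, length_consumed) tuple.
output, consumed = codecs.getencoder("base64")(data)

# The CodecInfo returned by lookup() exposes the same function.
from_lookup, _ = codecs.lookup("base64").encode(data)

# The convenience function returns just the output.
via_encode = codecs.encode(data, "base64")

# The dedicated module function.
via_module = base64.encodebytes(data)

assert output == from_lookup == via_encode == via_module == b"ZXhhbXBsZQ==\n"
assert consumed == len(data)
```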
msg187696 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2013-04-24 12:20
IMHO it's also a documentation problem. Once people figure out that they can't use encode/decode anymore, it's not immediately clear what they should do instead. By reading the codecs docs[0] it's not obvious that it can be done with codecs.getencoder("...").encode/decode, so people waste time finding a solution, get annoyed, and blame Python 3 because it removed a simple way to use these codecs without making clear what should be used instead. FWIW I don't care about having to do an extra import, but indeed something simpler than codecs.getencoder("...").encode/decode would be nice. [0]: http://docs.python.org/3/library/codecs.html
msg187698 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-04-24 13:43
It turns out MAL added the convenience API I'm looking for back in 2004, it just didn't get documented, and is hidden behind the "from _codecs import *" call in the codecs.py source code: http://hg.python.org/cpython-fullhistory/rev/8ea2cb1ec598

So, all the way from 2.4 to 2.7 you can write:

    from codecs import encode
    result = encode(data, "base64")

It works in 3.x as well, you just need to add the "_codec" to the end to account for the missing aliases:

>>> from codecs import encode, decode
>>> encode(b"example", "base64_codec")
b'ZXhhbXBsZQ==\n'
>>> decode(b"ZXhhbXBsZQ==\n", "base64_codec")
b'example'

Note that the convenience functions omit the extra checks that are part of the methods (although I admit the specific error here is rather quirky):

>>> b"ZXhhbXBsZQ==\n".decode("base64_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/encodings/base64_codec.py", line 20, in base64_decode
    return (base64.decodebytes(input), len(input))
  File "/usr/lib64/python3.2/base64.py", line 359, in decodebytes
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not memoryview

I'm going to create some additional issues, so this one can return to just being about restoring the missing aliases.
msg187701 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2013-04-24 13:47
Just copying some details here about codecs.encode() and codecs.decode() from python-dev:

"""
Just as a reminder: we have the general purpose encode()/decode() functions in the codecs module:

    import codecs
    r13 = codecs.encode('hello world', 'rot-13')

These interface directly to the codec interfaces, without enforcing type restrictions. The codec defines the supported input and output types.
"""

As Nick found, these aren't documented, which is a documentation bug (I probably forgot to add documentation back then). They have been in Python since 2004: http://hg.python.org/cpython-fullhistory/rev/8ea2cb1ec598

These API are nice for general purpose codec work and that's why I added them back in 2004. For the codecs in question, it would still be nice to have a more direct way to access them via methods on the types that you typically use them with.
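`rot_13` is the one text transform in this set: it maps `str` to `str`, and because ROT13 is its own inverse, `codecs.decode` applies the same mapping as `codecs.encode`. A short sketch using the example above (the codec lookup normalizes `rot-13` to `rot_13`):

```python
import codecs

message = "hello world"
scrambled = codecs.encode(message, "rot-13")

# ROT13 is an involution: encoding twice restores the input,
# and decode() applies the same character mapping as encode().
assert codecs.encode(scrambled, "rot-13") == message
assert codecs.decode(scrambled, "rot-13") == message
```

Here `scrambled` is `'uryyb jbeyq'`; non-letters such as the space pass through unchanged.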
msg187702 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2013-04-24 13:53
> It works in 3.x as well, you just need to add the "_codec" to the end > to account for the missing aliases: FTR this is because of ff1261a14573 (see #10807).
msg187705 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-04-24 14:11
Issue 17827 covers adding documentation for codecs.encode and codecs.decode. Issue 17828 covers adding exception handling improvements for all encoding and decoding operations.
msg187707 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-04-24 14:22
For me, the killer argument against a method based API is memoryview (and, equivalently, array.array). It should be possible to use those as inputs for the bytes->bytes codecs, and once you endorse codecs.encode and codecs.decode for that use case, it's hard to justify adding more exclusive methods to the already broad bytes and bytearray APIs (particularly given the problems with conveying direction of conversion unambiguously). By contrast, I think "the codecs functions are generic while the str, bytes and bytearray methods are specific to text encodings" is something we can explain fairly easily, thus allowing the aliases mentioned in this issue to be restored for use with the codecs module functions. To avoid reintroducing the quirky errors described in issue 10807, the encoding and decoding error messages should first be improved as discussed in issue 17828.
msg187764 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-04-25 07:49
Also adding 17839 as a dependency, since part of the reason the base64 errors in particular are so cryptic is because the base64 module doesn't accept arbitrary PEP 3118 compliant objects as input.
msg187770 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-04-25 08:31
I also created issue 17841 to cover the fact that the 3.3 documentation incorrectly states that these aliases still exist, even though they were removed before 3.2 was released.
msg198845 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-10-02 15:08
With issue 17839 fixed, the error from invoking the base64 codec through the method API is now substantially more sensible:

>>> b"ZXhhbXBsZQ==\n".decode("base64_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoder did not return a str object (type=bytes)
msg198846 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-10-02 15:13
I just wanted to note something I realised in chatting to Armin Ronacher recently: in both Python 2.x and 3.x, the encode/decode method APIs are constrained by the text model, it's just that in 2.x that model was effectively basestring<->basestring, and thus still covered every codec in the standard library. This greatly limited the use cases for the codecs.encode/decode convenience functions, which is why the fact they were undocumented went unnoticed. In 3.x, the changed text model meant the method API became limited to the Unicode codecs, making the function-based API more important.
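On Python 3.4 and later (after issue 19619), the consequence is visible directly: the `str.encode`/`bytes.decode` methods raise `LookupError` for non-text codecs, while `codecs.encode`/`codecs.decode` accept the same codecs. A sketch; `method_rejects` is an illustrative helper, and the exact error messages vary between versions so they are not checked.

```python
import codecs


def method_rejects(call):
    """Return True if the call raises LookupError (codec refused)."""
    try:
        call()
    except LookupError:
        return True
    return False


# Methods: restricted to text encodings (str <-> bytes).
assert method_rejects(lambda: "hello".encode("rot13"))
assert method_rejects(lambda: b"aGVsbG8=\n".decode("base64"))
assert not method_rejects(lambda: "hello".encode("utf-8"))

# Functions: type-neutral, so the same codecs work.
assert codecs.encode("hello", "rot13") == "uryyb"
assert codecs.decode(b"aGVsbG8=\n", "base64") == b"hello"
```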
msg202130 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-11-04 13:21
For anyone interested, I have a patch up on issue 17828 that produces the following output for various codec usage errors:

>>> import codecs
>>> codecs.encode(b"hello", "bz2_codec").decode("bz2_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bz2_codec' decoder returned 'bytes' instead of 'str'; use codecs.decode to decode to arbitrary types
>>> "hello".encode("bz2_codec")
TypeError: 'str' does not support the buffer interface

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid input type for 'bz2_codec' codec (TypeError: 'str' does not support the buffer interface)
>>> "hello".encode("rot_13")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'rot_13' encoder returned 'str' instead of 'bytes'; use codecs.encode to encode to arbitrary types
msg202264 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-11-06 12:41
Providing the 2to3 fixers in issue 17823 now depends on this issue rather than the other way around (since not having to translate the names simplifies the fixer a bit).
msg202515 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-11-10 09:25
Issue 17823 is now closed, but not because it has been implemented. It turns out that the data driven nature of the incompatibility means it isn't really amenable to being detected and fixed automatically via 2to3. Issue 19543 is a replacement proposal for the introduction of some additional codec related Py3k warnings in Python 2.7.7.
msg203124 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-11-17 07:41
Attached patch restores the aliases for the binary and text transforms, adds a test to ensure they exist and restores the "Aliases" column to the relevant tables in the documentation. It also updates the relevant section in the What's New document. I also tweaked the wording in the docs to use the phrases "binary transform" and "text transform" for the affected tables and version added/changed notices. Given the discussions on python-dev, the main condition that needs to be met before I commit this is for Victor to change his current -1 to a -0 or higher.
msg203378 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-11-19 14:25
Victor is still -1, so to Python 3.5 it goes.
msg203751 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-11-22 12:44
The 3.4 portion of issue 19619 has been addressed, so removing it as a dependency again.
msg203936 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-11-23 00:46
With issue 19619 resolved for Python 3.4 (the issue itself remains open awaiting a backport to 3.3), Victor has softened his stance on this topic and given the go ahead to restore the codec aliases: http://bugs.python.org/issue19619#msg203897 I'll be committing this shortly, after adjusting the patch to account for the issue 19619 changes to the tests and What's New.
msg203942 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-11-23 01:14
New changeset 5e960d2c2156 by Nick Coghlan in branch 'default': Close #7475: Restore binary & text transform codecs http://hg.python.org/cpython/rev/5e960d2c2156
msg203944 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-11-23 01:16
Note that I still plan to do a documentation-only PEP for 3.4, proposing some adjustments to the way the codecs module is documented, making "binary transform" and "text transform" defined terms in the glossary, etc. I'll probably aim for beta 2 for that.
msg207283 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-01-04 13:34
Docstrings for new codecs mention bytes.transform() and bytes.untransform() which are nonexistent.
msg213502 - (view)	Author: Roundup Robot (python-dev)	Date: 2014-03-14 00:55
New changeset d7950e916f20 by R David Murray in branch '3.3': #7475: Remove references to '.transform' from transform codec docstrings. http://hg.python.org/cpython/rev/d7950e916f20 New changeset 83d54ab5c696 by R David Murray in branch 'default': Merge #7475: Remove references to '.transform' from transform codec docstrings. http://hg.python.org/cpython/rev/83d54ab5c696

History
Date	User	Action	Args
2022-04-11 14:56:55	admin	set	github: 51724
2014-03-14 00:55:23	python-dev	set	messages: + msg213502
2014-01-04 13:34:04	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg207283
2014-01-02 12:42:36	jwilk	set	nosy: + jwilk
2013-11-23 01:16:23	ncoghlan	set	messages: + msg203944
2013-11-23 01:14:37	python-dev	set	status: open -> closed nosy: + python-dev messages: + msg203942 resolution: fixed stage: resolved
2013-11-23 00:46:51	ncoghlan	set	assignee: ncoghlan messages: + msg203936 versions: + Python 3.4, - Python 3.5
2013-11-22 12:44:25	ncoghlan	set	dependencies: - Blacklist base64, hex, ... codecs from bytes.decode() and str.encode() messages: + msg203751
2013-11-21 13:35:20	ncoghlan	set	dependencies: + Blacklist base64, hex, ... codecs from bytes.decode() and str.encode()
2013-11-19 14:25:41	ncoghlan	set	messages: + msg203378 versions: + Python 3.5, - Python 3.4
2013-11-17 07:41:29	ncoghlan	set	files: + issue7475_restore_codec_aliases_in_py34.diff messages: + msg203124
2013-11-10 09:25:10	ncoghlan	set	messages: + msg202515
2013-11-10 09:22:10	ncoghlan	unlink	issue17823 dependencies
2013-11-06 12:41:41	ncoghlan	set	dependencies: - 2to3 fixers for missing codecs messages: + msg202264
2013-11-06 12:40:42	ncoghlan	link	issue17823 dependencies
2013-11-04 13:21:33	ncoghlan	set	messages: + msg202130
2013-10-02 15:18:13	ncoghlan	set	versions: - Python 2.7, Python 3.3
2013-10-02 15:17:00	ncoghlan	set	messages: - msg198847
2013-10-02 15:16:36	ncoghlan	set	messages: + msg198847 versions: + Python 2.7, Python 3.3
2013-10-02 15:13:49	ncoghlan	set	messages: + msg198846
2013-10-02 15:08:16	ncoghlan	set	messages: + msg198845
2013-05-02 22:46:38	isoschiz	set	nosy: + isoschiz
2013-04-25 16:34:15	gvanrossum	set	nosy: - gvanrossum
2013-04-25 11:43:30	serhiy.storchaka	set	dependencies: + Add link to alternatives for bytes-to-bytes codecs
2013-04-25 08:31:46	ncoghlan	set	messages: + msg187770
2013-04-25 07:53:34	serhiy.storchaka	set	dependencies: + 2to3 fixers for missing codecs
2013-04-25 07:49:12	ncoghlan	set	dependencies: + base64 module should use memoryview messages: + msg187764
2013-04-24 14:22:38	ncoghlan	set	dependencies: + More informative error handling when encoding and decoding messages: + msg187707
2013-04-24 14:11:28	ncoghlan	set	messages: + msg187705
2013-04-24 13:53:35	ezio.melotti	set	messages: + msg187702
2013-04-24 13:47:10	lemburg	set	messages: + msg187701
2013-04-24 13:43:13	ncoghlan	set	messages: + msg187698
2013-04-24 12:20:46	ezio.melotti	set	messages: + msg187696
2013-04-24 11:45:23	loewis	set	messages: + msg187695
2013-04-23 23:07:32	ncoghlan	set	messages: + msg187676
2013-04-23 22:19:41	gregory.p.smith	set	status: closed -> open resolution: wont fix -> (no value) messages: + msg187673 stage: resolved -> (no value)
2013-04-23 21:46:42	ncoghlan	set	messages: + msg187670
2013-04-23 19:26:47	gregory.p.smith	set	status: open -> closed priority: high -> normal nosy: + gregory.p.smith messages: + msg187668 resolution: wont fix stage: resolved
2013-04-23 17:38:31	georg.brandl	set	messages: + msg187660
2013-04-23 15:42:31	gvanrossum	set	messages: + msg187653
2013-04-23 15:02:13	r.david.murray	set	messages: + msg187652
2013-04-23 14:55:55	ncoghlan	set	messages: + msg187651
2013-04-23 14:41:42	r.david.murray	set	messages: + msg187649
2013-04-23 13:46:22	ncoghlan	set	messages: + msg187644
2013-04-23 12:59:21	ezio.melotti	set	messages: + msg187638
2013-04-23 12:54:04	flox	set	messages: + msg187636
2013-04-23 12:42:27	r.david.murray	set	nosy: + r.david.murray messages: + msg187634
2013-04-23 12:15:06	ezio.melotti	set	messages: + msg187631
2013-04-23 12:05:43	flox	set	messages: + msg187630
2013-04-22 18:39:06	pconnell	set	nosy: + pconnell
2013-04-01 18:06:30	flox	set	nosy: + flox
2013-04-01 18:06:19	flox	set	nosy: - flox
2012-09-12 19:11:51	uzume	set	nosy: - uzume
2012-09-12 19:09:57	uzume	set	nosy: + uzume messages: + msg170414
2012-08-25 07:52:33	ncoghlan	set	priority: release blocker -> high
2012-07-14 10:51:15	ezio.melotti	set	nosy: + ezio.melotti
2012-07-14 07:36:42	ncoghlan	set	messages: + msg165435
2012-06-28 10:41:30	ncoghlan	set	priority: normal -> release blocker messages: + msg164237 stage: commit review -> (no value)
2012-06-28 07:26:31	ncoghlan	set	messages: + msg164226
2012-06-28 07:13:02	ncoghlan	set	messages: + msg164224 versions: + Python 3.4, - Python 3.3
2012-02-19 04:16:27	jcea	set	nosy: + jcea
2012-02-14 03:25:58	ncoghlan	set	messages: + msg153317
2012-02-13 21:17:55	vstinner	set	messages: + msg153304
2012-02-13 21:11:44	barry	set	nosy: + barry
2011-12-14 10:51:53	petri.lehtinen	set	nosy: + petri.lehtinen messages: + msg149439
2011-12-14 10:48:42	petri.lehtinen	link	issue13600 superseder
2011-10-20 01:53:08	ncoghlan	set	messages: + msg145998
2011-10-19 23:10:52	vstinner	set	messages: + msg145991
2011-10-19 22:54:41	ncoghlan	set	messages: + msg145986
2011-10-19 22:34:48	vstinner	set	messages: + msg145982
2011-10-19 22:12:37	ncoghlan	set	messages: + msg145980
2011-10-19 22:09:43	ncoghlan	set	messages: + msg145979
2011-10-19 11:58:38	ncoghlan	set	messages: + msg145900
2011-10-19 11:35:38	ncoghlan	set	assignee: lemburg -> (no value) messages: + msg145897 nosy: + ncoghlan
2011-10-17 13:38:20	eric.araujo	set	messages: + msg145693
2011-10-17 00:53:29	vstinner	set	messages: + msg145656
2011-10-09 09:18:13	eric.araujo	set	messages: + msg145246 components: - Documentation, 2to3 (2.x to 3.x conversion tool)
2011-09-22 15:36:27	cben	set	nosy: + cben
2011-07-19 13:13:46	eric.araujo	set	versions: + Python 3.3, - Python 3.2
2011-01-02 19:01:49	vstinner	set	nosy: lemburg, gvanrossum, loewis, georg.brandl, belopolsky, vstinner, benjamin.peterson, eric.araujo, ssbarnea, flox messages: + msg125073
2010-12-30 01:53:47	belopolsky	link	issue3232 dependencies
2010-12-09 18:43:33	belopolsky	set	status: closed -> open type: enhancement components: + Unicode nosy: + gvanrossum messages: + msg123693 resolution: fixed -> (no value) stage: commit review
2010-12-06 11:49:37	lemburg	set	messages: + msg123462
2010-12-05 19:12:13	georg.brandl	set	assignee: lemburg messages: + msg123436
2010-12-05 19:04:43	loewis	set	messages: + msg123435
2010-12-03 08:46:28	lemburg	set	messages: + msg123206
2010-12-03 01:40:10	belopolsky	set	nosy: + belopolsky messages: + msg123154
2010-12-02 18:08:08	georg.brandl	set	status: open -> closed resolution: fixed messages: + msg123090
2010-07-31 17:44:57	flox	link	issue3532 superseder
2010-07-10 18:35:06	loewis	set	messages: + msg109905
2010-07-10 18:14:32	eric.araujo	set	messages: + msg109904
2010-07-10 17:07:40	lemburg	set	versions: - Python 3.1, Python 2.7
2010-07-10 17:06:57	lemburg	set	messages: + msg109894
2010-07-10 15:36:30	georg.brandl	set	messages: + msg109879
2010-07-10 15:36:19	georg.brandl	set	messages: - msg109878
2010-07-10 15:36:07	georg.brandl	set	messages: + msg109878
2010-07-10 15:24:32	lemburg	set	messages: + msg109876
2010-07-10 14:24:54	loewis	set	messages: + msg109872
2010-06-14 15:35:05	ssbarnea	set	nosy: + ssbarnea messages: + msg107794 title: codecs missing: base64 bz2 hex zlib ... -> codecs missing: base64 bz2 hex zlib hex_codec ...
2010-06-04 14:12:06	eric.araujo	set	messages: + msg107057
2010-05-28 14:25:47	eric.araujo	set	nosy: + eric.araujo
2010-05-28 14:17:52	lemburg	set	messages: + msg106674
2010-05-28 13:48:54	vstinner	set	messages: + msg106670
2010-05-28 13:45:56	vstinner	set	messages: + msg106669
2010-05-28 13:18:57	vstinner	set	nosy: + vstinner
2010-05-20 20:33:01	skip.montanaro	set	nosy: - skip.montanaro
2009-12-19 18:09:41	georg.brandl	set	assignee: georg.brandl -> (no value)
2009-12-19 18:09:28	georg.brandl	set	messages: + msg96632
2009-12-14 10:30:11	lemburg	set	messages: + msg96374
2009-12-12 19:25:17	loewis	set	messages: + msg96301
2009-12-12 15:44:22	flox	set	messages: + msg96296
2009-12-12 15:40:27	flox	set	messages: + msg96295
2009-12-11 23:09:08	loewis	set	messages: + msg96277
2009-12-11 17:05:47	flox	set	files: + issue7475_missing_codecs_py3k.diff messages: + msg96265
2009-12-11 13:13:50	lemburg	set	messages: + msg96253
2009-12-11 12:54:39	benjamin.peterson	set	messages: + msg96251
2009-12-11 10:22:23	flox	set	nosy: lemburg, loewis, skip.montanaro, georg.brandl, benjamin.peterson, flox messages: + msg96243 components: + Library (Lib)
2009-12-11 09:56:57	lemburg	set	messages: + msg96242
2009-12-11 09:47:23	lemburg	set	resolution: not a bug -> (no value)
2009-12-11 09:46:55	lemburg	set	messages: + msg96240 title: No hint about codecs removed: base64 bz2 hex zlib ... -> codecs missing: base64 bz2 hex zlib ...
2009-12-11 09:26:31	flox	set	files: + issue7475_warning.diff keywords: + patch
2009-12-11 08:33:13	flox	set	title: No hint about codecs removed : base64 bz2 hex zlib ... -> No hint about codecs removed: base64 bz2 hex zlib ...
2009-12-11 08:31:52	flox	set	versions: + Python 2.7
2009-12-11 08:31:17	flox	set	messages: + msg96237
2009-12-11 08:21:39	flox	set	status: closed -> open assignee: georg.brandl components: + Documentation, 2to3 (2.x to 3.x conversion tool), - Library (Lib) title: codecs missing: base64 bz2 hex zlib ... -> No hint about codecs removed : base64 bz2 hex zlib ... nosy: + georg.brandl messages: + msg96236
2009-12-11 02:09:19	benjamin.peterson	set	status: open -> closed nosy: + benjamin.peterson messages: + msg96232
2009-12-10 23:28:52	loewis	set	messages: + msg96228
2009-12-10 23:26:10	lemburg	set	status: closed -> open messages: + msg96227
2009-12-10 23:25:03	lemburg	set	nosy: + lemburg messages: + msg96226
2009-12-10 23:15:12	loewis	set	status: open -> closed nosy: + loewis messages: + msg96223 resolution: not a bug
2009-12-10 22:52:04	skip.montanaro	set	nosy: + skip.montanaro
2009-12-10 22:27:38	flox	create