Issue7475
Created on 2009-12-10 22:27 by flox, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files

File name | Uploaded | Description
---|---|---
issue7475_warning.diff | flox, 2009-12-11 09:26 | Patch for documentation and warnings in 2.7
issue7475_missing_codecs_py3k.diff | flox, 2009-12-11 17:05 | Patch, apply to trunk
issue7475_restore_codec_aliases_in_py34.diff | ncoghlan, 2013-11-17 07:41 | Patch to restore the transform aliases.
Messages (95) | |||
---|---|---|---|
msg96218 - (view) | Author: Florent Xicluna (flox) * ![]() |
Date: 2009-12-10 22:27 | |
AFAIK these codecs were not ported to Python 3. 1. I found no hint in the documentation on this matter. 2. Is it possible to contribute some of them, or is there a good reason to look elsewhere? |
|||
msg96223 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2009-12-10 23:15 | |
These are not encodings, in that they don't convert characters to bytes. It was a mistake that they were integrated into the codecs interfaces in Python 2.x; this mistake is corrected in 3.x. |
|||
msg96226 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2009-12-10 23:25 | |
Martin v. Löwis wrote: > > Martin v. Löwis <martin@v.loewis.de> added the comment: > > These are not encodings, in that they don't convert characters to bytes. > It was a mistake that they were integrated into the codecs interfaces in > Python 2.x; this mistake is corrected in 3.x. Martin, I beg your pardon, but these codecs indeed implement valid encodings and the fact that these codecs were removed was a mistake. They should be readded to Python 3.x. Note that just because a codec doesn't convert between bytes and characters only, doesn't make it wrong in any way. The codec architecture in Python is designed to support same type encodings just as well as ones between bytes and characters. |
|||
msg96227 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2009-12-10 23:26 | |
Reopening the ticket. |
|||
msg96228 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2009-12-10 23:28 | |
It's not possible to add these codecs back. Bytes objects (correctly) don't have an encode method, and string objects (correctly) don't have a decode method. The codec architecture of Python 3.x just doesn't support this kind of application; the codec architecture of 2.x was flawed. |
|||
msg96232 - (view) | Author: Benjamin Peterson (benjamin.peterson) * ![]() |
Date: 2009-12-11 02:09 | |
I agree with Martin. gzip and bz2 convert bytes to bytes. Encodings deal strictly with unicode -> bytes. |
|||
msg96236 - (view) | Author: Florent Xicluna (flox) * ![]() |
Date: 2009-12-11 08:21 | |
«Everything you thought you knew about binary data and Unicode has changed.» Reopening for the documentation part. This "mistake" deserves some words in the documentation: docs.python.org/dev/py3k/whatsnew/3.0.html #text-vs-data-instead-of-unicode-vs-8-bit And the conversion may be automated with 2to3, maybe. |
|||
msg96237 - (view) | Author: Florent Xicluna (flox) * ![]() |
Date: 2009-12-11 08:31 | |
Is it possible to add "DeprecationWarning" for these codecs when using "python -3"?
>>> {}.has_key('a')
__main__:1: DeprecationWarning: dict.has_key() not supported in 3.x; use the in operator
False
>>> print `123`
<stdin>:1: SyntaxWarning: backquote not supported in 3.x; use repr()
123
>>> 'abc'.encode('base64')
'YWJj\n' |
|||
msg96240 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2009-12-11 09:46 | |
Martin v. Löwis wrote: > > Martin v. Löwis <martin@v.loewis.de> added the comment: > > It's not possible to add these codecs back. Bytes objects (correctly) > don't have an encode method, and string objects (correctly) don't have a > decode method. The codec architecture of Python 3.x just doesn't support > this kind of application; the codec architecture of 2.x was flawed. Of course it does support these kinds of codecs. The codec architecture hasn't changed between 2.x and 3.x, just the way a few methods work. All we agreed to is that unicode.encode() will only return bytes, while bytes.decode() will only return unicode. So the methods won't support same type conversions, because Guido didn't want to have methods that return different types based on the chosen parameter (the codec name in this case). However, you can still use codecs.encode() and codecs.decode() to work with codecs that return different combinations of types. I explicitly added that support back to 3.0. You can't argue that just because two methods don't support a certain type combination, the whole architecture doesn't support this anymore. Also note that codecs allow a much more far-reaching use than just through the unicode and bytes methods: you can use them as seamless wrappers for streams, subclass from them, use their methods directly, etc. etc. So your argument that just because the two methods don't support these codecs anymore is just not good enough to warrant their removal. |
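The distinction drawn above between the helper methods and the codec machinery can be sketched as follows. This is a minimal illustration, assuming Python 3.4 or later (where the `base64` and `rot13` aliases are registered again); the variable names are only for illustration.

```python
import codecs

# Same-type codecs are reachable through the module-level functions.
encoded = codecs.encode(b"abc", "base64")    # bytes -> bytes
decoded = codecs.decode(encoded, "base64")   # bytes -> bytes
rotated = codecs.encode("abc", "rot13")      # str -> str

# The str.encode() helper only accepts text encodings and rejects rot13.
try:
    "abc".encode("rot13")
except LookupError as exc:
    method_error = exc
else:
    method_error = None

print(encoded, decoded, rotated, type(method_error).__name__)
```

The helper method fails with a `LookupError` while `codecs.encode()` succeeds, which is the split between "constrained helpers" and "unconstrained registry" that the message argues for.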
|||
msg96242 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2009-12-11 09:56 | |
Benjamin Peterson wrote: > > Benjamin Peterson <benjamin@python.org> added the comment: > > I agree with Martin. gzip and bz2 convert bytes to bytes. Encodings deal > strictly with unicode -> bytes. Sorry, Benjamin, but that's simply not true. Codecs can work with arbitrary types, it's just that the helper methods on unicode and bytes objects only support one combination of types in Python 3.x. codecs.encode()/.decode() provide access to all codecs, regardless of their supported type combinations and of course, you can use them directly via the codec registry, subclass from them, etc. |
|||
msg96243 - (view) | Author: Florent Xicluna (flox) * ![]() |
Date: 2009-12-11 10:22 | |
Thinking about it, I am +1 to reimplement the codecs. We could implement new methods to replace the old one. (similar to base64.encodebytes and base64.decodebytes) >>> b'abc'.encodebytes('base64') b'YWJj\n' >>> b'abc'.encodebytes('zlib').encodebytes('base64') b'eJxLTEoGAAJNASc=\n' >>> b'UHl0aG9u'.decodebytes('base64').decode('utf-8') 'Python' |
|||
msg96251 - (view) | Author: Benjamin Peterson (benjamin.peterson) * ![]() |
Date: 2009-12-11 12:54 | |
2009/12/11 Marc-Andre Lemburg <report@bugs.python.org>: > codecs.encode()/.decode() provide access to all codecs, regardless > of their supported type combinations and of course, you can use > them directly via the codec registry, subclass from them, etc. Didn't you have a proposal for bytes.transform/untransform for operations like this? |
|||
msg96253 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2009-12-11 13:13 | |
Benjamin Peterson wrote: > > Benjamin Peterson <benjamin@python.org> added the comment: > > 2009/12/11 Marc-Andre Lemburg <report@bugs.python.org>: >> codecs.encode()/.decode() provide access to all codecs, regardless >> of their supported type combinations and of course, you can use >> them directly via the codec registry, subclass from them, etc. > > Didn't you have a proposal for bytes.transform/untransform for > operations like this? Yes. At the time it was postponed, since I brought it up late in the 3.0 release process. Perhaps I should bring it up again. Note that those methods are just convenient helpers to access the codecs and as such only provide limited functionality. The full machinery itself is accessible via the codecs module and the code in the encodings package. Any decision to include a codec or not needs to be based on whether it fits the framework in those modules/packages, not the functionality we expose on unicode and bytes objects. |
|||
msg96265 - (view) | Author: Florent Xicluna (flox) * ![]() |
Date: 2009-12-11 17:05 | |
I've ported the codecs from Py2: base64, bytes_escape, bz2, hex, quopri, rot13, uu and zlib It's not a big deal. Basically: - StringIO.StringIO --> io.BytesIO - 'string_escape' --> 'bytes_escape' Will add documentation if we agree on the feature. |
|||
msg96277 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2009-12-11 23:09 | |
> codecs.encode()/.decode() provide access to all codecs, regardless > of their supported type combinations and of course, you can use > them directly via the codec registry, subclass from them, etc. I presume that the OP didn't talk about codecs.encode, but about the methods on string objects. flox, can you clarify what precisely it is that you miss? |
|||
msg96295 - (view) | Author: Florent Xicluna (flox) * ![]() |
Date: 2009-12-12 15:40 | |
Martin, actually, I was trying to convert some piece of code from python2 to python3. And this statement was not converted by 2to3: "x.decode('base64').decode('zlib')" So, I read the official documentation, and found no hint about the removal of these codecs. For my specific use case, I can use "zlib.decompress" and "base64.decodebytes", but I find that the ".encode()" and ".decode()" helpers were useful in Python 2. I don't know all the background of the removal of these codecs. But I try to contribute to Python, and help Python 3 become at least as featureful, and useful, as Python 2. So, after reading the above comments, I think we may end up with following changes: * restore the "bytes-to-bytes" codecs in the "encodings" package * then create new helpers on bytes objects (either ".transform()/.untransform()" or ".encodebytes()/.decodebytes") |
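For the specific use case mentioned above, the Python 2 call `x.decode('base64').decode('zlib')` can be ported with the module-level functions the message names. A minimal sketch; `pack` and `unpack` are hypothetical helper names, not part of any library.

```python
import base64
import zlib


def pack(data: bytes) -> bytes:
    """Compress with zlib, then base64-encode (Python 2: .encode('zlib').encode('base64'))."""
    return base64.encodebytes(zlib.compress(data))


def unpack(blob: bytes) -> bytes:
    """Reverse of pack() (Python 2: .decode('base64').decode('zlib'))."""
    return zlib.decompress(base64.decodebytes(blob))


payload = b"Python " * 4
restored = unpack(pack(payload))
print(restored == payload)
```

Both helpers work on bytes only, so text has to be encoded (for example with UTF-8) before packing.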
|||
msg96296 - (view) | Author: Florent Xicluna (flox) * ![]() |
Date: 2009-12-12 15:44 | |
> And this statement was not converted s/this statement/this method call/ |
|||
msg96301 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2009-12-12 19:25 | |
> So, after reading the above comments, I think we may end up with > following changes: > * restore the "bytes-to-bytes" codecs in the "encodings" package > * then create new helpers on bytes objects (either > ".transform()/.untransform()" or ".encodebytes()/.decodebytes") I would still be opposed to such a change, and I think it needs a PEP. If the codecs are restored, one half of them becomes available to .encode/.decode methods, since the codec registry cannot tell which ones implement real character encodings, and which ones are other conversion methods. So adding them would be really confusing. I also wonder why you are opposed to the import statement. My recommendation is indeed that you use the official API for these libraries (and indeed, there is an official API for each of them, unlike real codecs, which don't have any other documented API). |
|||
msg96374 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2009-12-14 10:30 | |
Martin v. Löwis wrote: > > Martin v. Löwis <martin@v.loewis.de> added the comment: > >> So, after reading the above comments, I think we may end up with >> following changes: >> * restore the "bytes-to-bytes" codecs in the "encodings" package +1 >> * then create new helpers on bytes objects (either >> ".transform()/.untransform()" or ".encodebytes()/.decodebytes") +1 - the names are still up for debate, IIRC. > I would still be opposed to such a change, and I think it needs a PEP. All this has already been discussed and the only reason it didn't go in earlier was timing. No need for a PEP. > If the codecs are restored, one half of them becomes available to > .encode/.decode methods, since the codec registry cannot tell which > ones implement real character encodings, and which ones are other > conversion methods. So adding them would be really confusing. Not at all. The helper methods check the return types and raise an exception if the types don't match the expected types. The codecs registry itself doesn't need to know about the possible input/output types of codecs, since this information is not required to match a name to an implementation. What we could do, is add that information to the CodecInfo object used for registering the codec. codecs.lookup() would then return the information to the application. E.g. .encode_input_types = (str,) .encode_output_types = (bytes,) .decode_input_types = (bytes,) .decode_output_types = (str,) Codecs not supporting these CodecInfo attributes would simply return None. > I also wonder why you are opposed to the import statement. My > recommendation is indeed that you use the official API for these > libraries (and indeed, there is an official API for each of them, > unlike real codecs, which don't have any other documented API). That's not the point. The codec API provides a standardized API for all these encodings. The hex, zlib, bz2, etc. codecs are just adapters of the different pre-existing APIs to the codec API. |
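The return-type check described above (helper methods verify what the codec gave back, the registry does not) might look roughly like this in pure Python. It is only a sketch: `strict_encode` is a hypothetical name, and the real check lives in the C implementation of `str.encode()`.

```python
import codecs


def strict_encode(text, encoding):
    """Mimic a str.encode()-style helper: any codec may run, but the result must be bytes."""
    result = codecs.encode(text, encoding)
    if not isinstance(result, bytes):
        raise TypeError(
            "encoder %r returned %s, expected bytes"
            % (encoding, type(result).__name__)
        )
    return result


print(strict_encode("abc", "utf-8"))
try:
    strict_encode("abc", "rot_13")   # rot_13 maps str -> str, so the check fails
except TypeError as exc:
    print(exc)
```

The registry lookup itself needs no type information here; the check happens only after the codec has run.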
|||
msg96632 - (view) | Author: Georg Brandl (georg.brandl) * ![]() |
Date: 2009-12-19 18:09 | |
I also seem to recall that adding .transform()/.untransform() was already accepted at some point. |
|||
msg106669 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2010-05-28 13:45 | |
I agree with Martin: codecs chose the wrong direction in Python2, and it's fixed in Python3. The codecs module is related to charsets (encodings), should encode str to bytes, and should decode bytes (or any read buffer) to str. E.g. rot13 "encodes" str to str. "base64 bz2 hex zlib ...": use the base64, bz2, binascii and zlib modules for that. The documentation should be fixed (explain how to port code from Python2 to Python3). It may be possible to write some 2to3 fixers for the following examples:
"...".encode("base64") => base64.b64encode("...")
"...".encode("rot13") => do nothing (but display a warning?)
"...".encode("zlib") => zlib.compress("...")
"...".encode("hex") => base64.b16encode("...")
"...".encode("bz2") => bz2.compress("...")
"...".decode("base64") => base64.b64decode("...")
"...".decode("rot13") => do nothing (but display a warning?)
"...".decode("zlib") => zlib.decompress("...")
"...".decode("hex") => base64.b16decode("...")
"...".decode("bz2") => bz2.decompress("...") |
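A short sketch of the module-level replacements listed above, for the base64, zlib and bz2 cases. One caveat worth hedging: the Python 2 `base64` codec appended a newline (see the `'YWJj\n'` output earlier in this thread), whereas `base64.b64encode()` does not, so `base64.encodebytes()` is the closer match when the exact output matters.

```python
import base64
import bz2
import zlib

data = b"Python"

# base64: b64encode() omits the trailing newline, encodebytes() keeps it.
b64_plain = base64.b64encode(data)
b64_lines = base64.encodebytes(data)

# Compression codecs map directly onto compress()/decompress().
zipped = zlib.compress(data)
bzipped = bz2.compress(data)

print(b64_plain, b64_lines)
print(zlib.decompress(zipped), bz2.decompress(bzipped))
```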
|||
msg106670 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2010-05-28 13:48 | |
Explanation of the change in Python3 by Guido: "We are adopting a slightly different approach to codecs: while in Python 2, codecs can accept either Unicode or 8-bits as input and produce either as output, in Py3k, encoding is always a translation from a Unicode (text) string to an array of bytes, and decoding always goes the opposite direction. This means that we had to drop a few codecs that don't fit in this model, for example rot13, base64 and bz2 (those conversions are still supported, just not through the encode/decode API)." http://www.artima.com/weblogs/viewpost.jsp?thread=208549 -- See also issue #8838. |
|||
msg106674 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2010-05-28 14:17 | |
STINNER Victor wrote: > > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > > I agree with Martin: codecs choosed the wrong direction in Python2, and it's fixed in Python3. The codecs module is related to charsets (encodings), should encode str to bytes, and should decode bytes (or any read buffer) to str. No, that's just not right: the codec system in Python does not mandate the types used or accepted by the codecs. The only change that was applied in Python3 was to make sure that the str.encode() and bytes.decode() methods always return the same type to assure type-safety. Python2 does not apply that check, but instead provides a direct interface to codecs.encode() and codecs.decode(). Please don't mix the helper methods on those objects with what the codec system was designed for. The helper methods apply a strategy that's more constrained than the codec system. The addition of .transform() and .untransform() for same type conversions was discussed in 2008, but didn't make it into 3.0 since I hadn't had time to add the methods: http://mail.python.org/pipermail/python-3000/2008-August/014533.html http://mail.python.org/pipermail/python-3000/2008-August/014534.html The removed codecs don't rely on the helper methods in any way. They are easily usable via codecs.encode() and codecs.decode() even without .transform() and .untransform(). Esp. the hex codec is very handy and at least in our eGenix code base in wide-spread use. Using a single well-defined interface to such encodings is just much more user friendly than having to research the different APIs for each of them. |
|||
msg107057 - (view) | Author: Éric Araujo (eric.araujo) * ![]() |
Date: 2010-06-04 14:12 | |
Related: bytes vs. str for base64 encoding in email, #8896 |
|||
msg107794 - (view) | Author: Sorin Sbarnea (ssbarnea) * | Date: 2010-06-14 15:35 | |
I would like to know what happened with hex_codec and what is the new py3 for this. Also, it would be really helpful to see DeprecationWarnings for all these codecs in py2x and include a note in py3 changelist. The official python documentation from http://docs.python.org/library/codecs.html lists them as valid without any signs of them as being dropped or replaced. |
|||
msg109872 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2010-07-10 14:24 | |
> I would like to know what happened with hex_codec and what is the new py3 for this. If you had read this bug report, you'd know that the codec was removed in Python 3. Use binascii.hexlify/binascii.unhexlify instead (as you should in 2.x, also). |
|||
msg109876 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2010-07-10 15:24 | |
Martin v. Löwis wrote: > > Martin v. Löwis <martin@v.loewis.de> added the comment: > >> I would like to know what happened with hex_codec and what is the new py3 for this. > > If you had read this bug report, you'd know that the codec was removed > in Python 3. Use binascii.hexlify/binascii.unhexlify instead (as you > should in 2.x, also). ... or wait for Python 3.2 which will readd them :-) |
|||
msg109879 - (view) | Author: Georg Brandl (georg.brandl) * ![]() |
Date: 2010-07-10 15:36 | |
... but don't wait too long to add them! |
|||
msg109894 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2010-07-10 17:06 | |
Georg Brandl wrote: > > Georg Brandl <georg@python.org> added the comment: > > ... but don't wait to long to add them! I plan to work on that after EuroPython. Florent already provided the patch for the codecs, so what's left is adding the .transform()/ .untransform() methods, and perhaps tweak the codec input/output types in a couple of cases. |
|||
msg109904 - (view) | Author: Éric Araujo (eric.araujo) * ![]() |
Date: 2010-07-10 18:14 | |
I am confused by MvL’s reply. From the first paragraph of the documentation for binascii: “Normally, you will not use these functions directly but use wrapper modules like uu, base64, or binhex instead. The binascii module contains low-level functions written in C for greater speed that are used by the higher-level modules.” Is the doc not accurate? Also, can someone who is sure about the status of this report edit the type, stage, component and resolution? It would be helpful. |
|||
msg109905 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2010-07-10 18:35 | |
> I am confused by MvL’s reply. From the first paragraph documentation > for binascii: “Normally, you will not use these functions directly > but use wrapper modules like uu, base64, or binhex instead. The > binascii module contains low-level functions written in C for greater > speed that are used by the higher-level modules.” > > Is the doc not accurate? It is correct. So use base64.b16encode/b16decode then. It's just that I personally prefer hexlify/unhexlify, because I can memorize the function name better. |
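The two hex APIs mentioned above differ in letter case, which matters when porting code that relied on the lowercase output of the Python 2 `hex` codec. A minimal comparison:

```python
import base64
import binascii

raw = b"\xde\xad\xbe\xef"

lower = binascii.hexlify(raw)    # lowercase digits
upper = base64.b16encode(raw)    # uppercase digits

# unhexlify() accepts either case; b16decode() needs casefold=True for lowercase.
from_upper = binascii.unhexlify(upper)
from_lower = base64.b16decode(lower, casefold=True)

try:
    base64.b16decode(lower)
except binascii.Error as exc:
    strict_error = exc
else:
    strict_error = None

print(lower, upper, from_upper == raw, from_lower == raw)
```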
|||
msg123090 - (view) | Author: Georg Brandl (georg.brandl) * ![]() |
Date: 2010-12-02 18:08 | |
Codecs brought back and (un)transform implemented in r86934. |
|||
msg123154 - (view) | Author: Alexander Belopolsky (belopolsky) * ![]() |
Date: 2010-12-03 01:40 | |
I am probably a bit late to this discussion, but why should these things be called "codecs" and why should they share the registry with the encodings? It looks like the proper term would be "transformations" or "transforms". |
|||
msg123206 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2010-12-03 08:46 | |
Alexander Belopolsky wrote: > > Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment: > > I am probably a bit late to this discussion, but why these things should be called "codecs" and why should they share the registry with the encodings? It looks like the proper term would be "transformations" or "transforms". .transform() is just the name of the method. The codecs are still just that: codecs, i.e. objects that encode and decode data. The types they support are defined by the codecs, not by the helper methods. In Python3, the str and bytes methods .encode() and .decode() will only support str->bytes->str conversions. The new str and bytes .transform() method adds back str->str and bytes->bytes. The codec subsystem does not impose restrictions on the type combinations a codec can support, and that's per design. |
|||
msg123435 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2010-12-05 19:04 | |
As per http://mail.python.org/pipermail/python-dev/2010-December/106374.html I think this checkin should be reverted, as it's breaking the language moratorium. |
|||
msg123436 - (view) | Author: Georg Brandl (georg.brandl) * ![]() |
Date: 2010-12-05 19:12 | |
I leave this to MAL, on whose behalf I finished this to be in time for beta. |
|||
msg123462 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2010-12-06 11:49 | |
Martin v. Löwis wrote: > > Martin v. Löwis <martin@v.loewis.de> added the comment: > > As per > > http://mail.python.org/pipermail/python-dev/2010-December/106374.html > > I think this checkin should be reverted, as it's breaking the language moratorium. I've asked Guido. We may have to revert the addition of the new methods and then readd them for 3.3, but I don't really see them as difficult to implement for the other Python implementations, since they are just interfaces to the codec sub-system. The readdition of the codecs and changes to support them in the codec system do not fall under the moratorium, since they are stdlib changes. |
|||
msg123693 - (view) | Author: Alexander Belopolsky (belopolsky) * ![]() |
Date: 2010-12-09 18:43 | |
With Georg's approval, I am reopening this issue until a decision is made on whether {str,bytes,bytearray}.{transform,untransform} methods should go into 3.2. I am adding Guido to "nosy" because the decision may turn on the interpretation of his post. [1] I also started a python-dev thread on this issue. [2] [1] http://mail.python.org/pipermail/python-dev/2010-December/106374.html [2] http://mail.python.org/pipermail/python-dev/2010-December/106617.html |
|||
msg125073 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-01-02 19:01 | |
See issue #10807: 'base64' can be used with bytes.decode() (and str.encode()), but it raises a confusing exception (TypeError: expected bytes, not memoryview). |
|||
msg145246 - (view) | Author: Éric Araujo (eric.araujo) * ![]() |
Date: 2011-10-09 09:18 | |
So. This was reverted before 3.2 was out, right? What is the status for 3.3? |
|||
msg145656 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-10-17 00:53 | |
What is the status of this issue? rot13 codecs & friends were added back to Python 3.2 with {bytes,str}.(un)transform() methods: commit 7e4833764c88. Codecs were disabled because of surprising error messages before the release of Python 3.2 final: issue #10807, commit ff1261a14573. transform() and untransform() methods were also removed, I don't remember why/how exactly, maybe because new codecs were disabled. So we have rot13 & friends in Python 3.2 and 3.3, but they cannot be used with the regular str.encode('rot13'), you have to write (for example):
>>> codecs.getdecoder('rot_13')('rot13')
('ebg13', 5)
>>> codecs.getencoder('rot_13')('ebg13')
('rot13', 5)
The major issue with {bytes,str}.(un)transform() is that we have only one registry for all codecs, and the registry was changed in Python 3 to ensure:
* encode: str->bytes
* decode: bytes->str
To implement str.transform(), we need another registry. Marc-Andre suggested (msg96374) to add tags to codecs:
"""
.encode_input_types = (str,)
.encode_output_types = (bytes,)
.decode_input_types = (bytes,)
.decode_output_types = (str,)
"""
I'm still opposed to str->str (rot13) and bytes->bytes (hex, gzip, ...) operations using the codecs API. Developers have to use the right module. If the API of these modules is too complex, we should add helpers to these modules, but not to builtin types. Builtin types have to be and stay simple and well defined. |
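The workaround quoted above can be written out as a small script. A minimal sketch using only `codecs.getencoder()` and `codecs.getdecoder()`, each of which returns a function yielding an `(output, length_consumed)` tuple:

```python
import codecs

rot13_encode = codecs.getencoder("rot_13")
rot13_decode = codecs.getdecoder("rot_13")

# Each call returns (result, number_of_input_items_consumed).
encoded, consumed = rot13_encode("rot13")
decoded, _ = rot13_decode(encoded)

print(encoded, consumed, decoded)
```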
|||
msg145693 - (view) | Author: Éric Araujo (eric.araujo) * ![]() |
Date: 2011-10-17 13:38 | |
> transform() and untransform() methods were also removed, I don't remember why/how exactly, I don’t remember either; maybe it was too late in the release process, or we lacked enough consensus. > So we have rot13 & friends in Python 3.2 and 3.3, but they cannot be used with the regular > str.encode('rot13'), you have to write (for example): codecs.getdecoder('rot_13') Ah, great, I thought they were not available at all! > The major issue with {bytes,str}.(un)transform() is that we have only one registry for all > codecs, and the registry was changed in Python 3 [...] To implement str.transform(), we need > another register. Marc-Andre suggested (msg96374) to add tags to codecs I’m confused: does the tags idea replace the idea of adding another registry? > I'm still opposed to str->str (rot13) and bytes->bytes (hex, gzip, ...) operations using the > codecs API. Developers have to use the right module. Well, here I disagree with you and agree with MAL: str.encode and bytes.decode are strict, but the codec API in general is not restricted to str→bytes and bytes→str directions. Using the zlib or base64 modules vs. the codecs is a matter of style; sometimes you think it looks hacky, sometimes you think it’s very handy. And rot13 only exists as a codec! |
|||
msg145897 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2011-10-19 11:35 | |
They were removed because adding new methods to builtin types violated the language moratorium. Now that the language moratorium is over, the transform/untransform convenience APIs should be added again for 3.3. It's an approved change, the original timing was just wrong. |
|||
msg145900 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2011-10-19 11:58 | |
Sorry, I meant to state my rationale for the unassignment - I'm assuming this issue is covered by MAL's recent decision to step away from Unicode and codec maintenance issues. If that's incorrect, MAL can reclaim the issue, otherwise unassigning leaves it open for whoever wants to move it forward. |
|||
msg145979 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2011-10-19 22:09 | |
Some further comments after getting back up to speed with the actual status of this problem (i.e. that we had issues with the error checking and reporting in the original 3.2 commit).
1. I agree with the position that the codecs module itself is intended to be a type neutral codec registry. It encodes and decodes things, but shouldn't actually care about the types involved. If that is currently not the case in 3.x, it needs to be fixed. This type neutrality was blurred in 2.x by the fact that it only implemented str->str translations, and even further obscured by the coupling to the .encode() and .decode() convenience APIs. The fact that the type neutrality of the registry itself is currently broken in 3.x is a *regression*, not an improvement. (The convenience APIs, on the other hand, are definitely *not* type neutral, and aren't intended to be)
2. To assist in producing nice error messages, and to allow restrictions to be enforced on type-specific convenience APIs, the CodecInfo objects should grow additional state as MAL suggests. To avoid redundancy (and inaccurate overspecification), my suggested colour for that particular bikeshed is:
Character encoding codec:
.decoded_format = 'text'
.encoded_format = 'binary'
Binary transform codec:
.decoded_format = 'binary'
.encoded_format = 'binary'
Text transform codec:
.decoded_format = 'text'
.encoded_format = 'text'
I suggest using the fuzzy format labels mainly due to the existence of the buffer API - most codec operations that consume binary data will accept anything that implements the buffer API, so referring specifically to 'bytes' in error messages would be inaccurate.
The convenience APIs can then emit errors like:
'a'.encode('rot_13') ==> CodecLookupError: text <-> binary codec expected ('rot_13' is text <-> text)
'a'.decode('rot_13') ==> CodecLookupError: text <-> binary codec expected ('rot_13' is text <-> text)
'a'.transform('bz2') ==> CodecLookupError: text <-> text codec expected ('bz2' is binary <-> binary)
'a'.transform('ascii') ==> CodecLookupError: text <-> text codec expected ('ascii' is text <-> binary)
b'a'.transform('ascii') ==> CodecLookupError: binary <-> binary codec expected ('ascii' is text <-> binary)
For backwards compatibility with 3.2, codecs that do not specify their formats should be treated as character encoding codecs (i.e. decoded format is 'text', encoded format is 'binary') |
|||
msg145980 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2011-10-19 22:12 | |
Oops, typo in my second error example. The command should be: b'a'.decode('rot_13') (Since str objects don't offer a decode() method any more) |
|||
msg145982 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-10-19 22:34 | |
> *.encode('rot_13') ==> CodecLookupError
I like the idea of raising a lookup error on .encode/.decode if the codec is not a classic text codec (like ASCII or UTF-8).
> *.transform('ascii') ==> CodecLookupError
Same comment.
> str.transform('bz2') ==> CodecLookupError
A lookup error is surprising here. It may be a TypeError instead. The bz2 codec can be used with .transform, but not on str. So:
- Lookup error if the codec cannot be used with encode/decode or transform/untransform
- Type error if the value type is invalid
(CodecLookupError doesn't exist, you propose to define a new exception that inherits from LookupError?) |
|||
msg145986 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2011-10-19 22:54 | |
On Thu, Oct 20, 2011 at 8:34 AM, STINNER Victor <report@bugs.python.org> wrote:
>> str.transform('bz2') ==> CodecLookupError
>
> A lookup error is surprising here. It may be a TypeError instead. The bz2 can be used with .transform, but not on str. So:
No, it's the same concept as the other cases - we found a codec with the requested name, but it's not the kind of codec we wanted in the current context (i.e. str.transform). It may be that the problem is the user has a str when they expected to have a bytearray or a bytes object, but there's no way for the codec lookup process to know that.
> - Lookup error if the codec cannot be used with encode/decode or transform/untransform
> - Type error if the value type is invalid
There's no way for str.transform to tell the difference between "I asked for the wrong codec" and "I expected to have a bytes object here, not a str object". That's why I think we need to think in terms of format checks rather than type checks.
> (CodecLookupError doesn't exist, you propose to define a new exception who inherits from LookupError?)
Yeah, and I'd get that to handle the process of creating the nice error messages. I think it may even make sense to build the filtering options into codecs.lookup() itself:
    def lookup(encoding, decoded_format=None, encoded_format=None):
        info = _lookup(encoding) # The existing codec lookup algorithm
        if ((decoded_format is not None and decoded_format != info.decoded_format) or
            (encoded_format is not None and encoded_format != info.encoded_format)):
            raise CodecLookupError(info, decoded_format, encoded_format)
Then the various encode, decode and transform methods can just pass the appropriate arguments to 'codecs.lookup' without all having to reimplement the format checking logic. |
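The filtered lookup proposed above can be turned into a runnable sketch. Everything specific here is an assumption for illustration: `CodecLookupError`, `codec_formats` and the `decoded_format`/`encoded_format` attributes are the hypothetical names from this discussion, not attributes of the real `codecs.CodecInfo`, so untagged codecs fall back to the "text <-> binary" default suggested earlier. The sketch also returns `info`, which the proposal leaves implicit.

```python
import codecs


def codec_formats(info):
    # Untagged codecs are treated as character encodings (text <-> binary).
    return (getattr(info, "decoded_format", "text"),
            getattr(info, "encoded_format", "binary"))


class CodecLookupError(LookupError):
    """Hypothetical error: the codec exists but has the wrong formats."""

    def __init__(self, info, decoded_format, encoded_format):
        actual = codec_formats(info)
        super().__init__(
            "%s <-> %s codec expected (%r is %s <-> %s)"
            % (decoded_format or "any", encoded_format or "any",
               info.name, actual[0], actual[1])
        )


def lookup(encoding, decoded_format=None, encoded_format=None):
    info = codecs.lookup(encoding)  # the existing codec lookup algorithm
    actual_decoded, actual_encoded = codec_formats(info)
    if ((decoded_format is not None and decoded_format != actual_decoded) or
            (encoded_format is not None and encoded_format != actual_encoded)):
        raise CodecLookupError(info, decoded_format, encoded_format)
    return info


print(lookup("utf-8", "text", "binary").name)
try:
    lookup("utf-8", "text", "text")
except CodecLookupError as exc:
    print(exc)
```

Because the real registry carries no format tags, this sketch cannot tell rot13 from a character encoding; a real implementation would need the tags set where each codec is registered.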
|||
msg145991 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-10-19 23:10 | |
> I think it may even make sense to build the filtering
> options into codecs.lookup() itself:
>
>     def lookup(encoding, decoded_format=None, encoded_format=None):
>         info = _lookup(encoding)  # The existing codec lookup algorithm
>         if ((decoded_format is not None and decoded_format != info.decoded_format) or
>             (encoded_format is not None and encoded_format != info.encoded_format)):
>             raise CodecLookupError(info, decoded_format, encoded_format)
lookup('rot13') should fail with a lookup error to keep backward compatibility. You can just change the default values to:
    def lookup(encoding, decoded_format='text', encoded_format='binary'): ...
If you patch lookup, what about the following functions?
- getencoder()
- getdecoder()
- getincrementalencoder()
- getincrementaldecoder()
- getreader()
- getwriter()
- iterencode() |
|||
msg145998 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2011-10-20 01:53 | |
I'm fine with people needing to drop down to the lower level lookup() API if they want the filtering functionality in Python code. For most purposes, constraining the expected codec input and output formats really isn't a major issue - we just need it in the core in order to emit sane error messages when people misuse the convenience APIs based on things that used to work in 2.x (like 'a'.encode('base64')). At the C level, I'd adjust _PyCodec_Lookup to accept the two extra arguments and add _PyCodec_EncodeText, _PyCodec_DecodeBinary, _PyCodec_TransformText and _PyCodec_TransformBinary to support the convenience APIs (rather than needing the individual objects to know about the details of the codec tagging mechanism). Making new codecs available isn't a backwards compatibility problem - anyone relying on a particular key being absent from an extensible registry is clearly doing the wrong thing. Regarding the particular formats, I'd suggest that hex, base64, quopri, uu, bz2 and zlib all be flagged as binary transforms, but rot13 be implemented as a text transform (Florent's patch has rot13 as another binary transform, but it makes more sense in the text domain - this should just be a matter of adjusting some of the data types in the implementation from bytes to str) |
|||
msg149439 - (view) | Author: Petri Lehtinen (petri.lehtinen) * ![]() |
Date: 2011-12-14 10:51 | |
Issue 13600 has been marked as a duplicate of this issue. FTR, +1 to the idea of adding encoded_format and decoded_format attributes to CodecInfo, and also to adding {str,bytes}.{transform,untransform} back. |
|||
msg153304 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2012-02-13 21:17 | |
What is the status of this issue? Is there still a fan of this issue motivated to write a PEP, a patch or something like that? |
|||
msg153317 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2012-02-14 03:25 | |
It's still on my radar to come back and have a look at it. Feedback from the web folks doing Python 3 migrations is that it would have helped them in quite a few cases. I want to get a couple of other open PEPs out of the way first, though (mainly 394 and 409) |
|||
msg164224 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2012-06-28 07:13 | |
My current opinion is that this should be a PEP for 3.4, to make sure we flush out all the corner cases and other details correctly. |
|||
msg164226 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2012-06-28 07:26 | |
For that matter, with the relevant codecs restored in 3.2, a transform() helper could probably be added to six (or a new project on PyPI) to prototype the approach. |
|||
msg164237 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2012-06-28 10:41 | |
Setting as a release blocker for 3.4 - this is important. |
|||
msg165435 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2012-07-14 07:36 | |
FWIW, I've been thinking further about this recently and I think implementing this feature as builtin methods is the wrong way to approach it. Instead, I propose the addition of codecs.encode and codecs.decode methods that are type neutral (leaving any type checks entirely up to the codecs themselves), while the str.encode and bytes.decode methods retain their current strict text model related type restrictions.
Also, I now think my previous proposal for nice error messages was massively over-engineered. A much simpler approach is to just replace the status quo:
>>> "".encode("bz2_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ncoghlan/devel/py3k/Lib/encodings/bz2_codec.py", line 17, in bz2_encode
    return (bz2.compress(input), len(input))
  File "/home/ncoghlan/devel/py3k/Lib/bz2.py", line 443, in compress
    return comp.compress(data) + comp.flush()
TypeError: 'str' does not support the buffer interface
with a better error with more context like:
UnicodeEncodeError: encoding='bz2_codec', errors='strict', codec_error="TypeError: 'str' does not support the buffer interface"
A similar change would be straightforward on the decoding side. This would be a good use case for __cause__, but the codec error should still be included in the string representation. |
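Editorial sketch of the simpler error-reporting idea described above: wrap whatever the codec raises in an exception that names the codec and error mode, chain the original via `__cause__`, and still include it in the message. `CodecOperationError` and `encode_with_context` are hypothetical names, not part of the codecs module, and the wrapper class is an assumption (the message above suggests `UnicodeEncodeError`).

```python
import codecs


class CodecOperationError(Exception):
    """Hypothetical wrapper carrying the codec name and error mode."""


def encode_with_context(obj, encoding, errors="strict"):
    """Encode via codecs.encode(), adding context to any codec failure."""
    try:
        return codecs.encode(obj, encoding, errors)
    except LookupError:
        # Unknown codec names are already reported clearly; pass them through.
        raise
    except Exception as exc:
        message = "encoding=%r, errors=%r, codec_error=%r" % (
            encoding, errors, "%s: %s" % (type(exc).__name__, exc))
        # "from exc" sets __cause__, so the original traceback is preserved.
        raise CodecOperationError(message) from exc
```

Calling `encode_with_context("text", "bz2_codec")` then fails with a message naming `bz2_codec`, and the underlying `TypeError` remains available as `__cause__`.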
|||
msg170414 - (view) | Author: Uzume (uzume) | Date: 2012-09-12 19:09 | |
Many have chimed in on this topic but I thought I would lend my stance--for whatever it is worth. I also believe most of these do not fit the concept of a character codec and some sort of transforms would likely be useful, however most are sort of specialized (e.g., there should probably be a generalized compression library interface à la hashlib):
rot13: an (albeit simplistic) text cipher (str to str; though bytes to bytes could be argued since many crypto functions do that)
zlib, bz2, etc. (lzma/xz should also be here): all bytes to bytes compression transforms
hex(adecimal), uu, base64, etc.: these more or less fit the description of a character codec as they map between bytes and str, however, I am not sure they are really the same thing as these are basically doing a radix transformation to character symbols and the mapping is not strictly from bytes to a single character and back as a true character codec seems to imply. As evidenced by int(), format(), bytes.fromhex(), float.hex(), float.fromhex(), etc., these are more generalized conversions for serializing strings of bits into a textual representation (possibly for human consumption).
I personally feel any <type/class>.hex(), etc. method would be better off as a format() style formatter if they are to exist in such a space at all (i.e., not some more generalized conversion library--which we have but since 3.x could probably use to be updated and cleaned up). |
|||
msg187630 - (view) | Author: Florent Xicluna (flox) * ![]() |
Date: 2013-04-23 12:05 | |
Another rant, because it matters to many of us: http://lucumr.pocoo.org/2012/8/11/codec-confusion/ IMHO, the solution to restore str.decode and bytes.encode and return TypeError for improper use is probably the most obvious for the average user. |
|||
msg187631 - (view) | Author: Ezio Melotti (ezio.melotti) * ![]() |
Date: 2013-04-23 12:15 | |
-1
I see encoding as the process to go from text to bytes, and decoding the process to go from bytes to text, so (ab)using these terms for other kinds of conversions is not an option IMHO. Anyway I think someone should write a PEP and list the possible options and their pros and cons, and then a decision can be taken on python-dev.
FTR in Python 2 you can use decode for bytes->text, text->text, bytes->bytes, and even text->bytes:
>>> u'DEADBEEF'.decode('hex')
'\xde\xad\xbe\xef' |
|||
msg187634 - (view) | Author: R. David Murray (r.david.murray) * ![]() |
Date: 2013-04-23 12:42 | |
transform/untransform has approval-in-principle, adding encode/decode to the type that doesn't have them has been explicitly (and repeatedly :) rejected. (I don't know about anybody else, but at this point I have written code that assumes that if an object has an 'encode' method, calling it will get me a bytes, and vice versa with 'decode'...an assumption I know is not "safe", but that I feel is useful duck typing in the contexts in which I used it.) Nick wants a PEP, other people have said a PEP isn't necessary. What is certainly necessary is for someone to pick up the ball and run with it. |
|||
msg187636 - (view) | Author: Florent Xicluna (flox) * ![]() |
Date: 2013-04-23 12:54 | |
I am not a native English speaker, but it seems that the common usage of encode/decode is wider than the restricted definition applied for Python 3.3. Some examples:
* RFC 4648 specifies "Base16, Base32, and Base64 Data Encodings" http://tools.ietf.org/html/rfc4648
* About rot13: "the same code can be used for encoding and decoding" http://www.catb.org/~esr/jargon/html/R/rot13.html
* The Huffman coding is "an entropy encoding algorithm" (used for DEFLATE) http://en.wikipedia.org/wiki/Huffman_coding
* RFC 2616 lists (zlib's) deflate or gzip as "encoding transformations" http://tools.ietf.org/html/rfc2616#section-3.5
However, I acknowledge that there are valid reasons to choose a different verb too. |
|||
msg187638 - (view) | Author: Ezio Melotti (ezio.melotti) * ![]() |
Date: 2013-04-23 12:59 | |
While not strictly necessary, a PEP would be certainly useful and will help reaching a consensus. The PEP should provide a summary of the available options (transform/untransforms, reintroducing encode/decode for bytes/str, maybe others), their intended behavior (e.g. is type(x.transform()) == type(x) always true?), and possible issues (e.g. Should some transformations be limited to str or bytes? Should rot13 work with both transform and untransform?). Even if we all agreed on a solution, such document would still be useful IMHO. |
|||
msg187644 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-04-23 13:46 | |
+1 for someone stepping up to write a PEP on this if they would like to see the situation improved in 3.4. transform/untransform has at least one core developer with an explicit -1 on the proposal at the moment (me). We *definitely* need a generic object->object convenience API in the codecs module (codecs.decode, codecs.encode). I even accept that those two functions could be worthy of elevation to be new builtin functions. I'm *far* from convinced that awkwardly named methods that only handle str->object, bytes->object and bytearray->object are a good idea. Should memoryview gain transform/untransform methods as well? transform/untransform as proposed aren't even inverse operations, since they don't swap the valid input and output types (that is, transform is str/bytes/bytearray to arbitrary objects, while untransform is *also* str/bytes/bytearray to arbitrary objects. Inverses can't have a domain/range mismatch like that). Those names are also ambiguous about which one corresponds to "encoding" and which to "decoding". encode() and decode(), whether as functions in the codecs module or as builtins, have no such issue. Personally, the more I think about it, the more I'm in favour of adding encode and decode as builtin functions for 3.4. If you want arbitrary object->object conversions, use the builtins, if you want strict str->bytes or bytes/bytearray->str use the methods. Python 3 has been around long enough now, and Python 3.2 and 3.3 are sufficiently well known that I think we can add the full power builtins without people getting confused. |
|||
msg187649 - (view) | Author: R. David Murray (r.david.murray) * ![]() |
Date: 2013-04-23 14:41 | |
I was visualizing transform/untransform as being restricted to buffertype->bytes and stringtype->string, which at least for binascii-type transforms is all the modules support. After all, you don't get to choose what type of object you get back from encode or decode. A more generalized transformation (encode/decode) utility is also interesting, but how many non-string non-bytes transformations do we actually support? |
|||
msg187651 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-04-23 14:55 | |
If transform is a method, how do you plan to accept arbitrary buffer supporting types as input? This is why I mentioned memoryview: it doesn't provide decode(), but there's no good reason you should have to copy the data from the view before decoding it. Similarly, you shouldn't have to make an unaltered copy before creating a compressed (or decompressed) copy. With codecs.encode and codecs.decode as functions, supporting memoryview as an input for bytes->str decoding, binary->bytes encoding (e.g. gzip compression) and binary->bytes decoding (e.g. gzip decompression) is trivial. Ditto for array.array and anything else that supports the buffer protocol. With transform/untransform as methods? No such luck. And once you're using functions rather than methods, it's best to define the API as object -> object, and leave any type constraints up to the individual codecs (with the error handling improved to provide more context and a more meaningful exception type, as I described earlier in the thread) |
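Editorial sketch of the point about buffer-supporting inputs: the function-based API accepts a `memoryview` or an `array.array` directly, with no intermediate copy into `bytes`. This assumes a Python 3 build with zlib available; the payload is made up for illustration.

```python
import array
import codecs

payload = b"spam and eggs " * 100

# memoryview has no encode/decode methods, but the codecs functions accept it.
view = memoryview(payload)
compressed = codecs.encode(view, "zlib_codec")
restored = codecs.decode(memoryview(compressed), "zlib_codec")

# The same holds for any other object supporting the buffer protocol.
as_array = array.array("B", payload)
compressed_from_array = codecs.encode(as_array, "zlib_codec")
restored_from_array = codecs.decode(compressed_from_array, "zlib_codec")
```

Both `restored` and `restored_from_array` are plain `bytes` objects equal to the original payload.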
|||
msg187652 - (view) | Author: R. David Murray (r.david.murray) * ![]() |
Date: 2013-04-23 15:02 | |
I agree with you. transform/untransform are parallel to encode/decode, and I wouldn't expect them to exist on any type that didn't support either encode or decode. They are convenience methods, just as encode/decode are. I am also probably not invested enough in it to write the PEP :) |
|||
msg187653 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2013-04-23 15:42 | |
str.decode() and bytes.encode() are not coming back. Any proposal had better take into account the API design rule that the *type* of a method's return value should not depend on the *value* of one of the arguments. (The Python 2 design failed this test, and that's why we changed it.) It is however fine to let the return type depend on one of the argument *types*. So e.g. bytes.transform(enc) -> bytes and str.transform(enc) -> str are fine. And so are e.g. transform(bytes, enc) -> bytes and transform(str, enc) -> str. But a transform() taking bytes that can return either str or bytes depending on the encoding name would be a problem. Personally I don't think transformations are so important or ubiquitous so as to deserve being made new bytes/str methods. I'd be happy with a convenience function, for example transform(input, codecname), that would have to be imported from somewhere (maybe the codecs module). My guess is that in almost all cases where people are demanding to say e.g. x = y.transform('rot13') the codec name is a fixed literal, and they are really after minimizing the number of imports. Personally, disregarding the extra import line, I think x = rot13.transform(y) looks better though. Such custom APIs also give the API designer (of the transformation) more freedom to take additional optional parameters affecting the transformation, offer a set of variants, or a richer API. |
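Editorial sketch of a convenience function that follows the design rule stated above: the return type depends only on the *type* of the input, never on the codec name. The `transform` helper is hypothetical (it is the function suggested in the message, not an existing API), and mapping every non-str input to `bytes` is an assumption of this sketch.

```python
import codecs


def transform(data, codec_name):
    """Apply a same-kind codec: str in gives str out, bytes-like in gives bytes out."""
    expected = str if isinstance(data, str) else bytes
    result = codecs.encode(data, codec_name)
    if not isinstance(result, expected):
        # Reject codecs whose output kind differs from the input kind,
        # e.g. a text encoding such as utf-8 applied to a str.
        raise TypeError(
            "%r codec returned %s for %s input"
            % (codec_name, type(result).__name__, type(data).__name__))
    return result
```

Under this rule `transform("hello", "rot_13")` returns a str and `transform(b"hi", "hex_codec")` returns bytes, while `transform("hello", "utf-8")` is rejected with a `TypeError`.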
|||
msg187660 - (view) | Author: Georg Brandl (georg.brandl) * ![]() |
Date: 2013-04-23 17:38 | |
FWIW, I'm not interested in seeing this added anymore. |
|||
msg187668 - (view) | Author: Gregory P. Smith (gregory.p.smith) * ![]() |
Date: 2013-04-23 19:26 | |
consensus here appears to be "bad idea... don't do this." |
|||
msg187670 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-04-23 21:46 | |
No, transform/untransform as methods are a bad idea, but these *codecs* should definitely come back. The minimal change needed for that to be feasible is to give errors raised during encoding and decoding more context information (at least the codec name and error mode, and switching to the right kind of error). MAL also stated on python-dev that codecs.encode and codecs.decode already exist, so it should just be a matter of documenting them properly. |
|||
msg187673 - (view) | Author: Gregory P. Smith (gregory.p.smith) * ![]() |
Date: 2013-04-23 22:19 | |
okay, but i don't personally find any of these to be good ideas as "codecs" given they don't have anything to do with translating between bytes<->unicode. |
|||
msg187676 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-04-23 23:07 | |
The codecs module is generic, text encodings are just the most common use case (hence the associated method API). |
|||
msg187695 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2013-04-24 11:45 | |
I don't see any point in merely bringing the codecs back, without any convenience API to use them. If I need to do
    import codecs
    result = codecs.getencoder("base64").encode(data)
I don't think people would actually prefer this over
    import base64
    result = base64.encodebytes(data)
It's (IMO) only the convenience method (.encode) that made people love these codecs. |
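Editorial note with a runnable comparison of the two spellings discussed above. In the codecs module, `codecs.getencoder()` returns the stateless encoder function itself, which is called directly and returns an `(output, length_consumed)` tuple. The codec name `base64_codec` is used here because it works with or without the short aliases this issue is about.

```python
import base64
import codecs

data = b"example"

# Lookup-based spelling: call the returned function, then unpack the tuple.
encoder = codecs.getencoder("base64_codec")
encoded, consumed = encoder(data)

# Module-based spelling: one call, result returned directly.
direct = base64.encodebytes(data)
```

Both spellings produce the same bytes; the codec version additionally reports how many input bytes were consumed.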
|||
msg187696 - (view) | Author: Ezio Melotti (ezio.melotti) * ![]() |
Date: 2013-04-24 12:20 | |
IMHO it's also a documentation problem. Once people figure out that they can't use encode/decode anymore, it's not immediately clear what they should do instead. By reading the codecs docs[0] it's not obvious that it can be done with codecs.getencoder("...").encode/decode, so people waste time finding a solution, get annoyed, and blame Python 3 because it removed a simple way to use these codecs without making clear what should be used instead. FWIW I don't care about having to do an extra import, but indeed something simpler than codecs.getencoder("...").encode/decode would be nice. [0]: http://docs.python.org/3/library/codecs.html |
|||
msg187698 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-04-24 13:43 | |
It turns out MAL added the convenience API I'm looking for back in 2004, it just didn't get documented, and is hidden behind the "from _codecs import *" call in the codecs.py source code: http://hg.python.org/cpython-fullhistory/rev/8ea2cb1ec598
So, all the way from 2.4 to 2.7 you can write:
    from codecs import encode
    result = encode(data, "base64")
It works in 3.x as well, you just need to add the "_codec" to the end to account for the missing aliases:
>>> encode(b"example", "base64_codec")
b'ZXhhbXBsZQ==\n'
>>> decode(b"ZXhhbXBsZQ==\n", "base64_codec")
b'example'
Note that the convenience functions omit the extra checks that are part of the methods (although I admit the specific error here is rather quirky):
>>> b"ZXhhbXBsZQ==\n".decode("base64_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/encodings/base64_codec.py", line 20, in base64_decode
    return (base64.decodebytes(input), len(input))
  File "/usr/lib64/python3.2/base64.py", line 359, in decodebytes
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not memoryview
I'm going to create some additional issues, so this one can return to just being about restoring the missing aliases. |
|||
msg187701 - (view) | Author: Marc-Andre Lemburg (lemburg) * ![]() |
Date: 2013-04-24 13:47 | |
Just copying some details here about codecs.encode() and codecs.decode() from python-dev:
"""
Just as reminder: we have the general purpose encode()/decode() functions in the codecs module:
    import codecs
    r13 = codecs.encode('hello world', 'rot-13')
These interface directly to the codec interfaces, without enforcing type restrictions. The codec defines the supported input and output types.
"""
As Nick found, these aren't documented, which is a documentation bug (I probably forgot to add documentation back then). They have been in Python since 2004: http://hg.python.org/cpython-fullhistory/rev/8ea2cb1ec598
These APIs are nice for general purpose codec work and that's why I added them back in 2004. For the codecs in question, it would still be nice to have a more direct way to access them via methods on the types that you typically use them with. |
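Editorial sketch illustrating "the codec defines the supported input and output types": the same `codecs.encode()` call yields different type pairings depending on the codec. The long codec names are used so the snippet does not depend on the short aliases.

```python
import codecs

# Text transform: str in, str out.
rot13_text = codecs.encode("hello world", "rot_13")

# Binary transform: bytes in, bytes out.
hex_bytes = codecs.encode(b"\xde\xad", "hex_codec")

# Classic text encoding: str in, bytes out.
utf8_bytes = codecs.encode("hello", "utf-8")

# rot13 is its own inverse, so decoding restores the original text.
round_trip = codecs.decode(rot13_text, "rot_13")
```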
|||
msg187702 - (view) | Author: Ezio Melotti (ezio.melotti) * ![]() |
Date: 2013-04-24 13:53 | |
> It works in 3.x as well, you just need to add the "_codec" to the end > to account for the missing aliases: FTR this is because of ff1261a14573 (see #10807). |
|||
msg187705 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-04-24 14:11 | |
Issue 17827 covers adding documentation for codecs.encode and codecs.decode Issue 17828 covers adding exception handling improvements for all encoding and decoding operations |
|||
msg187707 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-04-24 14:22 | |
For me, the killer argument *against* a method based API is memoryview (and, equivalently, array.array). It should be possible to use those as inputs for the bytes->bytes codecs, and once you endorse codecs.encode and codecs.decode for that use case, it's hard to justify adding more exclusive methods to the already broad bytes and bytearray APIs (particularly given the problems with conveying direction of conversion unambiguously). By contrast, I think "the codecs functions are generic while the str, bytes and bytearray methods are specific to text encodings" is something we can explain fairly easily, thus allowing the aliases mentioned in this issue to be restored for use with the codecs module functions. To avoid reintroducing the quirky errors described in issue 10807, the encoding and decoding error messages should first be improved as discussed in issue 17828. |
|||
msg187764 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-04-25 07:49 | |
Also adding 17839 as a dependency, since part of the reason the base64 errors in particular are so cryptic is because the base64 module doesn't accept arbitrary PEP 3118 compliant objects as input. |
|||
msg187770 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-04-25 08:31 | |
I also created issue 17841 to cover that the 3.3 documentation incorrectly states that these aliases still exist, even though they were removed before 3.2 was released. |
|||
msg198845 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-10-02 15:08 | |
With issue 17839 fixed, the error from invoking the base64 codec through the method API is now substantially more sensible:
>>> b"ZXhhbXBsZQ==\n".decode("base64_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoder did not return a str object (type=bytes) |
|||
msg198846 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-10-02 15:13 | |
I just wanted to note something I realised in chatting to Armin Ronacher recently: in both Python 2.x and 3.x, the encode/decode method APIs are constrained by the text model, it's just that in 2.x that model was effectively basestring<->basestring, and thus still covered every codec in the standard library. This greatly limited the use cases for the codecs.encode/decode convenience functions, which is why the fact they were undocumented went unnoticed. In 3.x, the changed text model meant the method API became limited to the Unicode codecs, making the function based API more important. |
|||
msg202130 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-11-04 13:21 | |
For anyone interested, I have a patch up on issue 17828 that produces the following output for various codec usage errors:
>>> import codecs
>>> codecs.encode(b"hello", "bz2_codec").decode("bz2_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bz2_codec' decoder returned 'bytes' instead of 'str'; use codecs.decode to decode to arbitrary types
>>> "hello".encode("bz2_codec")
TypeError: 'str' does not support the buffer interface

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid input type for 'bz2_codec' codec (TypeError: 'str' does not support the buffer interface)
>>> "hello".encode("rot_13")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'rot_13' encoder returned 'str' instead of 'bytes'; use codecs.encode to encode to arbitrary types |
|||
msg202264 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-11-06 12:41 | |
Providing the 2to3 fixers in issue 17823 now depends on this issue rather than the other way around (since not having to translate the names simplifies the fixer a bit). |
|||
msg202515 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-11-10 09:25 | |
Issue 17823 is now closed, but not because it has been implemented. It turns out that the data driven nature of the incompatibility means it isn't really amenable to being detected and fixed automatically via 2to3. Issue 19543 is a replacement proposal for the introduction of some additional codec related Py3k warnings in Python 2.7.7. |
|||
msg203124 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-11-17 07:41 | |
Attached patch restores the aliases for the binary and text transforms, adds a test to ensure they exist and restores the "Aliases" column to the relevant tables in the documentation. It also updates the relevant section in the What's New document. I also tweaked the wording in the docs to use the phrases "binary transform" and "text transform" for the affected tables and version added/changed notices. Given the discussions on python-dev, the main condition that needs to be met before I commit this is for Victor to change his current -1 to a -0 or higher. |
|||
msg203378 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-11-19 14:25 | |
Victor is still -1, so to Python 3.5 it goes. |
|||
msg203751 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-11-22 12:44 | |
The 3.4 portion of issue 19619 has been addressed, so removing it as a dependency again. |
|||
msg203936 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-11-23 00:46 | |
With issue 19619 resolved for Python 3.4 (the issue itself remains open awaiting a backport to 3.3), Victor has softened his stance on this topic and given the go ahead to restore the codec aliases: http://bugs.python.org/issue19619#msg203897 I'll be committing this shortly, after adjusting the patch to account for the issue 19619 changes to the tests and What's New. |
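Editorial sketch of the resulting split between the two APIs, assuming Python 3.4 or later: the restored short alias works through the codecs functions, while the bytes method rejects a non-text codec with a `LookupError` (the issue 19619 change). The exact wording of the error message is not relied on here.

```python
import codecs

# Function-based API: the restored "base64" alias handles bytes -> bytes.
encoded = codecs.encode(b"example", "base64")
decoded = codecs.decode(encoded, "base64")

# Method-based API: restricted to text encodings, so this lookup is refused.
try:
    encoded.decode("base64")
except LookupError as exc:
    method_error = exc
else:
    method_error = None
```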
|||
msg203942 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2013-11-23 01:14 | |
New changeset 5e960d2c2156 by Nick Coghlan in branch 'default': Close #7475: Restore binary & text transform codecs http://hg.python.org/cpython/rev/5e960d2c2156 |
|||
msg203944 - (view) | Author: Alyssa Coghlan (ncoghlan) * ![]() |
Date: 2013-11-23 01:16 | |
Note that I still plan to do a documentation-only PEP for 3.4, proposing some adjustments to the way the codecs module is documented, making binary and text transform defined terms in the glossary, etc. I'll probably aim for beta 2 for that. |
|||
msg207283 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2014-01-04 13:34 | |
Docstrings for new codecs mention bytes.transform() and bytes.untransform() which are nonexistent. |
|||
msg213502 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2014-03-14 00:55 | |
New changeset d7950e916f20 by R David Murray in branch '3.3': #7475: Remove references to '.transform' from transform codec docstrings. http://hg.python.org/cpython/rev/d7950e916f20 New changeset 83d54ab5c696 by R David Murray in branch 'default': Merge #7475: Remove references to '.transform' from transform codec docstrings. http://hg.python.org/cpython/rev/83d54ab5c696 |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:55 | admin | set | github: 51724 |
2014-03-14 00:55:23 | python-dev | set | messages: + msg213502 |
2014-01-04 13:34:04 | serhiy.storchaka | set | nosy: + serhiy.storchaka messages: + msg207283 |
2014-01-02 12:42:36 | jwilk | set | nosy: + jwilk |
2013-11-23 01:16:23 | ncoghlan | set | messages: + msg203944 |
2013-11-23 01:14:37 | python-dev | set | status: open -> closed nosy: + python-dev messages: + msg203942 resolution: fixed stage: resolved |
2013-11-23 00:46:51 | ncoghlan | set | assignee: ncoghlan messages: + msg203936 versions: + Python 3.4, - Python 3.5 |
2013-11-22 12:44:25 | ncoghlan | set | dependencies: - Blacklist base64, hex, ... codecs from bytes.decode() and str.encode() messages: + msg203751 |
2013-11-21 13:35:20 | ncoghlan | set | dependencies: + Blacklist base64, hex, ... codecs from bytes.decode() and str.encode() |
2013-11-19 14:25:41 | ncoghlan | set | messages: + msg203378 versions: + Python 3.5, - Python 3.4 |
2013-11-17 07:41:29 | ncoghlan | set | files: + issue7475_restore_codec_aliases_in_py34.diff messages: + msg203124 |
2013-11-10 09:25:10 | ncoghlan | set | messages: + msg202515 |
2013-11-10 09:22:10 | ncoghlan | unlink | issue17823 dependencies |
2013-11-06 12:41:41 | ncoghlan | set | dependencies: - 2to3 fixers for missing codecs messages: + msg202264 |
2013-11-06 12:40:42 | ncoghlan | link | issue17823 dependencies |
2013-11-04 13:21:33 | ncoghlan | set | messages: + msg202130 |
2013-10-02 15:18:13 | ncoghlan | set | versions: - Python 2.7, Python 3.3 |
2013-10-02 15:17:00 | ncoghlan | set | messages: - msg198847 |
2013-10-02 15:16:36 | ncoghlan | set | messages: + msg198847 versions: + Python 2.7, Python 3.3 |
2013-10-02 15:13:49 | ncoghlan | set | messages: + msg198846 |
2013-10-02 15:08:16 | ncoghlan | set | messages: + msg198845 |
2013-05-02 22:46:38 | isoschiz | set | nosy: + isoschiz |
2013-04-25 16:34:15 | gvanrossum | set | nosy: - gvanrossum |
2013-04-25 11:43:30 | serhiy.storchaka | set | dependencies: + Add link to alternatives for bytes-to-bytes codecs |
2013-04-25 08:31:46 | ncoghlan | set | messages: + msg187770 |
2013-04-25 07:53:34 | serhiy.storchaka | set | dependencies: + 2to3 fixers for missing codecs |
2013-04-25 07:49:12 | ncoghlan | set | dependencies: + base64 module should use memoryview messages: + msg187764 |
2013-04-24 14:22:38 | ncoghlan | set | dependencies: + More informative error handling when encoding and decoding messages: + msg187707 |
2013-04-24 14:11:28 | ncoghlan | set | messages: + msg187705 |
2013-04-24 13:53:35 | ezio.melotti | set | messages: + msg187702 |
2013-04-24 13:47:10 | lemburg | set | messages: + msg187701 |
2013-04-24 13:43:13 | ncoghlan | set | messages: + msg187698 |
2013-04-24 12:20:46 | ezio.melotti | set | messages: + msg187696 |
2013-04-24 11:45:23 | loewis | set | messages: + msg187695 |
2013-04-23 23:07:32 | ncoghlan | set | messages: + msg187676 |
2013-04-23 22:19:41 | gregory.p.smith | set | status: closed -> open resolution: wont fix -> (no value) messages: + msg187673 stage: resolved -> (no value) |
2013-04-23 21:46:42 | ncoghlan | set | messages: + msg187670 |
2013-04-23 19:26:47 | gregory.p.smith | set | status: open -> closed priority: high -> normal nosy: + gregory.p.smith messages: + msg187668 resolution: wont fix stage: resolved |
2013-04-23 17:38:31 | georg.brandl | set | messages: + msg187660 |
2013-04-23 15:42:31 | gvanrossum | set | messages: + msg187653 |
2013-04-23 15:02:13 | r.david.murray | set | messages: + msg187652 |
2013-04-23 14:55:55 | ncoghlan | set | messages: + msg187651 |
2013-04-23 14:41:42 | r.david.murray | set | messages: + msg187649 |
2013-04-23 13:46:22 | ncoghlan | set | messages: + msg187644 |
2013-04-23 12:59:21 | ezio.melotti | set | messages: + msg187638 |
2013-04-23 12:54:04 | flox | set | messages: + msg187636 |
2013-04-23 12:42:27 | r.david.murray | set | nosy: + r.david.murray messages: + msg187634 |
2013-04-23 12:15:06 | ezio.melotti | set | messages: + msg187631 |
2013-04-23 12:05:43 | flox | set | messages: + msg187630 |
2013-04-22 18:39:06 | pconnell | set | nosy: + pconnell |
2013-04-01 18:06:30 | flox | set | nosy: + flox |
2013-04-01 18:06:19 | flox | set | nosy: - flox |
2012-09-12 19:11:51 | uzume | set | nosy: - uzume |
2012-09-12 19:09:57 | uzume | set | nosy: + uzume messages: + msg170414 |
2012-08-25 07:52:33 | ncoghlan | set | priority: release blocker -> high |
2012-07-14 10:51:15 | ezio.melotti | set | nosy: + ezio.melotti |
2012-07-14 07:36:42 | ncoghlan | set | messages: + msg165435 |
2012-06-28 10:41:30 | ncoghlan | set | priority: normal -> release blocker messages: + msg164237 stage: commit review -> (no value) |
2012-06-28 07:26:31 | ncoghlan | set | messages: + msg164226 |
2012-06-28 07:13:02 | ncoghlan | set | messages: + msg164224 versions: + Python 3.4, - Python 3.3 |
2012-02-19 04:16:27 | jcea | set | nosy: + jcea |
2012-02-14 03:25:58 | ncoghlan | set | messages: + msg153317 |
2012-02-13 21:17:55 | vstinner | set | messages: + msg153304 |
2012-02-13 21:11:44 | barry | set | nosy: + barry |
2011-12-14 10:51:53 | petri.lehtinen | set | nosy: + petri.lehtinen messages: + msg149439 |
2011-12-14 10:48:42 | petri.lehtinen | link | issue13600 superseder |
2011-10-20 01:53:08 | ncoghlan | set | messages: + msg145998 |
2011-10-19 23:10:52 | vstinner | set | messages: + msg145991 |
2011-10-19 22:54:41 | ncoghlan | set | messages: + msg145986 |
2011-10-19 22:34:48 | vstinner | set | messages: + msg145982 |
2011-10-19 22:12:37 | ncoghlan | set | messages: + msg145980 |
2011-10-19 22:09:43 | ncoghlan | set | messages: + msg145979 |
2011-10-19 11:58:38 | ncoghlan | set | messages: + msg145900 |
2011-10-19 11:35:38 | ncoghlan | set | assignee: lemburg -> (no value) messages: + msg145897 nosy: + ncoghlan |
2011-10-17 13:38:20 | eric.araujo | set | messages: + msg145693 |
2011-10-17 00:53:29 | vstinner | set | messages: + msg145656 |
2011-10-09 09:18:13 | eric.araujo | set | messages: + msg145246 components: - Documentation, 2to3 (2.x to 3.x conversion tool) |
2011-09-22 15:36:27 | cben | set | nosy: + cben |
2011-07-19 13:13:46 | eric.araujo | set | versions: + Python 3.3, - Python 3.2 |
2011-01-02 19:01:49 | vstinner | set | nosy: lemburg, gvanrossum, loewis, georg.brandl, belopolsky, vstinner, benjamin.peterson, eric.araujo, ssbarnea, flox messages: + msg125073 |
2010-12-30 01:53:47 | belopolsky | link | issue3232 dependencies |
2010-12-09 18:43:33 | belopolsky | set | status: closed -> open type: enhancement components: + Unicode nosy: + gvanrossum messages: + msg123693 resolution: fixed -> (no value) stage: commit review |
2010-12-06 11:49:37 | lemburg | set | messages: + msg123462 |
2010-12-05 19:12:13 | georg.brandl | set | assignee: lemburg messages: + msg123436 |
2010-12-05 19:04:43 | loewis | set | messages: + msg123435 |
2010-12-03 08:46:28 | lemburg | set | messages: + msg123206 |
2010-12-03 01:40:10 | belopolsky | set | nosy: + belopolsky messages: + msg123154 |
2010-12-02 18:08:08 | georg.brandl | set | status: open -> closed resolution: fixed messages: + msg123090 |
2010-07-31 17:44:57 | flox | link | issue3532 superseder |
2010-07-10 18:35:06 | loewis | set | messages: + msg109905 |
2010-07-10 18:14:32 | eric.araujo | set | messages: + msg109904 |
2010-07-10 17:07:40 | lemburg | set | versions: - Python 3.1, Python 2.7 |
2010-07-10 17:06:57 | lemburg | set | messages: + msg109894 |
2010-07-10 15:36:30 | georg.brandl | set | messages: + msg109879 |
2010-07-10 15:36:19 | georg.brandl | set | messages: - msg109878 |
2010-07-10 15:36:07 | georg.brandl | set | messages: + msg109878 |
2010-07-10 15:24:32 | lemburg | set | messages: + msg109876 |
2010-07-10 14:24:54 | loewis | set | messages: + msg109872 |
2010-06-14 15:35:05 | ssbarnea | set | nosy: + ssbarnea messages: + msg107794 title: codecs missing: base64 bz2 hex zlib ... -> codecs missing: base64 bz2 hex zlib hex_codec ... |
2010-06-04 14:12:06 | eric.araujo | set | messages: + msg107057 |
2010-05-28 14:25:47 | eric.araujo | set | nosy: + eric.araujo |
2010-05-28 14:17:52 | lemburg | set | messages: + msg106674 |
2010-05-28 13:48:54 | vstinner | set | messages: + msg106670 |
2010-05-28 13:45:56 | vstinner | set | messages: + msg106669 |
2010-05-28 13:18:57 | vstinner | set | nosy: + vstinner |
2010-05-20 20:33:01 | skip.montanaro | set | nosy: - skip.montanaro |
2009-12-19 18:09:41 | georg.brandl | set | assignee: georg.brandl -> (no value) |
2009-12-19 18:09:28 | georg.brandl | set | messages: + msg96632 |
2009-12-14 10:30:11 | lemburg | set | messages: + msg96374 |
2009-12-12 19:25:17 | loewis | set | messages: + msg96301 |
2009-12-12 15:44:22 | flox | set | messages: + msg96296 |
2009-12-12 15:40:27 | flox | set | messages: + msg96295 |
2009-12-11 23:09:08 | loewis | set | messages: + msg96277 |
2009-12-11 17:05:47 | flox | set | files: + issue7475_missing_codecs_py3k.diff messages: + msg96265 |
2009-12-11 13:13:50 | lemburg | set | messages: + msg96253 |
2009-12-11 12:54:39 | benjamin.peterson | set | messages: + msg96251 |
2009-12-11 10:22:23 | flox | set | nosy: lemburg, loewis, skip.montanaro, georg.brandl, benjamin.peterson, flox messages: + msg96243 components: + Library (Lib) |
2009-12-11 09:56:57 | lemburg | set | messages: + msg96242 |
2009-12-11 09:47:23 | lemburg | set | resolution: not a bug -> (no value) |
2009-12-11 09:46:55 | lemburg | set | messages: + msg96240 title: No hint about codecs removed: base64 bz2 hex zlib ... -> codecs missing: base64 bz2 hex zlib ... |
2009-12-11 09:26:31 | flox | set | files: + issue7475_warning.diff keywords: + patch |
2009-12-11 08:33:13 | flox | set | title: No hint about codecs removed : base64 bz2 hex zlib ... -> No hint about codecs removed: base64 bz2 hex zlib ... |
2009-12-11 08:31:52 | flox | set | versions: + Python 2.7 |
2009-12-11 08:31:17 | flox | set | messages: + msg96237 |
2009-12-11 08:21:39 | flox | set | status: closed -> open assignee: georg.brandl components: + Documentation, 2to3 (2.x to 3.x conversion tool), - Library (Lib) title: codecs missing: base64 bz2 hex zlib ... -> No hint about codecs removed : base64 bz2 hex zlib ... nosy: + georg.brandl messages: + msg96236 |
2009-12-11 02:09:19 | benjamin.peterson | set | status: open -> closed nosy: + benjamin.peterson messages: + msg96232 |
2009-12-10 23:28:52 | loewis | set | messages: + msg96228 |
2009-12-10 23:26:10 | lemburg | set | status: closed -> open messages: + msg96227 |
2009-12-10 23:25:03 | lemburg | set | nosy: + lemburg messages: + msg96226 |
2009-12-10 23:15:12 | loewis | set | status: open -> closed nosy: + loewis messages: + msg96223 resolution: not a bug |
2009-12-10 22:52:04 | skip.montanaro | set | nosy: + skip.montanaro |
2009-12-10 22:27:38 | flox | create |