classification
Title: Wrong str->bytes conversion in Lib/encodings/idna.py
Type: enhancement
Stage: test needed
Components: Library (Lib), Unicode
Versions: Python 3.4
process
Status: open
Resolution:
Dependencies: 7475
Superseder:
Assigned To:
Nosy List: belopolsky, doerwalter, ezio.melotti, loewis, pitrou, r.david.murray, vstinner
Priority: normal
Keywords:

Created on 2008-06-29 01:03 by pitrou, last changed 2013-02-23 06:46 by ezio.melotti.

Messages (7)
msg68931 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2008-06-29 01:03
Lib/encodings/idna.py claims to do the following when `input` is a
string object (lines 183-184, and see comment line 178: "IDNA allows
decoding to operate on Unicode strings, too"):

            # Force to bytes
            input = bytes(input)

This is obviously wrong, lacking an encoding parameter. It doesn't seem
to be covered in the test suite, and I don't know what the proper
semantics should be, so I leave it to someone else to find a fix.
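
To illustrate the failure described above: in Python 3, calling bytes() on a str without an encoding raises TypeError, so the "Force to bytes" branch cannot succeed for string input. A minimal sketch (the helper name force_to_bytes is made up for this example and is not part of idna.py):

```python
def force_to_bytes(value):
    """Mimic the `input = bytes(input)` line and return the result or the error."""
    try:
        return bytes(value)
    except TypeError as exc:
        return exc

# A str argument without an encoding is rejected.
result = force_to_bytes("xn--bcher-kva.example")
print(type(result).__name__)  # TypeError

# With an explicit encoding the conversion works.
ascii_bytes = bytes("xn--bcher-kva.example", "ascii")
print(ascii_bytes)
```

Whether "ascii" is the right encoding here is exactly the open question about the intended semantics.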
msg69219 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2008-07-03 18:05
Martin, you seem to be the author of that module.
msg124895 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2010-12-30 00:48
Martin's original code (r32301) was pretty clear:

 32301     loewis         # IDNA allows decoding to operate on Unicode strings, too.
 32301     loewis         if isinstance(input, unicode):
 32301     loewis             labels = dots.split(input)
 32301     loewis         else:
 32301     loewis             # Must be ASCII string
 32301     loewis             unicode(input, "ascii")
 32301     loewis             labels = input.split(".")

but the py3k port, r55215, was clearly incomplete and the log message is explicit about it:


r55215 | guido.van.rossum | 2007-05-09 19:40:37 -0400 (Wed, 09 May 2007) | 3 lines

Random modifications that slightly improve the chances of this not blowing up.
Walter will fix it for real.

I hope I picked the right Walter for the "nosy" list.
msg124899 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2010-12-30 01:53
Arguably, it is not a bug if a codec's decode method rejects unicode strings with a TypeError.  The 2.x implementation seems to allow decoding of ASCII-only unicode labels joined by arbitrary RFC 3490 separators.  I am not sure what the use case for this behavior would be.  In any case, supporting this would be a feature, and its acceptance would depend on the outcome of #7475.
msg124913 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-12-30 10:55
> Arguably, it is not a bug if codec's decode method rejects unicode
> strings with a TypeError.

Agreed, but it would be better if it did so deliberately and explicitly, rather than as a result of a bogus forward-port ;)
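
As a rough sketch of what a deliberate rejection could look like (the function name strict_idna_decode and the error message are assumptions for illustration, not code from idna.py):

```python
def strict_idna_decode(data):
    """Decode IDNA-encoded bytes, rejecting str input on purpose."""
    if isinstance(data, str):
        # Explicit check instead of relying on bytes(str) happening to fail.
        raise TypeError("idna decoding requires a bytes-like object, not 'str'")
    return bytes(data).decode("idna")

print(strict_idna_decode(b"www.python.org"))
```

The observable behavior for str input is the same as today (a TypeError), but the check documents the intent.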
msg144682 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2011-09-30 10:20
I agree that the codec shouldn't "decode" unicode strings. However, the operation performed is still meaningful: users may type ACE (ASCII-compatible encoding) DNS names into a user interface, and the application may then represent this as a "proper" Unicode name.

So I propose these changes:

- remove support for bytes in codec, but only so for 3.3 (it's actually no change in behavior, since it will continue to raise TypeErrors)
- add a function decode_idna to the module, for users that wish to un-IDNA string objects.
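
A minimal sketch of what such a function might look like, assuming it takes a str, splits on the RFC 3490 label separators, and decodes each ASCII label with the existing "idna" codec. No decode_idna exists in the module; the signature, the _DOTS pattern, and the error handling here are all assumptions:

```python
import re

# RFC 3490 label separators: full stop, ideographic full stop,
# fullwidth full stop, halfwidth ideographic full stop.
_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")

def decode_idna(name):
    """Hypothetical helper: turn an ACE domain name given as str into Unicode."""
    if not isinstance(name, str):
        raise TypeError("decode_idna() expects str, got %s" % type(name).__name__)
    labels = _DOTS.split(name)
    # Each label must be pure ASCII; non-ASCII input raises UnicodeEncodeError.
    decoded = [label.encode("ascii").decode("idna") for label in labels]
    return ".".join(decoded)

print(decode_idna("xn--bcher-kva.example"))
```

Labels without the ACE prefix pass through unchanged, so decode_idna("www.python.org") returns the input as is. Joining with "." normalizes the alternative separators; whether that is desirable is a design choice this sketch does not settle.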
msg144687 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2011-09-30 11:48
+1.  decode_idna is likely to be useful to the email package.
History
Date                 User            Action  Args
2013-02-23 06:46:54  ezio.melotti    set     type: behavior -> enhancement
                                             versions: + Python 3.4, - Python 3.3
2011-09-30 11:48:24  r.david.murray  set     messages: + msg144687
2011-09-30 10:35:58  vstinner        set     nosy: + vstinner
2011-09-30 10:20:04  loewis          set     messages: + msg144682
2011-09-29 23:05:12  ezio.melotti    set     nosy: + ezio.melotti
2011-07-19 12:50:40  pitrou          set     nosy: + r.david.murray
2010-12-30 10:55:12  pitrou          set     messages: + msg124913
2010-12-30 01:53:47  belopolsky      set     dependencies: + codecs missing: base64 bz2 hex zlib hex_codec ...
                                             messages: + msg124899
                                             versions: + Python 3.3, - Python 3.1
2010-12-30 00:48:10  belopolsky      set     nosy: + belopolsky, doerwalter
                                             messages: + msg124895
2009-05-16 20:34:11  ajaksu2         set     priority: normal
                                             stage: test needed
                                             versions: + Python 3.1, - Python 3.0
2008-07-03 18:05:46  pitrou          set     nosy: + loewis
                                             messages: + msg69219
2008-06-29 01:03:31  pitrou          create