This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Classification
Title: Encoding str to IDNA with ellipsis decomposes to empty labels
Type: behavior
Stage:
Components: Library (Lib)
Versions:

Process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: chfoo, loewis, r.david.murray
Priority: normal
Keywords:

Created on 2014-03-30 17:57 by chfoo, last changed 2022-04-11 14:58 by admin.

Messages (3)
msg215189 - Author: Christopher Foo (chfoo) Date: 2014-03-30 17:57
When encoding a string with the IDNA codec, I expected it to always raise an exception for empty labels. When I run

    >>> 'example.c…'.encode('idna').decode('ascii')

it returns

    'example.c...'

instead of raising UnicodeError. The original string ends with U+2026 HORIZONTAL ELLIPSIS, in case you can't see it clearly. These strings come from web pages fetched by a web crawler.

I tested this on Python 3.4, 3.3.2, 2.7.5, 2.6.9.
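Splitting the codec's output on the ASCII dot makes the empty labels from the title visible. This is a minimal sketch, assuming CPython 3 and its built-in "idna" codec; it is the same call as above, without the final decode:

```python
# Assumes CPython 3's built-in "idna" codec (the encodings.idna module).
encoded = "example.c\u2026".encode("idna")  # input ends with U+2026

print(encoded)              # b'example.c...'
print(encoded.split(b"."))  # [b'example', b'c', b'', b'', b'']
```

The input has two labels; the output splits into five, three of them empty.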
msg215198 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-03-30 19:53
I believe this behavior is correct with respect to RFC 3490. In the input, the last label is "c…", which is not empty. It is passed to ToASCII, whose nameprep step normalizes the ellipsis to "...". If UseSTD3ASCIIRules were true, the conversion would fail, because the result contains "." (\x2E). However, Python chooses not to set UseSTD3ASCIIRules (and instead leaves it to the DNS server to decide whether the name is valid).
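The mechanism described above can be approximated with the standard library. This sketch applies only the NFKC step of nameprep (real nameprep also removes and case-maps characters before normalizing), and `violates_std3()` is a hypothetical helper that approximates the UseSTD3ASCIIRules check as the letters-digits-hyphen rule. It illustrates the reasoning and is not the codec's own code:

```python
import string
import unicodedata

# Simplified STD3 ASCII rules (the "LDH" rule): ASCII letters, digits and
# hyphen-minus only, with no leading or trailing hyphen.
_LDH = frozenset(string.ascii_letters + string.digits + "-")


def violates_std3(label: str) -> bool:
    """Hypothetical helper approximating the UseSTD3ASCIIRules check.

    Only ASCII characters are checked; non-ASCII characters would be
    Punycode-encoded later by ToASCII, so they are ignored here.
    """
    return (
        any(ch.isascii() and ch not in _LDH for ch in label)
        or label.startswith("-")
        or label.endswith("-")
    )


hostname = "example.c\u2026"  # ends with U+2026 HORIZONTAL ELLIPSIS
labels = hostname.split(".")  # ['example', 'c…']: no label is empty yet

# NFKC, the normalization step of nameprep, expands the ellipsis into
# three U+002E FULL STOP characters.
normalized = [unicodedata.normalize("NFKC", label) for label in labels]
print(normalized)                                      # ['example', 'c...']
print([violates_std3(label) for label in normalized])  # [False, True]
```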

I believe this is actually a bug in the RFC, which should ban "." from the set of conversion results regardless of UseSTD3ASCIIRules. However, since this RFC has been superseded, you probably won't get anybody to confirm this view.
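A caller who wants the stricter behaviour suggested above could add the check on their own side. The hypothetical `encode_hostname_strict()` below is a sketch, not a drop-in fix. It encodes with the built-in "idna" codec and raises UnicodeError whenever the conversion changed the number of labels. That catches "…" and any other character that nameprep turns into a dot:

```python
import re

# Label separators the idna codec splits on (RFC 3490, section 3.1):
# U+002E FULL STOP, U+3002 IDEOGRAPHIC FULL STOP,
# U+FF0E FULLWIDTH FULL STOP, U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP.
_DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")


def encode_hostname_strict(hostname: str) -> bytes:
    """Hypothetical helper: encode with the built-in 'idna' codec, but raise
    UnicodeError if the conversion produced extra or empty labels."""
    in_labels = _DOTS.split(hostname)
    if in_labels and in_labels[-1] == "":
        in_labels.pop()  # a single trailing root dot is allowed

    # The codec itself already rejects empty labels in the input.
    encoded = hostname.encode("idna")

    out_labels = encoded.split(b".")
    if out_labels and out_labels[-1] == b"":
        out_labels.pop()

    # If nameprep turned a character into "." (e.g. U+2026 -> "..."),
    # the output has more labels than the input, some of them empty.
    if len(out_labels) != len(in_labels) or b"" in out_labels:
        raise UnicodeError(
            f"IDNA conversion of {hostname!r} produced extra or empty labels"
        )
    return encoded


try:
    encode_hostname_strict("example.c\u2026")
except UnicodeError as exc:
    print("rejected:", exc)
```

Comparing label counts before and after encoding avoids reimplementing nameprep. The codec still does all the conversion work, and the helper only checks that the label structure survived.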
msg215199 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-30 20:50
For whatever it's worth, it looks like RFC 5892 marks U+2026 as DISALLOWED.
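One rule in RFC 5892 that excludes U+2026 is the "Unstable" category: a code point is unstable if toNFKC(toCaseFold(toNFKC(cp))) differs from cp, and unstable code points are DISALLOWED. The sketch below models only that single rule, using `str.casefold()` as a stand-in for toCaseFold. The full RFC 5892 derivation checks several other categories as well, so `is_unstable()` is a hypothetical helper and not a complete IDNA2008 validator:

```python
import unicodedata


def is_unstable(cp: str) -> bool:
    """Hypothetical sketch of the RFC 5892 'Unstable' rule:
    cp is unstable if toNFKC(toCaseFold(toNFKC(cp))) != cp.
    Only this one rule is modelled, not the whole derivation."""
    folded = unicodedata.normalize("NFKC", cp).casefold()
    return unicodedata.normalize("NFKC", folded) != cp


print(is_unstable("\u2026"))  # True: NFKC turns '…' into '...'
print(is_unstable("c"))       # False
```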
History
Date                 User            Action  Args
2022-04-11 14:58:01  admin           set     github: 65302
2014-03-30 20:50:37  r.david.murray  set     nosy: + r.david.murray; messages: + msg215199
2014-03-30 19:53:33  loewis          set     nosy: + loewis; messages: + msg215198
2014-03-30 17:57:53  chfoo           create