classification
Title: socket.create_connection error message for domain subpart with invalid length is very confusing
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.3, Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: giampaolo.rodola, loewis, mmilkin, r.david.murray
Priority: low Keywords: easy, patch

Created on 2010-08-25 18:08 by r.david.murray, last changed 2013-05-05 19:26 by mmilkin.

Files
File name Uploaded Description Edit
Issue9682.patch mmilkin, 2013-04-13 17:23 review
Issue9682-42213.patch mmilkin, 2013-04-23 01:04 review
Issue9682-5513.patch mmilkin, 2013-05-05 19:26 review
Messages (14)
msg114923 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-25 18:08
>>> socket.create_connection(('a..com', 25))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rdmurray/python/py3k/Lib/socket.py", line 300, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/home/rdmurray/python/py3k/Lib/encodings/idna.py", line 167, in encode
    result.extend(ToASCII(label))
  File "/home/rdmurray/python/py3k/Lib/encodings/idna.py", line 73, in ToASCII
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

I have two problems with this: why is it a UnicodeError?  (That confused me into going down a blind alley before finding my typo in the original context where I encountered this).  The other problem is the term 'label'.  I realize this is technically correct and precise, but I doubt most users will recognize it (I didn't remember what it meant until I looked it up).  Could we perhaps change it to 'domain name subpart'?

Note that in 2.x this gives 'name or service not known' unless the input string is unicode, in which case it gives the error above.  So in 2.x the UnicodeError was at least not totally dissociated from the cause of the error, but still strikes me as sub-optimal.  I would expect a ValueError.
msg115029 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-08-26 21:30
> why is it a UnicodeError? 

Because IDNA is an encoding, and codecs are supposed to raise UnicodeErrors. The string you are trying to encode is not supported in the encoding in question.

It would be possible to catch the UnicodeError, and re-raise it as a socket error in the socket module.

> I would expect a ValueError.

Notice that a UnicodeError *is* a ValueError.

> Could we perhaps change it to 'domain name subpart'?

No. As you point out, "label" is the established, wide-spread, technical, RFC-supported term for the thing in question. "domain name subpart" is an ad-hoc wording that nobody else uses. I fail to see the advantage.

It would be possible to include the actual label into the exception (perhaps truncated if it's too long).
msg115036 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-26 22:31
Ah, I wondered if it was something like that (encoding=>UnicodeError).  It's not really a *unicode* error, it's a syntax error in the domain name construction (ie: it is invalid whether or not unicode is involved, it just isn't caught in 2.x for non-unicode domain names), but oh well.

As for 'label' vs 'domain subpart', I've never heard anyone refer to the parts of a domain as 'labels' even in technical conversations, but as I think about it further I haven't heard them referred to at all.  So I withdraw that suggestion :)

Putting the label in the error message would be a nice enhancement, but definitely low priority.
msg115040 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-08-26 22:54
> Ah, I wondered if it was something like that
> (encoding=>UnicodeError).  It's not really a *unicode* error, it's a
> syntax error in the domain name construction (ie: it is invalid
> whether or not unicode is involved, it just isn't caught in 2.x for
> non-unicode domain names), but oh well.

It's perhaps dubious whether IDNA is an encoding at all. Punycode
certainly is, but IDNA only applies to (DNS) names. However, if
you take the stance that IDNA converts Unicode names into bytes
names, and is therefore an encoding, then this encoding is undefined
for certain Unicode strings. If the encoding is undefined, the exception
is a UnicodeError (unless there are errors in the code
of the codec itself, such as MemoryErrors, but that's not the case
here).
msg186770 - (view) Author: Mike Milkin (mmilkin) * Date: 2013-04-13 17:23
This patch adds better exception messages.  
If any label other then  the last is empty or is too long the request is added to the exception message.
If the last label is over 64, the label is added to the exception message
msg186811 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-04-13 19:51
Thanks, Mike.  I made some review comments (you should have gotten an email, or look at them via the 'review' link on the patch).
msg186863 - (view) Author: Mike Milkin (mmilkin) * Date: 2013-04-13 22:26
Made changes to to the tests and made changes to the error messages.

I think decode() is valid since the input is already ascii encoded.
msg186918 - (view) Author: Mike Milkin (mmilkin) * Date: 2013-04-14 13:21
I did not mean to take the decode out of the previos patch.  Sorry for the spam.
msg187448 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-04-20 18:09
The message in both branches of the if talk about empty labels, which is probably my fault since I got the sense of the if wrong in my suggestion.  One of them should be about the label being too long.  The one that should be the 'empty' message also doesn't read right to my eyes.  As I said I think it should be something like: '"empty label in %r" % result.decode()'.

Also, in looking at the module code, there are several other places where the size check and simple message are used.  In all of these cases the string has already been confirmed to be (or converted to, in the case of the punycoding) ASCII.  So we can abstract this check into a function and call it from all those locations.

Do you want to update the patch accordingly, Mike?  It will need more tests.
msg187504 - (view) Author: Mike Milkin (mmilkin) * Date: 2013-04-21 14:12
Sure ill modify the patch, thanks for the feedback.
msg187600 - (view) Author: Mike Milkin (mmilkin) * Date: 2013-04-23 01:02
There are a few of interesting parts. 
1.) I noticed that the ToASCII class is not tested.  
2.) I had some unreachable branches due to concatenation of constant variable ace_prefix.  
3) I also found it weird that we only check the max of labels[-1], but decided to perserve the original functionality.
msg187884 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-04-27 01:06
As I said in the review, thanks for working more on this.

For the reason for (3), look at the test just before the ones you added.
msg187906 - (view) Author: Mike Milkin (mmilkin) * Date: 2013-04-27 14:10
Dave, let me know what you think of the tests, ill fix the rest of your comments.
msg188454 - (view) Author: Mike Milkin (mmilkin) * Date: 2013-05-05 19:26
Moved the conditional logic out of the method.  There are no tests for ToASCII function and I was not comfortable making changes to it without adding tests.
History
Date User Action Args
2013-05-05 19:26:37mmilkinsetfiles: + Issue9682-5513.patch

messages: + msg188454
2013-04-27 14:10:13mmilkinsetmessages: + msg187906
2013-04-27 01:06:25r.david.murraysetmessages: + msg187884
2013-04-23 01:04:37mmilkinsetfiles: + Issue9682-42213.patch
2013-04-23 01:04:23mmilkinsetfiles: - Issue9682.patch
2013-04-23 01:03:24mmilkinsetfiles: - Issue9682-full.patch
2013-04-23 01:02:44mmilkinsetfiles: + Issue9682.patch

messages: + msg187600
2013-04-21 14:12:51mmilkinsetmessages: + msg187504
2013-04-20 18:09:09r.david.murraysetassignee: loewis ->
type: enhancement -> behavior
messages: + msg187448
versions: + Python 3.3, Python 3.4, - Python 3.2
2013-04-14 13:21:38mmilkinsetfiles: - Issue9682-full.patch
2013-04-14 13:21:01mmilkinsetfiles: + Issue9682-full.patch

messages: + msg186918
2013-04-13 22:26:49mmilkinsetfiles: + Issue9682-full.patch

messages: + msg186863
2013-04-13 19:51:14r.david.murraysetmessages: + msg186811
2013-04-13 17:23:37mmilkinsetfiles: + Issue9682.patch

nosy: + mmilkin
messages: + msg186770

keywords: + patch
2010-08-26 22:54:56loewissetmessages: + msg115040
title: socket.create_connection error message for domain subpart with invalid length is very confusing -> socket.create_connection error message for domain subpart with invalid length is very confusing
2010-08-26 22:31:52r.david.murraysetmessages: + msg115036
2010-08-26 21:30:54loewissetmessages: + msg115029
2010-08-26 17:43:50giampaolo.rodolasetnosy: + giampaolo.rodola
2010-08-26 13:25:11georg.brandlsetassignee: loewis
2010-08-25 18:08:12r.david.murraycreate