msg114923 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-08-25 18:08 |
>>> socket.create_connection(('a..com', 25))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/rdmurray/python/py3k/Lib/socket.py", line 300, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "/home/rdmurray/python/py3k/Lib/encodings/idna.py", line 167, in encode
result.extend(ToASCII(label))
File "/home/rdmurray/python/py3k/Lib/encodings/idna.py", line 73, in ToASCII
raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long
I have two problems with this: why is it a UnicodeError? (That confused me into going down a blind alley before finding my typo in the original context where I encountered this). The other problem is the term 'label'. I realize this is technically correct and precise, but I doubt most users will recognize it (I didn't remember what it meant until I looked it up). Could we perhaps change it to 'domain name subpart'?
Note that in 2.x this gives 'name or service not known' unless the input string is unicode, in which case it gives the error above. So in 2.x the UnicodeError was at least not totally dissociated from the cause of the error, but still strikes me as sub-optimal. I would expect a ValueError.
|
msg115029 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2010-08-26 21:30 |
> why is it a UnicodeError?
Because IDNA is an encoding, and codecs are supposed to raise UnicodeErrors. The string you are trying to encode is not supported in the encoding in question.
It would be possible to catch the UnicodeError, and re-raise it as a socket error in the socket module.
> I would expect a ValueError.
Notice that a UnicodeError *is* a ValueError.
> Could we perhaps change it to 'domain name subpart'?
No. As you point out, "label" is the established, wide-spread, technical, RFC-supported term for the thing in question. "domain name subpart" is an ad-hoc wording that nobody else uses. I fail to see the advantage.
It would be possible to include the actual label into the exception (perhaps truncated if it's too long).
|
msg115036 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-08-26 22:31 |
Ah, I wondered if it was something like that (encoding=>UnicodeError). It's not really a *unicode* error, it's a syntax error in the domain name construction (ie: it is invalid whether or not unicode is involved, it just isn't caught in 2.x for non-unicode domain names), but oh well.
As for 'label' vs 'domain subpart', I've never heard anyone refer to the parts of a domain as 'labels' even in technical conversations, but as I think about it further I haven't heard them referred to at all. So I withdraw that suggestion :)
Putting the label in the error message would be a nice enhancement, but definitely low priority.
|
msg115040 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2010-08-26 22:54 |
> Ah, I wondered if it was something like that
> (encoding=>UnicodeError). It's not really a *unicode* error, it's a
> syntax error in the domain name construction (ie: it is invalid
> whether or not unicode is involved, it just isn't caught in 2.x for
> non-unicode domain names), but oh well.
It's perhaps dubious whether IDNA is an encoding at all. Punycode
certainly is, but IDNA only applies to (DNS) names. However, if
you take the stance that IDNA converts Unicode names into bytes
names, and is therefore an encoding, then this encoding is undefined
for certain Unicode strings. If the encoding is undefined, the exception
is a UnicodeError (unless there are errors in the code
of the codec itself, such as MemoryErrors, but that's not the case
here).
|
msg186770 - (view) |
Author: Mike Milkin (mmilkin) * |
Date: 2013-04-13 17:23 |
This patch adds better exception messages.
If any label other then the last is empty or is too long the request is added to the exception message.
If the last label is over 64, the label is added to the exception message
|
msg186811 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2013-04-13 19:51 |
Thanks, Mike. I made some review comments (you should have gotten an email, or look at them via the 'review' link on the patch).
|
msg186863 - (view) |
Author: Mike Milkin (mmilkin) * |
Date: 2013-04-13 22:26 |
Made changes to to the tests and made changes to the error messages.
I think decode() is valid since the input is already ascii encoded.
|
msg186918 - (view) |
Author: Mike Milkin (mmilkin) * |
Date: 2013-04-14 13:21 |
I did not mean to take the decode out of the previos patch. Sorry for the spam.
|
msg187448 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2013-04-20 18:09 |
The message in both branches of the if talk about empty labels, which is probably my fault since I got the sense of the if wrong in my suggestion. One of them should be about the label being too long. The one that should be the 'empty' message also doesn't read right to my eyes. As I said I think it should be something like: '"empty label in %r" % result.decode()'.
Also, in looking at the module code, there are several other places where the size check and simple message are used. In all of these cases the string has already been confirmed to be (or converted to, in the case of the punycoding) ASCII. So we can abstract this check into a function and call it from all those locations.
Do you want to update the patch accordingly, Mike? It will need more tests.
|
msg187504 - (view) |
Author: Mike Milkin (mmilkin) * |
Date: 2013-04-21 14:12 |
Sure ill modify the patch, thanks for the feedback.
|
msg187600 - (view) |
Author: Mike Milkin (mmilkin) * |
Date: 2013-04-23 01:02 |
There are a few of interesting parts.
1.) I noticed that the ToASCII class is not tested.
2.) I had some unreachable branches due to concatenation of constant variable ace_prefix.
3) I also found it weird that we only check the max of labels[-1], but decided to perserve the original functionality.
|
msg187884 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2013-04-27 01:06 |
As I said in the review, thanks for working more on this.
For the reason for (3), look at the test just before the ones you added.
|
msg187906 - (view) |
Author: Mike Milkin (mmilkin) * |
Date: 2013-04-27 14:10 |
Dave, let me know what you think of the tests, ill fix the rest of your comments.
|
msg188454 - (view) |
Author: Mike Milkin (mmilkin) * |
Date: 2013-05-05 19:26 |
Moved the conditional logic out of the method. There are no tests for ToASCII function and I was not comfortable making changes to it without adding tests.
|
msg407337 - (view) |
Author: Irit Katriel (iritkatriel) * |
Date: 2021-11-29 23:46 |
Reproduced on 3.11:
Traceback (most recent call last):
File "/Users/iritkatriel/src/cpython-1/Lib/encodings/idna.py", line 165, in encode
raise UnicodeError("label empty or too long")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeError: label empty or too long
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/iritkatriel/src/cpython-1/Lib/socket.py", line 824, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/iritkatriel/src/cpython-1/Lib/socket.py", line 955, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:05 | admin | set | github: 53891 |
2021-11-29 23:46:48 | iritkatriel | set | nosy:
+ iritkatriel
messages:
+ msg407337 versions:
+ Python 3.9, Python 3.10, Python 3.11, - Python 3.3, Python 3.4 |
2013-05-05 19:26:37 | mmilkin | set | files:
+ Issue9682-5513.patch
messages:
+ msg188454 |
2013-04-27 14:10:13 | mmilkin | set | messages:
+ msg187906 |
2013-04-27 01:06:25 | r.david.murray | set | messages:
+ msg187884 |
2013-04-23 01:04:37 | mmilkin | set | files:
+ Issue9682-42213.patch |
2013-04-23 01:04:23 | mmilkin | set | files:
- Issue9682.patch |
2013-04-23 01:03:24 | mmilkin | set | files:
- Issue9682-full.patch |
2013-04-23 01:02:44 | mmilkin | set | files:
+ Issue9682.patch
messages:
+ msg187600 |
2013-04-21 14:12:51 | mmilkin | set | messages:
+ msg187504 |
2013-04-20 18:09:09 | r.david.murray | set | assignee: loewis -> type: enhancement -> behavior messages:
+ msg187448 versions:
+ Python 3.3, Python 3.4, - Python 3.2 |
2013-04-14 13:21:38 | mmilkin | set | files:
- Issue9682-full.patch |
2013-04-14 13:21:01 | mmilkin | set | files:
+ Issue9682-full.patch
messages:
+ msg186918 |
2013-04-13 22:26:49 | mmilkin | set | files:
+ Issue9682-full.patch
messages:
+ msg186863 |
2013-04-13 19:51:14 | r.david.murray | set | messages:
+ msg186811 |
2013-04-13 17:23:37 | mmilkin | set | files:
+ Issue9682.patch
nosy:
+ mmilkin messages:
+ msg186770
keywords:
+ patch |
2010-08-26 22:54:56 | loewis | set | messages:
+ msg115040 title: socket.create_connection error message for domain subpart with invalid length is very confusing -> socket.create_connection error message for domain subpart with invalid length is very confusing |
2010-08-26 22:31:52 | r.david.murray | set | messages:
+ msg115036 |
2010-08-26 21:30:54 | loewis | set | messages:
+ msg115029 |
2010-08-26 17:43:50 | giampaolo.rodola | set | nosy:
+ giampaolo.rodola
|
2010-08-26 13:25:11 | georg.brandl | set | assignee: loewis |
2010-08-25 18:08:12 | r.david.murray | create | |