Issue 9682: socket.create_connection error message for domain subpart with invalid length is very confusing

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/53891

classification

Title:	socket.create_connection error message for domain subpart with invalid length is very confusing
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 3.11, Python 3.10, Python 3.9

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	giampaolo.rodola, iritkatriel, loewis, mmilkin, r.david.murray
Priority:	low	Keywords:	easy, patch

Created on 2010-08-25 18:08 by r.david.murray, last changed 2022-04-11 14:57 by admin.

Files
File name	Uploaded	Description	Edit
Issue9682.patch	mmilkin, 2013-04-13 17:23		review
Issue9682-42213.patch	mmilkin, 2013-04-23 01:04		review
Issue9682-5513.patch	mmilkin, 2013-05-05 19:26		review

Messages (15)
msg114923 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-08-25 18:08
>>> socket.create_connection(('a..com', 25)) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/rdmurray/python/py3k/Lib/socket.py", line 300, in create_connection for res in getaddrinfo(host, port, 0, SOCK_STREAM): File "/home/rdmurray/python/py3k/Lib/encodings/idna.py", line 167, in encode result.extend(ToASCII(label)) File "/home/rdmurray/python/py3k/Lib/encodings/idna.py", line 73, in ToASCII raise UnicodeError("label empty or too long") UnicodeError: label empty or too long I have two problems with this: why is it a UnicodeError? (That confused me into going down a blind alley before finding my typo in the original context where I encountered this). The other problem is the term 'label'. I realize this is technically correct and precise, but I doubt most users will recognize it (I didn't remember what it meant until I looked it up). Could we perhaps change it to 'domain name subpart'? Note that in 2.x this gives 'name or service not known' unless the input string is unicode, in which case it gives the error above. So in 2.x the UnicodeError was at least not totally dissociated from the cause of the error, but still strikes me as sub-optimal. I would expect a ValueError.
msg115029 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-08-26 21:30
> why is it a UnicodeError? Because IDNA is an encoding, and codecs are supposed to raise UnicodeErrors. The string you are trying to encode is not supported in the encoding in question. It would be possible to catch the UnicodeError, and re-raise it as a socket error in the socket module. > I would expect a ValueError. Notice that a UnicodeError is a ValueError. > Could we perhaps change it to 'domain name subpart'? No. As you point out, "label" is the established, wide-spread, technical, RFC-supported term for the thing in question. "domain name subpart" is an ad-hoc wording that nobody else uses. I fail to see the advantage. It would be possible to include the actual label into the exception (perhaps truncated if it's too long).
msg115036 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-08-26 22:31
Ah, I wondered if it was something like that (encoding=>UnicodeError). It's not really a unicode error, it's a syntax error in the domain name construction (ie: it is invalid whether or not unicode is involved, it just isn't caught in 2.x for non-unicode domain names), but oh well. As for 'label' vs 'domain subpart', I've never heard anyone refer to the parts of a domain as 'labels' even in technical conversations, but as I think about it further I haven't heard them referred to at all. So I withdraw that suggestion :) Putting the label in the error message would be a nice enhancement, but definitely low priority.
msg115040 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-08-26 22:54
> Ah, I wondered if it was something like that > (encoding=>UnicodeError). It's not really a unicode error, it's a > syntax error in the domain name construction (ie: it is invalid > whether or not unicode is involved, it just isn't caught in 2.x for > non-unicode domain names), but oh well. It's perhaps dubious whether IDNA is an encoding at all. Punycode certainly is, but IDNA only applies to (DNS) names. However, if you take the stance that IDNA converts Unicode names into bytes names, and is therefore an encoding, then this encoding is undefined for certain Unicode strings. If the encoding is undefined, the exception is a UnicodeError (unless there are errors in the code of the codec itself, such as MemoryErrors, but that's not the case here).
msg186770 - (view)	Author: Mike Milkin (mmilkin) *	Date: 2013-04-13 17:23
This patch adds better exception messages. If any label other then the last is empty or is too long the request is added to the exception message. If the last label is over 64, the label is added to the exception message
msg186811 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2013-04-13 19:51
Thanks, Mike. I made some review comments (you should have gotten an email, or look at them via the 'review' link on the patch).
msg186863 - (view)	Author: Mike Milkin (mmilkin) *	Date: 2013-04-13 22:26
Made changes to to the tests and made changes to the error messages. I think decode() is valid since the input is already ascii encoded.
msg186918 - (view)	Author: Mike Milkin (mmilkin) *	Date: 2013-04-14 13:21
I did not mean to take the decode out of the previos patch. Sorry for the spam.
msg187448 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2013-04-20 18:09
The message in both branches of the if talk about empty labels, which is probably my fault since I got the sense of the if wrong in my suggestion. One of them should be about the label being too long. The one that should be the 'empty' message also doesn't read right to my eyes. As I said I think it should be something like: '"empty label in %r" % result.decode()'. Also, in looking at the module code, there are several other places where the size check and simple message are used. In all of these cases the string has already been confirmed to be (or converted to, in the case of the punycoding) ASCII. So we can abstract this check into a function and call it from all those locations. Do you want to update the patch accordingly, Mike? It will need more tests.
msg187504 - (view)	Author: Mike Milkin (mmilkin) *	Date: 2013-04-21 14:12
Sure ill modify the patch, thanks for the feedback.
msg187600 - (view)	Author: Mike Milkin (mmilkin) *	Date: 2013-04-23 01:02
There are a few of interesting parts. 1.) I noticed that the ToASCII class is not tested. 2.) I had some unreachable branches due to concatenation of constant variable ace_prefix. 3) I also found it weird that we only check the max of labels[-1], but decided to perserve the original functionality.
msg187884 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2013-04-27 01:06
As I said in the review, thanks for working more on this. For the reason for (3), look at the test just before the ones you added.
msg187906 - (view)	Author: Mike Milkin (mmilkin) *	Date: 2013-04-27 14:10
Dave, let me know what you think of the tests, ill fix the rest of your comments.
msg188454 - (view)	Author: Mike Milkin (mmilkin) *	Date: 2013-05-05 19:26
Moved the conditional logic out of the method. There are no tests for ToASCII function and I was not comfortable making changes to it without adding tests.
msg407337 - (view)	Author: Irit Katriel (iritkatriel) *	Date: 2021-11-29 23:46
Reproduced on 3.11: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython-1/Lib/encodings/idna.py", line 165, in encode raise UnicodeError("label empty or too long") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ UnicodeError: label empty or too long The above exception was the direct cause of the following exception: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/iritkatriel/src/cpython-1/Lib/socket.py", line 824, in create_connection for res in getaddrinfo(host, port, 0, SOCK_STREAM): ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/iritkatriel/src/cpython-1/Lib/socket.py", line 955, in getaddrinfo for res in _socket.getaddrinfo(host, port, family, type, proto, flags): ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

History
Date	User	Action	Args
2022-04-11 14:57:05	admin	set	github: 53891
2021-11-29 23:46:48	iritkatriel	set	nosy: + iritkatriel messages: + msg407337 versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.3, Python 3.4
2013-05-05 19:26:37	mmilkin	set	files: + Issue9682-5513.patch messages: + msg188454
2013-04-27 14:10:13	mmilkin	set	messages: + msg187906
2013-04-27 01:06:25	r.david.murray	set	messages: + msg187884
2013-04-23 01:04:37	mmilkin	set	files: + Issue9682-42213.patch
2013-04-23 01:04:23	mmilkin	set	files: - Issue9682.patch
2013-04-23 01:03:24	mmilkin	set	files: - Issue9682-full.patch
2013-04-23 01:02:44	mmilkin	set	files: + Issue9682.patch messages: + msg187600
2013-04-21 14:12:51	mmilkin	set	messages: + msg187504
2013-04-20 18:09:09	r.david.murray	set	assignee: loewis -> type: enhancement -> behavior messages: + msg187448 versions: + Python 3.3, Python 3.4, - Python 3.2
2013-04-14 13:21:38	mmilkin	set	files: - Issue9682-full.patch
2013-04-14 13:21:01	mmilkin	set	files: + Issue9682-full.patch messages: + msg186918
2013-04-13 22:26:49	mmilkin	set	files: + Issue9682-full.patch messages: + msg186863
2013-04-13 19:51:14	r.david.murray	set	messages: + msg186811
2013-04-13 17:23:37	mmilkin	set	files: + Issue9682.patch nosy: + mmilkin messages: + msg186770 keywords: + patch
2010-08-26 22:54:56	loewis	set	messages: + msg115040 title: socket.create_connection error message for domain subpart with invalid length is very confusing -> socket.create_connection error message for domain subpart with invalid length is very confusing
2010-08-26 22:31:52	r.david.murray	set	messages: + msg115036
2010-08-26 21:30:54	loewis	set	messages: + msg115029
2010-08-26 17:43:50	giampaolo.rodola	set	nosy: + giampaolo.rodola
2010-08-26 13:25:11	georg.brandl	set	assignee: loewis
2010-08-25 18:08:12	r.david.murray	create