Issue32958
Created on 2018-02-26 19:52 by ablack, last changed 2022-04-11 14:58 by admin.
Messages (10) | |||
---|---|---|---|
msg312947 - (view) | Author: Aaron Black (ablack) | Date: 2018-02-26 19:52 | |
While working on a custom conda channel with authentication, I ran into the following UnicodeError:

Traceback (most recent call last):
  File "/Users/ablack/miniconda3/lib/python3.6/site-packages/conda/core/repodata.py", line 402, in fetch_repodata_remote_request
    timeout=timeout)
  File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/sessions.py", line 521, in get
    return self.request('GET', url, **kwargs)
  File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/sessions.py", line 499, in request
    prep.url, proxies, stream, verify, cert
  File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/sessions.py", line 672, in merge_environment_settings
    env_proxies = get_environ_proxies(url, no_proxy=no_proxy)
  File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/utils.py", line 692, in get_environ_proxies
    if should_bypass_proxies(url, no_proxy=no_proxy):
  File "/Users/ablack/miniconda3/lib/python3.6/site-packages/requests/utils.py", line 676, in should_bypass_proxies
    bypass = proxy_bypass(netloc)
  File "/Users/ablack/miniconda3/lib/python3.6/urllib/request.py", line 2612, in proxy_bypass
    return proxy_bypass_macosx_sysconf(host)
  File "/Users/ablack/miniconda3/lib/python3.6/urllib/request.py", line 2589, in proxy_bypass_macosx_sysconf
    return _proxy_bypass_macosx_sysconf(host, proxy_settings)
  File "/Users/ablack/miniconda3/lib/python3.6/urllib/request.py", line 2562, in _proxy_bypass_macosx_sysconf
    hostIP = socket.gethostbyname(hostonly)
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

The error can be consistently reproduced when the first label of the URL hostname is longer than 63 characters, as in "0123456789012345678901234567890123456789012345678901234567890123.example.com" (a 64-character label). This wouldn't be a problem, except that the credentials do not seem to be separated from the first label of the hostname, so the entire "[user]:[secret]@XXX" section must be shorter than 64 characters.
This is problematic for services that use longer API keys and expect them to be submitted over basic auth. |
|||
msg313163 - (view) | Author: Ned Deily (ned.deily) * | Date: 2018-03-02 21:32 | |
Thanks for the report. The behavior you see can be further isolated to socket.gethostbyname:

>>> import socket
>>> h = "0123456789012345678901234567890123456789012345678901234567890123.example.com"
>>> socket.gethostbyname(h)
Traceback (most recent call last):
  File "/usr/lib/python3.6/encodings/idna.py", line 165, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

Other socket module calls accepting host names fail similarly, such as getaddrinfo. |
|||
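The failure above does not need a socket call to be reproduced. As a minimal sketch, assuming only the standard library's `idna` codec, the same check can be triggered by encoding the host name directly. `encodes_as_idna` is a hypothetical helper name introduced here for illustration, not part of any library.

```python
def encodes_as_idna(hostname: str) -> bool:
    """Return True if the stdlib 'idna' codec accepts the host name."""
    try:
        hostname.encode("idna")
    except UnicodeError:
        # Raised, among other cases, for empty labels and for labels
        # longer than 63 characters.
        return False
    return True


long_label = "0123456789" * 6 + "0123"  # 64 characters, one over the limit

print(encodes_as_idna(long_label + ".example.com"))       # rejected by the codec
print(encodes_as_idna(long_label[:63] + ".example.com"))  # accepted
```

Because the codec raises before any lookup happens, this check works without network access.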
msg313164 - (view) | Author: Aaron Black (ablack) | Date: 2018-03-02 21:46 | |
Just to be clear, I don't know if the socket needs to support 64 character long host name sections, so here's an example url that is at the root of my problem that I'm pretty sure it should support:

>>> import socket
>>> h = "username:long_api_key0123456789012345678901234567890123456789@www.example.com"
>>> socket.gethostbyname(h)
Traceback (most recent call last):
  File "/Users/ablack/miniconda3/lib/python3.6/encodings/idna.py", line 165, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long) |
|||
msg313323 - (view) | Author: Matt Eaton (agnosticdev) * | Date: 2018-03-06 12:35 | |
Using Ubuntu 16.04 with the 3.6.0 tag I was also able to reproduce the same error reported:

import socket
h = "0123456789012345678901234567890123456789012345678901234567890123.example.com"
socket.gethostbyname(h)

Traceback (most recent call last):
  File "/home/agnosticdev/Documents/code/python/python-dev/cpython-3_6_0/Lib/encodings/idna.py", line 165, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "host_test.py", line 8, in <module>
    socket.gethostbyname(h)
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

It looks like the first label of the hostname being 64 characters long is the issue, in that it cannot be encoded, so execution falls into the UnicodeError raised in idna.py:

    # ASCII name: fast path
    labels = result.split(b'.')
    for label in labels[:-1]:
        if not (0 < len(label) < 64):
            raise UnicodeError("label empty or too long")
    if len(labels[-1]) >= 64:
        raise UnicodeError("label too long")
    return result, len(input)

I did some work to try to resolve this, but ultimately it was not worth committing, so I wanted to report my findings. |
|||
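The fast path quoted from idna.py amounts to a per-label length check. A rough standalone sketch of that check, which could be used to validate a host name before handing it to the socket module, might look like the following. `MAX_LABEL_LENGTH` and `invalid_labels` are hypothetical names introduced here, and the sketch only covers ASCII names; it does not reproduce the codec's handling of non-ASCII input.

```python
MAX_LABEL_LENGTH = 63  # per-label limit from RFC 1035, section 2.3.1


def invalid_labels(hostname: str) -> list:
    """Return the labels that are empty or longer than MAX_LABEL_LENGTH."""
    labels = hostname.split(".")
    # A single trailing dot (fully qualified name) leaves one empty final
    # label; the quoted fast path does not reject it, so drop it here too.
    if labels and labels[-1] == "":
        labels = labels[:-1]
    return [label for label in labels if not 0 < len(label) <= MAX_LABEL_LENGTH]


host = "0123456789012345678901234567890123456789012345678901234567890123.example.com"
for label in invalid_labels(host):
    print(f"label of length {len(label)} is not allowed: {label}")
```

Returning the offending labels rather than raising lets the caller decide how to report the problem.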
msg372865 - (view) | Author: Steve Bowman (sdbowman) | Date: 2020-07-02 16:54 | |
When will this issue be fixed? Thanks! |
|||
msg373006 - (view) | Author: Joseph Hackman (joseph.hackman) * | Date: 2020-07-05 00:22 | |
According to the DNS standard, hostnames with more than 63 characters per label (the sections between .) are not allowed [https://tools.ietf.org/html/rfc1035#section-2.3.1]. That said, enforcing that at the codec level might be the wrong choice. I threw together a quick patch moving the limits up to 250, and nothing blew up. It's unclear what the general usefulness of such a change would be, since DNS servers probably couldn't handle those requests anyway. As for the original issue, if anybody is still doing something like that, could they provide a full example URL? I was unable to reproduce on HTTP (failed in a different place), or FTP. |
|||
msg374207 - (view) | Author: Aaron Black (ablack) | Date: 2020-07-24 19:35 | |
joseph.hackman I don't think that the 63 character limit on a label is the problem specifically, merely its application. The crux of my issue was that credentials passed with the URL in a basic-auth fashion (as some services require) count against the label length. For example, this would trigger the error:

h = "https://ablack:very_long_api_key_0123456789012345678901234567890123456789012345678901234567890123@www.example.com"

since the first label would be treated as:

"ablack:very_long_api_key_0123456789012345678901234567890123456789012345678901234567890123@www"

My specific issue goes away if any text up to and including an "@" in the first label section is excluded from the label validation. I don't know offhand whether that information is supposed to be part of the label per the DNS spec, though. |
|||
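To illustrate the distinction drawn above, here is a small sketch using the standard library's `urllib.parse.urlsplit`, which separates the userinfo from the host. The variable names are illustrative only, and the URL is the example from the message.

```python
from urllib.parse import urlsplit

url = (
    "https://ablack:very_long_api_key_"
    "0123456789012345678901234567890123456789012345678901234567890123"
    "@www.example.com"
)

parts = urlsplit(url)
print(parts.netloc)    # authority component, credentials included
print(parts.hostname)  # host only: www.example.com
print(parts.username)  # ablack

# Splitting the whole authority on "." counts the credentials against
# the first label; splitting only the host does not.
first_label_with_credentials = parts.netloc.split(".")[0]
first_label_of_host = parts.hostname.split(".")[0]
print(len(first_label_with_credentials), len(first_label_of_host))
```

Passing `parts.hostname` rather than `parts.netloc` to a resolver keeps the credentials out of the label-length check.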
msg391990 - (view) | Author: Alex Vandiver (alexmv) | Date: 2021-04-26 22:04 | |
It seems reasonable to fail on hostnames that are too long -- but it feels like the weirdness is that it is categorized as a UnicodeError, and not as, say, a ValueError. Would a re-categorization as ValueError seem like a reasonable adjustment here? |
|||
msg393323 - (view) | Author: Ben Darnell (Ben.Darnell) * | Date: 2021-05-09 15:57 | |
(I'm coming here from https://github.com/tornadoweb/tornado/pull/3010) UnicodeError is a subclass of ValueError, so I don't see what value that change would provide. The thing that's surprising to me is that it's not a `socket.herror` (or `gaierror` for socket.getaddrinfo). I guess the docs don't formally say that `herror`/`gaierror` is the *only* possible error from these functions, but `gaierror` was the only error I was catching so the unexpected UnicodeError escaped the layer that was intended to handle it. I do think that in the special case of `getaddrinfo` with the `AI_NUMERICHOST` flag it should be handled differently: in that mode there is no network access necessary and it's reasonable to assume that the only possible error is a `gaierror` with `EAI_NONAME`. I'd like to at least see better documentation about what errors are possible from this family of functions. |
|||
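One way a caller could get the behavior described above without any change to the standard library is to translate the UnicodeError at the call site. This is only a sketch: `getaddrinfo_gaierror` and its `resolver` parameter are hypothetical names, and the choice of `EAI_NONAME` is an assumption based on the comment above, not documented behavior.

```python
import socket


def getaddrinfo_gaierror(host, port, *args, resolver=socket.getaddrinfo, **kwargs):
    """Call getaddrinfo, reporting host-name encoding failures as gaierror."""
    try:
        return resolver(host, port, *args, **kwargs)
    except UnicodeError as exc:
        # The idna codec rejected the host name before any lookup happened,
        # so report it the way an unresolvable name would be reported.
        raise socket.gaierror(socket.EAI_NONAME, f"invalid host name: {exc}") from exc
```

With such a wrapper, code that only catches `gaierror` also handles over-long labels. The `resolver` parameter exists only so the translation can be exercised without network access.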
msg411539 - (view) | Author: Gregory P. Smith (gregory.p.smith) * | Date: 2022-01-25 00:36 | |
ablack: the basic auth username:password@ part of the string is not part of a hostname. What code are you seeing that is trying to send that to a name resolver rather than stripping the obviously private info up through the @ sign? |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:58 | admin | set | github: 77139 |
2022-01-25 00:36:47 | gregory.p.smith | set | nosy: + gregory.p.smith; messages: + msg411539 |
2021-05-09 15:57:56 | Ben.Darnell | set | nosy: + Ben.Darnell; messages: + msg393323 |
2021-04-26 22:04:10 | alexmv | set | nosy: + alexmv; messages: + msg391990 |
2020-11-15 04:15:55 | midopa | set | nosy: + midopa |
2020-07-24 19:35:40 | ablack | set | messages: + msg374207 |
2020-07-05 00:22:53 | joseph.hackman | set | nosy: + joseph.hackman; messages: + msg373006 |
2020-07-02 16:54:51 | sdbowman | set | nosy: + sdbowman; messages: + msg372865 |
2018-03-18 20:35:56 | ned.deily | set | nosy: - ned.deily |
2018-03-17 18:39:35 | r.david.murray | set | nosy: + r.david.murray |
2018-03-06 12:35:41 | agnosticdev | set | nosy: + agnosticdev; messages: + msg313323 |
2018-03-02 21:46:33 | ablack | set | messages: + msg313164 |
2018-03-02 21:32:44 | ned.deily | set | versions: + Python 3.7, Python 3.8; type: crash ->; nosy: + ned.deily; title: Urllib proxy_bypass crashes for urls containing long basic auth strings -> socket module calls with long host names can fail with idna codec error; messages: + msg313163; stage: needs patch |
2018-02-26 19:52:35 | ablack | create |