CVE-2019-18348: CRLF injection via the host part of the url passed to urlopen()
Author: Riccardo Schirone (rschiron) Date: 2019-10-24 07:51
Copy-pasted from

The commit b7378d77289c911ca6a0c0afaf513879002df7d5 is incomplete: it doesn't seem to check for control characters in the "host" part of the URL, only in the "path" part of the URL. Example:
try:
    from urllib import request as urllib_request  # Python 3
except ImportError:
    import urllib2 as urllib_request  # Python 2
import socket

def bug(*args):
    raise Exception(args)

# urlopen() must not call create_connection()
socket.create_connection = bug
urllib_request.urlopen('http://\r\n\x20hihi\r\n :11211')

The URL comes from the first message of this issue:

Development branches 2.7 and master produce a similar output:
Traceback (most recent call last):
Exception: (('\r\n hihi\r\n ', 11211), ..., None)

So urllib2/urllib.request actually makes a real network connection (a DNS query), whereas it should reject control characters in the "host" part of the URL.
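A minimal sketch of the kind of check that would reject such a host before any DNS lookup happens. The forbidden set (C0 control characters, space and DEL) is an assumption chosen for illustration, and `validate_host` and `InvalidHost` are hypothetical names, not CPython's internal API.

```python
import re

# Assumption: C0 control characters, space and DEL are never valid in a host.
_DISALLOWED_HOST_CHARS = re.compile(r"[\x00-\x20\x7f]")


class InvalidHost(ValueError):
    """Hypothetical exception for hostnames containing forbidden characters."""


def validate_host(host: str) -> str:
    """Return host unchanged, or raise InvalidHost before any network access."""
    match = _DISALLOWED_HOST_CHARS.search(host)
    if match:
        raise InvalidHost(
            f"host {host!r} contains a forbidden character {match.group()!r}"
        )
    return host


if __name__ == "__main__":
    print(validate_host("example.com"))
    try:
        validate_host("\r\n hihi\r\n ")
    except InvalidHost as exc:
        print("rejected:", exc)
```

Running the check before resolving the name means a malicious host never reaches the resolver or the request line, which is exactly the ordering the report asks for.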


A second problem comes into play. Some C libraries, such as glibc, truncate the hostname at the first newline character, so HTTP header injection is still possible in this case.
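The interaction can be modelled without touching the network. This is a hypothetical sketch, assuming the truncating behaviour described above: the resolver only sees the text before the first line break, while a client without host validation copies the whole host into the `Host` header. The helper names are illustrative, and 192.0.2.1 is a documentation-only address (RFC 5737), not one from the original report.

```python
import re


def vulnerable_resolver_view(host: str) -> str:
    """Model of a truncating resolver: keep text before the first CR or LF.

    Illustrative only; this is not glibc's actual code.
    """
    return re.split(r"[\r\n]", host, maxsplit=1)[0]


def naive_host_header(host: str, port: int) -> bytes:
    """Host header built without validating the host (illustrative only)."""
    return f"Host: {host}:{port}\r\n".encode("ascii")


host = "192.0.2.1\r\n\x20hihi\r\n "
port = 11211

# The resolver sees what looks like a plain IPv4 address ...
print(repr(vulnerable_resolver_view(host)))
# ... while the single header turns into several lines on the wire.
for line in naive_host_header(host, port).split(b"\r\n"):
    print(line)
```

Because name resolution succeeds, the connection is opened and the injected lines reach the server, which is why validating the host in the HTTP client matters even when the resolver "accepts" the name.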


According to RFC 3986, the "host" grammar doesn't allow any control characters; it looks like this:

   host          = IP-literal / IPv4address / reg-name

   ALPHA (letters)
   DIGIT (decimal digits)
   unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
   pct-encoded   = "%" HEXDIG HEXDIG
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="
   reg-name      = *( unreserved / pct-encoded / sub-delims )

   IP-literal    = "[" ( IPv6address / IPvFuture  ) "]"
   IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
   IPv6address   =                            6( h16 ":" ) ls32
                 /                       "::" 5( h16 ":" ) ls32
                 / [               h16 ] "::" 4( h16 ":" ) ls32
                 / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
                 / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
                 / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
                 / [ *4( h16 ":" ) h16 ] "::"              ls32
                 / [ *5( h16 ":" ) h16 ] "::"              h16
                 / [ *6( h16 ":" ) h16 ] "::"
   h16           = 1*4HEXDIG
   ls32          = ( h16 ":" h16 ) / IPv4address
   IPv4address   = dec-octet "." dec-octet "." dec-octet "." dec-octet
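The reg-name rule quoted above translates directly into a regular expression. This is a minimal sketch that covers only reg-name (an IPv4address such as 127.0.0.1 happens to match it too); IP-literal hosts in brackets are deliberately not handled, and `is_reg_name` is a hypothetical helper name.

```python
import re

# Character sets taken from the RFC 3986 rules quoted above.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# reg-name = *( unreserved / pct-encoded / sub-delims )
REG_NAME = re.compile(rf"(?:[{UNRESERVED}{SUB_DELIMS}]|{PCT_ENCODED})*")


def is_reg_name(host: str) -> bool:
    """True if host matches the RFC 3986 reg-name rule exactly."""
    return REG_NAME.fullmatch(host) is not None


for candidate in ["www.python.org", "caf%C3%A9.example", "\r\n hihi\r\n "]:
    print(repr(candidate), is_reg_name(candidate))
```

Since no control character, space or newline appears in unreserved, pct-encoded or sub-delims, any host containing CR/LF fails the grammar outright.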

CVE-2019-18348 was assigned to this flaw, which is similar to CVE-2019-9947 and CVE-2019-9740 but concerns the *host* part of a URL.
Author: Justin Capella (b1tninja) Date: 2019-11-20 13:52
I can't see the specifics of that "restricted" Red Hat bug, but this was an interesting bug, and I wanted to ask whether the domain in such cases should be IDN/punycoded: :// for example is ://💩.la
Author: Riccardo Schirone (rschiron) Date: 2019-11-25 15:38
The glibc issue mentioned in the first comment is CVE-2016-10739.
Author: Matej Cepl (mcepl) Date: 2020-02-20 21:41
Just to say: this is reproducible only on rather old enterprise Linux distributions where the glibc bug CVE-2016-10739 has not been fixed. I believe that means RHEL-6 and SUSE SLE-10, 11 and 12 (I'm not sure whether it applies to some old Debian releases as well).
Author: Gregory P. Smith (gregory.p.smith) Date: 2020-03-14 18:56
New changeset 9165addc22d05e776a54319a8531ebd0b2fe01ef by Ashwin Ramaswami in branch 'master':
bpo-38576: Disallow control characters in hostnames in http.client (GH-18995)
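With this change applied, constructing an `http.client.HTTPConnection` for a host that contains CR/LF should fail immediately with `http.client.InvalidURL`, before any socket is created. The sketch assumes Python 3 and uses only the public `HTTPConnection` and `InvalidURL` names; `host_is_rejected` is a hypothetical helper. On an interpreter predating the fix, the first call would return False instead.

```python
import http.client


def host_is_rejected(host: str) -> bool:
    """Return True if http.client refuses the host with InvalidURL.

    Creating the connection object only parses host and port; it does not
    open a socket, so this runs without network access.
    """
    try:
        http.client.HTTPConnection(host)
    except http.client.InvalidURL:
        return True
    return False


print(host_is_rejected("\r\n hihi\r\n :11211"))  # True on a patched interpreter
print(host_is_rejected("www.python.org:80"))
```

Placing the check in the connection constructor means every caller, urllib.request included, gets the protection without having to validate hosts itself.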
Author: Gregory P. Smith (gregory.p.smith) Date: 2020-03-14 19:02
Thanks for the PR Ashwin!
Author: miss-islington (miss-islington) Date: 2020-03-14 19:13
New changeset 34f85af3229f86c004a954c3f261ceea1f5e9f95 by Miss Islington (bot) in branch '3.7':
bpo-38576: Disallow control characters in hostnames in http.client (GH-18995)
Author: miss-islington (miss-islington) Date: 2020-03-14 19:13
New changeset ff69c9d12c1b06af58e5eae5db4630cedd94740e by Miss Islington (bot) in branch '3.8':
bpo-38576: Disallow control characters in hostnames in http.client (GH-18995)
Author: Ned Deily (ned.deily) Date: 2020-03-14 22:35
New changeset 83fc70159b24f5b11a5ef87c9b05c2cf4c7faeba by Miss Islington (bot) in branch '3.6':
bpo-38576: Disallow control characters in hostnames in http.client (GH-18995) (GH-19002)
Author: Gregory P. Smith (gregory.p.smith) Date: 2020-03-15 00:59
If anyone cares about 2.7, the *final* release is coming up in a few weeks. They'll need to figure out what the fix looks like there and get a 2.7 PR reviewed by the release manager.
Author: Gregory P. Smith (gregory.p.smith) Date: 2020-03-18 04:09
Marking as a 2.7 release blocker just to get Benjamin's attention as release manager (RM) before the final 2.7 release.
Author: Benjamin Peterson (benjamin.peterson) Date: 2020-03-19 01:35
New changeset e176e0c105786e9f476758eb5438c57223b65e7f by Matěj Cepl in branch '2.7':
[2.7] closes bpo-38576: Disallow control characters in hostnames in http.client. (GH-19052)
Author: Larry Hastings (larry) Date: 2020-06-20 06:44
New changeset 09d8172837b6985c4ad90ee025f6b5a554a9f0ac by Tapas Kundu in branch '3.5':
[3.5] closes bpo-38576: Disallow control characters in hostnames in http.client. (#19300)
