CVE-2013-2099 ssl.match_hostname() trips over crafted wildcard names #62180
Comments
If the name in the certificate contains many "*" characters, matching the compiled regular expression against the host name can take a very long time. Certificate validation happens before host name checking, so I think this is a minor issue only because it can only be triggered in cooperation with a CA (which seems unlikely). The fix is to limit the number of "*" wildcards to a reasonable maximum (perhaps even 1). |
Does the RFC say anything about this? How many wildcards are necessary to take up a significant amount of CPU time? |
The CVE identifier CVE-2013-2099 has been assigned to this issue. |
This is caused by the regex engine's performance behaviour: |
I would like to know what is the expected scenario:
The reason is that the matching cost for a domain name fragment seems to be O(n**k), where n is the fragment length and k is the number of wildcards. Therefore, if the attacker controls both n and k, even limiting k to 2 already allows a quadratic complexity attack. |
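The growth described above can be illustrated without timing anything. The following is a minimal sketch, not the `re` engine itself: `count_backtracking_steps` is a hypothetical helper that counts the recursive calls a naive backtracking matcher makes, assuming `*` matches any run of non-dot characters (including an empty run).

```python
def count_backtracking_steps(pattern, text):
    """Naive backtracking wildcard match (hypothetical helper).

    Returns (matched, steps), where steps is the number of recursive calls.
    '*' is assumed to match any run of characters that contains no dot.
    """
    steps = 0

    def match(pi, ti):
        nonlocal steps
        steps += 1
        if pi == len(pattern):
            return ti == len(text)
        if pattern[pi] == '*':
            # Try every possible length for this wildcard, shortest first.
            for end in range(ti, len(text) + 1):
                if '.' in text[ti:end]:
                    break  # a wildcard never crosses a dot
                if match(pi + 1, end):
                    return True
            return False
        # Literal character: must match exactly, then continue.
        return (ti < len(text)
                and pattern[pi] == text[ti]
                and match(pi + 1, ti + 1))

    return match(0, 0), steps


if __name__ == "__main__":
    host = 'a' * 40 + 'z'
    for pat in ('*a', '*a*a', '*a*a*a', '*a*a*a*a'):
        print(pat, count_backtracking_steps(pat, host))
```

With a fragment that almost matches but fails at the last character, each additional wildcard multiplies the number of steps by roughly the fragment length, which is consistent with the O(n**k) estimate.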
RFC 2818 doesn't say anything about the maximum number of wildcards. I'm going to check OpenSSL's implementation now. |
OpenSSL supports only a single wildcard character. In my tests, I used a host name like aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.example.org, and a dNSName like a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*.example.org. Quadratic behavior wouldn't be too bad because the host name is necessarily rather short (more than 255 characters will not pass through DNS). |
Indeed, two wildcards seem to be ok with a 255-character domain name:

$ ./python -m timeit -s "import ssl; cert = {'subject': ((('commonName', '*a*a.com'),),)}" "try: ssl.match_hostname(cert, 'a' * 250 +'z.com')" "except ssl.CertificateError: pass"
1000 loops, best of 3: 797 usec per loop

Three wildcards already start producing some load:

$ ./python -m timeit -s "import ssl; cert = {'subject': ((('commonName', '*a*a*a.com'),),)}" "try: ssl.match_hostname(cert, 'a' * 250 +'z.com')" "except ssl.CertificateError: pass"
10 loops, best of 3: 66.2 msec per loop

Four wildcards are more than enough for a DoS:

$ ./python -m timeit -s "import ssl; cert = {'subject': ((('commonName', '*a*a*a*a.com'),),)}" "try: ssl.match_hostname(cert, 'a' * 250 +'z.com')" "except ssl.CertificateError: pass"
10 loops, best of 3: 4.12 sec per loop |
Hmm, but the host name doesn't necessarily come from DNS, does it? |
The host name is looked up to get the IP address to connect to. The lookup will fail if the host name is longer than 255 characters, and the crafted certificate is never retrieved. |
I think a malicious user could abuse SNI to craft a longer host name and trigger the pathological case. |
In GnuTLS, _gnutls_hostname_compare() (lib/gnutls_str.c) uses a trivial recursive approach with a maximum number of 5 wildcards. |
Wildcard matching can easily be done in worst-case linear time, but not with regexps. doctest.py's internal _ellipsis_match() shows one way to do it (doctest can use "..." as a wildcard marker). |
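One way to match wildcards without backtracking over wildcard positions can be sketched as follows. This is only in the spirit of the approach mentioned above, not a copy of doctest's code: `fragment_matches` is a hypothetical name, it handles a single domain fragment, and it assumes `*` may match an empty run.

```python
def fragment_matches(pattern, fragment):
    """Match one domain fragment against a pattern with '*' wildcards.

    Hypothetical helper: anchors the first and last literal pieces, then
    locates each middle piece at its leftmost possible position, so no
    choice ever has to be undone.
    """
    if '*' not in pattern:
        return pattern == fragment

    pieces = pattern.split('*')
    first, last = pieces[0], pieces[-1]

    # The anchored prefix and suffix must fit without overlapping.
    if len(first) + len(last) > len(fragment):
        return False
    if not fragment.startswith(first) or not fragment.endswith(last):
        return False

    pos = len(first)                   # start of the unmatched region
    end = len(fragment) - len(last)    # end of the unmatched region
    for piece in pieces[1:-1]:
        idx = fragment.find(piece, pos, end)
        if idx < 0:
            return False
        pos = idx + len(piece)         # continue searching after this piece
    return True


if __name__ == "__main__":
    print(fragment_matches('*a*a*a*a', 'a' * 250 + 'z'))
    print(fragment_matches('www*', 'www1'))
```

Taking the leftmost occurrence of each middle piece is safe because it leaves the largest possible remainder for the pieces that follow.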
We could use an algorithm that doesn't need a regexp for most cases. Pseudo code:

    value = value.lower()
    hostname = hostname.lower()
    if '*' not in value:
        return value == hostname
    vparts = value.split(".")
    hparts = hostname.split(".")
    if len(vparts) != len(hparts):
        # * doesn't match a dot
        return False
    for v, h in zip(vparts, hparts):
        if v == "*":
            # match any host part
            continue
        asterisk = v.count("*")
        if asterisk == 0:
            if v != h:
                return False
        elif asterisk == 1:
            # match with simple re
        else:
            # don't support more than one * in a FQDN part
            raise TooManyAsterisk
    return True |
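The pseudo code above could be turned into something runnable roughly like this. It is a sketch under assumptions, not the patch that was committed: the names `dnsname_match` and `TooManyAsterisks` are hypothetical, and the single-wildcard case uses a prefix/suffix check in place of the "simple re".

```python
class TooManyAsterisks(ValueError):
    """Hypothetical error for a fragment with more than one '*'."""


def dnsname_match(value, hostname):
    """Match a certificate name against a host name, fragment by fragment."""
    value = value.lower()
    hostname = hostname.lower()
    if '*' not in value:
        return value == hostname

    vparts = value.split('.')
    hparts = hostname.split('.')
    if len(vparts) != len(hparts):
        # '*' never matches a dot, so the fragment counts must agree.
        return False

    for v, h in zip(vparts, hparts):
        if v == '*':
            continue  # matches any host part
        asterisks = v.count('*')
        if asterisks == 0:
            if v != h:
                return False
        elif asterisks == 1:
            # Assumption: a prefix/suffix check replaces the regexp here.
            prefix, suffix = v.split('*')
            if (len(h) < len(prefix) + len(suffix)
                    or not h.startswith(prefix)
                    or not h.endswith(suffix)):
                return False
        else:
            raise TooManyAsterisks(
                'more than one wildcard in fragment %r' % v)
    return True


if __name__ == "__main__":
    print(dnsname_match('*.example.org', 'www.example.org'))
    print(dnsname_match('*.example.org', 'a.b.example.org'))
```

Because a fragment can contain at most one wildcard, each fragment is checked in time proportional to its length.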
Thanks, this may be a nice enhancement for 3.4. For 3.2 and 3.3, I'd prefer to go the safe way of simply limiting the |
Here is a patch allowing at most 2 wildcards per domain fragment. Georg, do you think this should go into 3.2? |
It's certainly a security fix, but probably not one that warrants an immediate release. If you commit it to the 3.2 branch, that's fine, it will get picked up by coming releases. |
Indeed, doing this _without a regexp_ is preferred. :) |
Are multiple wildcards per fragment even specified? I'm unable to find information on whether the wildcard is supposed to be a greedy or a non-greedy match. By the way, Chromium does more fancy checks. For example, it requires * to match at least one character and it does special handling of IDN. See X509Certificate::VerifyHostname() around line 500. http://src.chromium.org/viewvc/chrome/trunk/src/net/cert/x509_certificate.cc |
I don't know the standard, but it sounds strange to have more than one wildcard per part of a URL. "*.*.*.google.com" looks valid to me, whereas "*a*a*a*.google.com" looks very suspicious. Said differently, I expect:

assert max(part.count("*") for part in url.split(".")) <= 1

The "*" pattern is replaced with the '[^.]+' regex, so it may not cause the exponential complexity issue. (I didn't check.) |
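The expectation stated above can be checked up front, before any matching is attempted. A minimal sketch, with hypothetical helper names and the limit of one wildcard per fragment taken as an assumption from this comment:

```python
def max_wildcards_per_fragment(dnsname):
    """Return the largest number of '*' found in any dot-separated part."""
    return max(part.count('*') for part in dnsname.split('.'))


def is_acceptable_pattern(dnsname, limit=1):
    """Hypothetical pre-check: reject names with too many '*' in one part."""
    return max_wildcards_per_fragment(dnsname) <= limit


if __name__ == "__main__":
    for name in ('*.*.*.google.com', '*a*a*a*.google.com'):
        print(name, is_acceptable_pattern(name))
```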
A non-greedy quantifier might also help, that is [^.]+?. |
SSL certificate hostname matching is defined in RFC 2818: It's not very verbose on how exactly matching should be done: """ Given that it's underspecified, I doubt that anyone using wildcards in certificates for valid purposes would risk using anything but very simple prefix/suffix matching - most certainly not any matching that would require backtracking to succeed. There are several variants out there of how the matching is done. |
Non-greedy matching actually makes things worse :-)

$ ./python -m timeit -s "import re; pat = re.compile('\A*a*a*a\Z'.replace('*', '[^.]+'), re.IGNORECASE)" "pat.match('a' * 100 +'z')"
100 loops, best of 3: 3.31 msec per loop

$ ./python -m timeit -s "import re; pat = re.compile('\A*a*a*a\Z'.replace('*', '[^.]+?'), re.IGNORECASE)" "pat.match('a' * 100 +'z')"
100 loops, best of 3: 6.91 msec per loop |
Florian, I'm actually surprised by your assertion that OpenSSL supports a single wildcard character. Last I looked, I couldn't find any hostname matching function in OpenSSL (which is why I had to write our own). Could you point me to the relevant piece of code? |
Antoine, support for OpenSSL host name matching is quite new: <http://www.openssl.org/docs/crypto/X509_check_host.html> |
libcurl supports a single wildcard for the whole domain name pattern (not even one per fragment), as per lib/hostcheck.c (this is when linked against OpenSSL; when linked against GnuTLS, curl will use the GnuTLS-provided matching function). Based on all the evidence, I think allowing one wildcard per fragment is sufficient. |
Ah, thanks. I was looking in 1.0.1e. |
Here's another long discussion about SSL hostname matching that may provide some useful insights: Note how RFC 2595 doesn't even allow substring matching. It only allows '*' to be used as a component. |
Attached patch forbidding more than one wildcard per fragment. |
I still think that a substring wildcard should not match the IDN "xn--" prefix. With the current code, the rule "x*.example.de" gives a positive match for "götter.example.de".

>>> u"götter.example.de".encode("idna")
'xn--gtter-jua.example.de' |
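The rule proposed in this comment could be sketched as below. This is an assumption about how such a check might look, not code from any patch: `partial_wildcard_matches` is a hypothetical helper that handles one fragment containing exactly one `*`, and it refuses to let a partial wildcard match a fragment carrying the "xn--" prefix.

```python
def partial_wildcard_matches(pattern_label, host_label):
    """Match one fragment against a pattern with exactly one '*'.

    Hypothetical rule: a wildcard embedded in a longer pattern (such as
    'x*') must not match an IDNA-encoded fragment starting with 'xn--'.
    A lone '*' is still allowed to match any fragment.
    """
    prefix, suffix = pattern_label.split('*')
    if pattern_label != '*' and host_label.startswith('xn--'):
        return False
    return (len(host_label) >= len(prefix) + len(suffix)
            and host_label.startswith(prefix)
            and host_label.endswith(suffix))


if __name__ == "__main__":
    # "g\u00f6tter" is "götter"; the idna codec yields the ASCII form.
    host = "g\u00f6tter.example.de".encode("idna").decode("ascii")
    label = host.split('.')[0]
    print(label, partial_wildcard_matches('x*', label))
```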
You should open a separate issue for this (possibly with a patch). |
The IDNA RFC contains additional rules for wildcard matching ... very well hidden indeed! |
New changeset b9b521efeba3 by Antoine Pitrou in branch '3.2':
New changeset c627638753e2 by Antoine Pitrou in branch '3.3':
New changeset fafd33db6ff6 by Antoine Pitrou in branch 'default': |
Ok, this should be fixed now. Thanks a lot for reporting! |