This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: hmac.secure_compare() leaks information about length of strings
Type: security Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Jon.Oberheide, alex, christian.heimes, fijall, georg.brandl, gregory.p.smith, hynek, loewis, ncoghlan, petri.lehtinen, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: needs review, patch

Created on 2012-06-13 23:00 by christian.heimes, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
secure_compare_length.patch christian.heimes, 2012-06-13 23:00 review
timingsafe.h christian.heimes, 2012-06-21 16:24
timingsafe_cmp.patch christian.heimes, 2012-06-21 22:40 review
timingsafe_cmp-2.patch christian.heimes, 2012-06-23 14:20 review
compare_digest_c.patch christian.heimes, 2012-06-23 17:57 review
compare_digest_c-2.patch christian.heimes, 2012-06-24 00:07 review
Messages (96)
msg162739 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-13 23:00
The secure_compare() function immediately returns False when both strings don't have equal length. With the patch the run time of secure_compare() always depends on the length of the right side. It no longer gives away information about the length of the left side.

The patch should be applied in combination with the patch in issue #14955.
msg162758 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2012-06-14 08:33
secure_compare leaks the password always. Note that it takes different time to create a result of ord() depending whether it's <=100 or > 100 due to caching of small numbers. Such functions should be written in C.
msg162759 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-06-14 09:10
We should. Adding secure functions that aren't really secure is something we should rather avoid. :)

Christian, are you willing to do that?
msg162760 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2012-06-14 09:18
fijal: while I agree with you, the limit for small ints has actually been pushed to 257 in recent CPythons.  So it should still theoretically work --- of course, assuming a predictable CPU, which is wrong, and assuming a simple interpreter.  (We can probably dig enough to find a timing issue even with CPython, and of course it's clear with any of the other Python interpreters out there.)
msg162761 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-14 09:19
I don't see how the function is going to leak this information when both this patch and the patch in #14955 are applied. With http://bugs.python.org/file25801/secure-compare-fix-v2.patch ord() is no longer used and thus avoid the timing difference for integers > 256 (NSMALLPOSINTS is defined as 257, not 100).
msg162762 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2012-06-14 09:22
Ah unicodes. is encode('unicode-internal') independent on the string characters? I heavily doubt so. you leak at least some information through that function alone.
msg162763 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-14 09:31
With PEP 393 unicode objects can have several representations, which makes it unlikely that *really* constant-timing functions can be devised.

Speaking about this particular patch, I don't understand the point. secure_compare() is obviously meant to be used on inputs of some known length, not arbitrary data.
msg162764 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-14 09:55
IMHO it's not obvious to all users. Better safe than sorry. ;)

The invariant 'known and equal length' impresses an artificial limitation. Code may need to compare outside data with internal data without exposing too many details about the structure of the internal data.

The other matter should be discussed at #14955.
msg162765 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-06-14 10:02
I don’t want to be the killjoy but I find it highly questionable to add a function that is advertised as "secure" while we can't fully grok the complexities at play. If we can't produce a provable secure one, we should scrub the function for good; or at least rename it somehow.

Unjustified perceived security is IMHO the worst possible case.
msg162766 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-14 10:10
> I don’t want to be the killjoy but I find it highly questionable to
> add a function that is advertised as "secure" while we can't fully
> grok the complexities at play. If we can't produce a provable secure
> one, we should scrub the function for good; or at least rename it
> somehow.

The function is probably secure (modulo unseen bugs) in the
bytestrings-of-the-same-size case. To make it "provably" secure, we
could write a C version (which would be quite easy).

For unicode strings things are a bit trickier though. Again, a C version
could provide some guarantees (and could raise an error if the passed
unicode strings use a different representation from each other).
msg162767 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2012-06-14 10:13
Antoine, seriously? You want to explore a function that's called "secure" when the only thing you know about it is "probably secure"? This is extremely tricky business and I think it should be called secure only if you can prove it's secure. Otherwise it's plain insecure and should not be named that.
msg162768 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2012-06-14 10:15
export not explore. Why can't I edit my own post?
msg162769 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-14 10:18
> Antoine, seriously? You want to explore a function that's called
> "secure" when the only thing you know about it is "probably secure"?
> This is extremely tricky business and I think it should be called
> secure only if you can prove it's secure. Otherwise it's plain
> insecure and should not be named that.

What's the methodology to "prove" that it's secure?

We could rename "secure" to "safe" to downtone it a bit, but it's still
an improvement on the nominal equality comparison.
msg162770 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2012-06-14 10:27
For unicode at the very least it's not an improvement at all. With the patch mentioned that does encode it's also not an improvement at all. Prove as in reason about the function in C and make sure it does not do any conditionals depending on the input data. This is much easier than it is in Python.

We did this exercise for PyPy once, just for the sake of it. We looked at generated IR and made sure a comparison is not leaking any data.

As far as the function goes right now - I don't know. For now following the entire code of long_bitwise is a lot of effort - I genuinely can't say that it'll be the same for all numbers of 0-255. Can you? It's easier with low-level language simply (And yes, this is one of the few cases where I would argue it makes sense to implement something in C :)
msg162773 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-14 11:17
I recommend to revert the addition of the function, given that it can't be made secure.
msg162775 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-14 12:01
I've two suggestions:

* rename the function to 'total_compare'. The name explains what the function actually does in comparison to '=='. It takes the total input values into account instead of using short circuit comparison.

* restrict the function to bytes. The implementation works well with bytes but not with unicode. It's not an issue since hash digests are bytes. The docs could explain the issue with unicode and how user code can implement a reasonable safe conversion.
msg162777 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2012-06-14 12:16
Hi Christian. It's either secure or it's not. If it's not, there is no point in introducing it at all as I don't think it's a good idea to have a kind-of-secure-but-i-dont-know functions in stdlib.

If you restrict input to bytes it looks okish, but I looked at all the code that's invoked on the C side and it's quite a lot of code. Does you or anyone else actually go and review all the C code that's called via various operations to check if it does or does not depend on the value of various characters? I can't tell myself, it's too long.
msg162778 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-14 12:26
> It's either secure or it's not.

I don't think that's true. By that reasoning, Python is not secure so there's no point in fixing crashes or providing a hashlib module.

That said, I think renaming to "total_compare" isn't really helpful. The point of the function is to be secure (as much as possible), not to do a "total" comparison.
msg162838 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-06-15 01:37
Maciej, please read http://mjg59.dreamwidth.org/13061.html

"Secure" vs "not secure" is not a binary state - it's about making attacks progressively more difficult. Something that is secure against a casual script kiddie scatter gunning attacks on various sites with an automated script won't stand up to a systematic attack from a motivated attacker (also see the reporting on Flame and Stuxnet for what a *really* motivated and well resourced attacker can achieve).

The hash randomisation changes didn't make Python completely secure against hashing DoS attacks - it just made them harder, by requiring attackers to figure out the hashing seed for the currently running process first. It's protecting against scatter gun attacks, not targeted ones.

Performing a timing attack on Python's default short-circuiting comparison operation is *relatively easy* because the timing variations can be so large (send strings of increasing length until the time stops increasing - now you know the target digest length. Then try various initial characters until the time starts increasing again - now you know the first character. Repeat for the last character, then start with the second character and work through the string. Now you have the target hash, which you can try to crack offline at your leisure.

The new comparison function is designed to significantly *reduce* the variance, thus leaking *less* information about the target hash, and making the attack *harder* (albeit, probably still not impossible).

I agree with Christian's last two suggestions: change the name to total_compare, and only allow use on byte sequences (where the integer values are certain to be cached).

Nothing should ever be called "safe" or "secure" in the standard library, because the immediate question from anyone that knows what they're talking about is "Secure/safe against what level of threat and what kind of threat"? People that *don't* know what they're doing will think "Secure/safe against everything" and that's dangerously misleading.

Improving protection against remote timing attacks (e.g. by reducing comparison timing variance to the point where it is small relative to network timing variance) is a *lot* easier than protecting against local timing attacks. Protecting against someone with physical access to the machine hosting the target hash is harder still.

Regardless, the target needs to be *improving the status quo*.

Being able to tell people "using hmac.total_compare will make you less vulnerable to timing attacks than using ordinary short circuiting comparisons" is a *good thing*. We just need to be careful not to oversell it as making you *immune* to timing attacks.
msg162845 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-06-15 06:19
> "Secure" vs "not secure" is not a binary state - it's about making attacks progressively more difficult. Something that is secure against a casual script kiddie scatter gunning attacks on various sites with an automated script won't stand up to a systematic attack from a motivated attacker (also see the reporting on Flame and Stuxnet for what a *really* motivated and well resourced attacker can achieve).

The problem here is, that _if_ you add a "secure" to the name of a method, it becomes binary. At least in the minds of the users. I know you address that, but that's the main point here.

> Regardless, the target needs to be *improving the status quo*.
> 
> Being able to tell people "using hmac.total_compare will make you less vulnerable to timing attacks than using ordinary short circuiting comparisons" is a *good thing*. We just need to be careful not to oversell it as making you *immune* to timing attacks.

Why not write a C function which can be more secure than Python code? I would argue that would be an general asset for the stdlib, not just for HMAC (therefore, I'd put it elsewhere).
msg162846 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-15 06:31
On 14.06.2012 14:26, Antoine Pitrou wrote:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
>> It's either secure or it's not.
> 
> I don't think that's true. By that reasoning, Python is not secure so
> there's no point in fixing crashes or providing a hashlib module.

The proper statement is "It's either time-independent or it's not".
This *is* a binary state (I agree that being secure is not a binary
state).
msg162847 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-15 06:37
> Being able to tell people "using hmac.total_compare will make you
> less vulnerable to timing attacks than using ordinary short
> circuiting comparisons" is a *good thing*.

No, it's not. It's a *bad thing*. The two issues that have been
opened since the function was first submitted indicate that people
will keep inspecting the code and find out that it's not
time-independent. If they had been relying on that it is, they will
be upset. Since it's inherently impossible to make the function
time-independent, people will be constantly annoyed about this function.
I can't find anything good in that.

If nobody else does, I'll revert the addition before the beta. Note
that there is no *actual* issue that is being resolved by this function;
it was added only because of its cuteness value.
msg162848 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-15 06:40
> Why not write a C function which can be more secure than Python code?

For Unicode strings, it's impossible to write a time-independent
comparison function even in C

> I would argue that would be an general asset for the stdlib

I would argue that it's not. No actual use case for this function
has been demonstrated so far.
msg162850 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-06-15 07:08
>> Why not write a C function which can be more secure than Python code?
> For Unicode strings, it's impossible to write a time-independent
> comparison function even in C

Really? Some comments sounded different. That's too bad but also what I suspected in the first place – it seems to complex.

However, this function seems only useful to bytes anyway so why not strip it down if it _is_ possible with bytes? Am I missing something?

>> I would argue that would be an general asset for the stdlib
> I would argue that it's not. No actual use case for this function
> has been demonstrated so far.

Well, one example: https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py and any other place that compares passwords, tokens, …
msg162852 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-06-15 07:28
Can people please stop raising a false dichotomy and using that as an excuse not to do anything?

The decision is not between "leak some information" and "leak no information". It is between "leak more information" and "leak less information".

The timing variations with standard comparison are relatively massive and relatively easy to analyse (if the time taken goes up, you got the previous digit correct).

With this comparison, they're far more subtle and require much greater analysis to figure out the significance of the timing changes. That reduces the pool of attackers to those capable of performing that analysis (or in possession of tools that will perform that analysis for them).

Yes, the docs and name are currently completely unacceptable. But scorched earth is not a good answer, because that just means people will fall back to using "==" which is *even worse* from a security point of view.
msg162853 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-15 07:38
> Well, one example:
> https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py 

It says that it needs that, but I fail to understand why.
pbkdf2 is used to generate encryption keys from passwords, where
you don't need to compare strings at all. Instead, you derive a
key from the password, and use the key e.g. for AES encryption.

If you use pdkdf2 for password hashing, then you do need a comparison
function, but it's irrelevant whether that is time-independent. If an
attacker was able to determine that his hash brings him close to the
actual hash, this is no gain in cracking - since similar hashes do
not at all mean that the passwords are similar.

> and any other place that compares passwords, tokens, …

No no no. Any sensible place to compare passwords would use some
sort of one-way function (password hash) before the comparison,
so that someone breaking into the machine will not gain the clear
text passwords. As a side effect, timing attacks become futile,
since hash functions provide confusion and diffusion, so if a
timing attack detects that it found a key that hashes similar to
the real key, that doesn't get it any closer to revealing the
real key.
msg162855 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-06-15 07:41
To repeat, the specific feature being proposed for retention is:

* a function called hmac.total_compare() that is clearly documented as being still vulnerable to timing analysis given a sufficiently sophisticated attacker, while still being more resistant to such analysis than the standard comparison operator

* restricting that function to operating on bytes, to eliminate timing variations associated with encoding/decoding of Unicode text and reduce those associated with the calculation of integer values

Leaking less information on each comparison is intended to increase the effectiveness of higher level timing attack countermeasures (such as rate limiting and lockouts). Anyone that would use "hmac.total_compare" and call it done is likely using ordinary comparison today (which is even worse).
msg162856 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-15 07:45
> The timing variations with standard comparison are relatively massive
> and relatively easy to analyse (if the time taken goes up, you got
> the previous digit correct).

If you have an application that is vulnerable to such an attack, you
better reconsider your entire approach, rather than using a library
function that says it will harden your approach (but may
actually not).

> Yes, the docs and name are currently completely unacceptable. But
> scorched earth is not a good answer, because that just means people
> will fall back to using "==" which is *even worse* from a security
> point of view.

It's not scorched earth. It's the standard procedure for adding features
to the standard library. *At a minimum* there needs to be a use case,
which has not been demonstrated yet (the OP of the original report
thought he had a use case, but then agreed that he was wrong). Then,
the use case should be fairly relevant for inclusion in the standard
library. I wish there was an AES implementation in the library before
we start worrying about stuff like this. Then, ideally, the new library
function has been in wide use for some time.

This was rushed, and it needs to be reverted.
msg162857 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-15 07:47
> To repeat, the specific feature being proposed for retention is:

To repeat, no use case has been demonstrated for that function. It
has been added because it was fun to write, not because it is useful.
msg162858 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2012-06-15 07:49
On Fri, Jun 15, 2012 at 9:41 AM, Nick Coghlan <report@bugs.python.org>wrote:

>
> Nick Coghlan <ncoghlan@gmail.com> added the comment:
>
> To repeat, the specific feature being proposed for retention is:
>
> * a function called hmac.total_compare() that is clearly documented as
> being still vulnerable to timing analysis given a sufficiently
> sophisticated attacker, while still being more resistant to such analysis
> than the standard comparison operator
>
> * restricting that function to operating on bytes, to eliminate timing
> variations associated with encoding/decoding of Unicode text and reduce
> those associated with the calculation of integer values
>
> Leaking less information on each comparison is intended to increase the
> effectiveness of higher level timing attack countermeasures (such as rate
> limiting and lockouts). Anyone that would use "hmac.total_compare" and call
> it done is likely using ordinary comparison today (which is even worse).
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue15061>
> _______________________________________
>

Nick, I fail to understand why are you opposing writing such a function in
C. Such a function can be provably time-independent (and as MvL says this
is a binary state), at least as long as it operates on bytes (I'll refrain
from asking about unicode, I think it's possible, but I dunno).

For the same function in python it's at the very least much harder to prove
(and has bugs as we've seen)

Cheers,
fijal
msg162859 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2012-06-15 07:50
On Fri, Jun 15, 2012 at 9:47 AM, Martin v. Löwis <report@bugs.python.org>wrote:

>
> Martin v. Löwis <martin@v.loewis.de> added the comment:
>
> > To repeat, the specific feature being proposed for retention is:
>
> To repeat, no use case has been demonstrated for that function. It
> has been added because it was fun to write, not because it is useful.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue15061>
> _______________________________________
>

Is comparing passwords against a secure one not useful?
msg162860 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-06-15 07:55
>> and any other place that compares passwords, tokens, …
> 
> No no no. Any sensible place to compare passwords would use some
> sort of one-way function (password hash) before the comparison,
> so that someone breaking into the machine will not gain the clear
> text passwords.

I agree that this is the right way to do. However I disagree that it's also the only sensible way to do in the real world. Sometimes you just _have_ to compare sensitive strings, whether you like it or not.

I see your point that adding such a function would leverage bad security behavior and thus may be a bad thing. The usefulness of such a function to some(?) people is IMHO not disputable though.
msg162861 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2012-06-15 07:58
On Fri, Jun 15, 2012 at 9:55 AM, Hynek Schlawack <report@bugs.python.org>wrote:

>
> Hynek Schlawack <hs@ox.cx> added the comment:
>
> >> and any other place that compares passwords, tokens, …
> >
> > No no no. Any sensible place to compare passwords would use some
> > sort of one-way function (password hash) before the comparison,
> > so that someone breaking into the machine will not gain the clear
> > text passwords.
>
> I agree that this is the right way to do. However I disagree that it's
> also the only sensible way to do in the real world. Sometimes you just
> _have_ to compare sensitive strings, whether you like it or not.
>
> I see your point that adding such a function would leverage bad security
> behavior and thus may be a bad thing. The usefulness of such a function to
> some(?) people is IMHO not disputable though.
>

Note that this does not relief you from using a time-independent comparison
function. If you call some hash function (which time is known to the
attacker), then you compare it against a stored hashed version. If you use
a normal compare you're leaking the hash. This is indeed not as bad as
leaking the password, but it has been demonstrated that one-direction
functions are still vulnerable to some sort of attacks, so it's not ideal
either.
msg162862 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-06-15 07:59
This point was discussed in #14532 when the new API was added.

From http://bugs.python.org/issue14532#msg158045:

"""Given that this issue has affected a lot of security-sensitive third-party code (keyczar, openid providers, almost every python web service that implements "secure cookies" [1] or other HMAC-based REST API signatures), I do like the idea of adding a warning in the relevant documentation as sbt proposed.

The only reason I'd recommend _not_ putting a time_independent_comparison() function in the hmac module is that it's not really HMAC-specific. In practice, any fixed-length secrets should be compared in a time-independent manner. It just happens that HMAC verification is a pretty common case for this vulnerable construct. :-)"""

For password hashing, the attacker is unlikely to be able to provide the digest directly, but for signature checking it's far more likely to be the case.

The idea is to make it easy for people to reduce the time variance of their digest comparisons as the *default* choice when writing security related code. Deciding whether or not the risk of a timing attack is actually significant requires you to look at the system as a whole and decide "Oh, OK, shortcircuiting comparison doesn't leave us open to timing analysis here, we can use it as a performance enhancement". (Although, in most systems, there are likely to be plenty of other less sensitive places to go after for performance improvements first)
msg162863 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-06-15 08:03
I'm not really opposed to writing it in C - I just don't think rewriting it in C should be a requirement for keeping it. Even in pure Python, it still leaks less information than the standard comparison operator.
msg162864 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-15 08:03
> Is comparing passwords against a secure one not useful?

I claim that this use case doesn't occur in practice. Everybody uses
hashed passwords. If they do compare against a plain-text password,
and they want to change something about it, they should switch to
hashed passwords.
msg162865 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-15 08:06
> I see your point that adding such a function would leverage bad
> security behavior and thus may be a bad thing. The usefulness of such
> a function to some(?) people is IMHO not disputable though.

I think this entire issue is out of scale. There is really bigger fish
to fry. The two people who really need this should post a package to
PyPI.
msg162866 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-15 08:09
> Note that this does not relief you from using a time-independent comparison
> function. If you call some hash function (which time is known to the
> attacker), then you compare it against a stored hashed version. If you use
> a normal compare you're leaking the hash. This is indeed not as bad as
> leaking the password, but it has been demonstrated that one-direction
> functions are still vulnerable to some sort of attacks, so it's not ideal
> either.

But you don't leak the hash - you leak the first byte of the hash if you
make 256 tries, and the first two bytes if you make 65536 tries. To leak
the first four bytes of the hash, you need to make 2**32 tries.
So this is equivalent to a brute-force attack, which works just as well
against a time-independent function. So using a time-independent
function does not add any security.
msg162867 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2012-06-15 08:12
On Fri, Jun 15, 2012 at 10:09 AM, Martin v. Löwis <report@bugs.python.org>wrote:

>
> Martin v. Löwis <martin@v.loewis.de> added the comment:
>
> > Note that this does not relief you from using a time-independent
> comparison
> > function. If you call some hash function (which time is known to the
> > attacker), then you compare it against a stored hashed version. If you
> use
> > a normal compare you're leaking the hash. This is indeed not as bad as
> > leaking the password, but it has been demonstrated that one-direction
> > functions are still vulnerable to some sort of attacks, so it's not ideal
> > either.
>
> But you don't leak the hash - you leak the first byte of the hash if you
> make 256 tries, and the first two bytes if you make 65536 tries. To leak
> the first four bytes of the hash, you need to make 2**32 tries.
> So this is equivalent to a brute-force attack, which works just as well
> against a time-independent function. So using a time-independent
> function does not add any security.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue15061>
> _______________________________________
>

Martin, you fail to understand how this works. You don't do 2**32 tries to
leak the 4 charaters, you need 4 * 256, that's why this attack is so bad,
because the time needed for the next character is brute force, but then you
can move on to the next one.
msg162868 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-15 08:13
> For password hashing, the attacker is unlikely to be able to provide
> the digest directly, but for signature checking it's far more likely
> to be the case.

Can you elaborate? What is the application, where is the digest
checking, and what is the threat?
msg162871 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-15 08:18
> Martin, you fail to understand how this works. You don't do 2**32 tries to
> leak the 4 charaters, you need 4 * 256, that's why this attack is so bad,
> because the time needed for the next character is brute force, but then you
> can move on to the next one.

How so? Assume we have a hashed password, and assume we have somehow
guessed the first three bytes. How can I then find out the fourth byte
in only 256 tries?

I would have to generate passwords whose *hash* matches in the first
three bytes. This is not feasible, for any hash function that is worth
its salt.
msg162872 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-06-15 08:32
That's why the vulnerable cases are far more likely to be related to *signature* checking. In those you can generally provide both the hash input (the message) and the hash target (the purported "signature").

If the signature check uses a time-dependent comparison that exhibits a lot of externally visible variance, then you can use a timing attack to find the signature that corresponds to a particular message (by keeping the message constant and changing the "signature"). Depending on the nature of the message, you're potentially done at that point (since on your final attempt your signed message was accepted), or else you may be after data that you can feed into an analysis aimed at breaking the signing key itself (a much harder prospect, but still possible given a sufficiently large sample, or a signing algorithm that is vulnerable to leaking the key as a result of chosen plaintext attacks).

Yes, system level defences are also important (that's why multiprocessing turned out to not, in fact, be vulnerable to an attack based on time dependent signature comparisons), but minimising information leakage is just a good principle of secure design.
msg162873 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2012-06-15 08:36
For example, Django uses time independent comparison to compare signatures of signed cookies. A signed cookie consists of a plain-text value followed by a signature.

An attacker wants to construct a cookie that has a malformed value and a valid signature for that value. Let's assume that a signature is a string of 16 hex characters.

If a short-cut comparison was used, the attacker would require at most 16 tries to find out the first character. He first tries the signature "000...0", then "100...0", and so on until he notices that Django takes a slightly longer time to respond. Now he know what's the first character of the hash, let's assume it's "8". He then tries "8000...0", "810...0", and so on until he finds the second character. He continues this until he has the correct 16 characters. This takes at most 16 * 16 tries.

But because Django uses a constant-time comparison function, the attacker cannot guess one character at a time, and he needs 16 ** 16 tries.

In real world, 16 * 16 tries is not enough, of course. But repeating the same requests many times, the timing variations can be used to reveal which is the correct character in each step.
msg162875 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-06-15 08:42
FWIW, Petri's example also explains why leaking the expected length of the string is considered an acceptable optimisation in most reimplementations of this signature check comparison: the attacker is assumed to already know the expected length of the signature, because it's part of a documented protocol or API.

However, I think it's more reasonable for a standard library implementation to omit that optimisation by default.
msg162877 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-15 09:11
> That's why the vulnerable cases are far more likely to be related to
> *signature* checking. In those you can generally provide both the
> hash input (the message) and the hash target (the purported
> "signature").

I see. I wonder how feasible this attack is, though, since the signature
validation involves a lot of computation (including
hashing the original message), of which comparing the signature
values is but a tiny step.

In addition, for public-key signatures (RSA and DSA), I fail to see the
threat. I can verify the signature offline, so I don't need to rely on
timing attacks to find out what the correct signature would be. Plus
I still need to brute-force the signatures, whether for offline
generation, or in the timing attack, because it's *not* the case
that the expected signature gets compared with the provided signature.
Instead, something derived from the message gets compared with something
derived from the purported signature.

It's really only HMAC which may be vulnerable. In

http://seb.dbzteam.org/crypto/python-oauth-timing-hmac.pdf

the author discusses that the real protection against this attack is
to fix the actual bugs in the OAuth implementation (when OAuth is
designed to prevent replay attacks which would also prevent this
attack). In the end, he claims that using time-independent comparison
would "add even more security", but in the same hand-waving fashion
shown in this issue (i.e. without providing any proof that his proposed
attack actually works - which I claim it won't, due to the noise
caused by the HMAC generation).
msg162880 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-15 10:00
Oh dead god, what have I done ... I threw a small stone and caused a major landslide. :)

I'm all with Nick on this topic. A correctly named and documented function provides a tool to users that greatly reduced the change of a side channel attack. It's all about teaching good practice. I also agree that we must neither call it 'secure' nor documented it as 'secure'. I believe the correct term is 'hardened against timing analysis and side channel attacks'

I could wrap up a quick C implementation if you like. The operator module is a better place for a total_compare() function. Do you a agree?

I recommend that you read/watch Geremy Condra's  PyCon talk "Through the Side Channel: Timing and Implementation Attacks in Python". The slides contain timing analysis diagrams.
msg162882 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-15 10:31
> I could wrap up a quick C implementation if you like. The operator
> module is a better place for a total_compare() function. Do you a
> agree?

I think the function is fine in either hashlib or hmac. Putting it in
one of these modules is a hint that it's security-related. On the other
hand, linking to it from these modules' documentations is just as fine,
if it is put in the operator module.

If you make a C implementation, it could also be interesting to cover
the pure-ASCII unicode case.
msg162885 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-06-15 10:52
As a first step, I'm going to make a change to:

1. Rename the function to "compare_digest"
2. Remove the support for comparing strings
3. Update the documentation to be much clearer about its limitations (including why it's considered OK to leak the expected length of the digest)

If a C implemented operator.total_compare is made available, then hmac.compare_digest could be updated to use it (retaining the length shortcircuiting behaviour)
msg162888 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-06-15 11:14
New changeset f36af3766a20 by Nick Coghlan in branch 'default':
Issue #15061: Don't oversell the capabilities of the new non-shortcircuiting comparison function in hmac
http://hg.python.org/cpython/rev/f36af3766a20
msg162891 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-06-15 12:16
OK, the worst aspects (the misleading name and documentation) have been dealt with, so that leaves the questions of:

1. Avoiding leaking the length information (seems unnecessary, since most digests are part of protocols where they have a known, published length, or you can find out the length by looking at public source code)

2. Providing a C implementation via the operator module (given the restriction to bytes values, and the assumption of caching for all relevant integers, would a C reimplementation really be buying us much additional security?)

As far as restoring unicode support (even in a C implementation) goes, I actually don't like the idea. For the general unicode case, as suggested in the updated documentation for hexdigest(), I believe the better approach is to try to stay in the bytes domain as much as possible, and avoid having a Unicode->bytes conversion for the expected value anywhere in the comparison timing path.
msg162892 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-15 12:21
> 2. Providing a C implementation via the operator module (given the
> restriction to bytes values, and the assumption of caching for all
> relevant integers, would a C reimplementation really be buying us much
> additional security?)

I like the fact that a C implementation can be audited much more easily.
Who knows what kind of effects the Python implementation can trigger, if
some optimizations get added in the future.

> As far as restoring unicode support (even in a C implementation) goes,
> I actually don't like the idea. For the general unicode case, as
> suggested in the updated documentation for hexdigest(), I believe the
> better approach is to try to stay in the bytes domain as much as
> possible, and avoid having a Unicode->bytes conversion for the
> expected value anywhere in the comparison timing path.

The point of supporting unicode would precisely be to avoid a
unicode->bytes conversion when unicode strings are received.
msg162893 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-15 12:34
Am 15.06.2012 14:21, schrieb Antoine Pitrou:
> I like the fact that a C implementation can be audited much more easily.
> Who knows what kind of effects the Python implementation can trigger, if
> some optimizations get added in the future.

Secondly we can predict the function's timing on other implementations
of Python. Jython or PyPy might have different settings for small int
caching -- or non at all.

> The point of supporting unicode would precisely be to avoid a
> unicode->bytes conversion when unicode strings are received.

A byte-wise comparison of the memory representation would work IFF both
sides have the same type and unicode kind. Anything else could give away
details of the content.

Either:
PyBytes_CheckExact(a) && PyBytes_CheckExact(b)

or

PyUnicode_CheckExact(a) && PyUnicode_CheckExact(b) && PyUnicode_KIND(a)
== PyUnicode_KIND(b)

I'm not sure about CheckExact, just being paranoid.
msg162895 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-15 12:53
> > The point of supporting unicode would precisely be to avoid a
> > unicode->bytes conversion when unicode strings are received.
> 
> A byte-wise comparison of the memory representation would work IFF both
> sides have the same type and unicode kind. Anything else could give away
> details of the content.

My proposal was to only allow them on ASCII strings. Any other unicode
kind would raise an error (perhaps NotImplementedError).
msg162899 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-06-15 14:01
(Ah, the dangers of using a real text editor for edit fields. This got rather long, but I think it's all still relevant)

I'm persuaded that a C implementation is a good idea in the long run. However, I *don't* think we should rush the design of it. It doesn't seem right to me to be adding a new function to the operator module this close to feature freeze (assuming that's even the right place to add this functionality), so I'd like to close this issue on the basis of my last round of updates, and create a new one aimed at 3.4 to provide a lower level time independent bytes comparison primitive.

I have a couple of major reasons for feeling this way:

Firstly, the various Python web servers and frameworks have achieved a tolerable level of security through pure Python bytes comparisons that simply avoid short circuiting (sufficent to avoid them getting CVEs raised against them, anyway)

Secondly, it seems to me that the proposed lower level feature may make more sense as a bytes method rather than as a function in the operator module.

Consider the proposed properties for the lower level security primitive:

1. Handle comparison of bytes objects via C level indexing and xor operations

I agree on the value of this part - it makes our implementation easy to audit, and explicitly puts the burden on other implementations to support this as a security primitive. Either approach (bytes method or operator function) would achieve this goal.

2. Only leak length information for one of the supplied arguments under timing analysis.

Initially I was thinking a function based implementation of this would be confusing, as it would be quite hard to remember which parameter controlled the perceived length. However, I realised it doesn't actually matter - if controlled by the expected value, then the comparison time appears independent of the input length, and doesn't leak clear timing information on the expected length. If controlled by the input value, then it varies with the input length and still doesn't leak clear information regarding the length of the expected value.

However, we can exploit the asymmetry of a method call signature to make it clear that in expected.compare(provided), the time taken for the comparison is dictated by the length of "expected" (or vice-versa - whichever way gets chosen, the asymmetry should help make it easier to remember).

In either case, hmac.compare_digest would remain as a simpler "argument order doesn't matter, but the expected length is assumed to be public" API (which, given real world use cases, is actually a reasonable assumption).

3. Handles Unicode objects of the same kind via C level indexing and xor operations

I'm still not seeing the point of this. To avoid timing attacks based on encoding your best bet is to do any encoding steps *before* calculating and comparing any signatures (that way the encoding duration is certain to be independent of any secrets involved in the signature verification). The reason the embedded unicode support was problematic was because both the supplied *and* the expected value were being converted to integers during the comparison, and a timing analysis could potentially exploit that to retrieve information about the expected value.

The timing vulnerability comes from this sequence:

1. Receive Unicode message, with signature
2. In Unicode, derive expected signature from message
3. Encode expected signature <-- Timing vulnerability here
4. Encode received signature
5. Compare the derived and expected signature

The original implementation had this problem, because the repeated calls to ord() were roughly equivalent to encoding the string as UTF-32, only with even bigger timing variations (due to the integer caching behaviour)

Compare that with the following sequence, which avoids the problem:

1. Receive Unicode message, with signature
2. Encode the received signature to bytes
3. Encode the message to bytes <-- Cannot expose signature related secrets
4. Derive the expected signature from the encoded message
5. Compare the derived and expected signature

Given that crypto primitives are designed to operate on byte sequences *anyway*, the latter approach is likely to be the only one that works regardless of Unicode support (or lack thereof).

Really, I think the Unicode support in the initial version of the function was just a mistake, pure and simple - the use of ord() during the iteration is a 2.x backwards compatibility trick for 8-bit strings that isn't applicable in the 3.x standard library.

You can see this in the equivalent code in the tornado web server, where the ord() based implementation is invoked to handle Python 2 byte strings, not Unicode text:

https://github.com/facebook/tornado/blob/master/tornado/web.py#L2004

4. The operator module is about protocols, this is not a protocol

Unlike the other operations in operator, this operation care more about consistency of timing, and the only way of achieving that is through strict type limitations. That's more appropriately handled through a new method on the affected types than it is through a new operator function.
msg162914 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-15 15:48
> Secondly, it seems to me that the proposed lower level feature may
> make more sense as a bytes method rather than as a function in the
> operator module.

If it's a function, though, it can compare all kinds of buffer-like
objects (bytearrays, memoryviews, etc.).
msg162949 - (view) Author: Jon Oberheide (Jon.Oberheide) Date: 2012-06-16 03:05
Wow, that escalated quickly. :-)

Nick, thanks for keeping things focused and on track.

To recap, the primary motivation here is two-fold. First, folks are using == pretty frequently in an unsafe manner when comparing digests, signatures, and other fixed-length strings. This is not good. Second, as we've seen in this thread and elsewhere, getting this right is not easy. Which is the exact reason it belongs in python's stdlib, so that folks do not try to implement it themselves incorrectly.

And again, preventing the leakage of the length of the inputs is not an intended goal here for this use case. It'd be best to keep things as simple as possible while achieving the desired security objective.
msg162950 - (view) Author: Jon Oberheide (Jon.Oberheide) Date: 2012-06-16 03:16
On a side note, it may be useful to follow the conventions that already exist in OpenBSD for their timingsafe_bcmp(3):

http://www.rootr.net/man/man/timingsafe_bcmp/3

"timingsafe" may be a more reasonable naming convention that is a bit less strong the "secure" which may be more appropriate.

Also, the implementation does leak the length of the string (well, actually you provide the length "n", but in real-world usage "n" is the static length of the expected input):

ftp://ftp.fr.openbsd.org/pub/OpenBSD/src/lib/libc/string/timingsafe_bcmp.c
msg163159 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-19 13:10
I've increased the priority to "release blocker".

Reason:
We should come to an agreement how to handle the issue. In particular we must not pronounce something as secure that isn't secure.

Options:

1) Remove the function.

2) Rename the function to a more sensible name and provide a bytes only implementation. I like the Jon's proposal and suggest timingsafe_compare().

2b) optionally create a C implementation as it's much easier to check C code for timing issues.
msg163163 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-06-19 13:27
I thought this is settled as of f36af3766a20 (option 2)?
msg163168 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-19 14:27
Oh, I totally missed Nick's checkin. Sorry for the noise.

Should we add the encode('unicode-internal') trick from #14955 as the next best way to compare to unicode strings?
msg163170 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-19 14:33
> Should we add the encode('unicode-internal') trick from #14955 as the next best way to compare to unicode strings?

I don't think so.
msg163186 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-19 16:07
Unicode string timing depends on the string implementation which depends on the maximum character code in the string. Strings 'A'*9999+'$' 'A'*9999+'€'  have different timings for almost all operations (inluding encode('unicode-internal')).
msg163188 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-19 16:14
Oh, I see, Antoine said the same thing (msg162771).
msg163192 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-19 17:08
I'm well aware of the fact that they have different timings. That's why I argued against including a unicode aware variant of the timing safe compare function.

I've used Guido's time machine and seen requests for a unicode function in the future. ;) I think (educated guess) that s.encode('unicode-internal') discloses the least amount of information. That way I argued that we suggest it in the documentation.
msg163193 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-19 17:12
> I'm well aware of the fact that they have different timings. That's
> why I argued against including a unicode aware variant of the timing
> safe compare function.

I would not want to repeat myself, but the compare function can be made
safe if it restricts itself to a given kind of unicode string (say,
all-ASCII).
msg163196 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-19 17:24
Alright, Antoine.

Shall explore option 2b) "optionally create a C implementation as it's much easier to check C code for timing issues" as I suggested in http://bugs.python.org/issue15061#msg162893 ?
msg163204 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-06-19 20:08
So it's not a blocker anymore, right?
msg163329 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-21 12:14
> Shall explore option 2b) "optionally create a C implementation as it's much easier to check C code for timing issues"

Definitely. I'm not sure whether that can go in 3.3 post-beta, though.
msg163333 - (view) Author: Maciej Fijalkowski (fijall) * (Python committer) Date: 2012-06-21 14:03
Hi.

This is what we did with Armin: http://bpaste.net/show/32123/

It seems there is still *some* information leaking via side-channels, although it's a bit unclear what. Feel free to play with it (try swapping, having different object etc.)
msg163343 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-21 16:24
I've attached a header for that implements a single C function timingsafe_eq(a, b). The file is targeted for Objects/stringlib/timingsafe.h. Please review the file.

Comments
--------

- I only handle exact byte or unicode types (no subclasses) since a user may have overwritten __eq__ and I don't want to special case it.

- The unicode path works only with compact ASCII strings. I'm not familiar with the new API so please scream if I did it wrong.

- length difference is currently optimized, length 0 isn't. I could easily un-optimize the len(a) != len(b) case or optimize the len(a) == len(b) == 0 case.

Open questions
--------------

Where should I place the function? hashlib would be a nice place but there are multiple backends for hashlib. _hashopenssl.c seems wrong.
msg163347 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-21 16:41
> The file is targeted for Objects/stringlib/timingsafe.h.

stringlib is for type-generic functions, so I don't think it should be
put there.

> - I only handle exact byte or unicode types (no subclasses) since a
> user may have overwritten __eq__ and I don't want to special case it.

We could handle all bytes-compatible objects, using the buffer API.

> - The unicode path works only with compact ASCII strings. I'm not
> familiar with the new API so please scream if I did it wrong.

It looks ok to me.

> Where should I place the function? hashlib would be a nice place but
> there are multiple backends for hashlib. _hashopenssl.c seems wrong.

Well, if it's meant to be a private function called from hmac, where
it's defined is an implementation detail. I think practicality beats
purity, so _hashopenssl is a good enough place.
msg163365 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-21 20:55
> > - I only handle exact byte or unicode types (no subclasses) since a
> > user may have overwritten __eq__ and I don't want to special case it.
> We could handle all bytes-compatible objects, using the buffer API.

It is timing unsafe.

> > - The unicode path works only with compact ASCII strings. I'm not
> > familiar with the new API so please scream if I did it wrong.
> It looks ok to me.

The user can just do timingsafe_eq(a.decode('ascii'),
b.decode('ascii')). I do not see a necessity in support of unicode
strings. Support ASCII strings will create the false impression that all
strings are supported.

About code. Instead (PyBytes_CheckExact(a) && PyBytes_CheckExact(b)) you
should use ((PyBytes_CheckExact(a) != 0) & (PyBytes_CheckExact(b) !=
0)).
msg163366 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-21 21:00
> The user can just do timingsafe_eq(a.decode('ascii'),
> b.decode('ascii')). 

You mean .encode()?

> I do not see a necessity in support of unicode
> strings. Support ASCII strings will create the false impression that all
> strings are supported.

I agree.

> About code. Instead (PyBytes_CheckExact(a) && PyBytes_CheckExact(b)) you
> should use ((PyBytes_CheckExact(a) != 0) & (PyBytes_CheckExact(b) !=
> 0)).

What's the difference? They are the same.
msg163368 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-21 21:09
> You mean .encode()?

Yes, of cause. timingsafe_eq(a.encode('ascii'), b.encode('ascii')).

> > About code. Instead (PyBytes_CheckExact(a) && PyBytes_CheckExact(b)) you
> > should use ((PyBytes_CheckExact(a) != 0) & (PyBytes_CheckExact(b) !=
> > 0)).
> 
> What's the difference? They are the same.

Laziness. If "a" (a secret key) is not bytes then PyBytes_CheckExact(b)
("b" is a user input) is not called. It exposes secret key type. I'm not
sure if it is real secret however.
msg163371 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-21 21:45
> > > - I only handle exact byte or unicode types (no subclasses) since a
> > > user may have overwritten __eq__ and I don't want to special case it.
> > We could handle all bytes-compatible objects, using the buffer API.
> 
> It is timing unsafe.

How so?

> > > - The unicode path works only with compact ASCII strings. I'm not
> > > familiar with the new API so please scream if I did it wrong.
> > It looks ok to me.
> 
> The user can just do timingsafe_eq(a.decode('ascii'),
> b.decode('ascii')).

I don't think that's the right answer, because people will instead e.g.
encode('utf-8'), and suddently the encodingly will not be timing-safe.
msg163377 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-21 22:40
I'm a bit rusty and I hope I got it right. The ASCII unicode case is a good idea and IMO timing safe. The buffer path is also timing safe once I have both views. 

The function leaks some timing information when an error occurs. Since the timing just reveals minimal information about the involved types and none about the bytes it's IMO safe. The acquiring of the buffer views may leak an unknown amount of timing data which may be an issue. The comparison is still safe.

I've introduced a new module _hashlibfb (fb = fallback) for systems without openssl. I'm also open for a completely new module for future implementation of other digest, key derivation (PBKDF2) and password related C code.
msg163378 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-21 23:26
The patch has another flaw. The compiler may choose to fold and optimize code in _tscmp(). I'm going to declare the length of the right side and both char* as volatile. That should stop any compiler.

I could also add some pragmas:

MSVC:
#pragma optimize("", off)
code
#pragma optimize("", on)

GCC 4.4+:
#pragma GCC push_options
#pragma GCC optimize ("O0")
code
#pragma GCC pop_options
msg163385 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-22 06:08
> > > We could handle all bytes-compatible objects, using the buffer API.
> > It is timing unsafe.
> How so?

I checked myself, and I see that most likely I was wrong. At least for
bytes and bytearrays it is timing safe.

> I don't think that's the right answer, because people will instead e.g.
> encode('utf-8'), and suddently the encodingly will not be timing-safe.

And what of that? It is outside of the timingsafe_eq function. People
can also do other timing unsafe operations with the secret key (for
example reading it from file or from DB) or not to use timingsafe_eq at
all. The secret key should be pre-encoded.

The error will be if code works for developer from ASCII word, and then
on the other side of ocean it will no longer work with non-ASCII
strings. You are expected to be familiar with such issues. In any case,
the obvious (and simplest, and fastest) way to check that a string is
ASCII-only is try to encoded it to ASCII.
msg163390 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-22 10:09
> The error will be if code works for developer from ASCII word, and then
> on the other side of ocean it will no longer work with non-ASCII
> strings. You are expected to be familiar with such issues. In any case,
> the obvious (and simplest, and fastest) way to check that a string is
> ASCII-only is try to encoded it to ASCII.

No, the fastest way is to check the kind attribute of the unicode object in C code. That doesn't involve any additional conversion or Python function call. The function is deliberately limited.

The ASCII fallback is very useful as most people will store hex encoded bytes of their passphrases in their databases. With ASCII support you can do timingsafe_compare(hex_from_db, hmac.hexdigest()).


Maciej:
http://pastebin.com/ZAAjSkJh

The C function is one order of magnitude faster and the spread is one order smaller. 1e-07 is within the noise level on my idle computer. A busy server will have a higher noise level.
msg163468 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-22 19:42
In order to get the patch in before the beta release I'm willing to drop the promise and document that the function may leak some information if the arguments differ in length or the arguments' types are incompatible.
msg163469 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-22 19:50
> In order to get the patch in before the beta release I'm willing to
> drop the promise and document that the function may leak some
> information if the arguments differ in length or the arguments' types
> are incompatible.

That's not a problem to me. Programming errors are not the nominal case.
msg163613 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-23 14:20
Updated patch with volatile, better error report for non-ASCII strings and updated comments
msg163614 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-23 14:25
I'm not really happy with the addition of a separate extension module for a single private function. You could just put it in the operator module, for instance.

Also, the idea was not to expose timingsafe_cmp but to use it in compare_digest().
msg163615 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-23 14:41
Me neither but you didn't want it in the operator module in the first place (msg162882). :) Please make a decision. I'm happy to follow it.

My idea is to drop the pure Python implementation of compare_digest() and just use the C implementation.
msg163617 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-23 14:50
> Me neither but you didn't want it in the operator module in the first
> place (msg162882). :) Please make a decision. I'm happy to follow it.

Oh, sorry. I've changed my mind about it, but I think the operator module should only export a private function (and hmac or hashlib export it publicly).
msg163619 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-06-23 14:54
Doesn't belong into operator IMO.  We used to have a "strop" module where it would have fitted...
msg163623 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-06-23 15:08
This is why I wanted to close the issue with the pure Python implementation, and punt on the question of a C accelerator for the moment.

compare_digest is effectively the same as what all the Python web servers and frameworks do now for signature checking. Yes, it's more vulnerable to timing attacks than a C implementation, but it's going to to take a sophisticated attacker to attack that through the noise of network jitter. It's sufficiently hardened that's it's unlikely to be the weakest link in the security chain.

For 3.4, I hope to see a discussion open up regarding the idea of something like a "securitytools" module that aims to provide some basic primitives for operations where Python's standard assumptions (such as flexibility and short circuiting behaviour) are a bad fit for security reasons. That would include exposing a C level full_compare option, as well as the core pbkdf2 algorithm.
msg163625 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-23 15:19
> Doesn't belong into operator IMO.  We used to have a "strop" module
> where it would have fitted...

Again, it can be a private function in the operator module that happens
to be wrapped or exposed in the hmac module. Practicality beats purity.
msg163626 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-06-23 15:22
Yes.
msg163627 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-23 15:24
> Again, it can be a private function in the operator module that happens
> to be wrapped or exposed in the hmac module. Practicality beats purity.

Yes, we just need a place for the function. The operator module is a
good place if we don't want to introduce a new module.

Nick:
I'll target pbkdf2 and other password hashing / security related for 3.4.
msg163630 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-06-23 15:35
> For 3.4, I hope to see a discussion open up regarding the idea of something like a "securitytools" module that aims to provide some basic primitives for operations where Python's standard assumptions (such as flexibility and short circuiting behaviour) are a bad fit for security reasons. That would include exposing a C level full_compare option, as well as the core pbkdf2 algorithm.

Strong +1 on that one. We could even consider adding bcrypt and scrypt as C isn't really an issue for us.

Ideally we'd add a module with docs which both promote and leverage secure behavior. Basically how to realize http://www.daemonology.net/blog/2009-06-11-cryptographic-right-answers.html in Python.
msg163652 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-23 17:57
New patch. The compare_digest method now lives in the operator module as operator._compare_digest
msg163671 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-23 20:42
>>> About code. Instead (PyBytes_CheckExact(a) && PyBytes_CheckExact(b)) you
>>> should use ((PyBytes_CheckExact(a) != 0) & (PyBytes_CheckExact(b) !=
>>> 0)).
>>
>> What's the difference? They are the same.
> 
> Laziness. If "a" (a secret key) is not bytes then PyBytes_CheckExact(b)
> ("b" is a user input) is not called. It exposes secret key type. I'm not
> sure if it is real secret however.

I see; I missed that your version was using &. In any case, I don't
think this is a threat: you couldn't use it to get the secret key
faster.
msg163696 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-24 00:07
New patch.

I've removed the special handling of PyBytes_CheckExact, support subclasses of str, non compact ASCII str and updated the docs.

(Next time I'll create a sandbox and push my work to its own branch.)
msg163780 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-06-24 11:49
New changeset c82451eeb595 by Christian Heimes in branch 'default':
Issue #15061: Re-implemented hmac.compare_digest() in C
http://hg.python.org/cpython/rev/c82451eeb595
msg163784 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-06-24 13:12
Thanks to all for your input and assistance!
History
Date User Action Args
2022-04-11 14:57:31adminsetgithub: 59266
2012-06-24 13:12:05christian.heimessetstatus: open -> closed
resolution: fixed
messages: + msg163784

stage: commit review -> resolved
2012-06-24 11:49:01python-devsetmessages: + msg163780
2012-06-24 00:07:49christian.heimessetfiles: + compare_digest_c-2.patch

messages: + msg163696
2012-06-23 21:48:41gregory.p.smithsetnosy: + gregory.p.smith
2012-06-23 20:42:36loewissetmessages: + msg163671
2012-06-23 17:57:50christian.heimessetfiles: + compare_digest_c.patch

messages: + msg163652
2012-06-23 15:35:10hyneksetmessages: + msg163630
2012-06-23 15:24:20christian.heimessetmessages: + msg163627
2012-06-23 15:22:02georg.brandlsetmessages: + msg163626
2012-06-23 15:19:00pitrousetmessages: + msg163625
2012-06-23 15:08:57ncoghlansetmessages: + msg163623
2012-06-23 14:54:54georg.brandlsetmessages: + msg163619
2012-06-23 14:50:54pitrousetmessages: + msg163617
2012-06-23 14:41:00christian.heimessetmessages: + msg163615
2012-06-23 14:25:15pitrousetmessages: + msg163614
2012-06-23 14:20:29christian.heimessetfiles: + timingsafe_cmp-2.patch

messages: + msg163613
2012-06-22 19:50:54pitrousetmessages: + msg163469
2012-06-22 19:42:24christian.heimessetmessages: + msg163468
2012-06-22 10:09:09christian.heimessetmessages: + msg163390
2012-06-22 06:08:52serhiy.storchakasetmessages: + msg163385
2012-06-21 23:26:18christian.heimessetmessages: + msg163378
2012-06-21 22:40:34christian.heimessetfiles: + timingsafe_cmp.patch

messages: + msg163377
2012-06-21 21:45:30pitrousetmessages: + msg163371
2012-06-21 21:09:34serhiy.storchakasetmessages: + msg163368
2012-06-21 21:00:42loewissetmessages: + msg163366
2012-06-21 20:55:44serhiy.storchakasetmessages: + msg163365
2012-06-21 16:41:45pitrousetmessages: + msg163347
2012-06-21 16:24:39christian.heimessetfiles: + timingsafe.h

messages: + msg163343
2012-06-21 14:03:09fijallsetmessages: + msg163333
2012-06-21 12:14:50pitrousetmessages: + msg163329
2012-06-19 20:08:48georg.brandlsetpriority: release blocker -> normal
nosy: + georg.brandl
messages: + msg163204

2012-06-19 17:24:30christian.heimessetmessages: + msg163196
2012-06-19 17:12:58pitrousetmessages: + msg163193
2012-06-19 17:08:27christian.heimessetmessages: + msg163192
2012-06-19 16:14:38serhiy.storchakasetmessages: + msg163188
2012-06-19 16:07:35serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg163186
2012-06-19 14:41:11alexsetnosy: + alex
2012-06-19 14:33:15pitrousetmessages: + msg163170
2012-06-19 14:28:38christian.heimeslinkissue14955 dependencies
2012-06-19 14:27:53christian.heimessetmessages: + msg163168
stage: needs patch -> commit review
2012-06-19 13:27:19hyneksetmessages: + msg163163
2012-06-19 13:10:16christian.heimessetpriority: normal -> release blocker

messages: + msg163159
2012-06-16 03:16:29Jon.Oberheidesetmessages: + msg162950
2012-06-16 03:05:31Jon.Oberheidesetnosy: + Jon.Oberheide
messages: + msg162949
2012-06-15 15:48:13pitrousetmessages: + msg162914
2012-06-15 14:01:22ncoghlansetmessages: + msg162899
2012-06-15 12:53:15pitrousetmessages: + msg162895
2012-06-15 12:34:17christian.heimessetmessages: + msg162893
2012-06-15 12:21:38pitrousetmessages: + msg162892
2012-06-15 12:16:10ncoghlansetmessages: + msg162891
2012-06-15 11:14:48python-devsetnosy: + python-dev
messages: + msg162888
2012-06-15 10:52:36ncoghlansetmessages: + msg162885
2012-06-15 10:31:56pitrousetmessages: + msg162882
2012-06-15 10:00:20christian.heimessetmessages: + msg162880
2012-06-15 09:47:41arigosetnosy: - arigo
2012-06-15 09:11:01loewissetmessages: + msg162877
2012-06-15 08:42:18ncoghlansetmessages: + msg162875
2012-06-15 08:36:29petri.lehtinensetnosy: + petri.lehtinen
messages: + msg162873
2012-06-15 08:32:49ncoghlansetmessages: + msg162872
2012-06-15 08:18:50loewissetmessages: + msg162871
2012-06-15 08:13:49loewissetmessages: + msg162868
2012-06-15 08:12:39fijallsetmessages: + msg162867
2012-06-15 08:09:30loewissetmessages: + msg162866
2012-06-15 08:06:08loewissetmessages: + msg162865
2012-06-15 08:03:29loewissetmessages: + msg162864
2012-06-15 08:03:14ncoghlansetmessages: + msg162863
2012-06-15 07:59:43ncoghlansetmessages: + msg162862
2012-06-15 07:58:29fijallsetmessages: + msg162861
2012-06-15 07:55:29hyneksetmessages: + msg162860
2012-06-15 07:50:29fijallsetmessages: + msg162859
2012-06-15 07:49:06fijallsetmessages: + msg162858
2012-06-15 07:47:09loewissetmessages: + msg162857
2012-06-15 07:45:46loewissetmessages: + msg162856
2012-06-15 07:41:40ncoghlansetmessages: + msg162855
2012-06-15 07:38:48loewissetmessages: + msg162853
2012-06-15 07:28:41ncoghlansetmessages: + msg162852
2012-06-15 07:08:32hyneksetmessages: + msg162850
2012-06-15 06:40:29loewissetmessages: + msg162848
2012-06-15 06:37:43loewissetmessages: + msg162847
2012-06-15 06:31:59loewissetmessages: + msg162846
2012-06-15 06:19:44hyneksetmessages: + msg162845
2012-06-15 01:37:23ncoghlansetnosy: + ncoghlan
messages: + msg162838
2012-06-14 12:26:25pitrousetmessages: + msg162778
2012-06-14 12:16:47fijallsetmessages: + msg162777
2012-06-14 12:01:34christian.heimessetmessages: + msg162775
2012-06-14 11:17:10loewissetnosy: + loewis
messages: + msg162773
2012-06-14 10:27:30fijallsetmessages: + msg162770
2012-06-14 10:18:29pitrousetmessages: + msg162769
2012-06-14 10:15:37fijallsetmessages: + msg162768
2012-06-14 10:13:47fijallsetmessages: + msg162767
2012-06-14 10:10:19pitrousetmessages: + msg162766
title: hmac.secure_compare() leaks information about length of strings -> hmac.secure_compare() leaks information about length of strings
2012-06-14 10:02:47hyneksetmessages: + msg162765
2012-06-14 09:55:22christian.heimessetmessages: + msg162764
2012-06-14 09:31:25pitrousetnosy: + pitrou
messages: + msg162763
2012-06-14 09:22:40fijallsetmessages: + msg162762
2012-06-14 09:19:27christian.heimessetmessages: + msg162761
title: hmac.secure_compare() leaks information of length of strings -> hmac.secure_compare() leaks information about length of strings
2012-06-14 09:19:00arigosetnosy: + arigo
messages: + msg162760
2012-06-14 09:10:11hyneksettype: behavior -> security
components: + Library (Lib), - IO
versions: + Python 3.3, - Python 3.4
nosy: + hynek

messages: + msg162759
stage: patch review -> needs patch
2012-06-14 08:33:03fijallsetnosy: + fijall
messages: + msg162758
2012-06-13 23:00:23christian.heimescreate