
Author ncoghlan
Recipients christian.heimes, fijall, hynek, loewis, ncoghlan, petri.lehtinen, pitrou, python-dev
Date 2012-06-15.14:01:21
Message-id <1339768883.29.0.42666196787.issue15061@psf.upfronthosting.co.za>
In-reply-to
Content
(Ah, the dangers of using a real text editor for edit fields. This got rather long, but I think it's all still relevant)

I'm persuaded that a C implementation is a good idea in the long run. However, I *don't* think we should rush its design. It doesn't seem right to me to add a new function to the operator module this close to feature freeze (assuming that's even the right place for this functionality), so I'd like to close this issue on the basis of my last round of updates and create a new one, aimed at 3.4, to provide a lower-level timing-independent bytes comparison primitive.

I have a couple of major reasons for feeling this way:

Firstly, the various Python web servers and frameworks have achieved a tolerable level of security through pure Python bytes comparisons that simply avoid short-circuiting (sufficient to avoid getting CVEs raised against them, anyway).
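The pure Python pattern those frameworks rely on looks roughly like this (a sketch of the general technique, not any particular framework's code):

```python
def constant_time_compare(val1: bytes, val2: bytes) -> bool:
    # Length is treated as public: bail out on a mismatch, then
    # accumulate XOR differences so the loop always runs to
    # completion instead of short-circuiting at the first mismatch.
    if len(val1) != len(val2):
        return False
    result = 0
    for x, y in zip(val1, val2):
        result |= x ^ y
    return result == 0
```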

Secondly, it seems to me that the proposed lower level feature may make more sense as a bytes method rather than as a function in the operator module.

Consider the proposed properties for the lower level security primitive:

1. Handle comparison of bytes objects via C level indexing and xor operations

I agree on the value of this part - it makes our implementation easy to audit, and explicitly puts the burden on other implementations to support this as a security primitive. Either approach (bytes method or operator function) would achieve this goal.

2. Only leak length information for one of the supplied arguments under timing analysis.

Initially, I thought a function-based implementation of this would be confusing, as it would be quite hard to remember which parameter controlled the perceived length. However, I realised it doesn't actually matter: if the timing is controlled by the expected value, then the comparison time appears independent of the input length and doesn't leak clear timing information about the expected length. If it's controlled by the input value, then it varies with the input length and still doesn't leak clear information about the length of the expected value.

However, we can exploit the asymmetry of a method call signature to make it clear that in expected.compare(provided), the time taken for the comparison is dictated by the length of "expected" (or vice-versa - whichever way gets chosen, the asymmetry should help make it easier to remember).
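As a hypothetical sketch of those semantics (the name `compare_total` and the cycling trick are my own illustrative choices, not anything proposed here), the loop always runs for `len(expected)` iterations, so the duration is governed by the first argument alone:

```python
def compare_total(expected: bytes, provided: bytes) -> bool:
    # Record any length mismatch up front, but keep iterating over
    # the full length of `expected` so the comparison time depends
    # only on the first argument.
    result = 0 if len(expected) == len(provided) else 1
    for i in range(len(expected)):
        x = expected[i]
        # Cycle through `provided` so mismatched lengths still do work
        y = provided[i % len(provided)] if provided else 0
        result |= x ^ y
    return result == 0
```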

In either case, hmac.compare_digest would remain as a simpler "argument order doesn't matter, but the expected length is assumed to be public" API (which, given real world use cases, is actually a reasonable assumption).
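For reference, the existing API is symmetric in its result, so callers don't need to remember an argument order:

```python
import hmac

# hmac.compare_digest gives the same answer regardless of argument
# order; the lengths of the digests are treated as public information.
a = b"well-known-digest-value"
b = b"well-known-digest-value"
print(hmac.compare_digest(a, b))  # True
print(hmac.compare_digest(b, a))  # True
```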

3. Handles Unicode objects of the same kind via C level indexing and xor operations

I'm still not seeing the point of this. To avoid timing attacks based on encoding your best bet is to do any encoding steps *before* calculating and comparing any signatures (that way the encoding duration is certain to be independent of any secrets involved in the signature verification). The reason the embedded unicode support was problematic was because both the supplied *and* the expected value were being converted to integers during the comparison, and a timing analysis could potentially exploit that to retrieve information about the expected value.

The timing vulnerability comes from this sequence:

1. Receive Unicode message, with signature
2. In Unicode, derive expected signature from message
3. Encode expected signature <-- Timing vulnerability here
4. Encode received signature
5. Compare the derived and expected signature

The original implementation had this problem because the repeated calls to ord() were roughly equivalent to encoding the string as UTF-32, only with even bigger timing variations (due to the integer caching behaviour).
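A reconstructed sketch of that ord()-based pattern (not the exact original code):

```python
def compare_digest_ord(a: str, b: str) -> bool:
    # 2.x-style comparison: ord() turns each character into an int
    # inside the comparison loop itself. Applied to a 3.x str, this
    # is roughly equivalent to encoding one code point at a time
    # during the comparison, which is where the timing leak lives.
    if len(a) != len(b):
        return False
    result = 0
    for x, y in zip(a, b):
        result |= ord(x) ^ ord(y)
    return result == 0
```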

Compare that with the following sequence, which avoids the problem:

1. Receive Unicode message, with signature
2. Encode the received signature to bytes
3. Encode the message to bytes <-- Cannot expose signature related secrets
4. Derive the expected signature from the encoded message
5. Compare the derived and expected signature

Given that crypto primitives are designed to operate on byte sequences *anyway*, the latter approach is likely to be the only one that works regardless of Unicode support (or lack thereof).
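The safe sequence can be sketched with stdlib pieces (the key and the HMAC-SHA256 choice are illustrative assumptions, not anything mandated here):

```python
import hashlib
import hmac

SECRET_KEY = b"example-key"  # hypothetical key, for illustration only

def verify(message: str, provided_sig: str) -> bool:
    # Steps 2-3: encode the signature and message to bytes up front,
    # before any secret-dependent work happens.
    provided = provided_sig.encode("utf-8")
    encoded_message = message.encode("utf-8")
    # Step 4: derive the expected signature from the encoded message.
    expected = hmac.new(SECRET_KEY, encoded_message,
                        hashlib.sha256).hexdigest().encode("ascii")
    # Step 5: timing-independent comparison over bytes.
    return hmac.compare_digest(expected, provided)
```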

Really, I think the Unicode support in the initial version of the function was just a mistake, pure and simple - the use of ord() during the iteration is a 2.x backwards compatibility trick for 8-bit strings that isn't applicable in the 3.x standard library.

You can see this in the equivalent code in the tornado web server, where the ord() based implementation is invoked to handle Python 2 byte strings, not Unicode text:

https://github.com/facebook/tornado/blob/master/tornado/web.py#L2004

4. The operator module is about protocols, and this is not a protocol

Unlike the other operations in the operator module, this operation cares more about consistency of timing than about generic dispatch, and the only way to achieve that is through strict type limitations. That's more appropriately handled through a new method on the affected types than through a new operator function.