Message 167990 - Python tracker


Author	ncoghlan
Recipients	Arfrever, christian.heimes, georg.brandl, loewis, mark.dickinson, meador.inge, ncoghlan, pitrou, python-dev, skrah, vstinner
Date	2012-08-11.18:58:05
Message-id	<1344711487.74.0.819262981129.issue15573@psf.upfronthosting.co.za>
In-reply-to

Content

OK, I think I finally understand what Martin is getting at from a semantic point of view, and I think I can better explain the background of the issue and why Stefan's proposed solution is both necessary and correct.

The ideal definition of equivalence for memory view objects would actually be:

memoryview(x) == memoryview(y)

if (and only if)

memoryview(x).tolist() == memoryview(y).tolist()

Now, in practice, this approach cannot be implemented, because there are too many format definitions (whether valid or invalid) that memoryview doesn't understand (and perhaps will never understand) and because it would be completely infeasible on large arrays with complex format definitions.
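For small arrays in formats that memoryview does understand, the ideal definition can still be sketched directly. This is only an illustration: `ideal_equal` is a hypothetical helper name, and the sketch assumes a Python version (3.3 or later) in which tolist() can unpack these native single-character formats.

```python
from array import array

def ideal_equal(x, y):
    """Hypothetical helper: compare two buffer exporters by unpacked values."""
    # tolist() unpacks every element into a Python object, so this needs
    # O(n) extra memory and only works for formats memoryview can unpack.
    return memoryview(x).tolist() == memoryview(y).tolist()

ab = array('b', [1, 2, 3])
ai = array('i', [1, 2, 3])
print(ideal_equal(ab, ai))
```

The cost of building two full Python lists is exactly why this definition does not scale to large arrays.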

Thus, we are forced to accept a *constraint* on memoryview's definition of equality: individual values are always compared via raw memory comparison, so values stored using different *sizes* or *layouts* in memory will always compare as unequal, even if they would compare as equal in Python.

This is an *acceptable* constraint as, in practice, you don't perform mixed format arithmetic and it's not a problem if there's no automatic coercion between sizes and layouts.

The Python 3.2 memoryview effectively uses memcmp() directly, treating everything as a 1D array of bytes and completely ignoring both shape *and* format data. Thus:

>>> from array import array
>>> ab = array('b', [1, 2, 3])
>>> ai = array('i', [1, 2, 3])
>>> aL = array('L', [1, 2, 3])
>>> ab == ai
True
>>> ab == ai == aL
True
>>> memoryview(ab) == memoryview(ai)
False
>>> memoryview(ab) == memoryview(aL)
False
>>> memoryview(ai) == memoryview(aL)
False
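The mismatch is easy to see by looking at the raw bytes the arrays export. The following is a minimal sketch, assuming only the standard array module; `raw_equal` is a hypothetical stand-in for the memcmp()-style comparison described above, not the 3.2 implementation itself.

```python
from array import array

def raw_equal(x, y):
    """Hypothetical helper: byte-for-byte comparison, ignoring format and shape."""
    return memoryview(x).tobytes() == memoryview(y).tobytes()

ab = array('b', [1, 2, 3])
ai = array('i', [1, 2, 3])

# 'b' items are one byte each; the size of 'i' items depends on the platform.
print(ab.itemsize, ai.itemsize)
# The exported buffers therefore differ in length as well as in content.
print(len(memoryview(ab).tobytes()), len(memoryview(ai).tobytes()))
print(raw_equal(ab, ai))
```

The same three values occupy different amounts of memory in each array, so a byte-level comparison can never treat them as equal.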

This approach leads to some major false positives, such as a floating point value comparing equal to an integer that happens to share the same binary representation:

>>> af = array('f', [1.1])
>>> ai = array('i', [1066192077])
>>> af == ai
False
>>> memoryview(af) == memoryview(ai)
True
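The shared binary representation can be checked with the struct module. This sketch fixes little-endian byte order and 32-bit sizes for reproducibility, which is an assumption layered on top of the native-format arrays used above.

```python
import struct

# Pack 1.1 as a 32-bit IEEE 754 float, then reinterpret the same four
# bytes as a 32-bit signed integer.
raw = struct.pack('<f', 1.1)
(as_int,) = struct.unpack('<i', raw)
print(as_int)

# Packing that integer gives back exactly the same bytes.
print(struct.pack('<i', as_int) == raw)
```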

The changes in 3.3 are aimed primarily at *eliminating those false positives* by taking into account the shape of the array and the format of the contained values. It is *not* about changing the fundamental constraint that memoryview operates at the level of raw memory, rather than Python objects, and thus cares about memory layout details that are irrelevant after passing through the Python abstraction layer.
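As a rough model of that idea, a comparison can reject buffers whose format or shape differ before it ever looks at the raw bytes. This is a sketch of the semantics described in this message only: `layout_aware_equal` is a hypothetical name, it is not CPython's implementation, and released versions of memoryview may define equality differently.

```python
from array import array

def layout_aware_equal(x, y):
    """Hypothetical model: equal only if format, shape and raw bytes all match."""
    mx, my = memoryview(x), memoryview(y)
    # Different formats or shapes can never be equal under this model,
    # even when the underlying bytes happen to be identical.
    if mx.format != my.format or mx.shape != my.shape:
        return False
    return mx.tobytes() == my.tobytes()

af = array('f', [1.1])
ai = array('i', [1066192077])
print(layout_aware_equal(af, ai))
```

Under this model the float/int pair no longer compares equal, because the format check fails before the identical bytes are examined.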

This contrasts with the more limited scope of the array.array module, which *does* take into account the Python level abstractions. Thus, there will always be a discrepancy between the two definitions of equality, as memoryview cares about memory layout details, whereas array.array does not.

The problem at the moment is that Python 3.3 has *spurious* false negatives that aren't caused by the fundamental constraint that comparisons must be based directly on memory contents. Instead, they're caused by memoryview returning False for any equality comparison involving a format it doesn't understand. That's unacceptable, and is what Stefan's patch is intended to fix.
