classification
Title: Wrong documentation (Library) for unicode and str comparison
Type: behavior Stage:
Components: Documentation Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: RK-5wWm9h, docs@python, martin.panter, r.david.murray
Priority: normal Keywords:

Created on 2017-01-19 14:40 by RK-5wWm9h, last changed 2017-01-19 23:53 by r.david.murray.

Messages (5)
msg285792 - Author: (RK-5wWm9h) Date: 2017-01-19 14:40
PROBLEM (IN BRIEF):
The currently published 2.7.13 Python Standard Library (Library Reference manual), section 5.6 "Sequence Types" (https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange), states:

    "to compare equal, ... the two sequences must be of the same type"

This is an *incorrect (and misleading) statement* for the unicode and str case.


PROPOSED FIX:

Current full paragraph:

    "Sequence types also support comparisons. In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length. (For full details see Comparisons in the language reference.)"

Proposed replacement text:

    "Sequence types also support comparisons. In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length. (Unicode and str are treated as the same type here; for full details see Comparisons in the language reference.)"


DETAILS, JUSTIFICATION, CORRECTNESS, ETC:

The current incorrect text is really misleading.

The behaviour that a str and a unicode object -- despite being objects of different types -- may compare equal, is explicitly stated in the 2.7.13 The Python Language Reference manual, section 5.9 "Comparisons" (https://docs.python.org/2/reference/expressions.html#comparisons):

    "* Strings are compared lexicographically using the numeric equivalents (the result of the built-in function ord()) of their characters. Unicode and 8-bit strings are fully interoperable in this behavior. [4]"

(Aside: Incidentally an earlier paragraph in the Language Ref fails to cover the unicode and str case; see separately filed bug Issue 29321.)
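
The str/unicode behaviour quoted above can be checked with a minimal sketch like the following. The variable names are illustrative only. The snippet is written so that it also parses on Python 3.3+, where both literals are plain str, so the type difference is only visible on 2.7:

```python
# On Python 2.7, "abc" is an 8-bit str and u"abc" is a unicode object.
# On Python 3 both literals are str, so different_types is False there.
byte_str = "abc"
uni_str = u"abc"

different_types = type(byte_str) is not type(uni_str)  # True on 2.7 only
compare_equal = (byte_str == uni_str)                   # equality across the two kinds
ordered = (byte_str < u"abd")                           # lexicographic, character by character

print("different types: %s, equal: %s, ordered: %s"
      % (different_types, compare_equal, ordered))
```

On 2.7 this should report that the two objects have different types yet compare equal, which is the case the Library Reference sentence does not cover.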
msg285815 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-01-19 17:28
Unicode and string *are* of the same type: basestring.  This is a specific example of the Liskov substitution principle, so I don't think it should be called out explicitly in this section.
msg285816 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-01-19 17:30
As per your other issue, though, the real issue is that the two objects must be *comparable*, not that they be of the same type, and the language should probably be updated to reflect that.
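
As a rough illustration of "comparable" versus "same type" (a sketch only; the example values are not taken from the documentation under discussion), both the elements and the containers can differ in type and still compare equal:

```python
# Elements of different types: int and float are comparable,
# so two lists holding them can compare equal.
ints = [1, 2, 3]
floats = [1.0, 2.0, 3.0]
elements_differ_equal = (ints == floats)

# Containers of different types: bytearray and a byte string
# (str on Python 2.7, bytes on Python 3) are comparable with each other.
buf = bytearray(b"abc")
raw = b"abc"
containers_differ_equal = (buf == raw)
containers_same_type = type(buf) is type(raw)

print("elements: %s, containers: %s, same container type: %s"
      % (elements_differ_equal, containers_differ_equal, containers_same_type))
```

Both comparisons are expected to be true even though the container types in the second case are different.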
msg285852 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-01-19 22:36
If you read the whole paragraph carefully, I don't think it is too misleading. "In particular, tuples and lists . . ." suggests the author was just trying to say that a tuple never compares equal to a list. Maybe we just need to make that more obvious?
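
The tuple-versus-list reading of that paragraph can be sketched as follows (illustrative values only; the behaviour is the same on Python 2.7 and Python 3):

```python
as_tuple = (1, 2, 3)
as_list = [1, 2, 3]

# Same elements, same length, different sequence types: never equal.
cross_type_equal = (as_tuple == as_list)

# After converting to a common type the comparison succeeds.
same_type_equal = (as_tuple == tuple(as_list))

# Same type but different length: not equal, and the shorter prefix sorts first.
shorter_equal = ((1, 2) == (1, 2, 3))
shorter_sorts_first = ((1, 2) < (1, 2, 3))

print("tuple == list: %s, tuple == tuple(list): %s"
      % (cross_type_equal, same_type_equal))
print("(1, 2) == (1, 2, 3): %s, (1, 2) < (1, 2, 3): %s"
      % (shorter_equal, shorter_sorts_first))
```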

However there are other problems in this part of the reference about comparing different types. See Issue 22000, about the earlier section on Comparisons of built-in types.
msg285862 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-01-19 23:53
That's a good point, I think that is exactly the issue with that paragraph.
History
Date                 User            Action  Args
2017-01-19 23:53:21  r.david.murray  set     messages: + msg285862
2017-01-19 22:36:08  martin.panter   set     nosy: + martin.panter
                                             messages: + msg285852
2017-01-19 17:30:53  r.david.murray  set     messages: + msg285816
2017-01-19 17:28:06  r.david.murray  set     nosy: + r.david.murray
                                             messages: + msg285815
2017-01-19 14:40:31  RK-5wWm9h       create