
classification
Title: unicode and string compare should not cause an exception
Type: behavior
Stage:
Components: Interpreter Core
Versions: Python 2.6
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: aaron_watters, gvanrossum, lemburg
Priority: normal
Keywords:

Created on 2008-02-01 21:18 by aaron_watters, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (5)
msg61976 - Author: Aaron Watters (aaron_watters) Date: 2008-02-01 21:18
As I understand it comparisons between two objects should
always work.  I get this at the interpreter prompt:

Python 2.6a0 (trunk, Jan 11 2008, 11:40:59) 
[GCC 3.4.6 20060404 (Red Hat 3.4.6-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> unichr(0xffff) < chr(128)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
ordinal not in range(128)
>>> 

I think the fix for this case is to return an arbitrary
but consistent result, if possible.
msg61979 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-02-01 21:38
The change we made was for == and != comparisons to always work (they now
issue a warning instead of raising an exception), mostly because doing
otherwise resulted in strange exceptions during dictionary lookups.

However, this was not done for the ordering comparisons <, <=, >=, > since
it's not at all clear what the default outcome should be.

>>> u'abc' == 'äöü'
UnicodeWarning: Unicode equal comparison failed to convert both
arguments to Unicode - interpreting them as being unequal
False

>>> u'abc' < 'äöü'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0:
ordinal not in range(128)

>>> 1 < 1j
TypeError: no ordering relation is defined for complex numbers
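
Editorial sketch: the ordering failure above comes from Python 2 implicitly decoding the byte string with the default ASCII codec before comparing. The snippet below approximates that step in Python 3, where the decode has to be spelled out. The helper name `py2_style_less_than` is hypothetical and this is not the interpreter's actual code path.

```python
def py2_style_less_than(text, raw):
    """Approximate Python 2's `unicode < str` by decoding `raw` as ASCII first.

    Hypothetical helper for illustration only.
    """
    # Bytes outside 0..127 make this decode raise UnicodeDecodeError,
    # which is the exception reported in this issue.
    return text < raw.decode("ascii")


# Pure ASCII bytes decode cleanly, so the comparison succeeds.
print(py2_style_less_than("abc", b"abd"))

# A byte such as 0x80 cannot be decoded as ASCII, so the comparison fails.
try:
    py2_style_less_than("\uffff", b"\x80")
except UnicodeDecodeError as exc:
    print("comparison failed:", exc)
```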
msg61983 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-02-01 23:47
> As I understand it comparisons between two objects should
> always work.

Hi Aaron!  Glad to see you're back.

It used to be that way when you & Jim wrote the first Python book. :-)

Nowadays, comparisons *can* raise exceptions.  Marc-Andre has explained
why.  In 3.0, this particular issue will go away due to a different
treatment of Unicode, but many more cases will raise TypeError when < is
used.  == and != will generally work, though there are no absolute
guarantees.
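
Editorial sketch: a small Python 3 illustration of the behavior described above, where equality between text and bytes simply answers False while ordering comparisons between unrelated types raise TypeError. The helper `try_less_than` is a hypothetical name introduced only for this example.

```python
def try_less_than(a, b):
    """Return the result of `a < b`, or a string describing the TypeError."""
    try:
        return a < b
    except TypeError as exc:
        return "TypeError: %s" % exc


# Equality between str and bytes works and reports "not equal".
print("abc" == b"abc")

# Ordering between str and bytes, or involving complex numbers, raises.
print(try_less_than("abc", b"abc"))
print(try_less_than(1, 1j))

# Ordering between comparable values still works as usual.
print(try_less_than(1, 2))
```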
msg62002 - Author: Aaron Watters (aaron_watters) Date: 2008-02-02 15:00
Okay.  I haven't looked but this should be well documented
somewhere because I found it very surprising (it crashed a large
run somewhere in the middle).

In the case of strings versus unicode, I think it is possible
to work around this by catching the exceptional case and
comparing character by character, treating out-of-band
characters (bytes that cannot be decoded) as larger than all
unicode characters.  I don't see why this would cause any problems.

   -- Aaron Watters
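
Editorial sketch: one way the proposal above could look, written for Python 3 with `str` standing in for unicode and `bytes` for byte strings. The names `sort_key` and `mixed_less_than` are hypothetical, and ranking undecodable bytes above the highest code point is an assumption taken from the description, not an existing Python feature.

```python
MAX_CODE_POINT = 0x10FFFF  # highest valid Unicode code point


def sort_key(value):
    """Map a str or bytes value to a list of ints usable as an ordering key."""
    if isinstance(value, str):
        return [ord(ch) for ch in value]
    # ASCII bytes keep their code point; every other byte is ranked
    # above all Unicode characters, as the proposal suggests.
    return [b if b < 128 else MAX_CODE_POINT + 1 + b for b in value]


def mixed_less_than(a, b):
    """Arbitrary but consistent `<` for any mix of str and bytes."""
    return sort_key(a) < sort_key(b)


print(mixed_less_than("\uffff", b"\x80"))
print(sorted(["b", b"\xe4", "a", b"a!"], key=sort_key))
```

The resulting order is consistent, but it says nothing about what the non-ASCII bytes mean as text, which is the objection raised in the replies.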

msg62004 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-02-02 16:10
You should be grateful. :-)

The error points out a bug in your program: you're mixing encoded and
unencoded text.
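
Editorial sketch: the usual remedy for mixing encoded and unencoded text is to decode bytes explicitly at the point where they enter the program, then compare text with text. The helper `to_text` and the choice of UTF-8 are assumptions for illustration; the right encoding depends on where the bytes came from.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as str, decoding bytes with an explicitly chosen encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Bytes as they might arrive from a file or socket (UTF-8 assumed here).
raw = "äöü".encode("utf-8")

# After decoding, both operands are text and ordering is well defined.
print(to_text("abc") < to_text(raw))
```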
History
Date                 User           Action  Args
2022-04-11 14:56:30  admin          set     github: 46281
2008-02-02 16:10:46  gvanrossum     set     messages: + msg62004
2008-02-02 16:09:24  gvanrossum     set     files: - unnamed
2008-02-02 15:00:53  aaron_watters  set     files: + unnamed; messages: + msg62002
2008-02-01 23:47:19  gvanrossum     set     status: open -> closed; resolution: rejected; messages: + msg61983; nosy: + gvanrossum
2008-02-01 21:38:12  lemburg        set     nosy: + lemburg; messages: + msg61979
2008-02-01 21:18:44  aaron_watters  create