Message 106778 - Python tracker

Author	PeterL
Recipients	PeterL, ezio.melotti, pitrou
Date	2010-05-30.20:03:47
SpamBayes Score	0.00011983159
Marked as misclassified	No
Message-id	<1275249828.96.0.479501843487.issue8859@psf.upfronthosting.co.za>
In-reply-to

Content

I am not sure I can follow you. I will try to be more specific.

The test string originally consists of a single character: the Czech Š.

1. On Linux with Python 2.6.4
1.1 If I keep the original code line order:
label = obj.get()
print type(label), repr(label)
label = " ".join(label.split())
print type(label), repr(label)
label = unicode(label)
if len(label) > 40:
    label = label[:40] + "..."

Both print type(label), repr(label) lines give:
<type 'str'> '\xc5\xa0'

1.2 If I change the order and do the unicode conversion first:
label = obj.get()
label = unicode(label)
print type(label), repr(label)
label = " ".join(label.split())
print type(label), repr(label)
if len(label) > 40:
    label = label[:40] + "..."

Both print type(label), repr(label) lines give:
<type 'unicode'> u'\u0160'

2. On Windows with Python 2.6.5
2.1 The original code line order:
The print type(label), repr(label) lines give:
<type 'str'> '\xc5\xa0'
<type 'str'> '\xc5'
 8217: ERROR: gramps.py: line 138: Unhandled exception
 ....

2.2 If I change the order and do the unicode conversion first:
Both print type(label), repr(label) lines give:
<type 'unicode'> u'\u0160'
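
The reordering in 1.2 and 2.2 can be sketched as a small helper that decodes before it touches whitespace. This is only a sketch under assumptions: `normalize_label` and its `encoding` and `limit` parameters are hypothetical names, and it assumes that obj.get() returns UTF-8 encoded bytes. It uses an explicit decode instead of the bare unicode(label) call so that it does not depend on the interpreter's default encoding.

```python
# -*- coding: utf-8 -*-

def normalize_label(raw, encoding="utf-8", limit=40):
    """Hypothetical helper: decode first, then collapse whitespace and truncate.

    Assumes `raw` is either a byte string in `encoding` or already unicode text.
    """
    if isinstance(raw, bytes):
        # Decode before any split() so whitespace is judged per character,
        # not per byte.
        text = raw.decode(encoding)
    else:
        text = raw
    # Collapse runs of whitespace into single spaces.
    text = u" ".join(text.split())
    # Truncate long labels, as in the original snippet.
    if len(text) > limit:
        text = text[:limit] + u"..."
    return text


label = normalize_label(b"\xc5\xa0")  # the UTF-8 bytes of the Czech Š
print(repr(label))
```

Because the text is split after decoding, a multi-byte character cannot be cut in half by the whitespace handling.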

3. If I use this little code:
# -*- coding: utf-8 -*-
label = 'Š'
print type(label), repr(label)
label = " ".join(label.split())
print type(label), repr(label)
I get:
<type 'str'> '\xc5\xa0'
<type 'str'> '\xc5\xa0'
on both Linux and Windows.
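
A byte-level look at the test string may help with the Windows result in 2.1. The sketch below only shows that the second UTF-8 byte of Š, 0xA0, is the no-break space when read as a single-byte cp1252 character. Whether the Windows C library classifies that byte as whitespace for str.split() under the active locale is an assumption here, not something established in this message.

```python
# -*- coding: utf-8 -*-

# UTF-8 encoding of the Czech Š (U+0160): two bytes, 0xC5 and 0xA0.
raw = u"\u0160".encode("utf-8")

# Slice (rather than index) so the pieces stay byte strings on Python 2 and 3.
lead, trail = raw[:1], raw[1:]

# Read the trailing byte as a single-byte cp1252 character.
trail_as_cp1252 = trail.decode("cp1252")

# U+00A0 NO-BREAK SPACE counts as whitespace for unicode strings.
is_space = trail_as_cp1252.isspace()

print("%r %r %r %r" % (lead, trail, trail_as_cp1252, is_space))
```

If the byte string were split on that trailing byte, only the lead byte 0xC5 would remain, which matches the second line printed in 2.1.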

The examples above under 1. and 2. come from an application, Gramps.

There is still something I don't understand.

History
Date	User	Action	Args
2010-05-30 20:03:49	PeterL	set	recipients: + PeterL, pitrou, ezio.melotti
2010-05-30 20:03:48	PeterL	set	messageid: <1275249828.96.0.479501843487.issue8859@psf.upfronthosting.co.za>
2010-05-30 20:03:47	PeterL	link	issue8859 messages
2010-05-30 20:03:47	PeterL	create