Message106778
I am not sure I can follow you. I will try to be more specific.
The test string originally consists of a single character, the Czech Š.
1. On Linux with Python 2.6.4
1.1 If I keep the original code line order:
label = obj.get()
print type(label), repr(label)
label = " ".join(label.split())
print type(label), repr(label)
label = unicode(label)
if len(label) > 40:
    label = label[:40] + "..."
Both print type(label), repr(label) lines give:
<type 'str'> '\xc5\xa0'
1.2 If I change order and take the unicode conversion first:
label = obj.get()
label = unicode(label)
print type(label), repr(label)
label = " ".join(label.split())
print type(label), repr(label)
if len(label) > 40:
    label = label[:40] + "..."
Both print type(label), repr(label) lines give:
<type 'unicode'> u'\u0160'
2. On Windows with Python 2.6.5
2.1 The original code line order:
The two print type(label), repr(label) lines give:
<type 'str'> '\xc5\xa0'
<type 'str'> '\xc5'
8217: ERROR: gramps.py: line 138: Unhandled exception
....
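The second line of output in 2.1 holds only the first byte of the two-byte UTF-8 sequence for Š. A minimal sketch of why a later conversion to unicode could fail on that value follows. The helper name try_decode and the explicit "utf-8" codec are assumptions for illustration; the traceback above is elided, so whether this is the exact exception raised in gramps.py is not confirmed.

```python
# -*- coding: utf-8 -*-

def try_decode(data, encoding="utf-8"):
    """Return the decoded text, or None if the bytes are not valid in `encoding`.

    Hypothetical helper, not part of Gramps.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

full = b"\xc5\xa0"      # both bytes of the UTF-8 encoding of U+0160
truncated = b"\xc5"     # what remained after split()/join() in 2.1

print(try_decode(full) == u"\u0160")    # the complete sequence decodes to one character
print(try_decode(truncated) is None)    # the lone lead byte cannot be decoded
```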
2.2 If I change order and take the unicode conversion first:
Both print type(label), repr(label) lines give:
<type 'unicode'> u'\u0160'
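The reordered steps from 1.2 and 2.2 can be sketched as a single function that decodes first and only then normalizes whitespace and truncates. This is a sketch under assumptions: the name normalize_label, the explicit "utf-8" codec, and the limit parameter are hypothetical, and the application code calls unicode(label) without naming a codec.

```python
# -*- coding: utf-8 -*-

def normalize_label(raw, encoding="utf-8", limit=40):
    """Decode first, then collapse whitespace and truncate (hypothetical helper)."""
    # Decode byte strings up front so split() sees characters, not bytes.
    if isinstance(raw, bytes):
        text = raw.decode(encoding)
    else:
        text = raw
    # Collapse every run of whitespace into a single space.
    text = u" ".join(text.split())
    # Truncate long labels, as in the original snippet.
    if len(text) > limit:
        text = text[:limit] + u"..."
    return text

label = normalize_label(b"\xc5\xa0")    # UTF-8 bytes for the Czech Š
print(repr(label))
```

With the decode done first, split() works on the single character U+0160 and never sees the individual bytes.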
3. If I use this little script:
# -*- coding: utf-8 -*-
label = 'Š'
print type(label), repr(label)
label = " ".join(label.split())
print type(label), repr(label)
I get
<type 'str'> '\xc5\xa0'
<type 'str'> '\xc5\xa0'
on both Linux and Windows.
The examples under 1. and 2. above come from an application, Gramps.
There is still something I don't understand.

Date | User | Action | Args
2010-05-30 20:03:49 | PeterL | set | recipients: + PeterL, pitrou, ezio.melotti
2010-05-30 20:03:48 | PeterL | set | messageid: <1275249828.96.0.479501843487.issue8859@psf.upfronthosting.co.za>
2010-05-30 20:03:47 | PeterL | link | issue8859 messages
2010-05-30 20:03:47 | PeterL | create |