Author drylock
Date 2006-08-29.21:16:22
Content
Python 2.5c1 (r25c1:51305, Aug 19 2006, 18:23:29) 
[GCC 4.1.2 20060814 (prerelease) (Debian 4.1.1-11)] on linux2

(Also seen in 2.4)

shlex.split does not handle unicode strings correctly:

>>> import shlex
>>> shlex.split(u"foo")
['f\x00\x00\x00o\x00\x00\x00o\x00\x00\x00']

The shlex code IMO suggests that it should accept
unicode, since it checks whether its argument is an
instance of basestring.

Digging slightly into this, it seems to come down to a
difference between StringIO and cStringIO. While
cStringIO claims to accept unicode as long as it
encodes to ASCII, it gives invalid results:

>>> import sys, cStringIO
>>> sys.getdefaultencoding()
'ascii'

>>> cStringIO.StringIO(u'foo').getvalue()
'f\x00\x00\x00o\x00\x00\x00o\x00\x00\x00'

Perhaps cStringIO should encode the input to ASCII
before consuming it, as I can't imagine anyone cares
about the above result (which I guess is the raw
UCS-2 or UCS-4 representation of the characters).
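
In the meantime, a caller-side workaround along the same lines
seems possible. The following is only a minimal sketch, assuming
the input is pure ASCII; split_unicode is a hypothetical helper
name, not something shlex provides:

```python
import shlex
import sys


def split_unicode(text):
    """Split text with shlex, encoding unicode input to ASCII first.

    Hypothetical helper, not part of shlex. On Python 2 a unicode
    argument is encoded to a byte string before shlex sees it, so
    cStringIO never receives the raw unicode buffer.
    """
    # The name `unicode` only exists on Python 2; the version check
    # short-circuits before it is looked up on later versions.
    if sys.version_info[0] < 3 and isinstance(text, unicode):
        # Assumption: input is ASCII only. Anything else raises
        # UnicodeEncodeError here instead of producing garbage.
        text = text.encode("ascii")
    return shlex.split(text)
```

With this, split_unicode(u"foo bar") gives a list of the two
tokens foo and bar rather than the NUL-padded string shown above.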

History
Date                 User   Action  Args
2007-08-23 14:42:26  admin  link    issue1548891 messages
2007-08-23 14:42:26  admin  create