Author Santiago.Romero
Recipients Santiago.Romero, belopolsky, benjamin.peterson, cgwalters, cvrebert, dexen, doughellmann, eric.araujo, fperez, loewis, mark.dickinson, mcepl, nwerneck, r.david.murray, rhettinger, vstinner
Date 2011-07-13.07:51:47
Message-id <1310543508.39.0.0494174695906.issue1170@psf.upfronthosting.co.za>
In-reply-to
Content
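I think I'm hitting the same problem in some small programs that use shlex. A plain str splits correctly, but a unicode string comes back with NUL bytes embedded in every token: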


>>> import shlex

>>> text = "python and shlex"
>>> shlex.split(text)
['python', 'and', 'shlex']

>>> text = u"python and shlex"
>>> shlex.split(text)
['p\x00\x00\x00y\x00\x00\x00t\x00\x00\x00h\x00\x00\x00o\x00\x00\x00n\x00\x00\x00', '\x00\x00\x00a\x00\x00\x00n\x00\x00\x00d\x00\x00\x00', '\x00\x00\x00s\x00\x00\x00h\x00\x00\x00l\x00\x00\x00e\x00\x00\x00x\x00\x00\x00']
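
The NUL-padded tokens above look like the raw UTF-32 little-endian bytes of the string split on the space byte. That is only my reading of the output, not something confirmed here, but the pattern can be reproduced without shlex in this small sketch (the names `raw` and `pieces` are just for illustration):

```python
# Assumption: the garbage comes from the unicode object's internal
# 4-byte-per-character buffer being read as if it were a byte string.
text = u"python and shlex"

# Each character becomes 4 bytes: the code point followed by three NULs.
raw = text.encode("utf-32-le")

# Splitting on the single space byte leaves the space's three trailing
# NULs stuck to the front of the next token, as in the shlex output.
pieces = raw.split(b" ")
print(pieces)

# Stripping the NULs recovers the words, but only for ASCII-only input.
print([p.replace(b"\0", b"").decode("ascii") for p in pieces])
```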


 I'm currently using the following basic workaround, which assumes that my strings contain only ASCII characters:

>>> [ x.replace("\0", "") for x in shlex.split(text) ]
['python', 'and', 'shlex']
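
 For strings that are not ASCII-only, a possible alternative is to hand shlex a UTF-8 byte string and decode each token afterwards. This is a minimal sketch, not a tested fix for shlex itself; `split_unicode` is a hypothetical helper name, and the Python 3 branch is there only so the same code runs on interpreters where shlex already accepts Unicode text:

```python
import shlex
import sys


def split_unicode(text):
    """Split text with shlex, round-tripping through UTF-8 on Python 2.

    Hypothetical helper, not part of the shlex module.
    """
    if sys.version_info[0] >= 3:
        # Python 3: str is already Unicode and shlex handles it directly.
        return shlex.split(text)
    # Python 2: encode first so shlex only ever sees a byte string.
    # UTF-8 never uses ASCII bytes inside multi-byte sequences, so
    # whitespace and quote characters are still recognised correctly.
    if isinstance(text, unicode):  # noqa: F821 (name exists on Python 2 only)
        text = text.encode("utf-8")
    return [token.decode("utf-8") for token in shlex.split(text)]


print(split_unicode(u"python and shlex"))
```

 Unlike the replace("\0", "") approach, this does not depend on the input being ASCII-only.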

 It would be very nice if shlex could work with unicode strings ...

 Thanks.
History
Date User Action Args
2011-07-13 07:51:48 Santiago.Romero set recipients: + Santiago.Romero, loewis, rhettinger, mark.dickinson, belopolsky, vstinner, dexen, benjamin.peterson, cgwalters, mcepl, eric.araujo, doughellmann, r.david.murray, nwerneck, cvrebert, fperez
2011-07-13 07:51:48 Santiago.Romero set messageid: <1310543508.39.0.0494174695906.issue1170@psf.upfronthosting.co.za>
2011-07-13 07:51:47 Santiago.Romero link issue1170 messages
2011-07-13 07:51:47 Santiago.Romero create