Title: shlex (or perhaps cStringIO) and unicode strings
Messages (9)
msg29709 - (view) Author: Erwin S. Andreasen (drylock) Date: 2006-08-29 21:16
Python 2.5c1 (r25c1:51305, Aug 19 2006, 18:23:29) 
[GCC 4.1.2 20060814 (prerelease) (Debian 4.1.1-11)] on

(Also seen in 2.4)

shlex.split does not like unicode strings:

>>> shlex.split(u"foo")

The shlex code IMO suggests that it should accept
unicode (as it checks for argument being an instance of

Digging slightly into this, it seems to be a
difference between StringIO and cStringIO. While
cStringIO claims it accepts unicode as long as it
encodes to ASCII, it gives invalid results:

>>> sys.getdefaultencoding()

>>> cStringIO.StringIO(u'foo').getvalue()

Perhaps cStringIO should .encode to ASCII encoding
before consuming the input, as I can't imagine anyone
cares about the above result (which I guess is the
UCS-2 or UCS-4 representation of the characters).
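
To make the difference concrete, here is a rough sketch of the two outcomes being contrasted. It is written for a current Python 3 interpreter, where cStringIO no longer exists, so io.BytesIO stands in for the byte buffer and the UTF-16/UTF-32 encodings stand in for the guessed UCS-2/UCS-4 in-memory layout. The helper name make_byte_buffer is hypothetical and not part of any library.

```python
import io


def make_byte_buffer(text):
    """Hypothetical helper: encode text to ASCII before buffering it.

    This only illustrates the behaviour proposed above; it is not the
    cStringIO implementation.
    """
    if isinstance(text, str):
        # Raises UnicodeEncodeError if the text is not pure ASCII.
        text = text.encode("ascii")
    return io.BytesIO(text)


# Proposed behaviour: the buffer holds the ASCII-encoded bytes.
encoded_value = make_byte_buffer("foo").getvalue()

# Assumed shape of the "invalid" result: the raw code units of the
# string, approximated here with little-endian UTF-16 and UTF-32.
raw_ucs2 = "foo".encode("utf-16-le")
raw_ucs4 = "foo".encode("utf-32-le")

print(repr(encoded_value))
print(repr(raw_ucs2))
print(repr(raw_ucs4))
```

The point of the sketch is only that encoding first yields three bytes, while copying the internal buffer yields two or four bytes per character, padded with NULs for ASCII text.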

msg29710 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-10-12 09:47
Thanks for your report, this is now fixed in rev. 52301,
52302 (2.5).
msg146126 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-21 20:12
Still happens on latest 2.7:

>>> from cStringIO import StringIO
>>> sio = StringIO(u"abc")
>>> sio.getvalue()
msg146128 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-21 20:22
And unsurprisingly so, since the fix was reverted in r56830 by Georg.
msg146132 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-21 20:35
Georg, is this patch OK with you?
msg146162 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-10-22 10:17
If you think it's fine to change this behavior, then yes :)
msg146184 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-22 19:31
New changeset 27ae7d4e1983 by Antoine Pitrou in branch '2.7':
Issue #1548891: The cStringIO.StringIO() constructor now encodes unicode
msg146185 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-22 19:32
> If you think it's fine to change this behavior, then yes :)

Well, the "documented" behaviour makes no sense.
Either it should raise TypeError or convert. Since write() converts, it's logical for the constructor to do so as well.
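
As an illustration of the consistency argument, here is a minimal sketch of a buffer whose constructor and write() share one conversion path. AsciiBuffer is a hypothetical stand-in written for Python 3 on top of io.BytesIO; it is not the cStringIO code and not the committed patch.

```python
import io


class AsciiBuffer:
    """Hypothetical buffer: constructor and write() convert the same way."""

    def __init__(self, initial=""):
        self._buf = io.BytesIO()
        # Route the initial value through write() so both entry points
        # apply the identical text-to-ASCII conversion.
        self.write(initial)

    def write(self, data):
        if isinstance(data, str):
            # Non-ASCII text raises UnicodeEncodeError instead of
            # silently storing an internal representation.
            data = data.encode("ascii")
        self._buf.write(data)

    def getvalue(self):
        return self._buf.getvalue()


via_constructor = AsciiBuffer("abc").getvalue()

writer = AsciiBuffer()
writer.write("abc")
via_write = writer.getvalue()

print(via_constructor == via_write)
```

With a single conversion path, the two ways of filling the buffer cannot disagree, which is the property argued for above.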
msg146217 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-23 02:38
New changeset 0b39f2486314 by Éric Araujo in branch '2.7':
Note that the #1548891 fix indirectly fixes shlex (#6988, #1170)
