classification
Title: cStringIO and unicode
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 2.7, Python 2.6
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: shlex (or perhaps cStringIO) and unicode strings (View: 1548891)
Assigned To:
Nosy List: flox, georg.brandl, pitrou, vdupras
Priority: normal
Keywords: patch

Created on 2008-03-18 13:27 by vdupras, last changed 2012-04-27 11:47 by pitrou. This issue is now closed.

Files
File name | Uploaded | Description
cStringIO_unicode_test.diff | vdupras, 2008-03-18 13:27 | test_StringIO.py patch
Messages (4)
msg63911 - (view) Author: Virgil Dupras (vdupras) (Python triager) Date: 2008-03-18 13:27
hsoft-dev:python hsoft$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from cStringIO import StringIO
>>> StringIO(u'foo').read()
'foo'
>>> 
hsoft-dev:python hsoft$ ./python.exe 
Python 2.6a1+ (trunk:61515, Mar 18 2008, 13:38:47) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from cStringIO import StringIO
>>> StringIO(u'foo').read()
'f\x00o\x00o\x00'
>>> 

The documentation says:

Unlike the memory files implemented by the StringIO module, those provided by 
this module are not able to accept Unicode strings that cannot be encoded as 
plain ASCII strings.

Attached a patch to test_StringIO.
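
The 'f\x00o\x00o\x00' result above is consistent with cStringIO reading the unicode object's raw internal buffer rather than an ASCII encoding of it. A minimal sketch of that byte pattern, assuming a narrow (UCS-2) build on a little-endian machine (neither is stated in the report). cStringIO itself is not used here, so the snippet also runs on Python 3:

```python
import sys

text = u"foo"

# Assumption: a narrow (UCS-2) build stores each character as a 2-byte
# code unit in native byte order, so the raw buffer of u'foo' on a
# little-endian machine has the same layout as UTF-16-LE.
raw_le = text.encode("utf-16-le")
raw_be = text.encode("utf-16-be")

# What the reporter expected instead: the plain ASCII encoding.
ascii_bytes = text.encode("ascii")

print(repr(raw_le))       # interleaved NUL bytes after each character
print(repr(raw_be))       # NUL bytes before each character
print(repr(ascii_bytes))  # one byte per character
print(sys.byteorder)      # byte order of the machine running this
```

On a big-endian machine or a wide (UCS-4) build the raw bytes would look different, which is one reason to avoid relying on that buffer at all.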
msg63945 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-03-18 17:21
The 2.5.1 "fix" was determined to be too backwards-incompatible and
has since been rolled back. The trunk behavior is "correct". Closing as rejected.
msg159443 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-04-27 07:28
It seems the documentation is not accurate enough.

"Unlike the StringIO module, this module is not able to accept Unicode strings that cannot be encoded as plain ASCII strings."

I understand that u'foo' can be encoded as plain ASCII; however, it does not behave correctly with cStringIO.


Python 2.7.3 (default, Apr 14 2012, 01:49:35) 
>>> from cStringIO import StringIO
>>> StringIO(u'foo').read()
'f\x00o\x00o\x00'
>>>
msg159450 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-27 11:47
This was fixed in 2.7.3 actually (27ae7d4e1983):


Python 2.7.3+ (2.7:8b8b580e3fd3, Apr 25 2012, 17:24:51) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from cStringIO import StringIO
>>> StringIO(u'foo').read()
'foo'
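
For code that has to behave the same on interpreters with either behavior, one option is to encode explicitly before constructing the buffer. A minimal sketch: make_buffer is a hypothetical helper, not part of cStringIO, and the io.BytesIO fallback is only there so the example runs where cStringIO is unavailable:

```python
try:
    from cStringIO import StringIO as ByteBuffer  # Python 2
except ImportError:
    from io import BytesIO as ByteBuffer  # fallback where cStringIO does not exist


def make_buffer(text, encoding="ascii"):
    """Hypothetical helper: encode text explicitly, then wrap the bytes.

    Raises UnicodeEncodeError if the text cannot be represented in the
    given encoding, instead of silently exposing an internal buffer.
    """
    if not isinstance(text, bytes):
        text = text.encode(encoding)
    return ByteBuffer(text)


data = make_buffer(u"foo").read()
print(repr(data))
```

Encoding up front keeps the result independent of the interpreter's internal unicode representation.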
History
Date | User | Action | Args
2012-04-27 11:47:08 | pitrou | set | status: open -> closed
superseder: shlex (or perhaps cStringIO) and unicode strings
nosy: + pitrou
messages: + msg159450
resolution: duplicate
stage: resolved
2012-04-27 07:28:23 | flox | set | status: closed -> open
versions: + Python 2.7
nosy: + flox
messages: + msg159443
resolution: rejected -> (no value)
2008-03-18 17:21:21 | georg.brandl | set | status: open -> closed
resolution: rejected
messages: + msg63945
nosy: + georg.brandl
2008-03-18 13:27:58 | vdupras | create |