classification
Title: cStringIO should provide a binary option
Type: enhancement Stage:
Components: None Versions:
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: georg.brandl, gvanrossum, lemburg
Priority: normal Keywords:

Created on 2002-04-23 12:52 by gvanrossum, last changed 2007-08-23 20:30 by georg.brandl. This issue is now closed.

Messages (9)
msg53535 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2002-04-23 12:52
The last few comments added to bug 216388 indicate a
new problem in cStringIO. Rather than abusing that bug
report, I'm opening a new one here. The problem is that
cStringIO now accepts Unicode strings to write(), but
when you use this, getvalue() returns binary garbage.
The cause is apparently MAL's checkin for cStringIO
2.30, which enabled read buffers.
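
To see why such output reads as garbage, the following Python 3 sketch mimics handing a Unicode string's raw code units to a byte buffer. It is only an illustration: UTF-16-LE is an assumed stand-in for the interpreter's internal representation (the real layout depended on the build), and io.BytesIO stands in for cStringIO.

```python
import io

text = "abc"

# Assumption: UTF-16-LE approximates the raw internal buffer of a Unicode
# object on a little-endian "narrow" build; the actual layout varied by build.
raw_units = text.encode("utf-16-le")

buf = io.BytesIO()      # stand-in for a cStringIO object
buf.write(raw_units)    # what storing the raw buffer would amount to

garbled = buf.getvalue()
print(repr(garbled))    # every ASCII character is followed by a NUL byte
```

Read back as an ordinary byte string, the interleaved NUL bytes are what make the value look mangled even though no data was lost.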
msg53536 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2002-04-23 12:59
I wonder if perhaps the fix is as simple as using "t#"
instead of "s#" in the PyArg_... format string in P_write().
That accepts Unicode strings as args to write() only when
they are ASCII (actually, it uses the default encoding).

Marc-Andre, can you explain the reason for the change in the
first place (other than fixing a dubious dependency on
PyString_GetSize() raising an exception for a non-string
object)?
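
The behaviour described for "t#" (Unicode accepted only when the default encoding can represent it) can be approximated in Python 3. This is a rough sketch, not the C code under discussion: write_like_t_hash and its default_encoding parameter are hypothetical names, and ASCII is assumed as the default encoding.

```python
def write_like_t_hash(chunks, data, default_encoding="ascii"):
    """Hypothetical helper: append data the way a "t#"-style write might."""
    if isinstance(data, str):
        # Text is converted with the default encoding, so non-ASCII text
        # fails loudly instead of being stored as raw internal data.
        data = data.encode(default_encoding)
    chunks.append(bytes(data))


chunks = []
write_like_t_hash(chunks, "plain ASCII")   # accepted: encodable as ASCII
write_like_t_hash(chunks, b" and bytes")   # byte strings pass through

try:
    write_like_t_hash(chunks, "caf\u00e9")  # non-ASCII text
    rejected = False
except UnicodeEncodeError:
    rejected = True

result = b"".join(chunks)
```

Under these assumptions the non-ASCII write raises an error and leaves the buffer contents untouched.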
msg53537 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2002-04-26 21:08
Should I just check this in? It looks pretty safe to me...
msg53538 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2002-04-27 15:02
The idea behind ripping out the old string-only approach was to make
cStringIO more compatible with the file object implementation.

Rather than switching from s# to t#, the cStringIO object
should maintain a binary switch just like the file
object does and then use s# for pseudo files opened
in binary mode (default) and t# for text mode ones.

Note that in any case, Unicode should be explicitly
encoded before writing it to a file. 

Simply switching to t# would cause compatibility 
problems, since a different buffer API would be used
for all input objects.
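
A binary switch of the kind proposed here might look roughly like the Python 3 sketch below. Everything in it is an assumption made for illustration: the ModalBuffer class, its binary and encoding parameters, and the choice of ASCII as the text-mode encoding are hypothetical and are not part of cStringIO.

```python
import io


class ModalBuffer:
    """Hypothetical in-memory file with a binary/text switch."""

    def __init__(self, binary=True, encoding="ascii"):
        self.binary = binary
        self.encoding = encoding
        self._buf = io.BytesIO()

    def write(self, data):
        if self.binary:
            # Binary mode: accept any bytes-like object and store it as is.
            # memoryview() raises TypeError for str, so text is rejected.
            self._buf.write(memoryview(data))
        else:
            # Text mode: accept text only, converted with the set encoding.
            if not isinstance(data, str):
                raise TypeError("text mode expects str")
            self._buf.write(data.encode(self.encoding))

    def getvalue(self):
        return self._buf.getvalue()


binary_file = ModalBuffer(binary=True)
binary_file.write("snowman \u2603".encode("utf-8"))  # encode explicitly first

text_file = ModalBuffer(binary=False)
text_file.write("hello")
```

The binary-mode call follows the advice above: the caller encodes the Unicode text explicitly before writing, so the buffer never has to guess an encoding.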

msg53539 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2002-04-27 15:13
Another note: the bug title is wrong: cStringIO doesn't
mangle Unicode, it just returns the raw binary data. Not
that this is of much use, but it's in sync with what the
file object does.
msg53540 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2002-04-28 00:02
I think that adding a binary mode to cStringIO is okay, but
the default should be text, and until we have the binary
mode option, the format should be t#.

Another solution would be to let cStringIO act more like
StringIO; after all that was its original intention. But
since that would require a major overhaul, I'm not seriously
proposing that.
msg53541 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2002-05-29 10:36
Guido already fixed this in CVS, so I'll turn 
the bug into a feature request:

cStringIO should provide a way to "open" a
file in binary mode.
msg55193 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2007-08-23 19:33
Unassigning: I haven't needed this in the past few years.
msg55203 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2007-08-23 20:30
I think this can be closed: cStringIO won't change, and Py3k won't have
StringIO Unicode problems anyway.
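
For comparison, Python 3 separates the two cases into io.StringIO (text) and io.BytesIO (bytes), and each rejects the other's data type. The short sketch below shows this; the rejects helper is a hypothetical name introduced only for the example.

```python
import io

text_buf = io.StringIO()
text_buf.write("caf\u00e9")                  # text goes in, text comes out

byte_buf = io.BytesIO()
byte_buf.write("caf\u00e9".encode("utf-8"))  # bytes require explicit encoding


def rejects(buf, data):
    """Hypothetical helper: True if buf.write(data) raises TypeError."""
    try:
        buf.write(data)
    except TypeError:
        return True
    return False


# Neither buffer silently accepts the other kind of data.
mixing_rejected = rejects(io.StringIO(), b"bytes") and rejects(io.BytesIO(), "text")
```

Because the type check happens at write time, a text/bytes mix-up surfaces as an immediate TypeError rather than as unexpected data from getvalue().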
History
Date                 User          Action  Args
2007-08-23 20:30:21  georg.brandl  set     status: open -> closed; resolution: wont fix; messages: + msg55203; nosy: + georg.brandl
2007-08-23 19:33:20  lemburg       set     assignee: lemburg -> ; messages: + msg55193
2002-04-23 12:52:36  gvanrossum    create