classification
Title: file.encoding support for file.write and file.writelines
Type: Stage:
Components: Interpreter Core Versions: Python 2.5
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: georg.brandl, hyeshik.chang, lemburg, prikryl, quiver
Priority: normal Keywords: patch

Created on 2005-06-04 17:45 by georg.brandl, last changed 2005-08-08 06:49 by georg.brandl. This issue is now closed.

Files
File name Uploaded Description
fileobject-unicodewrite.diff hyeshik.chang, 2005-06-05 02:26 fixed styles, added test
fileobject-unicodewrite-4.diff georg.brandl, 2005-06-05 12:17
fileobject-unicodewrite-5.diff georg.brandl, 2005-06-05 16:31
Messages (12)
msg48428 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-06-04 17:45
Here is a patch that allows Unicode strings written to
a file to be encoded automatically. It lets Python
code set file.encoding, and that encoding is then used
when writing Unicode strings with write() or writelines().

It is my first bit of core hacking, so forgive me a leaked
ref or two. I hope I got the error handling
right; it is rather confusing...

(btw: Bug #967986 will be fixed with this)
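The proposed behaviour can be sketched in Python 3 terms with a small wrapper. This is a hypothetical illustration only: the class name `EncodingWriter` and its attributes are assumptions, not part of the patch (which changes the C file object). Text passed to write() or writelines() is encoded with the writable `encoding` attribute, while byte strings pass through unchanged. The ASCII fallback when no encoding is set is also an assumption, chosen to mirror Python 2's default encoding.

```python
import io


class EncodingWriter:
    """Hypothetical wrapper mimicking the proposed file.encoding semantics."""

    def __init__(self, raw, encoding=None):
        self.raw = raw            # underlying binary stream
        self.encoding = encoding  # writable; None means "use the default"

    def write(self, data):
        if isinstance(data, str):
            # Assumption: with no encoding set, fall back to ASCII, as
            # Python 2's default encoding did for file.write(unicode).
            data = data.encode(self.encoding or "ascii")
        # Byte strings are written as-is, without recoding.
        self.raw.write(data)

    def writelines(self, lines):
        for line in lines:
            self.write(line)


buf = io.BytesIO()
f = EncodingWriter(buf, encoding="utf-8")
f.write("caf\u00e9\n")
f.writelines([b"raw bytes\n", "\u00fcber\n"])
print(buf.getvalue())  # b'caf\xc3\xa9\nraw bytes\n\xc3\xbcber\n'
```

Because `encoding` is a plain attribute, code can change it between writes, which is the "writable file.encoding" part of the proposal.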
msg48429 - Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2005-06-05 02:26
Logged In: YES 
user_id=55188

The idea looks good to me.
I attached a revised patch that fixes the code style and the
C99-style local variable declarations, and adds a regrtest.
I think a documentation update will also be needed.
msg48430 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-06-05 07:56
Logged In: YES 
user_id=1188172

Third revision; adds new documentation and allows Python
code to set the encoding to Py_None.
msg48431 - Author: George Yoshida (quiver) (Python committer) Date: 2005-06-05 12:09
Logged In: YES 
user_id=671362

Reinhold, libstdtypes.tex needs two fixes.

 \versionadded{2.3}
+\versionchanged[The encoding attribute is now writable and is used
+for encoding Unicode strings given to \method{write()} and 
+\method{writelines()}.]{
                      ~~~
First, the versionchanged tag is missing its trailing brace, which
results in a compile error.

Second (really trivial), the versionchanged macro automatically
appends a period at the end of the sentence (see the link [*]),
so you don't need to add it by hand.

Then the above line would become:

+\method{writelines()}]{2.5}

[*]: http://docs.python.org/doc/inline-markup.html
msg48432 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-06-05 12:17
Logged In: YES 
user_id=1188172

Thanks! Corrected in patch #4.
msg48433 - Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2005-06-05 15:20
Logged In: YES 
user_id=55188

Yet another thing to fix:

You can't put local variable declarations after
non-declaration statements. Because Python uses C89 as its C
source code standard, you should put all declarations at the top
of functions only.
msg48434 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-06-05 16:31
Logged In: YES 
user_id=1188172

Okay, put on #5.
msg48435 - Author: Petr Prikryl (prikryl) Date: 2005-07-12 09:59
Logged In: YES 
user_id=771873

The title and the comments do not say so, but the patch was 
created by Reinhold Birkenfeld to solve the bug 

[ 1099364 ] raw_input() displays wrong unicode prompt

As the bug was closed and Reinhold calls this his "first
core hackery", I'd like to ask someone else to review whether
the patch is the correct solution to the reported bug. The bug
seems to be very visible (hence serious) in non-English-speaking
countries, where Unicode promises to solve many
problems. Because of that, I ask whether the bug should have been
closed before the patch is accepted. I am also adding this text
to link this patch to the original problem.

Thanks, Petr
msg48436 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-07-14 07:52
Logged In: YES 
user_id=38388

This doesn't quite work (yet): you've broken the support for
writing binary data to the file via file.write(). Encodings
should only be used for non-binary files.

Also note that you are not freeing the memory allocated by
the "et#" parser for s.

Please add some test cases where you open a binary file and
write:
a) binary strings 
b) contents of a buffer object
c) Unicode objects 
to it.

Case c) should raise an exception. a) and b) should result
in the data being written as-is, without any recoding.
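These three test cases can be sketched against Python 3's built-in binary files, where this is exactly how `open(..., "wb")` behaves. This is a modern analogue, not the regrtest from the patch; `memoryview` stands in for Python 2's buffer object (an assumption), and the helper name `check_binary_write` is hypothetical.

```python
import os
import tempfile


def check_binary_write():
    """Hypothetical helper: write cases a), b), c) to a binary file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"\x00\xffbinary")           # a) binary string
            f.write(memoryview(b"buffer data"))  # b) buffer-like object
            try:
                f.write("unicode text")          # c) text must be rejected
            except TypeError:
                rejected = True
            else:
                rejected = False
        with open(path, "rb") as f:
            data = f.read()
    finally:
        os.remove(path)
    return data, rejected


data, rejected = check_binary_write()
# a) and b) are stored byte-for-byte; c) raised TypeError.
assert data == b"\x00\xffbinarybuffer data"
assert rejected
```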
msg48437 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-07-14 08:19
Logged In: YES 
user_id=38388

I've thought about this some more: I'm not sure whether it
is such a good idea to try to move code from the codecs into
the standard file object - after all, the codecs already
support all this and do a much better job at handling error
cases and the like.

Furthermore, codecs support both directions: reading and
writing. Your patch only does one way.

The encoding support you currently find in the file object
is only needed for printing Unicode objects - it is not used
anywhere else.
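For comparison, the codecs route can be sketched with `codecs.getwriter` and `codecs.getreader` on in-memory streams. This is a minimal illustration in current Python, not code from the thread: the writer encodes on the way out, the matching reader decodes on the way in, and the `errors` argument shows one of the error-handling policies the codecs already provide.

```python
import codecs
import io

# Writing: the StreamWriter encodes text before passing it to the binary stream.
raw = io.BytesIO()
writer = codecs.getwriter("utf-8")(raw)
writer.write("\u00fcnicode")
writer.writelines([" and ", "more"])
encoded = raw.getvalue()
print(encoded)  # b'\xc3\xbcnicode and more'

# Reading: the matching StreamReader decodes, covering the other direction.
reader = codecs.getreader("utf-8")(io.BytesIO(encoded))
decoded = reader.read()
assert decoded == "\u00fcnicode and more"

# Error handling is a codec parameter, not something the file object must know.
ascii_raw = io.BytesIO()
codecs.getwriter("ascii")(ascii_raw, errors="xmlcharrefreplace").write("caf\u00e9")
print(ascii_raw.getvalue())  # b'caf&#233;'
```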
msg48438 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-07-14 08:34
Logged In: YES 
user_id=1188172

I agree with you that writing Unicode objects to a binary
file should raise an exception, but with the 'et#' format
string, 8-bit string objects should pass through file.write
unrecoded.

About your second comment: Yes, codecs is one way to do it,
but then I think that the encoding handling for print should
be ripped out, too. After all, that's what many people
complain about: "print unistr" works, while
"sys.stdout.write(unistr)" does not. As the comment
about bug 1099364 shows, this problem shows up in various places.

If this is rejected, file.write() shouldn't accept Unicode
anymore, and print should behave the same way.
msg48439 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-08-08 06:49
Logged In: YES 
user_id=1188172

Rejecting. This is incomplete and will be addressed more
properly in Py3k.
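In Python 3, a text stream carries its encoding in `io.TextIOWrapper`, so print() and write() both go through the same encoder, which removes the "print works, write doesn't" asymmetry discussed above. A minimal sketch; the in-memory `BytesIO` buffer stands in for the binary stream underneath `sys.stdout` (an assumption for the sake of a self-contained example).

```python
import io

raw = io.BytesIO()  # stands in for the binary stream under sys.stdout
text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

print("caf\u00e9", file=text)  # print() encodes through the wrapper...
text.write("\u00fcber\n")      # ...and so does write()
text.flush()

assert raw.getvalue() == b"caf\xc3\xa9\n\xc3\xbcber\n"
print(text.encoding)  # utf-8
```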
History
Date User Action Args
2005-06-04 17:45:09 birkenfeld create