Message 27204 - Python tracker

Author	paul_g
Date	2006-01-02.21:32:52

Content


i think there's a bit of confusion here as to what exactly
the problem is.

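ansi c says that for files fopen()ed for reading and writing
(i.e. r+, w+ etc), you must issue an fflush() or a file
positioning call (fseek(), fsetpos(), or rewind()) when
switching from writing to reading, and a file positioning
call when switching from reading to writing. the exception
to the latter is when the last read hit EOF.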
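
for illustration, here is a minimal sketch of the calling pattern that
rule asks for, seen from the python side. the function names
(read_then_append, demo) are made up for this example, and it is
written against a current python 3, whose file objects may not go
through fopen() at all, so it only shows where the positioning calls
go, not the failure itself.

```python
import os
import tempfile


def read_then_append(path, extra):
    """Read the whole file, append to it, then read it again."""
    with open(path, "r+") as f:
        before = f.read()        # reads up to EOF
        f.seek(0, os.SEEK_END)   # positioning call between the read and the write
        f.write(extra)
        f.seek(0)                # positioning call between the write and the read
        after = f.read()
    return before, after


def demo():
    # hypothetical scratch file, created only for this example
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "w") as f:
            f.write("one\n")
        return read_then_append(path, "two\n")
```

the explicit seek() calls are redundant wherever the library already
copes with mixed reads and writes, but they are harmless there too.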

the behaviour we are seeing using python file objects:

with glibc:
1. read + write + read results in no data being returned by
the last read. this is the case regardless of whether we do
f.readlines()+f.writelines()+f.readlines() or
f.read()+f.write()+f.read(). this does not conform to
expected behaviour (as per ansi c and glibc fopen(3)),
because at least in the latter case (read() with no size
parameter), python docs promise to read until EOF, which
triggers the exception that ansi c/glibc make to the
requirement for an intervening file positioning call.

with msvcrt:
1. in the f.read()+f.write()+f.read() case, the f.write()
generates an IOError. this deviates from ansi c, but is in
line with msdn docs.
2. in the f.readlines()+f.writelines()+f.readlines() case,
you see the type of results quoted in the bug submission.
this deviates from ansi c if you expect readlines() to read
EOF, but is still in line with msdn docs.

there are 3 issues here:

1. if we give users a high level interface for file i/o, as
we do by giving them a file object, should we expect them to
research, be aware of and deal with the requirements
imposed by the low level implementation used? if it is
reasonable to require that when they use read() and write(),
is it still reasonable to require it when they use
readlines() and writelines()?

2. if we expect users to be aware of ansi c requirements for
file stream usage and deal with them, is it reasonable to
expect them to deal with the differences in libc
implementations, including the differing requirements they
impose and the differing failure modes being seen? should we
not attempt to present an ansi c compliant interface to them,
performing workarounds as necessary on a given platform
(or libc implementation, as is the case here)? we certainly
do that in some cases (but not in this one), based on my
brief reading of Objects/fileobject.c.

3. if we leave users to deal with this mess, should we not
at least document this in some fashion? whether it be a
detailed explanation or just a pointer to the appropriate
docs, or even just a mention that they should be reading up
on fopen(), since that is the underlying implementation
behind file objects. is it reasonable to expect folks for
whom python is their first language, as some folks seem to
promote python, to figure all of this out when they haven't
the foggiest about ansi c?


to recap, the real issue, imo, seems to be that we shouldn't
be exposing users to this, rather than the funky results of
not doing this right.

there are 4 options for dealing with this:

1. do nothing (what tim currently favours, it appears)
2. document this to some extent
3. make this work the same across all libcs
4. perform the synchronization (fflush, fsetpos etc
depending on libc) for the user, behind the scenes, if we
see a write coming in and the previous op was a read.
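
to make option 4 concrete, here is a rough sketch of the idea written
as a python wrapper rather than as a change to Objects/fileobject.c.
everything in it (SyncedFile, RecordingBytesIO, demo) is hypothetical
and invented for illustration. it handles both directions, not just
read followed by write, and it uses seek(tell()) as the positioning
call, which is an assumption about what is sufficient on a given libc.

```python
import io


class SyncedFile:
    """Hypothetical wrapper: repositions the stream whenever the
    direction of i/o changes, so the caller does not have to."""

    def __init__(self, raw):
        self._f = raw
        self._last_op = None  # "read", "write", or None after a seek

    def _sync(self, op):
        if self._last_op is not None and self._last_op != op:
            # seek to where we already are: no movement, but it counts
            # as a file positioning call between the two operations
            self._f.seek(self._f.tell())
        self._last_op = op

    def read(self, size=-1):
        self._sync("read")
        return self._f.read(size)

    def write(self, data):
        self._sync("write")
        return self._f.write(data)

    def seek(self, offset, whence=0):
        self._last_op = None  # an explicit seek already synchronizes
        return self._f.seek(offset, whence)


class RecordingBytesIO(io.BytesIO):
    """In-memory stand-in for a real file that logs seek() calls."""

    def __init__(self, initial=b""):
        super().__init__(initial)
        self.seeks = []

    def seek(self, offset, whence=0):
        self.seeks.append((offset, whence))
        return super().seek(offset, whence)


def demo():
    raw = RecordingBytesIO(b"abcdef")
    f = SyncedFile(raw)
    first = f.read(3)   # no sync needed, first operation
    f.write(b"XY")      # read -> write: wrapper seeks first
    rest = f.read()     # write -> read: wrapper seeks again
    return first, rest, raw.seeks, raw.getvalue()
```

the cost is one extra tell()/seek() pair per change of direction,
which is the trade-off option 4 would have to accept.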

the last option, from the perspective of "this is exactly
what a high level interface should do for the user", makes
the most sense to me. but then, maybe that's why i'm not a
python core dev ;)

cheers,
-p

History
Date	User	Action	Args
2007-08-23 14:37:04	admin	link	issue1394612 messages
2007-08-23 14:37:04	admin	create