
classification
Title: Some IO related problems on x86 windows
Type: Stage:
Components: IO Versions:
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, gsingh, hynek, pitrou, stutzbach
Priority: normal Keywords:

Created on 2013-03-16 16:33 by gsingh, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (14)
msg184330 - (view) Author: Gurmeet Singh (gsingh) Date: 2013-03-16 16:33
1. Read mode is not the default mode, contrary to what docs.python.org states.
In particular, see the first Traceback below - "b" does not work (although it does in C) and
you are forced to use "rb" instead.

2. io.BufferedReader does not implement read1 (see the last lines of the trace below).

3. io.FileIO does not make a single OS system call on read() - instead it reads the file until EOF, i.e. it ignores the argument supplied to read() - larger arguments are slower to execute (see the read calls in the trace below).

_________________

>>> import io
>>> fl = io.FileIO('c:/temp9/Capability/Analyzing Data.mp4', 'r')
>>> byt = fl.read()
>>> len(byt)
70934549
>>> fl.close()
>>> fl = io.FileIO('c:/temp9/Capability/Analyzing Data.mp4', 'r')
>>> byt = fl.read(256)
>>> len(byt)
256
>>> byt = fl.read(512)
>>> len(byt)
512
>>> byt = fl.read(1024)
>>> len(byt)
1024
>>> byt = fl.read(4096)
>>> len(byt)
4096
>>> byt = fl.read(10240)
>>> len(byt)
10240
>>> len(fl.read(40960))
40960
>>> len(fl.read(102400))
102400
>>> len(fl.read(1048576))
1048576
>>> fl.close()
>>> fl = io.FileIO('c:/temp9/Capability/Analyzing Data.mp4', 'r')
>>> len(fl.read(70934549))
70934549
>>> len(fl.read(70934549))
0
>>> fl.close()
>>> fl = io.FileIO('c:/temp9/Capability/Analyzing Data.mp4', 'r')
>>> b = bytearray(70934549)
>>> fl.readinto(b)
70934549
>>> fl.close()
>>> fl = open ('c:/temp9/Capability/Analyzing Data.mp4', 'b', buffering = 0)
Traceback (most recent call last):
  File "<pyshell#31>", line 1, in <module>
    fl = open ('c:/temp9/Capability/Analyzing Data.mp4', 'b', buffering = 0)
ValueError: Must have exactly one of create/read/write/append mode and at most one plus
>>> fl = open ('c:/temp9/Capability/Analyzing Data.mp4', 'rb', buffering = 0)
>>> type(fl)
<class '_io.FileIO'>
>>> cfl = io.FileIO('c:/temp9/Capability/Analyzing Data.mp4', 'r')
>>> type(cfl)
<class '_io.FileIO'>
>>> cfl.close()
>>> cfl = open ('c:/temp9/Capability/Analyzing Data.mp4', 'rb', buffering = -1)
>>> type(cfl)
<class '_io.BufferedReader'>
>>> io.DEFAULT_BUFFER_SIZE
8192
>>> len(fl.read(70934549))
70934549
>>> cfl.close()
>>> cfl = open ('c:/temp9/Capability/Analyzing Data.mp4', 'rb', buffering = -1)
>>> len(fl.read1(70934549))
Traceback (most recent call last):
  File "<pyshell#44>", line 1, in <module>
    len(fl.read1(70934549))
AttributeError: '_io.FileIO' object has no attribute 'read1'
>>>
msg184332 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-16 16:51
> 1. The read mode is not the default mode as mentioned in the
> docs.python.org.

It is. If you don't mention a mode, the mode is "r" by default. But if you mention a mode, then you are required to specify one of "r", "w", "a".
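
The rule described above can be checked with a short, self-contained sketch. The temporary file and the helper name `describe_open_modes` are assumptions made for illustration; they are not part of the original report.

```python
import os
import tempfile


def describe_open_modes(path):
    """Return the default mode and the error text raised for mode 'b' alone."""
    # No mode argument: open() falls back to text read mode.
    with open(path) as f:
        default_mode = f.mode
    # A mode string without one of r/w/a/x is rejected.
    try:
        f = open(path, "b")
    except ValueError as exc:
        error = str(exc)
    else:
        f.close()
        error = None
    return default_mode, error


# Hypothetical scratch file standing in for the reporter's .mp4 file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example")
    demo_path = tmp.name

try:
    default_mode, error = describe_open_modes(demo_path)
    print("default mode:", default_mode)
    print("error for 'b':", error)
finally:
    os.remove(demo_path)
```

Passing "rb" instead of "b" is the spelling that both names the read mode and requests binary I/O.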

> io.BufferedReader does not implement read1 (the last lines of trace
> below)

It does. You made a mistake in your experiment (you called read1() on a FileIO object, not a BufferedReader object).

> io.FileIO does not make a single OS system call on read() - instead
> it reads the file until EOF, i.e. it ignores the argument supplied to read()

Your experiments show otherwise, the argument supplied to read() is observed: if you call read(1024), at most 1024 bytes are returned, etc.

It's only if you call read() without an argument that the file is being read until EOF.
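
A minimal sketch of this behaviour, assuming a 5000-byte scratch file created only for the demonstration (the file and the variable names are not from the original report):

```python
import io
import os
import tempfile

# Hypothetical scratch file of known size.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 5000)
    demo_path = tmp.name

try:
    with io.FileIO(demo_path, "r") as raw:
        first = raw.read(1024)   # at most 1024 bytes come back
        rest = raw.read()        # no argument: read everything up to EOF
        at_eof = raw.read(1024)  # nothing left, so an empty bytes object
finally:
    os.remove(demo_path)

print(len(first), len(rest), len(at_eof))
```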
msg184368 - (view) Author: Gurmeet Singh (gsingh) Date: 2013-03-17 13:50
Please consider the following before making a decision:

__________
> io.BufferedReader does not implement read1 (the last lines of trace
> below)

>It does. You made a mistake in your experiment (you called read1() on a FileIO object, not a BufferedReader object).

Please see the following lines:
>>> cfl = open ('c:/temp9/Capability/Analyzing Data.mp4', 'rb', buffering = -1)
>>> type(cfl)
<class '_io.BufferedReader'>

As far as I can tell, it is an _io.BufferedReader and not just an _io.FileIO (the base class). Please tell me if I am wrong here.
msg184369 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-17 14:01
You called read1() on fl (a FileIO object) and not cfl (a BufferedReader object). Your fault for choosing confusing variable names :-)

>>> len(fl.read1(70934549))
Traceback (most recent call last):
  File "<pyshell#44>", line 1, in <module>
    len(fl.read1(70934549))
AttributeError: '_io.FileIO' object has no attribute 'read1'

Please try to call cfl.read1() and see whether it works (it should).
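
The mix-up is easier to see with descriptive variable names. The sketch below assumes a small scratch file; the names `raw` and `buffered` are chosen for illustration and replace `fl` and `cfl` from the transcript.

```python
import os
import tempfile

# Hypothetical scratch file standing in for the reporter's .mp4 file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10000)
    demo_path = tmp.name

try:
    # buffering=0 gives the raw FileIO object; the default gives a BufferedReader.
    with open(demo_path, "rb", buffering=0) as raw, open(demo_path, "rb") as buffered:
        raw_type = type(raw).__name__
        buffered_type = type(buffered).__name__
        buffered_has_read1 = hasattr(buffered, "read1")
        chunk = buffered.read1(4096)  # at most one raw read, up to 4096 bytes
finally:
    os.remove(demo_path)

print(raw_type, buffered_type, buffered_has_read1, len(chunk))
```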
msg184370 - (view) Author: Gurmeet Singh (gsingh) Date: 2013-03-17 14:01
Please consider the following before making a decision:

>> io.FileIO does not implements single OS system call on read() - instead
>> reads a file until EOF i.e. ignores the arguments supplied to read() 

>Your experiments show otherwise, the argument supplied to read() is observed: if you call read(1024), at most 1024 bytes are returned, etc.

>It's only if you call read() without an argument that the file is being read until EOF.

I said this because I saw the following in the docs:
>class io.RawIOBase 
>read(n=-1) 
>Read up to n bytes from the object and return them. As a convenience, if n is unspecified or -1, readall() is called. Otherwise, only one system call is ever made. Fewer than n bytes may be returned if the operating system call returns fewer than n bytes.

If only one system call is being made, then I think that 
fl.read(256) and fl.read(70934549) should take the same amount of time to complete - assuming disk I/O is the time-consuming factor in this operation (as compared to memory processing).

I am only saying that instead of one system call being made, the whole size specified in read() is being read (by multiple system calls, as it appears to me).

Please tell me if I am wrong here.
msg184371 - (view) Author: Gurmeet Singh (gsingh) Date: 2013-03-17 14:02
@Antoine - wait I will do it
msg184374 - (view) Author: Gurmeet Singh (gsingh) Date: 2013-03-17 14:09
@Antoine 
It worked. I was wrong to say read1() was not implemented. Sorry.

But please do consider other issues.
msg184375 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-17 14:11
> If only one system call is being made, then I think that 
> fl.read(256) and fl.read(70934549) should take the same amount of time to
> complete - assuming disk I/O is the time-consuming factor in this
> operation (as compared to memory processing).

What do you mean? Reading a large number of bytes will most certainly always be slower than reading a small number of bytes, even if it only takes one system call. You still have to copy the data from disk or filesystem buffers into userspace.

A reasonable expectation is for read(N) to be O(N), but not O(1). You might want to check that by timing it with different N values.
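
One way to run such a timing check is sketched below. The helper `time_reads`, the 4 MiB scratch file, and the chosen sizes are assumptions for illustration. The absolute numbers depend heavily on the OS file cache, so only the trend across sizes is meaningful.

```python
import io
import os
import tempfile
import time


def time_reads(path, sizes):
    """Time one FileIO.read(size) call per size; return {size: (bytes_read, seconds)}."""
    timings = {}
    for size in sizes:
        with io.FileIO(path, "r") as raw:
            start = time.perf_counter()
            data = raw.read(size)
            elapsed = time.perf_counter() - start
        timings[size] = (len(data), elapsed)
    return timings


# Hypothetical 4 MiB scratch file; a freshly written file is probably cached by the OS.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 * 1024 * 1024))
    demo_path = tmp.name

try:
    results = time_reads(demo_path, [256, 65536, 4 * 1024 * 1024])
    for size, (got, elapsed) in results.items():
        print(f"read({size}): {got} bytes in {elapsed:.6f} s")
finally:
    os.remove(demo_path)
```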
msg184376 - (view) Author: Gurmeet Singh (gsingh) Date: 2013-03-17 14:32
I did the following to understand the time taken for an in-memory copy:
1>>> fl = io.FileIO('c:/temp9/Capability/Analyzing Data.mp4', 'rb')
2>>> byt = fl.read(70934549)
3>>> byt2 = None
4>>> byt2 = byt[:]
5>>> fl.close()
6>>> fl = io.FileIO('c:/temp9/Capability/Analyzing Data.mp4', 'rb')
7>>> byt = fl.read(1)

I found that the Python interpreter blocked for a negligible time on line 4 (and line 7), as compared to line 2.

I assume that line 4 is a correct syntax for an in place memory copy.

Therefore, I assumed that multiple system calls could be taking place. Please tell me if I am incorrect.
msg184378 - (view) Author: Gurmeet Singh (gsingh) Date: 2013-03-17 14:34
Sorry, typo in the last post - I meant "in memory - memory copy" not "in place memory copy".
msg184380 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-17 15:07
Bytes objects are immutable, so trying to "copy" them doesn't copy anything actually (it's an optimization):

>>> b = b"x" *10
>>> id(b)
139720033059920
>>> b2 = b[:]
>>> id(b2)
139720033059920

FileIO.read() only calls the underlying read() once, you can check the implementation:
http://hg.python.org/cpython/file/8002f45377d4/Modules/_io/fileio.c#l703
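
To time a real memory copy, a mutable bytearray can be sliced instead, since slicing it has to produce a new object. The sketch below is illustrative only; returning the same object for a full slice of bytes is a CPython implementation detail, so that result is printed rather than relied upon.

```python
data = b"x" * 10
# On CPython a full slice of an exact bytes object may return the object itself.
print("bytes slice is the same object:", data[:] is data)

buf = bytearray(b"x" * 10)
buf_copy = buf[:]            # a bytearray slice is always a new object
buf_copy[0] = ord("y")       # mutating the copy leaves the original untouched

print("bytearray slice is the same object:", buf_copy is buf)
print(bytes(buf), bytes(buf_copy))
```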
msg184383 - (view) Author: Gurmeet Singh (gsingh) Date: 2013-03-17 16:02
Thanks for letting me know about the optimization. 

I trusted you that the system call is made once, though I did look at the code to see whether the size of the read buffer is passed to the C routine. I should apologize for raising this issue, since it is incorrect.

But, I think you would be interested (out of CURIOSITY) in findings from the last experiment that I did to understand this issue:

1 >>> import io
2 >>> fl = io.FileIO('c:/temp9/Capability/Analyzing Data.mp4', 'rb')
3 >>> barr = bytearray(70934549)
4 >>> barr2= bytearray(70934549)
5 >>> id(barr)
29140440
6 >>> id(barr2)
26433560
7 >>> fl.readinto(barr)
70934549
8 >>> barr2 = barr[:]
9 >>> fl.close()
10 >>> fl = io.FileIO('c:/temp9/Capability/Analyzing Data.mp4', 'rb')
11 >>> barrt = bytearray(1)
12 >>> id(barrt)
34022512
13 >>> fl.readinto(barrt)
1
14 >>> fl.close()
>>>
 
The time of line 7 was much greater than line 13. It was also greater than for line 8 (but not as great as of 11). But I cannot say for sure that the time for line 13 plus line 8 is equal to or less than that for line 7 - it looks less, but more precise testing is needed to say anything further.
 
I tried to reason about the situation as follows (for this I looked at the hyperlink that you gave). I feel that the underlying system call takes the size argument - so I guess that a large value tells the C compiler to ask the disk subsystem to read more data - hence it takes more time, since disk access is slower.

Thanks for your time. Sorry for the incorrect issue.
msg184385 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-17 16:09
> The time of line 7 was much greater than line 13.

Well, yes, reading 70 MB is much longer than reading a single byte :-)

> I feel that the underlying system call takes the size argument

Indeed it does. It would be totally inefficient if it didn't.

> so I guess that a large value tells the C compiler to ask the
> disk subsystem to read more data - hence it takes more time,
> since disk access is slower.

It's not the C compiler. It's the OS kernel that reads data from the
disk when you ask it to.
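
The same point can be seen one level below the io module, as a rough sketch: `os.read` forwards the requested size to the operating system's read call, and the kernel decides how many bytes to hand back. The scratch file and variable names are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical scratch file of known size.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 5000)
    demo_path = tmp.name

# O_BINARY only exists on Windows; fall back to 0 elsewhere.
flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
fd = os.open(demo_path, flags)
try:
    small = os.read(fd, 16)         # ask the OS for at most 16 bytes
    large = os.read(fd, 1_000_000)  # ask for far more than the file holds
finally:
    os.close(fd)
    os.remove(demo_path)

print(len(small), len(large))
```

The second call returns fewer bytes than requested because the OS stops at the end of the file; the size argument is an upper bound, not a guarantee.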
msg184386 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-17 16:09
Anyway, I'm now closing the issue as invalid.
History
Date User Action Args
2022-04-11 14:57:42 admin set github: 61642
2013-03-17 16:09:35 pitrou set status: open -> closed; messages: + msg184386
2013-03-17 16:09:08 pitrou set messages: + msg184385
2013-03-17 16:02:19 gsingh set messages: + msg184383
2013-03-17 15:07:48 pitrou set messages: + msg184380
2013-03-17 14:34:17 gsingh set messages: + msg184378
2013-03-17 14:32:36 gsingh set messages: + msg184376
2013-03-17 14:11:17 pitrou set messages: + msg184375
2013-03-17 14:09:36 gsingh set messages: + msg184374
2013-03-17 14:02:13 gsingh set messages: + msg184371
2013-03-17 14:01:20 gsingh set messages: + msg184370
2013-03-17 14:01:06 pitrou set messages: + msg184369
2013-03-17 13:50:23 gsingh set status: pending -> open; messages: + msg184368
2013-03-16 20:50:54 amaury.forgeotdarc set status: open -> pending; resolution: not a bug
2013-03-16 16:51:03 pitrou set messages: + msg184332
2013-03-16 16:41:52 serhiy.storchaka set nosy: + pitrou, benjamin.peterson, stutzbach, hynek; components: + IO
2013-03-16 16:33:23 gsingh create