This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: size limit exceeded for read() from network drive
Type: Stage:
Components: Windows Versions:
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: georg.brandl, loewis, markshep, tim.peters
Priority: normal Keywords:

Created on 2006-04-28 16:46 by markshep, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (9)
msg28375 - Author: Mark Sheppard (markshep) Date: 2006-04-28 16:46
If you've got a network share mounted as a local drive
then Windows has a limit of 67,076,095 (0x03ff7fff)
bytes for a read from an open file on that drive.
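The figure sits just under 64 MiB, which is why later messages round it to "64MB". The arithmetic, checked in Python:

```python
# Relating the reported limit to familiar powers of two.
LIMIT = 67_076_095           # bytes, the figure reported in this message
MIB = 1024 * 1024

assert LIMIT == 0x03FF7FFF
assert LIMIT == 0x03FF0000 + 0x7FFF    # 1023 * 64 KiB, plus 32 KiB - 1
assert 64 * MIB - LIMIT == 32_769      # 32,769 bytes short of 64 MiB
```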

Calling Python's read() method on an open file larger
than this size, so that a single read must return more
than this many bytes, raises an "IOError: [Errno 22]
Invalid argument" exception.

A fix would be for Python to internally use multiple
reads so as not to exceed this limit.
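The proposed fix can be approximated in user code. A minimal sketch, assuming a binary file object and using the 67,076,095-byte figure from this report as the cap (other client/server combinations may differ); `read_in_chunks` is a hypothetical helper, not part of Python:

```python
import io

# Largest single read() reported to succeed in this issue (0x03FF7FFF bytes).
# Assumption: other client/server combinations may have a different limit.
MAX_CHUNK = 67_076_095

def read_in_chunks(f, size, max_chunk=MAX_CHUNK):
    """Read up to `size` bytes from binary file `f`, never requesting more
    than `max_chunk` bytes in a single read() call."""
    parts = []
    remaining = size
    while remaining > 0:
        data = f.read(min(remaining, max_chunk))
        if not data:              # EOF before `size` bytes were available
            break
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)

# Demonstration with an in-memory file and a deliberately tiny chunk size.
assert read_in_chunks(io.BytesIO(b"abcdefghij"), 10, max_chunk=3) == b"abcdefghij"
```

Note that the result is still one large bytes object in memory; only the individual read() requests are capped.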
msg28376 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-04-29 13:23
How can it be determined whether exactly this restriction
caused the "invalid argument" error? If it can't, there's
nothing that can be done -- restricting all reads just
because of a Windows limitation doesn't seem right.
msg28377 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-04-30 10:10
What version of Windows are you using? Do you know of any
documentation of this limit? (without actually testing, I
find it hard to believe that this limit exists in Windows)
msg28378 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-04-30 16:23
Martin, here's an MS article seemingly related to this:

http://support.microsoft.com/default.aspx?scid=kb;en-us;899149

However, it's about writing to a file on a network drive,
not reading from it.  It says that opening the file in 'w+b'
mode, instead of 'wb' mode, is a workaround.
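In Python terms the workaround from the article is a change to the mode string passed to open(). A minimal sketch; `write_via_wplus` is a hypothetical helper. Python 2's file objects passed the mode to the C runtime's fopen(), while Python 3's io module does its own buffering over lower-level OS calls, so whether the workaround still matters under Python 3 is untested here:

```python
import os
import tempfile

def write_via_wplus(path, data):
    """Write bytes to `path`, opening with 'w+b' instead of 'wb'."""
    # 'w+b' creates/truncates the file like 'wb' but also opens it for
    # reading; the KB article reports this avoids the network-drive failure.
    with open(path, "w+b") as f:
        return f.write(data)      # number of bytes written

# Local demonstration (a real check would target a file on a network drive).
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.bin")
    assert write_via_wplus(target, b"payload") == 7
    with open(target, "rb") as f:
        assert f.read() == b"payload"
```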

I couldn't find anything documenting the same kind of
problem for reading.
msg28379 - Author: Mark Sheppard (markshep) Date: 2006-05-02 10:48
I'm running Windows XP.  I've been unable to find any
documentation about this exact problem - only that fwrite
thing.  But my testing shows that it works if I do
file.read(67076095), but throws an exception with
file.read(67076096).
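This boundary test generalizes to a binary search for the largest single read() that succeeds. A sketch, assuming the failure is monotonic in the requested size (if read(n) fails, every larger request fails too) and surfaces as an OSError with errno EINVAL, as in the original report (IOError is an alias of OSError in Python 3). `largest_single_read` and `FakeShare` are hypothetical names; `FakeShare` only simulates the reported behavior so the search can run without a network drive:

```python
import errno

def largest_single_read(f, hi):
    """Return the largest n <= hi for which f.seek(0); f.read(n) succeeds.

    Assumes failures are monotonic: if read(n) fails, read(m) fails for m > n.
    """
    ok, bad = 0, hi + 1          # invariant: read(ok) succeeds; bad is too big
    while bad - ok > 1:
        mid = (ok + bad) // 2
        f.seek(0)
        try:
            f.read(mid)
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise            # a different failure: don't misreport it
            bad = mid
        else:
            ok = mid
    return ok

class FakeShare:
    """Stand-in file that rejects reads above `limit` bytes with EINVAL."""
    def __init__(self, limit):
        self.limit = limit
    def seek(self, pos):
        pass
    def read(self, n):
        if n > self.limit:
            raise OSError(errno.EINVAL, "Invalid argument")
        return b""

assert largest_single_read(FakeShare(67_076_095), 2**27) == 67_076_095
```

Against a real share, `f` must be a binary file larger than `hi`, since a short file never triggers the error, and each probe may allocate up to `hi` bytes.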

I'm not suggesting limiting all reads from Python.  All I'm
suggesting is that under the hood the Windows implementation
of Python's read() call actually uses multiple fread() (or
whatever) calls if more than 67076095 bytes need to be read.
 That's all.  No interface changes.
msg28380 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-05-02 19:00
I could reproduce the write problem on XPSP2; I get the
Win32 error ERROR_NO_SYSTEM_RESOURCES after fwrite returns
(from GetLastError).

I can't reproduce the fread problem, though: in Python,
f.read(90*2**20) just returns with a 90MiB string. So it
could be a limitation of your machine (e.g. it might not
have enough memory), or of the server machine. I'm hesitant
to add a work-around for that into Python if this isn't a
system limitation. Performing multiple reads is also bad:
what happens if the first read succeeds, and the second one
fails? It might be that the system *really* is out of resources.
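One partial answer to the first-read-succeeds, second-read-fails question is to make the chunked read all-or-nothing: record the file position and seek back before re-raising, so the caller sees the same position as after a failed single read. A sketch, assuming a seekable binary file; `read_all_or_nothing` and `FlakyFile` are hypothetical, and this does nothing about the other point that a genuine resource shortage will simply recur:

```python
import io

def read_all_or_nothing(f, size, max_chunk):
    """Read `size` bytes in chunks of at most `max_chunk`; on any OSError,
    restore the starting file position and re-raise."""
    start = f.tell()
    parts = []
    remaining = size
    try:
        while remaining > 0:
            data = f.read(min(remaining, max_chunk))
            if not data:                   # EOF
                break
            parts.append(data)
            remaining -= len(data)
    except OSError:
        f.seek(start)                      # undo the partial progress
        raise
    return b"".join(parts)

class FlakyFile(io.BytesIO):
    """In-memory file whose second read() fails, simulating the case."""
    def __init__(self, data):
        super().__init__(data)
        self.calls = 0
    def read(self, n=-1):
        self.calls += 1
        if self.calls == 2:
            raise OSError("simulated failure on second chunk")
        return super().read(n)

flaky = FlakyFile(b"0123456789")
try:
    read_all_or_nothing(flaky, 10, max_chunk=4)
except OSError:
    assert flaky.tell() == 0   # restored, although the first chunk succeeded
```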
msg28381 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-05-03 01:45
Sorry, I'm closing this as "3rd Party, Won't Fix".  It's
certainly not Python's doing that Microsoft's stdio
implementation has peculiar undocumented warts (Python just
calls the platform C's fread() here), so at best this is a
request for enhancement rather than a Python bug.

If there is a bug here, it's Microsoft's bug, and then the
proper source for a fix is also Microsoft.  This is
especially true since the two people who have tried this
here don't see the same behavior -- we don't even know what
"the bug" is.
msg28382 - Author: Mark Sheppard (markshep) Date: 2006-05-03 10:38
Thanks for closing this bug without giving me a chance to
follow up!

The problem isn't caused by a limitation of my machine -
it's got 3 GiB of RAM.

I've done some more testing on this and the problem only
appears when connected to a server running certain SMB
implementations:

  The local Windows XP machine
  A remote Windows XP machine
  Samba 3.0.22 on Linux

When connected to servers running the following SMB
implementations the problem isn't present:

  Windows NT 4.0 Server
  Windows Server 2000
  Windows Server 2003 Standard Edition

As this error is being returned by the underlying fread()
call the proper place for it to be fixed is there, but the
chances of Microsoft doing so for Windows XP are negligible.

As you're trying to provide a cross-platform language,
having to put up with OSes' undocumented warts is just part
of the job.  As it's entirely possible for you to implement
a work-around for this problem in Python, I think you should.
One of the reasons for using a high-level language like Python
is to be insulated from system quirks like this.  If you
refuse to smooth over these quirks where possible, then
you're undermining that reason.

The documentation for Python's read() method on a file
handle already says "Note that this method may call the
underlying C function fread() more than once", so this
possibility is already catered for in the documentation.

As this problem only affects remotely mounted filesystems
the workaround need only be used for such filesystems.  You
can determine whether or not a drive is a network one by
using the GetDriveType() Windows call.
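Restricting the work-around to network drives could look like this with ctypes. A sketch, assuming an absolute drive-letter path (UNC paths are parsed the same way but untested here) and calling the Win32 GetDriveTypeW function, whose DRIVE_REMOTE return value (4) marks a network drive. `drive_root` and `is_network_drive` are hypothetical helpers; only the path parsing runs off Windows:

```python
import ctypes
import ntpath
import sys

DRIVE_REMOTE = 4  # GetDriveTypeW return value for a remote (network) drive

def drive_root(path):
    """Return the root of an absolute Windows path, e.g. 'Z:\\'."""
    drive, _ = ntpath.splitdrive(path)
    if not drive:
        raise ValueError("expected an absolute path with a drive: %r" % (path,))
    return drive + "\\"   # GetDriveTypeW expects a trailing backslash

def is_network_drive(path):
    """True if Windows reports the drive holding `path` as remote."""
    if sys.platform != "win32":
        return False      # the Win32 call does not exist elsewhere
    get_drive_type = ctypes.windll.kernel32.GetDriveTypeW
    get_drive_type.argtypes = [ctypes.c_wchar_p]
    get_drive_type.restype = ctypes.c_uint
    return get_drive_type(drive_root(path)) == DRIVE_REMOTE

assert drive_root(r"Z:\data\big.bin") == "Z:\\"
```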
msg28383 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-05-04 06:18
markshep:  As you discovered, closing the report doesn't
stop you from following up.  It just reflects the reality
that I don't consider this to be a Python bug, and am
opposed to trying to worm around it inside Python.

Like many people who have just been burned by a platform
quirk, I think you're over-selling the severity of the
problem while ignoring the costs of worming around it. 
Adding piles of Windows-specific code to what's _currently_
a simple and uniform implementation is an ongoing
maintenance headache, not least because that code will stick
around long after the next version of Windows has removed
the cause for it.  In the meantime it complicates the code
with obscure platform-specific hacks, reducing the
reliability of the code because it also reduces the code's
clarity.  The code can't be sanely tested by Python's
standard test suite either (it apparently requires a Windows
network to provoke, and the test suite assumes no such
thing), and untested hack-code is a horrible idea over time.

While it's true that the docs allow for multiple reads under
the covers, it's talking about cases like file objects
returned by a popen() call or a socket makefile() call when
read() is passed a `size` argument, or when read() is called
with no `size` argument (so it's impossible to know in
advance how large a buffer may be needed to reach EOF).  The
entire reading code for an explicitly-sized read on a
genuine file is a single

    return fread(buf, 1, n, stream);

call today, and on all platforms.

It doesn't look like this can end with reading either:  MS
documents a similar problem with writing, and I expect you
want to see that hacked around too (or, if not, you're
pretty selective ;-)).  Pain spreads.

In return, what's the benefit?  The fact that it _is_ so
hard to find anything via Google about this strongly
suggests to me that trying to read more than 64MB in one
gulp across a vulnerable Windows combo is mighty rare.  If
it happens, the failure isn't silent, an explicit exception
is raised letting the programmer know it didn't work.  While
I appreciate that's irritating, it's not a disaster, and a
programmer who cares can worm around it on their own ("so
don't do that -- read < 64MB per gulp yourself").
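When the data does not need to be in memory all at once, capping each read yourself can go one step further: stream the file in fixed-size blocks and never issue a large read at all. A sketch that hashes a file this way; `sha256_of_stream` is a hypothetical helper and the 16 MiB block size is an arbitrary choice, not a value from this thread:

```python
import hashlib
import io

CHUNK = 16 * 1024 * 1024   # 16 MiB per read: well below the ~64 MB limit

def sha256_of_stream(f, chunk=CHUNK):
    """Hash a binary file without issuing any read() larger than `chunk`."""
    h = hashlib.sha256()
    for block in iter(lambda: f.read(chunk), b""):   # stops at EOF (b"")
        h.update(block)
    return h.hexdigest()

data = b"x" * 100
assert sha256_of_stream(io.BytesIO(data), chunk=7) == hashlib.sha256(data).hexdigest()
```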

Obviously, I'm not going to pursue this.  Since I'm one of
the few people who "does" Windows code for the core, that
does cut the chance that anyone will.  If you want to pursue
it, the best chance is to supply a patch implementing it,
and get someone else to review it.  A stronger case could be
made if, e.g., there was evidence that Perl or PHP or Ruby
or VB or C# or ... intend to worm (or have wormed) around it.
History
Date User Action Args
2022-04-11 14:56:17 admin set github: 43292
2006-04-28 16:46:18 markshep create