classification
Title: [regression] reading from a urllib2 file descriptor happens byte-at-a-time
Type: performance
Stage:
Components: Library (Lib)
Versions: Python 2.6, Python 2.5

process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: performance problem in socket._fileobject.read (#2632)
Assigned To: akuchling
Nosy List: akuchling, barry, doko, gregory.p.smith, mhammond, nnorwitz, pitrou, schmir
Priority: release blocker
Keywords:

Created on 2008-04-08 21:15 by doko, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (10)
msg65219 - (view) Author: Matthias Klose (doko) * (Python committer) Date: 2008-04-08 21:15
r61009 on the 2.5 branch

  - Bug #1389051, 1092502: fix excessively large memory allocations when
    calling .read() on a socket object wrapped with makefile(). 

causes a regression compared to 2.4.5 and 2.5.2:

When reading from urllib2 file descriptor, python will read the data a
byte at a time regardless of how much you ask for. python versions up to
2.5.2 will read the data in 8K chunks.

This has enough of a performance impact that it increases download time
for a large file over a gigabit LAN from 10 seconds to 34 minutes. (!)

Trivial/obvious example code:

  import urllib2
  f = urllib2.urlopen("http://launchpadlibrarian.net/13214672/nexuiz-data_2.4.orig.tar.gz")
  while 1:
    chunk = f.read()
    if not chunk:  # stop at EOF instead of looping forever
      break

... and then strace it to see the recv()'s chugging along, one byte at a
time.
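
To see why the recv() size matters without reaching for strace, here is a minimal sketch that counts calls against a fake socket. It is not the socket._fileobject code from r61009; CountingSocket and read_all are hypothetical names made up for illustration, and the 64 KiB payload is an arbitrary assumption.

```python
import io


class CountingSocket:
    """Hypothetical stand-in for a socket: serves bytes and counts recv() calls."""

    def __init__(self, payload):
        self._stream = io.BytesIO(payload)
        self.recv_calls = 0

    def recv(self, bufsize):
        # Return at most bufsize bytes; an empty result signals EOF.
        self.recv_calls += 1
        return self._stream.read(bufsize)


def read_all(sock, recv_size):
    """Read until EOF, asking the socket for recv_size bytes per call."""
    chunks = []
    while True:
        data = sock.recv(recv_size)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


payload = b"x" * (64 * 1024)  # arbitrary 64 KiB test payload

slow = CountingSocket(payload)
fast = CountingSocket(payload)
assert read_all(slow, 1) == payload      # one byte per recv()
assert read_all(fast, 8192) == payload   # 8K per recv()

print(slow.recv_calls, fast.recv_calls)  # 65537 9
```

Both readers return the same data, but the one-byte reader makes one call per byte plus a final call that sees EOF. On a real socket each of those calls is a system call, which is consistent with the slowdown reported above.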
msg65488 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-04-14 22:11
See #2632 for more discussion of what is probably the same issue.
msg65503 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2008-04-15 06:11
Bumping the priority.  I'd like to see this fixed before the next
release.  What version(s) does this problem apply to: 2.5, 2.6, 3.0?
msg65504 - (view) Author: Ralf Schmitt (schmir) Date: 2008-04-15 06:21
quoting http://bugs.python.org/issue1389051: "Applied to 2.6 trunk in
rev. 61008 and to 2.5-maint in rev. 61009."

I don't know about py3k...
msg65517 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2008-04-15 13:15
It was applied to 2.5-maint after 2.5.2 was released, BTW, so the change
isn't in any stable released version, only the 2.6 alphas.
msg65538 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2008-04-16 01:23
So if the fix was applied to 2.5 branch and 2.6 (3.0 should have
picked up from 2.6 automatically), can we close this bug?
msg65539 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-04-16 02:18
I don't think the fix was acceptable.  Now python spins consuming all
cpu trying to read trivial amounts of data one byte at a time...

See the discussion at the end of http://bugs.python.org/issue1092502 as
well as a recent python-dev thread:

http://mail.python.org/pipermail/python-dev/2008-April/078613.html
msg65540 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-04-16 02:21
or else i'm missing something here in the maze of three bugs talking
about the same issue..

which revisions fixed the introduced performance issue?
msg65545 - (view) Author: Ralf Schmitt (schmir) Date: 2008-04-16 06:02
me and amk are talking about the commit that introduced this bug (which
was meant as a fix for another bug).
neal seems to think that this commit is the fix to this bug itself.
and gregory, you are now confused :)

hope it's clear now.
msg65990 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2008-04-30 05:55
For those trying to follow along at home: best I can tell we have 3
other issues on this: #1092502 and #1389051 are dupes of an initial bug,
but the fix for those bugs caused regressions reported in this bug and
in #2632.  To try and reduce confusion I'm closing this as a dupe of
#2632 which has a patch for review.
History
Date                 User             Action  Args
2022-04-11 14:56:33  admin            set     nosy: + barry; github: 46853
2008-04-30 05:59:01  mhammond         set     resolution: duplicate
2008-04-30 05:56:14  mhammond         set     status: open -> closed
2008-04-30 05:56:00  mhammond         set     nosy: + mhammond; superseder: performance problem in socket._fileobject.read; messages: + msg65990
2008-04-16 06:02:02  schmir           set     messages: + msg65545
2008-04-16 02:21:14  gregory.p.smith  set     messages: + msg65540
2008-04-16 02:18:30  gregory.p.smith  set     nosy: + gregory.p.smith; messages: + msg65539
2008-04-16 01:23:34  nnorwitz         set     messages: + msg65538
2008-04-15 13:15:04  akuchling        set     messages: + msg65517
2008-04-15 06:21:49  schmir           set     messages: + msg65504; versions: + Python 2.6
2008-04-15 06:11:49  nnorwitz         set     priority: critical -> release blocker; nosy: + nnorwitz; messages: + msg65503
2008-04-14 22:47:31  schmir           set     nosy: + schmir
2008-04-14 22:11:04  pitrou           set     nosy: + pitrou; messages: + msg65488
2008-04-09 19:21:30  georg.brandl     set     priority: high -> critical
2008-04-08 21:15:29  doko             create