classification
Title: urlretrieve regression: first call returns block size as 0
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.4, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: akira, gregory.p.smith, python-dev, techtonik
Priority: normal Keywords:

Created on 2012-11-04 22:27 by techtonik, last changed 2012-11-10 21:45 by gregory.p.smith. This issue is now closed.

Files
File name Uploaded Description Edit
test.py techtonik, 2012-11-04 22:27
Messages (7)
msg174836 - (view) Author: anatoly techtonik (techtonik) Date: 2012-11-04 22:27
Renamed urllib.urlretrieve changed behaviour in Py3k, which leads to ZeroDivisionErrors in applications that use block_size parameter for calculations. Previously, block size was constant. Now it varies making it impossible to exactly calculate value transferred so far. Test file attached.


C:\Python33\python.exe test.py
0 0 59654
1 8192 59654
2 8192 59654
3 8192 59654
4 8192 59654
5 8192 59654
6 8192 59654
7 8192 59654
8 2310 59654

C:\Python27\python.exe test.py
(0, 8192, 59654)
(1, 8192, 59654)
(2, 8192, 59654)
(3, 8192, 59654)
(4, 8192, 59654)
(5, 8192, 59654)
(6, 8192, 59654)
(7, 8192, 59654)
(8, 8192, 59654)
msg174868 - (view) Author: (akira) * Date: 2012-11-05 07:13
Summary: It is a new behavior. There is no need to change either code or
         docs. Though docs could be clarified to be more explicit.
  
 
The behavior has been introduced only in 3.3 in revision 53715804dc71 
[1] from issue 10050 [2]

It knownly breaks backward-compatibility. The docstring says:

> The reporthook argument should be a callable that accepts a block
> number, a read size, and the total file size of the URL target.

The change was rejected in issue 849407 [3]

Though it is not mentioned in Misc/NEWS and 
Doc/library/urllib.request.rst doesn't mention the change from a static
block size to a read size explicitly:

> The third argument, if present, is a hook function that will be called 
> once on establishment of the network connection and once after each 
> block read thereafter.  The hook will be passed three arguments; a 
> count of blocks transferred so far, a block size in bytes, and the 
> total size of the file.

*after each block read* and *block size in bytes* do not suggest that 
the block size must be constant. This part in the docs could be more 
clear.

There is related issue 1490929 [4]

[1]: http://hg.python.org/cpython/rev/53715804dc71
[2]: http://bugs.python.org/issue10050
[3]: http://bugs.python.org/issue849407
[4]: http://bugs.python.org/issue1490929
msg174875 - (view) Author: anatoly techtonik (techtonik) Date: 2012-11-05 07:55
I strongly disagree with your summary.


It's a new behavior for the old renamed module that clearly breaks existing code ported with 2to3.

The 95% of callable usage is to get the estimated download progress (to draw progress bar or time calculations). This requires calculating value of total blocks like:

    total_blocks = math.ceil(float(total_size) / block_size)

when block_size allowed to be 0, it immediately spawns ZeroDivisionErrors. This change broke http://pypi.python.org/pypi/wget


If you make block size in callback parameters varying, the parameter with number of blocks transferred loses any sense.
msg174914 - (view) Author: (akira) * Date: 2012-11-05 15:01
The summary assumes that issue 10050 is valid i.e., urlretrieve is
reimplemented using new urlopen and 2.x FancyURLopener is deprecated.

It might not be so [1]. In this case the summary is incorrect.

Old implementation is available as:

  opener = FancyURLopener()
  urlretrieve = opener.retrieve

btw, the new implementation doesn't prevent you to estimate the 
progress:

  def make_reporthook():
      read = 0  # number of bytes read so far
      def reporthook(blocknum, blocksize, totalsize):
          nonlocal read
          read += blocksize
          if totalsize > 0:
              percent = read * 1e2 / totalsize
              print("%5.1f%% %*d / %d" % (
                      percent, len(str(totalsize)), read, totalsize))
          else:
              print("read %d" % (read,))
      return reporthook


[1]: http://mail.python.org/pipermail/python-dev/2011-March/109450.html
msg175079 - (view) Author: anatoly techtonik (techtonik) Date: 2012-11-07 10:48
I see nothing wrong with reimplementation with different underlying lib as long as behavior stays the same and lib2to3 converts code correctly.

The API that block size if constant and non-zero is broken with Python 3.3. I thought that API breaks are only allowed for major Python versions?
msg175084 - (view) Author: anatoly techtonik (techtonik) Date: 2012-11-07 11:51
I agree that new implementation doesn't prevent me from estimating the 
progress, but it does make the callback code worse, not better. Either by closure or by global variable, but it makes callback stateful.

To fix this issue either blocks_number parameter should be removed, or previous behavior restored.
msg175297 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-10 21:44
New changeset 19e6e32db5ec by Gregory P. Smith in branch '3.3':
Fixes issue #16409: The reporthook callback made by the legacy
http://hg.python.org/cpython/rev/19e6e32db5ec

New changeset 8c48eb0239ca by Gregory P. Smith in branch 'default':
Fixes issue #16409: The reporthook callback made by the legacy
http://hg.python.org/cpython/rev/8c48eb0239ca
History
Date User Action Args
2012-11-10 21:45:35gregory.p.smithsetstatus: open -> closed
resolution: fixed
2012-11-10 21:44:56python-devsetnosy: + python-dev
messages: + msg175297
2012-11-08 22:36:16gregory.p.smithsetassignee: gregory.p.smith

nosy: + gregory.p.smith
2012-11-07 11:51:57techtoniksetmessages: + msg175084
2012-11-07 10:48:28techtoniksetmessages: + msg175079
2012-11-05 15:01:36akirasetmessages: + msg174914
2012-11-05 07:55:19techtoniksetmessages: + msg174875
2012-11-05 07:13:48akirasetversions: - Python 3.1, Python 3.2
nosy: + akira

messages: + msg174868

type: behavior
2012-11-04 22:27:05techtonikcreate