classification
Title: urllib.request still has old 2.x urllib primitives
Type: Stage: needs patch
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: akira, eric.araujo, ezio.melotti, jhylton, mcjeff, nadeem.vawda, neologix, orsenthil, pitrou, python-dev, techtonik
Priority: normal Keywords: patch

Created on 2010-10-08 11:14 by pitrou, last changed 2013-03-19 00:08 by orsenthil. This issue is now closed.

Files
File name Uploaded Description Edit
issue10050.patch mcjeff, 2011-03-18 22:57 review
issue10050.patch mcjeff, 2011-03-19 01:18 Second Revision review
issue10050.patch mcjeff, 2011-03-19 01:47 Third Update - Change Synopsis review
issue10050.patch mcjeff, 2011-03-20 03:03 Applied Antoine's suggestions review
issue10050-deprecation.patch orsenthil, 2012-03-14 06:33 review
Messages (44)
msg118182 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-08 11:14
3.x urllib.request still has a lot of stuff inherited from 2.x urllib: URLopener, FancyURLopener, the urlretrieve implementation (perhaps others?).

Ideally, urlretrieve should be reimplemented using urlopen or build_opener, and the other stuff deprecated.
msg131369 - (view) Author: Jeff McNeil (mcjeff) * Date: 2011-03-18 22:57
Alright, attaching a patch that reworks urlretrieve to use urlopen internal to urllib.request.  

1. I dropped the local caching as it isn't turned on by default anyway (and isn't really documented).

2. Updated documentation to reflect caching changes & make urlretrieve part of the official API again.

3. Kept the urlcleanup function, but use a global list to track temporary files.  I'd be happy to change this functionality if that makes sense.

4. After moving the urlretrieve stuff out of test_urllibnet, I realized that file didn't serve much of a purpose any longer, so I just removed it. 

5. Updated NEWS.

I'd be happy to rework any of this in order to bring it up to stuff. Comments and suggestions are very much welcomed.
msg131371 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-18 23:48
Follow the “review” link next to the patch for an initial review.
msg131373 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-03-19 00:21
Significant patch. Thanks. I looked at the review too. For the
Context Manager part, don't club it along with this one. We need to
test this thoroughly, after this is in shape, that can be addressed.
msg131375 - (view) Author: Jeff McNeil (mcjeff) * Date: 2011-03-19 01:18
Made recommended changes. Moved to NamedTemporaryFile. I don't think the spooled file makes sense here as the existing protocol provides a filename in the returned tuple, not a f.l.o. 

As far as the description?  Here are a couple suggestions:

1. URL Retrieval Library
2. URL Access Module

Updated the module documentation as well as the howto.
msg131376 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-03-19 01:21
Changing the description is a minor update. The term 'URL access module' seems fine.
msg131380 - (view) Author: Jeff McNeil (mcjeff) * Date: 2011-03-19 01:47
Made requested change to Synopsis/Description.
msg131411 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-03-19 13:13
I don't think the tests should be moved from one file to the other. It's making more difficult to tell whether you have changed them or not. I think moving the tests (as well as changing the synopsis, hello Eric) are cosmetic changes that are better done in separate changesets.

Some other things:
- please do the "import tempfile" at the top-level. Imports from functions are generally frown upon, unless otherwise necessary.
- instead of "try ... finally: tfp.close()", you can simply write "with tfp: ..."
- I don't understand why `size` is read only when a reporthook is set, while it is always used for raising ContentTooShortError
- I'm not sure why the reporthook is called with `bs` rather than `len(block)`. I think the user is more interested in the actual number of bytes.
- you don't need to update Misc/NEWS yourself
msg131419 - (view) Author: Jeff McNeil (mcjeff) * Date: 2011-03-19 14:16
I'll make those changes, sure.  I had the same thought re: block size, but I was trying to keep inline with what the current function did.
msg131471 - (view) Author: Jeff McNeil (mcjeff) * Date: 2011-03-20 03:03
Take four! Includes Antoine's suggestions. I changed the callback to return (block num, read size, file size) as opposed to (block num, block size, file size) as this seems to make more sense. 

I appreciate the back and forth. I'd be happy to create issues & handle the other things that have been moved out of this patch.
msg131476 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-03-20 04:30
Thanks again.

Agree to use of returning the block size to the returnhook callable.
But this is going be a deviation from the existing behavior which was
present to inform the user at regular intervals and did not bother to
provide any 'progressive size' related data, so this change should
documented as new behavior.

With respect to the docs patch in urllib2 howto I would do something
like this.

with open(local_filename) as fh: 
     content = fh.read()

Or *better* would demonstrate the filename argument passing and
reporthook functionality. If you wish, take this part alone, (the
howto document) update as separate patch, which should be committed
once this is in.

There is one comment which I forgot to mention earlier.

The current urlretrieve function is internally calling the URLOpener's
retrieve method. 

Those URLOpener class might need a DeprecationWarning while address
thing bug and that particular retrieve method can be updated to use
the updated facility just as bonus till the time it survives.

Antoine - any suggestions on the last point?
msg131488 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-03-20 12:10
> There is one comment which I forgot to mention earlier.
> 
> The current urlretrieve function is internally calling the URLOpener's
> retrieve method. 
> 
> Those URLOpener class might need a DeprecationWarning while address
> thing bug and that particular retrieve method can be updated to use
> the updated facility just as bonus till the time it survives.
> 
> Antoine - any suggestions on the last point?

Can you clarify the issue?
msg131490 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-03-20 12:51
Antoine Pitrou wrote:
> Can you clarify the issue?

URLopener which is an old class from the merge of urllib and urllib2
and it can be slowly and safely removed. If we go this line, then I
assume it has to have a DeprecationWarning before we remove it. Should
we or not?

My thought is, in Python 3.x these may not be in use, but still if we
remove without DeprecationWarning, it could raise concerns with some
folks (given some discussion about this recently when people trying to
upgrade from 2.x to 3.x).

At the moment, urlretrieve function is calling  URLopener.retrieve internally.
But this patch addresses urlretrive function at much higher level and leaves
the URLopener.retrieve in a hanging state. So, I was thinking how to
handle this scenario.

Change URLopener.retrieve also with the cleaner and modern code, but
and add a DeprecationWarning in those.
msg131491 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-03-20 13:06
> URLopener which is an old class from the merge of urllib and urllib2
> and it can be slowly and safely removed. If we go this line, then I
> assume it has to have a DeprecationWarning before we remove it. Should
> we or not?

Yes, we should.

> Change URLopener.retrieve also with the cleaner and modern code, but
> and add a DeprecationWarning in those.

Sounds overkill and of questionable interest.
Honestly, I don't think URLopener.retrieve() has much point anyway.
Perhaps it would have if the whole caching thing had been implemented.
msg131505 - (view) Author: Jeff McNeil (mcjeff) * Date: 2011-03-20 16:21
I'm not exactly sure what the steps are with respect to the DeprecationWarning. Is the common case just to raise the warning in the __init__ method? Are there related documentation changes?

Thanks again! Learning a ton. Hopefully the next patch I submit will go much smoother.
msg131511 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-20 16:52
Antoine: you’re right about not mixing concerns, sorry.

Regarding deprecation, we should be cautious and maybe plan it across more than two versions.  See thread starting at http://mail.python.org/pipermail/python-dev/2011-March/109450.html
msg131718 - (view) Author: Jeff McNeil (mcjeff) * Date: 2011-03-22 01:36
Just wanted to check so this doesn't sit with people waiting on me.  Is there anything else I need/should do to this patch? Little unclear on how to handle the deprecation process.
msg131719 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-03-22 01:42
No, it is not waiting for you, in fact it is waiting for me to commit
the portion which you have already contributed  with some additional
code (DeprecationWarning). 

But this bug does not end up here as there are other primitives which
may need cleanup and have  DeprecationWarnings. Those needs to be
addressed as well, either as part of this issue or as separate issues.
msg132617 - (view) Author: Jeff McNeil (mcjeff) * Date: 2011-03-30 22:47
I'd be happy to pick some of that stuff up. I'd like to address separately as it keeps fewer concerns in this one patch. I'll grab them once they're created.
msg132690 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-31 17:57
Feel free to create them and set orsenthil as assignee.
msg155709 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-14 02:29
New changeset 53715804dc71 by Senthil Kumaran in branch 'default':
Issue10050 - urlretrieve uses newer urlopen. reporthook of urlretrieve takes, block number, block read size, file_size
http://hg.python.org/cpython/rev/53715804dc71
msg155710 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-14 02:32
One at the time - urlretrieve is re-written. Now should add deprecation warnings to things methods to be deprecated / removed.
msg155712 - (view) Author: Jeff McNeil (mcjeff) * Date: 2012-03-14 02:49
I'd be happy to do that (URLopener, to begin with), though I'm not familiar with the usual approach.  Is it simply a matter of warning in __init__?
msg155715 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-14 03:04
Yes, it is simply a matter of adding warnings.warn in proper classes
and methods. I have started on that, Jeff. No problem. I just wanted
the changes to be two separate commits.  Thanks!
msg155725 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-14 06:33
Here is patch which adds DeprecationWarning to URLOpener and (it's subclasses FancyURLopener) as well as some other Request methods.
The idea would be, in the next release we remove this old URLOpener and it's subclasses and remove (or make _private) the Request methods.

Added tests for DeprecationWarnings too.

I am trying to see if more classes/methods from old urllib can be deprecated/removed.  

But this first patch is for no-doubt ones. If one have any review comments, please provide. Otherwise I shall check it in 3.3.a1.


Thanks!
msg155782 - (view) Author: Jeff McNeil (mcjeff) * Date: 2012-03-14 19:06
I don't see why we'd need to make these _private -- they're just accessors/mutators for the most part.  I'd be happy to help clean this up if you need it.
msg155787 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-14 20:37
My concern was with the method get_full_url. I have seen it being
useful in more than one place and it's not just the accessor. Others
are just the accessors and those can be safely removed. When
get_full_url gives the correct url, taking into consideration the url
fragment, I think that should stay. Other's can be cleaned up.

Note that in 3.3, we should just be deprecating and as soon as 3.3 is
out, we can remove these. 

Thanks,
Senthil
msg155788 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-14 20:44
New changeset eab274c7d456 by Senthil Kumaran in branch 'default':
deprecated the old urllib primitives in 3.3 urllib package - issue 10050
http://hg.python.org/cpython/rev/eab274c7d456
msg155792 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-14 21:00
New changeset 30f13d7fecd0 by Senthil Kumaran in branch 'default':
Fix the buildbot breakdown - issue 10050
http://hg.python.org/cpython/rev/30f13d7fecd0
msg156741 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-03-25 08:34
Since eab274c7d456, all the buildbots are randomly failing, e.g.

"""
======================================================================
FAIL: test_method_deprecations (test.test_urllib2.OpenerDirectorTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/test/test_urllib2.py", line 622, in test_method_deprecations
    req.add_data("data")
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/contextlib.py", line 54, in __exit__
    next(self.gen)
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/test/support.py", line 766, in _filterwarnings
    missing[0])
AssertionError: filter ('', DeprecationWarning) did not catch any warning
"""

http://python.org/dev/buildbot/all/builders/AMD64%20Snow%20Leopard%202%203.x/builds/231/steps/test/logs/stdio
msg156747 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-25 09:23
It's funny,  Antoine had raised an issue too.  Should see why is this
sporadic failures..
msg156759 - (view) Author: Jeff McNeil (mcjeff) * Date: 2012-03-25 16:39
I was looking at a somewhat unrelated issue and I bumped up against a similar situation with the warnings system. 

I didn't look too much into it, but it appeared that warnings didn't get added to __warningregistry__ correctly. Though, when I set the stack level explicitly (to an incorrect value for the issue at hand), the warnings were caught by catch_warnings.

I don't know if it is related or not, but my assumption at the time was that smarter people than I had vetted warnings. =)
msg156763 - (view) Author: Jeff McNeil (mcjeff) * Date: 2012-03-25 17:33
Disregard. That was in concert with ntpath, which uses a funky approach to testing.
msg172843 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-10-14 02:16
Are we still planning on removing URLopener and FancyURLopener in 3.4? The documentation for 3.3 does not list these classes as deprecated.
msg172846 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-10-14 03:41
Only the classes which are marked as deprecated are allowed removed in
the next release.
So the ones which we marked as deprecated in 3.3 can safely go in 3.4.
URLopener and FancyURLopener AFAIR had some dependency/ usage,
deprecating those may require some refactoring of the existing code.
msg172874 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-10-14 11:13
Hmm, OK. URLopener and FancyURLopener do each issue a DeprecationWarning when used, though. If they are not actually deprecated, perhaps we should remove the warnings for the moment?
msg172879 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-10-14 13:00
See also #12707.
msg174869 - (view) Author: Akira Li (akira) * Date: 2012-11-05 07:15
Related issue 16409
msg174887 - (view) Author: anatoly techtonik (techtonik) Date: 2012-11-05 10:32
Please note that changes to urlretrieve to return block size 0 in callback breaks existing PyPI apps, such as http://pypi.python.org/pypi/wget

It would be nice if you revert this part from http://hg.python.org/cpython/rev/53715804dc71

Also if you change semantics of "block size" (which is assumed to be fixed) to "data read size" then the "count of blocks transferred" losses any sense.

Changed callback semantics without renaming function or introducing a different type of callback doesn't add clarity to the code that should be maintained for both Python 2 and Python 3 version.

Please add issue #16409 as affected by this bug.
msg175301 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-11-10 23:13
New changeset e19a6194aee4 by Gregory P. Smith in branch '3.3':
Fix test_urllib broken by my previous commits.  The assumptions it was
http://hg.python.org/cpython/rev/e19a6194aee4

New changeset dcf9a07830a6 by Gregory P. Smith in branch 'default':
Fix test_urllib broken by my previous commits.  The assumptions it was
http://hg.python.org/cpython/rev/dcf9a07830a6
msg175314 - (view) Author: Jeff McNeil (mcjeff) * Date: 2012-11-11 04:31
Reverting of the len(block) back to 'bs' here aside, does it even make sense to include block information at all? 

That's the attempted read size, so it might not be an accurate representation of the size of the data actually read.  Thus the reason for the initial change.
msg175317 - (view) Author: Jeff McNeil (mcjeff) * Date: 2012-11-11 04:47
Ah, disregard. I followed up on the other bug. The legacy interface indeed should have stayed consistant.
msg184556 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-03-18 23:42
I think, this issue should have been closed. Or at least be closed now.

1. urlretrieve - change was done.

2. Re: msg172874 - DeprecationWarnings on URLopener have been in place since 3.3. Also in the documentation in the Legacy Interface section, it is mentioned that "They might become deprecated at some point in the future." - Adding an Explicit deprecation in 3.3 is okay. It can even be done now. 

3. All other old primitives have been deprecated/documented to be removed.
msg184563 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-03-19 00:07
New changeset ea76cfff5851 by Senthil Kumaran in branch '3.3':
#10050 - Document DeprecationWarnings for URLopener and FancyURLopener (msg172874 )
http://hg.python.org/cpython/rev/ea76cfff5851

New changeset d4cbd9e2e1cb by Senthil Kumaran in branch 'default':
#10050 : merge to default
http://hg.python.org/cpython/rev/d4cbd9e2e1cb
History
Date User Action Args
2013-03-19 00:08:18orsenthilsetstatus: open -> closed
resolution: fixed
2013-03-19 00:07:38python-devsetmessages: + msg184563
2013-03-18 23:42:22orsenthilsetmessages: + msg184556
2012-11-11 04:47:44mcjeffsetmessages: + msg175317
2012-11-11 04:31:48mcjeffsetmessages: + msg175314
2012-11-10 23:13:36python-devsetmessages: + msg175301
2012-11-05 10:32:36techtoniksetversions: + Python 3.4
2012-11-05 10:32:22techtoniksetnosy: + techtonik
messages: + msg174887
2012-11-05 07:15:55akirasetnosy: + akira
messages: + msg174869
2012-10-14 13:00:07ezio.melottisetmessages: + msg172879
2012-10-14 11:13:12nadeem.vawdasetmessages: + msg172874
2012-10-14 03:41:10orsenthilsetmessages: + msg172846
2012-10-14 02:16:27nadeem.vawdasetmessages: + msg172843
2012-03-25 17:33:42mcjeffsetmessages: + msg156763
2012-03-25 16:39:21mcjeffsetmessages: + msg156759
2012-03-25 09:23:45orsenthilsetmessages: + msg156747
2012-03-25 08:34:35neologixsetnosy: + neologix
messages: + msg156741
2012-03-14 21:00:45python-devsetmessages: + msg155792
2012-03-14 20:44:04python-devsetmessages: + msg155788
2012-03-14 20:37:00orsenthilsetmessages: + msg155787
2012-03-14 19:06:54mcjeffsetmessages: + msg155782
2012-03-14 06:33:51orsenthilsetfiles: + issue10050-deprecation.patch

messages: + msg155725
2012-03-14 03:04:31orsenthilsetmessages: + msg155715
2012-03-14 02:49:37mcjeffsetnosy: + mcjeff
messages: + msg155712
2012-03-14 02:32:29orsenthilsetmessages: + msg155710
2012-03-14 02:29:46python-devsetnosy: + python-dev
messages: + msg155709
2012-03-13 08:57:54mcjeffsetnosy: - mcjeff
2011-08-07 23:54:48ezio.melottisetnosy: + ezio.melotti
2011-03-31 17:57:31eric.araujosetmessages: + msg132690
2011-03-30 22:47:56mcjeffsetmessages: + msg132617
2011-03-22 01:42:38orsenthilsetnosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131719
2011-03-22 01:36:44mcjeffsetnosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131718
2011-03-20 16:52:54eric.araujosetnosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131511
2011-03-20 16:21:14mcjeffsetnosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131505
2011-03-20 13:06:35pitrousetnosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131491
2011-03-20 12:51:40orsenthilsetnosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131490
2011-03-20 12:10:13pitrousetnosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131488
2011-03-20 04:30:43orsenthilsetnosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131476
2011-03-20 03:03:12mcjeffsetfiles: + issue10050.patch
nosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131471
2011-03-19 14:16:50mcjeffsetnosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131419
2011-03-19 13:13:08pitrousetnosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131411
2011-03-19 01:47:29mcjeffsetfiles: + issue10050.patch
nosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131380
2011-03-19 01:21:31orsenthilsetnosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131376
2011-03-19 01:18:14mcjeffsetfiles: + issue10050.patch
nosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131375
2011-03-19 00:21:13orsenthilsetnosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131373
2011-03-18 23:48:07eric.araujosetnosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
messages: + msg131371
2011-03-18 22:57:49mcjeffsetfiles: + issue10050.patch

messages: + msg131369
keywords: + patch
nosy: jhylton, orsenthil, pitrou, nadeem.vawda, eric.araujo, mcjeff
2011-03-18 17:07:48nadeem.vawdasetnosy: + nadeem.vawda
2011-03-18 02:43:40mcjeffsetnosy: + mcjeff
2010-11-01 21:01:45eric.araujosetnosy: + eric.araujo
2010-10-08 11:14:29pitroucreate