urllib.request.urlopen does not return an iterable object #48858

Closed
jwilk mannequin opened this issue Dec 9, 2008 · 19 comments
Labels
easy stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@jwilk
Mannequin

jwilk mannequin commented Dec 9, 2008

BPO 4608
Nosy @rhettinger, @facundobatista, @orsenthil, @devdanzin, @jwilk, @florentx
Files
  • issue4608_py31.diff
  • issue4608_py31-v2.diff
  • tests_issue4608_py31.diff: Test addinfourl iterability
  • tests-iter-urllib-py3k.patch: Fix short write and add tests.
  • issue4608.diff: Fix allowing iteration over urlopen results for "ftp://" and "file://" URLs.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2011-06-26.12:31:11.789>
    created_at = <Date 2008-12-09.12:13:42.006>
    labels = ['easy', 'type-bug', 'library']
    title = 'urllib.request.urlopen does not return an iterable object'
    updated_at = <Date 2011-06-26.12:31:11.788>
    user = 'https://github.com/jwilk'

    bugs.python.org fields:

    activity = <Date 2011-06-26.12:31:11.788>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2011-06-26.12:31:11.789>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2008-12-09.12:13:42.006>
    creator = 'jwilk'
    dependencies = []
    files = ['12410', '12548', '12987', '18433', '22474']
    hgrepos = []
    issue_num = 4608
    keywords = ['patch', 'easy']
    message_count = 19.0
    messages = ['77405', '77409', '77415', '78139', '78416', '78880', '78944', '79575', '81426', '86365', '113245', '113260', '113280', '134129', '134130', '134319', '139100', '139170', '139171']
    nosy_count = 15.0
    nosy_names = ['jhylton', 'rhettinger', 'facundobatista', 'sschwarzer', 'orsenthil', 'ajaksu2', 'jwilk', 'zanella', 'pl', 'Arfrever', 'marduk', 'flox', 'santoso.wijaya', 'bbrazil', 'python-dev']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue4608'
    versions = ['Python 3.1', 'Python 3.2']

    @jwilk
    Mannequin Author

    jwilk mannequin commented Dec 9, 2008

    $ cat urltest2.5
    #!/usr/bin/python2.5
    from urllib2 import urlopen
    for line in urlopen('http://python.org/'):
            print line
            break
    
    $ ./urltest2.5
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    
    
    $ cat urltest3.0
    #!/usr/bin/python3.0
    from urllib.request import urlopen
    for line in urlopen('http://python.org/'):
            print(line)
            break
    $ ./urltest3.0
    Traceback (most recent call last):
      File "./urltest3.0", line 3, in <module>
        for line in urlopen('http://python.org/'):
    TypeError: 'addinfourl' object is not iterable
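
For comparison, here is a minimal sketch of the iteration the report expects, written for a current Python 3. It uses a local `file://` URL so that it runs without network access; the temporary file and its contents are made up for illustration. According to the later comments in this thread, `file://` and `ftp://` results only became iterable once the final patch landed.

```python
import os
import pathlib
import tempfile
from urllib.request import urlopen

# Create a small throwaway file to stand in for a remote resource.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first line\nsecond line\n")
    path = pathlib.Path(tmp.name)

try:
    # The object returned by urlopen() is iterated line by line (as bytes).
    with urlopen(path.as_uri()) as response:
        lines = [line for line in response]
finally:
    os.unlink(path)

print(lines)
```

Note that in Python 3 each line is a `bytes` object, unlike the `str` lines printed by the Python 2.5 script.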

    @jwilk jwilk mannequin added the stdlib and type-bug labels Dec 9, 2008
    @orsenthil
    Member

    I verified this bug in Py3.0 and Py3.1. I will come up with a patch
    for it.

    @jhylton
    Mannequin

    jhylton mannequin commented Dec 9, 2008

    Oops. I didn't think to translate the code in addinfobase to the new
    style of iterators.

    Jeremy


    @orsenthil
    Member

    Here is a patch to fix the issue.
    Jeremy, is this approach okay? Or do you have any other suggestion?

    @jwilk
    Mannequin Author

    jwilk mannequin commented Dec 28, 2008

    Regarding Senthil's patch:
    __next__() method seems superfluous to me (and the implementation is buggy).

    @orsenthil
    Member

    Jakub,

    I have attached a revision to the patch.
    You are right: when __iter__ returns self.fp (as in the previous patch),
    __next__ is superfluous.
    But I was thinking of __iter__ returning an instance of addbase
    instead of self.fp, and in that case __next__ is required. I see
    that I had not changed self.fp to self.

    This is implemented along the lines of the IOBase class in io.py.
    Regarding your other comment, why do you think the __next__ implementation is
    incorrect?

    Thanks,
    Senthil
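
The two designs discussed in this comment can be sketched side by side. `DelegatingWrapper` and `SelfIteratingWrapper` are hypothetical names used only for illustration; they are not the classes from the attached patch. The first returns the wrapped file object from `__iter__`, so it needs no `__next__`. The second returns `self`, so it has to supply `__next__` itself.

```python
import io


class DelegatingWrapper:
    """__iter__ hands back the wrapped file, which is already an iterator."""

    def __init__(self, fp):
        self.fp = fp

    def __iter__(self):
        return self.fp


class SelfIteratingWrapper:
    """__iter__ returns self, so the wrapper must implement __next__."""

    def __init__(self, fp):
        self.fp = fp

    def __iter__(self):
        return self

    def __next__(self):
        line = self.fp.readline()
        if not line:  # an empty result signals end of file
            raise StopIteration
        return line


data = b"a\nb\n"
delegated = list(DelegatingWrapper(io.BytesIO(data)))
self_iterated = list(SelfIteratingWrapper(io.BytesIO(data)))
```

Both wrappers yield the same lines; the difference is only in which object the `for` loop ends up calling `__next__` on.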

    @jwilk
    Mannequin Author

    jwilk mannequin commented Jan 3, 2009

    Oops, __next__ is OK. Sorry for the confusion.

    @facundobatista
    Member

    Senthil, do you think you could provide a test case for this?

    Thank you!

    @facundobatista facundobatista self-assigned this Jan 10, 2009
    @devdanzin
    Mannequin

    devdanzin mannequin commented Feb 8, 2009

    Test cases attached.

    The second one highlights a bug in the current patch, as it fails to
    return a line longer than 65475 chars. This behavior doesn't match trunk's.

    @devdanzin devdanzin mannequin added the easy label Apr 22, 2009
    @orsenthil
    Member

    This issue was already fixed by Jeremy in revision 70815, wherein "The
    response from an HTTP request is now an HTTPResponse instance instead of
    an addinfourl() wrapper instance."

    So the issue is no longer present in the py3k code (confirmed).

    However, the test added by Daniel, which tests urlopen() for a
    request which is b"verylong" * 8192, still fails.

    It is not just iteration; test_200 fails too if the request
    is a large chunk. This happens only in the py3k branch; the test passes in
    the trunk code. I am investigating further.

    @bbrazil
    Mannequin

    bbrazil mannequin commented Aug 8, 2010

    This looks as though it's a short write:

    [pid 28343] recvfrom(5, "GET / HTTP/1.1\r\nAccept-Encoding:"..., 8192, 0, NULL, NULL) = 118
    [pid 28343] poll([{fd=5, events=POLLOUT, revents=POLLOUT}], 1, 10000) = 1
    [pid 28343] sendto(5, "HTTP/1.0 200 OK\r\n", 17, 0, NULL, 0) = 17
    [pid 28343] poll([{fd=5, events=POLLOUT, revents=POLLOUT}], 1, 10000) = 1
    [pid 28343] sendto(5, "Server: TestHTTP/ Python/3.2a1+\r"..., 33, 0, NULL, 0) = 33
    [pid 28343] poll([{fd=5, events=POLLOUT, revents=POLLOUT}], 1, 10000) = 1
    [pid 28343] sendto(5, "Date: Sun, 08 Aug 2010 09:41:08 "..., 37, 0, NULL, 0) = 37
    [pid 28343] poll([{fd=5, events=POLLOUT, revents=POLLOUT}], 1, 10000) = 1
    [pid 28343] sendto(5, "Content-type: text/plain\r\n", 26, 0, NULL, 0) = 26
    [pid 28343] poll([{fd=5, events=POLLOUT, revents=POLLOUT}], 1, 10000) = 1
    [pid 28343] sendto(5, "\r\n", 2, 0, NULL, 0) = 2
    [pid 28343] poll([{fd=5, events=POLLOUT, revents=POLLOUT}], 1, 10000) = 1
    [pid 28343] sendto(5, "verylongverylongverylongverylong"..., 56001, 0, NULL, 0) = 49054
    [pid 28343] shutdown(5, 1 /* send */) = 0

    @bbrazil
    Mannequin

    bbrazil mannequin commented Aug 8, 2010

    The attached patch handles short writes, and adds ajaksu2's tests.
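
As a generic illustration of what handling a short write involves (this is a sketch, not the code from the attached patch), the sender checks the return value of `send()` and retries with the remaining bytes. `send_all` and `recv_until_closed` are hypothetical helper names, and a local `socket.socketpair()` stands in for the test server's connection.

```python
import socket
import threading


def send_all(sock, data):
    """Keep calling send() until every byte has been accepted."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # send() may accept fewer bytes than offered; it returns the count.
        sent = sock.send(view[total:])
        total += sent
    return total


def recv_until_closed(sock, chunks):
    """Collect everything the peer sends until it shuts down its side."""
    while True:
        chunk = sock.recv(8192)
        if not chunk:
            break
        chunks.append(chunk)


payload = b"verylong" * 8192
writer, reader = socket.socketpair()
received = []
thread = threading.Thread(target=recv_until_closed, args=(reader, received))
thread.start()

sent_bytes = send_all(writer, payload)
writer.shutdown(socket.SHUT_WR)  # signal end of data to the reader
thread.join()
writer.close()
reader.close()
```

The standard library's `socket.sendall()` performs the same retry loop, so in most code it is the simpler choice.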

    @florentx
    Mannequin

    florentx mannequin commented Aug 8, 2010

    Thanks, Brian.
    Pushed with revision 83833.

    @florentx florentx mannequin closed this as completed Aug 8, 2010
    @florentx florentx mannequin assigned florentx and unassigned facundobatista Aug 8, 2010
    @marduk
    Mannequin

    marduk mannequin commented Apr 20, 2011

    This issue appears to persist when the protocol used is FTP:

    root@tp-db $ cat test.py
    from urllib.request import urlopen
    for line in urlopen('ftp://gentoo.osuosl.org/pub/gentoo/releases/'):
        print(line)
        break

    root@tp-db $ python3.2 test.py
    Traceback (most recent call last):
      File "test.py", line 2, in <module>
        for line in urlopen('ftp://gentoo.osuosl.org/pub/gentoo/releases/'):
    TypeError: 'addinfourl' object is not iterable

    @marduk
    Mannequin

    marduk mannequin commented Apr 20, 2011

    Oops, the previous example was a directory, but it's the same if the URL points to an FTP file.

    @orsenthil orsenthil reopened this Apr 20, 2011
    @zanella
    Mannequin

    zanella mannequin commented Apr 24, 2011

    The patch that makes addinfourl() iterable was not committed due to the change to HTTP requests; see msg86365 (http://bugs.python.org/issue4608#msg86365).

    Since urllib is protocol agnostic it should behave the same with FTP, right?

    So, where to fix it? Change addinfourl() to become iterable, or change what FTPHandler returns?

    @sschwarzer
    Mannequin

    sschwarzer mannequin commented Jun 25, 2011

    It turned out that although the addinfourl instance had the __iter__ attribute in addbase.__init__ correctly assigned, __iter__ wasn't found by the iter builtin. It seems that iter always tries to use the __iter__ method of the class and doesn't look at the instance.

    Riccardo Attilio Galli and I made the attached patch. The patch also fixes a corresponding TypeError for "file://" URLs, not just "ftp://" URLs.
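
The lookup behaviour described above can be reproduced in a few lines. `InstanceOnly` and `ClassLevel` are hypothetical classes written for this sketch: the first assigns `__iter__` on the instance, as the comment says `addbase.__init__` did, and the second defines it on the class.

```python
class InstanceOnly:
    """Assigns __iter__ on the instance only."""

    def __init__(self, items):
        self._items = items
        self.__iter__ = lambda: iter(self._items)


class ClassLevel:
    """Defines __iter__ on the class, where iter() looks for it."""

    def __init__(self, items):
        self._items = items

    def __iter__(self):
        return iter(self._items)


instance_only = InstanceOnly([1, 2])
try:
    iter(instance_only)
    instance_iterable = True
except TypeError:
    # iter() consults type(obj), so the instance attribute is ignored.
    instance_iterable = False

# Calling the attribute explicitly still works; only implicit lookup skips it.
explicit_call_result = list(instance_only.__iter__())
class_level_result = list(ClassLevel([1, 2]))
```

This is general Python 3 behaviour for special methods: implicit invocations such as `iter(obj)` or a `for` loop look the method up on the type, not on the instance.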

    @rhettinger rhettinger assigned rhettinger and unassigned florentx Jun 25, 2011
    @python-dev
    Mannequin

    python-dev mannequin commented Jun 26, 2011

    New changeset c0a68b948f5d by Raymond Hettinger in branch '3.2':
    Issue bpo-4608: urllib.request.urlopen does not return an iterable object
    http://hg.python.org/cpython/rev/c0a68b948f5d

    New changeset d4aeeddf72e3 by Raymond Hettinger in branch 'default':
    Issue bpo-4608: urllib.request.urlopen does not return an iterable object
    http://hg.python.org/cpython/rev/d4aeeddf72e3

    @rhettinger
    Contributor

    Thanks for the patch.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022