Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support iterable bodies in httplib #47493

Closed
catlee mannequin opened this issue Jun 30, 2008 · 62 comments
Closed

Support iterable bodies in httplib #47493

catlee mannequin opened this issue Jun 30, 2008 · 62 comments
Assignees
Labels
easy stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@catlee
Copy link
Mannequin

catlee mannequin commented Jun 30, 2008

BPO 3243
Nosy @birkenfeld, @rhettinger, @orsenthil, @pitrou, @catlee, @merwok, @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll, @sorcio, @tzickel
PRs
  • [2.7] bpo-3243: Support iterable bodies in httplib #10226
  • Files
  • python-3243.patch: Patch for python2.7
  • issue_3243_py3k.diff
  • issue_3243_py3k_2.patch
  • issue_3243_py3k_3.patch
  • Issue3243-4.patch
  • issue_3243_py3k_5.patch
  • Issue3243-6.patch
  • issue_3243_py3k_6.patch
  • issue_3243_py3k_7.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/orsenthil'
    closed_at = <Date 2011-01-15.18:26:50.823>
    created_at = <Date 2008-06-30.18:01:18.309>
    labels = ['easy', 'type-feature', 'library']
    title = 'Support iterable bodies in httplib'
    updated_at = <Date 2018-10-30.20:47:43.149>
    user = 'https://github.com/catlee'

    bugs.python.org fields:

    activity = <Date 2018-10-30.20:47:43.149>
    actor = 'tzickel'
    assignee = 'orsenthil'
    closed = True
    closed_date = <Date 2011-01-15.18:26:50.823>
    closer = 'orsenthil'
    components = ['Library (Lib)']
    creation = <Date 2008-06-30.18:01:18.309>
    creator = 'catlee'
    dependencies = []
    files = ['12241', '19852', '19873', '19879', '19890', '19891', '19900', '19901', '19902']
    hgrepos = []
    issue_num = 3243
    keywords = ['patch', 'easy', 'needs review']
    message_count = 62.0
    messages = ['69014', '77037', '84304', '100817', '110675', '120999', '122647', '122707', '122756', '122784', '122787', '122819', '122822', '122878', '122898', '122904', '122905', '122906', '122907', '122909', '122910', '122911', '122912', '122915', '122921', '122970', '122973', '122986', '122987', '122996', '122998', '122999', '123001', '123038', '123039', '123040', '123043', '123044', '123045', '123046', '123047', '123049', '123050', '123051', '123052', '123053', '123069', '123073', '124339', '124340', '124341', '124342', '124343', '124344', '124345', '124355', '124588', '126326', '126342', '126344', '126346', '328946']
    nosy_count = 11.0
    nosy_names = ['jhylton', 'georg.brandl', 'rhettinger', 'orsenthil', 'pitrou', 'catlee', 'eric.araujo', 'rcoyner', 'xuanji', 'davide.rizzo', 'tzickel']
    pr_nums = ['10226']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue3243'
    versions = ['Python 3.2']

    @catlee
    Copy link
    Mannequin Author

    catlee mannequin commented Jun 30, 2008

    httplib should support requests whose bodies are iterable objects. This
    would facilitate doing large file uploads via HTTP since you wouldn't
    have to load the entire file into memory to create the request string.

    Index: Lib/httplib.py
    ===================================================================
    --- Lib/httplib.py (revision 64600)
    +++ Lib/httplib.py (working copy)
    @@ -688,7 +688,12 @@
    self.__state = _CS_IDLE

        def send(self, str):
    -        """Send `str' to the server."""
    +        """Send `str` to the server.
    +
    +        ``str`` can be a string object, a file-like object that supports
    +        a .read() method, or an iterable object that supports a .next()
    +        method.
    +        """
            if self.sock is None:
                if self.auto_open:
                    self.connect()
    @@ -710,6 +715,10 @@
                    while data:
                        self.sock.sendall(data)
                        data=str.read(blocksize)
    +            elif hasattr(str,'next'):
    +                if self.debuglevel > 0: print "sendIng an iterable"
    +                for data in str:
    +                    self.sock.sendall(data)
                else:
                    self.sock.sendall(str)
            except socket.error, v:

    @catlee catlee mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jun 30, 2008
    @catlee
    Copy link
    Mannequin Author

    catlee mannequin commented Dec 5, 2008

    The attached patch implements this for python 2.7. It also adds support
    for iterable bodies in urllib2, where it is more generally useful.
    urllib2 enforces the presence of a Content-Length header in the request
    if the body is an iterable, whereas httplib does not.

    The patch also includes updates to docs and tests (which all pass as of
    r67584 on my macbook)

    @jhylton
    Copy link
    Mannequin

    jhylton mannequin commented Mar 28, 2009

    Seems like a reasonable feature request. I'm going to apply a variant
    of the patch in 3.1 first.

    @jhylton jhylton mannequin self-assigned this Mar 28, 2009
    @rcoyner
    Copy link
    Mannequin

    rcoyner mannequin commented Mar 11, 2010

    This patch and its tests still work. Any particular reason why it hasn't been adopted yet?

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 18, 2010

    Could someone with knowledge of httplib please move this forward, all the comments I see are positive.

    @merwok
    Copy link
    Member

    merwok commented Nov 12, 2010

    The patch needs to be ported to the py3k branch.

    @merwok merwok added the easy label Nov 28, 2010
    @orsenthil orsenthil assigned orsenthil and unassigned jhylton Nov 28, 2010
    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    Hi, attached is a ported version of the patch that applies against py3k.

    @rhettinger
    Copy link
    Contributor

    Instead of
    hasattr(str,'next')

    consider using
    isinstance(str, collections.Iterable)

    Also consider changing the variable name from the now overly type specific, "str" to something like "source" to indicate the significance of the data, not its type.

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    Hi Raymond, I assume you're referring to catlee's patch? 'str' has been changed to 'data' in python 3.2

    While porting the patch I ran into this issue, which is that isinstance(str, collections.Iterable) doesn't behave exactly like hasattr(str,'next')

    >>> hasattr("lol", '__next__')
    False
    >>> isinstance("lol", collections.Iterable)
    True

    so using the isinstance method would actually match against strings, which iirc would make the program fail. I can confirm later if needed.

    @orsenthil
    Copy link
    Member

    xuanji, the issue you stumbled upon was just fixed by Raymond for the report bpo-10565.

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    What a timely coincidence. I'll try out the change soon.

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    Changed according to Raymond's suggestion. I realized that there are some classes (str, bytes, array.array) that are iterable but should not be handled by the iteration logic provided by catlee. eg: if we iterate through b'a message' we'd get a bunch of ints and sending them is wrong. patch attached.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 29, 2010

    Thanks for the patch.

    First, you don't need to support str, since sockets only accept binary strings (not unicode).

    Second, I think it's simpler and more generic to do something like:

    try:
    self.sock.sendall(data)
    except TypeError:
    try:
    it = iter(data)
    except TypeError:
    raise TypeError("data should be a bytes-like object or "
    "an iterable, got %r" % type(it))
    for d in t:
    self.sock.sendall(d)

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    pitrou: that sounds good. I attached another patch.

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    pitrou: actually that seems a bit suspect now... you need to handle 'data' differently depending on its type, and while you can determine the type by finding out when 'data' throws certain exceptions, it doesn't seem like what exceptions were meant for.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 30, 2010

    pitrou: actually that seems a bit suspect now... you need to handle
    'data' differently depending on its type,

    Yes, but you can't know all appropriate types in advance, so it's better
    to try and catch the TypeError.

    I don't understand your changes in Lib/urllib/request.py. len(data) will
    raise anyway.

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    I don't fully understand Lib/urllib/request.py either, I just ported it and ran the unittests... it seems like what it does is that if you send an iterator through as 'data' you can't know the length in advance, and rather than let the len(data) raise an exception catlee thought it's better to raise an exception to tell the user exactly why the code failed (ie, because the user sent an iterator and there's no way to meaningfully find the Content-Length of that).

    As for the catching exceptions vs using isinstance: I thought about it for a while, I think something like this feels right to me:

    try:
    self.sock.sendall(data)
    except TypeError:

          if isinstance(data, collections.Iterable):
              for d in t:
                  self.sock.sendall(d)
          else:
              raise TypeError("data should be a bytes-like object or an iterable, got %r" % type(it))

    anyway, calling iter(data) is equivalent to calling data.__iter__(), so catching the exception is equivalent to hasattr(data, '__iter__'), which is roughly the same as isinstance(data, collections.Iterable). so we try the most straightforward method (sending everything) then if that fails, data is either an iterator or a wrong type.

    @sorcio
    Copy link
    Mannequin

    sorcio mannequin commented Nov 30, 2010

    len(data) will raise anyway.

    No, it won't, if the iterable happens to be a sequence.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 30, 2010

    > len(data) will raise anyway.

    No, it won't, if the iterable happens to be a sequence.

    Well, it seems the patch is confused between iterable and iterator. Only
    iterators have a __next__, but they usually don't have a __len__.

    The patch should really check for iterables, so it should use:

    if isinstance(data, collections.Iterable)
        raise ValueError#etc.
    

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    davide: yeah, hasattr("lol", '__next__') == False, even though strings are Iterable; so for strings and other such sequences the len(data) line will be executed. So technically we shouldn't say "No Content-Length specified for iterable body" but we should say "No Content-Length specified for iterable body that is not a sequence".

    Basically, this whole patch (both parts of it) will be much better off iif there is a clean way to say "a is an iterable but a is not a sequence", because even though b'this is a message' is Iterable, we want to treat it differently compared to, say, a generator object; we do NOT want to use the Iterator features (iter, next) of it, we want to use the sequencey features (by sending the whole chunk of it, by calling len)

    @orsenthil
    Copy link
    Member

    Xuanji, a wording which does convey the approximate meaning is fine. I think, the Exception error messages will help the people based on the Context.

    • Lets have the ValueError raised from the urllib/request.py. Changing it to isinstance(data,collections.Iterable) as Antoine suggested is okay here too.
    • Same change for http.client code checking for an Iterable as specified in msg122905

    tests are fine. Docs and NEWS should be updated.

    Georg: Is it okay, if we push this feature in before Dec 4th, beta1 release?

    @pitrou
    Copy link
    Member

    pitrou commented Nov 30, 2010

    One way to check that it's bytes-compatible is to take a memoryview of it:

    >>> memoryview(b"abc")
    <memory at 0x1cf5120>
    >>> memoryview(bytearray(b"abc"))
    <memory at 0x1cf55a0>
    >>> memoryview(array.array('b', b"abc"))
    <memory at 0x1cf52a0>
    
    >>> memoryview([b"abc"])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: cannot make memory view because object does not have the buffer interface
    >>> memoryview("abc")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: cannot make memory view because object does not have the buffer interface

    @pitrou
    Copy link
    Member

    pitrou commented Nov 30, 2010

    • Lets have the ValueError raised from the urllib/request.py. Changing
      it to isinstance(data,collections.Iterable) as Antoine suggested is
      okay here too.

    Xuanji is right: it's not. We want bytes to be accepted, and it's an
    iterable.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 30, 2010

    Answering to myself, sorry. memoryview() does return the right answer of whether the object supports the buffer interface, *however* it doesn't mean the len() will be right. For example, take an array.array of ints:

    >>> memoryview(array.array("I", [1,2,3]))
    <memory at 0x1cf5720>
    >>> len(array.array("I", [1,2,3]))
    3
    >>> len(memoryview(array.array("I", [1,2,3])))
    3
    >>> len(bytes(array.array("I", [1,2,3])))
    12

    len() returns 3 but the number of bytes written out by sendall() will really be 12...

    *However*, the right len can be calculated using the memoryview:

    >>> m = memoryview(array.array("I", [1,2,3]))
    >>> len(m) * m.itemsize
    12

    (without actually converting to a bytes object, so all this is cheap even for very large buffers)

    @orsenthil
    Copy link
    Member

    Updated patch after correcting the mistake (bytes vs str) in the previous one.

    @orsenthil
    Copy link
    Member

    On Thu, Dec 02, 2010 at 03:08:55AM +0000, Xuanji Li wrote:

    req = Request("http://example.com/", "")

    That should be:

    req = Request("http://example.com/", b"")

    I updated some of those in the latest updated patch.

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    And my version too...

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    Actually I don't think you can go around changing test_urllib2.py, they are after all regression tests... and surely some users will send "" as data.

    My patch handles the case of 0-length data specially, removing the need for 2) Can call len but not buffer: assume len == #bytes

    @orsenthil
    Copy link
    Member

    On Thu, Dec 02, 2010 at 03:16:53AM +0000, Xuanji Li wrote:

    And my version too...

    + if hasattr(data, '__len__') and not len(data):
    + request.add_unredirected_header('Content-length', '0')

    This is very special case. It should not be so. There was wrong
    examples in the test_urllib2 which I just corrected.

    Expect for the difference in (it = iter(data) - Which I am seeking
    some advice too). Rest of the things in both patches are same.

    Thank you.

    @orsenthil
    Copy link
    Member

    On Thu, Dec 02, 2010 at 03:20:05AM +0000, Xuanji Li wrote:

    Actually I don't think you can go around changing test_urllib2.py,
    they are after all regression tests... and surely some users will
    send "" as data.

    Think about it this way. In py3k, socket.send can handle only bytes,
    not string. So sending "" as data to the socket.send is wrong.

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    thought of a better way, we can check if data is true; this will cover "" and b"". this is in issue_3243_py3k_7.patch

    @orsenthil
    Copy link
    Member

    Glaring at my mistakes.

    There was wrong examples in the test_urllib2 which I just corrected.

    There were..

    Expect for the difference in (it = iter(data) - Which I am seeking

    Except for ...

    @merwok
    Copy link
    Member

    merwok commented Dec 2, 2010

    > What is __read__ supposed to be?
    I don't think is required.

    The point is that Python does not define a __read__ magic method. Only read exists, on file objects.

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    eric: sorry, that has been fixed in issue_3243_py3k_7.patch

    @orsenthil
    Copy link
    Member

    This is committed in r87399. Documentation and NEWS is added. Thanks for the patch and review comments.

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    Hello, sorry for commenting on a closed issue... but I think the documentation change is incorrect. In urllib.request.rst, it says data is a string. However as seen in the changes to test_urllib2.py, data must be a bytes object rather than a string object. I think we should reopen this issue and change the documentation.

    Thanks!

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    Also, the patch for request.py contains a debug statement, print(data)

    @birkenfeld
    Copy link
    Member

    Good catch, fixed in r87400.

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    Also, I am not familiar with the backward-comparability requirements of py3k, but orsenthil's patch will break py3k code that relies on data being a string, shouldn't this be mentioned somewhere?

    Finally, I do not understand why my proposed change, which is to add

    + if not data:
    + request.add_unredirected_header('Content-length', '0')

    so that code that relies on being able to pass a blank string as data (for instance, the code in test_urllib2.py) is still able to do so.

    @birkenfeld
    Copy link
    Member

    Hmm, indeed: Senthil, could data be a string in earlier versions?

    If yes, the code should be changed to still allow that. (But after beta2 please, it's already tagged.)

    @birkenfeld birkenfeld reopened this Dec 19, 2010
    @birkenfeld
    Copy link
    Member

    Raising priority so that this gets sorted out before final.

    @orsenthil
    Copy link
    Member

    Xuanji, Thanks for the comments on 'data' being bytes. I had just cared to add the feature information. I think that data detail should have been updated too.

    I think for your other two questions, we discussed it msg123051 - socket in py3k handles only bytes, sending string was wrong and test_urllib2 had the mistake in sending zero length strings which weren't detected.

    However, let me see if we have to accommodate those "very special case" where data can be a zero length string just to accommodate the mistake it was present in the earlier version.

    @orsenthil
    Copy link
    Member

    A mistake with Content-Length in the previous commit resolved in revision 87469.

    @birkenfeld
    Copy link
    Member

    Anything left to do here, Senthil?

    @orsenthil
    Copy link
    Member

    Hi Georg,

    In the previous comment, I had written that

    'let me see if we have to accommodate those "very special case"
    where data can be a zero length string just to accommodate the mistake
    it was present in the earlier version.'

    More, I think about it, the more it seems to me that accommodating that
    special wrong case is not required.

    The decision boils down to this.

    1. In py3k, data for POST should be bytes.

    2. But urllib.request had a 'bug/hole' that when a zero length string
      was passed as a data, it did not raise an exception. There were cases
      in test_urllib2 where zero length string was passed. I argue that it
      was more of a mistake than, what we actually wanted to test it.

    Because there is NO practical scenario where Zero length data as a
    POST is useful.

    1. Now, with the introduction of this feature requested in this issue,
      this zero length string would raise an Exception and would demand that
      even if it is zero length, please send it as bytes.

    IMO, this is correct behavior and we need not accommodate the previous
    one.

    So, I would recommend closing this bug as Fixed without further change.
    At most, a NEWS item can be added to explain point 2.

    @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll
    Copy link
    Mannequin

    Yes, I think we should close it.

    @orsenthil
    Copy link
    Member

    Thanks for the note. I shall some details to the NEWS entry before the release. Closing this report.

    @tzickel
    Copy link
    Mannequin

    tzickel mannequin commented Oct 30, 2018

    This patch was opened for 2.7 but never applied there ?

    #10226

    This causes a bug with requests HTTP library (and others as well as httplib) when you want to send an iterable object as POST data (with a non-chunked way), it works in Python 3 but not 2, and this effects behaviour and performance...

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    easy stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants