Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http.client request and send method have some datatype issues #67928

Open
bitdancer opened this issue Mar 22, 2015 · 12 comments
Open

http.client request and send method have some datatype issues #67928

bitdancer opened this issue Mar 22, 2015 · 12 comments
Labels
easy stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@bitdancer
Copy link
Member

BPO 23740
Nosy @bitdancer, @axitkhurana, @vadmium, @serhiy-storchaka, @demianbrecht, @moreati, @Mariatta
Dependencies
  • bpo-27340: bytes-like objects with socket.sendall(), SSL, and http.client
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2015-03-22.19:28:44.718>
    labels = ['easy', 'type-feature']
    title = 'http.client request and send method have some datatype issues'
    updated_at = <Date 2021-03-05.21:15:27.127>
    user = 'https://github.com/bitdancer'

    bugs.python.org fields:

    activity = <Date 2021-03-05.21:15:27.127>
    actor = 'Alex.Willmer'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = []
    creation = <Date 2015-03-22.19:28:44.718>
    creator = 'r.david.murray'
    dependencies = ['27340']
    files = []
    hgrepos = []
    issue_num = 23740
    keywords = ['easy']
    message_count = 12.0
    messages = ['238928', '238935', '238952', '238960', '238964', '238969', '239023', '241190', '272118', '387967', '387975', '388165']
    nosy_count = 7.0
    nosy_names = ['r.david.murray', 'axitkhurana', 'martin.panter', 'serhiy.storchaka', 'demian.brecht', 'Alex.Willmer', 'Mariatta']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue23740'
    versions = ['Python 3.5']

    @bitdancer
    Copy link
    Member Author

    While committing the patch for bpo-23539 I decided to rewrite the 'request' docs for clarity. I doing so I found that http.client isn't as consistent as it could be about how it handles bytes and strings. Two points specifically: it will only take the length of a bytes-like object (to supply a default Content-Length header) if isinstance(x, bytes) is true (that is, it doesn't take the length of eg array or memoryview objects), and (2) if an iterable is passed in, it must be an iterable of bytes-like objects. Since it already automatically encodes string objects and text files, for consistency it should probably also encode strings if they are passed in via an iterator.

    @bitdancer bitdancer added easy type-feature A feature request or enhancement labels Mar 22, 2015
    @serhiy-storchaka
    Copy link
    Member

    See also bpo-23350 and bpo-23360.

    @vadmium
    Copy link
    Member

    vadmium commented Mar 22, 2015

    Summary of the main supported types as I see them, whether documented, undocumented, or only working by accident:

    • None
    • Bytes-like sequences, e.g. bytes(), bytearray. I believe Content-Length is actually automatically set for all these types.
    • Arbitrary bytes-like objects, including array.array("I") and ctypes structures, if custom Content-Length provided (HTTP over TCP only)
    • str() objects, to be automatically encoded with Latin-1
    • File reader objects not supporting stat(), e.g. BytesIO
    • File with valid stat().st_size, i.e. not named pipes
    • Binary file reader
    • Text file reader with mode attribute without a "b", automatically encoded with Latin-1
    • Any iterable of bytes-like sequences
    • Any iterable of arbitrary bytes-like objects (HTTP over TCP only)

    Arbitrary bytes-like objects are not properly supported by SSLSocket.sendall(). After all, the non-SSL socket.sendall() documentation does not explicitly mention supporting arbitrary bytes-like objects either, though it does seem to support them in practice.

    Suggested documentation fixes and additions:

    • Clarify Content-Length is added for all sequence objects, not just “string or bytes objects”
    • Warn that a custom Content-Length should be provided with non-bytes sequences, and with special files like named pipes, since the len() or stat() call will give the wrong value
    • end_headers() should mention byte string rather than just “string”, since it does not (yet) accept Latin-1 text strings
    • end_headers() should not mention sending in the same packet either, since separate sendall() calls are made for all cases; see bpo-23302
    • send() should clarify what it accepts
    • Either omit mentioning arbitrary bytes-like objects, or add support to SSLSocket.sendall() and document support by non-SSL socket.sendall()

    I would support encoding iterables of str() objects for consistency. The patch for bpo-23350 already does this, although I am starting to question the wisdom of special-casing lists and tuples in that patch.

    @vadmium
    Copy link
    Member

    vadmium commented Mar 23, 2015

    As well as encoding iterables of str(), text files could also be handled more consistently by checking the read() return type. That would eliminate the complication of checking for a "b" mode.

    @demianbrecht
    Copy link
    Mannequin

    demianbrecht mannequin commented Mar 23, 2015

    FWIW, I've done some additional work to request/send in issue bpo-12319 where I've added support for chunked request encoding.

    @david

    if an iterable is passed in, it must be an iterable of bytes-like objects

    This specific issue is addressed in the patch in bpo-23350.

    @martin

    text files could also be handled more consistently by checking the read() return type

    I wouldn't be opposed to this at all. In fact, I was going to initially make that change in bpo-12319, but wanted to keep the change surface minimal and realistically, peeking at the data type rather than checking for 'b' in mode doesn't /really/ make that much of a difference.

    @bitdancer
    Copy link
    Member Author

    Yeah, if we're going to check the type for iterables to convert strings, we might as well check the type for read() as well.

    The bit about the len not being set except for str and bytes was me mis-remembering what I read in the code. (The isinstance check is about whether _send_output sends the body in the same packet.)

    @serhiy-storchaka
    Copy link
    Member

    Note that for file-like objects we have also the same issue as bpo-22468. Content-Length is not determined correctly for GzipFile and like.

    @vadmium
    Copy link
    Member

    vadmium commented Apr 16, 2015

    The priority of file-like objects versus bytes-like and iterable objects should be well defined. See bpo-5038: mmap objects seem to satisfy all three interfaces, but the result may be different depending on the file position.

    @vadmium
    Copy link
    Member

    vadmium commented Aug 7, 2016

    I’ve decided I would prefer deprecating the Latin-1 encoding, rather than adding more encoding support for iterables and text files. If not deprecating it altogether, at least prefer just ASCII encoding (like Python 2). The urlopen() function already rejects text str, and in bpo-26045 people often want or expect UTF-8.

    Here is a summary I made of the different data types handled for the body object, from highest priority to lowest priority.

             HTTPConnection   HTTPConn. default                    
    

    body _send_output() Content-Length urlopen()
    =========== =============== ================= ==================
    None No body 0 or unset bpo-23539 C-L unset
    Has read() read() bpo-1065257 * bpo-5038

    • Possible bugs
      † Degenerate cases not worth supporting IMO
      C-L = Content-Length header field, set by default or required to be specified in various cases
      Byte seq. = Sequence of bytes, including “bytes” type, bytearray, array("B"), etc
      Bytes-like = Any C-contiguous buffer, including zero- and multi-dimensional arrays, and with items other than bytes
      Iterable = Iterable of bytes or bytes-like objects (depending on bpo-27340 about SSL)

    @moreati
    Copy link
    Mannequin

    moreati mannequin commented Mar 2, 2021

    A data point found while I researched this

    MyPy typeshed [1] currently declares

    _DataType = Union[bytes, IO[Any], Iterable[bytes], str]
    class HTTPConnection:
        def send(self, data: _DataType) -> None: ...

    [1] https://github.com/python/typeshed/blob/e2967a8beee9e079963ea91a67087ba8fded1d0b/stdlib/http/client.pyi#L26

    @moreati
    Copy link
    Mannequin

    moreati mannequin commented Mar 2, 2021

    First stab at characterising http.client.HTTPConnnection.send().

    https://github.com/moreati/bpo-23740

    This uses a webserver that returns request details, in the body of the response. Raw (TCP level) content is included. It shares a similar purpose to HTTP TRACE command. In principal the bytes that HTTPConection.send() writes will match to the bytes returned (after they're decapsulated from the JSON). I've not tested that aspect yet.

    TODO

    • further testing (verify round trip, bytes in = bytes out)
    • cover multiple Python versions
    • cover cases such client manually setting Content-Length

    @moreati
    Copy link
    Mannequin

    moreati mannequin commented Mar 5, 2021

    http_dump.py now covers CPython 3.6-3.10 (via Tox), and HTTPSConnection https://github.com/moreati/bpo-23740

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 27, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    easy stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants