HTTPMessage not documented and has inconsistent API across Py2/Py3 #49023

Closed
beazley mannequin opened this issue Dec 29, 2008 · 16 comments
Labels
docs (Documentation in the Doc dir), stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@beazley
Mannequin

beazley mannequin commented Dec 29, 2008

BPO 4773
Nosy @warsaw, @birkenfeld, @orsenthil, @devdanzin, @ezio-melotti, @merwok, @akheron, @vadmium
Dependencies
  • bpo-3428: httplib.HTTPMessage undocumented
Files
  • addinfourl_removal.diff
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-02-14.18:34:37.829>
    created_at = <Date 2008-12-29.21:42:31.884>
    labels = ['type-bug', 'library', 'docs']
    title = 'HTTPMessage not documented and has inconsistent API across Py2/Py3'
    updated_at = <Date 2017-04-28.09:20:24.183>
    user = 'https://bugs.python.org/beazley'

    bugs.python.org fields:

    activity = <Date 2017-04-28.09:20:24.183>
    actor = 'Socob'
    assignee = 'jhylton'
    closed = True
    closed_date = <Date 2015-02-14.18:34:37.829>
    closer = 'berker.peksag'
    components = ['Documentation', 'Library (Lib)']
    creation = <Date 2008-12-29.21:42:31.884>
    creator = 'beazley'
    dependencies = ['3428']
    files = ['13473']
    hgrepos = []
    issue_num = 4773
    keywords = ['patch']
    message_count = 16.0
    messages = ['78486', '78491', '81799', '84217', '84219', '84223', '84243', '84577', '84783', '154781', '154835', '154850', '179377', '179379', '185458', '235603']
    nosy_count = 17.0
    nosy_names = ['jhylton', 'barry', 'georg.brandl', 'beazley', 'jjlee', 'orsenthil', 'ajaksu2', 'bmiller', 'ezio.melotti', 'eric.araujo', 'srid', 'petri.lehtinen', 'martin.panter', 'piotr.dobrogost', 'joel.verhagen', 'hltbra', 'Socob']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue4773'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3']

    @beazley
    Mannequin Author

    beazley mannequin commented Dec 29, 2008

    A file-like object u returned by the urlopen() function in both Python
    2.6/3.0 has a method info() that returns a 'HTTPMessage' object. For
    example:

    ::: Python 2.6
    >>> from urllib2 import urlopen
    >>> u = urlopen("http://www.python.org")
    >>> u.info()
    <httplib.HTTPMessage instance at 0xce5738>
    >>> 
    
    ::: Python 3.0
    >>> from urllib.request import urlopen
    >>> u = urlopen("http://www.python.org")
    >>> u.info()
    <http.client.HTTPMessage object at 0x4bfa10>
    >>>

    So far, so good. HTTPMessage is defined in two different modules, but
    that's fine (it's just library reorganization).

    Two major problems:

    1. There is no documentation whatsoever on HTTPMessage. No description
      in the docs for httplib (Python 2.6) or http.client (Python 3.0).

    2. The HTTPMessage object in Python 2.6 derives from mimetools.Message
      and has a totally different programming interface than HTTPMessage in
      Python 3.0, which derives from email.message.Message. Check it out:

    ::: Python 2.6
    >>> dir(u.info())
    ['__contains__', '__delitem__', '__doc__', '__getitem__', '__init__', 
    '__iter__', '__len__', '__module__', '__setitem__', '__str__', 
    'addcontinue', 'addheader', 'dict', 'encodingheader', 'fp', 'get', 
    'getaddr', 'getaddrlist', 'getallmatchingheaders', 'getdate', 
    'getdate_tz', 'getencoding', 'getfirstmatchingheader', 'getheader', 
    'getheaders', 'getmaintype', 'getparam', 'getparamnames', 'getplist', 
    'getrawheader', 'getsubtype', 'gettype', 'has_key', 'headers', 
    'iscomment', 'isheader', 'islast', 'items', 'keys', 'maintype', 
    'parseplist', 'parsetype', 'plist', 'plisttext', 'readheaders', 
    'rewindbody', 'seekable', 'setdefault', 'startofbody', 'startofheaders', 
    'status', 'subtype', 'type', 'typeheader', 'unixfrom', 'values']
    
    ::: Python 3.0
    >>> dir(u.info())
    ['__class__', '__contains__', '__delattr__', '__delitem__', '__dict__', 
    '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', 
    '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', 
    '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', 
    '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', 
    '__str__', '__subclasshook__', '__weakref__', '_charset', 
    '_default_type', '_get_params_preserve', '_headers', '_payload', 
    '_unixfrom', 'add_header', 'as_string', 'attach', 'defects', 
    'del_param', 'epilogue', 'get', 'get_all', 'get_boundary', 
    'get_charset', 'get_charsets', 'get_content_charset', 
    'get_content_maintype', 'get_content_subtype', 'get_content_type', 
    'get_default_type', 'get_filename', 'get_param', 'get_params', 
    'get_payload', 'get_unixfrom', 'getallmatchingheaders', 'is_multipart', 
    'items', 'keys', 'preamble', 'replace_header', 'set_boundary', 
    'set_charset', 'set_default_type', 'set_param', 'set_payload', 
    'set_type', 'set_unixfrom', 'values', 'walk']

    I know that getting rid of mimetools was desired, but I have no idea if
    changing the API on HTTPMessage was intended or not. In any case, it's
    one of the only cases in the entire library where the programming
    interface to an object radically changes from 2.6 -> 3.0.

    I ran into this problem with code that was trying to properly determine
    the charset encoding of the byte string returned by urlopen().

    I haven't checked whether 2to3 deals with this or not, but it might be
    something for someone to look at in their copious amounts of spare time.

    @beazley beazley mannequin added the stdlib and type-bug labels Dec 29, 2008
    @beazley
    Mannequin Author

    beazley mannequin commented Dec 29, 2008

    Verified that 2to3 does not fix this.

    @devdanzin
    Mannequin

    devdanzin mannequin commented Feb 12, 2009

    ISTM that these issues tend to go all the way up to test coverage and
    organization :/

    @devdanzin devdanzin mannequin added the docs label Feb 12, 2009
    @devdanzin devdanzin mannequin assigned birkenfeld Feb 12, 2009
    @jhylton
    Mannequin

    jhylton mannequin commented Mar 26, 2009

    No deep thought was given to the HTTPMessage API. Here's the extent of
    the discussion that I can find. I've changed the names, but you can
    find the full discussion at http://bugs.python.org/issue2848

    A: mimetools.Message is compatible with email.message.Message, right?
    B: I don't know how compatible it is.
    C: The APIs are a bit different, but it should be possible to migrate from
    the old to the new.

    @jhylton
    Mannequin

    jhylton mannequin commented Mar 26, 2009

    A plausible solution is to pick some core set of functionality that we
    think people need and document that API. We can modify one or both of
    the current implementations to include that functionality. What do we need?

    @warsaw
    Member

    warsaw commented Mar 26, 2009

    I propose that you only document the getitem header access API. I.e.
    the thing that info() gives you can be used to access the message
    headers via message['content-type']. That's an API common to both
    rfc822.Messages (the ultimate base class of mimetools.Message) and
    email.message.Message.
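A minimal sketch of the mapping-style access proposed here, for Python 3 only. The header text is made up for illustration, and the message is built with `email.message_from_string` instead of a real `urlopen()` call so that the example runs offline; a real object would come from `urlopen(...).info()`.

```python
import email
from http.client import HTTPMessage

# Hypothetical header block; a real one would come from urlopen(...).info().
RAW_HEADERS = (
    "Content-Type: text/html; charset=ISO-8859-1\n"
    "Content-Length: 0\n"
    "\n"
)

# HTTPMessage subclasses email.message.Message, so the email parser can build it.
msg = email.message_from_string(RAW_HEADERS, _class=HTTPMessage)

# Lookup by header name is case-insensitive.
content_type = msg["content-type"]
same_value = msg["Content-Type"]

# A missing header yields None instead of raising KeyError.
missing = msg["x-not-present"]

print(content_type)
print(missing)
```

Only `__getitem__` is used, so code written this way does not touch any of the method names that differ between the two base classes.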

    @bmiller
    Mannequin

    bmiller mannequin commented Mar 27, 2009

    On Thu, Mar 26, 2009 at 4:29 PM, Barry A. Warsaw <report@bugs.python.org> wrote:

    Barry A. Warsaw <barry@python.org> added the comment:

    I propose that you only document the getitem header access API. I.e.
    the thing that info() gives you can be used to access the message
    headers via message['content-type']. That's an API common to both
    rfc822.Messages (the ultimate base class of mimetools.Message) and
    email.message.Message.

    As I've found myself in the awkward position of having to explain the new
    3.0 api to my students I've thought about this and have some
    ideas/questions.
    I'm also willing to help with the documentation or any enhancements.

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'addinfourl' object is unsubscriptable

    I wish I knew what an addinfourl object was.

    'Fri, 27 Mar 2009 00:41:34 GMT'

    'Fri, 27 Mar 2009 00:41:34 GMT'

    ['Date', 'Server', 'Last-Modified', 'ETag', 'Accept-Ranges',
    'Content-Length', 'Connection', 'Content-Type']

    Using x.headers over x.info() makes the most sense to me, but I don't know
    that I can give any good rationale. Which would we want to document?

    'text/html; charset=ISO-8859-1'

    I guess technically this is correct since the charset is part of the
    Content-Type header in HTTP but it does make life difficult for what I think
    will be a pretty common use case in this new urllib: read from the url (as
    bytes) and then decode them into a string using the appropriate character
    set.

    As you follow this road, you have the confusing option of these three calls:

    'iso-8859-1'
    >>> x.headers.get_charsets()
    ['iso-8859-1']

    I think it should be a bug that get_charset() does not return anything in
    this case. It is not at all clear why get_content_charset() and
    get_charset() should have different behavior.
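The difference between the three calls can be sketched as follows, for Python 3. The headers are hypothetical and reuse the Content-Type value quoted above; `decode_body` is an illustrative helper, not part of the standard library. As documented for `email.message.Message`, `get_charset()` reports a `Charset` object attached with `set_charset()`, while `get_content_charset()` parses the Content-Type header, which would explain why only the latter returns something here.

```python
import email
from http.client import HTTPMessage

# Hypothetical headers matching the Content-Type value quoted above.
msg = email.message_from_string(
    "Content-Type: text/html; charset=ISO-8859-1\n\n",
    _class=HTTPMessage,
)

# Parses the charset parameter out of the Content-Type header (lowercased).
from_header = msg.get_content_charset()

# Returns the Charset object attached via set_charset(); nothing was attached
# here, so this is None regardless of what the header says.
attached = msg.get_charset()

# One entry per message part; a non-multipart message gives a one-item list.
per_part = msg.get_charsets()


def decode_body(body, headers, default="utf-8"):
    """Decode bytes using the header charset, falling back to `default`."""
    return body.decode(headers.get_content_charset() or default)


# Hypothetical body bytes standing in for urlopen(...).read().
text = decode_body(b"caf\xe9", msg)
```

The `default` fallback is an assumption of this sketch; a response without a charset parameter may need a different choice depending on the content type.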

    Brad

    @jhylton
    Mannequin

    jhylton mannequin commented Mar 30, 2009

    The attached file is vaguely related to the current discussion. I'd
    like to document the API for the urllib response, but I'd also like to
    simplify the implementation on the py3k side. We can document the
    simple API on the py3k side, then support some version of that API on
    the py2k side.

    Apologies for the noise in this patch. I was on a plane, and I don't
    understand DVCS yet.

    @orsenthil
    Member

    I spent some time on the patch which replaces the self.msg usage with
    self.headers in http.client. Everything is fine.
    The next step is to provide an interface in the urllib.response and the
    equivalent changes to py2k.

    @orsenthil orsenthil assigned jhylton and unassigned birkenfeld Mar 31, 2009
    @joelverhagen
    Mannequin

    joelverhagen mannequin commented Mar 2, 2012

    There is a difference in what HTTPResponse.getheaders() returns.

    Python 2.7.2 (default, Jun 12 2011, 14:24:46) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import httplib
    >>> c = httplib.HTTPConnection('www.joelverhagen.com')
    >>> c.request('GET', '/sandbox/tests/cookies.php')
    >>> c.getresponse().getheaders()
    [('content-length', '0'), ('set-cookie', 'test_cookie1=foobar; expires=Fri, 02-Mar-2012 16:54:15 GMT, test_cookie2=barfoo; expires=Fri, 02-Mar-2012 16:54:15 GMT'), ('vary', 'Accept-Encoding'), ('server', 'Apache'), ('date', 'Fri, 02 Mar 2012 16:53:15 GMT'), ('content-type', 'text/html')]
    
    Python 3.2.2 (default, Sep  4 2011, 09:07:29) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from http import client
    >>> c = client.HTTPConnection('www.joelverhagen.com')
    >>> c.request('GET', '/sandbox/tests/cookies.php')
    >>> c.getresponse().getheaders()
    [('Date', 'Fri, 02 Mar 2012 16:56:40 GMT'), ('Server', 'Apache'), ('Set-Cookie', 'test_cookie1=foobar; expires=Fri, 02-Mar-2012 16:57:40 GMT'), ('Set-Cookie', 'test_cookie2=barfoo; expires=Fri, 02-Mar-2012 16:57:40 GMT'), ('Vary', 'Accept-Encoding'), ('Content-Length', '0'), ('Content-Type', 'text/html')]

    As you can see, HTTPResponse.getheaders() in 2.7.2 joins headers with the same name with ", ". In 3.2.2, the headers are kept separate as two or more 2-tuples.

    This causes problems if you convert the list of 2-tuples to a dict, because the keys collide (causing all but one of the values associated with the non-unique keys to be overwritten). It looks like this problem is caused by using the email header parser (which keeps the keys and values as separate 2-tuples). In Python 2.7.2, the HTTPMessage.addheader(...) function does the comma-separating.

    Is this API change intentional? Should HTTPResponse.getheaders() comma-separate the values like the HTTPResponse.getheader(...) function (in both 2.7.2 and 3.2.2)?
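The key collision described above can be reproduced offline in Python 3 with a small sketch. The headers below are hypothetical, shortened versions of the ones in the session, and `get_all()` is shown as one way to keep every value without building a dict.

```python
import email
from http.client import HTTPMessage

# Hypothetical response headers with a repeated Set-Cookie field.
msg = email.message_from_string(
    "Set-Cookie: test_cookie1=foobar\n"
    "Set-Cookie: test_cookie2=barfoo\n"
    "Content-Type: text/html\n"
    "\n",
    _class=HTTPMessage,
)

# (name, value) pairs with duplicates kept, in header order.
pairs = msg.items()

# Converting to a dict keeps only the last value for a repeated name.
collapsed = dict(pairs)

# get_all() returns every value for the field, in order.
cookies = msg.get_all("Set-Cookie")

print(collapsed["Set-Cookie"])
print(cookies)
```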

    See also:
    urllib3/urllib3#3 (comment)

    @merwok
    Member

    merwok commented Mar 3, 2012

    Now that two Python 3 releases have been made, I don’t know if changing the code is still an option. The doc can certainly still be improved.

    Adding Ezio to nosy; I think it’s you who opened a bug report about removing superfluous getter methods in the addinfourl class (and other ugliness).

    @merwok merwok changed the title from "HTTPMessage not documented and has inconsistent API across 2.6/3.0" to "HTTPMessage not documented and has inconsistent API across Py2/Py3" Mar 3, 2012
    @ezio-melotti
    Member

    Yep, bpo-12707.

    @piotrdobrogost
    Mannequin

    piotrdobrogost mannequin commented Jan 8, 2013

    @joel.verhagen

    "Should HTTPResponse.getheaders() comma-separate the values (...)"

    No, it should not. RFC 2616 states:

    "Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]."

    As field-values for some header fields ('Set-Cookie' being an example) are not defined as a comma-separated list, such fields must not be merged.
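A short sketch of why merging is lossy for such fields, using cookie values shaped like the ones in the 2.7.2 session earlier in this thread. The naive split is only an illustration of what a caller might try, not something either library version does.

```python
# Two Set-Cookie values shaped like the ones in the session above.
cookies = [
    "test_cookie1=foobar; expires=Fri, 02-Mar-2012 16:54:15 GMT",
    "test_cookie2=barfoo; expires=Fri, 02-Mar-2012 16:54:15 GMT",
]

# What a comma-joining API hands back to the caller.
merged = ", ".join(cookies)

# Naive attempt to undo the merge: the comma inside each date also splits.
recovered = merged.split(", ")

print(len(cookies), len(recovered))
```

Because the expires date itself contains ", ", the merged string cannot be split back into the original two values without cookie-specific parsing.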

    Side note:
    RFC 2616 is very soon to be obsoleted by the new RFC from the httpbis working group. However, in the current/newest draft (http://trac.tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-21#section-3.2), although the wording is different, the sense is the same.

    @piotrdobrogost
    Mannequin

    piotrdobrogost mannequin commented Jan 8, 2013

    ...continuing my previous comment

    Joining headers with the same name by ", " by HTTPResponse.getheaders() in Python 2.7 is wrong and there's a bug for this - see http://bugs.python.org/issue1660009

    @hltbra
    Mannequin

    hltbra mannequin commented Mar 28, 2013

    I just caught a bug because on Python 3 HTTPMessage has get_param, while on Python 2 there is getparam, with a different method signature. I am trying to figure out a solution so my code can run in both Python 2 and 3 without ifs on the Python version.
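One possible approach, sketched under assumptions: detect the method by feature instead of by interpreter version. `header_param` is a hypothetical helper, not a standard library function, and it only covers the simple case of reading one Content-Type parameter by name. The usage lines below exercise the Python 3 branch only.

```python
import email
from http.client import HTTPMessage


def header_param(message, name, default=None):
    """Return a Content-Type parameter from either flavour of HTTPMessage.

    Assumes Python 3 objects expose get_param(name) and Python 2 objects
    expose getparam(name); this helper is hypothetical, not a stdlib API.
    """
    if hasattr(message, "get_param"):
        value = message.get_param(name)   # email.message.Message style
    else:
        value = message.getparam(name)    # mimetools.Message style
    return default if value is None else value


# Hypothetical headers; a real message would come from urlopen(...).info().
msg = email.message_from_string(
    "Content-Type: text/html; charset=ISO-8859-1\n\n",
    _class=HTTPMessage,
)

charset = header_param(msg, "charset")
absent = header_param(msg, "boundary", default="")
```

Note that `get_param` returns the parameter value as written in the header, so callers that compare charsets may want to lowercase the result.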

    @vadmium
    Member

    vadmium commented Feb 9, 2015

    Jeremy’s patch appears to have been merged in revision 9eceb618274a. A documentation entry for the HTTPMessage class was also added in 2009, pointing back to email.message.Message. So is there anything left to do for this issue?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022