Author martin.panter
Recipients barry, brokenenglish, martin.panter, r.david.murray
Date 2017-01-23.23:56:05
Message-id <1485215765.76.0.806298331348.issue29353@psf.upfronthosting.co.za>
Content
There is an inconsistency when parsing with headersonly=True. According to the documentation, get_payload() on a message/rfc822 message should return a list of Message objects, not a string. But using headersonly=True produces a non-multipart Message object whose payload is a string:

>>> m = Parser().parsestr("Content-Type: message/rfc822\r\n\r\n", headersonly=True)
>>> m.get_content_type()
'message/rfc822'
>>> m.is_multipart()  # Doc says True
False
>>> m.get_payload()  # Doc says list of Message objects
''

Related to this, setting headersonly=True can also cause an internal inconsistency. Maybe this is why it was called a “hack”:

>>> Parser().parsestr("Content-Type: message/delivery-status\r\nInvalid line\r\n\r\n", headersonly=True).as_string()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/email/message.py", line 159, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
    self._write(msg)
  File "/usr/lib/python3.5/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.5/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.5/email/generator.py", line 331, in _handle_message_delivery_status
    g.flatten(part, unixfrom=False, linesep=self._NL)
  File "/usr/lib/python3.5/email/generator.py", line 106, in flatten
    old_msg_policy = msg.policy
AttributeError: 'str' object has no attribute 'policy'
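
The traceback is consistent with the generator iterating over a payload that is a plain string instead of a list of Message objects. A minimal sketch to inspect this, using only the input from the example above (the variable names are mine, chosen for illustration):

```python
from email.parser import Parser

text = "Content-Type: message/delivery-status\r\nInvalid line\r\n\r\n"
m = Parser().parsestr(text, headersonly=True)

payload = m.get_payload()
# With headersonly=True the body is stored unparsed, as a str
payload_is_str = isinstance(payload, str)

# Iterating over a str yields one-character strings, which have no
# .policy attribute, unlike the Message parts the generator expects
parts = list(payload)
```

If that reading is right, any code that loops over get_payload() for a message/* type needs to allow for the string case when headersonly=True was used.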

I think it may be best to change get_payload() to return a string only in the next Python version (3.7), with appropriate documentation updates. For existing Python versions, perhaps urllib3 could check whether the list returned by get_payload() contains only trivial empty Message objects (no header fields and only empty payloads themselves).
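
A rough sketch of the kind of check I have in mind for existing versions. The helper name is_trivial_payload is hypothetical, not something urllib3 or the email package provides, and it assumes the payload is either None, a string, or a list of Message objects:

```python
from email.message import Message
from email.parser import Parser


def is_trivial_payload(payload):
    """Return True if the payload carries no data at all.

    Hypothetical helper: accepts None, an empty string, or a list
    containing only Message objects that have no header fields and
    whose own payloads are trivial too.
    """
    if payload is None:
        return True
    if isinstance(payload, str):
        return payload == ""
    return all(
        isinstance(part, Message)
        and not part.keys()  # no header fields
        and is_trivial_payload(part.get_payload())  # only empty payloads
        for part in payload
    )


# Full parse: payload is a list holding one empty Message
full = Parser().parsestr("Content-Type: message/rfc822\r\n\r\n")
# Headers-only parse: payload is an empty string
hdrs = Parser().parsestr("Content-Type: message/rfc822\r\n\r\n", headersonly=True)
# A nested message with a real header field is not trivial
nested = Parser().parsestr(
    "Content-Type: message/rfc822\r\n\r\nSubject: x\r\n\r\nbody\r\n"
)
```

Accepting both the string and the list form means the same check would keep working if get_payload() is later changed to return a string.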

If we agree that only a feature change for 3.7 is appropriate, there are other problems with the current parsing of HTTP headers that could also be looked at:

* Only a blank line should end a header section (Issue 24363, Issue 26686)
* “From” line should be a defect
* Use “email” package’s HTTP parsing policy
* Don’t assume Latin-1 encoding (Issue 27716)
* Avoid double-handling (header lines are parsed in http.client, then joined together and parsed again in email.feedparser)