There is an inconsistency when parsing with headersonly=True. According to the documentation, get_payload() on a message/rfc822 message should return a list of Message objects, not a string. But parsing with headersonly=True produces a non-multipart Message object:
>>> m = Parser().parsestr("Content-Type: message/rfc822\r\n\r\n", headersonly=True)
>>> m.get_content_type()
'message/rfc822'
>>> m.is_multipart() # Doc says True
False
>>> m.get_payload() # Doc says list of Message objects
''
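For comparison, here is a minimal sketch that parses the same input with and without headersonly=True. It assumes the default (compat32) policy; the variable names are only illustrative.

```python
from email.parser import Parser

raw = "Content-Type: message/rfc822\r\n\r\n"

# Default parsing: the (empty) body is parsed as a nested message.
full = Parser().parsestr(raw)
# headersonly=True: the body is stored unparsed, as a string.
brief = Parser().parsestr(raw, headersonly=True)

for label, msg in (("default", full), ("headersonly", brief)):
    payload = msg.get_payload()
    print(label, msg.is_multipart(), type(payload).__name__)
```

With default parsing the payload should be a list holding one empty Message, which is what the documentation describes; with headersonly=True it is an empty string.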
Related to this, setting headersonly=True can also cause an internal inconsistency. Maybe this is why it was called a “hack”:
>>> Parser().parsestr("Content-Type: message/delivery-status\r\nInvalid line\r\n\r\n", headersonly=True).as_string()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/email/message.py", line 159, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
    self._write(msg)
  File "/usr/lib/python3.5/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.5/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.5/email/generator.py", line 331, in _handle_message_delivery_status
    g.flatten(part, unixfrom=False, linesep=self._NL)
  File "/usr/lib/python3.5/email/generator.py", line 106, in flatten
    old_msg_policy = msg.policy
AttributeError: 'str' object has no attribute 'policy'
I think it may be best to change get_payload() to return a string only in the next Python version (3.7), with appropriate documentation updates. For existing Python versions, perhaps urllib3 could check whether the list returned by get_payload() contains only trivial empty Message objects (no header fields, and only empty payloads themselves).
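A rough sketch of what such a check might look like follows. The helper name has_only_trivial_payload is hypothetical; it is not part of urllib3 or the email package, and it assumes the default (compat32) policy.

```python
from email.parser import Parser


def has_only_trivial_payload(msg):
    """Hypothetical helper: True if msg's payload carries no real content."""
    payload = msg.get_payload()
    # A missing or string payload is trivial only when it is empty.
    if payload is None or isinstance(payload, str):
        return not payload
    # A list payload is trivial when every nested message has no header
    # fields and a trivial payload of its own.
    return all(
        not part.keys() and has_only_trivial_payload(part)
        for part in payload
    )


empty = Parser().parsestr("Content-Type: message/rfc822\r\n\r\n")
nested = Parser().parsestr(
    "Content-Type: message/rfc822\r\n\r\nSubject: x\r\n\r\nbody")
plain = Parser().parsestr("Content-Type: text/plain\r\n\r\nhello")

print(has_only_trivial_payload(empty))
print(has_only_trivial_payload(nested))
print(has_only_trivial_payload(plain))
```

Only the first message should count as trivial: the second has a nested message with a header field, and the third has a non-empty string payload.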
If we agree that only a feature change for 3.7 is appropriate, there are other problems with the current parsing of HTTP headers that could also be looked at:
* Only a blank line should end a header section (Issue 24363, Issue 26686)
* “From” line should be a defect
* Use “email” package’s HTTP parsing policy
* Don’t assume Latin-1 encoding (Issue 27716)
* Avoid double-handling (header lines are parsed in http.client, then joined together and parsed again in email.feedparser)
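To illustrate the first point, here is a small sketch of how the parser currently ends the header section at the first line that does not look like a header field, rather than at a blank line. The field names A and B are placeholders, and the default (compat32) policy is assumed.

```python
from email import errors
from email.parser import Parser

# "Invalid line" is not a header field and is not a blank line either.
raw = "A: 1\r\nInvalid line\r\nB: 2\r\n\r\n"
msg = Parser().parsestr(raw, headersonly=True)

header_names = msg.keys()
defect_types = [type(d) for d in msg.defects]
payload = msg.get_payload()

print(header_names, [t.__name__ for t in defect_types])
print(repr(payload))
```

The parser should stop at the invalid line and record a MissingHeaderBodySeparatorDefect, so the B field ends up in the payload instead of among the header fields.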