classification
Title: email.decode_header fails if msg.__getitem__ returns Header object
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: r.david.murray Nosy List: python-dev, r.david.murray, sdaoden
Priority: normal Keywords: patch

Created on 2011-03-17 11:53 by sdaoden, last changed 2011-06-18 17:51 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
rdm-postman.tbz sdaoden, 2011-03-17 13:52
decode_Header.patch r.david.murray, 2011-03-25 15:04
Messages (19)
msg131242 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-17 11:53
My minimal failing test case dragged yet another EMAIL
error to the light!!!
Man, man, man - it's really great that QNX provides funding
so that you have the time to fix this broken thing!
It got washed away, but
http://bugs.python.org/file21210/email_header.diff
can be used on top of 42cd61b96e54 to fix the following:

______
Traceback (most recent call last):
  [FOREIGN CODE]
  File "/Users/steffen/usr/bin/s-postman.py", line 1765, in _bewitch_msg
    self._msg[n] = email.header.make_header(email.header.decode_header(b))
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 73, in decode_header
    if not ecre.search(header):
Exception: TypeError: expected string or buffer
______


However, if that's patched in, we end up here

______
Traceback (most recent call last):
  [FOREIGN CODE]
  File "/Users/steffen/usr/bin/s-postman.py", line 1765, in _bewitch_msg
    self._msg[n] = email.header.make_header(email.header.decode_header(b))
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 154, in make_header
    h.append(s, charset)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 278, in append
    s.encode(output_charset, errors)
Exception: UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 7: ordinal not in range(128)
______


Let me know if you want that '456943 17 Mar 12:51 rdm-postman.tbz'
thing, it's waiting for you.
It contains a digest mbox, a config and the patched S-Postman.
Maybe i can strip it to 420000 if i spend some more time on it.
msg131244 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-17 13:03
I don't see a test case here, did you forget to attach something?

Also, in this:

  self._msg[n] = email.header.make_header(email.header.decode_header(b))

unless 'b' is an ASCII-only string, it isn't going to work.
msg131249 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-17 13:52
On Thu, Mar 17, 2011 at 01:03:43PM +0000, R. David Murray wrote:
> did you forget to attach something?

I'll attach that TBZ archive you may use freely. 
It's very large, but it contains everything i have to state to EMAIL. 
It's the requested test. 
Unpack it, cd to the created rdm-postman directory, 
and run runit.sh from *within there*. 
You may want to remove the '-VV' option to s-postman.py, it 
produces a huge amount of output on STDERR...

The contained s-postman.py is patched; search 'DAVID PATCH' - 
remove that patch and see your code handling mail gracefully. 
(The only places which are of interest are indeed 'class 
Ticket._bewitch_msg()' and 'class Splitter._text_digest()', and in 
the end, but i also have a job to do ...)

> Also, in this:
> 
>   self._msg[n] = email.header.make_header(email.header.decode_header(b))
> 
> unless 'b' is an ASCII-only string, it isn't going to work.

David, i have no idea what it is, except that it is part of 
a tuple which has been returned by Message.items(). 
This is your side of the story.
msg131255 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-17 14:38
If the message contains 8bit bytes in a header, then getitem is going to return a Header object.  decode_header does not operate on Header objects, as you have observed.  Thinking about it some more, having decode_header operate on a Header and return its chunks is a decent binary interface for Header.  This is right on the borderline between a feature and a bug fix, but given that getitem returning Header after a parse is a new feature in 3.2, I think I'm going to treat it as a bug that decode_header doesn't handle that case.
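The behavior described in this message can be reproduced with a short script. This is a minimal sketch, assuming a Python version that already contains the fix from this issue and the default compat32 policy; the message bytes and the `raw`, `value`, and `chunks` names are made up for illustration.

```python
import email
from email.header import Header, decode_header

# Hypothetical message: the Subject header carries a raw 8-bit byte (0xE9)
# that is not wrapped in an RFC 2047 encoded word.
raw = b"Subject: caf\xe9 menu\n\nbody\n"
msg = email.message_from_bytes(raw)

value = msg["Subject"]
# With the default (compat32) policy, __getitem__ hands back a Header
# object rather than a str for a header like this one.
print(type(value).__name__)

# On fixed versions decode_header accepts the Header and returns its
# chunks as (bytes, charset name) pairs.
chunks = decode_header(value)
print(chunks)
```

On versions without the fix, the `decode_header(value)` call is where the `TypeError` from the first traceback would be raised.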
msg132088 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-25 13:51
Thinking about this some more, I now think it is incorrect that an 8bit header causes getitem to return a Header object.  I think instead it should be returning the stringified version of the header, including the unknown-8bit encoding.  That way decode_header can be used as normal to recover the original bytes.

I hate making a change like this in a bug fix release, but in this case I think it is the right thing to do.  I will propose a patch soon.
msg132091 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-25 14:23
On Fri, Mar 25, 2011 at 01:51:46PM +0000, R. David Murray wrote:
> I now think it is incorrect that an 8bit header causes getitem 
> to return a Header object.
> I think instead it should be returning the stringified version 
> of the header, including the unknown-8bit encoding.

It seems to be much better to either return only strings or only 
objects. 
You've prominently documented that (in a model generated from 
bytes) objects are returned, and *DesignThoughts* states that 
*all* headers will be represented as objects in the upcoming 
package, so it's my guess that many people who are currently 
programming email things using Py3K go for Header. 

Thus, to give you a neat NONONO - why not simply allow Header 
objects for decode_header(), too?
:)
msg132093 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-25 14:29
I documented that?  Where?

It is true that the fact that all headers will be objects when using the email6 API was one reason I did it this way, but in hindsight I don't think it was the right choice.  However, I/we may now be stuck with it, in which case you are right, we could make decode_header just return the chunks from a Header object.
msg132095 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-25 14:42
On Fri, Mar 25, 2011 at 02:29:24PM +0000, R. David Murray wrote:
> I documented that?  Where?

Changeset:   67447:cad1811d9e13
user:        R. David Murray <rdmurray@bitdance.com>
date:        Fri Jan 07 23:25:30 2011 +0000
summary:     #10686: recode non-ASCII headers to 'unknown-8bit' instead of ?s.

It was a Friday.
So, have a nice weekend.
:)
msg132097 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-25 14:52
Heh.

OK, so I think we're stuck with it, then.  It does mean I don't have to handle certain other edge cases, and can punt more convenient handling of them into email6.  I'll make the patch for decode_header instead, then.
msg132099 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-25 15:04
OK, here is the patch.
msg132101 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-25 15:25
On Fri, Mar 25, 2011 at 03:04:29PM +0000, R. David Murray wrote:
> OK, here is the patch.

Works fine for me at first glance - i see you didn't get stuck :/. 
Say, though not belonging here, can you think of problems incurred 
in message.py:_sanitize_headers() due to

   if _has_surrogates(value):

instead of:

   if _has_surrogates_or_8bit(value):

I stumbled over that place at some point, but fixing it was no real 
help.
msg132109 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-25 16:24
Theoretically there should be no way to get bytes into that code path.  I'm sure there's a way if you try hard enough (I haven't tried directly assigning a byte string as a header value, for example), but they would be broken uses of the API.  If you have an identifiable bug, though, open an issue.
msg132141 - (view) Author: Roundup Robot (python-dev) Date: 2011-03-25 19:32
New changeset b21fdfa0019c by R David Murray in branch '3.2':
#11584: Since __getitem__ returns headers, make decode_header handle them.
http://hg.python.org/cpython/rev/b21fdfa0019c

New changeset 12e39cd7a0e4 by R David Murray in branch 'default':
Merge #11584: Since __getitem__ returns headers, make decode_header handle them.
http://hg.python.org/cpython/rev/12e39cd7a0e4
msg134218 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-21 15:08
My fix (and the tests) for this are wrong.  decode_header returns (binary, charset) pairs, but the chunks list is (string, charset) pairs.
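The (binary, charset) contract mentioned here is easiest to see on an ordinary string header. A minimal sketch, assuming a made-up encoded word as input; `encoded`, `pairs`, and `text` are illustrative names only.

```python
from email.header import decode_header

# An RFC 2047 encoded word: UTF-8 charset, quoted-printable ("q") encoding.
encoded = "=?utf-8?q?caf=C3=A9?="

# decode_header undoes only the transfer encoding; each pair holds the raw
# bytes plus the charset name needed to interpret them.
pairs = decode_header(encoded)
print(pairs)

# Turning the bytes into text is left to the caller.
text = "".join(data.decode(charset) for data, charset in pairs)
print(text)
```

Returning a Header's internal (string, charset) chunks unchanged would break this contract, because callers that expect bytes would receive str instead.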
msg134245 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-21 20:07
Note that when this is fixed, make_header on the return value from decode_header will fail because it doesn't know how to handle unknown-8bit.
msg138589 - (view) Author: Roundup Robot (python-dev) Date: 2011-06-18 16:34
New changeset d62e5682a8ac by R David Murray in branch '3.2':
#11584: make decode_header handle Header objects correctly
http://hg.python.org/cpython/rev/d62e5682a8ac

New changeset ce033d252a6d by R David Murray in branch 'default':
merge #11584: make decode_header handle Header objects correctly
http://hg.python.org/cpython/rev/ce033d252a6d
msg138590 - (view) Author: Roundup Robot (python-dev) Date: 2011-06-18 17:03
New changeset 3875ccea6367 by R David Murray in branch '3.2':
#11584: make Header and make_header handle binary unknown-8bit input
http://hg.python.org/cpython/rev/3875ccea6367

New changeset 9569d8c4c781 by R David Murray in branch 'default':
merge #11584: make Header and make_header handle binary unknown-8bit input
http://hg.python.org/cpython/rev/9569d8c4c781
msg138591 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-06-18 17:04
OK, the invariant make_header(decode_header(x)) == x should once again work for anything returned by __getitem__.
msg138592 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-06-18 17:51
Heh, I misstated that invariant, it's only true when x is a Header.
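The round trip can be checked with a few lines. This is a sketch under the same assumptions as the rest of the thread (a Python version with these changesets applied, default compat32 policy); the sample bytes and variable names are hypothetical.

```python
import email
from email.header import Header, decode_header, make_header

# Hypothetical message with a raw 8-bit byte in the Subject header, so
# that __getitem__ returns a Header object.
raw = b"Subject: caf\xe9 menu\n\nbody\n"
original = email.message_from_bytes(raw)["Subject"]

# Round trip: Header -> (bytes, charset) pairs -> Header.
rebuilt = make_header(decode_header(original))
print(rebuilt == original)
print(decode_header(rebuilt) == decode_header(original))

# For a plain str the result is a Header, not a str, so the round trip
# does not return an object of the same type.
plain = "plain ascii subject"
print(type(make_header(decode_header(plain))).__name__)
```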
History
Date User Action Args
2011-06-18 17:51:09 r.david.murray set messages: + msg138592
2011-06-18 17:04:26 r.david.murray set status: open -> closed; messages: + msg138591; stage: needs patch -> resolved
2011-06-18 17:03:00 python-dev set messages: + msg138590
2011-06-18 16:34:17 python-dev set messages: + msg138589
2011-04-21 20:07:27 r.david.murray set messages: + msg134245
2011-04-21 15:08:40 r.david.murray set status: closed -> open; messages: + msg134218; stage: resolved -> needs patch
2011-03-25 19:34:55 r.david.murray set status: open -> closed; resolution: fixed; stage: needs patch -> resolved
2011-03-25 19:32:57 python-dev set nosy: + python-dev; messages: + msg132141
2011-03-25 16:24:04 r.david.murray set messages: + msg132109
2011-03-25 15:25:02 sdaoden set messages: + msg132101
2011-03-25 15:04:26 r.david.murray set files: + decode_Header.patch; keywords: + patch; messages: + msg132099
2011-03-25 14:52:29 r.david.murray set messages: + msg132097
2011-03-25 14:42:57 sdaoden set messages: + msg132095
2011-03-25 14:29:24 r.david.murray set messages: + msg132093
2011-03-25 14:23:13 sdaoden set messages: + msg132091; title: email.decode_header fails if msg.__getitem__ returns Header object -> email.decode_header fails if msg.__getitem__ returns Header object
2011-03-25 13:51:45 r.david.murray set messages: + msg132088
2011-03-17 14:38:09 r.david.murray set title: email/header.py: missing str()ification, and bogus encode()s -> email.decode_header fails if msg.__getitem__ returns Header object; messages: + msg131255; stage: test needed -> needs patch
2011-03-17 13:52:45 sdaoden set files: + rdm-postman.tbz; messages: + msg131249
2011-03-17 13:04:23 r.david.murray set assignee: r.david.murray; versions: + Python 3.2; type: behavior; stage: test needed
2011-03-17 13:03:42 r.david.murray set messages: + msg131244
2011-03-17 11:53:41 sdaoden create