msg131242 - (view) |
Author: Steffen Daode Nurpmeso (sdaoden) |
Date: 2011-03-17 11:53 |
My minimal failing test case dragged yet another EMAIL
error to light!!!
Man, man, man - it's really great that QNX provides funding
so that you have the time to fix this broken thing!
It got washed away, but
http://bugs.python.org/file21210/email_header.diff
can be used on top of 42cd61b96e54 to fix the following:
______
Traceback (most recent call last):
[FOREIGN CODE]
File "/Users/steffen/usr/bin/s-postman.py", line 1765, in _bewitch_msg
self._msg[n] = email.header.make_header(email.header.decode_header(b))
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 73, in decode_header
if not ecre.search(header):
Exception: TypeError: expected string or buffer
______
However, if that's patched in, we end up here
______
Traceback (most recent call last):
[FOREIGN CODE]
File "/Users/steffen/usr/bin/s-postman.py", line 1765, in _bewitch_msg
self._msg[n] = email.header.make_header(email.header.decode_header(b))
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 154, in make_header
h.append(s, charset)
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 278, in append
s.encode(output_charset, errors)
Exception: UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 7: ordinal not in range(128)
______
Let me know if you want that '456943 17 Mar 12:51 rdm-postman.tbz'
thing, it's waiting for you.
It contains a digest mbox, a config and the patched S-Postman.
Maybe i can strip it to 420000 if i spend some more time on it.
|
msg131244 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-03-17 13:03 |
I don't see a test case here, did you forget to attach something?
Also, in this:
self._msg[n] = email.header.make_header(email.header.decode_header(b))
unless 'b' is an ASCII-only string, it isn't going to work.
|
msg131249 - (view) |
Author: Steffen Daode Nurpmeso (sdaoden) |
Date: 2011-03-17 13:52 |
On Thu, Mar 17, 2011 at 01:03:43PM +0000, R. David Murray wrote:
> did you forget to attach something?
I'll attach that TBZ archive you may use freely.
It's very large, but it contains everything i have to state to EMAIL.
It's the requested test.
Unpack it, cd to the created rdm-postman directory,
and run runit.sh from *within there*.
You may want to remove the '-VV' option to s-postman.py, it
produces a huge amount of output on STDERR...
The contained s-postman.py is patched; search 'DAVID PATCH' -
remove that patch and see your code handling mail gracefully.
(The only places which are of interest are indeed 'class
Ticket._bewitch_msg()' and 'class Splitter._text_digest()', and in
the end, but i also have a job to do ...)
> Also, in this:
>
> self._msg[n] = email.header.make_header(email.header.decode_header(b))
>
> unless 'b' is an ASCII-only string, it isn't going to work.
David, i have no idea what it is, except that it is part of
a tuple which has been returned by Message.items().
This is your side of the story.
|
msg131255 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-03-17 14:38 |
If the message contains 8bit bytes in a header, then getitem is going to return a Header object. decode_header does not operate on Header objects, as you have observed. Thinking about it some more, having decode_header operate on a Header and return its chunks is a decent binary interface for Header. This is right on the borderline between a feature and a bug fix, but given that getitem returning Header after a parse is a new feature in 3.2, I think I'm going to treat it as a bug that decode_header doesn't handle that case.
|
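The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual test case: the raw message and its `Subject` value are made up, and it assumes a Python version that includes the fix from this issue and the default (compat32) policy.

```python
import email
from email.header import Header, decode_header

# Hypothetical raw message: the Subject header carries a raw 0xE9 byte
# that is not wrapped in an RFC 2047 encoded word.
raw = b"Subject: caf\xe9\n\nbody\n"

msg = email.message_from_bytes(raw)
subject = msg["Subject"]

# With the default (compat32) policy, an 8bit header comes back from
# __getitem__ as a Header object rather than a plain str.
print(type(subject).__name__)

# With the fix applied, decode_header() accepts that Header object and
# returns its chunks as (bytes, charset) pairs.
chunks = decode_header(subject)
print(chunks)
```

On an interpreter without the fix, the `decode_header(subject)` call is where the `TypeError` from the first traceback would be expected.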
msg132088 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-03-25 13:51 |
Thinking about this some more, I now think it is incorrect that an 8bit header causes getitem to return a Header object. I think instead it should be returning the stringified version of the header, including the unknown-8bit encoding. That way decode_header can be used as normal to recover the original bytes.
I hate making a change like this in a bug fix release, but in this case I think it is the right thing to do. I will propose a patch soon.
|
msg132091 - (view) |
Author: Steffen Daode Nurpmeso (sdaoden) |
Date: 2011-03-25 14:23 |
On Fri, Mar 25, 2011 at 01:51:46PM +0000, R. David Murray wrote:
> I now think it is incorrect that an 8bit header causes getitem
> to return a Header object.
> I think instead it should be returning the stringified version
> of the header, including the unknown-8bit encoding.
It seems to be much better to either return only strings or only
objects.
You've prominently documented that (in a model generated from
bytes) objects are returned, and *DesignThoughts* states that
*all* headers will be represented as objects in the upcoming
package, so it's my guess that many people who are currently
programming email things using Py3K go for Header.
Thus, to give you a neat NONONO - why not simply allowing header
objects for decode_header(), too?
:)
|
msg132093 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-03-25 14:29 |
I documented that? Where?
It is true that the fact that all headers will be objects when using the email6 API was one reason I did it this way, but in hindsight I don't think it was the right choice. However, I/we may now be stuck with it, in which case you are right, we could make decode_header just return the chunks from a Header object.
|
msg132095 - (view) |
Author: Steffen Daode Nurpmeso (sdaoden) |
Date: 2011-03-25 14:42 |
On Fri, Mar 25, 2011 at 02:29:24PM +0000, R. David Murray wrote:
> I documented that? Where?
Changeset: 67447:cad1811d9e13
user: R. David Murray <rdmurray@bitdance.com>
date: Fri Jan 07 23:25:30 2011 +0000
summary: #10686: recode non-ASCII headers to 'unknown-8bit' instead of ?s.
It was a friday.
So, have a nice weekend.
:)
|
msg132097 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-03-25 14:52 |
Heh.
OK, so I think we're stuck with it, then. It does mean I don't have to handle certain other edge cases, and can punt more convenient handling of them into email6. I'll make the patch for decode_header instead, then.
|
msg132099 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-03-25 15:04 |
OK, here is the patch.
|
msg132101 - (view) |
Author: Steffen Daode Nurpmeso (sdaoden) |
Date: 2011-03-25 15:25 |
On Fri, Mar 25, 2011 at 03:04:29PM +0000, R. David Murray wrote:
> OK, here is the patch.
Works fine at first glance, at least for me - i see you didn't get stuck :/.
Say, though not belonging here, can you think of problems incurred
in message.py:_sanitize_headers() due to
if _has_surrogates(value):
instead of:
if _has_surrogates_or_8bit(value):
I stumbled over that place at some point, but fixing it was no real
help.
|
msg132109 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-03-25 16:24 |
Theoretically there should be no way to get bytes into that code path. I'm sure there's a way if you try hard enough (I haven't tried directly assigning a byte string as a header value, for example), but they would be broken uses of the API. If you have an identifiable bug, though, open an issue.
|
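As a rough illustration of why a surrogate check can stand in for an 8bit check here: bytes that the parser cannot decode as ASCII are carried through as lone surrogate code points via the `surrogateescape` error handler. The helper below, `has_surrogates`, is a hypothetical stand-in written for this sketch and is not the private function from `message.py`.

```python
def has_surrogates(text: str) -> bool:
    """Hypothetical helper: True if text holds lone surrogate code points."""
    try:
        # Lone surrogates cannot be encoded to UTF-8 with strict errors.
        text.encode("utf-8")
        return False
    except UnicodeEncodeError:
        return True


raw = b"caf\xe9"

# Decoding with surrogateescape smuggles the 0xE9 byte into the str
# as the lone surrogate U+DCE9.
as_text = raw.decode("ascii", "surrogateescape")

print(has_surrogates(as_text))   # text that came from 8bit bytes
print(has_surrogates("cafe"))    # ordinary ASCII text

# The original bytes can be recovered with the same error handler.
print(as_text.encode("ascii", "surrogateescape") == raw)
```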
msg132141 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-03-25 19:32 |
New changeset b21fdfa0019c by R David Murray in branch '3.2':
#11584: Since __getitem__ returns headers, make decode_header handle them.
http://hg.python.org/cpython/rev/b21fdfa0019c
New changeset 12e39cd7a0e4 by R David Murray in branch 'default':
Merge #11584: Since __getitem__ returns headers, make decode_header handle them.
http://hg.python.org/cpython/rev/12e39cd7a0e4
|
msg134218 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-04-21 15:08 |
My fix (and the tests) for this are wrong. decode_header returns (binary, charset) pairs, but the chunks list is (string, charset) pairs.
|
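To make the type mismatch concrete, here is a small sketch of what `decode_header` hands back for an encoded-word string. The sample header value is the one used in the standard library documentation. The point is only that the first element of each pair is `bytes`, which the caller still has to decode.

```python
from email.header import decode_header

# An RFC 2047 encoded word in ISO-8859-1, quoted-printable flavour.
pairs = decode_header("=?iso-8859-1?q?p=F6stal?=")

for data, charset in pairs:
    # data is bytes here, not str; charset names how to decode it.
    print(type(data).__name__, data, charset)
    text = data.decode(charset)
    print(text)
```

A `Header` object, by contrast, stores already-decoded strings internally, so returning its chunks unchanged would yield `str` where callers expect `bytes`.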
msg134245 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-04-21 20:07 |
Note that when this is fixed, make_header on the return value from decode_header will fail because it doesn't know how to handle unknown-8bit.
|
msg138589 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-06-18 16:34 |
New changeset d62e5682a8ac by R David Murray in branch '3.2':
#11584: make decode_header handle Header objects correctly
http://hg.python.org/cpython/rev/d62e5682a8ac
New changeset ce033d252a6d by R David Murray in branch 'default':
merge #11584: make decode_header handle Header objects correctly
http://hg.python.org/cpython/rev/ce033d252a6d
|
msg138590 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-06-18 17:03 |
New changeset 3875ccea6367 by R David Murray in branch '3.2':
#11584: make Header and make_header handle binary unknown-8bit input
http://hg.python.org/cpython/rev/3875ccea6367
New changeset 9569d8c4c781 by R David Murray in branch 'default':
merge #11584: make Header and make_header handle binary unknown-8bit input
http://hg.python.org/cpython/rev/9569d8c4c781
|
msg138591 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-06-18 17:04 |
OK, the invariant make_header(decode_header(x)) == x should once again work for anything returned by __getitem__.
|
msg138592 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-06-18 17:51 |
Heh, I misstated that invariant, it's only true when x is a Header.
|
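A minimal sketch of the round trip discussed above, assuming a Python version that contains the changesets from this issue. The raw message is invented for illustration, and `x` is the `Header` object that `__getitem__` returns for an 8bit header.

```python
import email
from email.header import Header, decode_header, make_header

# Hypothetical message with a raw 8bit byte in the Subject header.
raw = b"Subject: caf\xe9\n\nbody\n"

# __getitem__ returns a Header object for this 8bit header.
original = email.message_from_bytes(raw)["Subject"]

# decode_header() yields (bytes, 'unknown-8bit') pairs, and make_header()
# accepts that binary unknown-8bit input and rebuilds a Header from it.
rebuilt = make_header(decode_header(original))

print(isinstance(rebuilt, Header))
print(rebuilt == original)
```

Note that `Header` equality is defined through the string form of the header, so this comparison checks that both objects render to the same text.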
Date | User | Action | Args |
2022-04-11 14:57:14 | admin | set | github: 55793 |
2011-06-18 17:51:09 | r.david.murray | set | messages: + msg138592 |
2011-06-18 17:04:26 | r.david.murray | set | status: open -> closed; messages: + msg138591; stage: needs patch -> resolved |
2011-06-18 17:03:00 | python-dev | set | messages: + msg138590 |
2011-06-18 16:34:17 | python-dev | set | messages: + msg138589 |
2011-04-21 20:07:27 | r.david.murray | set | messages: + msg134245 |
2011-04-21 15:08:40 | r.david.murray | set | status: closed -> open; messages: + msg134218; stage: resolved -> needs patch |
2011-03-25 19:34:55 | r.david.murray | set | status: open -> closed; resolution: fixed; stage: needs patch -> resolved |
2011-03-25 19:32:57 | python-dev | set | nosy: + python-dev; messages: + msg132141 |
2011-03-25 16:24:04 | r.david.murray | set | messages: + msg132109 |
2011-03-25 15:25:02 | sdaoden | set | messages: + msg132101 |
2011-03-25 15:04:26 | r.david.murray | set | files: + decode_Header.patch; keywords: + patch; messages: + msg132099 |
2011-03-25 14:52:29 | r.david.murray | set | messages: + msg132097 |
2011-03-25 14:42:57 | sdaoden | set | messages: + msg132095 |
2011-03-25 14:29:24 | r.david.murray | set | messages: + msg132093 |
2011-03-25 14:23:13 | sdaoden | set | messages: + msg132091; title: email.decode_header fails if msg.__getitem__ returns Header object -> email.decode_header fails if msg.__getitem__ returns Header object |
2011-03-25 13:51:45 | r.david.murray | set | messages: + msg132088 |
2011-03-17 14:38:09 | r.david.murray | set | title: email/header.py: missing str()ification, and bogus encode()s -> email.decode_header fails if msg.__getitem__ returns Header object; messages: + msg131255; stage: test needed -> needs patch |
2011-03-17 13:52:45 | sdaoden | set | files: + rdm-postman.tbz; messages: + msg131249 |
2011-03-17 13:04:23 | r.david.murray | set | assignee: r.david.murray; versions: + Python 3.2; type: behavior; stage: test needed |
2011-03-17 13:03:42 | r.david.murray | set | messages: + msg131244 |
2011-03-17 11:53:41 | sdaoden | create | |