Issue11243
Created on 2011-02-18 14:53 by sdaoden, last changed 2011-03-17 01:15 by r.david.murray. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| email_message.2.patch | sdaoden, 2011-02-19 15:15 | |
| emails_with_Headers.patch | r.david.murray, 2011-03-15 04:21 | |
| email_header.diff | sdaoden, 2011-03-15 13:43 | |
| email-header.2.diff | sdaoden, 2011-03-16 15:07 | |
| 11243-test.1.py | sdaoden, 2011-03-16 15:09 | |
| Messages (24) | |||
|---|---|---|---|
| msg128782 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-18 14:53 | |
Hi David, while hacking a bit on my thing i've found two places where header.Header needs to be explicitly converted via str(). Have a nice weekend. |
|||
| msg128783 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-18 14:54 | |
(Will get that tracker right as time goes by.) |
|||
| msg128784 - (view) | Author: R. David Murray (r.david.murray) | Date: 2011-02-18 14:58 | |
Thanks for the report. I probably won't have time to look at this for a bit. |
|||
| msg128787 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-18 15:13 | |
We all know EMail 6.0 will blow them off the streets in the end. |
|||
| msg128808 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-18 19:58 | |
P.S.: maybe this completes the byte.
Have a nice weekend nevertheless - if you can.
Traceback (most recent call last):
File "/Users/steffen/usr/bin/s-postman.py", line 1419, in _walk
self._tickets.extend(Ticket.process_message(msg))
File "/Users/steffen/usr/bin/s-postman.py", line 1275, in process_message
splitter = splitter(msg)
File "/Users/steffen/usr/bin/s-postman.py", line 401, in _openbsd_text
charset = msg.get_content_charset('iso-8859-1')
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 820, in get_content_charset
charset = self.get_param('charset', missing)
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 628, in get_param
for k, v in self._get_params_preserve(failobj, header):
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 565, in _get_params_preserve
for p in _parseparam(';' + value):
Exception: TypeError: Can't convert 'Header' object to str implicitly
Traceback (most recent call last):
File "/Users/steffen/usr/bin/s-postman.py", line 1419, in _walk
self._tickets.extend(Ticket.process_message(msg))
File "/Users/steffen/usr/bin/s-postman.py", line 1275, in process_message
splitter = splitter(msg)
File "/Users/steffen/usr/bin/s-postman.py", line 402, in _openbsd_text
lines = msg.get_payload().splitlines()
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 244, in get_payload
cte = self.get('content-transfer-encoding', '').lower()
Exception: AttributeError: 'Header' object has no attribute 'lower'
|
|||
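Both tracebacks above come from library code calling str methods on a value that is actually a `header.Header`. Here is a minimal sketch of the situation and of the explicit `str()` workaround. It assumes a current Python 3, where `Message()` uses the default compat32 policy; the header name and value are purely illustrative.

```python
from email.header import Header
from email.message import Message

msg = Message()
# Store a Header object (rather than a plain str) as the header value.
msg['Content-Transfer-Encoding'] = Header('base64', 'us-ascii')

value = msg.get('content-transfer-encoding', '')

# A Header is not a str: it has no .lower(), .partition(), and so on,
# so code that needs str behaviour must convert explicitly.
cte = str(value).lower()
```

The same explicit conversion is what the attached patch adds at the places in message.py that assume a str.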
| msg128846 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-19 13:56 | |
I also got this now, it happens with and without the str() patch stuff. (Note that message.py line numbers are off by 1-2 lines ..). I don't know more about that in the moment, but the only thing that's changed is that i do:
alln = self._msg.items()[:] # In fact -> ensure all are header.Header
# If any converted (str->Header) header names exist ...
if len(alln):
    # Delete *all* occurrences of h (doesn't throw)
    for (n, b) in alln:
        del self._msg[n]
    # And append in order
    for (n, b) in alln:
        self._msg[n] = b
Traceback (most recent call last):
File "/Users/steffen/usr/bin/s-postman.py", line 953, in save_ticket
mb.add(ticket.message())
File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 595, in add
self._toc[self._next_key] = self._append_message(message)
File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 733, in _append_message
offsets = self._install_message(message)
File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 805, in _install_message
self._dump_message(message, self._file, self._mangle_from_)
File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 215, in _dump_message
gen.flatten(message)
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 88, in flatten
self._write(msg)
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 134, in _write
self._dispatch(msg)
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 151, in _dispatch
main = msg.get_content_maintype()
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 528, in get_content_maintype
ctype = self.get_content_type()
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 516, in get_content_type
ctype = _splitparam(value)[0].lower()
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 53, in _splitparam
a, sep, b = param.partition(';')
Exception: AttributeError: 'NoneType' object has no attribute 'partition'
|
|||
| msg128847 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-19 15:04 | |
The latter one was my fault, i did LIST.append(name, HEADER.append(xy)), assuming that HEADER.append() returns self though it doesn't. Sorry. However - shouldn't Message.__setitem__ check for valid arguments (see msg128846 code snippet)? It would have saved some anger... |
|||
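The pitfall described in msg128847 can be reproduced in a few lines. This is only an illustrative sketch; the names `header`, `wrong_pair` and `right_pair` are made up for the example.

```python
from email.header import Header

header = Header('first')
# append() modifies the Header in place and returns None,
# in the same way list.append() does.
result = header.append('second')

# Passing the return value along stores None instead of the Header.
wrong_pair = ('Subject', result)
# Append first, then pass the Header object itself.
right_pair = ('Subject', header)
```

Assigning the `None` from `wrong_pair` to a message header is what later surfaces as the `'NoneType' object has no attribute 'partition'` error.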
| msg128848 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-19 15:15 | |
However, maybe that 5.1 message.py thing doesn't like header.Header instances. Also extending msg128846, this one is related to the str() issue - added an extended email_message.2.patch.
Traceback (most recent call last):
File "/Users/steffen/usr/bin/s-postman.py", line 953, in save_ticket
mb.add(ticket.message())
File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 595, in add
self._toc[self._next_key] = self._append_message(message)
File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 733, in _append_message
offsets = self._install_message(message)
File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 805, in _install_message
self._dump_message(message, self._file, self._mangle_from_)
File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 215, in _dump_message
gen.flatten(message)
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 88, in flatten
self._write(msg)
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 134, in _write
self._dispatch(msg)
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 151, in _dispatch
main = msg.get_content_maintype()
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 530, in get_content_maintype
ctype = self.get_content_type()
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 518, in get_content_type
ctype = _splitparam(value)[0].lower()
File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 53, in _splitparam
a, sep, b = param.partition(';')
Exception: AttributeError: 'Header' object has no attribute 'partition'
P.S.: i'm hard to take, and 'programming is an iterative task'...
|
|||
| msg128855 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-19 16:19 | |
David, i'm going down now. I'll raise the type to 'crash', because, in fact, EMail 5.1 doesn't really take care of header.Header objects in message.Message headers, which doesn't sound pretty useful to me! The patch is sufficient for my broken thing (it doesn't produce a traceback at the moment - time to go for a sunday!), but since i don't really have a clue of mailbox.py / email/* it may not cover all places where a header.Header may occur but the code in fact assumes a str (and implicit conversion of Header to str doesn't work). |
|||
| msg128856 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-19 16:35 | |
... as a last thought of mine, here is a header of the well known spam mail:
From MAILER-DAEMON Sat Feb 19 15:58:47 2011
Date: =?latin1?q?Tue=2C_4_Jan_2011_17=3A37=3A26_+0100_=28CET=29?=
From: =?latin1?q?=22SAJATNAPTAR=2ECOM=22_=3Cinfo=40sajatnaptar=2Ecom=3E?=
To: =?latin1?q?source-changes=40cvs=2Eopenbsd=2Eorg?=
Subject: =?latin1?q?Falinapt=3Fr_ingyenes_h=3Fzhozsz=3Fll=3Ft=3Fssal=2E_M=3Fr_rendelt=3Fl=3F_Olvass_el!?=
Message-ID: =?latin1?q?=3C20110104053726system=40sajatnaptar=2Ecom=3E?=
Content-Type: =?latin1?q?text/plain=3B_charset=3Diso-8859-1?=
Shouldn't email/* be smart enough to know that Date:, Content-Type:, Message-ID:, plus some other well-known, RFC-documented header names need *not* be converted, whatever type of object is used to represent them? This could be implemented with a simple {}.
Bye.
|
|||
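The lookup suggested in msg128856 could look roughly like the sketch below, using a set rather than a dict. The set `PLAIN_STR_HEADERS` and the helper `plain_value` are hypothetical names invented for this illustration. They are not part of the email package, and the selection of header names is only an assumption.

```python
from email.header import decode_header, make_header

# Hypothetical selection of header names whose values are structured and
# should be handled as plain str; not a list taken from the email package.
PLAIN_STR_HEADERS = frozenset({
    'date', 'message-id', 'content-type',
    'content-transfer-encoding', 'mime-version',
})


def plain_value(name, value):
    """Return a decoded str for well-known structured headers.

    Values of all other headers are returned unchanged.
    """
    if name.lower() not in PLAIN_STR_HEADERS:
        return value
    # str(value) also covers the case where value is a Header object.
    return str(make_header(decode_header(str(value))))


raw = '=?latin1?q?text/plain=3B_charset=3Diso-8859-1?='
content_type = plain_value('Content-Type', raw)
subject = plain_value('Subject', '=?latin1?q?Hello?=')
```

With such a helper, the encoded-word `Content-Type` from the spam mail above would reach the parameter parsing code as an ordinary str.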
| msg128895 - (view) | Author: R. David Murray (r.david.murray) | Date: 2011-02-20 06:43 | |
Well, it's not a crash, a crash is when the interpreter segfaults. I'm not clear on why you are having problems, actually, since if you treat the messages as binary (which they are), then you shouldn't be getting Headers introduced into the mix. But like I said I don't have time to look at this right now. The problem with Header not being handled properly does need to be addressed, though. |
|||
| msg128896 - (view) | Author: R. David Murray (r.david.murray) | Date: 2011-02-20 07:13 | |
Ah, I think I see what is going on. If I'm right, then you are right, this is a serious problem for actually processing spam emails using email 5.1. Unfortunately it's too late to do anything for 3.2.0. But email 5.1 is still worlds better at handling well formed binary emails. |
|||
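The thread does not spell out what "is going on", so the following is only a sketch of one way Header objects can enter the picture even with the binary interface. It assumes the behaviour of the compat32 policy in current Python 3: a header containing raw 8-bit data comes back as a `Header`, while a pure ASCII header comes back as a str. The sample message is made up.

```python
import email
from email.header import Header

# A message with a raw, undeclared 8-bit byte in the Subject header.
raw = b"From: a@example.com\nSubject: caf\xe9\n\nhello\n"
msg = email.message_from_bytes(raw)

clean = msg['From']      # pure ASCII header: returned as str
dirty = msg['Subject']   # header with 8-bit data: returned as Header

# Code that needs str methods has to convert explicitly.
dirty_text = str(dirty)
```

Any library method that reads a header and then calls a str method on it would therefore fail on such "dirty" input unless it converts the value first.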
| msg129046 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-22 10:42 | |
(Of course you're right. It just reads, passes around and spits out that ... of a mail just the same as it came in. Performance is very good, too, just about 1.5 seconds - some two weeks ago it took about 1.1 seconds, but with Python 2.7 - so! P.S.: my very own desire was just to have a single entry point where i can drop whatever ... in and get something back which may be just as silly but at least conformant, e.g. '__setitem__[x] = ADJUST(__getitem__[x])'; imagine what a swiss ;-) would need to do to get to that point with EMail 5.1: x=header.decode_header(), if x[1] is None check whether string is ASCII clean, otherwise hard-encode with latin1/unknown 8-bit; but even if x[1] is not None the content may be malformed; and then remember that all these steps can throw exceptions, which need to be handled because the mail *will* be processed. Of course, we're talking about the header here only 8-)) |
|||
| msg129055 - (view) | Author: R. David Murray (r.david.murray) | Date: 2011-02-22 12:11 | |
We might wind up with a relatively quick 3.2.1, in which case we can get this fixed then. The parser is supposed to operate without throwing exceptions (just setting defects), so if you find a case where *parsing* throws an exception please open an issue. Once you start manipulating the model, of course, you may get exceptions. I'm not sure what should happen if, say, the charset name is invalid (8bit), but certainly throwing an error because it is a Header rather than a string is wrong and needs fixing. |
|||
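The "defects instead of exceptions" behaviour mentioned in msg129055 can be observed with a deliberately malformed message. This is a small sketch under the assumption of the default compat32 policy in current Python 3; the sample message is made up for the illustration.

```python
import email

# Claims to be multipart but declares no boundary parameter.
raw = (b"From: a@example.com\n"
       b"Content-Type: multipart/mixed\n"
       b"\n"
       b"body text\n")

# Parsing does not raise; problems are recorded on the message object.
msg = email.message_from_bytes(raw)
defect_names = [type(defect).__name__ for defect in msg.defects]
```

Inspecting `msg.defects` after parsing is then the way to find out whether the input was well formed.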
| msg130953 - (view) | Author: R. David Murray (r.david.murray) | Date: 2011-03-15 04:21 | |
Here is a patch that adds tests for the methods I didn't previously have tests for. There may still be some headers that I'm not testing for the 'contains binary' case, but this is certainly more comprehensive than we had before. Please test and let me know if it works; it should, since the code patch is very close to the one you suggested. |
|||
| msg130972 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-03-15 13:43 | |
On Tue, Mar 15, 2011 at 04:21:24AM +0000, R. David Murray wrote:
> Please test and let me know if it works; it should, since the code patch is very close to the one you suggested.
;-)
Hello David, hope you have a good time at Pycon!
(Just Googled, weather will be fine right after all of you
will see the sun and the blue sky once again!
Hey -- there is a world out there!! :)
Just like i've stated on EMAIL-SIG, you really have convinced me
of simply using the binary feedparser, but since you have found
even more places where explicit str() is necessary, this package
is once again at least 50% better than before!
But i've readded a
email.header.make_header(email.header.decode_header(b))
thing in my Ticket._bewitch_msg() and ran that patched S-Postman
;=) on a 3.8 MB mbox file (by the way, if you need f..d ..
emails, subscribe to OpenBSD Misc), and i'll end up like this:
Traceback (most recent call last):
File "/Users/steffen/usr/bin/s-postman.py", line 1815, in _walk
self._tickets.extend(Ticket.process_message(msg))
File "/Users/steffen/usr/bin/s-postman.py", line 1671, in process_message
return [Ticket(m, _targets=rsm.targets) for m in splitter]
File "/Users/steffen/usr/bin/s-postman.py", line 1671, in <listcomp>
return [Ticket(m, _targets=rsm.targets) for m in splitter]
File "/Users/steffen/usr/bin/s-postman.py", line 1681, in __init__
self._bewitch_msg()
File "/Users/steffen/usr/bin/s-postman.py", line 1752, in _bewitch_msg
self._msg[n] = email.header.make_header(email.header.decode_header(b))
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 73, in decode_header
if not ecre.search(header):
Exception: TypeError: expected string or buffer
Here: header==<class 'email.header.Header'>
And:
Traceback (most recent call last):
File "/Users/steffen/usr/bin/s-postman.py", line 1815, in _walk
self._tickets.extend(Ticket.process_message(msg))
File "/Users/steffen/usr/bin/s-postman.py", line 1671, in process_message
return [Ticket(m, _targets=rsm.targets) for m in splitter]
File "/Users/steffen/usr/bin/s-postman.py", line 1671, in <listcomp>
return [Ticket(m, _targets=rsm.targets) for m in splitter]
File "/Users/steffen/usr/bin/s-postman.py", line 1681, in __init__
self._bewitch_msg()
File "/Users/steffen/usr/bin/s-postman.py", line 1752, in _bewitch_msg
self._msg[n] = email.header.make_header(email.header.decode_header(b))
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 154, in make_header
h.append(s, charset)
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 270, in append
s = s.decode(input_charset, errors)
Exception: AttributeError: 'Header' object has no attribute 'decode'
Here s==<class 'email.header.Header'>
And after adding
# Steffen is out now
if isinstance(s, email.header.Header):
    s = str(s)
i got stuck on this:
Traceback (raising call only):
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 278, in append
s.encode(output_charset, errors)
Exception: UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 7: ordinal not in range(128)
{Aaaargh! Special case UNICODE replacement character, mongrel!}
s was a Header here, too.
I apply a simple email_header.diff which applies cleanly to a49bda.
Hope i could help a bit.
|
|||
| msg131102 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-03-16 11:47 | |
On Tue, Mar 15, 2011 at 04:21:24AM +0000, R. David Murray wrote:
> Please test and let me know if it works
Spending some more time on that, continuing yesterday's session
where i got stuck. When i instead do (still in
header.py:Header.append()):
# Steffen is out now again
if isinstance(s, Header):
    s = str(s)
    errors = 'replace'
Traceback (most recent call last):
File "/Users/steffen/usr/bin/s-postman.py", line 1212, in save_ticket
mb.add(ticket.message())
File "/Users/steffen/usr/opt/py3k/lib/python3.3/mailbox.py", line 279, in add
self._dump_message(message, tmp_file)
File "/Users/steffen/usr/opt/py3k/lib/python3.3/mailbox.py", line 215, in _dump_message
gen.flatten(message)
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 91, in flatten
self._write(msg)
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 144, in _write
self._write_headers(msg)
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 363, in _write_headers
self.write(v.encode(maxlinelen=self._maxheaderlen)+NL)
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 320, in encode
formatter.feed(lines[0], charset)
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 386, in feed
encoded_string = charset.header_encode(string)
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/charset.py", line 296, in header_encode
header_bytes = _encode(string, codec)
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/charset.py", line 163, in _encode
return string.encode(codec)
Exception: UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 7: ordinal not in range(128)
I've updated and am at db73857.
And i am *really* looking forward to 'defects'.
|
|||
| msg131115 - (view) | Author: R. David Murray (r.david.murray) | Date: 2011-03-16 13:58 | |
Steffen, these look like different kinds of errors than the one you reported in this ticket. If they are, could you open a new issue? Either way, simple reproducers would be the most helpful. |
|||
| msg131126 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-03-16 15:07 | |
On Wed, Mar 16, 2011 at 01:58:40PM +0000, R. David Murray wrote:
> Steffen, these look like different kinds of errors than the one you reported in this ticket.
> If they are, could you open a new issue? Either way, simple reproducers would be the most helpful.
I'm on db73857669fb, email/message.py is patched with your code, and email/header.py is patched with email-header.2.diff. 11243-test.1.py will traceback:
15:53 ~/tmp $ python3 11243-test.1.py
Traceback (most recent call last):
File "11243-test.1.py", line 17, in <module>
msg[f] = email.header.make_header(email.header.decode_header(b))
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 154, in make_header
h.append(s, charset)
File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 279, in append
s.encode(output_charset, errors)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 7: ordinal not in range(128)
I'll be down the next couple of hours, but in the meanwhile that's all i can do anyway...
And well, i won't open a new issue due to the stuff from msg131102, because that happen(ed) if the commented out code from email-header.2.diff is applied, which is non-real-life code-flow? (Though: a Message is happily read in via email.feedparser.BytesFeedParser() and finally adjusted via header.make_header(header.decode_header(b)) because you've asked me to do so, and just as is done by 11243-test.1.py.)
|
|||
| msg131127 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-03-16 15:09 | |
Sorry, i've forgotten the test ;) |
|||
| msg131174 - (view) | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-03-16 20:53 | |
David, it's so hard to tell! If you want a big fat thing that misuses your code, try out my S-Postman (it's on Bitbucket and no URL from me and you need tip). And i could post you a small config and the EMAIL breaking patch for it and a 430KB TBZ archive of mail digests which contain many bad mails (this is why you need the postman in the end, it'll split the archives up for you). This is something you could stick around with for a while :) If you want all or parts of that, post and you have it tomorrow. |
|||
| msg131199 - (view) | Author: R. David Murray (r.david.murray) | Date: 2011-03-17 00:17 | |
Steffen, what you are doing in 11243-test is not something that the current email package supports. String input to message_from_string must be ASCII only in email 5.1/python3.2. Likewise for decode_header. To get unicode into a header, you have to pass it in to the constructor of Header, and then it encodes it as an encoded word in whatever character set you tell it to use. The make_header(decode_header(stuff)) would theoretically return stuff, except that as you can see if stuff is non-ascii (or a Header), it won't work. If you are handling 'dirty' data, you have to stick to the binary interfaces, as discussed. Header needs a binary interface, but it doesn't have one (yet?). Yes, this interface is not an optimal interface. That's what email6 is about :) So, absent a minimal failing test case, I'm going to commit the patch. |
|||
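The supported route described in msg131199 (unicode goes in through the `Header` constructor together with a charset) can be sketched as follows. The sample text is made up, and the sketch assumes the current Python 3 email package rather than email 5.1 specifically.

```python
from email.header import Header, decode_header, make_header
from email.message import Message

# Non-ASCII text goes in through the Header constructor, with the
# charset to use for the encoded word given explicitly.
subject = Header('Café au lait', 'utf-8')

msg = Message()
msg['Subject'] = subject

# encode() yields the ASCII-only encoded-word form used on the wire.
wire_form = subject.encode()

# Decoding that encoded word gives the original text back.
round_trip = str(make_header(decode_header(wire_form)))
```

Here `make_header(decode_header(...))` works because its input is a clean, ASCII-only encoded word, which is the case the message above says is supported.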
| msg131206 - (view) | Author: Roundup Robot (python-dev) | Date: 2011-03-17 01:13 | |
New changeset 74a8c46fb272 by R David Murray in branch '3.2':
#11243: tests and fixes for handling of 'dirty data' in additional methods
http://hg.python.org/cpython/rev/74a8c46fb272
New changeset 82ecfcd31250 by R David Murray in branch 'default':
Merge #11243 fix from 3.2.
http://hg.python.org/cpython/rev/82ecfcd31250
|
|||
| msg131207 - (view) | Author: R. David Murray (r.david.murray) | Date: 2011-03-17 01:15 | |
I'm closing this issue. If you have a specific test case that is still failing, please open a new issue. And thanks for testing this fix. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2011-03-17 01:15:11 | r.david.murray | set | status: open -> closed messages: + msg131207 resolution: fixed stage: patch review -> resolved |
| 2011-03-17 01:13:17 | python-dev | set | nosy: + python-dev messages: + msg131206 |
| 2011-03-17 00:17:31 | r.david.murray | set | messages: + msg131199 |
| 2011-03-16 20:53:27 | sdaoden | set | messages: + msg131174 |
| 2011-03-16 15:09:53 | sdaoden | set | files: + 11243-test.1.py messages: + msg131127 |
| 2011-03-16 15:07:04 | sdaoden | set | files: + email-header.2.diff messages: + msg131126 |
| 2011-03-16 13:58:39 | r.david.murray | set | messages: + msg131115 |
| 2011-03-16 11:47:17 | sdaoden | set | messages: + msg131102 |
| 2011-03-15 13:43:40 | sdaoden | set | files: + email_header.diff messages: + msg130972 |
| 2011-03-15 04:21:21 | r.david.murray | set | files: + emails_with_Headers.patch messages: + msg130953 |
| 2011-02-22 12:11:14 | r.david.murray | set | messages: + msg129055 |
| 2011-02-22 10:42:45 | sdaoden | set | messages: + msg129046 |
| 2011-02-21 23:15:53 | r.david.murray | set | priority: normal -> high |
| 2011-02-20 07:13:12 | r.david.murray | set | messages: + msg128896 |
| 2011-02-20 06:43:11 | r.david.murray | set | type: crash -> behavior messages: + msg128895 |
| 2011-02-19 16:35:45 | sdaoden | set | messages: + msg128856 |
| 2011-02-19 16:19:09 | sdaoden | set | type: behavior -> crash messages: + msg128855 |
| 2011-02-19 16:02:03 | sdaoden | set | files: - email_message.patch |
| 2011-02-19 15:15:03 | sdaoden | set | files: + email_message.2.patch messages: + msg128848 |
| 2011-02-19 15:04:37 | sdaoden | set | messages: + msg128847 |
| 2011-02-19 13:56:07 | sdaoden | set | messages: + msg128846 |
| 2011-02-18 19:58:07 | sdaoden | set | messages: + msg128808 |
| 2011-02-18 15:13:35 | sdaoden | set | messages: + msg128787 components: + Library (Lib) |
| 2011-02-18 14:58:58 | r.david.murray | set | versions: + Python 3.3 title: email/message.py str conversion [patch] -> email/message.py str conversion messages: + msg128784 assignee: r.david.murray stage: patch review |
| 2011-02-18 14:54:23 | sdaoden | set | type: behavior messages: + msg128783 versions: + Python 3.2 |
| 2011-02-18 14:53:19 | sdaoden | create | |
