Issue 11243: email/message.py str conversion
This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2011-02-18 14:53 by sdaoden, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files
| File name | Uploaded |
|---|---|
| email_message.2.patch | sdaoden, 2011-02-19 15:15 |
| emails_with_Headers.patch | r.david.murray, 2011-03-15 04:21 |
| email_header.diff | sdaoden, 2011-03-15 13:43 |
| email-header.2.diff | sdaoden, 2011-03-16 15:07 |
| 11243-test.1.py | sdaoden, 2011-03-16 15:09 |
Messages (24)
msg128782 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-18 14:53
Hi David, while hacking a bit on my thing I've found two places where header.Header needs to be explicitly converted via str(). Have a nice weekend.
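The underlying problem is that str concatenation never calls str() on a Header for you. A minimal sketch, assuming a current Python 3 email package; the header value and variable names are illustrative only:

```python
from email.header import Header

# An ASCII-only header value wrapped in a Header object.
h = Header('text/plain; charset=us-ascii')

# Implicit conversion: str + Header is not supported and raises TypeError
# (the same kind of expression as _parseparam(';' + value) in message.py).
try:
    joined = ';' + h
except TypeError:
    joined = None

# Explicit conversion with str() works.
explicit = ';' + str(h)
print(explicit)  # ;text/plain; charset=us-ascii
```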

msg128783 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-18 14:54
(I'll get the hang of this tracker as time goes by.)

msg128784 | Author: R. David Murray (r.david.murray) * | Date: 2011-02-18 14:58
Thanks for the report. I probably won't have time to look at this for a bit.

msg128787 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-18 15:13
We all know EMail 6.0 will blow them off the streets in the end.

msg128808 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-18 19:58
P.S.: Maybe this completes the byte. Have a nice weekend nevertheless, if you can.

Traceback (most recent call last):
  File "/Users/steffen/usr/bin/s-postman.py", line 1419, in _walk
    self._tickets.extend(Ticket.process_message(msg))
  File "/Users/steffen/usr/bin/s-postman.py", line 1275, in process_message
    splitter = splitter(msg)
  File "/Users/steffen/usr/bin/s-postman.py", line 401, in _openbsd_text
    charset = msg.get_content_charset('iso-8859-1')
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 820, in get_content_charset
    charset = self.get_param('charset', missing)
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 628, in get_param
    for k, v in self._get_params_preserve(failobj, header):
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 565, in _get_params_preserve
    for p in _parseparam(';' + value):
Exception: TypeError: Can't convert 'Header' object to str implicitly

Traceback (most recent call last):
  File "/Users/steffen/usr/bin/s-postman.py", line 1419, in _walk
    self._tickets.extend(Ticket.process_message(msg))
  File "/Users/steffen/usr/bin/s-postman.py", line 1275, in process_message
    splitter = splitter(msg)
  File "/Users/steffen/usr/bin/s-postman.py", line 402, in _openbsd_text
    lines = msg.get_payload().splitlines()
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 244, in get_payload
    cte = self.get('content-transfer-encoding', '').lower()
Exception: AttributeError: 'Header' object has no attribute 'lower'
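Both tracebacks reduce to a message whose Content-Type and Content-Transfer-Encoding values are Header objects rather than strings. A minimal reproducer sketch, assuming the default compat32 policy and a made-up payload: in 3.2.0 these two calls raised the errors shown above, while on a Python that includes the fix committed for this issue (any current Python 3 should) they are expected to return normally.

```python
from email.header import Header
from email.message import Message

msg = Message()
# Store Header objects instead of plain strings as header values.
msg['Content-Type'] = Header('text/plain; charset=iso-8859-1')
msg['Content-Transfer-Encoding'] = Header('7bit')
msg.set_payload('hello\n')

# get_content_charset() parses the Content-Type parameters, and
# get_payload() lower-cases the Content-Transfer-Encoding value.
charset = msg.get_content_charset('us-ascii')
payload = msg.get_payload()
print(charset, repr(payload))  # iso-8859-1 'hello\n'
```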

msg128846 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-19 13:56
I also got this now; it happens with and without the str() patch. (Note that the message.py line numbers are off by 1-2 lines.) I don't know more about it at the moment, but the only thing that has changed is that I do:

    alln = self._msg.items()[:]
    # In fact -> ensure all are header.Header
    # If any converted (str->Header) header names exist ...
    if len(alln):
        # Delete *all* occurrences of h (doesn't throw)
        for (n, b) in alln:
            del self._msg[n]
        # And append in order
        for (n, b) in alln:
            self._msg[n] = b

Traceback (most recent call last):
  File "/Users/steffen/usr/bin/s-postman.py", line 953, in save_ticket
    mb.add(ticket.message())
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 595, in add
    self._toc[self._next_key] = self._append_message(message)
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 733, in _append_message
    offsets = self._install_message(message)
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 805, in _install_message
    self._dump_message(message, self._file, self._mangle_from_)
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 215, in _dump_message
    gen.flatten(message)
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 88, in flatten
    self._write(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 134, in _write
    self._dispatch(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 151, in _dispatch
    main = msg.get_content_maintype()
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 528, in get_content_maintype
    ctype = self.get_content_type()
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 516, in get_content_type
    ctype = _splitparam(value)[0].lower()
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 53, in _splitparam
    a, sep, b = param.partition(';')
Exception: AttributeError: 'NoneType' object has no attribute 'partition'
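The header-rewriting loop above relies on two documented Message behaviours: items() returns the (name, value) pairs in their original order, and `del msg[name]` removes every occurrence of a header without raising if none is present. A small runnable version of that loop, with a made-up message; wrapping each value in a Header is only a stand-in for whatever conversion the real script performs:

```python
import email
from email.header import Header

msg = email.message_from_string(
    'Received: a\nSubject: hi\nReceived: b\n\nbody\n')

pairs = msg.items()          # list snapshot of (name, value) pairs, in order
for name, _ in pairs:
    del msg[name]            # deletes all occurrences; no error if absent
for name, value in pairs:
    # Hypothetical conversion step: re-add every value as a Header object.
    msg[name] = Header(value)

print(msg.keys())  # ['Received', 'Subject', 'Received']
```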

msg128847 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-19 15:04
The latter one was my fault: I did LIST.append(name, HEADER.append(xy)), assuming that HEADER.append() returns self, though it doesn't. Sorry. However, shouldn't Message.__setitem__ check for valid arguments (see the code snippet in msg128846)? It would have saved some anger...
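Header.append() mutates the Header in place and returns None, which is how a None value ended up in the message. A minimal sketch of that pitfall, together with the kind of argument check being asked for; `set_header_checked()` is a hypothetical helper, not part of the email package:

```python
from email.header import Header
from email.message import Message

h = Header('first')
result = h.append('second')    # modifies h in place ...
# ... and returns None, so it cannot be chained or passed on as a value.

def set_header_checked(msg, name, value):
    """Hypothetical helper: reject values that are neither str nor Header."""
    if not isinstance(value, (str, Header)):
        raise TypeError('header %r: expected str or Header, got %s'
                        % (name, type(value).__name__))
    msg[name] = value

msg = Message()
set_header_checked(msg, 'Subject', h)            # accepted
error = None
try:
    set_header_checked(msg, 'X-Broken', result)  # result is None: rejected
except TypeError as exc:
    error = str(exc)
print(error)
```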

msg128848 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-19 15:15
However, maybe that email 5.1 message.py code doesn't like header.Header instances. Also extending msg128846: this one is related to the str() issue, so I've added an extended email_message.2.patch.

Traceback (most recent call last):
  File "/Users/steffen/usr/bin/s-postman.py", line 953, in save_ticket
    mb.add(ticket.message())
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 595, in add
    self._toc[self._next_key] = self._append_message(message)
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 733, in _append_message
    offsets = self._install_message(message)
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 805, in _install_message
    self._dump_message(message, self._file, self._mangle_from_)
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 215, in _dump_message
    gen.flatten(message)
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 88, in flatten
    self._write(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 134, in _write
    self._dispatch(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 151, in _dispatch
    main = msg.get_content_maintype()
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 530, in get_content_maintype
    ctype = self.get_content_type()
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 518, in get_content_type
    ctype = _splitparam(value)[0].lower()
  File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 53, in _splitparam
    a, sep, b = param.partition(';')
Exception: AttributeError: 'Header' object has no attribute 'partition'

P.S.: I'm hard to take, and 'programming is an iterative task'...
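The generator path in this traceback can be exercised without mailbox at all: flattening a message whose Content-Type is a Header goes through get_content_maintype(), the call that failed above. A minimal sketch, assuming a current Python 3 (which includes the str() conversion) and the default compat32 policy; the message is made up:

```python
from email.header import Header
from email.message import Message

msg = Message()
msg['Content-Type'] = Header('text/plain')
msg.set_payload('hi\n')

# as_string() runs email.generator.Generator.flatten(), which dispatches on
# msg.get_content_maintype() before writing the headers and the body.
text = msg.as_string()
print(text)
```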

msg128855 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-19 16:19
David, I'm going offline now. I'll raise the type to 'crash', because, in fact, email 5.1 doesn't really take care of header.Header objects in message.Message headers, which doesn't seem very useful to me! The patch is sufficient for my broken program (it doesn't produce a traceback at the moment; time to go for a Sunday!), but since I don't really have a clue about mailbox.py / email/*, it may not cover all places where a header.Header may occur but the code in fact assumes a str (and implicit conversion of Header to str doesn't work).

msg128856 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-19 16:35
... as a last thought of mine, here is the header of the well-known spam mail:

    From MAILER-DAEMON Sat Feb 19 15:58:47 2011
    Date: =?latin1?q?Tue=2C_4_Jan_2011_17=3A37=3A26_+0100_=28CET=29?=
    From: =?latin1?q?=22SAJATNAPTAR=2ECOM=22_=3Cinfo=40sajatnaptar=2Ecom=3E?=
    To: =?latin1?q?source-changes=40cvs=2Eopenbsd=2Eorg?=
    Subject: =?latin1?q?Falinapt=3Fr_ingyenes_h=3Fzhozsz=3Fll=3Ft=3Fssal=2E_M=3Fr_rendelt=3Fl=3F_Olvass_el!?=
    Message-ID: =?latin1?q?=3C20110104053726system=40sajatnaptar=2Ecom=3E?=
    Content-Type: =?latin1?q?text/plain=3B_charset=3Diso-8859-1?=

Shouldn't email/* be smart enough to know that Date:, Content-Type:, Message-ID:, plus some other well-known, RFC-documented header names need *not* be converted, whatever type of object is used to represent them? This could be implemented with a simple {}. Bye.
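Decoding these headers shows what the sender did: structured fields were wrapped in RFC 2047 encoded words. A minimal sketch of the "simple {}" idea, assuming a hypothetical helper `unwrap_structured()` and an illustrative, far from complete, set of header names; only `email.header.decode_header()` is a real library call here:

```python
from email.header import decode_header

# Illustrative subset of header names with a structured (non-free-text) syntax.
STRUCTURED = {'date', 'message-id', 'content-type', 'content-transfer-encoding'}

def unwrap_structured(name, value):
    """Hypothetical helper: undo encoded words in structured header fields."""
    if name.lower() not in STRUCTURED:
        return value              # free-text fields (Subject, ...) are untouched
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or 'ascii')
            except (LookupError, UnicodeDecodeError):
                chunk = chunk.decode('latin-1')  # latin-1 decodes any byte
        pieces.append(chunk)
    return ''.join(pieces)

print(unwrap_structured(
    'Content-Type', '=?latin1?q?text/plain=3B_charset=3Diso-8859-1?='))
```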

msg128895 | Author: R. David Murray (r.david.murray) * | Date: 2011-02-20 06:43
Well, it's not a crash; a crash is when the interpreter segfaults. I'm not clear on why you are having problems, actually, since if you treat the messages as binary (which they are), then you shouldn't be getting Headers introduced into the mix. But like I said, I don't have time to look at this right now. The problem with Header not being handled properly does need to be addressed, though.

msg128896 | Author: R. David Murray (r.david.murray) * | Date: 2011-02-20 07:13
Ah, I think I see what is going on. If I'm right, then you are right: this is a serious problem for actually processing spam emails using email 5.1. Unfortunately it's too late to do anything for 3.2.0. But email 5.1 is still worlds better at handling well-formed binary emails.
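One way Header objects enter the mix even when messages are parsed as binary: with the default compat32 policy, a header value containing undecodable 8-bit bytes is returned as a Header (in the unknown-8bit charset) rather than as a str. A sketch, assuming a current Python 3 and a made-up message:

```python
import email
from email.header import Header

raw = (b'Subject: caf\xe9\n'
       b'Content-Type: text/plain; charset="iso-8859-1"\n'
       b'\n'
       b'body\n')
msg = email.message_from_bytes(raw)

subject = msg['Subject']                 # raw 8-bit byte in the value ...
is_header = isinstance(subject, Header)  # ... so this comes back as a Header
plain = msg['Content-Type']              # an ASCII-only value stays a str
print(type(subject).__name__, type(plain).__name__)  # Header str
```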

msg129046 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-02-22 10:42
(Of course you're right. It just reads, passes around and spits out that ... of a mail just the same as it came in. Performance is very good, too, just about 1.5 seconds; some two weeks ago it took about 1.1 seconds, but with Python 2.7, so! P.S.: my very own desire was just to have a single entry point where I can drop whatever ... in and get something back which may be just as silly but at least conformant, e.g. '__setitem__[x] = ADJUST(__getitem__[x])'; imagine what a swiss ;-) would need to do to get to that point with email 5.1: x=header.decode_header(); if x[1] is None, check whether the string is ASCII-clean, otherwise hard-encode with latin1/unknown 8-bit; but even if x[1] is not None the content may be malformed; and then remember that all these steps can throw exceptions, which need to be handled because the mail *will* be processed. Of course, we're talking about the header here only 8-))
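The ADJUST() steps described above can be sketched as a single function that never raises. This is a minimal sketch under stated assumptions: `adjust()` is a hypothetical name, it returns Unicode text rather than something ready to be stored back, and latin-1 is chosen as the last-resort decoding because it accepts any byte:

```python
from email.errors import HeaderParseError
from email.header import Header, decode_header

def adjust(value):
    """Hypothetical ADJUST(): turn any header value into plain Unicode text.

    A sketch only; every step that can fail falls back instead of raising.
    """
    text = str(value)            # Header objects (e.g. from dirty input) -> str
    try:
        parts = decode_header(text)
    except HeaderParseError:     # e.g. malformed base64 inside an encoded word
        return text
    pieces = []
    for chunk, charset in parts:
        if isinstance(chunk, str):   # no encoded words were found at all
            pieces.append(chunk)
            continue
        try:
            pieces.append(chunk.decode(charset or 'ascii'))
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name, or bytes that do not match it:
            # fall back to latin-1, which can decode any byte sequence.
            pieces.append(chunk.decode('latin-1'))
    return ''.join(pieces)

examples = [adjust('=?latin1?q?caf=E9?='),            # well-formed encoded word
            adjust('=?x-no-such-charset?q?caf=E9?='),  # bogus charset name
            adjust(Header('plain text'))]             # a Header object
```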

msg129055 | Author: R. David Murray (r.david.murray) * | Date: 2011-02-22 12:11
We might wind up with a relatively quick 3.2.1, in which case we can get this fixed then. The parser is supposed to operate without throwing exceptions (just setting defects), so if you find a case where *parsing* throws an exception, please open an issue. Once you start manipulating the model, of course, you may get exceptions. I'm not sure what should happen if, say, the charset name is invalid (8bit), but certainly throwing an error because it is a Header rather than a string is wrong and needs fixing.
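The "defects instead of exceptions" behaviour of the parser can be observed directly: problems are recorded on the message's `defects` list and parsing carries on. A minimal sketch with a made-up message whose header block is not terminated by a blank line:

```python
import email
from email import errors

raw = b'Subject: hi\nthis line is not a header\n\nbody\n'
msg = email.message_from_bytes(raw)   # does not raise

# The parser records the problem and treats the odd line as the start
# of the body instead of failing.
found = any(isinstance(d, errors.MissingHeaderBodySeparatorDefect)
            for d in msg.defects)
print([type(d).__name__ for d in msg.defects])
```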

msg130953 | Author: R. David Murray (r.david.murray) * | Date: 2011-03-15 04:21
Here is a patch that adds tests for the methods I didn't previously have tests for. There may still be some headers that I'm not testing for the 'contains binary' case, but this is certainly more comprehensive than what we had before. Please test and let me know if it works; it should, since the code patch is very close to the one you suggested.
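The kind of test described here can be sketched as a small unittest case: parse a message with a raw 8-bit byte in Content-Type through the binary interface, then call the accessors that used to choke on the resulting Header. The class name and message are made up; this is not the test from the actual patch:

```python
import email
import unittest

# A made-up message: the Content-Type value contains a raw 8-bit byte,
# so with the compat32 policy it is returned as a Header object.
DIRTY = (b'Content-Type: text/plain; charset="utf-8"; name="caf\xe9.txt"\n'
         b'Content-Transfer-Encoding: 7bit\n'
         b'\n'
         b'body\n')

class DirtyHeaderTest(unittest.TestCase):
    def setUp(self):
        self.msg = email.message_from_bytes(DIRTY)

    def test_content_type(self):
        self.assertEqual(self.msg.get_content_type(), 'text/plain')

    def test_content_charset(self):
        self.assertEqual(self.msg.get_content_charset(), 'utf-8')

    def test_payload(self):
        self.assertEqual(self.msg.get_payload(), 'body\n')
```

Saved in a module, this can be run with `python -m unittest`.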

msg130972 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-03-15 13:43
On Tue, Mar 15, 2011 at 04:21:24AM +0000, R. David Murray wrote:
> Please test and let me know if it works; it should, since the code patch is very close to the one you suggested.

;-) Hello David, hope you have a good time at PyCon! (Just Googled it: the weather will be fine, so all of you will see the sun and the blue sky once again! Hey, there is a world out there!! :)

Just like I've stated on EMAIL-SIG, you really have convinced me to simply use the binary feedparser, but since you have found even more places where an explicit str() is necessary, this package is once again at least 50% better than before!

But I've re-added an email.header.make_header(email.header.decode_header(b)) call in my Ticket._bewitch_msg() and ran that patched S-Postman ;=) on a 3.8 MB mbox file (by the way, if you need f..d .. emails, subscribe to OpenBSD Misc), and I end up with this:

Traceback (most recent call last):
  File "/Users/steffen/usr/bin/s-postman.py", line 1815, in _walk
    self._tickets.extend(Ticket.process_message(msg))
  File "/Users/steffen/usr/bin/s-postman.py", line 1671, in process_message
    return [Ticket(m, _targets=rsm.targets) for m in splitter]
  File "/Users/steffen/usr/bin/s-postman.py", line 1671, in <listcomp>
    return [Ticket(m, _targets=rsm.targets) for m in splitter]
  File "/Users/steffen/usr/bin/s-postman.py", line 1681, in __init__
    self._bewitch_msg()
  File "/Users/steffen/usr/bin/s-postman.py", line 1752, in _bewitch_msg
    self._msg[n] = email.header.make_header(email.header.decode_header(b))
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 73, in decode_header
    if not ecre.search(header):
Exception: TypeError: expected string or buffer

Here: header==<class 'email.header.Header'>

And:

Traceback (most recent call last):
  File "/Users/steffen/usr/bin/s-postman.py", line 1815, in _walk
    self._tickets.extend(Ticket.process_message(msg))
  File "/Users/steffen/usr/bin/s-postman.py", line 1671, in process_message
    return [Ticket(m, _targets=rsm.targets) for m in splitter]
  File "/Users/steffen/usr/bin/s-postman.py", line 1671, in <listcomp>
    return [Ticket(m, _targets=rsm.targets) for m in splitter]
  File "/Users/steffen/usr/bin/s-postman.py", line 1681, in __init__
    self._bewitch_msg()
  File "/Users/steffen/usr/bin/s-postman.py", line 1752, in _bewitch_msg
    self._msg[n] = email.header.make_header(email.header.decode_header(b))
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 154, in make_header
    h.append(s, charset)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 270, in append
    s = s.decode(input_charset, errors)
Exception: AttributeError: 'Header' object has no attribute 'decode'

Here: s==<class 'email.header.Header'>

And after adding

    # Steffen is out now
    if isinstance(s, email.header.Header):
        s = str(s)

I got stuck on this:

Traceback (raising call only):
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 278, in append
    s.encode(output_charset, errors)
Exception: UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 7: ordinal not in range(128)

{Aaaargh! Special case UNICODE replacement character, mongrel!}

s was a Header here, too. I'm attaching a simple email_header.diff which applies cleanly to a49bda. Hope I could help a bit.

msg131102 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-03-16 11:47
On Tue, Mar 15, 2011 at 04:21:24AM +0000, R. David Murray wrote:
> Please test and let me know if it works

Spending some more time on that, continuing yesterday's session where I got stuck. When I instead do this (still in header.py:Header.append()):

    # Steffen is out now again
    if isinstance(s, Header):
        s = str(s)
        errors = 'replace'

I get:

Traceback (most recent call last):
  File "/Users/steffen/usr/bin/s-postman.py", line 1212, in save_ticket
    mb.add(ticket.message())
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/mailbox.py", line 279, in add
    self._dump_message(message, tmp_file)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/mailbox.py", line 215, in _dump_message
    gen.flatten(message)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 91, in flatten
    self._write(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 144, in _write
    self._write_headers(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 363, in _write_headers
    self.write(v.encode(maxlinelen=self._maxheaderlen)+NL)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 320, in encode
    formatter.feed(lines[0], charset)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 386, in feed
    encoded_string = charset.header_encode(string)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/charset.py", line 296, in header_encode
    header_bytes = _encode(string, codec)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/charset.py", line 163, in _encode
    return string.encode(codec)
Exception: UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 7: ordinal not in range(128)

I've updated and am at db73857. And I am *really* looking forward to 'defects'.
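The U+FFFD in these tracebacks is what str() produces for a Header holding undecodable bytes: the original byte is replaced, and the resulting text is no longer pure ASCII. A short sketch, assuming a current Python 3 with the default compat32 policy and a made-up message, contrasting this with the binary output path, which writes the original byte back unchanged:

```python
import email

raw = b'Subject: caf\xe9\n\nbody\n'
msg = email.message_from_bytes(raw)

text = str(msg['Subject'])     # the 0xE9 byte becomes U+FFFD here
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:     # same error class as in the tracebacks above
    ascii_ok = False

# The binary generator (used by as_bytes()) keeps the original byte.
round_trip = msg.as_bytes()
print(ascii(text), ascii_ok, round_trip)
```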

msg131115 | Author: R. David Murray (r.david.murray) * | Date: 2011-03-16 13:58
Steffen, these look like different kinds of errors than the one you reported in this ticket. If they are, could you open a new issue? Either way, simple reproducers would be the most helpful.

msg131126 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-03-16 15:07
On Wed, Mar 16, 2011 at 01:58:40PM +0000, R. David Murray wrote:
> Steffen, these look like different kinds of errors than the one you reported in this ticket.
> If they are, could you open a new issue? Either way, simple reproducers would be the most helpful.

I'm on db73857669fb, email/message.py is patched with your code, and email/header.py is patched with email-header.2.diff. 11243-test.1.py produces this traceback:

15:53 ~/tmp $ python3 11243-test.1.py
Traceback (most recent call last):
  File "11243-test.1.py", line 17, in <module>
    msg[f] = email.header.make_header(email.header.decode_header(b))
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 154, in make_header
    h.append(s, charset)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 279, in append
    s.encode(output_charset, errors)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 7: ordinal not in range(128)

I'll be offline for the next couple of hours, but in the meantime that's all I can do anyway... And well, I won't open a new issue for the stuff from msg131102, because that happen(ed) only if the commented-out code from email-header.2.diff is applied, which is not a real-life code flow? (Though: a Message is happily read in via email.feedparser.BytesFeedParser() and finally adjusted via header.make_header(header.decode_header(b)) because you've asked me to do so, just as is done by 11243-test.1.py.)

msg131127 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-03-16 15:09
Sorry, I forgot the test ;)

msg131174 | Author: Steffen Daode Nurpmeso (sdaoden) | Date: 2011-03-16 20:53
David, it's so hard to tell! If you want a big fat thing that misuses your code, try out my S-Postman (it's on Bitbucket, no URL from me, and you need tip). And I could post you a small config, the EMAIL-breaking patch for it, and a 430 KB TBZ archive of mail digests which contain many bad mails (this is why you need the postman in the end: it'll split the archives up for you). This is something you could stick around with for a while :) If you want all or parts of that, post and you'll have it tomorrow.

msg131199 | Author: R. David Murray (r.david.murray) * | Date: 2011-03-17 00:17
Steffen, what you are doing in 11243-test is not something that the current email package supports. String input to message_from_string must be ASCII-only in email 5.1/Python 3.2. Likewise for decode_header. To get Unicode into a header, you have to pass it to the constructor of Header, and then it encodes it as an encoded word in whatever character set you tell it to use. make_header(decode_header(stuff)) would theoretically return stuff, except that, as you can see, if stuff is non-ASCII (or a Header), it won't work. If you are handling 'dirty' data, you have to stick to the binary interfaces, as discussed. Header needs a binary interface, but it doesn't have one (yet?). Yes, this interface is not an optimal one. That's what email6 is about :) So, absent a minimal failing test case, I'm going to commit the patch.
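The supported route described here: hand Unicode text to the Header constructor together with a charset, and it is stored as an ASCII-only RFC 2047 encoded word; make_header(decode_header(...)) then round-trips that ASCII form. A sketch, assuming a current Python 3; the text and the charset are arbitrary choices:

```python
from email.header import Header, decode_header, make_header

# Unicode text enters through the constructor, with an explicit charset.
h = Header('caf\u00e9', charset='iso-8859-1')
encoded = h.encode()                  # RFC 2047 encoded word, ASCII only
print(encoded)

# decode_header() yields (bytes, charset) pairs; make_header() rebuilds a Header.
pairs = decode_header(encoded)
restored = str(make_header(pairs))
```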

msg131206 | Author: Roundup Robot (python-dev) | Date: 2011-03-17 01:13
New changeset 74a8c46fb272 by R David Murray in branch '3.2':
#11243: tests and fixes for handling of 'dirty data' in additional methods
http://hg.python.org/cpython/rev/74a8c46fb272

New changeset 82ecfcd31250 by R David Murray in branch 'default':
Merge #11243 fix from 3.2.
http://hg.python.org/cpython/rev/82ecfcd31250

msg131207 | Author: R. David Murray (r.david.murray) * | Date: 2011-03-17 01:15
I'm closing this issue. If you have a specific test case that is still failing, please open a new issue. And thanks for testing this fix.

History
| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:13 | admin | set | github: 55452 |
| 2011-03-17 01:15:11 | r.david.murray | set | status: open -> closed; messages: + msg131207; resolution: fixed; stage: patch review -> resolved |
| 2011-03-17 01:13:17 | python-dev | set | nosy: + python-dev; messages: + msg131206 |
| 2011-03-17 00:17:31 | r.david.murray | set | messages: + msg131199 |
| 2011-03-16 20:53:27 | sdaoden | set | messages: + msg131174 |
| 2011-03-16 15:09:53 | sdaoden | set | files: + 11243-test.1.py; messages: + msg131127 |
| 2011-03-16 15:07:04 | sdaoden | set | files: + email-header.2.diff; messages: + msg131126 |
| 2011-03-16 13:58:39 | r.david.murray | set | messages: + msg131115 |
| 2011-03-16 11:47:17 | sdaoden | set | messages: + msg131102 |
| 2011-03-15 13:43:40 | sdaoden | set | files: + email_header.diff; messages: + msg130972 |
| 2011-03-15 04:21:21 | r.david.murray | set | files: + emails_with_Headers.patch; messages: + msg130953 |
| 2011-02-22 12:11:14 | r.david.murray | set | messages: + msg129055 |
| 2011-02-22 10:42:45 | sdaoden | set | messages: + msg129046 |
| 2011-02-21 23:15:53 | r.david.murray | set | priority: normal -> high |
| 2011-02-20 07:13:12 | r.david.murray | set | messages: + msg128896 |
| 2011-02-20 06:43:11 | r.david.murray | set | type: crash -> behavior; messages: + msg128895 |
| 2011-02-19 16:35:45 | sdaoden | set | messages: + msg128856 |
| 2011-02-19 16:19:09 | sdaoden | set | type: behavior -> crash; messages: + msg128855 |
| 2011-02-19 16:02:03 | sdaoden | set | files: - email_message.patch |
| 2011-02-19 15:15:03 | sdaoden | set | files: + email_message.2.patch; messages: + msg128848 |
| 2011-02-19 15:04:37 | sdaoden | set | messages: + msg128847 |
| 2011-02-19 13:56:07 | sdaoden | set | messages: + msg128846 |
| 2011-02-18 19:58:07 | sdaoden | set | messages: + msg128808 |
| 2011-02-18 15:13:35 | sdaoden | set | messages: + msg128787; components: + Library (Lib) |
| 2011-02-18 14:58:58 | r.david.murray | set | versions: + Python 3.3; title: email/message.py str conversion [patch] -> email/message.py str conversion; messages: + msg128784; assignee: r.david.murray; stage: patch review |
| 2011-02-18 14:54:23 | sdaoden | set | type: behavior; messages: + msg128783; versions: + Python 3.2 |
| 2011-02-18 14:53:19 | sdaoden | create | |