classification
Title: email/message.py str conversion
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: r.david.murray Nosy List: python-dev, r.david.murray, sdaoden
Priority: high Keywords: patch

Created on 2011-02-18 14:53 by sdaoden, last changed 2011-03-17 01:15 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
email_message.2.patch sdaoden, 2011-02-19 15:15
emails_with_Headers.patch r.david.murray, 2011-03-15 04:21
email_header.diff sdaoden, 2011-03-15 13:43
email-header.2.diff sdaoden, 2011-03-16 15:07
11243-test.1.py sdaoden, 2011-03-16 15:09
Messages (24)
msg128782 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-18 14:53
Hi David, while hacking a bit on my thing i've found two places where header.Header needs to be explicitly converted via str().
Have a nice weekend.
msg128783 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-18 14:54
(Will get that tracker right as time goes by.)
msg128784 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-02-18 14:58
Thanks for the report.  I probably won't have time to look at this for a bit.
msg128787 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-18 15:13
We all know EMail 6.0 will blow them off the streets in the end.
msg128808 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-18 19:58
P.S.: maybe this completes the byte.
Have a nice weekend nevertheless - if you can.

  Traceback (most recent call last):
    File "/Users/steffen/usr/bin/s-postman.py", line 1419, in _walk
      self._tickets.extend(Ticket.process_message(msg))
    File "/Users/steffen/usr/bin/s-postman.py", line 1275, in process_message
      splitter = splitter(msg)
    File "/Users/steffen/usr/bin/s-postman.py", line 401, in _openbsd_text
      charset = msg.get_content_charset('iso-8859-1')
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 820, in get_content_charset
      charset = self.get_param('charset', missing)
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 628, in get_param
      for k, v in self._get_params_preserve(failobj, header):
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 565, in _get_params_preserve
      for p in _parseparam(';' + value):
  Exception: TypeError: Can't convert 'Header' object to str implicitly


  Traceback (most recent call last):
    File "/Users/steffen/usr/bin/s-postman.py", line 1419, in _walk
      self._tickets.extend(Ticket.process_message(msg))
    File "/Users/steffen/usr/bin/s-postman.py", line 1275, in process_message
      splitter = splitter(msg)
    File "/Users/steffen/usr/bin/s-postman.py", line 402, in _openbsd_text
      lines = msg.get_payload().splitlines()
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 244, in get_payload
      cte = self.get('content-transfer-encoding', '').lower()
  Exception: AttributeError: 'Header' object has no attribute 'lower'
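
Both tracebacks above come from library code calling str methods on a header value that is actually a header.Header. A minimal sketch of the explicit conversion, assuming a Message whose Content-Transfer-Encoding was stored as a Header (the header value is made up for illustration, and this is not the code from the attached patch):

```python
from email.header import Header
from email.message import Message

msg = Message()
# Illustrative value: store the header as a Header object, not a str.
msg['Content-Transfer-Encoding'] = Header('Base64')

value = msg.get('content-transfer-encoding', '')
# Header has no .lower(), so convert explicitly before using str methods.
cte = str(value).lower()
print(cte)
```

The same str() call is needed before anything like partition() or string concatenation is applied to the value.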
msg128846 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-19 13:56
I also got this now, it happens with and without the str() patch stuff.  (Note that message.py line numbers are off by 1-2 lines ..).  I don't know more about that at the moment, but the only thing that's changed is that i do:

    alln = self._msg.items()[:] # In fact -> ensure all are header.Header
    # If any converted (str->Header) header names exist ...
    if len(alln):
        # Delete *all* occurrences of h (doesn't throw)
        for (n, b) in alln:
            del self._msg[n]
        # And append in order
        for (n, b) in alln:
            self._msg[n] = b


  Traceback (most recent call last):
    File "/Users/steffen/usr/bin/s-postman.py", line 953, in save_ticket
      mb.add(ticket.message())
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 595, in add
      self._toc[self._next_key] = self._append_message(message)
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 733, in _append_message
      offsets = self._install_message(message)
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 805, in _install_message
      self._dump_message(message, self._file, self._mangle_from_)
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 215, in _dump_message
      gen.flatten(message)
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 88, in flatten
      self._write(msg)
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 134, in _write
      self._dispatch(msg)
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 151, in _dispatch
      main = msg.get_content_maintype()
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 528, in get_content_maintype
      ctype = self.get_content_type()
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 516, in get_content_type
      ctype = _splitparam(value)[0].lower()
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 53, in _splitparam
      a, sep, b = param.partition(';')
  Exception: AttributeError: 'NoneType' object has no attribute 'partition'
msg128847 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-19 15:04
The latter one was my fault, i did LIST.append(name, HEADER.append(xy)), assuming that HEADER.append() returns self though it doesn't.  Sorry.  However - shouldn't Message.__setitem__ check for valid arguments (see msg128846 code snippet)?  It would have saved some anger...
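
For reference, a small sketch of the pitfall described above, assuming nothing beyond the stdlib email.header module (the header text is made up): Header.append() changes the object in place and returns None, so passing its result on as a header value stores None.

```python
from email.header import Header

h = Header('Hello')
# append() mutates the Header in place and returns None,
# so its result must not be used as the header value.
result = h.append('world')
print(result)
print(str(h))
```

Appending first and then using `h` itself avoids the 'NoneType' traceback.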
msg128848 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-19 15:15
However, maybe that 5.1 message.py thing doesn't like header.Header instances.  Also extending msg128846, this one is related to the str() issue - added an extended email_message.2.patch.

  Traceback (most recent call last):
    File "/Users/steffen/usr/bin/s-postman.py", line 953, in save_ticket
      mb.add(ticket.message())
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 595, in add
      self._toc[self._next_key] = self._append_message(message)
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 733, in _append_message
      offsets = self._install_message(message)
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 805, in _install_message
      self._dump_message(message, self._file, self._mangle_from_)
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/mailbox.py", line 215, in _dump_message
      gen.flatten(message)
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 88, in flatten
      self._write(msg)
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 134, in _write
      self._dispatch(msg)
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/generator.py", line 151, in _dispatch
      main = msg.get_content_maintype()
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 530, in get_content_maintype
      ctype = self.get_content_type()
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 518, in get_content_type
      ctype = _splitparam(value)[0].lower()
    File "/Users/steffen/usr/opt/py3k/lib/python3.2/email/message.py", line 53, in _splitparam
      a, sep, b = param.partition(';')
  Exception: AttributeError: 'Header' object has no attribute 'partition'

P.S.: i'm hard to take, and 'programming is an iterative task'...
msg128855 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-19 16:19
David, i'm going down now.  I'll raise the type to 'crash', because, in fact, EMail 5.1 doesn't really take care of header.Header objects in message.Message headers, which doesn't sound very useful to me!  The patch is sufficient for my broken thing (it doesn't produce a traceback at the moment - time to go for a sunday!), but since i don't really have a clue of mailbox.py / email/* it may not cover all places where a header.Header may occur but the code in fact assumes a str (and implicit conversion of Header to str doesn't work).
msg128856 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-19 16:35
... as a last thought of mine, here is a header of the well known spam mail:


From MAILER-DAEMON Sat Feb 19 15:58:47 2011
Date: =?latin1?q?Tue=2C_4_Jan_2011_17=3A37=3A26_+0100_=28CET=29?=
From: =?latin1?q?=22SAJATNAPTAR=2ECOM=22_=3Cinfo=40sajatnaptar=2Ecom=3E?=
To: =?latin1?q?source-changes=40cvs=2Eopenbsd=2Eorg?=
Subject: =?latin1?q?Falinapt=3Fr_ingyenes_h=3Fzhozsz=3Fll=3Ft=3Fssal=2E_M=3Fr_rendelt=3Fl=3F_Olvass_el!?=
Message-ID: =?latin1?q?=3C20110104053726system=40sajatnaptar=2Ecom=3E?=
Content-Type: =?latin1?q?text/plain=3B_charset=3Diso-8859-1?=


Shouldn't email/* be smart enough to know that Date:, Content-Type:, Message-ID:, plus some other well-known, RFC-documented header names need *not* be converted, whatever type of object is used to represent them?  This could be implemented with a simple {}.

Bye.
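
A rough sketch of the lookup idea above, not anything the email package provides: `STRUCTURED_HEADERS` and `normalize()` are hypothetical names, and a set is used here although a dict would do just as well. Structured headers are decoded and returned as plain str, everything else stays a Header.

```python
from email.header import Header, decode_header, make_header

# Hypothetical lookup of structured headers whose values are kept as str.
STRUCTURED_HEADERS = {'date', 'content-type', 'message-id'}

def normalize(name, raw):
    """Decode RFC 2047 encoded words; return str for structured headers."""
    decoded = make_header(decode_header(raw))
    if name.lower() in STRUCTURED_HEADERS:
        return str(decoded)
    return decoded

ctype = normalize('Content-Type',
                  '=?latin1?q?text/plain=3B_charset=3Diso-8859-1?=')
print(ctype)
```

With the Content-Type line from the spam mail above, this yields a plain string that the usual parameter parsing can work on.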
msg128895 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-02-20 06:43
Well, it's not a crash, a crash is when the interpreter segfaults.

I'm not clear on why you are having problems, actually, since if you treat the messages as binary (which they are), then you shouldn't be getting Headers introduced into the mix.  But like I said I don't have time to look at this right now.

The problem with Header not being handled properly does need to be addressed, though.
msg128896 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-02-20 07:13
Ah, I think I see what is going on.  If I'm right, then you are right, this is a serious problem for actually processing spam emails using email 5.1.  Unfortunately it's too late to do anything for 3.2.0.  But email 5.1 is still worlds better at handling well formed binary emails.
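
One way Header objects can end up in a parsed message, sketched under the assumption of the default (compat32) behavior in current Python 3 releases; the raw bytes are made up. A header carrying an undeclared 8-bit byte comes back from the binary parser as a Header with an unknown charset, not as a str:

```python
import email
from email.header import Header

# Illustrative raw message with an undeclared 8-bit byte in a header.
raw = b'Subject: caf\xe9\n\nbody\n'
msg = email.message_from_bytes(raw)

subject = msg['Subject']
print(type(subject).__name__)
# str() replaces the undecodable byte with U+FFFD.
print(ascii(str(subject)))
```

Any code that then treats `subject` as a str runs into the kind of TypeError or AttributeError shown in the tracebacks above.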
msg129046 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-02-22 10:42
(Of course you're right.  It just reads, passes around and spits out that ... of a mail just the same as it came in.  Performance is very good, too, just about 1.5 seconds - some two weeks ago it took about 1.1 seconds, but with Python 2.7 - so!

P.S.: my very own desire was just to have a single entry point where i can drop whatever ... in and get something back which may be just as silly but at least conformant, e.g. '__setitem__[x] = ADJUST(__getitem__[x])'; imagine what a swiss ;-) would need to do to get to that point with EMail 5.1: x=header.decode_header(), if x[1] is None check whether string is ASCII clean, otherwise hard-encode with latin1/unknown 8-bit; but even if x[1] is not None the content may be malformed; and then remember that all these steps can throw exceptions, which need to be handled because the mail *will* be processed.  Of course, we're talking about the header here only 8-))
msg129055 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-02-22 12:11
We might wind up with a relatively quick 3.2.1, in which case we can get this fixed then.

The parser is supposed to operate without throwing exceptions (just setting defects), so if you find a case where *parsing* throws an exception please open an issue.  Once you start manipulating the model, of course, you may get exceptions.  I'm not sure what should happen if, say, the charset name is invalid (8bit), but certainly throwing an error because it is a Header rather than a string is wrong and needs fixing.
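
A small sketch of what "just setting defects" looks like in practice, assuming the default policy and a made-up malformed message (a multipart content type with no boundary parameter). Parsing does not raise; the problems are recorded on the message object:

```python
import email
from email import errors

# Illustrative malformed input: multipart content type without a boundary.
raw = b'Content-Type: multipart/mixed\n\nnot really multipart\n'
msg = email.message_from_bytes(raw)   # does not raise

for defect in msg.defects:
    print(type(defect).__name__)
```

Exceptions only become possible later, once the parsed model is manipulated.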
msg130953 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-15 04:21
Here is a patch that adds tests for the methods I didn't previously have tests for.  There may still be some headers that I'm not testing for the 'contains binary' case, but this is certainly more comprehensive than we had before.

Please test and let me know if it works; it should, since the code patch is very close to the one you suggested.
msg130972 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-15 13:43
On Tue, Mar 15, 2011 at 04:21:24AM +0000, R. David Murray wrote:
> Please test and let me know if it works; it should, since the code patch is very close to the one you suggested.

;-)
Hello David, hope you have a good time at Pycon! 
(Just Googled, weather will be fine right after all of you 
will see the sun and the blue sky once again! 
Hey -- there is a world out there!! :)

Just like i've stated on EMAIL-SIG, you really have convinced me 
of simply using the binary feedparser, but since you have found 
even more places where explicit str() is necessary, this package 
is once again at least 50% better than before!

But i've re-added a
    email.header.make_header(email.header.decode_header(b))
thing in my Ticket._bewitch_msg() and ran that patched S-Postman 
;=) on a 3.8 MB mbox file (by the way, if you need f..d .. 
emails, subscribe to OpenBSD Misc), and i'll end up like this:

Traceback (most recent call last):
  File "/Users/steffen/usr/bin/s-postman.py", line 1815, in _walk
    self._tickets.extend(Ticket.process_message(msg))
  File "/Users/steffen/usr/bin/s-postman.py", line 1671, in process_message
    return [Ticket(m, _targets=rsm.targets) for m in splitter]
  File "/Users/steffen/usr/bin/s-postman.py", line 1671, in <listcomp>
    return [Ticket(m, _targets=rsm.targets) for m in splitter]
  File "/Users/steffen/usr/bin/s-postman.py", line 1681, in __init__
    self._bewitch_msg()
  File "/Users/steffen/usr/bin/s-postman.py", line 1752, in _bewitch_msg
    self._msg[n] = email.header.make_header(email.header.decode_header(b))
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 73, in decode_header
    if not ecre.search(header):
Exception: TypeError: expected string or buffer

Here: header==<class 'email.header.Header'>
And:

Traceback (most recent call last):
  File "/Users/steffen/usr/bin/s-postman.py", line 1815, in _walk
    self._tickets.extend(Ticket.process_message(msg))
  File "/Users/steffen/usr/bin/s-postman.py", line 1671, in process_message
    return [Ticket(m, _targets=rsm.targets) for m in splitter]
  File "/Users/steffen/usr/bin/s-postman.py", line 1671, in <listcomp>
    return [Ticket(m, _targets=rsm.targets) for m in splitter]
  File "/Users/steffen/usr/bin/s-postman.py", line 1681, in __init__
    self._bewitch_msg()
  File "/Users/steffen/usr/bin/s-postman.py", line 1752, in _bewitch_msg
    self._msg[n] = email.header.make_header(email.header.decode_header(b))
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 154, in make_header
    h.append(s, charset)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 270, in append
    s = s.decode(input_charset, errors)
Exception: AttributeError: 'Header' object has no attribute 'decode'

Here s==<class 'email.header.Header'>
And after adding
        # Steffen is out now
        if isinstance(s, email.header.Header):
            s = str(s)
i got stuck on this:

Traceback (raising call only):
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 278, in append
    s.encode(output_charset, errors)
Exception: UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 7: ordinal not in range(128)
{Aaaargh!  Special case UNICODE replacement character, mongrel!}

s was a Header here, too.
I attach a simple email_header.diff which applies cleanly to a49bda. 
Hope i could help a bit.
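
A caller-side sketch of the guard discussed above, as an alternative to patching header.py: `as_header()` is a hypothetical helper, not part of the email package, and the encoded word is made up. Values that are already Header objects are passed through untouched, so they never reach decode_header() or Header.append().

```python
from email.header import Header, decode_header, make_header

def as_header(value):
    """Hypothetical helper: return a Header for str or Header input."""
    if isinstance(value, Header):
        # Already a Header: do not feed it back into decode_header().
        return value
    return make_header(decode_header(value))

h1 = as_header('=?latin1?q?caf=E9?=')
h2 = as_header(h1)   # passed through unchanged
print(str(h1))
```

This does not help with headers that contain undecodable bytes; those still need the binary interfaces.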
msg131102 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-16 11:47
On Tue, Mar 15, 2011 at 04:21:24AM +0000, R. David Murray wrote:
> Please test and let me know if it works

Spending some more time on that, continuing yesterday's session 
where i got stuck. When i instead do (still in 
header.py:Header.append()):

        # Steffen is out now again
        if isinstance(s, Header):
            s = str(s)
            errors = 'replace'

Traceback (most recent call last):
  File "/Users/steffen/usr/bin/s-postman.py", line 1212, in save_ticket
    mb.add(ticket.message())
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/mailbox.py", line 279, in add
    self._dump_message(message, tmp_file)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/mailbox.py", line 215, in _dump_message
    gen.flatten(message)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 91, in flatten
    self._write(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 144, in _write
    self._write_headers(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 363, in _write_headers
    self.write(v.encode(maxlinelen=self._maxheaderlen)+NL)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 320, in encode
    formatter.feed(lines[0], charset)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 386, in feed
    encoded_string = charset.header_encode(string)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/charset.py", line 296, in header_encode
    header_bytes = _encode(string, codec)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/charset.py", line 163, in _encode
    return string.encode(codec)
Exception: UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 7: ordinal not in range(128)

I've updated and am at db73857.
And i am *really* looking forward to 'defects'.
msg131115 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-16 13:58
Steffen, these look like different kinds of errors than the one you reported in this ticket.  If they are, could you open a new issue?  Either way, simple reproducers would be the most helpful.
msg131126 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-16 15:07
On Wed, Mar 16, 2011 at 01:58:40PM +0000, R. David Murray wrote:
> Steffen, these look like different kinds of errors than the one you reported in this ticket.
> If they are, could you open a new issue?  Either way, simple reproducers would be the most helpful.

I'm on db73857669fb, email/message.py is patched with your code, 
and email/header.py is patched with email-header.2.diff. 
11243-test.1.py will traceback:

15:53 ~/tmp $ python3 11243-test.1.py 
Traceback (most recent call last):
  File "11243-test.1.py", line 17, in <module>
    msg[f] = email.header.make_header(email.header.decode_header(b))
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 154, in make_header
    h.append(s, charset)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/header.py", line 279, in append
    s.encode(output_charset, errors)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 7: ordinal not in range(128)

I'll be down the next couple of hours, but in the meanwhile that's 
all i can do anyway...
And well, i won't open a new issue due to the stuff from msg131102, 
because that happen(ed) if the commented out code from 
email-header.2.diff is applied, which is non-real-life code-flow? 
(Though: a Message is happily read in via 
email.feedparser.BytesFeedParser() and finally adjusted via 
header.make_header(header.decode_header(b)) because you've asked 
me to do so, and just as is done by 11243-test.1.py.)
msg131127 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-16 15:09
Sorry, i forgot the test ;)
msg131174 - (view) Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-16 20:53
David, it's so hard to tell!
If you want a big fat thing that misuses your code, try out my 
S-Postman (it's on Bitbucket and no URL from me and you need tip). 
And i could post you a small config and the EMAIL breaking patch 
for it and a 430KB TBZ archive of mail digests which contain many 
bad mails (this is why you need the postman in the end, it'll 
split the archives up for you). 
This is something you could stick around with for a while :) 
If you want all or parts of that, post and you have it tomorrow.
msg131199 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-17 00:17
Steffen, what you are doing in 11243-test is not something that the current email package supports.  String input to message_from_string must be ASCII only in email 5.1/python3.2.  Likewise for decode_header.  To get unicode into a header, you have to pass it in to the constructor of Header, and then it encodes it as an encoded word in whatever character set you tell it to use.

The make_header(decode_header(stuff)) would theoretically return stuff, except that as you can see if stuff is non-ascii (or a Header), it won't work.  If you are handling 'dirty' data, you have to stick to the binary interfaces, as discussed.  Header needs a binary interface, but it doesn't have one (yet?).
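
A minimal sketch of the supported route described above, with made-up header text: the unicode string goes to the Header constructor together with an explicit charset, and encode() produces an ASCII-only encoded word.

```python
from email.header import Header, decode_header, make_header

# Illustrative non-ASCII text; the charset is given explicitly.
h = Header('caf\u00e9', charset='utf-8')
encoded = h.encode()          # RFC 2047 encoded word, ASCII only
print(encoded)

# Decoding the encoded word gives the original text back.
roundtrip = str(make_header(decode_header(encoded)))
print(roundtrip)
```

Here make_header(decode_header(...)) round-trips because its input is a clean ASCII str, which is the case the interface was designed for.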

Yes, this interface is not an optimal interface.  That's what email6 is about :)

So, absent a minimal failing test case, I'm going to commit the patch.
msg131206 - (view) Author: Roundup Robot (python-dev) Date: 2011-03-17 01:13
New changeset 74a8c46fb272 by R David Murray in branch '3.2':
#11243: tests and fixes for handling of 'dirty data' in additional methods
http://hg.python.org/cpython/rev/74a8c46fb272

New changeset 82ecfcd31250 by R David Murray in branch 'default':
Merge #11243 fix from 3.2.
http://hg.python.org/cpython/rev/82ecfcd31250
msg131207 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-17 01:15
I'm closing this issue.  If you have a specific test case that is still failing, please open a new issue.  And thanks for testing this fix.
History
Date User Action Args
2011-03-17 01:15:11 r.david.murray set status: open -> closed; messages: + msg131207; resolution: fixed; stage: patch review -> resolved
2011-03-17 01:13:17 python-dev set nosy: + python-dev; messages: + msg131206
2011-03-17 00:17:31 r.david.murray set messages: + msg131199
2011-03-16 20:53:27 sdaoden set messages: + msg131174
2011-03-16 15:09:53 sdaoden set files: + 11243-test.1.py; messages: + msg131127
2011-03-16 15:07:04 sdaoden set files: + email-header.2.diff; messages: + msg131126
2011-03-16 13:58:39 r.david.murray set messages: + msg131115
2011-03-16 11:47:17 sdaoden set messages: + msg131102
2011-03-15 13:43:40 sdaoden set files: + email_header.diff; messages: + msg130972
2011-03-15 04:21:21 r.david.murray set files: + emails_with_Headers.patch; messages: + msg130953
2011-02-22 12:11:14 r.david.murray set messages: + msg129055
2011-02-22 10:42:45 sdaoden set messages: + msg129046
2011-02-21 23:15:53 r.david.murray set priority: normal -> high
2011-02-20 07:13:12 r.david.murray set messages: + msg128896
2011-02-20 06:43:11 r.david.murray set type: crash -> behavior; messages: + msg128895
2011-02-19 16:35:45 sdaoden set messages: + msg128856
2011-02-19 16:19:09 sdaoden set type: behavior -> crash; messages: + msg128855
2011-02-19 16:02:03 sdaoden set files: - email_message.patch
2011-02-19 15:15:03 sdaoden set files: + email_message.2.patch; messages: + msg128848
2011-02-19 15:04:37 sdaoden set messages: + msg128847
2011-02-19 13:56:07 sdaoden set messages: + msg128846
2011-02-18 19:58:07 sdaoden set messages: + msg128808
2011-02-18 15:13:35 sdaoden set messages: + msg128787; components: + Library (Lib)
2011-02-18 14:58:58 r.david.murray set versions: + Python 3.3; title: email/message.py str conversion [patch] -> email/message.py str conversion; messages: + msg128784; assignee: r.david.murray; stage: patch review
2011-02-18 14:54:23 sdaoden set type: behavior; messages: + msg128783; versions: + Python 3.2
2011-02-18 14:53:19 sdaoden create