
classification
Title: EMail generator.flatten() disintegrates over non-ascii multipart/alternative
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2, Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: r.david.murray
Nosy List: python-dev, r.david.murray, sdaoden, ysj.ray
Priority: normal
Keywords: patch

Created on 2011-03-19 10:55 by sdaoden, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
generator_booom.mbox: uploaded by sdaoden, 2011-03-19 10:55
parse_8bit_multipart.patch: uploaded by r.david.murray, 2011-04-05 20:35
Messages (11)
msg131406 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-19 10:55
Hi David, I'm having real problems here!
I have a multipart mail, and adding it to a mailbox gives me this:

______
Traceback (most recent call last):
  File "/Users/steffen/usr/bin/s-postman.py", line 1239, in save_ticket
    mb.add(ticket.message())
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/mailbox.py", line 279, in add
    self._dump_message(message, tmp_file)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/mailbox.py", line 215, in _dump_message
    gen.flatten(message)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 91, in flatten
    self._write(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 137, in _write
    self._dispatch(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 163, in _dispatch
    meth(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 224, in _handle_multipart
    g.flatten(part, unixfrom=False, linesep=self._NL)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 91, in flatten
    self._write(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 137, in _write
    self._dispatch(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 163, in _dispatch
    meth(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 388, in _handle_text
    super(BytesGenerator,self)._handle_text(msg)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 201, in _handle_text
    self.write(payload)
  File "/Users/steffen/usr/opt/py3k/lib/python3.3/email/generator.py", line 349, in write
    self._fp.write(s.encode('ascii', 'surrogateescape'))
Exception: UnicodeEncodeError: 'ascii' codec can't encode character '\xdf' in position 21: ordinal not in range(128)
______

I'll attach that mail.  I don't know whether your version of
my postman reproduces it (it should, but hey, it's broken!);
I'm using tag 0.4.0a1 (revision 420fcd870797).
I would be thankful for a hint on what this is and how I can
avoid it.
msg131408 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-19 12:22
Just want to point out that, to my knowledge, the mail is absolutely
correct with respect to classification and content.
BytesGenerator tries to convert a UTF-8 message (which effectively
contains Latin-1 data in the text part) to ASCII data and
fails on the first (correctly Latin-1-encoded) non-ASCII character.
Maybe the behaviour addressed here belongs to #11216?
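
A minimal sketch of why the final `write()` call in the traceback can fail, assuming the payload has at some point been turned into a real Latin-1-decoded string instead of staying in surrogate-escaped form. The variable names are illustrative and are not taken from the email package:

```python
# Bytes as they would appear in the mail: "Gruß" encoded as ISO-8859-1.
raw = b"Gru\xdf"

# Surrogate-escaped form: the non-ASCII byte is carried as the lone
# surrogate U+DCDF inside an otherwise ASCII string.
escaped = raw.decode("ascii", "surrogateescape")

# Fully decoded form: the byte has become the real character U+00DF.
converted = raw.decode("iso-8859-1")

# The surrogate-escaped form maps back to the original bytes ...
round_trip = escaped.encode("ascii", "surrogateescape")

# ... but U+00DF is not a surrogate, so the error handler cannot rescue
# it and the ASCII codec raises, as in the traceback above.
try:
    converted.encode("ascii", "surrogateescape")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)
```

This only illustrates the codec behaviour; it does not by itself show where in the library the conversion happens.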
msg131409 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-03-19 12:46
By the way, instead of using my postman you can also
reuse the material from #11401.
To reproduce the crash with that instead of generator_booom.mbox,
simply change any character in the text/plain part to
a valid Latin-1 (charset=ISO-8859-1) character;
I personally changed the second character of 'You' to

00F6;LATIN SMALL LETTER O WITH DIAERESIS;Ll;0;L;006F 0308;;;;N;LATIN SMALL LETTER O DIAERESIS;;00D6;;00D6
msg131420 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-19 14:22
I can reproduce this using just message_from_binary_file and BytesGenerator on your input file, so thanks for attaching the email.  I have a test in the test suite that is *supposed* to cover this, but clearly there is a case here that is not being tested.  It looks like there is something about multipart that my code isn't handling.  I'll look into it.  (I'm very annoyed with myself that I didn't put in a test case for this to start with.)
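
The reproduction described above could look roughly like the following sketch. The inline message is a hypothetical stand-in for generator_booom.mbox (whose contents are not shown here), and an in-memory `io.BytesIO` replaces the real file. On an interpreter that includes the fix, the flatten step is expected to succeed and keep the 8-bit bytes; on an affected 3.2/3.3 build it would presumably raise the UnicodeEncodeError shown in the traceback.

```python
import io
from email import message_from_binary_file
from email.generator import BytesGenerator

# Hypothetical multipart/alternative message whose text parts carry raw
# ISO-8859-1 bytes (0xDF is "ß") with an 8bit transfer encoding.
raw = (
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/alternative; boundary="BOUNDARY"\n'
    b"\n"
    b"--BOUNDARY\n"
    b'Content-Type: text/plain; charset="iso-8859-1"\n'
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Gru\xdf\n"
    b"--BOUNDARY\n"
    b'Content-Type: text/html; charset="iso-8859-1"\n'
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"<p>Gru\xdf</p>\n"
    b"--BOUNDARY--\n"
)

# Parse from a binary file-like object, then write the message back out
# as bytes.
msg = message_from_binary_file(io.BytesIO(raw))
out = io.BytesIO()
BytesGenerator(out).flatten(msg)
flattened = out.getvalue()
```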
msg133041 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-05 15:25
ysj.ray, this crude workaround does the job for me so far
(self._msg is a Message generated by BytesParser()):


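            # Only multipart messages need the workaround.
            if not self._msg.is_multipart():
                return
            topmost = True
            for part in self._msg.walk():
                # walk() yields the container itself first; skip it.
                if topmost:
                    topmost = False
                    continue
                ct = part.get_content_type()
                if not ct.startswith('text'):
                    continue

                try:
                    payload = part.get_payload()
                    charset = part.get_param('charset')
                    if charset is not None:
                        # Drop the old header so that set_payload() can
                        # pick a transfer encoding for the charset anew.
                        del part['content-transfer-encoding']
                        part.set_payload(payload, charset)
                except:
                    # The handler body is cut off in the original post;
                    # `pass` is only a placeholder to keep this valid.
                    pass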


Note you can't simply use encoders because those break
on byte messages.
(But it would be cool if you see quopri and base64 fail
and open issues for that!)

Have fun,
Steffen
msg133042 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-05 15:28
I'm working on this.  It appears to be a bug in the bytes parser, rather than the generator.
msg133044 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-05 15:29
Although there's a (different) bug in the generator, too, I think :)
msg133084 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-05 20:35
Here is a patch against 3.2, with test.  Simple fix, but it took me a while to track down the critical piece of code.
msg133119 - Author: Steffen Daode Nurpmeso (sdaoden) Date: 2011-04-06 11:00
On Tue, Apr 05, 2011 at 08:35:22PM +0000, R. David Murray wrote:
> Simple fix, but it took me a while to track down the critical piece of code.

I've really tried to break it, but I can't.
msg133121 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-06 12:16
New changeset b807cf929e26 by R David Murray in branch '3.2':
#11605: don't use set/get_payload in feedparser; they do conversions.
http://hg.python.org/cpython/rev/b807cf929e26

New changeset 642c0d6799c5 by R David Murray in branch 'default':
Merge #11605: don't use set/get_payload in feedparser; they do conversions.
http://hg.python.org/cpython/rev/642c0d6799c5
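
A small sketch of the kind of conversion the changeset message refers to, as far as it can be observed from outside the parser. It compares what `get_payload()` returns with what is stored on the message; `_payload` is a private CPython implementation detail and is read here only for illustration, and the sample message is hypothetical:

```python
from email import message_from_bytes

# Hypothetical single-part message with a raw ISO-8859-1 byte (0xDF).
raw = (
    b'Content-Type: text/plain; charset="iso-8859-1"\n'
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Gru\xdf\n"
)
msg = message_from_bytes(raw)

# Stored form: the byte survives as the lone surrogate U+DCDF.
stored = msg._payload

# get_payload() decodes using the declared charset, so the caller sees
# the real character U+00DF instead of the surrogate.
converted = msg.get_payload()
```

Writing `converted` back with `set_payload()` would replace the surrogate-escaped data with a real non-ASCII string, which is the situation in which the generator's ASCII encoding step fails.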
msg133122 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-06 12:17
Thanks for the testing.
History
Date User Action Args
2022-04-11 14:57:15 admin set github: 55814
2011-04-06 12:17:52 r.david.murray set status: open -> closed; resolution: fixed; messages: + msg133122; stage: patch review -> resolved
2011-04-06 12:16:42 python-dev set nosy: + python-dev; messages: + msg133121
2011-04-06 11:00:32 sdaoden set messages: + msg133119
2011-04-05 20:35:21 r.david.murray set files: + parse_8bit_multipart.patch; keywords: + patch; messages: + msg133084; stage: needs patch -> patch review
2011-04-05 15:29:32 r.david.murray set messages: + msg133044
2011-04-05 15:28:08 r.david.murray set messages: + msg133042
2011-04-05 15:25:47 sdaoden set messages: + msg133041
2011-04-05 12:02:02 ysj.ray set nosy: + ysj.ray
2011-03-19 14:22:53 r.david.murray set messages: + msg131420; stage: needs patch
2011-03-19 14:09:59 r.david.murray set assignee: r.david.murray
2011-03-19 12:46:14 sdaoden set title: EMail generator.flatten() disintegrates over UTF-8 multipart/alternative -> EMail generator.flatten() disintegrates over non-ascii multipart/alternative; messages: + msg131409; versions: + Python 3.2
2011-03-19 12:22:32 sdaoden set messages: + msg131408
2011-03-19 10:55:55 sdaoden create