Created on 2014-02-06 14:44 by jason.coombs, last changed 2014-02-07 18:30 by r.david.murray. This issue is now closed.
|msg210390 - (view)||Author: Jason R. Coombs (jason.coombs) *||Date: 2014-02-06 14:44|
As reported in https://bitbucket.org/dholth/wheel/issue/104, the email.parser no longer accepts Unicode content as it did in 3.3. I searched the What's New and module documentation, but found no indication that this behavior is no longer supported, so it appears to be a regression. If it's an intentional change, the behavior should be documented in one of the aforementioned documents.

Consider this simple test case:

# -*- coding: utf-8 -*-
import email.parser
meta = """
Header: ☃
"""
email.parser.Parser().parsestr(meta)

Run that on Python 3.3.3 or Python 2 and it executes silently. Run it on Python 3.4.0b3 and it produces this traceback:

Traceback (most recent call last):
  File "C:\Users\jaraco\projects\public\wheel\test.py", line 6, in <module>
    email.parser.Parser().parsestr(meta)
  File "C:\Program Files\Python34\lib\email\parser.py", line 70, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "C:\Program Files\Python34\lib\email\parser.py", line 60, in parse
    return feedparser.close()
  File "C:\Program Files\Python34\lib\email\feedparser.py", line 170, in close
    self._call_parse()
  File "C:\Program Files\Python34\lib\email\feedparser.py", line 163, in _call_parse
    self._parse()
  File "C:\Program Files\Python34\lib\email\feedparser.py", line 449, in _parsegen
    self._cur.set_payload(EMPTYSTRING.join(lines))
  File "C:\Program Files\Python34\lib\email\message.py", line 311, in set_payload
    " payload") from None
TypeError: charset argument must be specified when non-ASCII characters are used in the payload
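A note on the test case above: because `meta` begins with a newline, the parser appears to treat that blank first line as the header/body separator, so `Header: ☃` ends up in the payload rather than in a header field. That would explain why the traceback ends in `set_payload`. A minimal sketch, assuming a Python version where parsing succeeds (3.3, or 3.4 once the check is reverted); `meta_no_blank` is a name introduced here purely for illustration:

```python
import email.parser

# "\u2603" is the snowman character used in the report.
meta = "\nHeader: \u2603\n"          # same content as the test case
meta_no_blank = "Header: \u2603\n"   # hypothetical variant, no leading blank line

parser = email.parser.Parser()
as_body = parser.parsestr(meta)
as_header = parser.parsestr(meta_no_blank)

# Leading blank line: no header fields are found; the text is the payload.
print(ascii(as_body.keys()), ascii(as_body.get_payload()))

# No leading blank line: the same text is parsed as a header field.
print(ascii(as_header.keys()), ascii(as_header["Header"]))
```

Either way the parser is handed non-ASCII text; the difference is only whether it lands in a header or in the body.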
|msg210396 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2014-02-06 14:56|
This was an intentional change, but I'm having second thoughts about it. I think I need to make it a deprecation warning in 3.4.

Note that it doesn't actually do anything useful in 3.3:

Python 3.3.2 (default, Dec 9 2013, 11:44:21)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import email.parser
>>> meta = """
... Header: ☃
... """
>>> m = email.parser.Parser().parsestr(meta)
>>> str(m)
'\nHeader: ☃\n'
>>> import email.generator
>>> import io
>>> s = io.BytesIO()
>>> g = email.generator.BytesGenerator(s)
>>> g.flatten(m)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/email/generator.py", line 112, in flatten
    self._write(msg)
  File "/usr/lib/python3.3/email/generator.py", line 177, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.3/email/generator.py", line 203, in _dispatch
    meth(msg)
  File "/usr/lib/python3.3/email/generator.py", line 421, in _handle_text
    super(BytesGenerator,self)._handle_text(msg)
  File "/usr/lib/python3.3/email/generator.py", line 233, in _handle_text
    self._write_lines(payload)
  File "/usr/lib/python3.3/email/generator.py", line 158, in _write_lines
    self.write(laststripped)
  File "/usr/lib/python3.3/email/generator.py", line 395, in write
    self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u2603' in position 8: ordinal not in range(128)

That is, if you pretend the message is a string, it will happily output it as a string, including perhaps your writing the output to a file as utf-8...but it will *NOT* be a valid email message, since it will have non-ascii data in it with no specified CTE.
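For contrast, here is a minimal sketch of what a payload with a declared charset looks like, assuming the legacy `email.message.Message` API. It is not the fix discussed in this issue, only an illustration of the "specified CTE" point; the variable names are made up for the example:

```python
import io
import email.generator
import email.message

text = "Header: \u2603\n"   # the non-ASCII payload from the session above

msg = email.message.Message()
# Passing a charset lets the email package add Content-Type and
# Content-Transfer-Encoding headers and encode the payload to match.
msg.set_payload(text, charset="utf-8")

buf = io.BytesIO()
email.generator.BytesGenerator(buf).flatten(msg)
wire = buf.getvalue()

# The serialized form is pure ASCII, so this decode does not raise.
print(msg["Content-Transfer-Encoding"])
print(wire.decode("ascii"))
```

With a charset given, `BytesGenerator` has an encoded body to write and no longer trips over the raw snowman character.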
|msg210397 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2014-02-06 14:59|
Ideally what should really happen here, I think, is for email to treat this as analogous to an SMTPUTF8 message. But I certainly don't have time to do that for the alpha :( So, yeah, I need to revert that check for 3.4.
|msg210398 - (view)||Author: Daniel Holth (dholth)||Date: 2014-02-06 15:04|
In bdist_wheel I've gone to some lengths to re-use the email module to parse and generate "RFC822 inspired" documents. The output is not a valid e-mail but it is useful. It is awkward to use the email module this way. We will sidestep the issue hopefully this year by switching to json.
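A rough sketch of the kind of reuse described above, assuming a Python where `parsestr` accepts non-ASCII text. The field names and values are illustrative only and are not taken from wheel's code:

```python
import email.parser

# Made-up "RFC822-inspired" metadata: header-style fields, a blank line,
# then free-form text in the body.
metadata_text = (
    "Metadata-Version: 2.0\n"
    "Name: example\n"
    "Summary: Snowman \u2603 toolkit\n"
    "\n"
    "Long description goes in the body.\n"
)

msg = email.parser.Parser().parsestr(metadata_text)

fields = dict(msg.items())        # header fields as a plain dict
description = msg.get_payload()   # everything after the blank line

print(ascii(fields))
print(ascii(description))
```

The result is convenient as a key/value store, but as noted it is not a valid email message, since nothing declares how the non-ASCII text is encoded.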
|msg210399 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2014-02-06 15:09|
The long-term goal is to make it not awkward to do the kind of thing you are doing, Daniel. The change I made was premature in hindsight; I need to comprehensively address "parsing unicode" instead. It sort-of works now, but only by accident. That said, it is the policy stuff that will really give you the flexibility to manipulate "rfc822-inspired" data, and that doesn't really help you, since you need to remain backward compatible with older Pythons.
|msg210401 - (view)||Author: Daniel Holth (dholth)||Date: 2014-02-06 15:17|
We do this: https://bitbucket.org/dholth/wheel/src/tip/wheel/pkginfo.py?at=default

I appreciate the long-term goal. The policy system is really neat. We are going to json largely because the next version of the metadata is more nested. The decision had nothing to do with the email module itself.
|msg210506 - (view)||Author: Roundup Robot (python-dev)||Date: 2014-02-07 18:06|
New changeset f942f1eddfea by R David Murray in branch 'default':
#20531: Revert e20f98a8ed71, the 3.4 version of the #19063 fix.
http://hg.python.org/cpython/rev/f942f1eddfea

New changeset ef8aaace85ca by R David Murray in branch 'default':
#20531: Apply the 3.3 version of the #19063 fix.
http://hg.python.org/cpython/rev/ef8aaace85ca
|msg210507 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2014-02-07 18:09|
OK, backward compatibility is restored. Hopefully I can fix the underlying problem right in 3.5 as part of issue 8489.
|msg210508 - (view)||Author: Jason R. Coombs (jason.coombs) *||Date: 2014-02-07 18:14|
Thanks David. I've confirmed the fix works (copying 'email' package over Python 3.4.0b3).
|2014-02-07 18:37:51||r.david.murray||link||issue20089 superseder|
|2014-02-07 18:30:31||r.david.murray||set||type: behavior|
|2014-02-07 18:14:15||jason.coombs||set||type: behavior -> (no value)|
resolution: fixed -> (no value)
messages: + msg210508
stage: resolved -> (no value)
|2014-02-07 18:09:33||r.david.murray||set||status: open -> closed|
messages: + msg210507
messages: + msg210506
|2014-02-06 15:17:48||dholth||set||messages: + msg210401|
|2014-02-06 15:09:45||r.david.murray||set||messages: + msg210399|
messages: + msg210398
|2014-02-06 14:59:37||r.david.murray||set||priority: normal -> release blocker|
nosy: + larry
messages: + msg210397
|2014-02-06 14:56:53||r.david.murray||set||messages: + msg210396|