classification
Title: email.utils.formataddr() should be rfc2047 aware
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: r.david.murray Nosy List: barry, python-dev, r.david.murray, terry.reedy, torsten.becker
Priority: normal Keywords: easy, patch

Created on 2007-03-29 13:27 by barry, last changed 2011-04-07 09:31 by torsten.becker. This issue is now closed.

Files
File name Uploaded Description Edit
issue-1690608.patch torsten.becker, 2011-03-27 13:44 Contains a test and fix for this issue review
issue-1690608-v2.patch torsten.becker, 2011-03-27 17:00 Switched to email.charset.Charset, added more tests review
issue-1690608-v3.patch torsten.becker, 2011-03-27 21:53 Optional arg can be str or Charset, trimmed down tests, added docs. review
issue-1690608-v4.patch torsten.becker, 2011-03-28 09:56 Checking for instance of str instead of Charset, added 2 more tests for that review
Messages (13)
msg31677 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2007-03-29 13:27
formataddr() should rfc2047 encode its name argument if necessary.
msg113048 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-05 20:43
I am just responding so this will not show up on the 'unanswered issues' list.
msg132320 - (view) Author: Torsten Becker (torsten.becker) Date: 2011-03-27 13:44
I implemented a basic test for the issue and an attempt for a fix.

I am not entirely sure with my implementation, specifically I would like to get comments concerning the following points:

  - Is is OK that formataddr() will now check if address is ascii safe and if not it will raise a UnicodeEncodeError?
  
  - I was not sure on the style how to append new tests to test_email.py, I just put it into the same spot where all the other formataddr() tests where, shall I put it to the end instead?


I am submitting this patch as part of my preparation for the Google Summer of Code to familiarize myself with the contribution process, any feedback on what I should do different is very welcome.
msg132331 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-27 14:28
The general approach of the patch looks good to me.  Since formataddr is designed to be called from user code that is constructing a message, having it raise for non-ascii in the address is probably OK.  However, there should be a test for that, and I'm curious to know what happens if you use such an address in an address field in the unmodified email package.

Instead of directly calling bencode, you should use the charset module and its header_encode method.  Note that you need to turn the charset into a Charset instance first.  The advantage of doing this is that it will choose the "best" encoding to use based on the charset and the contents of the string.

Your choice of location for the new tests is fine; TestMiscelaneous really should be split up a bit, but that will wait until I do a general refactoring of the tests.

Thanks for working on this.
msg132351 - (view) Author: Torsten Becker (torsten.becker) Date: 2011-03-27 17:00
> However, there should be a test for that, and I'm curious to know what happens if you use such an address in an address field in the unmodified email package.

I added a test to check if the exceptions get thrown when a address is invalid.

I also added a small test to check how a resulting message should look, it looks good to me but I am not a specialist with email.  Do you have any other ideas how to check if it does not have a negative impact to other parts of the module?


> Instead of directly calling bencode, you should use the charset module and its header_encode method.  Note that you need to turn the charset into a Charset instance first.  The advantage of doing this is that it will choose the "best" encoding to use based on the charset and the contents of the string.

The code also uses email.charset.Charset now.
msg132356 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-27 18:51
You should check if 'charset' is a string, and call Charset on it only if it is (a Charset may be passed directly in other email package interfaces, and so should be supported here as well.

The test doesn't need to cater for the fact that either b or B (or q or Q) are legitimate: we know which one the package is generating, so just test for that.

For the Message['To'], I wasn't clear.  What I would like is a test that includes non-ascii characters in the address part, *without* passing it through formataddr, to see what the package currently does with it.  This may in fact reveal an additional bug.  But, it is really out of scope for this issue, so you can just remove that test (sorry).

There should also be an update to the docs (Doc/library/email.utils.rst) documenting the API change.
msg132367 - (view) Author: Torsten Becker (torsten.becker) Date: 2011-03-27 21:53
I incorporated the changes as you suggested and added the text to the docs.  Just out of curiosity, why are the docs repeated in email.util.rst when they are already in the docstrings?
msg132381 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-28 01:00
Thanks.  Looks good except that it should check isinstance(string) rather than isinstance(Charset), that way someone can pass a custom class that implements the Charset API if they want.  (Alternatively, the check could be for an encode_header method...actually that might be better, although it isn't what the other email modules do.)

The doc strings are an abreviated version of what is in the docstrings, and the text is often not-quite-equivalent even when it is not a strict subset of the docs.  We believe it produces higher quality documentation to maintain them separately and tune each one for its intended use case (though this does mean that they occasionally get out of sync due to oversights).
msg132389 - (view) Author: Torsten Becker (torsten.becker) Date: 2011-03-28 09:56
I incorporated that change as well.  My rationale behind the previous version was to be consistent with how Lib/email/header.py handled this, unfortunately I did not look around in the other classes and didn't think about that kind of compatibility.

When formataddr() is called with a object which is not a string and which does not have a header_encode it will raise the following exception now:

> AttributeError: 'CharsetMock' object has no attribute 'header_encode'

Thank you for your patience, sorry that it took probably more of your time by taking 4 iterations for this patch than if you had just implemented it yourself.
msg132390 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-28 10:46
Ah, yes.  Header is probably wrong there, I should fix that at some point.

Sorry for the misytpes in my last message (it was late at night for me when I wrote it :)

As for time, it probably didn't take any more time than it would have to write it myself, and the end product is almost certainly better for having had two sets of eyes on it.  This kind of back and forth often happens even when it is an experienced developer writing the patch.  

But even if neither of those were true it would be worthwhile to do it in order to support you in learning to contribute.  Thanks again for working on this, and I'll probably commit it some time today.
msg133129 - (view) Author: Roundup Robot (python-dev) Date: 2011-04-06 13:52
New changeset 184ddd9acd5a by R David Murray in branch 'default':
#1690608: make formataddr RFC2047 aware.
http://hg.python.org/cpython/rev/184ddd9acd5a
msg133131 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-06 14:04
Finally got around to committing this; thanks, Torsten.  As a reward, I'm going to make you nosy on a new, related issue I'm about to create.  It is, of course, your option whether you want to work on it :)

By the way, have you submitted a contributor agreement?  This patch isn't really big enough to require one, but having one on file is always a good idea, especially if you are going to keep contributing (and I hope you do).
msg133202 - (view) Author: Torsten Becker (torsten.becker) Date: 2011-04-07 09:31
Hi David, thank you for polishing up the patch and committing it. :)
I am glad I could help and I was actually about to ask you if you knew
any follow-up issues.  I'll definitely continue contributing as time
allows.  I did not submit the agreement yet, but I'll look into that
ASAP.
History
Date User Action Args
2011-04-07 09:31:56torsten.beckersetmessages: + msg133202
2011-04-06 14:04:35r.david.murraysetstatus: open -> closed
resolution: accepted
messages: + msg133131

stage: commit review -> resolved
2011-04-06 13:52:23python-devsetnosy: + python-dev
messages: + msg133129
2011-03-28 10:46:31r.david.murraysetversions: - Python 3.1, Python 2.7, Python 3.2
2011-03-28 10:46:13r.david.murraysetmessages: + msg132390
stage: test needed -> commit review
2011-03-28 09:56:56torsten.beckersetfiles: + issue-1690608-v4.patch

messages: + msg132389
2011-03-28 01:00:27r.david.murraysetmessages: + msg132381
2011-03-27 21:53:50torsten.beckersetfiles: + issue-1690608-v3.patch

messages: + msg132367
2011-03-27 18:51:49r.david.murraysetmessages: + msg132356
2011-03-27 17:00:43torsten.beckersetfiles: + issue-1690608-v2.patch

messages: + msg132351
2011-03-27 14:28:54r.david.murraysetmessages: + msg132331
2011-03-27 13:44:28torsten.beckersetfiles: + issue-1690608.patch

nosy: + torsten.becker
messages: + msg132320

keywords: + patch
2011-03-13 22:42:29r.david.murraysetversions: + Python 3.1, Python 2.7, Python 3.3
2010-08-05 20:43:38terry.reedysetnosy: + terry.reedy
messages: + msg113048
2010-07-11 14:46:27BreamoreBoysetversions: + Python 3.2, - Python 3.1, Python 2.7
2010-05-05 13:45:45barrysetassignee: barry -> r.david.murray

nosy: + r.david.murray
2009-04-22 05:08:47ajaksu2setkeywords: + easy
2009-03-30 22:10:10ajaksu2setstage: test needed
type: enhancement
versions: + Python 3.1, Python 2.7, - Python 2.6
2007-03-29 13:27:49barrycreate