email.utils.formataddr() should be rfc2047 aware #44784

Closed
warsaw opened this issue Mar 29, 2007 · 13 comments
Labels: easy; stdlib (Python modules in the Lib dir); type-feature (A feature request or enhancement)

Comments

@warsaw
Member

warsaw commented Mar 29, 2007

BPO 1690608
Nosy @warsaw, @terryjreedy, @bitdancer
Files
  • issue-1690608.patch: Contains a test and fix for this issue
  • issue-1690608-v2.patch: Switched to email.charset.Charset, added more tests
  • issue-1690608-v3.patch: Optional arg can be str or Charset, trimmed down tests, added docs.
  • issue-1690608-v4.patch: Checking for instance of str instead of Charset, added 2 more tests for that
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/bitdancer'
    closed_at = <Date 2011-04-06.14:04:35.924>
    created_at = <Date 2007-03-29.13:27:49.000>
    labels = ['easy', 'type-feature', 'library']
    title = 'email.utils.formataddr() should be rfc2047 aware'
    updated_at = <Date 2011-04-07.09:31:56.092>
    user = 'https://github.com/warsaw'

    bugs.python.org fields:

    activity = <Date 2011-04-07.09:31:56.092>
    actor = 'torsten.becker'
    assignee = 'r.david.murray'
    closed = True
    closed_date = <Date 2011-04-06.14:04:35.924>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2007-03-29.13:27:49.000>
    creator = 'barry'
    dependencies = []
    files = ['21429', '21431', '21434', '21436']
    hgrepos = []
    issue_num = 1690608
    keywords = ['patch', 'easy']
    message_count = 13.0
    messages = ['31677', '113048', '132320', '132331', '132351', '132356', '132367', '132381', '132389', '132390', '133129', '133131', '133202']
    nosy_count = 5.0
    nosy_names = ['barry', 'terry.reedy', 'r.david.murray', 'python-dev', 'torsten.becker']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1690608'
    versions = ['Python 3.3']

    @warsaw
    Member Author

    warsaw commented Mar 29, 2007

    formataddr() should rfc2047 encode its name argument if necessary.
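To make the request concrete: a display name containing non-ASCII characters has to travel as an RFC 2047 encoded-word to be legal in a header. The sketch below assumes Python 3.3 or later, where the change discussed in this issue is available. The name and address are sample values invented for illustration.

```python
from email.header import decode_header
from email.utils import formataddr, parseaddr

# Sample values, invented for illustration.
name = "Jörg Müller"
address = "joerg@example.com"

# The non-ASCII name comes back as an RFC 2047 encoded-word
# (the charset defaults to utf-8).
formatted = formataddr((name, address))
print(formatted)

# Round trip: split the header value, then decode the encoded-word.
encoded_name, parsed_address = parseaddr(formatted)
raw_bytes, charset = decode_header(encoded_name)[0]
decoded_name = raw_bytes.decode(charset)
print(decoded_name, parsed_address)
```

A name that is already pure ASCII is left alone (apart from quoting of special characters), so existing callers should see no difference.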

    @warsaw warsaw self-assigned this Mar 29, 2007
    @warsaw warsaw added the stdlib Python modules in the Lib dir label Mar 29, 2007
    @devdanzin devdanzin mannequin added type-feature A feature request or enhancement labels Mar 30, 2009
    @devdanzin devdanzin mannequin added the easy label Apr 22, 2009
    @warsaw warsaw assigned bitdancer and unassigned warsaw May 5, 2010
    @terryjreedy
    Member

    I am just responding so this will not show up on the 'unanswered issues' list.

    @torstenbecker
    Mannequin

    torstenbecker mannequin commented Mar 27, 2011

    I implemented a basic test for the issue and an attempt for a fix.

    I am not entirely sure with my implementation, specifically I would like to get comments concerning the following points:

    • Is it OK that formataddr() will now check whether the address is ASCII-safe and raise a UnicodeEncodeError if it is not?

    • I was not sure about the convention for adding new tests to test_email.py, so I put mine in the same spot as all the other formataddr() tests. Should I put them at the end instead?

    I am submitting this patch as part of my preparation for the Google Summer of Code, to familiarize myself with the contribution process. Any feedback on what I should do differently is very welcome.

    @bitdancer
    Member

    The general approach of the patch looks good to me. Since formataddr is designed to be called from user code that is constructing a message, having it raise for non-ascii in the address is probably OK. However, there should be a test for that, and I'm curious to know what happens if you use such an address in an address field in the unmodified email package.
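A small sketch of the behaviour being discussed, assuming Python 3.3 or later: the address part must be ASCII, so a non-ASCII address is rejected before any formatting happens. The address below is a hypothetical value chosen to trigger the error.

```python
from email.utils import formataddr

# Hypothetical address with a non-ASCII local part, for illustration only.
bad_address = "jörg@example.com"

try:
    formataddr(("Joerg", bad_address))
except UnicodeError as exc:
    # UnicodeEncodeError is a subclass of UnicodeError, so this
    # catches the error described in the patch.
    error = exc
else:
    error = None

print(type(error).__name__)
```

Catching the parent class UnicodeError keeps calling code independent of which specific subclass is raised.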

    Instead of directly calling bencode, you should use the charset module and its header_encode method. Note that you need to turn the charset into a Charset instance first. The advantage of doing this is that it will choose the "best" encoding to use based on the charset and the contents of the string.
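As a rough illustration of that suggestion, the snippet below calls header_encode on Charset instances built from charset names. Which transfer encoding each charset uses is decided by the email.charset module, not by the caller. The sample name is invented.

```python
from email.charset import Charset
from email.header import decode_header

name = "Häns Würst"  # sample name, invented for illustration

encoded = {}
decoded = {}
for charset_name in ("utf-8", "iso-8859-1"):
    # Turn the charset name into a Charset instance first.
    charset = Charset(charset_name)
    # header_encode picks the header encoding configured for this charset.
    encoded[charset_name] = charset.header_encode(name)
    # Decode again to confirm the encoded-word round-trips.
    raw_bytes, reported = decode_header(encoded[charset_name])[0]
    decoded[charset_name] = raw_bytes.decode(reported)
    print(charset_name, encoded[charset_name])
```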

    Your choice of location for the new tests is fine; TestMiscellaneous really should be split up a bit, but that will wait until I do a general refactoring of the tests.

    Thanks for working on this.

    @torstenbecker
    Mannequin

    torstenbecker mannequin commented Mar 27, 2011

    > However, there should be a test for that, and I'm curious to know what happens if you use such an address in an address field in the unmodified email package.

    I added a test to check that the exception is raised when an address is invalid.

    I also added a small test to check how a resulting message should look. It looks good to me, but I am not a specialist in email. Do you have any other ideas for checking that the change does not have a negative impact on other parts of the module?

    > Instead of directly calling bencode, you should use the charset module and its header_encode method. Note that you need to turn the charset into a Charset instance first. The advantage of doing this is that it will choose the "best" encoding to use based on the charset and the contents of the string.

    The code also uses email.charset.Charset now.

    @bitdancer
    Member

    You should check if 'charset' is a string, and call Charset on it only if it is (a Charset may be passed directly in other email package interfaces, and so should be supported here as well).
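A short sketch of what this means for callers, assuming Python 3.3 or later: the optional charset argument of formataddr() can be given either as a charset name or as a Charset instance, and both spellings should give the same result. The name and address are sample values.

```python
from email.charset import Charset
from email.utils import formataddr

# Sample values, invented for illustration.
pair = ("Häns Würst", "person@example.com")

# Charset given as a plain string.
from_name = formataddr(pair, "iso-8859-1")

# Charset given as an already constructed Charset instance.
from_instance = formataddr(pair, Charset("iso-8859-1"))

print(from_name)
print(from_name == from_instance)
```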

    The test doesn't need to cater for the fact that either b or B (or q or Q) are legitimate: we know which one the package is generating, so just test for that.

    For the Message['To'], I wasn't clear. What I would like is a test that includes non-ascii characters in the address part, *without* passing it through formataddr, to see what the package currently does with it. This may in fact reveal an additional bug. But, it is really out of scope for this issue, so you can just remove that test (sorry).

    There should also be an update to the docs (Doc/library/email.utils.rst) documenting the API change.

    @torstenbecker
    Mannequin

    torstenbecker mannequin commented Mar 27, 2011

    I incorporated the changes as you suggested and added the text to the docs. Just out of curiosity, why are the docs repeated in email.utils.rst when they are already in the docstrings?

    @bitdancer
    Member

    Thanks. Looks good, except that it should check isinstance(charset, str) rather than isinstance(charset, Charset); that way someone can pass a custom class that implements the Charset API if they want. (Alternatively, the check could be for a header_encode method... actually that might be better, although it isn't what the other email modules do.)
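To illustrate why checking for a string is the more permissive test, here is a sketch with a duck-typed object that is not a Charset subclass. RecordingCharset is a hypothetical class written for this example; it only provides the header_encode method that formataddr() needs. The sketch assumes Python 3.3 or later.

```python
from email.charset import Charset
from email.utils import formataddr


class RecordingCharset:
    """Hypothetical duck-typed charset; deliberately not a Charset subclass."""

    def __init__(self, name):
        self._inner = Charset(name)
        self.seen = []

    def header_encode(self, string):
        # Record what was encoded, then delegate to a real Charset.
        self.seen.append(string)
        return self._inner.header_encode(string)


pair = ("Häns Würst", "person@example.com")  # sample values
custom = RecordingCharset("utf-8")

# Because the object is not a str, it is used as-is.
result = formataddr(pair, custom)
print(result)
print(custom.seen)
```

A check for isinstance(charset, Charset) would have rejected or mishandled such an object, which is the compatibility concern raised above.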

    The docstrings are an abbreviated version of what is in the docs, and the text is often not-quite-equivalent even when it is not a strict subset of the docs. We believe it produces higher quality documentation to maintain them separately and tune each one for its intended use case (though this does mean that they occasionally get out of sync due to oversights).

    @torstenbecker
    Mannequin

    torstenbecker mannequin commented Mar 28, 2011

    I incorporated that change as well. My rationale behind the previous version was to be consistent with how Lib/email/header.py handled this, unfortunately I did not look around in the other classes and didn't think about that kind of compatibility.

    When formataddr() is called with an object that is not a string and does not have a header_encode method, it will now raise the following exception:

    AttributeError: 'CharsetMock' object has no attribute 'header_encode'
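The failure mode can be reproduced with a minimal stand-in, assuming Python 3.3 or later. CharsetMock here is an empty class named after the one in the traceback; its real definition in the patch's tests is not shown in this thread, so this version is an assumption.

```python
from email.utils import formataddr


class CharsetMock:
    """Stand-in with no header_encode method (assumed shape, see lead-in)."""


try:
    # The name must be non-ASCII, otherwise the charset is never consulted.
    formataddr(("Häns Würst", "person@example.com"), CharsetMock())
except AttributeError as exc:
    message = str(exc)
else:
    message = None

print(message)
```

With a pure-ASCII name the same call succeeds, because no encoding step is needed and the charset object is never touched.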

    Thank you for your patience. Sorry that taking 4 iterations for this patch probably cost you more time than implementing it yourself would have.

    @bitdancer
    Member

    Ah, yes. Header is probably wrong there, I should fix that at some point.

    Sorry for the mistypes in my last message (it was late at night for me when I wrote it :)

    As for time, it probably didn't take any more time than it would have to write it myself, and the end product is almost certainly better for having had two sets of eyes on it. This kind of back and forth often happens even when it is an experienced developer writing the patch.

    But even if neither of those were true it would be worthwhile to do it in order to support you in learning to contribute. Thanks again for working on this, and I'll probably commit it some time today.

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 6, 2011

    New changeset 184ddd9acd5a by R David Murray in branch 'default':
    bpo-1690608: make formataddr RFC2047 aware.
    http://hg.python.org/cpython/rev/184ddd9acd5a

    @bitdancer
    Member

    Finally got around to committing this; thanks, Torsten. As a reward, I'm going to make you nosy on a new, related issue I'm about to create. It is, of course, your option whether you want to work on it :)

    By the way, have you submitted a contributor agreement? This patch isn't really big enough to require one, but having one on file is always a good idea, especially if you are going to keep contributing (and I hope you do).

    @torstenbecker
    Mannequin

    torstenbecker mannequin commented Apr 7, 2011

    Hi David, thank you for polishing up the patch and committing it. :)
    I am glad I could help and I was actually about to ask you if you knew
    any follow-up issues. I'll definitely continue contributing as time
    allows. I did not submit the agreement yet, but I'll look into that
    ASAP.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022