Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

email.utils.parseaddr returns garbage for invalid input #53532

Closed
merwok opened this issue Jul 17, 2010 · 12 comments
Closed

email.utils.parseaddr returns garbage for invalid input #53532

merwok opened this issue Jul 17, 2010 · 12 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@merwok
Copy link
Member

merwok commented Jul 17, 2010

BPO 9286
Nosy @warsaw, @merwok, @bitdancer
Files
  • preserve_unquoted_white_space_in_local_part.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/bitdancer'
    closed_at = <Date 2010-12-18.18:31:42.789>
    created_at = <Date 2010-07-17.14:35:58.131>
    labels = ['type-bug', 'library']
    title = 'email.utils.parseaddr returns garbage for invalid input'
    updated_at = <Date 2010-12-19.23:53:17.339>
    user = 'https://github.com/merwok'

    bugs.python.org fields:

    activity = <Date 2010-12-19.23:53:17.339>
    actor = 'eric.araujo'
    assignee = 'r.david.murray'
    closed = True
    closed_date = <Date 2010-12-18.18:31:42.789>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2010-07-17.14:35:58.131>
    creator = 'eric.araujo'
    dependencies = []
    files = ['20025']
    hgrepos = []
    issue_num = 9286
    keywords = ['patch']
    message_count = 12.0
    messages = ['110561', '117813', '120983', '120985', '123856', '124072', '124078', '124080', '124086', '124304', '124323', '124371']
    nosy_count = 3.0
    nosy_names = ['barry', 'eric.araujo', 'r.david.murray']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue9286'
    versions = ['Python 3.2']

    @merwok
    Copy link
    Member Author

    merwok commented Jul 17, 2010

    This behavior does not seem right to me:

    parsing 'merwok'
    expected ('merwok', '')
    got ('', 'merwok')

    parsing 'merwok wok@rusty'
    expected ('', 'wok@rusty')
    got ('', 'merwokwok@rusty')

    (Generated with a small script just doing a loop and prints, not attached because boring.)

    Are my expectations wrong? I don’t know if a string like “merwok” in my first example is a legal address in the relevant RFCs; Mark Sapiro replied in msg110556 that it could be consistent with most MUAs/MTAs.

    I don’t know either if the folding done in the second example is okay; I’d like an exception here, or if parseaddr is designed to never fail, empty strings to indicate failure. I’m also okay with “garbage in, garbage out” as answer.

    @merwok merwok added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jul 17, 2010
    @bitdancer
    Copy link
    Member

    In the first of your examples, parseaddr is correct (a lone token is considered a 'local' address per RFC).

    The second one is prossibly wrong, but if so the correct way to interpret it is not clear. If you read the RFC carefully (http://tools.ietf.org/html/rfc5322#section-4.4), spaces are allowed between the 'local part' and the domain in obsolete syntax (which must be accepted). However, the space being elided here is between pieces of the local part. Note that because the address is not in '<>', the whole string is the address, there's no name field. The "correct" parse could be:

    ('', '"merwok wok"@Rusty')

    That is, we apply a 'be generous in what you accept' rule and assume the "s were forgotten. However, perhaps a more sensible 'generous' rule would be to assume the '<>' were forgotten and return

    ('merwok', 'wok@rusty')

    However, it is quite possible that the reason the space is being elided here has to do with handling the obsolete 'route' syntax. If that is the case then parseaddr is probably correct. It may be a while before I get around to understanding that part of the spec well enough to render a judgement, so in the meantime I'll assume parseaddr is correct. Feel free to read the spec and render your own opinion :)

    @merwok
    Copy link
    Member Author

    merwok commented Nov 11, 2010

    Having no time to read email RFCs, I’ll defer to you here. Please reject this report or save it for later as you prefer.

    @bitdancer
    Copy link
    Member

    In connection with another bug report I found a rather basic error in parseaddr, so I'm going to eventually dig far enough into the RFC to have a real opinion on the elided-space issue.

    @bitdancer
    Copy link
    Member

    OK, I've studied this more, and it looks to me like the legacy address format allows multiple atoms separated by white space in the local part of the address. This means that the correct parse would be

    ('', 'merwok wok@rusty.com')

    How useful this parse is is a good question. It is arguably better than losing the white space; however, the fact that it represents a behavior change and there's no actual user bug against this argues against backport. I do think it is better to conform to the RFC as much as possible, though, so I'd like to fix this in 3.2.

    Attached is a patch to the parser that preserves whitespace runs in between unquoted atoms in the local part.

    It would be interesting to know what other email programs do with such addresses.

    @merwok
    Copy link
    Member Author

    merwok commented Dec 15, 2010

    I have not read email RFCs, so I will defer to you. One suggestion for the patch, though: Use example.org instead of rusty.com (see RFC 2606).

    I tried the examples in Icedove (free Thunderbird), either it finds a matching contact or it refuses to send the message.

    @bitdancer
    Copy link
    Member

    I don't see any reason to use example.com in tests that are not talking to the network and aren't documentation.

    The interesting question about the other mailers is, if you *receive* an email with such an address (1) what does it show you and (2) what does it put into the To: field when you do a 'reply'? How you arrange to receive such a broken email, I'm not sure :)

    @bitdancer
    Copy link
    Member

    On the other hand, putting a real domain name that belongs to somebody else into our code base even as a test string is probably impolite without asking, so I'll change it when I commit.

    @merwok
    Copy link
    Member Author

    merwok commented Dec 15, 2010

    Yes, it’s either impolite or free advertisement.

    Ideas to receive such a malformed email: Use a valid email in From but not in Reply-To; write it by hand and put it in your maildir.

    @bitdancer
    Copy link
    Member

    Committed in r87384.

    Barry, I've added you as nosy in case you disagree with this fix. The essential point is that before, parseaddr would turn 'merwok wok@example.com' into 'merwokwok@example.com', and now it preserves the whitespace. My theory is that this loses data, that the obsolete syntax allows it, and that dropping the whitespace denies the application program the chance to apply its own heuristics. However applications might currently be depending on the parsed local part being a single token, so I don't plan to backport this.

    @warsaw
    Copy link
    Member

    warsaw commented Dec 18, 2010

    On Dec 18, 2010, at 06:31 PM, R. David Murray wrote:

    Barry, I've added you as nosy in case you disagree with this fix. The
    essential point is that before, parseaddr would turn 'merwok wok@example.com'
    into 'merwokwok@example.com', and now it preserves the whitespace. My theory
    is that this loses data, that the obsolete syntax allows it, and that
    dropping the whitespace denies the application program the chance to apply
    its own heuristics. However applications might currently be depending on the
    parsed local part being a single token, so I don't plan to backport this.

    Thanks. I agree with the fix, and not back porting it.

    @merwok
    Copy link
    Member Author

    merwok commented Dec 19, 2010

    Thanks for the explanations and fix!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants