Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC 6531 (SMTPUTF8) support in smtpd #65924

Closed
bitdancer opened this issue Jun 11, 2014 · 20 comments
Closed

RFC 6531 (SMTPUTF8) support in smtpd #65924

bitdancer opened this issue Jun 11, 2014 · 20 comments
Labels
topic-email type-feature A feature request or enhancement

Comments

@bitdancer
Copy link
Member

BPO 21725
Nosy @warsaw, @pitrou, @bitdancer, @serhiy-storchaka, @zvyn
Files
  • issue21725.patch
  • issue21725-fixed.patch
  • issue21725-fixed-hg.patch
  • issue21725v2.patch
  • issue21725v3.patch
  • issue21725v4.patch
  • issue21725v5.patch
  • issue21725v5.1.patch: DebuggingServer is now fully tested.
  • issue21725v5.2.patch: Fixed (false positive) BytesWarning by using repr instead of str.
  • issue21725v5.3.patch
  • issue21725v5.4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-08-09.20:42:54.173>
    created_at = <Date 2014-06-11.16:42:30.960>
    labels = ['type-feature', 'expert-email']
    title = 'RFC 6531 (SMTPUTF8) support in smtpd'
    updated_at = <Date 2014-08-12.10:11:34.839>
    user = 'https://github.com/bitdancer'

    bugs.python.org fields:

    activity = <Date 2014-08-12.10:11:34.839>
    actor = 'zvyn'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-08-09.20:42:54.173>
    closer = 'r.david.murray'
    components = ['email']
    creation = <Date 2014-06-11.16:42:30.960>
    creator = 'r.david.murray'
    dependencies = []
    files = ['35577', '35579', '35580', '35663', '35679', '36069', '36149', '36157', '36323', '36328', '36357']
    hgrepos = []
    issue_num = 21725
    keywords = ['patch']
    message_count = 20.0
    messages = ['220285', '220287', '220312', '220317', '220387', '220388', '220512', '220784', '220869', '220910', '223858', '224212', '224222', '225105', '225106', '225111', '225199', '225221', '225223', '225226']
    nosy_count = 7.0
    nosy_names = ['barry', 'pitrou', 'r.david.murray', 'jesstess', 'python-dev', 'serhiy.storchaka', 'zvyn']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue21725'
    versions = ['Python 3.5']

    @bitdancer
    Copy link
    Member Author

    I thought there was already an issue for this, but I can't find it. This is part of this summer's GSOC work, and involves adding support for the SMTPUTF8 extension to smtpd.

    @bitdancer bitdancer added topic-email type-feature A feature request or enhancement labels Jun 11, 2014
    @bitdancer
    Copy link
    Member Author

    Milan: your patch looks good for the most part. Now that I committed the decode_data patch you should modify it so that SMTPUTF8 implies decode_data=False (otherwise we *don't* have an 8-bit-clean channel). Please either attach the modified patch here or link the repository such that the patch can be generated (for rietveld review).

    I'll need to play with it for a bit to make sure there's nothing more needed.

    @zvyn
    Copy link
    Mannequin

    zvyn mannequin commented Jun 11, 2014

    I merged it into https://bitbucket.org/zvyn/cpython and made a diff from that. (I used bash for this so hopefully its compatible to the review system here.)

    @bitdancer
    Copy link
    Member Author

    Hmm. There's a conflict marker in that patch.

    I *think* that if you do an 'hg pull --rebase' and then give your repository URL to the tracker, the 'create patch' button will do the right thing. (I tried it with the URL you sent me and it generated a diff that contained unrelated changes, which is why I suggest the hg pull --rebase).

    @bitdancer
    Copy link
    Member Author

    For some reason the diff shown by the review link is very different from the one show by the patch file itself. I'm not sure what is causing that, since the diff appears to be in the correct hg format. I don't even know where reitveld is getting the stuff on the left hand side.

    So, comments:

    Instead of issuing a warning for decode_data=True when smtputf8 is specified, having decode_data explicitly set to True when smtputf8 is specified should be a ValueError.

    Your modifications to the handling of max_command_size are technically buggy (and clearly there's no test for it): when we aren't in extended smtp mode, the limit should be the hardcoded value, as in the original code. My reading of the RFC (specifically 4.1.1.5) indicates that it is legal to send a new HELO or EHLO command without issuing a QUIT, meaning a single SMTPChannel instance might have to handle both. This isn't a likely scenario except in testing...but testing is probably one of the main uses of smtpd, so I think we ought to support it.

    Also, the code you use to update the max length value for MAIL takes a bit of thought to understand. I think it would be clearer to use an unconditional set of the value based on the command_size_limit constant, which you'll have to restore to address my previous comment, instead of the pop (though the pop is a nice trick :).

    The answer to your 'XXX should we check this?' is no. Remember that just because the client *says* SMTPUTF8, that doesn't mean the message actually has to be SMTPUTF8. So whatever bytes it sends need to be passed through.

    When SMTPUTF8 is not turned on, the server should act like it doesn't know anything about it, which means that if SMTPUTF8 is specified on the BODY command it should be treated just like an unknown parameter would be. We shouldn't leak the fact that we "know about" SMTPUTF8 if the server hasn't been enabled for SMTPUTF8.

    There is one piece missing here: providing a way for process_message to know that the client claimed the message was using SMTPUTF8 (we know it might lie, but we need to transmit what the client said). Since process_message is an 'override in subclass' API, we can't just extend it's call signature. So I think we need to provide a new process_smtputf8_message method that will be called for such messages.

    Note that you also need to reset the 'this message is SMTPUTF8 flag' on RSET, HELO/EHLO, and at the completion of the call to the new process_message API.

    This notes should also give you some clues about what additional test need to be written.

    @bitdancer
    Copy link
    Member Author

    Correction on the XXX should we check this: I was thinking about the wrong section of the code. But it is still 'no': by postel's law we should accept dirty data. Currently the consumer of the library can then decide whether or not to reject the dirty data by building a subclass. It might be nice to provide an option to control rejection of invalid characters in commands, but that should be a separate issue. (And, we are ending up with so many options that we might want to think about whether or not there is a better API for controlling them).

    @zvyn
    Copy link
    Mannequin

    zvyn mannequin commented Jun 13, 2014

    Thanks a lot for your feedback!

    Your modifications to the handling of max_command_size are technically buggy (and clearly there's no test for it): when we aren't in extended smtp mode, the limit should be the hardcoded value, as in the original code. My reading of the RFC (specifically 4.1.1.5) indicates that it is legal to send a new HELO or EHLO command without issuing a QUIT, meaning a single SMTPChannel instance might have to handle both. This isn't a likely scenario except in testing...but testing is probably one of the main uses of smtpd, so I think we ought to support it.

    I agree that Section 4.1.1 should be interpreted that way and that
    should probably get fixed. The old implementation is very strict about
    EHLO/HELO being allowed exactly once. And there are tests to verify this
    bug exists.
    We could allow it for SMTPUTF8 since we do not need to care about bug
    compatibility if someone uses enable_SMTPUTF8 and we document the change.

    My intention was to leave the command_size_limits untouched on HELO so
    it will return 512 on every request (as max_command_size does then). And
    the existing implementation guaranteed me there is only one of HELO or EHLO.

    I think it would be clearer to use an unconditional set of the value based on the command_size_limit constant, which you'll have to restore to address my previous comment, instead of the pop (though the pop is a nice trick :).
    I agree on replacing the .pop() with command_size_limits['MAIL'] =
    command_size_limit. I see no easier/shorter way to add up the values
    right now however.

    There is one piece missing here: providing a way for process_message to know that the client claimed the message was using SMTPUTF8 (we know it might lie, but we need to transmit what the client said). Since process_message is an 'override in subclass' API, we can't just extend it's call signature. So I think we need to provide a new process_smtputf8_message method that will be called for such messages.
    Thanks for pointing that out, I totally missed that!

    Note that you also need to reset the 'this message is SMTPUTF8 flag' on RSET, HELO/EHLO, and at the completion of the call to the new process_message API.
    Agreed.

    I already implemented the things that are clear but am not sure how to
    handle the first point about multiple HELO/EHLO.

    • Milan

    @zvyn
    Copy link
    Mannequin

    zvyn mannequin commented Jun 17, 2014

    The new patch implements what you suggested, with the following differences:

    • I still use the default_dict and add the numbers up but maybe it's a bit more readable now?
    • Support for HELO/EHLO is a seperate issue bpo-21783

    @bitdancer
    Copy link
    Member Author

    Review comments added.

    @zvyn
    Copy link
    Mannequin

    zvyn mannequin commented Jun 17, 2014

    I replied to some review comments, and made a new patch implementing your suggestions.

    @bitdancer
    Copy link
    Member Author

    I have updated your patch, changing wording of some of the documentation, and applying style tweaks to the code. I have also done the refactoring of the _set_xxx_state method that I suggested. Seemed easier to show you what I meant in code rather than try to explain it.

    You should use Reitveld's 'patch diff' facility to look at the changes that I made to your patch.

    Things that remain to be done:

    I'm getting warnings when I run the tests. These should be either suppressed or checked for (assertWarns). (Or fixed, in the case of assertEquals :)

    In the process_message docs it says that it should use RFC 821 format for the return, but in the new process method it says RFC 6531. This makes sense for the new method, but is there a more recent RFC the old method should be referring to? Or is there an open issue about more modern return codes for smtpd? I seem to remember something, but don't have time to look now.

    There is an issue with the reset of the maximum command length, but that can be dealt with in the 'duplicate HELO/EHLO' issue.

    process_smtputf8_message in DebuggingServer should be receiving bytes, and so should error when it tries to split via '\n'. Presumably this means there is a missing test as well, and also the same issue if decode_data is False when using DebuggingServer. (Unless I"m missing something; I didn't try to test it.)

    Different issue, but have you given any thought to what it would take to make PureProxy support SMTPUTF8?

    @zvyn
    Copy link
    Mannequin

    zvyn mannequin commented Jul 29, 2014

    Thanks for the review and improvements!

    I fixed the warnings (didn't see them with running tests as described in the dev guide; I may improve that later), updated the docs to RFC 5321 (bpo-19679 is referring to RFC 3463 which is not required/updated/obsoleted by RFC 5321), and fixed the _print_message method of the DebuggingServer.

    Testing DebuggingServer seems difficult because it prints to stdout so I'm just checking that no exceptions are raised.

    I'm aware of the maximum command length thing but would fix it in bpo-21783 (because it becomes a bug when dealing with that).

    @bitdancer
    Copy link
    Member Author

    You can use test.support.captured_stdout or whatever its name is.

    @bitdancer
    Copy link
    Member Author

    I tweaked your additions. Instead of trying to strip out the 'b' to make things "look pretty", I think it is better to print them and thus make it explicit when we are dealing with binary data and when we are dealing with strings. It also clues the user in to the fact that the escaping inside the string is that used for bytes display, as opposed to string display. I also expanded the tests to include headers so that the 'X-Peer' header addition would get tested. This reveals a bug; whether it is in the tests or the code I don't know: the value of self.peer in the process_ methods is 'peer', not the expected ('b', 0). If you can look in to this issue it would be great.

    That looks like the last issue that needs to be addressed before commit.

    (Oh, yeah, and there was a bug in the original process_message code...it was printing an extra blank line at the end of the message text if there was, as there should be, a newline at the end of the message...I fixed that by switching to splitlines instead of split.)

    @zvyn
    Copy link
    Mannequin

    zvyn mannequin commented Aug 9, 2014

    I think that the peer arg is supposed to be set to the address of the peer connecting to our server.
    The value 'peer' comes from test/mock_socket.py:105 and is the best answer we can get for mock_sock.getpeername() because there is no real client when directly writing into the socket. "('b', 0)" would be the address of the remote server.

    @bitdancer
    Copy link
    Member Author

    OK, it's a bug in mock_socket, then. getpeername should be returning a tuple (address, port).

    I went ahead and fixed it, and committed the patch.

    Thanks Milan! Sorry the reviews were so delayed.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Aug 11, 2014

    New changeset ca696ca204e0 by R David Murray in branch 'default':
    bpo-21725: Add RFC 6531 (SMTPUTF8) support to smtpd.
    http://hg.python.org/cpython/rev/ca696ca204e0

    @serhiy-storchaka
    Copy link
    Member

    Adding new parameter in the middle of positional argument list can break existing code. If you want change arguments order, make their keyword-only.

    @zvyn
    Copy link
    Mannequin

    zvyn mannequin commented Aug 12, 2014

    Thanks for your review comments, serhiy.storchaka!
    I may be blind right now, but where did I add a positional parameter?

    @zvyn
    Copy link
    Mannequin

    zvyn mannequin commented Aug 12, 2014

    Sorry, I was blind (switching between languages to much)! Anyway there should be no existing code using decode_data as it was introduced in this development circle.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    topic-email type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants