classification
Title: RFC 6531 (SMTPUTF8) support in smtpd
Type: enhancement Stage: resolved
Components: email Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: barry, jesstess, pitrou, python-dev, r.david.murray, serhiy.storchaka, zvyn
Priority: normal Keywords: patch

Created on 2014-06-11 16:42 by r.david.murray, last changed 2014-08-12 10:11 by zvyn. This issue is now closed.

Files
File name Uploaded Description Edit
issue21725.patch zvyn, 2014-06-11 21:50
issue21725-fixed.patch zvyn, 2014-06-11 22:17
issue21725-fixed-hg.patch zvyn, 2014-06-11 22:35 review
issue21725v2.patch zvyn, 2014-06-17 00:40 review
issue21725v3.patch zvyn, 2014-06-17 23:03 review
issue21725v4.patch r.david.murray, 2014-07-24 16:52 review
issue21725v5.patch zvyn, 2014-07-29 01:56 review
issue21725v5.1.patch zvyn, 2014-07-29 23:10 DebuggingServer is now fully tested. review
issue21725v5.2.patch zvyn, 2014-08-08 23:46 Fixed (false positive) BytesWarning by using repr instead of str. review
issue21725v5.3.patch r.david.murray, 2014-08-09 15:43 review
issue21725v5.4.patch zvyn, 2014-08-12 09:51 review
Messages (20)
msg220285 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-11 16:42
I thought there was already an issue for this, but I can't find it.  This is part of this summer's GSOC work, and involves adding support for the SMTPUTF8 extension to smtpd.
msg220287 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-11 17:03
Milan: your patch looks good for the most part.  Now that I committed the decode_data patch you should modify it so that SMTPUTF8 implies decode_data=False (otherwise we *don't* have an 8-bit-clean channel).  Please either attach the modified patch here or link the repository such that the patch can be generated (for rietveld review).

I'll need to play with it for a bit to make sure there's nothing more needed.
msg220312 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014-06-11 21:50
I merged it into https://bitbucket.org/zvyn/cpython and made a diff from that. (I used bash for this so hopefully its compatible to the review system here.)
msg220317 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-11 22:35
Hmm.  There's a conflict marker in that patch.

I *think* that if you do an 'hg pull --rebase' and then give your repository URL to the tracker, the 'create patch' button will do the right thing.  (I tried it with the URL you sent me and it generated a diff that contained unrelated changes, which is why I suggest the hg pull --rebase).
msg220387 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-12 21:31
For some reason the diff shown by the review link is very different from the one show by the patch file itself.  I'm not sure what is causing that, since the diff appears to be in the correct hg format.  I don't even know where reitveld is getting the stuff on the left hand side.

So, comments:

Instead of issuing a warning for decode_data=True when smtputf8 is specified, having decode_data explicitly set to True when smtputf8 is specified should be a ValueError.

Your modifications to the handling of max_command_size are technically buggy (and clearly there's no test for it): when we aren't in extended smtp mode, the limit should be the hardcoded value, as in the original code.  My reading of the RFC (specifically 4.1.1.5) indicates that it is legal to send a new HELO or EHLO command without issuing a QUIT, meaning a single SMTPChannel instance might have to handle both.  This isn't a likely scenario except in testing...but testing is probably one of the main uses of smtpd, so I think we ought to support it.

Also, the code you use to update the max length value for MAIL takes a bit of thought to understand.  I think it would be clearer to use an unconditional set of the value based on the command_size_limit constant, which you'll have to restore to address my previous comment, instead of the pop (though the pop is a nice trick :).

The answer to your 'XXX should we check this?' is no.  Remember that just because the client *says* SMTPUTF8, that doesn't mean the message actually has to be SMTPUTF8.  So whatever bytes it sends need to be passed through.

When SMTPUTF8 is not turned on, the server should act like it doesn't know anything about it, which means that if SMTPUTF8 is specified on the BODY command it should be treated just like an unknown parameter would be.  We shouldn't leak the fact that we "know about" SMTPUTF8 if the server hasn't been enabled for SMTPUTF8.

There is one piece missing here: providing a way for process_message to know that the client claimed the message was using SMTPUTF8 (we know it might lie, but we need to transmit what the client said).  Since process_message is an 'override in subclass' API, we can't just extend it's call signature.  So I think we need to provide a new process_smtputf8_message method that will be called for such messages.

Note that you also need to reset the 'this message is SMTPUTF8 flag' on RSET, HELO/EHLO, and at the completion of the call to the new process_message API.

This notes should also give you some clues about what additional test need to be written.
msg220388 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-12 21:43
Correction on the XXX should we check this: I was thinking about the wrong section of the code.  But it is still 'no': by postel's law we should accept dirty data.  Currently the consumer of the library can then decide whether or not to reject the dirty data by building a subclass.  It might be nice to provide an option to control rejection of invalid characters in commands, but that should be a separate issue.  (And, we are ending up with so many options that we might want to think about whether or not there is a better API for controlling them).
msg220512 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014-06-13 23:45
Thanks a lot for your feedback!

> Your modifications to the handling of max_command_size are technically buggy (and clearly there's no test for it): when we aren't in extended smtp mode, the limit should be the hardcoded value, as in the original code.  My reading of the RFC (specifically 4.1.1.5) indicates that it is legal to send a new HELO or EHLO command without issuing a QUIT, meaning a single SMTPChannel instance might have to handle both.  This isn't a likely scenario except in testing...but testing is probably one of the main uses of smtpd, so I think we ought to support it.

I agree that Section 4.1.1 should be interpreted that way and that
should probably get fixed. The old implementation is very strict about
EHLO/HELO being allowed exactly once. And there are tests to verify this
bug exists.
We could allow it for SMTPUTF8 since we do not need to care about bug
compatibility if someone uses enable_SMTPUTF8 and we document the change.

My intention was to leave the command_size_limits untouched on HELO so
it will return 512 on every request (as max_command_size does then). And
the existing implementation guaranteed me there is only one of HELO or EHLO.

> I think it would be clearer to use an unconditional set of the value based on the command_size_limit constant, which you'll have to restore to address my previous comment, instead of the pop (though the pop is a nice trick :).
I agree on replacing the .pop() with command_size_limits['MAIL'] =
command_size_limit. I see no easier/shorter way to add up the values
right now however.

> There is one piece missing here: providing a way for process_message to know that the client claimed the message was using SMTPUTF8 (we know it might lie, but we need to transmit what the client said).  Since process_message is an 'override in subclass' API, we can't just extend it's call signature.  So I think we need to provide a new process_smtputf8_message method that will be called for such messages.
Thanks for pointing that out, I totally missed that!

> Note that you also need to reset the 'this message is SMTPUTF8 flag' on RSET, HELO/EHLO, and at the completion of the call to the new process_message API.
Agreed.

I already implemented the things that are clear but am not sure how to
handle the first point about multiple HELO/EHLO.

- Milan
msg220784 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014-06-17 00:40
The new patch implements what you suggested, with the following differences:
- I still use the default_dict and add the numbers up but maybe it's a bit more readable now?
- Support for HELO/EHLO is a seperate issue #21783
msg220869 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-17 19:48
Review comments added.
msg220910 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014-06-17 23:03
I replied to some review comments, and made a new patch implementing your suggestions.
msg223858 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-24 16:52
I have updated your patch, changing wording of some of the documentation, and applying style tweaks to the code.  I have also done the refactoring of the _set_xxx_state method that I suggested.  Seemed easier to show you what I meant in code rather than try to explain it.

You should use Reitveld's 'patch diff' facility to look at the changes that I made to your patch. 

Things that remain to be done:

I'm getting warnings when I run the tests.  These should be either suppressed or checked for (assertWarns).  (Or fixed, in the case of assertEquals :)

In the process_message docs it says that it should use RFC 821 format for the return, but in the new process method it says RFC 6531.  This makes sense for the new method, but is there a more recent RFC the old method should be referring to?  Or is there an open issue about more modern return codes for smtpd?  I seem to remember something, but don't have time to look now.

There is an issue with the reset of the maximum command length, but that can be dealt with in the 'duplicate HELO/EHLO' issue.

process_smtputf8_message in DebuggingServer should be receiving bytes, and so should error when it tries to split via '\n'.  Presumably this means there is a missing test as well, and also the same issue if decode_data is False when using DebuggingServer.  (Unless I"m missing something; I didn't try to test it.)

Different issue, but have you given any thought to what it would take to make PureProxy support SMTPUTF8?
msg224212 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014-07-29 01:56
Thanks for the review and improvements!

I fixed the warnings (didn't see them with running tests as described in the dev guide; I may improve that later), updated the docs to RFC 5321 (issue 19679 is referring to RFC 3463 which is not required/updated/obsoleted by RFC 5321), and fixed the _print_message method of the DebuggingServer.

Testing DebuggingServer seems difficult because it prints to stdout so I'm just checking that no exceptions are raised.

I'm aware of the maximum command length thing but would fix it in issue 21783 (because it becomes a bug when dealing with that).
msg224222 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-29 12:03
You can use test.support.captured_stdout or whatever its name is.
msg225105 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-08-09 15:43
I tweaked your additions.  Instead of trying to strip out the 'b' to make things "look pretty", I think it is better to print them and thus make it explicit when we are dealing with binary data and when we are dealing with strings.  It also clues the user in to the fact that the escaping inside the string is that used for bytes display, as opposed to string display.  I also expanded the tests to include headers so that the 'X-Peer' header addition would get tested.  This reveals a bug; whether it is in the tests or the code I don't know: the value of self.peer in the process_ methods is 'peer', not the expected ('b', 0).  If you can look in to this issue it would be great.

That looks like the last issue that needs to be addressed before commit.

(Oh, yeah, and there was a bug in the original process_message code...it was printing an extra blank line at the end of the message text if there was, as there should be, a newline at the end of the message...I fixed that by switching to splitlines instead of split.)
msg225106 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014-08-09 16:56
I think that the peer arg is supposed to be set to the address of the peer connecting to our server.
The value 'peer' comes from test/mock_socket.py:105 and is the best answer we can get for mock_sock.getpeername() because there is no real client when directly writing into the socket. "('b', 0)" would be the address of the remote server.
msg225111 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-08-09 20:42
OK, it's a bug in mock_socket, then.  getpeername should be returning a tuple (address, port).

I went ahead and fixed it, and committed the patch.

Thanks Milan!  Sorry the reviews were so delayed.
msg225199 - (view) Author: Roundup Robot (python-dev) Date: 2014-08-11 18:36
New changeset ca696ca204e0 by R David Murray in branch 'default':
#21725: Add RFC 6531 (SMTPUTF8) support to smtpd.
http://hg.python.org/cpython/rev/ca696ca204e0
msg225221 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-12 07:18
Adding new parameter in the middle of positional argument list can break existing code. If you want change arguments order, make their keyword-only.
msg225223 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014-08-12 09:51
Thanks for your review comments, serhiy.storchaka!
I may be blind right now, but where did I add a positional parameter?
msg225226 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014-08-12 10:11
Sorry, I was blind (switching between languages to much)! Anyway there should be no existing code using decode_data as it was introduced in this development circle.
History
Date User Action Args
2014-08-12 10:11:34zvynsetmessages: + msg225226
2014-08-12 09:51:37zvynsetfiles: + issue21725v5.4.patch

messages: + msg225223
2014-08-12 07:18:38serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg225221
2014-08-11 18:36:22python-devsetnosy: + python-dev
messages: + msg225199
2014-08-09 20:42:54r.david.murraysetstatus: open -> closed
resolution: fixed
messages: + msg225111

stage: patch review -> resolved
2014-08-09 16:56:40zvynsetmessages: + msg225106
2014-08-09 15:43:56r.david.murraysetfiles: + issue21725v5.3.patch

messages: + msg225105
2014-08-08 23:46:45zvynsetfiles: + issue21725v5.2.patch
2014-07-29 23:11:00zvynsetfiles: + issue21725v5.1.patch
2014-07-29 12:03:54r.david.murraysetmessages: + msg224222
2014-07-29 01:56:59zvynsetfiles: + issue21725v5.patch

messages: + msg224212
2014-07-24 16:52:27r.david.murraysetfiles: + issue21725v4.patch

messages: + msg223858
2014-06-17 23:04:00zvynsetfiles: + issue21725v3.patch

messages: + msg220910
2014-06-17 19:48:03r.david.murraysetmessages: + msg220869
2014-06-17 00:40:51zvynsetfiles: + issue21725v2.patch

messages: + msg220784
2014-06-13 23:45:34zvynsetmessages: + msg220512
2014-06-12 21:43:41r.david.murraysetmessages: + msg220388
2014-06-12 21:31:32r.david.murraysetmessages: + msg220387
2014-06-11 22:35:23zvynsetfiles: + issue21725-fixed-hg.patch
2014-06-11 22:35:22r.david.murraysetmessages: + msg220317
2014-06-11 22:17:33zvynsetfiles: + issue21725-fixed.patch
2014-06-11 22:17:15zvynsetfiles: - issue21725-fixed.patch
2014-06-11 22:10:56zvynsetfiles: + issue21725-fixed.patch
2014-06-11 21:50:44zvynsetfiles: + issue21725.patch

messages: + msg220312
2014-06-11 17:06:17r.david.murraysetfiles: - 80ea1cdf2b23.diff
2014-06-11 17:06:10r.david.murraysethgrepos: - hgrepo256
2014-06-11 17:03:23r.david.murraysetmessages: + msg220287
2014-06-11 16:44:35r.david.murraysetfiles: + 80ea1cdf2b23.diff
keywords: + patch
2014-06-11 16:43:23r.david.murraysethgrepos: + hgrepo256
2014-06-11 16:42:30r.david.murraycreate