classification
Title: imaplib RFC 6855 issue
Type:
Stage: patch review
Components: email
Versions: Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Sam Varshavchik, barry, gordonmessmer, maciej.szulik, r.david.murray
Priority: normal
Keywords: patch

Created on 2018-07-17 12:31 by Sam Varshavchik, last changed 2018-09-20 03:43 by gordonmessmer.

Pull Requests
URL Status Linked Edit
PR 9436 open python-dev, 2018-09-20 03:30
Messages (5)
msg321819 - (view) Author: Sam Varshavchik (Sam Varshavchik) Date: 2018-07-17 12:31
Greetings. I am in the process of implementing RFC 6855 in Courier-IMAP. A Google search for IMAP clients that implement RFC 6855 led me to https://bugs.python.org/issue21800, and looking over the code that was added to imaplib to support RFC 6855, a few things stood out. I checked, and the changes introduced in 21800 still appear to be unchanged in https://github.com/python/cpython/blob/master/Lib/imaplib.py

Issue 21800 modified append(), which implements the IMAP APPEND command, thusly:

-        self.literal = MapCRLF.sub(CRLF, message)
+        literal = MapCRLF.sub(CRLF, message)
+        if self.utf8_enabled:
+            literal = b'UTF8 (' + literal + b')'
+        self.literal = literal

"literal" here appears to be the contents of the message with CRLF line ending. But section 4 of https://tools.ietf.org/html/rfc6855.html states:

  The ABNF for the "APPEND" data extension and "CATENATE" extension 
  follows:

        utf8-literal   = "UTF8" SP "(" literal8 ")"

        literal8       = <Defined in RFC 4466>

        append-data    =/ utf8-literal

        cat-part       =/ utf8-literal

As indicated above, "literal8" comes from RFC 4466, which also defines "append-data". RFC 4466 additionally states:

   In addition, the non-terminal "literal8" defined in [BINARY] got
   extended to allow for non-synchronizing literals if both [BINARY] and
   [LITERAL+] extensions are supported by the server.

I'll come back to this revealing paragraph in a moment, but, as stated, "literal8" actually comes from [BINARY] which is RFC 3516, which specifies the following:

   append         =/  "APPEND" SP mailbox [SP flag-list]
                      [SP date-time] SP literal8

   fetch-att      =/  "BINARY" [".PEEK"] section-binary [partial]
                      / "BINARY.SIZE" section-binary

   literal8       =   "~{" number "}" CRLF *OCTET
                      ; <number> represents the number of OCTETs
                      ; in the response string.

An exhaustive search of imaplib.py seems to indicate that this pesky tilde is in hiding. And the wrong thing seems to be quoted as the actual literal. Anyway, back to the RFCs: combine all of the above together, spin it in a blender, and you get the following result:

Supposing that the message being appended consists of a single header line "Subject: test", and a blank line, here is a sample of what actually goes out on the wire (based on the above, and on other parts of these and related RFCs):

APPEND INBOX NIL NIL UTF8 (~{17}<CR><LF>Subject: test<CR><LF><CR><LF>)<CR><LF>

I haven't tested imaplib against Courier-IMAP in this respect, but it doesn't seem like this is going to be the result.
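To make the octet count in the sample above concrete, here is a minimal Python sketch that assembles the utf8-literal form of the append data from the quoted ABNF. The helper name utf8_append_data and the regex name are hypothetical, not part of imaplib; the line-ending normalisation is assumed to match what append() already does with MapCRLF.

```python
import re

# Assumption: normalise any line ending to CRLF, in the spirit of MapCRLF.
_ANY_EOL = re.compile(br'\r\n|\r|\n')


def utf8_append_data(message: bytes) -> bytes:
    """Build 'UTF8 (' literal8 ')' for a message (hypothetical helper)."""
    body = _ANY_EOL.sub(b'\r\n', message)
    # literal8 = "~{" number "}" CRLF *OCTET; number counts only the
    # message octets, not the surrounding "UTF8 (" and ")".
    literal8 = b'~{%d}\r\n' % len(body) + body
    return b'UTF8 (' + literal8 + b')'


data = utf8_append_data(b'Subject: test\n\n')
```

The count covers only the message itself ("Subject: test" is 13 octets, plus two CRLF pairs), which is why wrapping the message inside the literal, as the 21800 change does, would change the number that is announced.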

But wait, there's more!

"literal8" is a synchronizing literal, like "literal" from RFC 3501, which specifies:

                                              ...In the case of
   literals transmitted from client to server, the client MUST wait
   to receive a command continuation request (described later in
   this document) before sending the octet data (and the remainder
   of the command).

The LITERAL+ IMAP extension, that was mentioned in the excerpt from RFC 4466 that I cited above, introduced non-synchronizing literals:

   The protocol receiver of an IMAP4 server must check the end of every
   received line for an open brace ('{') followed by an octet count, a
   plus ('+'), and a close brace ('}') immediately preceeding the CRLF.
   If it finds this sequence, it is the octet count of a non-
   synchronizing literal and the server MUST treat the specified number
   of following octets and the following line as part of the same
   command.

Otherwise, after the closing brace and the <CR><LF> the IMAP client must wait for the continuation response from the server.
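As a rough illustration of the receiver-side check described in that excerpt, the following Python sketch inspects the end of a received line for a literal marker. The function name literal_marker and the returned dictionary layout are made up for this example; a real server parser would do considerably more.

```python
import re

# Optional "~" (literal8), octet count, optional "+" (non-synchronizing),
# immediately before the CRLF that ends the line.
_LITERAL_TAIL = re.compile(br'(~?)\{(\d+)(\+?)\}\r\n\Z')


def literal_marker(line: bytes):
    """Return details of a trailing literal marker, or None (hypothetical)."""
    match = _LITERAL_TAIL.search(line)
    if match is None:
        return None
    return {
        'binary': match.group(1) == b'~',    # "~{...}" form from RFC 3516
        'octets': int(match.group(2)),       # number of octets that follow
        'non_sync': match.group(3) == b'+',  # LITERAL+: no continuation wait
    }
```

When 'non_sync' is false, the sender has to stop after the CRLF and wait for the continuation response before sending the announced octets.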

So, to summarize:

1) Per RFC 4466, combined with RFC 6855, an IMAP UTF-8 client talking to an IMAP UTF-8 server can send the following, on the wire, if the server supports LITERAL+:

APPEND INBOX NIL NIL UTF8 (~{17+}<CR><LF>Subject: test<CR><LF><CR><LF>)<CR><LF>

2) But, if the server did not advertise LITERAL+, the IMAP client is required to send only:

APPEND INBOX NIL NIL UTF8 (~{17}<CR><LF>

Then wait for the continuation response from the server, then send the rest of the command.
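The two cases can be sketched in Python as a function that returns the chunks a client would write, where a gap between chunks means "wait for the continuation response here". The name append_chunks and its parameters are hypothetical, and the command prefix (tag, APPEND, mailbox) is simply passed in by the caller rather than constructed.

```python
def append_chunks(prefix: bytes, body: bytes, literal_plus: bool) -> list:
    """Split a UTF8 APPEND into the pieces sent on the wire (a sketch).

    prefix: everything before the append data, e.g. b'A001 APPEND INBOX '.
    body: the message, already using CRLF line endings.
    literal_plus: True if the server advertised LITERAL+.
    """
    plus = b'+' if literal_plus else b''
    opening = prefix + b'UTF8 (~{%d%s}\r\n' % (len(body), plus)
    closing = body + b')\r\n'
    if literal_plus:
        # Non-synchronizing literal: the whole command goes out at once.
        return [opening + closing]
    # Synchronizing literal: send the opening, wait for the continuation
    # response, then send the message octets and the rest of the command.
    return [opening, closing]


chunks = append_chunks(b'A001 APPEND INBOX ', b'Subject: test\r\n\r\n', False)
```

Returning a list keeps the waiting point explicit without tying the sketch to any particular socket or I/O code.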

IMAP specifications have been painful to read, for the 20+ years I've been reading them. Historically there's been a lot of interoperability problems between IMAP clients and servers. I lay the blame squarely on the horrible specs, but that's off-topic. Suffice to say, nothing of that sort has been observed for POP3 and SMTP, and I think there's a very good reason for that.
msg321887 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-07-18 15:12
Would you care to propose a patch?  That's likely the only way this is going to get fixed, unfortunately, as currently we have no one on the core team interested in imaplib.  Which means it is also going to be hard to come up with someone to do the review (I'm the most likely candidate and have a distinct lack of time currently), but we'll deal with that when we get that far.
msg321888 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-07-18 15:14
Maybe we'll be lucky and Maciej will still be interested :)
msg321920 - (view) Author: Sam Varshavchik (Sam Varshavchik) Date: 2018-07-18 22:37
I don't have sufficient python or imaplib exposure to be able to implement full UTF8 APPEND functionality. I was merely investigating and researching what IMAP UTF8 support there was, in all existing client and server code I knew of.

What I can propose is to reverse this part of the original change:

@@ -360,7 +380,10 @@
             date_time = Time2Internaldate(date_time)
         else:
             date_time = None
-        self.literal = MapCRLF.sub(CRLF, message)
+        literal = MapCRLF.sub(CRLF, message)
+        if self.utf8_enabled:
+            literal = b'UTF8 (' + literal + b')'
+        self.literal = literal
         return self._simple_command(name, mailbox, flags, date_time)

I don't see that the original patch added any code to test_imaplib.py to test UTF8 literals with APPEND. So what this should do is go back and use the pre-UTF8, RFC 3501 APPEND syntax, with no existing unit test fall-out.

Which is fine. IMAP UTF8 clients are not required to use UTF8 literals with APPEND. Enabling UTF8 in the IMAP server does not require using the UTF8 version of APPEND. It's only required if the IMAP client wishes to send a message with UTF8 headers to the IMAP server.
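To illustrate when the UTF8 form of APPEND would actually matter, here is one possible heuristic in Python: look for non-ASCII octets in the header block only. This is a sketch under my own assumptions (the helper name has_utf8_headers is invented, and a real client would probably parse the headers properly), not something imaplib or mutt does.

```python
import re

# The header block ends at the first blank line (CRLF or bare LF endings).
_HEADER_END = re.compile(br'\r?\n\r?\n')


def has_utf8_headers(message: bytes) -> bool:
    """Return True if any header octet is outside ASCII (rough heuristic)."""
    headers = _HEADER_END.split(message, maxsplit=1)[0]
    return any(octet > 0x7F for octet in headers)
```

A message for which this returns False can be sent with the plain RFC 3501 APPEND syntax even after UTF8 mode has been enabled.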

I also looked into mutt's source, and mutt appears to be taking the same approach. It enables UTF8 mode in the IMAP server, and swallows UTF8 E-mail, and deals with folders whose names are now encoded in UTF8, instead of RFC3501 IMAP's modified-UTF7 encoding convention. But I did not see anything in mutt that used UTF8 literals with APPEND. Searching mutt's source for APPEND code finds only one instance, which sends the non-UTF8 literal. Looks like mutt will accept UTF8 mail, but not generate it itself. Not sure what mutt does when creating a reply to E-mail with a UTF8 E-mail address. I don't use mutt, but I'll test that.

It does not surprise me that this did not come up previously. All three other Libre IMAP servers that I know of (UW-IMAP, Cyrus, and Dovecot) do not implement RFC 6855. Unless one of them is currently working on it, Courier will be the first one to support it.

I fully understand your lack of interest in imaplib (I really do), and I wish I had more Python background to help out here, myself. This is as much as I can propose right now, with some level of confidence in my meager Python skills. I mostly revolve in C++, C, and Perl orbits.

If in the future more interest develops in improving IMAP support, I'm reachable and I'll be open to integration testing as much as my own time permits, in these matters...
msg325836 - (view) Author: Gordon Messmer (gordonmessmer) * Date: 2018-09-20 03:43
PR 9436 should resolve the issue.

Since the RFC requires the "UTF8 (" prefix in the "data" and not in the literal, that had to be moved into the _command function.

This change should only affect the append() use, as that is currently the only function that sets self.literal to a value that is not a function.  The authenticate() use is not affected, and no other functions set self.literal.
History
Date                 User              Action  Args
2018-09-20 03:43:57  gordonmessmer     set     nosy: + gordonmessmer; messages: + msg325836
2018-09-20 03:30:46  python-dev        set     keywords: + patch; stage: patch review; pull_requests: + pull_request8849
2018-07-18 22:37:07  Sam Varshavchik   set     messages: + msg321920
2018-07-18 15:14:21  r.david.murray    set     nosy: + maciej.szulik; messages: + msg321888
2018-07-18 15:12:57  r.david.murray    set     messages: + msg321887; title: RFC 6855 issue -> imaplib RFC 6855 issue
2018-07-17 12:31:07  Sam Varshavchik   create