Issue34138
Created on 2018-07-17 12:31 by Sam Varshavchik, last changed 2022-04-11 14:59 by admin.
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 9436 | open | python-dev, 2018-09-20 03:30 |
Messages (5) | |||
---|---|---|---|
msg321819 - (view) | Author: Sam Varshavchik (Sam Varshavchik) | Date: 2018-07-17 12:31 | |
Greetings. I am in the process of implementing RFC 6855 in Courier-IMAP. A Google search for IMAP clients that implement RFC 6855 led me to https://bugs.python.org/issue21800 and looking over the code that was added to imaplib, to support RFC 6855, a few things stood out. I checked, and the changes introduced in 21800 still appear to be unchanged in https://github.com/python/cpython/blob/master/Lib/imaplib.py
Issue 21800 modified sub append(), that implements the IMAP APPEND command, thusly:
    - self.literal = MapCRLF.sub(CRLF, message)
    + literal = MapCRLF.sub(CRLF, message)
    + if self.utf8_enabled:
    +     literal = b'UTF8 (' + literal + b')'
    + self.literal = literal
"literal" here appears to be the contents of the message with CRLF line ending. But section 4 of https://tools.ietf.org/html/rfc6855.html states:
    The ABNF for the "APPEND" data extension and "CATENATE" extension follows:
    utf8-literal = "UTF8" SP "(" literal8 ")"
    literal8 = <Defined in RFC 4466>
    append-data =/ utf8-literal
    cat-part =/ utf8-literal
As indicated above, "literal8" comes from RFC 4466, which also defines "append-data". RFC 4466 additionally states:
    In addition, the non-terminal "literal8" defined in [BINARY] got extended to allow for non-synchronizing literals if both [BINARY] and [LITERAL+] extensions are supported by the server.
I'll come back to this revealing paragraph in a moment, but, as stated, "literal8" actually comes from [BINARY] which is RFC 3516, which specifies the following:
    append =/ "APPEND" SP mailbox [SP flag-list] [SP date-time] SP literal8
    fetch-att =/ "BINARY" [".PEEK"] section-binary [partial] / "BINARY.SIZE" section-binary
    literal8 = "~{" number "}" CRLF *OCTET
               ; <number> represents the number of OCTETs
               ; in the response string.
An exhaustive search of imaplib.py seems to indicate that this pesky tilde is in hiding. And the wrong thing seems to be quoted as the actual literal.
Anyway, back to the RFCs: combine all of the above together, spin it in a blender, and you get the following result. Supposing that the message being appended consists of a single header line "Subject: test", and a blank line, a sample command of what actually goes out on the wire (based on the above, and other parts of these, and related RFCs):
    APPEND INBOX NIL NIL UTF8 (~{17}<CR><LF>Subject: test<CR><LF><CR><LF>)<CR><LF>
I haven't tested imaplib against Courier-IMAP in this respect, but it doesn't seem like this is going to be the result.
But wait, there's more! "literal8" is a synchronizing literal, like "literal" from RFC 3501, which specifies:
    ...In the case of literals transmitted from client to server, the client MUST wait to receive a command continuation request (described later in this document) before sending the octet data (and the remainder of the command).
The LITERAL+ IMAP extension, that was mentioned in the excerpt from RFC 4466 that I cited above, introduced non-synchronizing literals:
    The protocol receiver of an IMAP4 server must check the end of every received line for an open brace ('{') followed by an octet count, a plus ('+'), and a close brace ('}') immediately preceeding the CRLF. If it finds this sequence, it is the octet count of a non-synchronizing literal and the server MUST treat the specified number of following octets and the following line as part of the same command.
Otherwise, after the closing brace and the <CR><LF> the IMAP client must wait for the continuation response from the server.
So, to summarize:
1) Per RFC 4466 combined with RFC 6855, an IMAP UTF-8 client talking to an IMAP UTF-8 server can send the following, on the wire, if the server supports LITERAL+:
    APPEND INBOX NIL NIL UTF8 (~{17+}<CR><LF>Subject: test<CR><LF><CR><LF>)<CR><LF>
2) But, if the server did not advertise LITERAL+, the IMAP client is required to send only:
    APPEND INBOX NIL NIL UTF8 (~{17}<CR><LF>
Then wait for the continuation response from the server, then send the rest of the command.
IMAP specifications have been painful to read, for the 20+ years I've been reading them. Historically there have been a lot of interoperability problems between IMAP clients and servers. I lay the blame squarely on the horrible specs, but that's off-topic. Suffice to say, nothing of that sort has been observed for POP3 and SMTP, and I think there's a very good reason for that. |
|||
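The two cases summarized in the message above can be sketched in Python. This is only an illustration of the framing described there, not code from imaplib: the function name `utf8_append_chunks` is hypothetical, and the command tag and the optional flags and date arguments are left out for brevity.

```python
from typing import List

CRLF = b"\r\n"


def utf8_append_chunks(mailbox: str, message: bytes, literal_plus: bool) -> List[bytes]:
    """Return the byte chunks a client would send for a UTF8 APPEND.

    With LITERAL+ the whole command goes out as one chunk. Without it,
    the client sends the first chunk, waits for the server's continuation
    response, and only then sends the second chunk.
    """
    # A "+" after the octet count marks a non-synchronizing literal.
    marker = b"+" if literal_plus else b""
    count = b"~{%d" % len(message) + marker + b"}"
    head = b"APPEND " + mailbox.encode("ascii") + b" UTF8 (" + count + CRLF
    # The closing parenthesis is command data, so it follows the literal octets.
    tail = message + b")" + CRLF
    return [head + tail] if literal_plus else [head, tail]


message = b"Subject: test\r\n\r\n"  # 17 octets, as in the sample above
for chunk in utf8_append_chunks("INBOX", message, literal_plus=False):
    print(chunk)
```

Note that the octet count covers only the message itself; the `UTF8 (` prefix and the closing `)` sit outside the literal.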
msg321887 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2018-07-18 15:12 | |
Would you care to propose a patch? That's likely the only way this is going to get fixed, unfortunately, as currently we have no one on the core team interested in imaplib. Which means it is also going to be hard to come up with someone to do the review (I'm the most likely candidate and have a distinct lack of time currently), but we'll deal with that when we get that far. |
|||
msg321888 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2018-07-18 15:14 | |
Maybe we'll be lucky and Maciej will still be interested :) |
|||
msg321920 - (view) | Author: Sam Varshavchik (Sam Varshavchik) | Date: 2018-07-18 22:37 | |
I don't have sufficient python or imaplib exposure to be able to implement full UTF8 APPEND functionality. I was merely investigating and researching what IMAP UTF8 support there was, in all existing client and server code I knew of. What I can propose is to reverse this part of the original change:
    @@ -360,7 +380,10 @@
                 date_time = Time2Internaldate(date_time)
             else:
                 date_time = None
    -        self.literal = MapCRLF.sub(CRLF, message)
    +        literal = MapCRLF.sub(CRLF, message)
    +        if self.utf8_enabled:
    +            literal = b'UTF8 (' + literal + b')'
    +        self.literal = literal
             return self._simple_command(name, mailbox, flags, date_time)
I don't see that the original patch added any code to test_imaplib.py to test UTF8 literals with APPEND. So what this should do is go back and use the pre-UTF8, RFC 3501 APPEND syntax, with no existing unit test fall-out. Which is fine. IMAP UTF8 clients are not required to use UTF8 literals with APPEND. Enabling UTF8 in the IMAP server does not require using the UTF8 version of APPEND. It's only required if the IMAP client wishes to send a message with UTF8 headers to the IMAP server.
I also looked into mutt's source, and mutt appears to be taking the same approach. It enables UTF8 mode in the IMAP server, and swallows UTF8 E-mail, and deals with folders whose names are now encoded in UTF8, instead of RFC3501 IMAP's modified-UTF7 encoding convention. But I did not see anything in mutt that used UTF8 literals with APPEND. Searching mutt's source for APPEND code finds only one instance which sends the non-UTF8 literal. Looks like mutt will accept UTF8 mail, but not generate them itself. Not sure what mutt does creating a reply to E-mail with a UTF8 E-mail address. I don't use mutt, but I'll test that.
It does not surprise me, that this did not come up previously. All three other Libre IMAP servers that I know of: UW-IMAP, Cyrus, and Dovecot, do not implement RFC 6855. 
Unless one of them is currently working on it, Courier will be the first one to support it. But, I have other sources that confirm otherwise. I fully understand your lack of interest in imaplib (I really do), and I wish I had more Python background to help out here, myself. This is as much as I can propose right now, with some level of confidence in my meager Python skills. I mostly revolve in C++, C, and Perl orbits. If in the future more interest develops in improving IMAP support, I'm reachable and I'll be open to integration testing as much as my own time permits, in these matters... |
|||
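To make the effect of the proposed revert concrete, here is a small standalone sketch. It does not call imaplib; `MAP_CRLF` mirrors the idea of imaplib's `MapCRLF`, and the helper names `literal_with_wrap`, `literal_plain`, and `announce` are hypothetical. It compares the octet count announced when the `UTF8 (` wrapper is folded into the literal with the count for the plain RFC 3501 literal.

```python
import re

CRLF = b"\r\n"
# Same idea as imaplib's MapCRLF: normalise any line ending to CRLF.
MAP_CRLF = re.compile(rb"\r\n|\r|\n")


def literal_with_wrap(message: bytes) -> bytes:
    """Mimic the issue 21800 change: the wrapper ends up inside the literal."""
    return b"UTF8 (" + MAP_CRLF.sub(CRLF, message) + b")"


def literal_plain(message: bytes) -> bytes:
    """Mimic the pre-UTF8 behaviour that the proposed revert restores."""
    return MAP_CRLF.sub(CRLF, message)


def announce(literal: bytes) -> bytes:
    """Octet-count marker sent on the command line ahead of the literal."""
    return b"{%d}" % len(literal)


message = b"Subject: test\n\n"
print(announce(literal_with_wrap(message)), literal_with_wrap(message))
print(announce(literal_plain(message)), literal_plain(message))
```

With the wrapper inside the literal, the announced count grows by the seven octets of `UTF8 (` and `)`, so the server receives them as message content instead of command syntax.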
msg325836 - (view) | Author: Gordon Messmer (gordonmessmer) * | Date: 2018-09-20 03:43 | |
PR 9436 should resolve the issue. Since the RFC requires the "UTF8 (" prefix in the "data" and not in the literal, that had to be moved into the _command function. This change should only affect the append() use, as that is currently the only function that sets self.literal to a value that is not a function. The authenticate() use is not affected, and no other functions set self.literal. |
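The distinction drawn in this message, between a bytes literal set by append() and a callable set by authenticate(), can be sketched as follows. This is not the code from PR 9436: `frame_command` is a hypothetical helper, and the `~{n}` form follows the literal8 syntax quoted in the first message rather than anything stated about the PR.

```python
from typing import Callable, Optional, Tuple, Union

LiteralArg = Union[bytes, Callable[[bytes], bytes]]


def frame_command(data: bytes, literal: Optional[LiteralArg],
                  utf8_enabled: bool) -> Tuple[bytes, bytes]:
    """Return (command line without CRLF, trailer to send after the literal)."""
    if literal is None or callable(literal):
        # authenticate()-style callables produce their data on demand,
        # so no octet count or UTF8 wrapper is added to the command.
        return data, b""
    if utf8_enabled:
        # The prefix goes into the command data; the literal stays untouched.
        return data + b" UTF8 (~{%d}" % len(literal), b")"
    return data + b" {%d}" % len(literal), b""


body = b"Subject: test\r\n\r\n"
print(frame_command(b"A1 APPEND INBOX", body, utf8_enabled=True))
print(frame_command(b"A1 APPEND INBOX", body, utf8_enabled=False))
```

Returning the trailer separately keeps the closing parenthesis out of the octet count while still letting the caller send it after the literal.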
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:03 | admin | set | github: 78319 |
2018-09-20 03:43:57 | gordonmessmer | set | nosy: + gordonmessmer messages: + msg325836 |
2018-09-20 03:30:46 | python-dev | set | keywords: + patch stage: patch review pull_requests: + pull_request8849 |
2018-07-18 22:37:07 | Sam Varshavchik | set | messages: + msg321920 |
2018-07-18 15:14:21 | r.david.murray | set | nosy: + maciej.szulik messages: + msg321888 |
2018-07-18 15:12:57 | r.david.murray | set | messages: + msg321887 title: RFC 6855 issue -> imaplib RFC 6855 issue |
2018-07-17 12:31:07 | Sam Varshavchik | create |