classification
Title: imaplib should support international mailbox names
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: 22598 Superseder:
Assigned To: Nosy List: BabakM, Hiroaki.Kawai, astsmtl, cfraire, dveeden, haypo, jamesh, jcea, loewis
Priority: normal Keywords:

Created on 2009-02-18 05:36 by jamesh, last changed 2015-08-07 04:40 by jcea.

Messages (19)
msg82408 - Author: James Henstridge (jamesh) Date: 2009-02-18 05:36
The IMAP4rev1 specification allows for non-ASCII mailbox names using a
modified UTF-7 encoding (section 5.1.3 of RFC 2060 or 3501).  However,
the imaplib routines taking a mailbox name just pass the string straight
through without any encoding.

It would be useful if Python provided an encoder/decoder for the
modified UTF-7 encoding, and optionally if imaplib would perform the
encoding and decoding at the appropriate points.
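Below is a minimal Python 3 sketch of such an encoder/decoder pair. It follows the rules in RFC 3501 section 5.1.3:

- printable US-ASCII represents itself;
- "&" is written as "&-";
- everything else becomes UTF-16BE in modified BASE64, placed between "&" and "-".

The names `imap_utf7_encode` and `imap_utf7_decode` are hypothetical and are not part of imaplib. The decoder also assumes well-formed input.

```python
import base64
import re


def imap_utf7_encode(name: str) -> bytes:
    """Encode a mailbox name in IMAP modified UTF-7 (RFC 3501, section 5.1.3)."""
    out = []
    run = []  # pending characters that must share one base64 section

    def flush() -> None:
        if run:
            raw = "".join(run).encode("utf-16-be")
            # Modified BASE64: ',' replaces '/', and '=' padding is dropped.
            b64 = base64.b64encode(raw, altchars=b"+,").rstrip(b"=")
            out.append(b"&" + b64 + b"-")
            run.clear()

    for ch in name:
        if ch == "&":
            flush()
            out.append(b"&-")               # a literal '&' is written as "&-"
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch.encode("ascii"))  # printable US-ASCII stands for itself
        else:
            run.append(ch)                  # everything else is base64-encoded
    flush()
    return b"".join(out)


def imap_utf7_decode(name: bytes) -> str:
    """Decode an IMAP modified UTF-7 mailbox name (assumes well-formed input)."""

    def expand(match):
        b64 = match.group(1)
        if not b64:
            return "&"                      # "&-" is a literal '&'
        # Restore the standard alphabet and the '=' padding before decoding.
        b64 = b64.replace(",", "/") + "=" * (-len(b64) % 4)
        return base64.b64decode(b64).decode("utf-16-be")

    # Modified UTF-7 is pure 7-bit ASCII on the wire.
    return re.sub(r"&([A-Za-z0-9+,]*)-", expand, name.decode("ascii"))


print(imap_utf7_encode("Entw\u00fcrfe"))                       # b'Entw&APw-rfe'
print(imap_utf7_decode(b"Entw&APw-rfe") == "Entw\u00fcrfe")    # True
```

The encoder collects consecutive non-ASCII characters into a single base64 section. This is required, not just tidier: the RFC forbids the null shift "-&", so adjacent non-ASCII characters cannot be split across two sections.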
msg82411 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-02-18 06:28
Can you provide a patch?
msg82510 - Author: James Henstridge (jamesh) Date: 2009-02-20 03:00
I'll have a go at implementing the algorithm.  It looks like the
modifications to UTF-7 are large enough that you can't do a search and
replace on the output of the existing UTF-7 codec, so it'll probably
require new code.
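To make the point concrete, the sketch below applies the naive search-and-replace idea to Python 3's standard "utf-7" codec. The helper `naive_mutf7` is hypothetical and deliberately wrong. It only shows two places where the output diverges from modified UTF-7.

```python
# The naive idea (which does NOT produce valid modified UTF-7): run the
# stdlib UTF-7 codec and swap its shift character '+' for IMAP's '&'.
def naive_mutf7(name: str) -> bytes:
    return name.encode("utf-7").replace(b"+", b"&")


# Standard UTF-7 escapes a literal '+' as "+-". After the swap that becomes
# "&-", which modified UTF-7 reads as a literal '&'. The correct modified
# UTF-7 form of "a+b" is simply b"a+b", since '+' is printable ASCII.
print(naive_mutf7("a+b"))             # b'a&-b'

# Standard UTF-7 also emits the '/' base64 digit (U+53F0 encodes with one),
# which modified UTF-7 replaces with ','.
print(b"/" in naive_mutf7("\u53f0"))  # True
```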

Would String2Mailbox and Mailbox2String utility functions be appropriate
here?
msg82529 - Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2009-02-20 12:58
IMAP4 UTF-7 is implemented in Twisted -
<http://twistedmatrix.com/trac/browser/trunk/twisted/mail/imap4.py#L5385>,
<http://twistedmatrix.com/trac/browser/trunk/twisted/mail/test/test_imap.py#L58>.
Feel free to reuse any of that code that would be helpful.
msg82539 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-02-20 17:22
I don't have a good understanding of imaplib; if you think it's
appropriate to provide the conversion through two functions, I trust you.
msg82795 - Author: STINNER Victor (haypo) * (Python committer) Date: 2009-02-27 00:04
> The IMAP4rev1 specification allows for non-ASCII mailbox 
> names using a modified UTF-7 encoding

UTF-7 already sounds horrible to me, but a *modified* UTF-7 encoding
seems even stranger. Why not reuse UTF-7 directly?

(Sorry, it's an off-topic, naive question.)
msg82797 - Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2009-02-27 00:14
> UTF-7 already sounds horrible to me, but a *modified* UTF-7 encoding
> seems even stranger. Why not reuse UTF-7 directly?

UTF-7 wasn't horrible for its time, but its time has very likely passed.
Alas, changing a standard like IMAP4 is so difficult that this mistake will
be with us for a long time to come.

As for why IMAP4 uses a modified form of UTF-7, the RFC addresses this:

   The purpose of these modifications is to correct the following
   problems with UTF-7:

      1) UTF-7 uses the "+" character for shifting; this conflicts with
         the common use of "+" in mailbox names, in particular USENET
         newsgroup names.

      2) UTF-7's encoding is BASE64 which uses the "/" character; this
         conflicts with the use of "/" as a popular hierarchy delimiter.

      3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with
         the use of "\" as a popular hierarchy delimiter.

      4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with
         the use of "~" in some servers as a home directory indicator.

      5) UTF-7 permits multiple alternate forms to represent the same
         string; in particular, printable US-ASCII characters can be
         represented in encoded form.
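Modifications 1 and 2 show up directly in the base64 step. This sketch makes two changes to standard BASE64:

- it uses the `altchars` argument of `base64.b64encode` to swap "/" for ",";
- it strips the "=" padding, which UTF-7-style base64 omits.

The function name `modified_base64` is hypothetical. U+53F0 is just an arbitrary character whose standard encoding happens to contain "/".

```python
import base64


def modified_base64(text: str) -> bytes:
    """RFC 3501 modified BASE64: UTF-16BE input, ',' instead of '/', no padding."""
    raw = text.encode("utf-16-be")
    return base64.b64encode(raw, altchars=b"+,").rstrip(b"=")


raw = "\u53f0".encode("utf-16-be")             # U+53F0 as two UTF-16BE bytes
print(base64.b64encode(raw))                    # b'U/A='  standard alphabet uses '/'
print(modified_base64("\u53f0"))                # b'U,A'   ',' replaces '/', no padding
print(b"&" + modified_base64("\u53f0") + b"-")  # b'&U,A-' full section, '&' shifts in
```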

Whether you are convinced by these arguments or not is, of course,
entirely up to you.  Note also, however, that the modified UTF-7 is not
mandated by the RFC:

   By convention, international mailbox names in IMAP4rev1 are specified
   using a modified version of the UTF-7 encoding described in [UTF-7].
   Modified UTF-7 may also be usable in servers that implement an
   earlier version of this protocol.

However, it seems stupid to say that the choice of encoding is only a
convention, since there is no other way to communicate the choice of
encoding between client and server.
msg127176 - Author: Hiroaki Kawai (Hiroaki.Kawai) Date: 2011-01-27 10:54
Twisted's code does not handle "\t", "\r", and "\n" correctly; according to RFC 3501, those characters must be encoded in modified base64 form.
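The RFC rule referred to here can be checked directly. Only printable US-ASCII (0x20 to 0x7e) represents itself, so tab, CR and LF have to go through modified BASE64.

This is a small sketch with hypothetical helper names. For clarity it encodes each character in its own section; a real encoder would merge adjacent characters into one section.

```python
import base64


def represents_itself(ch: str) -> bool:
    # RFC 3501, 5.1.3: only printable US-ASCII (0x20-0x7e) stands for itself,
    # with '&' (0x26) as the exception, since it is written as "&-".
    return 0x20 <= ord(ch) <= 0x7E and ch != "&"


def shifted_form(ch: str) -> bytes:
    # Encode a single character as its own modified-base64 section.
    b64 = base64.b64encode(ch.encode("utf-16-be"), altchars=b"+,").rstrip(b"=")
    return b"&" + b64 + b"-"


for ch in "\t\r\n":
    print(repr(ch), represents_itself(ch), shifted_form(ch))
# '\t' False b'&AAk-'
# '\r' False b'&AA0-'
# '\n' False b'&AAo-'
```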
msg132013 - Author: Александр Цамутали (astsmtl) Date: 2011-03-24 18:35
So no one is working on this issue at the moment?
msg148224 - Author: Babak M (BabakM) Date: 2011-11-24 02:39
There's a working implementation of this in PloneMailList.
http://svn.plone.org/svn/collective/mxmImapClient/trunk/imapUTF7.py
msg151859 - Author: C Fraire (cfraire) Date: 2012-01-23 22:50
I've used the PloneMailList implementation in another project. It works well for adding 'imap4-utf-7' as a codec.

The twisted imap implementation seems to have been updated to properly support non-printable ASCII, but the twisted imap API is problematic for imaplib because twisted seems to expect its arguments to already be Python unicode.

So can we be specific about what kind of API change would satisfy this issue:

1) a number of API methods take one or more mailbox arguments. Of course, imaplib currently expects these to be ASCII, but what kind of argument should the methods take? UTF? Unicode? So would the library need a class property to describe an optional specified input encoding? Would it be expected to take Python unicode?

2) some methods, such as list and lsub, return mailbox names UTF-7 encoded and embedded in larger ASCII strings. Would imaplib be expected to alter the contents of these larger strings and transform them into another encoding (when a switch as described in 1) is active)?
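Without changes to imaplib, a caller can approach point 2 by splitting each LIST/LSUB line and decoding only the mailbox-name field. This is a rough sketch that assumes the line shape shown in the comment: flags, then a quoted delimiter or NIL, then a quoted or atom name. `parse_list_line` is hypothetical, and names returned as IMAP literals are not handled.

```python
import base64
import re

# Assumed shape of one LIST/LSUB line as imaplib returns it:
#   b'(\\HasNoChildren) "/" "Entw&APw-rfe"'
# Names sent as IMAP literals are not handled by this sketch.
_LIST_LINE = re.compile(
    r'\((?P<flags>[^)]*)\) (?P<delim>"(?:[^"\\]|\\.)*"|NIL) (?P<name>.+)'
)


def _unquote(s: str) -> str:
    # Strip surrounding quotes and undo backslash escapes, if quoted.
    if len(s) >= 2 and s[0] == s[-1] == '"':
        return re.sub(r"\\(.)", r"\1", s[1:-1])
    return s


def _decode_mutf7(text: str) -> str:
    # Minimal modified UTF-7 decoder (assumes well-formed input).
    def expand(m):
        b64 = m.group(1)
        if not b64:
            return "&"
        b64 = b64.replace(",", "/") + "=" * (-len(b64) % 4)
        return base64.b64decode(b64).decode("utf-16-be")

    return re.sub(r"&([A-Za-z0-9+,]*)-", expand, text)


def parse_list_line(line: bytes):
    """Hypothetical helper: return (flags, delimiter, decoded mailbox name)."""
    m = _LIST_LINE.match(line.decode("ascii"))  # assumes a 7-bit response
    if m is None:
        raise ValueError(f"unrecognized LIST line: {line!r}")
    flags = m.group("flags").split()
    delim = None if m.group("delim") == "NIL" else _unquote(m.group("delim"))
    # Only the name field is mUTF-7; flags and delimiter are left alone.
    name = _decode_mutf7(_unquote(m.group("name")))
    return flags, delim, name


print(parse_list_line(b'(\\HasNoChildren) "." INBOX'))
# (['\\HasNoChildren'], '.', 'INBOX')
```

Returning a tuple instead of rewriting the whole response string leaves the flags and delimiter untouched. This is roughly the shape cfraire later describes Twisted as using.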
msg215115 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2014-03-29 05:45
Got bitten by this today.
msg215116 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2014-03-29 05:48
Point 2 of cfraire's message is a big issue.

What about leaving this problem to the library user, and simply providing two helper functions in the module to encode/decode mUTF-7?
msg215117 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2014-03-29 06:24
Or a new encoder/decoder in the "codecs" module.
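Here is a sketch of the codec route, using `codecs.register` with a search function. According to this thread, "imap4-utf-7" is the name Twisted and PloneMailList use. This registration is only an illustration, not an existing stdlib codec. The `errors` argument is ignored, and the decoder assumes well-formed input.

```python
import base64
import codecs
import re


def _encode(text, errors="strict"):
    # 'errors' is ignored in this sketch: every str can be encoded.
    out, run = [], []

    def flush():
        if run:
            raw = "".join(run).encode("utf-16-be")
            b64 = base64.b64encode(raw, altchars=b"+,").rstrip(b"=")
            out.append(b"&" + b64 + b"-")
            run.clear()

    for ch in text:
        if ch == "&":
            flush()
            out.append(b"&-")
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch.encode("ascii"))
        else:
            run.append(ch)
    flush()
    return b"".join(out), len(text)


def _decode(data, errors="strict"):
    def expand(m):
        b64 = m.group(1)
        if not b64:
            return "&"
        b64 = b64.replace(",", "/") + "=" * (-len(b64) % 4)
        return base64.b64decode(b64).decode("utf-16-be")

    text = bytes(data).decode("ascii")
    return re.sub(r"&([A-Za-z0-9+,]*)-", expand, text), len(data)


def _search(name):
    # Depending on the Python version the name arrives as 'imap4-utf-7'
    # or normalized to 'imap4_utf_7'; accept both spellings.
    if name.replace("_", "-") == "imap4-utf-7":
        return codecs.CodecInfo(_encode, _decode, name="imap4-utf-7")
    return None


codecs.register(_search)

print("Entw\u00fcrfe".encode("imap4-utf-7"))  # b'Entw&APw-rfe'
```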
msg228939 - Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2014-10-10 01:32
> the twisted imap API is problematic for imaplib because twisted seems to expect its arguments to already be Python unicode.

Could you elaborate on this?  As far as I can tell, it works fine:

    >>> import twisted.mail.imap4
    >>> print u"Hello, \N{SNOWMAN}".encode('imap4-utf-7')
    Hello, &JgM-
    >>> print b'Hello, &JgM-'.decode('imap4-utf-7')
    Hello, ☃
    >>> 

What would you expect to work differently?
msg228949 - Author: Hiroaki Kawai (Hiroaki.Kawai) Date: 2014-10-10 04:15
>> the twisted imap API is problematic for imaplib because twisted seems to expect its arguments to already be Python unicode.
> Could you elaborate on this?  As far as I can tell, it works fine:

Twisted's imap4-utf-7 codec seems to have improved over the past two years. :-)
msg228980 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2014-10-10 10:28
The first step is to provide mUTF-7 in Python 3.5. Then we can try to update imaplib. I am especially worried about the points cfraire raises in http://bugs.python.org/issue5305#msg151859. Let's see.
msg229056 - Author: C Fraire (cfraire) Date: 2014-10-11 03:20
>> the twisted imap API is problematic for imaplib because twisted seems to expect its arguments to already be Python unicode.

>Could you elaborate on this?  As far as I can tell, it works fine:

I wasn't addressing encode/decode specifically. Both twisted and PloneMailList offer implementations with the same encoding name, "imap4-utf-7".

I meant that the twisted API is hard to use as a guide for what might be done in imaplib, since twisted takes full unicode but imaplib expects only the ASCII subset of unicode.

The first part of jamesh's original issue is just an encoder/decoder, so either twisted or PloneMailList would seem to suffice. I was addressing jamesh's second part: "optionally if imaplib would perform the encoding and decoding at the appropriate points."

Point 2 of my earlier response seems the more difficult one. imaplib's list and lsub return str instances with ASCII + UTF-7 stuffed together (twisted avoids this by returning tuples of unicode, if I understand correctly).
msg248173 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2015-08-07 04:40
Ping.
History
Date                 User                 Action  Args
2015-08-07 04:40:56  jcea                 set     messages: + msg248173
2015-07-22 16:34:30  astsmtl              set     type: enhancement
2015-04-22 01:16:46  Jean-Paul Calderone  set     nosy: - exarkun
2014-10-11 03:20:45  cfraire              set     messages: + msg229056
2014-10-10 10:28:40  jcea                 set     dependencies: + Add mUTF-7 codec (UTF-7 modified for IMAP)
                                                  messages: + msg228980
2014-10-10 04:15:21  Hiroaki.Kawai        set     messages: + msg228949
2014-10-10 01:32:31  exarkun              set     messages: + msg228939
2014-03-29 06:24:45  jcea                 set     messages: + msg215117
2014-03-29 05:48:08  jcea                 set     messages: + msg215116
2014-03-29 05:45:53  jcea                 set     messages: + msg215115
2014-03-29 05:45:31  jcea                 set     nosy: + jcea
                                                  versions: + Python 3.5, - Python 3.1, Python 2.7
2013-12-12 10:44:58  dveeden              set     nosy: + dveeden
2012-01-23 22:50:13  cfraire              set     nosy: + cfraire
                                                  messages: + msg151859
2011-11-24 02:39:41  BabakM               set     nosy: + BabakM
                                                  messages: + msg148224
2011-03-24 18:35:27  astsmtl              set     nosy: + astsmtl
                                                  messages: + msg132013
2011-01-27 10:54:02  Hiroaki.Kawai        set     nosy: + Hiroaki.Kawai
                                                  messages: + msg127176
2009-02-27 00:14:06  exarkun              set     messages: + msg82797
2009-02-27 00:04:47  haypo                set     nosy: + haypo
                                                  messages: + msg82795
2009-02-20 17:22:04  loewis               set     messages: + msg82539
2009-02-20 12:58:55  exarkun              set     nosy: + exarkun
                                                  messages: + msg82529
2009-02-20 03:00:08  jamesh               set     messages: + msg82510
2009-02-18 06:28:41  loewis               set     nosy: + loewis
                                                  messages: + msg82411
2009-02-18 05:36:09  jamesh               create