
Author exarkun
Recipients exarkun, jamesh, loewis, vstinner
Date 2009-02-27.00:14:05
Message-id <1235693648.15.0.708947418316.issue5305@psf.upfronthosting.co.za>
Content
> UTF-7 already sounds like something horrible for me, but a *modified*
> UTF-7 encoding is something a little bit more strange for me. Why not
> reusing directly UTF-7.

UTF-7 wasn't horrible for its time, but its time has very likely passed.
Alas, changing a standard like IMAP4 is so difficult that this mistake
will be with us for a long time to come.

As for why IMAP4 uses a modified form of UTF-7, the RFC addresses this:

   The purpose of these modifications is to correct the following
   problems with UTF-7:

      1) UTF-7 uses the "+" character for shifting; this conflicts with
         the common use of "+" in mailbox names, in particular USENET
         newsgroup names.

      2) UTF-7's encoding is BASE64 which uses the "/" character; this
         conflicts with the use of "/" as a popular hierarchy delimiter.

      3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with
         the use of "\" as a popular hierarchy delimiter.

      4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with
         the use of "~" in some servers as a home directory indicator.

      5) UTF-7 permits multiple alternate forms to represent the same
         string; in particular, printable US-ASCII characters can be
         represented in encoded form.

Whether you are convinced by these arguments or not is, of course,
entirely up to you.  Note also, however, that the modified UTF-7 is not
mandated by the RFC:

   By convention, international mailbox names in IMAP4rev1 are specified
   using a modified version of the UTF-7 encoding described in [UTF-7].
   Modified UTF-7 may also be usable in servers that implement an
   earlier version of this protocol.

However, it seems stupid to say that the choice of encoding is only a
convention, since there is no other way to communicate the choice of
encoding between client and server.
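
To make the modifications listed in the RFC excerpt above concrete, here
is a minimal sketch of an encoder for mailbox names. It assumes the rules
of RFC 3501 section 5.1.3: printable US-ASCII passes through unchanged,
"&" replaces "+" as the shift character (a literal "&" becomes "&-"), and
everything else is UTF-16BE in BASE64 with "," instead of "/" and no "="
padding. The function name encode_modified_utf7 is made up for
illustration; it is not an existing stdlib or imaplib API.

```python
import base64


def encode_modified_utf7(name: str) -> str:
    """Encode a mailbox name with IMAP4's modified UTF-7 (illustrative sketch)."""
    out = []      # finished pieces of the encoded name
    pending = []  # current run of characters that need BASE64 encoding

    def flush():
        # Emit the pending run as "&" + modified BASE64 + "-".
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            # Modified BASE64: no "=" padding, and "," in place of "/".
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            # Printable US-ASCII represents itself, so "+", "/", "\" and "~"
            # stay unencoded; only the shift character "&" is escaped.
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


# Example mailbox name taken from RFC 3501 section 5.1.3.
print(encode_modified_utf7("~peter/mail/台北/日本語"))
# → ~peter/mail/&U,BTFw-/&ZeVnLIqe-
```

Because printable ASCII is never BASE64-encoded here, each name has exactly
one encoded form, which is what point 5 of the RFC's list asks for. A
matching decoder would have to reject non-canonical input to preserve that
property; this sketch does not cover decoding.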