Author durin42
Recipients arjennienhuis, barry, benjamin.peterson, christian.heimes, durin42, ecir.hana, eric.smith, exarkun, ezio.melotti, flox, glyph, gregory.p.smith, gvanrossum, loewis, martin.panter, nlevitt@gmail.com, pitrou, serhiy.storchaka, stendec, terry.reedy, uau, vstinner
Date 2013-10-08.22:19:41
Message-id <F1FC5C81-CAE7-40A8-9F5D-ACC7EE4FACFF@durin42.com>
In-reply-to <CAMpsgwbtgb55UWd=4NPa8dLSFn0Q8sM1zvmjwWZ+2tWe7dopTQ@mail.gmail.com>
Content
On Oct 8, 2013, at 5:24 PM, STINNER Victor <report@bugs.python.org> wrote:

> 
> STINNER Victor added the comment:
> 
> 2013/10/8 Augie Fackler <report@bugs.python.org>:
>> sys.stdout.write('%(state)s %(path)s\n' % {'state': 'M', 'path':
>> 'some/filesystem/path'})
>> 
>> except we don't know the encoding of the filesystem path (Hi unix!) so we
>> have to treat the whole thing as opaque bytes.
> 
> You are doing it wrong. In Python 3, you "should" store filenames as
> Unicode (str type). If Python fails to decode a filename, undecodable
> bytes are stored as surrogate characters (see the PEP 383).

No, I'm not. In Mercurial, all end-user data is OPAQUE BYTES, and must remain that way. We're not able to change either our on-disk data format OR our stdout format, even to support a newer version of Python. I don't know the encoding of the filename's bytes, but I _must_ faithfully reproduce them exactly as they are or I'll break tools like make(1) and patch(1). Similarly, if a file goes from ISO-8859-1 to UTF-8, I have to emit a diff that has some ISO-8859-1 bytes and some UTF-8 bytes, so the diff as a whole is not in *any* single valid encoding. Changing that is a showstopper regression.
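
To make the mixed-encoding case concrete, here is a minimal sketch (not Mercurial code; the diff lines and the helper name `is_valid_utf8` are made up for illustration). It builds a two-line diff whose removed line is ISO-8859-1 and whose added line is UTF-8, shows that the result is not valid UTF-8, and shows that the PEP 383 `surrogateescape` handler can carry the bytes through `str` and back unchanged, provided the same codec and handler are used in both directions.

```python
# Hypothetical diff fragment: the old line is ISO-8859-1, the new line is UTF-8.
old_line = "-caf\u00e9\n".encode("iso-8859-1")  # b'-caf\xe9\n'
new_line = "+caf\u00e9\n".encode("utf-8")       # b'+caf\xc3\xa9\n'
diff = old_line + new_line


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# PEP 383: undecodable bytes become lone surrogates (0xE9 -> U+DCE9) ...
text = diff.decode("utf-8", "surrogateescape")
# ... and encoding with the same codec and handler restores the original bytes.
roundtrip = text.encode("utf-8", "surrogateescape")
```

The round trip only holds while every layer agrees on the codec and error handler; anything that re-encodes `text` strictly (or normalizes it) would fail or alter the data, which is the concern with treating this output as text.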

> The Unicode type became natural in Python 3, as byte string (old "str"
> type) was natural in Python 2.
> 
> sys.stdout.write() expects a Unicode string, not a byte string.

Ouch. Is there any way to write things to stderr and stdout without decoding and hopelessly breaking user data?
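
For reference, one way this can be done in Python 3 is to bypass the text layer and write to the underlying binary stream, `sys.stdout.buffer` (and likewise `sys.stderr.buffer`). The sketch below is an illustration, not Mercurial's implementation: the function name `write_status` and the sample path are hypothetical, and the `%`-interpolation on `bytes` assumes Python 3.5 or later, where mapping keys must themselves be `bytes`.

```python
import io
import sys


def write_status(state, path, out=None):
    """Write b'<state> <path>\\n' to a binary stream without decoding.

    Hypothetical helper: `state` and `path` are bytes and are emitted as-is.
    """
    if out is None:
        # The binary layer underneath the text-mode sys.stdout.
        # Assumption: stdout has not been replaced by an object lacking .buffer.
        out = sys.stdout.buffer
    # bytes %-formatting requires Python 3.5+; keys in the mapping are bytes.
    line = b"%(state)s %(path)s\n" % {b"state": state, b"path": path}
    out.write(line)
    return line


# Usage with an in-memory binary stream; the path holds a non-UTF-8 byte.
buf = io.BytesIO()
write_status(b"M", b"some/filesystem/caf\xe9", out=buf)
```

If the same program also writes text through `sys.stdout`, the text layer should be flushed before writing to `sys.stdout.buffer`, otherwise output can appear out of order.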

> Does it mean that Mercurial is moving to Python 3? Cool :-)

Not likely, honestly. I tackle this when I've got some spare cycles and my ability to handle pain is high. As it stands, I have the test-runner barely working, but it's making wrong assumptions to get there. The best estimate is that it's a year of work to upgrade to Python 3.

> 
> ----------
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue3982>
> _______________________________________
History
Date User Action Args
2013-10-08 22:19:42 durin42 set recipients: + durin42, gvanrossum, loewis, barry, terry.reedy, gregory.p.smith, exarkun, pitrou, vstinner, eric.smith, christian.heimes, benjamin.peterson, glyph, ezio.melotti, arjennienhuis, flox, ecir.hana, uau, martin.panter, serhiy.storchaka, nlevitt@gmail.com, stendec
2013-10-08 22:19:42 durin42 link issue3982 messages
2013-10-08 22:19:41 durin42 create