
Author durin42
Recipients arjennienhuis, barry, benjamin.peterson, christian.heimes, durin42, ecir.hana, eric.smith, exarkun, ezio.melotti, flox, glyph, gregory.p.smith, gvanrossum, loewis, martin.panter, nlevitt@gmail.com, pitrou, serhiy.storchaka, stendec, terry.reedy, uau, vstinner
Date 2013-10-08.22:19:41
Message-id <F1FC5C81-CAE7-40A8-9F5D-ACC7EE4FACFF@durin42.com>
In-reply-to <CAMpsgwbtgb55UWd=4NPa8dLSFn0Q8sM1zvmjwWZ+2tWe7dopTQ@mail.gmail.com>
Content
On Oct 8, 2013, at 5:24 PM, STINNER Victor <report@bugs.python.org> wrote:

> 
> STINNER Victor added the comment:
> 
> 2013/10/8 Augie Fackler <report@bugs.python.org>:
>> sys.stdout.write('%(state)s %(path)s\n' % {'state': 'M', 'path':
>> 'some/filesystem/path'})
>> 
>> except we don't know the encoding of the filesystem path (Hi unix!) so we
>> have to treat the whole thing as opaque bytes.
> 
> You are doing it wrong. In Python 3, you "should" store filenames as
> Unicode (str type). If Python fails to decode a filename, undecodable
> bytes are stored as surrogate characters (see the PEP 383).
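
For reference, the PEP 383 behaviour described in the quote can be sketched as follows. This is a minimal illustration, not Mercurial code: the file name is made up, and UTF-8 is assumed as the decoding codec.

```python
# Hypothetical file name: Latin-1 bytes that are not valid UTF-8.
raw = b"caf\xe9.txt"

# The surrogateescape error handler (PEP 383) maps each undecodable
# byte 0xNN to the lone surrogate U+DCNN instead of raising an error.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same codec and handler restores the original bytes.
restored = name.encode("utf-8", "surrogateescape")
```

The round trip is lossless only as long as the string is encoded again with the same codec and error handler; a strict encode of `name` raises `UnicodeEncodeError`.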

No, I'm not. In Mercurial, all end-user data is OPAQUE BYTES, and must remain that way. We're not able to change either our on-disk data format OR our stdout format, even to support a newer version of Python. I don't know the encoding of the filename's bytes, but I _must_ reproduce them exactly as they are or I'll break tools like make(1) and patch(1). Similarly, if a file goes from ISO-8859-1 to UTF-8, I have to emit a diff that has some ISO-8859-1 bytes and some UTF-8 bytes - taken as a whole, it's not in *any* single valid encoding. Changing that is a showstopper regression.
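
To make the mixed-encoding case concrete, here is a small sketch. The file content and the helper `is_valid_utf8` are invented for illustration and are not taken from Mercurial.

```python
# The same text line before and after a file is converted
# from ISO-8859-1 to UTF-8 (hypothetical content).
old_line = "caf\u00e9\n".encode("iso-8859-1")  # b'caf\xe9\n'
new_line = "caf\u00e9\n".encode("utf-8")       # b'caf\xc3\xa9\n'

# A diff hunk built purely by byte concatenation; nothing is decoded.
hunk = b"-" + old_line + b"+" + new_line


def is_valid_utf8(data):
    """Return True if data decodes as strict UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

The hunk as a whole is not valid UTF-8, and decoding it as ISO-8859-1 would garble the `+` line, so the only way to emit it unchanged is to keep it as bytes from start to finish.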

> The Unicode type became natural in Python 3, as byte string (old "str"
> type) was natural in Python 2.
> 
> sys.stdout.write() expects a Unicode string, not a byte string.

Ouch. Is there any way to write things to stderr and stdout without decoding and hopelessly breaking user data?
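
For reference, the documented route in Python 3 is the binary layer underneath the text streams, `sys.stdout.buffer`. A minimal sketch, assuming a hypothetical helper `write_status` and an in-memory stream standing in for stdout:

```python
import io


def write_status(stream, state, path):
    """Write b'<state> <path>' plus a newline to a binary stream.

    Hypothetical helper: state and path are bytes and are never decoded.
    """
    stream.write(state + b" " + path + b"\n")


# Demonstrated against an in-memory binary stream so the exact
# bytes that would reach the terminal can be inspected.
demo = io.BytesIO()
write_status(demo, b"M", b"some/filesystem/caf\xe9")
```

Passing `sys.stdout.buffer` (or `sys.stderr.buffer`) as `stream` sends the same bytes to the real stream. Those attributes exist on the standard text streams but may be missing if `sys.stdout` has been replaced by an object such as `io.StringIO`, and the text layer should be flushed first when text and binary writes are mixed.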

> Does it mean that Mercurial is moving to Python 3? Cool :-)

Not likely, honestly. I tackle this when I've got some spare cycles and my ability to handle pain is high. As it stands, I have the test-runner barely working, but it's making wrong assumptions to get there. The best estimate is that it's a year of work to upgrade to Python 3.

> 
> ----------
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue3982>
> _______________________________________
History
Date User Action Args
2013-10-08 22:19:42 durin42 set recipients: + durin42, gvanrossum, loewis, barry, terry.reedy, gregory.p.smith, exarkun, pitrou, vstinner, eric.smith, christian.heimes, benjamin.peterson, glyph, ezio.melotti, arjennienhuis, flox, ecir.hana, uau, martin.panter, serhiy.storchaka, nlevitt@gmail.com, stendec
2013-10-08 22:19:42 durin42 link issue3982 messages
2013-10-08 22:19:41 durin42 create