Author eryksun
Recipients belopolsky, brian.curtin, eryksun, ocean-city, pitrou, python-dev, vstinner
Date 2015-05-20.13:30:40
Message-id <1432128640.8.0.377125320908.issue10653@psf.upfronthosting.co.za>
In-reply-to
Content
This solution no longer works. If the system is configured to use the Japanese system locale and language pack, then 3.4.3 returns codepage 932 mojibake for the "%Z" time zone name. Originally [this approach worked][1] because it called PyUnicode_Decode using the 'mbcs' encoding, which decodes with the ANSI codepage.
Currently it calls PyUnicode_DecodeLocaleAndSize, which just ends up calling mbstowcs. That's pretty much what wcsftime does. In the default C locale, mbstowcs simply casts each byte value to wchar_t, so the result looks like a Latin-1 decode of the codepage 932 bytes:

    >>> time.strftime('%Z')
    '\x91\xbe\x95\xbd\x97m\x89\xc4\x8e\x9e\x8a\xd4'
    >>> time.strftime('%Z').encode('latin-1').decode('932')
    '太平洋夏時間'
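
The same round trip can be reproduced off Windows with a small sketch. It is only an emulation: `fake_c_locale_mbstowcs` and `repair_mojibake` are hypothetical helper names, and decoding with `'latin-1'` stands in for the byte-to-wchar_t cast that mbstowcs performs in the C locale.

```python
# Illustration only: emulate the C-locale mbstowcs cast with a Latin-1 decode.
TZ_NAME = '太平洋夏時間'  # the time zone name shown in the session above


def fake_c_locale_mbstowcs(raw: bytes) -> str:
    """Widen each byte to one code point (assumed stand-in for the cast)."""
    return raw.decode('latin-1')


def repair_mojibake(text: str, codepage: str = 'cp932') -> str:
    """Undo the cast, then decode the bytes with the real ANSI codepage."""
    return text.encode('latin-1').decode(codepage)


ansi_bytes = TZ_NAME.encode('cp932')           # bytes the CRT would hold
mojibake = fake_c_locale_mbstowcs(ansi_bytes)  # what strftime('%Z') returns

print(ascii(mojibake))
print(repair_mojibake(mojibake) == TZ_NAME)
```

The repair works here only because the cast is lossless: every original byte survives as one code point below U+0100.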

The problem is worse for 3.5 built with VC++ 14. In the new CRT, strftime decodes the format string via MultiByteToWideChar, calls _Wcsftime_l, and encodes the result back via WideCharToMultiByte. The outer conversions use the default LC_TIME codepage, which is ANSI (ACP), so they're not the problem. The problem is the internal _mbstowcs_s_l conversion of the ANSI time zone name, which creates the above-shown mojibake 'unicode' string. This is then compounded by calling WideCharToMultiByte on the result, which replaces the characters that codepage 932 can't represent with '?' (and apparently best-fits a few, such as 'Ä' to 'A'):

    >>> time.strftime('%Z')
    '?????m?A???O'

There's no way to fix this by transcoding, because the original bytes are already lost by the time Python sees the string. The result is just garbage.
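
A rough sketch of why the second conversion is unrecoverable. Here `errors='replace'` is only an assumed stand-in for WideCharToMultiByte's default-character substitution; Python's cp932 codec does no best-fit mapping, so the exact output differs from the CRT's, but the loss of information is the same in kind.

```python
# Illustration only: emulate the lossy second conversion in the VC++ 14 CRT.
TZ_NAME = '太平洋夏時間'

# Step 1: the internal conversion widens each ANSI byte to a code point.
mojibake = TZ_NAME.encode('cp932').decode('latin-1')

# Step 2: encoding back to the ANSI codepage substitutes '?' for every
# code point that codepage 932 cannot represent (assumed stand-in for
# WideCharToMultiByte).
lossy = mojibake.encode('cp932', errors='replace')

# Any attempt to transcode afterwards starts from the substituted bytes,
# not from the original ones.
attempt = lossy.decode('cp932', errors='replace')

print(lossy)
print(attempt == TZ_NAME)
```

Once distinct input characters have collapsed into the same '?' byte, no choice of codec can tell them apart again.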

[1]: https://hg.python.org/cpython/file/79e60977fc04/Modules/timemodule.c#l501