Message 251068 - Python tracker



Author	eryksun
Recipients	BreamoreBoy, amaury.forgeotdarc, belopolsky, eryksun, jcea, msmhrt, ocean-city, prikryl, vstinner
Date	2015-09-19.09:34:48
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1442655288.94.0.331617908978.issue16322@psf.upfronthosting.co.za>
In-reply-to

Content

To decode the tzname strings, Python calls mbstowcs, which on Windows uses Latin-1 in the "C" locale. However, in this locale the tzname strings are actually encoded using the system ANSI codepage (e.g. 1250 for Central/Eastern Europe). As a result, Python decodes the ANSI-encoded bytes as Latin-1, which produces mojibake. For example:

    >>> s
    'Střední Evropa (běžný čas) | Střední Evropa (letní čas)'
    >>> s.encode('1250').decode('latin-1')
    'Støední Evropa (bì\x9ený èas) | Støední Evropa (letní èas)'
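
Because Latin-1 maps every byte to a code point, this mis-decoding loses no information and can be undone after the fact. The following is only a sketch of that idea: `fix_tzname` and `ANSI_CODEPAGE` are hypothetical names, and it assumes you already know the system ANSI codepage (1250 here, as in the example above).

```python
# Reproduce the mis-decoding described above, then reverse it.
# Assumption: the CRT produced the bytes in code page 1250.
ANSI_CODEPAGE = "cp1250"  # hypothetical constant; use the real ANSI codepage


def fix_tzname(mojibake, codepage=ANSI_CODEPAGE):
    """Undo a Latin-1 decode of bytes that were really in `codepage`."""
    # Latin-1 maps each byte 0x00-0xFF to the code point of the same value,
    # so encoding back to Latin-1 recovers the original bytes exactly.
    return mojibake.encode("latin-1").decode(codepage)


original = "Střední Evropa (běžný čas)"
mojibake = original.encode(ANSI_CODEPAGE).decode("latin-1")
repaired = fix_tzname(mojibake)

print(ascii(mojibake))       # ascii() avoids console encoding errors
print(repaired == original)
```

This only repairs strings after the damage is done; it does not change what time.tzname or time.strftime('%Z') return.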

You can work around the inconsistency by calling setlocale(LC_ALL, "") before anything imports the time module. This should set a locale that's not "C", in which case the codepage should be consistent. Of course, this won't help if you can't control when the time module is first imported. 
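
A minimal sketch of that workaround, assuming this code runs at the very top of the main script, before any other module has imported time. `init_locale` is a hypothetical helper name, and the fallback to None on failure is my own choice, not something the time module requires.

```python
import locale


def init_locale():
    """Switch from the "C" locale to the user's default locale."""
    try:
        # An empty string selects the user's default locale settings.
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale that is not available.
        return None


active_locale = init_locale()

import time  # imported only after the locale has been set

tzname = time.tzname
print(active_locale, ascii(tzname))
```

If some other module has already imported time by the time this runs, tzname will have been initialized under the "C" locale and the call comes too late.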

That limitation wouldn't be an issue if time.tzset were implemented on Windows. You can at least use ctypes to call the CRT's _tzset function. This solves the problem with time.strftime('%Z'). You can also get the CRT's tzname by calling the exported __tzname function. Here's a Python 3.5 example that sets the current thread to use Russian and creates a new tzname tuple:

    import ctypes
    import locale

    kernel32 = ctypes.WinDLL('kernel32')
    ucrtbase = ctypes.CDLL('ucrtbase')

    MUI_LANGUAGE_NAME = 8
    kernel32.SetThreadPreferredUILanguages(MUI_LANGUAGE_NAME, 
                                           'ru-RU\0', None)
    locale.setlocale(locale.LC_ALL, 'ru-RU')

    # reset tzname in current locale
    ucrtbase._tzset()
    ucrtbase.__tzname.restype = ctypes.POINTER(ctypes.c_char_p * 2)
    c_tzname = ucrtbase.__tzname()[0]
    tzname = tuple(tz.decode('1251') for tz in c_tzname)

    # print Cyrillic characters to the console
    kernel32.SetConsoleOutputCP(1251)
    stdout = open(1, 'w', buffering=1, encoding='1251', closefd=False)

    >>> print(tzname, file=stdout)
    ('Время в формате UTC', 'Время в формате UTC')

History
Date	User	Action	Args
2015-09-19 09:34:49	eryksun	set	recipients: + eryksun, jcea, amaury.forgeotdarc, prikryl, belopolsky, vstinner, ocean-city, BreamoreBoy, msmhrt
2015-09-19 09:34:48	eryksun	set	messageid: <1442655288.94.0.331617908978.issue16322@psf.upfronthosting.co.za>
2015-09-19 09:34:48	eryksun	link	issue16322 messages
2015-09-19 09:34:48	eryksun	create