Message269462
time.strftime calls the CRT's strftime function, which the Windows universal CRT implements by calling wcsftime and encoding the result. The timezone name is actually stored as a char string (tzname), so wcsftime has to decode it via mbstowcs.
The problem is that in the C locale tzname is an ANSI (1252) string, while mbstowcs simply casts each byte to wchar_t, which is the same as decoding as Latin-1. This works fine for "é" (U+00E9), which has the same byte value (0xE9) in both encodings. But the right single quote character (U+2019) is "\x92" in 1252, and a simple cast maps it to the C1 control character U+0092.
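The mismatch can be reproduced on any platform with Python's own codecs. This is only a rough illustration: 'latin-1' stands in for the byte-to-wchar_t cast done by mbstowcs in the C locale, 'cp1252' stands in for the ANSI code page, and the helper name cast_decode is made up for this sketch.

```python
def cast_decode(ansi_bytes):
    """Mimic the C-locale mbstowcs cast: each byte becomes the code point
    with the same numeric value, which is what a Latin-1 decode does."""
    return ansi_bytes.decode('latin-1')

e_acute = 'é'.encode('cp1252')       # b'\xe9'
quote = '\u2019'.encode('cp1252')    # b'\x92'

# "é" has the same byte value in 1252 and Latin-1, so it survives the cast.
print(ascii(cast_decode(e_acute)))   # '\xe9'

# The right single quote does not: byte 0x92 becomes U+0092, not U+2019.
print(ascii(cast_decode(quote)))     # '\x92'
```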
When the CRT's strftime encodes this back as an ANSI string, it maps U+0092 to the replacement character for 1252, a question mark. Similarly, time.tzname decodes the tzname ANSI strings using mbstowcs, with the same mismatch between LC_CTYPE and LC_TIME, resulting in the string "Est (heure d\x92été)".
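The round trip can be sketched the same way. Here errors='replace' is an assumption standing in for the CRT's substitution of unencodable characters; the actual conversion happens inside the CRT, not in Python's codecs.

```python
name = 'Est (heure d\u2019été)'

# What tzname holds in the C locale: an ANSI (1252) byte string.
ansi = name.encode('cp1252')

# What time.tzname ends up with: the bytes decoded as if they were Latin-1.
as_tzname = ansi.decode('latin-1')

# What strftime('%Z') ends up with: U+0092 has no 1252 encoding, so it is
# replaced by "?" when the wide string is encoded back to ANSI.
as_strftime = as_tzname.encode('cp1252', errors='replace').decode('cp1252')

print(ascii(as_tzname))     # 'Est (heure d\x92\xe9t\xe9)'
print(ascii(as_strftime))   # 'Est (heure d?\xe9t\xe9)'
```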
In summary, the problem is that LC_TIME uses ANSI in the C locale, while LC_CTYPE uses Latin-1. A workaround (in most cases) is to delay importing the time module until after setting LC_CTYPE (also setting LC_TIME should cover all cases). For example:
>>> import sys, locale
>>> 'time' in sys.modules
False
>>> locale.setlocale(locale.LC_CTYPE, '')
'French_France.1252'
>>> import time
>>> time.tzname
('Est', 'Est (heure d’été)')
>>> time.strftime('%Z')
'Est (heure d’été)'
Note that Unix Python 3 sets LC_CTYPE at startup, so doing the same on Windows would actually improve cross-platform consistency.
Date | User | Action | Args
2016-06-29 04:00:10 | eryksun | set | recipients: + eryksun, paul.moore, vstinner, tim.golden, ezio.melotti, r.david.murray, martin.panter, zach.ware, steve.dower, abarry
2016-06-29 04:00:10 | eryksun | set | messageid: <1467172810.39.0.948866181211.issue26226@psf.upfronthosting.co.za>
2016-06-29 04:00:10 | eryksun | link | issue26226 messages
2016-06-29 04:00:09 | eryksun | create |