Issue10653
Created on 2010-12-08 15:01 by ocean-city, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | | |
---|---|---|
File name | Uploaded | Description |
py3k_workaround_for_wcsftime.patch | ocean-city, 2010-12-08 17:57 | |
main.c | ocean-city, 2010-12-09 10:54 | test code |
Messages (22) | |||
---|---|---|---|
msg123612 - (view) | Author: Hirokazu Yamamoto (ocean-city) * | Date: 2010-12-08 15:01 | |
The following tests fail on the official Python 3.2 Windows binary. I cannot reproduce this on VC6.
/////////////////////////////////////////////////////
C:\Python32>.\python -m test.regrtest -v test_time test_strptime
[1/2] test_time
test_asctime (test.test_time.TimeTestCase) ... ok
test_asctime_bounding_check (test.test_time.TimeTestCase) ... ok
test_clock (test.test_time.TimeTestCase) ... ok
test_conversions (test.test_time.TimeTestCase) ... ok
test_ctime_without_arg (test.test_time.TimeTestCase) ... ok
test_data_attributes (test.test_time.TimeTestCase) ... ok
test_default_values_for_zero (test.test_time.TimeTestCase) ... ok
test_gmtime_without_arg (test.test_time.TimeTestCase) ... ok
test_insane_timestamps (test.test_time.TimeTestCase) ... ok
test_localtime_without_arg (test.test_time.TimeTestCase) ... ok
test_sleep (test.test_time.TimeTestCase) ... ok
test_strftime (test.test_time.TimeTestCase) ... ok
test_strftime_bounding_check (test.test_time.TimeTestCase) ... ok
test_strptime (test.test_time.TimeTestCase) ... FAIL
test_strptime_bytes (test.test_time.TimeTestCase) ... ok
test_tzset (test.test_time.TimeTestCase) ... ok
test_bug_3061 (test.test_time.TestLocale) ... ok
======================================================================
FAIL: test_strptime (test.test_time.TimeTestCase)
----------------------------------------------------------------------
test test_time crashed -- <class 'UnicodeEncodeError'>: 'cp932' codec can't encode character '\x93' in position 495: illegal multibyte sequence
Traceback (most recent call last):
  File "C:\Python32\lib\test\regrtest.py", line 960, in runtest_inner
    indirect_test()
  File "C:\Python32\lib\test\test_time.py", line 244, in test_main
    support.run_unittest(TimeTestCase, TestLocale)
  File "C:\Python32\lib\test\support.py", line 1146, in run_unittest
    _run_suite(suite)
  File "C:\Python32\lib\test\support.py", line 1120, in _run_suite
    result = runner.run(suite)
  File "C:\Python32\lib\unittest\runner.py", line 173, in run
    result.printErrors()
  File "C:\Python32\lib\unittest\runner.py", line 110, in printErrors
    self.printErrorList('FAIL', self.failures)
  File "C:\Python32\lib\unittest\runner.py", line 117, in printErrorList
    self.stream.writeln("%s" % err)
  File "C:\Python32\lib\unittest\runner.py", line 25, in writeln
    self.write(arg)
UnicodeEncodeError: 'cp932' codec can't encode character '\x93' in position 495: illegal multibyte sequence
[2/2] test_strptime
test_basic (test.test_strptime.getlang_Tests) ... ok
test_am_pm (test.test_strptime.LocaleTime_Tests) ... ok
test_date_time (test.test_strptime.LocaleTime_Tests) ... ok
test_lang (test.test_strptime.LocaleTime_Tests) ... ok
test_month (test.test_strptime.LocaleTime_Tests) ... ok
test_timezone (test.test_strptime.LocaleTime_Tests) ... FAIL
test_weekday (test.test_strptime.LocaleTime_Tests) ... ok
test_blankpattern (test.test_strptime.TimeRETests) ... ok
test_compile (test.test_strptime.TimeRETests) ... FAIL
test_locale_data_w_regex_metacharacters (test.test_strptime.TimeRETests) ... ok
test_matching_with_escapes (test.test_strptime.TimeRETests) ... ok
test_pattern (test.test_strptime.TimeRETests) ... ok
test_pattern_escaping (test.test_strptime.TimeRETests) ... ok
test_whitespace_substitution (test.test_strptime.TimeRETests) ... ok
test_ValueError (test.test_strptime.StrptimeTests) ... ok
test_bad_timezone (test.test_strptime.StrptimeTests) ... ok
test_caseinsensitive (test.test_strptime.StrptimeTests) ... ok
test_date (test.test_strptime.StrptimeTests) ... ok
test_date_time (test.test_strptime.StrptimeTests) ... ok
test_day (test.test_strptime.StrptimeTests) ... ok
test_defaults (test.test_strptime.StrptimeTests) ... ok
test_escaping (test.test_strptime.StrptimeTests) ... ok
test_fraction (test.test_strptime.StrptimeTests) ... ok
test_hour (test.test_strptime.StrptimeTests) ... ok
test_julian (test.test_strptime.StrptimeTests) ... ok
test_minute (test.test_strptime.StrptimeTests) ... ok
test_month (test.test_strptime.StrptimeTests) ... ok
test_percent (test.test_strptime.StrptimeTests) ... ok
test_second (test.test_strptime.StrptimeTests) ... ok
test_time (test.test_strptime.StrptimeTests) ... ok
test_timezone (test.test_strptime.StrptimeTests) ... ERROR
test_unconverteddata (test.test_strptime.StrptimeTests) ... ok
test_weekday (test.test_strptime.StrptimeTests) ... ok
test_year (test.test_strptime.StrptimeTests) ... ok
test_twelve_noon_midnight (test.test_strptime.Strptime12AMPMTests) ... ok
test_all_julian_days (test.test_strptime.JulianTests) ... ok
test_day_of_week_calculation (test.test_strptime.CalculationTests) ... ERROR
test_gregorian_calculation (test.test_strptime.CalculationTests) ... ERROR
test_julian_calculation (test.test_strptime.CalculationTests) ... ERROR
test_week_of_year_and_day_of_week_calculation (test.test_strptime.CalculationTests) ... ok
test_TimeRE_recreation (test.test_strptime.CacheTests) ... ok
test_new_localetime (test.test_strptime.CacheTests) ... ok
test_regex_cleanup (test.test_strptime.CacheTests) ... ok
test_time_re_recreation (test.test_strptime.CacheTests) ... ok
======================================================================
ERROR: test_timezone (test.test_strptime.StrptimeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python32\lib\test\test_strptime.py", line 303, in test_timezone
    strp_output = _strptime._strptime_time(strf_output, "%Z")
  File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
    tt = _strptime(data_string, format)[0]
  File "C:\Python32\lib\_strptime.py", line 337, in _strptime
    (data_string, format))
ValueError: time data '\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Z'
======================================================================
ERROR: test_day_of_week_calculation (test.test_strptime.CalculationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python32\lib\test\test_strptime.py", line 437, in test_day_of_week_calculation
    format_string)
  File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
    tt = _strptime(data_string, format)[0]
  File "C:\Python32\lib\_strptime.py", line 337, in _strptime
    (data_string, format))
ValueError: time data '2010 12 08 14 00 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %m %d %H %S %j %Z'
======================================================================
ERROR: test_gregorian_calculation (test.test_strptime.CalculationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python32\lib\test\test_strptime.py", line 423, in test_gregorian_calculation
    format_string)
  File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
    tt = _strptime(data_string, format)[0]
  File "C:\Python32\lib\_strptime.py", line 337, in _strptime
    (data_string, format))
ValueError: time data '2010 14 58 01 3 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %H %M %S %w %j %Z'
======================================================================
ERROR: test_julian_calculation (test.test_strptime.CalculationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python32\lib\test\test_strptime.py", line 414, in test_julian_calculation
    format_string)
  File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
    tt = _strptime(data_string, format)[0]
  File "C:\Python32\lib\_strptime.py", line 337, in _strptime
    (data_string, format))
ValueError: time data '2010 12 08 14 58 01 3 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %m %d %H %M %S %w %Z'
======================================================================
FAIL: test_timezone (test.test_strptime.LocaleTime_Tests)
----------------------------------------------------------------------
test test_strptime crashed -- <class 'UnicodeEncodeError'>: 'cp932' codec can't encode character '\x93' in position 192: illegal multibyte sequence
Traceback (most recent call last):
  File "C:\Python32\lib\test\regrtest.py", line 960, in runtest_inner
    indirect_test()
  File "C:\Python32\lib\test\test_strptime.py", line 557, in test_main
    CacheTests
  File "C:\Python32\lib\test\support.py", line 1146, in run_unittest
    _run_suite(suite)
  File "C:\Python32\lib\test\support.py", line 1120, in _run_suite
    result = runner.run(suite)
  File "C:\Python32\lib\unittest\runner.py", line 173, in run
    result.printErrors()
  File "C:\Python32\lib\unittest\runner.py", line 110, in printErrors
    self.printErrorList('FAIL', self.failures)
  File "C:\Python32\lib\unittest\runner.py", line 117, in printErrorList
    self.stream.writeln("%s" % err)
  File "C:\Python32\lib\unittest\runner.py", line 25, in writeln
    self.write(arg)
UnicodeEncodeError: 'cp932' codec can't encode character '\x93' in position 192: illegal multibyte sequence
2 tests failed: test_strptime test_time |
|||
msg123618 - (view) | Author: Brian Curtin (brian.curtin) * | Date: 2010-12-08 15:40 | |
I don't see this on a US/English version of Windows 7 with 3.2b1 installed. cp932 is the default on a Japanese version, correct? (I'm not very good with all of this encoding stuff so I don't know how much help I can be) |
|||
msg123623 - (view) | Author: Hirokazu Yamamoto (ocean-city) * | Date: 2010-12-08 17:46 | |
I think this is a locale problem. With the "C" locale on Windows, wcsftime doesn't return UTF-16 when the result contains non-ASCII characters. It behaves just like: char cbuf[] = "...."; /* contains non-ASCII chars in MBCS */ wchar_t wbuf[sizeof(cbuf)]; for (size_t i = 0; i < sizeof(cbuf); ++i) wbuf[i] = cbuf[i]; /* Just copies each byte. A non-ASCII char uses two bytes in MBCS but should use one wchar_t in UTF-16; here it uses two wchar_t instead (a strange encoding). */ In Japanese, wcsftime returns the non-ASCII characters of the time zone name in this strange encoding. Python converts the result with #ifdef HAVE_WCSFTIME ret = PyUnicode_FromWideChar(outbuf, buflen); #else so the Unicode object contains data in this strange encoding. This is the cause of the problem. I investigated locales a little and learned that the C standard does not guarantee that wchar_t is always UTF-16. |
|||
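The byte-for-byte widening described in msg123623 can be imitated in Python. This is a minimal sketch, not CPython's code: the helper names `widen_bytes` and `repair` are hypothetical, and decoding as Latin-1 stands in for the C loop on the assumption that each byte is treated as an unsigned value, so it maps to the code point with the same number.

```python
# Sample bytes taken from the failing test output: a cp932-encoded time zone name.
RAW = b'\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'


def widen_bytes(raw: bytes) -> str:
    """Copy each byte into its own character, like wbuf[i] = cbuf[i]."""
    return raw.decode('latin-1')


def repair(mojibake: str, encoding: str = 'cp932') -> str:
    """Undo the widening, then decode the bytes with the real code page."""
    return mojibake.encode('latin-1').decode(encoding)


broken = widen_bytes(RAW)
print(ascii(broken))   # escaped form of the mis-decoded string
print(repair(broken))  # the intended time zone name
```

The repair only works because the widening loses no information: every original byte survives as one character.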
msg123624 - (view) | Author: Hirokazu Yamamoto (ocean-city) * | Date: 2010-12-08 17:57 | |
I'll attach a workaround. I previously confirmed that it works on VS8, but I no longer have VS8. I hope it still works. |
|||
msg123625 - (view) | Author: Alexander Belopolsky (belopolsky) * | Date: 2010-12-08 17:58 | |
> ValueError: time data '2010 14 58 01 3 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %H %M %S %w %j %Z' This looks like valid cp932 data to me >>> b'2010 14 58 01 3 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'.decode('cp932') '2010 14 58 01 3 342 東京 (標準時)' Please help me with Japanese, but I think the above means Tokyo timezone. However, strftime should have produced decoded unicode strings, not raw cp932 in a str. What does time.strftime('%Z') return on your system? |
|||
msg123626 - (view) | Author: Hirokazu Yamamoto (ocean-city) * | Date: 2010-12-08 18:12 | |
Here you are. >>> import time >>> time.strftime('%Z') '\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' |
|||
msg123628 - (view) | Author: Alexander Belopolsky (belopolsky) * | Date: 2010-12-08 18:18 | |
On Wed, Dec 8, 2010 at 1:12 PM, Hirokazu Yamamoto <report@bugs.python.org> wrote: .. >>>> import time >>>> time.strftime('%Z') > '\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' Thanks. Please bear with me for one more question: what is >>> time.tzname ? |
|||
msg123631 - (view) | Author: Hirokazu Yamamoto (ocean-city) * | Date: 2010-12-08 18:50 | |
I got readable result. ;-) >>> import time >>> time.tzname ('東京 (標準時)', '東京 (標準時)') |
|||
msg123639 - (view) | Author: Alexander Belopolsky (belopolsky) * | Date: 2010-12-08 19:45 | |
On Wed, Dec 8, 2010 at 1:50 PM, Hirokazu Yamamoto <report@bugs.python.org> wrote: .. > I got readable result. ;-) > You mean readable to *you*. :-) >>>> import time >>>> time.tzname > ('東京 (標準時)', '東京 (標準時)') This makes sense now. There are two issues here: 1. Decoding the output of wcsftime(). Python expects mbcs (which I believe is an UTF16-like wide char encoding) while Windows apparently puts cp932 there in your locale. I don't have expertise to address this issue. 2. strptime() cannot parse strftime() output when strftime('%Z') is different from time.tzname[dst]. This issue we can address. Note that for most of the locale information such as day of the week or month names, strptime() relies on strftime() output, so the round-tripping should work even when strftime() results are nonsensical. On the other hand, tz spellings are taken from time.tzname. I think we can make strptime() more robust by adding [time.strftime('%Z', (2000,1,1,0,0,0,0,0,dst)) for dst in (0,1)] to the list of recognized tz names if they differ from time.tzname. |
|||
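The proposal in msg123639 could look roughly like the sketch below. It is not the code in `_strptime.py`; `recognized_tz_names` is a hypothetical helper, and `tm_yday` is set to 1 rather than 0 here simply to keep the time tuple valid.

```python
import time


def recognized_tz_names():
    """Collect zone spellings from time.tzname plus whatever strftime('%Z') emits."""
    names = list(time.tzname)
    for dst in (0, 1):
        # Ask the C library how it spells the zone for standard and daylight time.
        spelled = time.strftime('%Z', (2000, 1, 1, 0, 0, 0, 0, 1, dst))
        if spelled and spelled not in names:
            names.append(spelled)
    return names


print([ascii(name) for name in recognized_tz_names()])
```

On a system where both sources agree, the list is just `time.tzname`; on the reporter's system it would presumably also contain the mis-decoded spelling, which is what lets the round trip succeed.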
msg123676 - (view) | Author: Hirokazu Yamamoto (ocean-city) * | Date: 2010-12-09 10:54 | |
> 1. Decoding the output of wcsftime(). Python expects mbcs (which > I believe is an UTF16-like wide char encoding) while Windows > apparently puts cp932 there in your locale. I don't have expertise > to address this issue. No, mbcs is not a wide character set (wchar_t*) but an ANSI character set (char*). In my environment, mbcs == cp932, and Python expects UTF-16. > 2. strptime() cannot parse strftime() output when strftime('%Z') is > different from time.tzname[dst]. (snip) I attached a test program to check how strftime and wcsftime behave under different locales. On VC6, strftime doesn't depend on the locale, whereas the value returned by wcsftime changes depending on the locale. (I tested only the "C" locale and the "System" locale because I could not find other locales that work in my environment.) If strftime doesn't depend on the locale and equals tzname for every locale, maybe strftime is preferable on Windows. # Can somebody test this on VS9? And with other locales? |
|||
msg144419 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-09-22 21:09 | |
See also issue #13029. |
|||
msg145490 - (view) | Author: Roundup Robot (python-dev) | Date: 2011-10-14 00:38 | |
New changeset e3d9c5e690fc by Victor Stinner in branch '3.2': Issue #10653: On Windows, use strftime() instead of wcsftime() because http://hg.python.org/cpython/rev/e3d9c5e690fc New changeset 79e60977fc04 by Victor Stinner in branch 'default': (Merge 3.2) Issue #10653: On Windows, use strftime() instead of wcsftime() http://hg.python.org/cpython/rev/79e60977fc04 |
|||
msg145492 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-10-14 00:41 | |
It's a bug in the Windows API: I used the workaround suggested by Hirokazu Yamamoto. Thanks Hirokazu! Python 2.7 doesn't use wcsftime() and so it is not affected by this issue. |
|||
msg145596 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-10-15 15:42 | |
Crashes on the Windows buildbots: f:\dd\vctools\crt_bld\self_x86\crt\src\strftime.c(832) : Assertion failed: ( "Invalid format directive" , 0 ) f:\dd\vctools\crt_bld\self_x86\crt\src\strftime.c(484) : Assertion failed: FALSE |
|||
msg145628 - (view) | Author: Roundup Robot (python-dev) | Date: 2011-10-16 17:07 | |
New changeset e3c13a1d2595 by Victor Stinner in branch 'default': Issue #10653: Fix time.strftime() on Windows, check for invalid format strings http://hg.python.org/cpython/rev/e3c13a1d2595 |
|||
msg145629 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-10-16 17:09 | |
> Crashes on the Windows buildbots: Oops, it should be fixed by my last commits. |
|||
msg145647 - (view) | Author: Roundup Robot (python-dev) | Date: 2011-10-16 21:45 | |
New changeset 977c5753ca32 by Victor Stinner in branch '3.2': Issue #10653: Fix time.strftime() on Windows, check for invalid format strings http://hg.python.org/cpython/rev/977c5753ca32 |
|||
msg243660 - (view) | Author: Eryk Sun (eryksun) * | Date: 2015-05-20 13:30 | |
This solution no longer works. If the system is configured to use the Japanese system locale and language pack, then 3.4.3 returns codepage 932 mojibake for the "%Z" time zone name. Originally [this approach worked][1] because it called PyUnicode_Decode using the 'mbcs' encoding. Currently it calls PyUnicode_DecodeLocaleAndSize, which just ends up calling mbstowcs. That's pretty much what wcsftime does. In the default C locale, mbstowcs casts the byte values to wchar_t: >>> time.strftime('%Z') '\x91\xbe\x95\xbd\x97m\x89\xc4\x8e\x9e\x8a\xd4' >>> time.strftime('%Z').encode('latin-1').decode('932') '太平洋夏時間' The problem is worse for 3.5 built with VC++ 14. In the new CRT strftime decodes the format string via MultiByteToWideChar, calls _Wcsftime_l, and encodes the result back via WideCharToMultiByte. The outer conversions use the default LC_TIME codepage, which is ANSI (ACP), so they're not the problem. The problem is the internal _mbstowcs_s_l conversion of the ANSI time zone name, which creates the above-shown mojibake 'unicode' string. This is then compounded by calling WideCharToMultiByte on the result: >>> time.strftime('%Z') '?????m?A???O' There's no way to fix this by transcoding. The result is just garbage. [1]: https://hg.python.org/cpython/file/79e60977fc04/Modules/timemodule.c#l501 |
|||
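The difference between the recoverable and the unrecoverable case in msg243660 can be illustrated as follows. This is only an approximation: Python's `cp932` codec with `errors='replace'` stands in for `WideCharToMultiByte`, which additionally applies best-fit mappings (hence the `A` and `O` in the output quoted in that message), so the exact bytes differ from what the CRT produces.

```python
# Mojibake string as returned by time.strftime('%Z') in msg243660.
MOJIBAKE = '\x91\xbe\x95\xbd\x97m\x89\xc4\x8e\x9e\x8a\xd4'

# One bad conversion is reversible: the original bytes are still all there.
recovered = MOJIBAKE.encode('latin-1').decode('cp932')
print(recovered)

# A second, lossy conversion replaces unencodable characters with '?'.
lossy = MOJIBAKE.encode('cp932', errors='replace')
print(lossy)

# The '?' bytes no longer identify the original characters.
print(lossy.decode('cp932'))
```

Once distinct characters have collapsed into the same `?`, no transcoding step can tell them apart again, which is the point made in the message.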
msg388237 - (view) | Author: Eryk Sun (eryksun) * | Date: 2021-03-07 12:36 | |
Update since msg243660: Python 3.8+ now calls setlocale(LC_CTYPE, "") at startup in Windows, as it has always done in POSIX, so decoding the output of strftime("%Z") with PyUnicode_DecodeLocaleAndSize() works again since both agree on using the process active code page. In 3.7+, per bpo-36779, time.tzname is set when the module is first loaded by directly querying GetTimeZoneInformation(). time.tzset() is still not supported, despite the fact that it was always supported by ucrt, so this value can become stale relative to strftime("%Z"). Starting with Windows 10 v2004 (build 19041), ucrt uses an internal wide-character version of the time-zone name that gets returned by an internal __wide_tzname() call and used for "%Z" in wcsftime(). The wide-character value gets updated by _tzset() and kept in sync with _tzname. |
|||
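One way to notice the staleness described in msg388237 is to compare the two sources directly. A minimal sketch, assuming that a mismatch between `time.strftime('%Z')` and `time.tzname` is a useful signal; `tz_name_in_sync` is a hypothetical helper, not part of the `time` module.

```python
import locale
import time


def tz_name_in_sync() -> bool:
    """Return True if strftime('%Z') agrees with one of the time.tzname entries."""
    return time.strftime('%Z') in time.tzname


# Calling setlocale() without a second argument only queries the current setting.
print('LC_CTYPE:', locale.setlocale(locale.LC_CTYPE))
print('tzname:', ascii(time.tzname))
print('strftime %Z:', ascii(time.strftime('%Z')))
print('in sync:', tz_name_in_sync())
```

A `False` result does not say which value is wrong; it only shows that the two have diverged.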
msg388238 - (view) | Author: Eryk Sun (eryksun) * | Date: 2021-03-07 13:06 | |
> decoding the output of strftime("%Z") with PyUnicode_DecodeLocaleAndSize() > works again since both agree on using the process active code page At least it works as much as it ever did. It depends on the process active code page being compatible with the preferred UI language of the current process or thread. For example if the UI language is Japanese ('ja-JP') for the current user, but the process active code page is Latin 1252 (based on the system locale), then the result will be garbage. In that case, given the time-zone name is in Japanese, both LC_TIME and LC_CTYPE have to be changed to "ja-JP" in order to correctly encode (as tzname in ucrt), decode-encode (for strftime in ucrt) and finally decode again via PyUnicode_DecodeLocaleAndSize(). If Python switched back to using wcsftime() in Windows 10 2004+, then the current locale encoding would no longer be a problem for any UI language. |
|||
msg388279 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-03-08 18:25 | |
Eryk Sun: This issue is now closed. If you want to enhance the time module, please open a new issue. |
|||
msg388292 - (view) | Author: Eryk Sun (eryksun) * | Date: 2021-03-08 19:52 | |
> Eryk Sun: This issue is now closed. If you want to enhance > the time module, please open a new issue. I was aware of that at the time, Victor. The problem can be worked on in a new issue, or in the older issue bpo-8304, which remains open. The two messages that I added are purely informative, to update my original comment in msg243660. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:09 | admin | set | github: 54862 |
2021-03-08 19:52:06 | eryksun | set | messages: + msg388292 |
2021-03-08 18:25:07 | vstinner | set | messages: + msg388279 |
2021-03-07 13:06:17 | eryksun | set | messages: + msg388238 |
2021-03-07 12:36:19 | eryksun | set | messages: + msg388237 |
2019-05-07 00:38:38 | jkloth | set | nosy: + jkloth |
2015-05-20 13:30:40 | eryksun | set | nosy: + eryksun messages: + msg243660 versions: + Python 3.4, Python 3.5 |
2011-10-16 21:45:14 | python-dev | set | messages: + msg145647 |
2011-10-16 20:07:05 | vstinner | set | status: open -> closed |
2011-10-16 17:09:11 | vstinner | set | messages: + msg145629 |
2011-10-16 17:07:37 | python-dev | set | messages: + msg145628 |
2011-10-15 15:42:24 | pitrou | set | status: closed -> open nosy: + pitrou messages: + msg145596 assignee: vstinner |
2011-10-14 00:41:19 | vstinner | set | status: open -> closed resolution: fixed messages: + msg145492 versions: + Python 3.3 |
2011-10-14 00:38:22 | python-dev | set | nosy: + python-dev messages: + msg145490 |
2011-09-22 21:09:42 | vstinner | set | nosy: + vstinner messages: + msg144419 |
2010-12-09 10:54:38 | ocean-city | set | files: + main.c messages: + msg123676 |
2010-12-08 19:45:36 | belopolsky | set | messages: + msg123639 |
2010-12-08 18:50:07 | ocean-city | set | messages: + msg123631 |
2010-12-08 18:18:54 | belopolsky | set | messages: + msg123628 |
2010-12-08 18:12:54 | ocean-city | set | messages: + msg123626 |
2010-12-08 17:58:10 | belopolsky | set | messages: + msg123625 |
2010-12-08 17:57:08 | ocean-city | set | files: + py3k_workaround_for_wcsftime.patch keywords: + patch messages: + msg123624 |
2010-12-08 17:46:24 | ocean-city | set | messages: + msg123623 |
2010-12-08 17:33:40 | r.david.murray | set | nosy: + belopolsky |
2010-12-08 15:40:53 | brian.curtin | set | nosy: + brian.curtin messages: + msg123618 |
2010-12-08 15:01:27 | ocean-city | create |