test_time test_strptime fails on windows #54862
The following tests fail on the official Python 3.2 Windows binary. I cannot reproduce this on VC6.
/////////////////////////////////////////////////////
C:\Python32>.\python -m test.regrtest -v test_time test_strptime
[1/2] test_time
======================================================================
test test_time crashed -- <class 'UnicodeEncodeError'>: 'cp932' codec can't encode character '\x93' in position 495: illegal multibyte sequence
Traceback (most recent call last):
File "C:\Python32\lib\test\regrtest.py", line 960, in runtest_inner
indirect_test()
File "C:\Python32\lib\test\test_time.py", line 244, in test_main
support.run_unittest(TimeTestCase, TestLocale)
File "C:\Python32\lib\test\support.py", line 1146, in run_unittest
_run_suite(suite)
File "C:\Python32\lib\test\support.py", line 1120, in _run_suite
result = runner.run(suite)
File "C:\Python32\lib\unittest\runner.py", line 173, in run
result.printErrors()
File "C:\Python32\lib\unittest\runner.py", line 110, in printErrors
self.printErrorList('FAIL', self.failures)
File "C:\Python32\lib\unittest\runner.py", line 117, in printErrorList
self.stream.writeln("%s" % err)
File "C:\Python32\lib\unittest\runner.py", line 25, in writeln
self.write(arg)
UnicodeEncodeError: 'cp932' codec can't encode character '\x93' in position 495: illegal multibyte sequence
[2/2] test_strptime
test_basic (test.test_strptime.getlang_Tests) ... ok
test_am_pm (test.test_strptime.LocaleTime_Tests) ... ok
test_date_time (test.test_strptime.LocaleTime_Tests) ... ok
test_lang (test.test_strptime.LocaleTime_Tests) ... ok
test_month (test.test_strptime.LocaleTime_Tests) ... ok
test_timezone (test.test_strptime.LocaleTime_Tests) ... FAIL
test_weekday (test.test_strptime.LocaleTime_Tests) ... ok
test_blankpattern (test.test_strptime.TimeRETests) ... ok
test_compile (test.test_strptime.TimeRETests) ... FAIL
test_locale_data_w_regex_metacharacters (test.test_strptime.TimeRETests) ... ok
test_matching_with_escapes (test.test_strptime.TimeRETests) ... ok
test_pattern (test.test_strptime.TimeRETests) ... ok
test_pattern_escaping (test.test_strptime.TimeRETests) ... ok
test_whitespace_substitution (test.test_strptime.TimeRETests) ... ok
test_ValueError (test.test_strptime.StrptimeTests) ... ok
test_bad_timezone (test.test_strptime.StrptimeTests) ... ok
test_caseinsensitive (test.test_strptime.StrptimeTests) ... ok
test_date (test.test_strptime.StrptimeTests) ... ok
test_date_time (test.test_strptime.StrptimeTests) ... ok
test_day (test.test_strptime.StrptimeTests) ... ok
test_defaults (test.test_strptime.StrptimeTests) ... ok
test_escaping (test.test_strptime.StrptimeTests) ... ok
test_fraction (test.test_strptime.StrptimeTests) ... ok
test_hour (test.test_strptime.StrptimeTests) ... ok
test_julian (test.test_strptime.StrptimeTests) ... ok
test_minute (test.test_strptime.StrptimeTests) ... ok
test_month (test.test_strptime.StrptimeTests) ... ok
test_percent (test.test_strptime.StrptimeTests) ... ok
test_second (test.test_strptime.StrptimeTests) ... ok
test_time (test.test_strptime.StrptimeTests) ... ok
test_timezone (test.test_strptime.StrptimeTests) ... ERROR
test_unconverteddata (test.test_strptime.StrptimeTests) ... ok
test_weekday (test.test_strptime.StrptimeTests) ... ok
test_year (test.test_strptime.StrptimeTests) ... ok
test_twelve_noon_midnight (test.test_strptime.Strptime12AMPMTests) ... ok
test_all_julian_days (test.test_strptime.JulianTests) ... ok
test_day_of_week_calculation (test.test_strptime.CalculationTests) ... ERROR
test_gregorian_calculation (test.test_strptime.CalculationTests) ... ERROR
test_julian_calculation (test.test_strptime.CalculationTests) ... ERROR
test_week_of_year_and_day_of_week_calculation (test.test_strptime.CalculationTests) ... ok
test_TimeRE_recreation (test.test_strptime.CacheTests) ... ok
test_new_localetime (test.test_strptime.CacheTests) ... ok
test_regex_cleanup (test.test_strptime.CacheTests) ... ok
test_time_re_recreation (test.test_strptime.CacheTests) ... ok
======================================================================
Traceback (most recent call last):
File "C:\Python32\lib\test\test_strptime.py", line 303, in test_timezone
strp_output = _strptime._strptime_time(strf_output, "%Z")
File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
tt = _strptime(data_string, format)[0]
File "C:\Python32\lib\_strptime.py", line 337, in _strptime
(data_string, format))
ValueError: time data '\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Z'
======================================================================
Traceback (most recent call last):
File "C:\Python32\lib\test\test_strptime.py", line 437, in test_day_of_week_calculation
format_string)
File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
tt = _strptime(data_string, format)[0]
File "C:\Python32\lib\_strptime.py", line 337, in _strptime
(data_string, format))
ValueError: time data '2010 12 08 14 00 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %m %d %H %S %j %Z'
======================================================================
Traceback (most recent call last):
File "C:\Python32\lib\test\test_strptime.py", line 423, in test_gregorian_calculation
format_string)
File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
tt = _strptime(data_string, format)[0]
File "C:\Python32\lib\_strptime.py", line 337, in _strptime
(data_string, format))
ValueError: time data '2010 14 58 01 3 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %H %M %S %w %j %Z'
======================================================================
Traceback (most recent call last):
File "C:\Python32\lib\test\test_strptime.py", line 414, in test_julian_calculation
format_string)
File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
tt = _strptime(data_string, format)[0]
File "C:\Python32\lib\_strptime.py", line 337, in _strptime
(data_string, format))
ValueError: time data '2010 12 08 14 58 01 3 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %m %d %H %M %S %w %Z'
======================================================================
test test_strptime crashed -- <class 'UnicodeEncodeError'>: 'cp932' codec can't encode character '\x93' in position 192: illegal multibyte sequence
Traceback (most recent call last):
File "C:\Python32\lib\test\regrtest.py", line 960, in runtest_inner
indirect_test()
File "C:\Python32\lib\test\test_strptime.py", line 557, in test_main
CacheTests
File "C:\Python32\lib\test\support.py", line 1146, in run_unittest
_run_suite(suite)
File "C:\Python32\lib\test\support.py", line 1120, in _run_suite
result = runner.run(suite)
File "C:\Python32\lib\unittest\runner.py", line 173, in run
result.printErrors()
File "C:\Python32\lib\unittest\runner.py", line 110, in printErrors
self.printErrorList('FAIL', self.failures)
File "C:\Python32\lib\unittest\runner.py", line 117, in printErrorList
self.stream.writeln("%s" % err)
File "C:\Python32\lib\unittest\runner.py", line 25, in writeln
self.write(arg)
UnicodeEncodeError: 'cp932' codec can't encode character '\x93' in position 192: illegal multibyte sequence
2 tests failed:
test_strptime test_time |
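The runner crash reported above can be reproduced in isolation. The following is a minimal sketch, assuming only Python's built-in cp932 codec; `can_encode` is a hypothetical helper introduced for illustration and is not part of regrtest or unittest.

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The time zone name as it appears in the failure messages: each code point
# is a raw cp932 byte value stored as a character, so it contains C1 control
# characters such as U+0093 that cp932 cannot encode.
broken = '\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'

# The same name when it has been decoded properly.
decoded = '東京 (標準時)'

print(can_encode(broken, 'cp932'))
print(can_encode(decoded, 'cp932'))
```

Writing the first string to a cp932 console is what raises the UnicodeEncodeError in `runner.py`; the second string would print without trouble.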
I don't see this on a US/English version of Windows 7 with 3.2b1 installed. cp932 is the default on a Japanese version, correct? (I'm not very good with all of this encoding stuff so I don't know how much help I can be) |
I think this is a locale problem. With the "C" locale on Windows, it is just like .... In Japanese, wcsftime returns non-ASCII characters for
#ifdef HAVE_WCSFTIME
ret = PyUnicode_FromWideChar(outbuf, buflen);
#else
so the Unicode object will contain data in this strange encoding. I investigated a little about locale, and I learned C |
I'll attach a workaround. I used to confirm this works on |
This looks like valid cp932 data to me
>>> b'2010 14 58 01 3 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'.decode('cp932')
'2010 14 58 01 3 342 東京 (標準時)'
Please help me with Japanese, but I think the above means Tokyo timezone. However, strftime should have produced decoded unicode strings, not raw cp932 in a str. What does time.strftime('%Z') return on your system? |
Here you are.
>>> import time
>>> time.strftime('%Z')
'\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' |
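A result like the one above can be recognized programmatically. This is a rough heuristic sketch, not something CPython does: `looks_undecoded` is a hypothetical name, and the rule it applies (every code point fits in one byte, and at least one is a C1 control character) is an assumption that happens to fit the cp932 data in this report.

```python
def looks_undecoded(s):
    """Guess whether s holds raw multibyte data stored one byte per character.

    Heuristic only: all code points must be below U+0100, and at least one
    must fall in the C1 control range U+0080..U+009F, which rarely occurs
    in real text.
    """
    fits_in_bytes = all(ord(c) < 0x100 for c in s)
    has_c1_control = any(0x80 <= ord(c) <= 0x9F for c in s)
    return fits_in_bytes and has_c1_control


print(looks_undecoded('\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'))
print(looks_undecoded('東京 (標準時)'))
```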
On Wed, Dec 8, 2010 at 1:12 PM, Hirokazu Yamamoto
<report@bugs.python.org> wrote:
..
>>>> import time
>>>> time.strftime('%Z')
> '\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'
Thanks. Please bear with me for one more question: what is
? |
I got a readable result. ;-)
>>> import time
>>> time.tzname
('東京 (標準時)', '東京 (標準時)') |
On Wed, Dec 8, 2010 at 1:50 PM, Hirokazu Yamamoto
You mean readable to *you*. :-)
This makes sense now. There are two issues here:
|
No, mbcs is not a wide character set (wchar_t*) but an ANSI character set.
I attached a test program to test the behavior of strftime and wcsftime. If strftime doesn't depend on locale and equals to tzname # Can somebody test this on VS9? And other locales? |
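The attached test program is not reproduced in this thread. As a rough Python-level stand-in (an assumption, not the attachment itself), the same comparison between strftime("%Z") and tzname can be sketched as follows; `compare_zone_names` is a hypothetical helper.

```python
import time


def compare_zone_names():
    """Return strftime('%Z'), time.tzname, and whether the former is in the latter."""
    formatted = time.strftime('%Z')
    return formatted, time.tzname, formatted in time.tzname


formatted, tzname, consistent = compare_zone_names()

# ascii() escapes non-ASCII characters, so printing cannot fail on a console
# whose encoding lacks them.
print(ascii(formatted), [ascii(name) for name in tzname], consistent)
```

On an affected system the first value would show escapes such as '\x93\x8c' while tzname shows properly decoded characters, and the final flag would be False.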
See also issue bpo-13029. |
New changeset e3d9c5e690fc by Victor Stinner in branch '3.2':
New changeset 79e60977fc04 by Victor Stinner in branch 'default': |
It's a bug in the Windows API: I used the workaround suggested by Hirokazu Yamamoto. Thanks Hirokazu! Python 2.7 doesn't use wcsftime() and so it is not affected by this issue. |
Crashes on the Windows buildbots: f:\dd\vctools\crt_bld\self_x86\crt\src\strftime.c(832) : Assertion failed: ( "Invalid format directive" , 0 ) |
New changeset e3c13a1d2595 by Victor Stinner in branch 'default': |
Oops, it should be fixed by my last commits. |
New changeset 977c5753ca32 by Victor Stinner in branch '3.2': |
This solution no longer works. If the system is configured to use the Japanese system locale and language pack, then 3.4.3 returns codepage 932 mojibake for the "%Z" time zone name. Originally this approach worked because it called PyUnicode_Decode using the 'mbcs' encoding.
>>> time.strftime('%Z')
'\x91\xbe\x95\xbd\x97m\x89\xc4\x8e\x9e\x8a\xd4'
>>> time.strftime('%Z').encode('latin-1').decode('932')
'太平洋夏時間'
The problem is worse for 3.5 built with VC++ 14. In the new CRT strftime decodes the format string via MultiByteToWideChar, calls _Wcsftime_l, and encodes the result back via WideCharToMultiByte. The outer conversions use the default LC_TIME codepage, which is ANSI (ACP), so they're not the problem. The problem is the internal _mbstowcs_s_l conversion of the ANSI time zone name, which creates the above-shown mojibake 'unicode' string. This is then compounded by calling WideCharToMultiByte on the result:
>>> time.strftime('%Z')
'?????m?A???O'
There's no way to fix this by transcoding. The result is just garbage. |
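The difference between the two cases above can be shown with a small sketch. `recover_mojibake` is a hypothetical helper that generalizes the latin-1 round trip from the 3.4.3 session; it assumes each character of the input is one original byte.

```python
def recover_mojibake(s, code_page='cp932'):
    """Reinterpret a str whose code points are raw bytes of code_page."""
    # latin-1 maps U+0000..U+00FF to the same byte values, so this step
    # restores the original byte string before decoding it correctly.
    return s.encode('latin-1').decode(code_page)


# 3.4.3 case: the bytes survived, so the name can be recovered.
print(recover_mojibake('\x91\xbe\x95\xbd\x97m\x89\xc4\x8e\x9e\x8a\xd4'))

# 3.5 case: the bytes were already replaced by '?' and other ASCII
# characters, so the round trip returns the same garbage unchanged.
print(recover_mojibake('?????m?A???O'))
```

The second call returns its input as-is because every character is plain ASCII; the information needed to rebuild the name no longer exists in the string.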
Update since msg243660: Python 3.8+ now calls setlocale(LC_CTYPE, "") at startup in Windows, as it has always done in POSIX, so decoding the output of strftime("%Z") with PyUnicode_DecodeLocaleAndSize() works again since both agree on using the process active code page.
In 3.7+, per bpo-36779, time.tzname is set when the module is first loaded by directly querying GetTimeZoneInformation(). time.tzset() is still not supported, despite the fact that it was always supported by ucrt, so this value can become stale relative to strftime("%Z").
Starting with Windows 10 v2004 (build 19041), ucrt uses an internal wide-character version of the time-zone name that gets returned by an internal __wide_tzname() call and used for "%Z" in wcsftime(). The wide-character value gets updated by _tzset() and kept in sync with _tzname. |
At least it works as much as it ever did. It depends on the process active code page being compatible with the preferred UI language of the current process or thread. For example if the UI language is Japanese ('ja-JP') for the current user, but the process active code page is Latin 1252 (based on the system locale), then the result will be garbage. In that case, given the time-zone name is in Japanese, both LC_TIME and LC_CTYPE have to be changed to "ja-JP" in order to correctly encode (as tzname in ucrt), decode-encode (for strftime in ucrt) and finally decode again via PyUnicode_DecodeLocaleAndSize(). If Python switched back to using wcsftime() in Windows 10 2004+, then the current locale encoding would no longer be a problem for any UI language. |
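The dependency on the active code page can be simulated in pure Python. This is only a model under stated assumptions: `round_trip` is a hypothetical helper, and Python's `errors='replace'` stands in for whatever lossy substitution the CRT performs when the code page lacks a character.

```python
def round_trip(name, code_page):
    """Encode name into code_page, as the CRT would, then decode it again."""
    # Characters missing from the code page become '?', so they are lost
    # before the final decode ever runs.
    return name.encode(code_page, errors='replace').decode(code_page)


name = '東京 (標準時)'

# Code page matches the UI language: the name survives.
print(round_trip(name, 'cp932'))

# Code page 1252 with a Japanese time-zone name: only ASCII survives.
print(round_trip(name, 'cp1252'))
```

In this model the cp1252 result keeps the space and parentheses but turns every Japanese character into '?', which mirrors why no amount of decoding on the Python side can repair the name afterwards.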
Eryk Sun: This issue is now closed. If you want to enhance the time module, please open a new issue. |
I was aware of that at the time, Victor. The problem can be worked on in a new issue, or in the older issue bpo-8304, which remains open. The two messages that I added are purely informative, to update my original comment in msg243660. |