Title: test_time test_strptime fails on windows
Components: Windows
Versions: Python 3.5, Python 3.2, Python 3.3, Python 3.4
Status: closed
Resolution: fixed
Assigned To: vstinner
Nosy List: belopolsky, brian.curtin, eryksun, ocean-city, pitrou, python-dev, vstinner
Priority: normal
Keywords: patch

Created on 2010-12-08 15:01 by ocean-city, last changed 2015-05-20 13:30 by eryksun. This issue is now closed.

Files
py3k_workaround_for_wcsftime.patch (uploaded by ocean-city, 2010-12-08 17:57)
main.c (uploaded by ocean-city, 2010-12-09 10:54): test code
Messages (18)
msg123612 - Author: Hirokazu Yamamoto (ocean-city) (Python committer) Date: 2010-12-08 15:01
The following tests fail on the official Python 3.2 Windows binary.

I cannot reproduce this on VC6.

/////////////////////////////////////////////////////

C:\Python32>.\python -m test.regrtest -v test_time test_strptime

[1/2] test_time
test_asctime (test.test_time.TimeTestCase) ... ok
test_asctime_bounding_check (test.test_time.TimeTestCase) ... ok
test_clock (test.test_time.TimeTestCase) ... ok
test_conversions (test.test_time.TimeTestCase) ... ok
test_ctime_without_arg (test.test_time.TimeTestCase) ... ok
test_data_attributes (test.test_time.TimeTestCase) ... ok
test_default_values_for_zero (test.test_time.TimeTestCase) ... ok
test_gmtime_without_arg (test.test_time.TimeTestCase) ... ok
test_insane_timestamps (test.test_time.TimeTestCase) ... ok
test_localtime_without_arg (test.test_time.TimeTestCase) ... ok
test_sleep (test.test_time.TimeTestCase) ... ok
test_strftime (test.test_time.TimeTestCase) ... ok
test_strftime_bounding_check (test.test_time.TimeTestCase) ... ok
test_strptime (test.test_time.TimeTestCase) ... FAIL
test_strptime_bytes (test.test_time.TimeTestCase) ... ok
test_tzset (test.test_time.TimeTestCase) ... ok
test_bug_3061 (test.test_time.TestLocale) ... ok

======================================================================
FAIL: test_strptime (test.test_time.TimeTestCase)
----------------------------------------------------------------------
test test_time crashed -- <class 'UnicodeEncodeError'>: 'cp932' codec can't encode character '\x93' in position 495: illegal multibyte sequence
Traceback (most recent call last):
  File "C:\Python32\lib\test\regrtest.py", line 960, in runtest_inner
    indirect_test()
  File "C:\Python32\lib\test\test_time.py", line 244, in test_main
    support.run_unittest(TimeTestCase, TestLocale)
  File "C:\Python32\lib\test\support.py", line 1146, in run_unittest
    _run_suite(suite)
  File "C:\Python32\lib\test\support.py", line 1120, in _run_suite
    result = runner.run(suite)
  File "C:\Python32\lib\unittest\runner.py", line 173, in run
    result.printErrors()
  File "C:\Python32\lib\unittest\runner.py", line 110, in printErrors
    self.printErrorList('FAIL', self.failures)
  File "C:\Python32\lib\unittest\runner.py", line 117, in printErrorList
    self.stream.writeln("%s" % err)
  File "C:\Python32\lib\unittest\runner.py", line 25, in writeln
    self.write(arg)
UnicodeEncodeError: 'cp932' codec can't encode character '\x93' in position 495: illegal multibyte sequence
[2/2] test_strptime
test_basic (test.test_strptime.getlang_Tests) ... ok
test_am_pm (test.test_strptime.LocaleTime_Tests) ... ok
test_date_time (test.test_strptime.LocaleTime_Tests) ... ok
test_lang (test.test_strptime.LocaleTime_Tests) ... ok
test_month (test.test_strptime.LocaleTime_Tests) ... ok
test_timezone (test.test_strptime.LocaleTime_Tests) ... FAIL
test_weekday (test.test_strptime.LocaleTime_Tests) ... ok
test_blankpattern (test.test_strptime.TimeRETests) ... ok
test_compile (test.test_strptime.TimeRETests) ... FAIL
test_locale_data_w_regex_metacharacters (test.test_strptime.TimeRETests) ... ok
test_matching_with_escapes (test.test_strptime.TimeRETests) ... ok
test_pattern (test.test_strptime.TimeRETests) ... ok
test_pattern_escaping (test.test_strptime.TimeRETests) ... ok
test_whitespace_substitution (test.test_strptime.TimeRETests) ... ok
test_ValueError (test.test_strptime.StrptimeTests) ... ok
test_bad_timezone (test.test_strptime.StrptimeTests) ... ok
test_caseinsensitive (test.test_strptime.StrptimeTests) ... ok
test_date (test.test_strptime.StrptimeTests) ... ok
test_date_time (test.test_strptime.StrptimeTests) ... ok
test_day (test.test_strptime.StrptimeTests) ... ok
test_defaults (test.test_strptime.StrptimeTests) ... ok
test_escaping (test.test_strptime.StrptimeTests) ... ok
test_fraction (test.test_strptime.StrptimeTests) ... ok
test_hour (test.test_strptime.StrptimeTests) ... ok
test_julian (test.test_strptime.StrptimeTests) ... ok
test_minute (test.test_strptime.StrptimeTests) ... ok
test_month (test.test_strptime.StrptimeTests) ... ok
test_percent (test.test_strptime.StrptimeTests) ... ok
test_second (test.test_strptime.StrptimeTests) ... ok
test_time (test.test_strptime.StrptimeTests) ... ok
test_timezone (test.test_strptime.StrptimeTests) ... ERROR
test_unconverteddata (test.test_strptime.StrptimeTests) ... ok
test_weekday (test.test_strptime.StrptimeTests) ... ok
test_year (test.test_strptime.StrptimeTests) ... ok
test_twelve_noon_midnight (test.test_strptime.Strptime12AMPMTests) ... ok
test_all_julian_days (test.test_strptime.JulianTests) ... ok
test_day_of_week_calculation (test.test_strptime.CalculationTests) ... ERROR
test_gregorian_calculation (test.test_strptime.CalculationTests) ... ERROR
test_julian_calculation (test.test_strptime.CalculationTests) ... ERROR
test_week_of_year_and_day_of_week_calculation (test.test_strptime.CalculationTests) ... ok
test_TimeRE_recreation (test.test_strptime.CacheTests) ... ok
test_new_localetime (test.test_strptime.CacheTests) ... ok
test_regex_cleanup (test.test_strptime.CacheTests) ... ok
test_time_re_recreation (test.test_strptime.CacheTests) ... ok

======================================================================
ERROR: test_timezone (test.test_strptime.StrptimeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python32\lib\test\test_strptime.py", line 303, in test_timezone
    strp_output = _strptime._strptime_time(strf_output, "%Z")
  File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
    tt = _strptime(data_string, format)[0]
  File "C:\Python32\lib\_strptime.py", line 337, in _strptime
    (data_string, format))
ValueError: time data '\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Z'

======================================================================
ERROR: test_day_of_week_calculation (test.test_strptime.CalculationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python32\lib\test\test_strptime.py", line 437, in test_day_of_week_calculation
    format_string)
  File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
    tt = _strptime(data_string, format)[0]
  File "C:\Python32\lib\_strptime.py", line 337, in _strptime
    (data_string, format))
ValueError: time data '2010 12 08 14 00 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %m %d %H %S %j %Z'

======================================================================
ERROR: test_gregorian_calculation (test.test_strptime.CalculationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python32\lib\test\test_strptime.py", line 423, in test_gregorian_calculation
    format_string)
  File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
    tt = _strptime(data_string, format)[0]
  File "C:\Python32\lib\_strptime.py", line 337, in _strptime
    (data_string, format))
ValueError: time data '2010 14 58 01 3 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %H %M %S %w %j %Z'

======================================================================
ERROR: test_julian_calculation (test.test_strptime.CalculationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python32\lib\test\test_strptime.py", line 414, in test_julian_calculation
    format_string)
  File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
    tt = _strptime(data_string, format)[0]
  File "C:\Python32\lib\_strptime.py", line 337, in _strptime
    (data_string, format))
ValueError: time data '2010 12 08 14 58 01 3 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %m %d %H %M %S %w %Z'

======================================================================
FAIL: test_timezone (test.test_strptime.LocaleTime_Tests)
----------------------------------------------------------------------
test test_strptime crashed -- <class 'UnicodeEncodeError'>: 'cp932' codec can't encode character '\x93' in position 192: illegal multibyte sequence
Traceback (most recent call last):
  File "C:\Python32\lib\test\regrtest.py", line 960, in runtest_inner
    indirect_test()
  File "C:\Python32\lib\test\test_strptime.py", line 557, in test_main
    CacheTests
  File "C:\Python32\lib\test\support.py", line 1146, in run_unittest
    _run_suite(suite)
  File "C:\Python32\lib\test\support.py", line 1120, in _run_suite
    result = runner.run(suite)
  File "C:\Python32\lib\unittest\runner.py", line 173, in run
    result.printErrors()
  File "C:\Python32\lib\unittest\runner.py", line 110, in printErrors
    self.printErrorList('FAIL', self.failures)
  File "C:\Python32\lib\unittest\runner.py", line 117, in printErrorList
    self.stream.writeln("%s" % err)
  File "C:\Python32\lib\unittest\runner.py", line 25, in writeln
    self.write(arg)
UnicodeEncodeError: 'cp932' codec can't encode character '\x93' in position 192: illegal multibyte sequence
2 tests failed:
    test_strptime test_time
msg123618 - Author: Brian Curtin (brian.curtin) (Python committer) Date: 2010-12-08 15:40
I don't see this on a US/English version of Windows 7 with 3.2b1 installed. cp932 is the default on a Japanese version, correct?

(I'm not very good with all of this encoding stuff so I don't know how much help I can be)
msg123623 - Author: Hirokazu Yamamoto (ocean-city) (Python committer) Date: 2010-12-08 17:46
I think this is a locale problem. With the "C" locale on Windows,
wcsftime doesn't return UTF-16 when the result contains non-ASCII
characters.

It behaves just like this:
char cbuf[] = "...."; /* contains non-ASCII chars in MBCS */
wchar_t wbuf[sizeof(cbuf)];
for (size_t i = 0; i < sizeof(cbuf); ++i)
    wbuf[i] = (unsigned char)cbuf[i];
/* Each byte is just copied. A non-ASCII char in MBCS uses two bytes
   and should occupy one wchar_t in UTF-16, but here it occupies
   two wchar_t (a strange encoding). */

In Japanese, wcsftime returns the non-ASCII characters of the
time zone name in this strange encoding. Python converts this
with

#ifdef HAVE_WCSFTIME
            ret = PyUnicode_FromWideChar(outbuf, buflen);
#else

so the Unicode object will contain data in this strange encoding.
This is the cause of the problem.

I investigated locales a little, and I learned that the C
standard does not guarantee that wchar_t is always UTF-16.
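
To illustrate the byte-by-byte widening described above, here is a minimal Python sketch. It is not the CRT's or CPython's code: it only assumes that widening each byte to its own character is equivalent to decoding the bytes as latin-1, and it reuses the cp932 byte string shown in the tracebacks of this report. The variable names are made up for the illustration.

```python
# cp932-encoded time zone name, copied from the traceback in this report.
mbcs_bytes = b'\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'

# Widening every byte to its own character (what the loop above does)
# gives the same result as decoding the bytes as latin-1.
widened = "".join(chr(b) for b in mbcs_bytes)
assert widened == mbcs_bytes.decode("latin-1")

# Decoding with the real code page gives the intended text instead.
expected = mbcs_bytes.decode("cp932")

# As long as nothing else has touched the string, the widening is
# reversible: go back to bytes via latin-1, then decode as cp932.
repaired = widened.encode("latin-1").decode("cp932")
assert repaired == expected

# The widened string has one character per byte, so it is longer.
print(repr(widened), len(widened), len(expected))
```

The widened string is what ends up in the str object here, which is why strptime() cannot match it against time.tzname.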
msg123624 - Author: Hirokazu Yamamoto (ocean-city) (Python committer) Date: 2010-12-08 17:57
I'll attach a workaround. I previously confirmed that it works on
VS8, but I don't have VS8 now. I hope it still works.
msg123625 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2010-12-08 17:58
> ValueError: time data '2010 14 58 01 3 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %H %M %S %w %j %Z'

This looks like valid cp932 data to me
>>> b'2010 14 58 01 3 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'.decode('cp932')
'2010 14 58 01 3 342 東京 (標準時)'

Please help me with Japanese, but I think the above means Tokyo timezone.  However, strftime should have produced decoded unicode strings, not raw cp932 in a str.  What does time.strftime('%Z') return on your system?
msg123626 - Author: Hirokazu Yamamoto (ocean-city) (Python committer) Date: 2010-12-08 18:12
Here you are.

>>> import time
>>> time.strftime('%Z')
'\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'
msg123628 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2010-12-08 18:18
On Wed, Dec 8, 2010 at 1:12 PM, Hirokazu Yamamoto
<report@bugs.python.org> wrote:
..
>>>> import time
>>>> time.strftime('%Z')
> '\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'

Thanks.  Please bear with me for one more question:  what is

>>> time.tzname

?
msg123631 - Author: Hirokazu Yamamoto (ocean-city) (Python committer) Date: 2010-12-08 18:50
I got a readable result. ;-)

>>> import time
>>> time.tzname
('東京 (標準時)', '東京 (標準時)')
msg123639 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2010-12-08 19:45
On Wed, Dec 8, 2010 at 1:50 PM, Hirokazu Yamamoto
<report@bugs.python.org> wrote:
..
> I got readable result. ;-)
>
You mean readable to *you*. :-)

>>>> import time
>>>> time.tzname
> ('東京 (標準時)', '東京 (標準時)')

This makes sense now.   There are two issues here:

1.  Decoding the output of wcsftime().  Python expects mbcs (which I
believe is a UTF-16-like wide char encoding) while Windows apparently
puts cp932 there in your locale.  I don't have the expertise to address
this issue.

2. strptime() cannot parse strftime() output when strftime('%Z') is
different from time.tzname[dst].  This issue we can address.  Note
that for most of the locale information such as day of the week or
month names, strptime() relies on strftime() output, so the
round-tripping should work even when strftime() results are
nonsensical.  On the other hand, tz spellings are taken from
time.tzname.    I think we can make strptime() more robust by adding
[time.strftime('%Z', (2000,1,1,0,0,0,0,0,dst)) for dst in (0,1)] to the
list of recognized tz names if they differ from time.tzname.
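
The proposal in point 2 could be sketched roughly as follows. This is only an illustration, not the code in _strptime: the helper name recognized_tz_names is hypothetical, and lower-casing the names and grouping them by DST flag are assumptions made for the example.

```python
import time

def recognized_tz_names():
    """Hypothetical helper: collect, for dst = 0 and dst = 1, both the
    spelling in time.tzname and whatever strftime('%Z') produces."""
    names = []
    for dst in (0, 1):
        group = {time.tzname[dst].lower()}
        # 2000-01-01 was a Saturday (tm_wday=5) and day 1 of the year.
        formatted = time.strftime("%Z", (2000, 1, 1, 0, 0, 0, 5, 1, dst))
        if formatted:
            # Adding to a set is a no-op when both spellings agree.
            group.add(formatted.lower())
        names.append(frozenset(group))
    return names

for dst, group in enumerate(recognized_tz_names()):
    print(dst, sorted(group))
```

With both spellings recognized, a string produced by strftime('%Z') would round-trip through strptime() even when it differs from time.tzname, which is the robustness gain described above.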
msg123676 - Author: Hirokazu Yamamoto (ocean-city) (Python committer) Date: 2010-12-09 10:54
> 1.  Decoding the output of wcsftime().  Python expects mbcs (which
> I believe is a UTF-16-like wide char encoding) while Windows
> apparently puts cp932 there in your locale.  I don't have the expertise
> to address this issue.

No, mbcs is not a wide character set (wchar_t*) but the ANSI character
set (char*). In my environment, mbcs == cp932. And Python expects UTF-16.

> 2. strptime() cannot parse strftime() output when strftime('%Z') is
> different from time.tzname[dst]. (snip)

I attached a test program to check how strftime and wcsftime behave
depending on the locale. On VC6, strftime's output doesn't depend on
the locale, whereas wcsftime's output changes with the locale. (I tested
only the "C" locale and the "System" locale because I could not find
other locales that work in my environment.)

If strftime doesn't depend on the locale and matches tzname
for every locale, maybe strftime is preferable on Windows.

# Can somebody test this on VS9? And other locales?
msg144419 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-09-22 21:09
See also issue #13029.
msg145490 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-14 00:38
New changeset e3d9c5e690fc by Victor Stinner in branch '3.2':
Issue #10653: On Windows, use strftime() instead of wcsftime() because
http://hg.python.org/cpython/rev/e3d9c5e690fc

New changeset 79e60977fc04 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #10653: On Windows, use strftime() instead of wcsftime()
http://hg.python.org/cpython/rev/79e60977fc04
msg145492 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-10-14 00:41
It's a bug in the Windows API: I used the workaround suggested by Hirokazu Yamamoto. Thanks Hirokazu!

Python 2.7 doesn't use wcsftime() and so it is not affected by this issue.
msg145596 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-10-15 15:42
Crashes on the Windows buildbots:

f:\dd\vctools\crt_bld\self_x86\crt\src\strftime.c(832) : Assertion failed: ( "Invalid format directive" , 0 )
f:\dd\vctools\crt_bld\self_x86\crt\src\strftime.c(484) : Assertion failed: FALSE
msg145628 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-16 17:07
New changeset e3c13a1d2595 by Victor Stinner in branch 'default':
Issue #10653: Fix time.strftime() on Windows, check for invalid format strings
http://hg.python.org/cpython/rev/e3c13a1d2595
msg145629 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-10-16 17:09
> Crashes on the Windows buildbots:

Oops, it should be fixed by my last commits.
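
For readers unfamiliar with the assertion above: the CRT's strftime() aborts on directives it does not know, so the format string has to be validated before the call. The sketch below shows the general idea in Python. It is not the committed C code; the function name check_format and the set ASSUMED_DIRECTIVES are assumptions for illustration, and the set of directives a given CRT accepts may differ.

```python
# Assumption: directives accepted by the C runtime in question.
ASSUMED_DIRECTIVES = set("aAbBcdHIjmMpSUwWxXyYzZ%")

def check_format(fmt):
    """Hypothetical validator: raise ValueError if a '%' is not followed
    by one of the assumed directive characters."""
    i = 0
    while i < len(fmt):
        if fmt[i] == "%":
            # A trailing '%' or an unknown directive is rejected up front.
            if i + 1 >= len(fmt) or fmt[i + 1] not in ASSUMED_DIRECTIVES:
                raise ValueError("Invalid format string")
            i += 2  # skip the directive character as well
        else:
            i += 1
    return fmt

print(check_format("%Y-%m-%d %H:%M:%S %Z"))
```

Rejecting the string in the caller turns a process-level assertion failure into an ordinary exception.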
msg145647 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-16 21:45
New changeset 977c5753ca32 by Victor Stinner in branch '3.2':
Issue #10653: Fix time.strftime() on Windows, check for invalid format strings
http://hg.python.org/cpython/rev/977c5753ca32
msg243660 - Author: Eryk Sun (eryksun) (Python triager) Date: 2015-05-20 13:30
This solution no longer works. If the system is configured to use the Japanese system locale and language pack, then 3.4.3 returns codepage 932 mojibake for the "%Z" time zone name. Originally [this approach worked][1] because it called PyUnicode_Decode using the 'mbcs' encoding.
Currently it calls PyUnicode_DecodeLocaleAndSize, which just ends up calling mbstowcs. That's pretty much what wcsftime does. In the default C locale, mbstowcs casts the byte values to wchar_t:

    >>> time.strftime('%Z')
    '\x91\xbe\x95\xbd\x97m\x89\xc4\x8e\x9e\x8a\xd4'
    >>> time.strftime('%Z').encode('latin-1').decode('932')
    '太平洋夏時間'

The problem is worse for 3.5 built with VC++ 14. In the new CRT strftime decodes the format string via MultiByteToWideChar, calls _Wcsftime_l, and encodes the result back via WideCharToMultiByte. The outer conversions use the default LC_TIME codepage, which is ANSI (ACP), so they're not the problem. The problem is the internal _mbstowcs_s_l conversion of the ANSI time zone name, which creates the above-shown mojibake 'unicode' string. This is then compounded by calling WideCharToMultiByte on the result:

    >>> time.strftime('%Z')
    '?????m?A???O'

There's no way to fix this by transcoding. The result is just garbage.

[1]: https://hg.python.org/cpython/file/79e60977fc04/Modules/timemodule.c#l501
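
The point that the 3.5 result cannot be repaired by transcoding can be approximated in Python. This is a sketch under stated assumptions: decoding as latin-1 stands in for the byte-widening conversion, and Python's cp932 codec with errors="replace" stands in for the final WideCharToMultiByte call. Python's codec does not appear to substitute look-alike letters the way the output in the message does ('A', 'O'), so the exact bytes will differ from the ones shown above.

```python
# ANSI (cp932) bytes of the time zone name, taken from the message above.
ansi = b'\x91\xbe\x95\xbd\x97m\x89\xc4\x8e\x9e\x8a\xd4'

# Step 1: byte-by-byte widening, modelled here as a latin-1 decode.
mojibake = ansi.decode("latin-1")

# This stage is still reversible, as the 3.4.3 example shows.
assert mojibake.encode("latin-1") == ansi

# Step 2: encode the mojibake back to the ANSI code page. Characters
# that cp932 cannot represent are replaced, so information is lost.
lossy = mojibake.encode("cp932", errors="replace")

print(lossy.count(b"?"), "replacement bytes;",
      "recoverable" if lossy == ansi else "not recoverable")
```

Once distinct characters have collapsed into the same replacement byte, no decode step can tell them apart again.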
History
Date User Action Args
2015-05-20 13:30:40 eryksun set nosy: + eryksun; messages: + msg243660; versions: + Python 3.4, Python 3.5
2011-10-16 21:45:14 python-dev set messages: + msg145647
2011-10-16 20:07:05 vstinner set status: open -> closed
2011-10-16 17:09:11 vstinner set messages: + msg145629
2011-10-16 17:07:37 python-dev set messages: + msg145628
2011-10-15 15:42:24 pitrou set status: closed -> open; nosy: + pitrou; messages: + msg145596; assignee: vstinner
2011-10-14 00:41:19 vstinner set status: open -> closed; resolution: fixed; messages: + msg145492; versions: + Python 3.3
2011-10-14 00:38:22 python-dev set nosy: + python-dev; messages: + msg145490
2011-09-22 21:09:42 vstinner set nosy: + vstinner; messages: + msg144419
2010-12-09 10:54:38 ocean-city set files: + main.c; messages: + msg123676
2010-12-08 19:45:36 belopolsky set messages: + msg123639
2010-12-08 18:50:07 ocean-city set messages: + msg123631
2010-12-08 18:18:54 belopolsky set messages: + msg123628
2010-12-08 18:12:54 ocean-city set messages: + msg123626
2010-12-08 17:58:10 belopolsky set messages: + msg123625
2010-12-08 17:57:08 ocean-city set files: + py3k_workaround_for_wcsftime.patch; keywords: + patch; messages: + msg123624
2010-12-08 17:46:24 ocean-city set messages: + msg123623
2010-12-08 17:33:40 r.david.murray set nosy: + belopolsky
2010-12-08 15:40:53 brian.curtin set nosy: + brian.curtin; messages: + msg123618
2010-12-08 15:01:27 ocean-city create