
test_time test_strptime fails on windows #54862

Closed
ocean-city mannequin opened this issue Dec 8, 2010 · 22 comments

@ocean-city
Mannequin

ocean-city mannequin commented Dec 8, 2010

BPO 10653
Nosy @abalkin, @pitrou, @vstinner, @jkloth, @briancurtin, @eryksun
Files
  • py3k_workaround_for_wcsftime.patch
  • main.c: test code

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/vstinner'
    closed_at = <Date 2011-10-16.20:07:05.917>
    created_at = <Date 2010-12-08.15:01:27.776>
    labels = ['OS-windows']
    title = 'test_time test_strptime fails on windows'
    updated_at = <Date 2021-03-08.19:52:06.863>
    user = 'https://bugs.python.org/ocean-city'

    bugs.python.org fields:

    activity = <Date 2021-03-08.19:52:06.863>
    actor = 'eryksun'
    assignee = 'vstinner'
    closed = True
    closed_date = <Date 2011-10-16.20:07:05.917>
    closer = 'vstinner'
    components = ['Windows']
    creation = <Date 2010-12-08.15:01:27.776>
    creator = 'ocean-city'
    dependencies = []
    files = ['19978', '19986']
    hgrepos = []
    issue_num = 10653
    keywords = ['patch']
    message_count = 22.0
    messages = ['123612', '123618', '123623', '123624', '123625', '123626', '123628', '123631', '123639', '123676', '144419', '145490', '145492', '145596', '145628', '145629', '145647', '243660', '388237', '388238', '388279', '388292']
    nosy_count = 8.0
    nosy_names = ['belopolsky', 'pitrou', 'vstinner', 'ocean-city', 'jkloth', 'brian.curtin', 'python-dev', 'eryksun']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue10653'
    versions = ['Python 3.2', 'Python 3.3', 'Python 3.4', 'Python 3.5']

    @ocean-city
    Mannequin Author

    ocean-city mannequin commented Dec 8, 2010

    The following tests fail with the official Python 3.2 Windows binary.

    I cannot reproduce this on VC6.

    /////////////////////////////////////////////////////

    C:\Python32>.\python -m test.regrtest -v test_time test_strptime

    [1/2] test_time
    test_asctime (test.test_time.TimeTestCase) ... ok
    test_asctime_bounding_check (test.test_time.TimeTestCase) ... ok
    test_clock (test.test_time.TimeTestCase) ... ok
    test_conversions (test.test_time.TimeTestCase) ... ok
    test_ctime_without_arg (test.test_time.TimeTestCase) ... ok
    test_data_attributes (test.test_time.TimeTestCase) ... ok
    test_default_values_for_zero (test.test_time.TimeTestCase) ... ok
    test_gmtime_without_arg (test.test_time.TimeTestCase) ... ok
    test_insane_timestamps (test.test_time.TimeTestCase) ... ok
    test_localtime_without_arg (test.test_time.TimeTestCase) ... ok
    test_sleep (test.test_time.TimeTestCase) ... ok
    test_strftime (test.test_time.TimeTestCase) ... ok
    test_strftime_bounding_check (test.test_time.TimeTestCase) ... ok
    test_strptime (test.test_time.TimeTestCase) ... FAIL
    test_strptime_bytes (test.test_time.TimeTestCase) ... ok
    test_tzset (test.test_time.TimeTestCase) ... ok
    test_bug_3061 (test.test_time.TestLocale) ... ok

    ======================================================================
    FAIL: test_strptime (test.test_time.TimeTestCase)
    ----------------------------------------------------------------------

    test test_time crashed -- <class 'UnicodeEncodeError'>: 'cp932' codec can't encode character '\x93' in position 495: illegal multibyte sequence
    Traceback (most recent call last):
      File "C:\Python32\lib\test\regrtest.py", line 960, in runtest_inner
        indirect_test()
      File "C:\Python32\lib\test\test_time.py", line 244, in test_main
        support.run_unittest(TimeTestCase, TestLocale)
      File "C:\Python32\lib\test\support.py", line 1146, in run_unittest
        _run_suite(suite)
      File "C:\Python32\lib\test\support.py", line 1120, in _run_suite
        result = runner.run(suite)
      File "C:\Python32\lib\unittest\runner.py", line 173, in run
        result.printErrors()
      File "C:\Python32\lib\unittest\runner.py", line 110, in printErrors
        self.printErrorList('FAIL', self.failures)
      File "C:\Python32\lib\unittest\runner.py", line 117, in printErrorList
        self.stream.writeln("%s" % err)
      File "C:\Python32\lib\unittest\runner.py", line 25, in writeln
        self.write(arg)
    UnicodeEncodeError: 'cp932' codec can't encode character '\x93' in position 495: illegal multibyte sequence
    [2/2] test_strptime
    test_basic (test.test_strptime.getlang_Tests) ... ok
    test_am_pm (test.test_strptime.LocaleTime_Tests) ... ok
    test_date_time (test.test_strptime.LocaleTime_Tests) ... ok
    test_lang (test.test_strptime.LocaleTime_Tests) ... ok
    test_month (test.test_strptime.LocaleTime_Tests) ... ok
    test_timezone (test.test_strptime.LocaleTime_Tests) ... FAIL
    test_weekday (test.test_strptime.LocaleTime_Tests) ... ok
    test_blankpattern (test.test_strptime.TimeRETests) ... ok
    test_compile (test.test_strptime.TimeRETests) ... FAIL
    test_locale_data_w_regex_metacharacters (test.test_strptime.TimeRETests) ... ok
    test_matching_with_escapes (test.test_strptime.TimeRETests) ... ok
    test_pattern (test.test_strptime.TimeRETests) ... ok
    test_pattern_escaping (test.test_strptime.TimeRETests) ... ok
    test_whitespace_substitution (test.test_strptime.TimeRETests) ... ok
    test_ValueError (test.test_strptime.StrptimeTests) ... ok
    test_bad_timezone (test.test_strptime.StrptimeTests) ... ok
    test_caseinsensitive (test.test_strptime.StrptimeTests) ... ok
    test_date (test.test_strptime.StrptimeTests) ... ok
    test_date_time (test.test_strptime.StrptimeTests) ... ok
    test_day (test.test_strptime.StrptimeTests) ... ok
    test_defaults (test.test_strptime.StrptimeTests) ... ok
    test_escaping (test.test_strptime.StrptimeTests) ... ok
    test_fraction (test.test_strptime.StrptimeTests) ... ok
    test_hour (test.test_strptime.StrptimeTests) ... ok
    test_julian (test.test_strptime.StrptimeTests) ... ok
    test_minute (test.test_strptime.StrptimeTests) ... ok
    test_month (test.test_strptime.StrptimeTests) ... ok
    test_percent (test.test_strptime.StrptimeTests) ... ok
    test_second (test.test_strptime.StrptimeTests) ... ok
    test_time (test.test_strptime.StrptimeTests) ... ok
    test_timezone (test.test_strptime.StrptimeTests) ... ERROR
    test_unconverteddata (test.test_strptime.StrptimeTests) ... ok
    test_weekday (test.test_strptime.StrptimeTests) ... ok
    test_year (test.test_strptime.StrptimeTests) ... ok
    test_twelve_noon_midnight (test.test_strptime.Strptime12AMPMTests) ... ok
    test_all_julian_days (test.test_strptime.JulianTests) ... ok
    test_day_of_week_calculation (test.test_strptime.CalculationTests) ... ERROR
    test_gregorian_calculation (test.test_strptime.CalculationTests) ... ERROR
    test_julian_calculation (test.test_strptime.CalculationTests) ... ERROR
    test_week_of_year_and_day_of_week_calculation (test.test_strptime.CalculationTests) ... ok
    test_TimeRE_recreation (test.test_strptime.CacheTests) ... ok
    test_new_localetime (test.test_strptime.CacheTests) ... ok
    test_regex_cleanup (test.test_strptime.CacheTests) ... ok
    test_time_re_recreation (test.test_strptime.CacheTests) ... ok

    ======================================================================
    ERROR: test_timezone (test.test_strptime.StrptimeTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "C:\Python32\lib\test\test_strptime.py", line 303, in test_timezone
        strp_output = _strptime._strptime_time(strf_output, "%Z")
      File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
        tt = _strptime(data_string, format)[0]
      File "C:\Python32\lib\_strptime.py", line 337, in _strptime
        (data_string, format))
    ValueError: time data '\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Z'

    ======================================================================
    ERROR: test_day_of_week_calculation (test.test_strptime.CalculationTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "C:\Python32\lib\test\test_strptime.py", line 437, in test_day_of_week_calculation
        format_string)
      File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
        tt = _strptime(data_string, format)[0]
      File "C:\Python32\lib\_strptime.py", line 337, in _strptime
        (data_string, format))
    ValueError: time data '2010 12 08 14 00 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %m %d %H %S %j %Z'

    ======================================================================
    ERROR: test_gregorian_calculation (test.test_strptime.CalculationTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "C:\Python32\lib\test\test_strptime.py", line 423, in test_gregorian_calculation
        format_string)
      File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
        tt = _strptime(data_string, format)[0]
      File "C:\Python32\lib\_strptime.py", line 337, in _strptime
        (data_string, format))
    ValueError: time data '2010 14 58 01 3 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %H %M %S %w %j %Z'

    ======================================================================
    ERROR: test_julian_calculation (test.test_strptime.CalculationTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "C:\Python32\lib\test\test_strptime.py", line 414, in test_julian_calculation
        format_string)
      File "C:\Python32\lib\_strptime.py", line 482, in _strptime_time
        tt = _strptime(data_string, format)[0]
      File "C:\Python32\lib\_strptime.py", line 337, in _strptime
        (data_string, format))
    ValueError: time data '2010 12 08 14 58 01 3 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %m %d %H %M %S %w %Z'

    ======================================================================
    FAIL: test_timezone (test.test_strptime.LocaleTime_Tests)
    ----------------------------------------------------------------------

    test test_strptime crashed -- <class 'UnicodeEncodeError'>: 'cp932' codec can't encode character '\x93' in position 192: illegal multibyte sequence
    Traceback (most recent call last):
      File "C:\Python32\lib\test\regrtest.py", line 960, in runtest_inner
        indirect_test()
      File "C:\Python32\lib\test\test_strptime.py", line 557, in test_main
        CacheTests
      File "C:\Python32\lib\test\support.py", line 1146, in run_unittest
        _run_suite(suite)
      File "C:\Python32\lib\test\support.py", line 1120, in _run_suite
        result = runner.run(suite)
      File "C:\Python32\lib\unittest\runner.py", line 173, in run
        result.printErrors()
      File "C:\Python32\lib\unittest\runner.py", line 110, in printErrors
        self.printErrorList('FAIL', self.failures)
      File "C:\Python32\lib\unittest\runner.py", line 117, in printErrorList
        self.stream.writeln("%s" % err)
      File "C:\Python32\lib\unittest\runner.py", line 25, in writeln
        self.write(arg)
    UnicodeEncodeError: 'cp932' codec can't encode character '\x93' in position 192: illegal multibyte sequence
    2 tests failed:
        test_strptime test_time

    @ocean-city ocean-city mannequin added the OS-windows label Dec 8, 2010
    @briancurtin
    Member

    I don't see this on a US/English version of Windows 7 with 3.2b1 installed. cp932 is the default on a Japanese version, correct?

    (I'm not very good with all of this encoding stuff so I don't know how much help I can be)

    @ocean-city
    Copy link
    Mannequin Author

    ocean-city mannequin commented Dec 8, 2010

    I think this is a locale problem. In the "C" locale on Windows,
    wcsftime doesn't return UTF-16 when the result contains non-ASCII
    characters.

    It is just like this:

    #include <stddef.h>
    #include <wchar.h>

    void widen_like_c_locale(void)
    {
        char cbuf[] = "...."; /* contains non-ASCII chars in MBCS */
        wchar_t wbuf[sizeof(cbuf)];
        for (size_t i = 0; i < sizeof(cbuf); ++i)
            wbuf[i] = (unsigned char)cbuf[i]; /* copy each byte as-is */
        /* A non-ASCII char in MBCS uses two bytes but should occupy one
           UTF-16 code unit; here it occupies two (a strange encoding). */
        (void)wbuf;
    }

    In a Japanese locale, wcsftime returns the non-ASCII characters of the
    time zone name in this strange encoding. Python converts the result
    with

    #ifdef HAVE_WCSFTIME
                ret = PyUnicode_FromWideChar(outbuf, buflen);
    #else

    so the Unicode object contains data in this strange encoding.
    This is the cause of the problem.

    I looked into locales a little and learned that the C standard does
    not guarantee that wchar_t is always UTF-16.
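To make the mechanism concrete, here is a small Python simulation, assuming the ANSI code page is cp932 as in this report. Python codecs stand in for the CRT conversions, and the helper names `utf16_code_units` and `c_locale_widen` are hypothetical. It compares the UTF-16 code units a correct wcsftime() would store with the one-wchar_t-per-byte result described above.

```python
def utf16_code_units(text):
    """UTF-16 code units a correct wcsftime() would store in wchar_t."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def c_locale_widen(text, ansi_codepage="cp932"):
    """Each ANSI byte copied into its own wchar_t, as in the C loop above."""
    return list(text.encode(ansi_codepage))


zone = "東京 (標準時)"  # the Tokyo time zone name discussed in this issue
good = utf16_code_units(zone)
bad = c_locale_widen(zone)
print(len(good), len(bad))  # 8 13

# PyUnicode_FromWideChar() maps each wchar_t to one code point, so the
# widened buffer becomes the mojibake returned by time.strftime('%Z').
print(repr("".join(map(chr, bad))))
```

Each of the six kanji takes two wchar_t slots instead of one, which is why the string grows from 8 to 13 code points.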

    @ocean-city
    Mannequin Author

    ocean-city mannequin commented Dec 8, 2010

    I'll attach a workaround. I previously confirmed that it works on
    VS8, but I no longer have VS8, so I hope it still works.

    @abalkin
    Member

    abalkin commented Dec 8, 2010

    ValueError: time data '2010 14 58 01 3 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)' does not match format '%Y %H %M %S %w %j %Z'

    This looks like valid cp932 data to me
    >>> b'2010 14 58 01 3 342 \x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'.decode('cp932')
    '2010 14 58 01 3 342 東京 (標準時)'

    Please help me with Japanese, but I think the above means Tokyo timezone. However, strftime should have produced decoded unicode strings, not raw cp932 in a str. What does time.strftime('%Z') return on your system?

    @ocean-city
    Mannequin Author

    ocean-city mannequin commented Dec 8, 2010

    Here you are.

    >>> import time
    >>> time.strftime('%Z')
    '\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'

    @abalkin
    Member

    abalkin commented Dec 8, 2010

    On Wed, Dec 8, 2010 at 1:12 PM, Hirokazu Yamamoto
    <report@bugs.python.org> wrote:
    ..
    >>>> import time
    >>>> time.strftime('%Z')
    > '\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'

    Thanks. Please bear with me for one more question: what is

    >> time.tzname

    ?

    @ocean-city
    Mannequin Author

    ocean-city mannequin commented Dec 8, 2010

    I got readable result. ;-)

    >>> import time
    >>> time.tzname
    ('東京 (標準時)', '東京 (標準時)')

    @abalkin
    Member

    abalkin commented Dec 8, 2010

    On Wed, Dec 8, 2010 at 1:50 PM, Hirokazu Yamamoto
    <report@bugs.python.org> wrote:
    ..

    I got readable result. ;-)

    You mean readable to *you*. :-)

    >>> import time
    >>> time.tzname
    ('東京 (標準時)', '東京 (標準時)')

    This makes sense now. There are two issues here:

    1. Decoding the output of wcsftime(). Python expects mbcs (which I
      believe is an UTF16-like wide char encoding) while Windows apparently
      puts cp932 there in your locale. I don't have expertise to address
      this issue.

    2. strptime() cannot parse strftime() output when strftime('%Z') is
      different from time.tzname[dst]. This issue we can address. Note
      that for most of the locale information such as day of the week or
      month names, strptime() relies on strftime() output, so the
      round-tripping should work even when strftime() results are
      nonsensical. On the other hand, tz spellings are taken from
      time.tzname. I think we can make strptime() more robust by adding
      [time.strftime('%Z', (2000,1,1,0,0,0,0,0,dst)) for dst in (0,1)] to the
      list of recognized tz names if they differ from time.tzname.
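A minimal sketch of the second point, using a hypothetical helper named `recognized_tz_names()`. It is not the actual `_strptime` implementation, only an illustration of merging `time.tzname` with the strftime('%Z') spellings for isdst 0 and 1, so that strptime() could accept strftime() output even when the two disagree.

```python
import time


def recognized_tz_names():
    """Hypothetical helper: lowercase %Z spellings a strptime() could accept.

    Starts from time.tzname and adds whatever time.strftime('%Z') returns
    for a fixed date with isdst = 0 and 1, as in the proposal.
    """
    names = {name.lower() for name in time.tzname}
    for dst in (0, 1):
        # time.strftime() accepts zero for fields whose minimum is 1.
        names.add(time.strftime("%Z", (2000, 1, 1, 0, 0, 0, 0, 0, dst)).lower())
    names.discard("")  # an empty spelling would match any input
    return frozenset(names)


print(sorted(recognized_tz_names()))
```

On the Japanese system in this report, the set would contain both the readable tzname spelling and the mojibake strftime() spelling, so either one would round-trip.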

    @ocean-city
    Mannequin Author

    ocean-city mannequin commented Dec 9, 2010

    1. Decoding the output of wcsftime(). Python expects mbcs (which
      I believe is an UTF16-like wide char encoding) while Windows
      apparently puts cp932 there in your locale. I don't have expertise
      to address this issue.

    No, mbcs is not a wide character set (wchar_t*) but the ANSI character
    set (char*). In my environment, mbcs == cp932. Python, on the other
    hand, expects UTF-16.

    2. strptime() cannot parse strftime() output when strftime('%Z') is
      different from time.tzname[dst]. (snip)

    I attached a test program to check how strftime and wcsftime behave
    under different locales. On VC6, strftime doesn't depend on the locale,
    whereas the value returned by wcsftime changes with the locale. (I tested
    only the "C" locale and the "System" locale because I could not find any
    other locale that works in my environment.)

    If strftime doesn't depend on the locale and equals tzname for every
    locale, strftime may be preferable on Windows.

    # Can somebody test this on VS9? And other locales?

    @vstinner
    Member

    See also issue bpo-13029.

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 14, 2011

    New changeset e3d9c5e690fc by Victor Stinner in branch '3.2':
    Issue bpo-10653: On Windows, use strftime() instead of wcsftime() because
    http://hg.python.org/cpython/rev/e3d9c5e690fc

    New changeset 79e60977fc04 by Victor Stinner in branch 'default':
    (Merge 3.2) Issue bpo-10653: On Windows, use strftime() instead of wcsftime()
    http://hg.python.org/cpython/rev/79e60977fc04

    @vstinner
    Member

    It's a bug in the Windows API: I used the workaround suggested by Hirokazu Yamamoto. Thanks Hirokazu!

    Python 2.7 doesn't use wcsftime() and so it is not affected by this issue.

    @pitrou
    Member

    pitrou commented Oct 15, 2011

    Crashes on the Windows buildbots:

    f:\dd\vctools\crt_bld\self_x86\crt\src\strftime.c(832) : Assertion failed: ( "Invalid format directive" , 0 )
    f:\dd\vctools\crt_bld\self_x86\crt\src\strftime.c(484) : Assertion failed: FALSE

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 16, 2011

    New changeset e3c13a1d2595 by Victor Stinner in branch 'default':
    Issue bpo-10653: Fix time.strftime() on Windows, check for invalid format strings
    http://hg.python.org/cpython/rev/e3c13a1d2595

    @vstinner
    Member

    Crashes on the Windows buildbots:

    Oops, it should be fixed by my last commits.

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 16, 2011

    New changeset 977c5753ca32 by Victor Stinner in branch '3.2':
    Issue bpo-10653: Fix time.strftime() on Windows, check for invalid format strings
    http://hg.python.org/cpython/rev/977c5753ca32

    @eryksun
    Contributor

    eryksun commented May 20, 2015

    This solution no longer works. If the system is configured to use the Japanese system locale and language pack, then 3.4.3 returns codepage 932 mojibake for the "%Z" time zone name. Originally this approach worked because it called PyUnicode_Decode using the 'mbcs' encoding.
    Currently it calls PyUnicode_DecodeLocaleAndSize, which just ends up calling mbstowcs. That's pretty much what wcsftime does. In the default C locale, mbstowcs casts the byte values to wchar_t:

        >>> time.strftime('%Z')
        '\x91\xbe\x95\xbd\x97m\x89\xc4\x8e\x9e\x8a\xd4'
        >>> time.strftime('%Z').encode('latin-1').decode('932')
        '太平洋夏時間'

    The problem is worse for 3.5 built with VC++ 14. In the new CRT strftime decodes the format string via MultiByteToWideChar, calls _Wcsftime_l, and encodes the result back via WideCharToMultiByte. The outer conversions use the default LC_TIME codepage, which is ANSI (ACP), so they're not the problem. The problem is the internal _mbstowcs_s_l conversion of the ANSI time zone name, which creates the above-shown mojibake 'unicode' string. This is then compounded by calling WideCharToMultiByte on the result:

        >>> time.strftime('%Z')
        '?????m?A???O'

    There's no way to fix this by transcoding. The result is just garbage.
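The double conversion can be simulated in Python to show why transcoding cannot help. This is a sketch under assumptions: cp932 is the ANSI code page, `latin-1` decoding stands in for the C-locale mbstowcs() byte cast, and Python's cp932 encoder with `errors="replace"` stands in for WideCharToMultiByte(). The helper names are hypothetical. The real Windows API may apply "best fit" mappings (plausibly the source of the 'A' and 'O' in the output above), so this sketch may show '?' in those positions instead.

```python
ANSI = "cp932"  # assumed ANSI code page (Japanese system locale)


def mbstowcs_c_locale(data):
    """C-locale mbstowcs(): each byte becomes one wchar_t of the same value."""
    return data.decode("latin-1")


def wcstombs_ansi(text):
    """WideCharToMultiByte() to the ANSI code page, '?' for unmappable chars."""
    return text.encode(ANSI, errors="replace")


tz_name = "太平洋夏時間"  # Pacific Daylight Time, in Japanese
wide = mbstowcs_c_locale(tz_name.encode(ANSI))  # one wchar_t per ANSI byte
print(repr(wide))

# 3.4: the mojibake is still recoverable by undoing the byte-to-wchar_t cast.
print(wide.encode("latin-1").decode(ANSI))

# 3.5 with ucrt: strftime() re-encodes the mojibake to ANSI and loses data.
garbage = wcstombs_ansi(wide).decode(ANSI)
print(garbage)
```

The first stage is reversible because every byte value survives as a code point; the second is not, because most of those code points have no cp932 representation and collapse to the same replacement character.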

    @eryksun
    Contributor

    eryksun commented Mar 7, 2021

    Update since msg243660:

    Python 3.8+ now calls setlocale(LC_CTYPE, "") at startup in Windows, as it has always done in POSIX, so decoding the output of strftime("%Z") with PyUnicode_DecodeLocaleAndSize() works again since both agree on using the process active code page.

    In 3.7+, per bpo-36779, time.tzname is set when the module is first loaded by directly querying GetTimeZoneInformation(). time.tzset() is still not supported, despite the fact that it was always supported by ucrt, so this value can become stale relative to strftime("%Z").

    Starting with Windows 10 v2004 (build 19041), ucrt uses an internal wide-character version of the time-zone name that gets returned by an internal __wide_tzname() call and used for "%Z" in wcsftime(). The wide-character value gets updated by _tzset() and kept in sync with _tzname.

    @eryksun
    Contributor

    eryksun commented Mar 7, 2021

    decoding the output of strftime("%Z") with PyUnicode_DecodeLocaleAndSize()
    works again since both agree on using the process active code page

    At least it works as much as it ever did. It depends on the process active code page being compatible with the preferred UI language of the current process or thread. For example if the UI language is Japanese ('ja-JP') for the current user, but the process active code page is Latin 1252 (based on the system locale), then the result will be garbage. In that case, given the time-zone name is in Japanese, both LC_TIME and LC_CTYPE have to be changed to "ja-JP" in order to correctly encode (as tzname in ucrt), decode-encode (for strftime in ucrt) and finally decode again via PyUnicode_DecodeLocaleAndSize(). If Python switched back to using wcsftime() in Windows 10 2004+, then the current locale encoding would no longer be a problem for any UI language.
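A small Python illustration of why the process code page matters. It assumes that Python codecs with '?' replacement approximate the narrow conversions; the exact bytes ucrt produces may differ, and `ansi_round_trip` is a hypothetical helper. A Japanese zone name survives the narrow round trip only when the code page can represent it.

```python
def ansi_round_trip(text, codepage):
    """Encode to a narrow code page and decode back, '?' for unmappable chars."""
    return text.encode(codepage, errors="replace").decode(codepage)


zone = "東京 (標準時)"  # Tokyo Standard Time, as shown by time.tzname earlier

print(ansi_round_trip(zone, "cp932"))   # Japanese ACP: the name survives
print(ansi_round_trip(zone, "cp1252"))  # Latin 1252 ACP: '?? (???)'
```

A wide-character path such as wcsftime() avoids this step entirely, which is the point of the last sentence above.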

    @vstinner
    Member

    vstinner commented Mar 8, 2021

    Eryk Sun: This issue is now closed. If you want to enhance the time module, please open a new issue.

    @eryksun
    Contributor

    eryksun commented Mar 8, 2021

    Eryk Sun: This issue is now closed. If you want to enhance
    the time module, please open a new issue.

    I was aware of that at the time, Victor. The problem can be worked on in a new issue, or in the older issue bpo-8304, which remains open. The two messages that I added are purely informative, to update my original comment in msg243660.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022