classification
Title: [Windows] time: crash on formatting time with de_DE locale
Type: Stage:
Components: Interpreter Core, Windows Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Carsten Fuchs, CharlieClark, Dominik Geldmacher, Manjusaka, eryksun, jeremy.kloth, jkloth, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2019-05-04 12:33 by CharlieClark, last changed 2019-08-27 06:20 by Carsten Fuchs.

Files
File name Uploaded Description Edit
Report.wer CharlieClark, 2019-05-06 14:04
Report.wer CharlieClark, 2019-05-07 08:39 Crash report
WER9DB9.tmp.WERInternalMetadata.xml CharlieClark, 2019-05-07 08:40 Crash metadate file
issue36792.zip jkloth, 2019-05-07 17:12
issue36792-2.zip jkloth, 2019-05-10 02:31
Messages (36)
msg341380 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-04 12:33
Based on a bug report (https://bitbucket.org/openpyxl/openpyxl/issues/1266/locale) from a user of the openpyxl library I've identified a bug in the zipfile module that causes the Python process to crash on Windows. Currently tested with Python 3.7.3 (32-bit on Windows 10).

Sample code

import faulthandler
import locale
from zipfile import ZipFile

faulthandler.enable()

locale.setlocale(locale.LC_ALL, 'de_DE')
out = open("out.zip", "wb") 
archive = ZipFile(out, "w")
archive.writestr("properties.xml", b"<workbookPr/>")

faulthandler fingers line 1757 as the culprit but running this line locally does not cause the crash. The issue seems to be limited to Windows.
msg341436 - (view) Author: Manjusaka (Manjusaka) * Date: 2019-05-05 12:45
I can't reproduce this error on my system by using the same code.

Could you share the system info with us?
msg341437 - (view) Author: Manjusaka (Manjusaka) * Date: 2019-05-05 12:48
Or would you share the Exception with us ?

I guess it's caused by system setting
msg341438 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-05 12:50
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Intel)] on win32
msg341442 - (view) Author: Manjusaka (Manjusaka) * Date: 2019-05-05 13:13
Same environment.

But I still can not reproduce this exception. I guess maybe it's about the local time or timezone problem. I will find a way to figure it out
msg341444 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-05 13:23
That's what we thought when we looked at it, but as I said, I couldn't reproduce it with just the `time` call or the `ZInfo` instantiation, so something odd is happening. I do have a German version of Windows as I suspect the original reporter does. You'd think that sort of thing wouldn't matter…
msg341456 - (view) Author: Dominik Geldmacher (Dominik Geldmacher) Date: 2019-05-05 16:06
I can reproduce it on Python 3.7.3 german Windows10 enterprise

Windows fatal exception: code 0xc0000374

Current thread 0x00003bc8 (most recent call first):
  File "C:\Python37\lib\zipfile.py", line 1757 in writestr
msg341457 - (view) Author: Manjusaka (Manjusaka) * Date: 2019-05-05 16:38
Hi  Dominik Geldmacher

May I get your system version? 1809 or 1903?

I guess maybe it's microsoft error
msg341462 - (view) Author: Dominik Geldmacher (Dominik Geldmacher) Date: 2019-05-05 17:51
Enterprise 1809 and Professional 1803 (german localized) both reproduce the issue
msg341463 - (view) Author: Manjusaka (Manjusaka) * Date: 2019-05-05 17:56
copy that

I will reset my locale setting to figure it out
msg341470 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-05 18:45
winver tells me I have 1809. I'm only using Windows in a VM so I'm not that familiar with its innards.

Also get the error with WinPython 3.6:

Windows fatal exception: code 0xc0000374

Current thread 0x000010c0 (most recent call first):
  File "C:\Users\charlieclark\WPy64-3680\python-3.6.8.amd64\lib\zipfile.py", line 1658 in writestr
  File "<stdin>", line 1 in <module>
msg341473 - (view) Author: Jeremy Kloth (jkloth) * Date: 2019-05-05 19:22
Related to issue bpo-36319
msg341503 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-06 14:04
I can confirm the error is caused by time.localtime(time.time()) as indicated by the related bug.

I've also found the crash log. It's in C:\ProgramData\Microsoft\Windows\WER\ReportQueue on my machine. Well, at least that's all I can find.
msg341526 - (view) Author: Jeremy Kloth (jkloth) * Date: 2019-05-06 15:22
So does that mean that simply doing:

import locale, time
locale.setlocale(locale.LC_ALL, 'de_DE')
time.localtime(time.time())

is enough to trigger the heap corruption?  If yes, then what is the output of:

import locale, time
locale.setlocale(locale.LC_ALL, 'de_DE')
time.strftime('%Z')
msg341529 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-06 15:28
import time, locale
locale.setlocale(locale.LC_ALL, 'de_DE')
'de_DE'
time.strftime("%Z")
aborted (disconnected)
msg341551 - (view) Author: Jeremy Kloth (jkloth) * Date: 2019-05-06 15:57
Ok, now let's try it using the C runtime directly:

import ctypes, struct
libc = ctypes.cdll.msvcrt
buf = ctypes.create_string_buffer(1024)
tm = struct.pack('9i', 2019, 5, 6, 9, 50, 4, 0, 126, 1)
print('count:', libc.strftime(buf, 1024, b'%Z', tm))
print('value:', buf.value)
msg341552 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-06 15:58
import ctypes, struct
libc = ctypes.cdll.msvcrt
buf = ctypes.create_string_buffer(1024)
tm = struct.pack('9i', 2019, 5, 6, 9, 50, 4, 0, 126, 1)
print('count:', libc.strftime(buf, 1024, b'%Z', tm))
print('value:', buf.value)

count: 28
value: b'Mitteleurop\xe4ische Sommerzeit'
msg341561 - (view) Author: Jeremy Kloth (jkloth) * Date: 2019-05-06 16:35
Oops, I forgot to add in my snippet, the setlocale() call prior to calling the C strftime() function.  So an updated test:

import locale
locale.setlocale(locale.LC_ALL, 'de_DE')
import ctypes, struct
libc = ctypes.cdll.msvcrt
buf = ctypes.create_string_buffer(1024)
tm = struct.pack('9i', 2019, 5, 6, 9, 50, 4, 0, 126, 1)
print('count:', libc.strftime(buf, 1024, b'%Z', tm))
print('value:', buf.value)
wbuf = ctypes.create_unicode_buffer(1024)
print('count:', libc.wcsftime(wbuf, 1024, '%Z', tm))
print('value:', wbuf.value)
print('count:', libc.mbstowcs(wbuf, buf, 1024))
print('value:', wbuf.value)
msg341562 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-06 16:45
import ctypes, struct
libc = ctypes.cdll.msvcrt
buf = ctypes.create_string_buffer(1024)
tm = struct.pack('9i', 2019, 5, 6, 9, 50, 4, 0, 126, 1)
print('count:', libc.strftime(buf, 1024, b'%Z', tm))
print('value:', buf.value)
wbuf = ctypes.create_unicode_buffer(1024)
print('count:', libc.wcsftime(wbuf, 1024, '%Z', tm))
print('value:', wbuf.value)
print('count:', libc.mbstowcs(wbuf, buf, 1024))
print('value:', wbuf.value)

count: 28
value: b'Mitteleurop\xe4ische Sommerzeit'
count: 28
value: Mitteleuropäische Sommerzeit
count: 28
value: Mitteleuropäische Sommerzeit
msg341636 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-05-06 20:09
> libc = ctypes.cdll.msvcrt

That's the private CRT of Windows, not the Universal CRT for applications. In a release build (python.exe), use ctypes.CDLL('ucrtbase', use_errno=True). In a debug build (python_d.exe), use ctypes.CDLL('ucrtbased', use_errno=True).

I suppose we should use API sets [1] for the release build, such as "api-ms-win-crt-locale-l1-1-0" and 
"api-ms-win-crt-time-l1-1-0". But they resolve to "ucrtbase". 

[1]: https://docs.microsoft.com/en-us/uwp/win32-and-com/win32-extension-apis
msg341669 - (view) Author: Jeremy Kloth (jkloth) * Date: 2019-05-07 02:22
Thanks for the reminder Eryk Sun.  This means the test needs to be run yet one more time :)

import ctypes, locale, struct
crt_time = ctypes.CDLL('api-ms-win-crt-time-l1-1-0', use_errno=True)
locale.setlocale(locale.LC_ALL, 'de_DE')
buf = ctypes.create_string_buffer(1024)
tm = struct.pack('9i', 2019, 5, 6, 9, 50, 4, 0, 126, 1)
print('count:', crt_time.strftime(buf, 1024, b'%Z', tm))
print('value:', buf.value)
wbuf = ctypes.create_unicode_buffer(1024)
print('count:', crt_time.wcsftime(wbuf, 1024, '%Z', tm))
print('value:', wbuf.value)
crt_convert = ctypes.CDLL('api-ms-win-crt-convert-l1-1-0', use_errno=True)
print('count:', crt_convert.mbstowcs(wbuf, buf, 1024))
print('value:', wbuf.value)
msg341698 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-07 08:39
The code crashes on this line:
print('count:', crt_time.strftime(buf, 1024, b'%Z', tm))
msg341754 - (view) Author: Jeremy Kloth (jkloth) * Date: 2019-05-07 15:39
Thanks for your patience with this Charlie, but please try another run this time without the strftime() and mbstowcs() calls.  Honest, we are getting closer!
msg341755 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-07 15:42
If the process crashes at the first print statement, I'm not sure how I can run the tests. Or should I try them separately?
msg341760 - (view) Author: Jeremy Kloth (jkloth) * Date: 2019-05-07 15:57
You can safely execute each line individually (omitting the aforementioned count/value pairs) or depending on how the copy/paste is being done, just paste the script into a text editor (notepad) and comment out those lines.  Then copy-paste that modified script.

This time I'm really attempting to see if the internal encoding/decoding done in the CRT strftime() is failing somehow (it calls wcsftime).
msg341762 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-07 16:02
print('count:', crt_time.wcsftime(wbuf, 1024, '%Z', tm)) also fails
but
crt_convert = ctypes.CDLL('api-ms-win-crt-convert-l1-1-0', use_errno=True)
print('count:', crt_convert.mbstowcs(wbuf, buf, 1024))
seems to work okay.
msg341773 - (view) Author: Jeremy Kloth (jkloth) * Date: 2019-05-07 16:25
Here is another test, this time removing Python from the equation (mostly :)

import ctypes, struct
crt_locale = ctypes.CDLL('api-ms-win-crt-locale-l1-1-0', use_errno=True)
crt_time = ctypes.CDLL('api-ms-win-crt-time-l1-1-0', use_errno=True)
crt_locale._wsetlocale.restype = ctypes.c_wchar_p
print('old locale:', crt_locale._wsetlocale(0, 0))
tm = struct.pack('9i', 2019, 5, 6, 9, 50, 4, 0, 126, 1)
wbuf = ctypes.create_unicode_buffer(1024)
print('count:', crt_time.wcsftime(wbuf, 1024, '%Z', tm))
print('value:', wbuf.value)
print('new locale:', crt_locale._wsetlocale(0, 'de_DE'))
print('count:', crt_time.wcsftime(wbuf, 1024, '%Z', tm))
print('value:', wbuf.value)
msg341779 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-07 17:05
And this is the result.

old locale: C
count: 28
value: Mitteleuropäische Sommerzeit
new locale: de_DE
count: -1
value:
Windows fatal exception: code 0xc0000374

Looks like

print('new locale:', crt_locale._wsetlocale(0, 'de_DE'))
print('count:', crt_time.wcsftime(wbuf, 1024, '%Z', tm))

is the culprit.
msg341781 - (view) Author: Jeremy Kloth (jkloth) * Date: 2019-05-07 17:12
Final test, this time, no Python what so ever.  I've added a zip containing a simple C program (source and .exe) that performs the same test.

The output should be similar to:

--------
The current locale is now: C
The time zone is: 'Mountain Daylight Time' (22 characters)
--------
The updated locale is now: de_DE
The time zone is: 'Mountain Daylight Time' (22 characters)
--------

If this crashes, the problem is then within the CRT itself :(
msg341784 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-07 17:18
This is the result
\issue36792>test.exe
--------
The current locale is now: C
The time zone is: 'Mitteleuropõische Sommerzeit' (28 characters)
--------
The updated locale is now: de_DE
The time zone is: '' (-1 characters)
--------
NB something is wrong with that string. Should be Mitteleuropäsiche Sommerzeit but that could just be an encoding thing.
msg341790 - (view) Author: Jeremy Kloth (jkloth) * Date: 2019-05-07 17:42
Thanks again!  I will have some more tests for you to try tomorrow as I am out of time for today.

I'm currently of the belief that there is something Python is going to have to do to work around an issue within the CRT, but more testing will prove that theory.
msg342045 - (view) Author: Jeremy Kloth (jkloth) * Date: 2019-05-10 02:31
I've added another test executable (issue36792-2.zip) which should bring some insight into where things are going wrong.  Please run and post the results.
msg342232 - (view) Author: Jeremy Kloth (jkloth) * Date: 2019-05-12 01:02
I have managed to setup a VM that can reproduce the error.  Unfortunately, the error (heap corruption) is coming from within the UCRT.  Attempting to work around that, I came across another error in the UCRT.

Due to these errors in all available UCRT versions, the only available solution is to re-implement %Z handling in Python's implementation of strftime().

@steve.dower: If you could forward the following information to the interested parties for faster response, it would be great.

Everyone else, what follows contains UCRT internal details which is of no interest to Python.

In wcsftime() [ucrt\time\wcsftime.cpp, line 1018], the call to _mbstowcs_s_l() doesn't handle the error case of EILSEQ.  This error happens when the tzname (originally set by __tzset() from the Unicode name in the registry encoded to the ACP) contains an undecodable character.  In the 'de_DE' case, \u00E4.  _mbstowcs_s_l returns without changing `wnum` (which is 0).  The 'Z' case then blindly subtracts 1 (wnum=-1) and advances the string pointer, now pointing *before* the start when the format is just L"%Z" (like Python has).  Now when wcsftime() is returning it stores a '\0' at that current position thus causing a heap corruption.

Ok, no deal breaker, we can just update the _tzname array when we call setlocale().  Alas, another error :(  If _tzset() is called a second time, it fails when encoding the Unicode name due to an uninitialized variable.  In tzset_from_system_nolock() [ucrt\time\tzset.cpp, line 158], the function __acrt_WideCharToMultiByte() [appcrt\locale\widechartomultibyte.cpp] does not set the `lpUsedDefaultChar` flag when FALSE.  This is only an issue, it seems, on subsequent calls as `used_default_char` can be non-zero prior to the call.  A non-zero value there, causes the tzname to be set to the empty string even if the conversion succeeded.
msg342923 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-05-20 15:51
I've received a detailed response from the UCRT team, and there are a few pieces here.

* the fact that tzname is cached in ACP is known and will be fixed
* the decoding bug is real, but it's due to the experimental UTF-8 support
* the experimental UTF-8 support was enabled because "de_DE" is not a known Windows locale name - try with "de-DE"

Perhaps it would be easy to do the replacement of underscores with hyphens on Windows in this function? I think that's safe enough, yes?
msg342924 - (view) Author: Charlie Clark (CharlieClark) Date: 2019-05-20 16:02
I can confirm that using "de-DE" does indeed avoid the crash.
msg342966 - (view) Author: Jeremy Kloth (jeremy.kloth) Date: 2019-05-21 00:19
> * the experimental UTF-8 support was enabled because "de_DE" is not a
> known Windows locale name - try with "de-DE"
>
> Perhaps it would be easy to do the replacement of underscores with hyphens
> on Windows in this function? I think that's safe enough, yes?
>

Even some well known locale names still use the utf-8 code page.  Most seem
to uncommon, but at least es-BR (Brazil) does and would still fall victim
to these UCRT bugs.

>
History
Date User Action Args
2019-08-27 06:20:23Carsten Fuchssetnosy: + Carsten Fuchs
2019-05-21 00:19:52jeremy.klothsetnosy: + jeremy.kloth
messages: + msg342966
2019-05-20 16:02:58CharlieClarksetmessages: + msg342924
2019-05-20 15:51:44steve.dowersetmessages: + msg342923
2019-05-12 01:02:14jklothsetmessages: + msg342232
2019-05-10 02:31:46jklothsetfiles: + issue36792-2.zip

messages: + msg342045
2019-05-07 17:42:57jklothsetmessages: + msg341790
2019-05-07 17:18:15CharlieClarksetmessages: + msg341784
2019-05-07 17:12:45jklothsetfiles: + issue36792.zip

messages: + msg341781
2019-05-07 17:05:57CharlieClarksetmessages: + msg341779
2019-05-07 16:25:08jklothsetmessages: + msg341773
2019-05-07 16:02:37CharlieClarksetmessages: + msg341762
2019-05-07 15:57:10jklothsetmessages: + msg341760
2019-05-07 15:42:24CharlieClarksetmessages: + msg341755
2019-05-07 15:39:51jklothsetmessages: + msg341754
2019-05-07 08:40:03CharlieClarksetfiles: + WER9DB9.tmp.WERInternalMetadata.xml
2019-05-07 08:39:39CharlieClarksetfiles: + Report.wer

messages: + msg341698
2019-05-07 02:22:53jklothsetmessages: + msg341669
2019-05-06 20:09:43eryksunsetnosy: + eryksun
messages: + msg341636
2019-05-06 16:45:10CharlieClarksetmessages: + msg341562
2019-05-06 16:35:05jklothsetmessages: + msg341561
2019-05-06 15:58:37CharlieClarksetmessages: + msg341552
2019-05-06 15:57:30jklothsetmessages: + msg341551
2019-05-06 15:43:13vstinnersettitle: zipfile.writestr causes a Python crash on Windows if the locale is set -> [Windows] time: crash on formatting time with de_DE locale
2019-05-06 15:28:55CharlieClarksetmessages: + msg341529
2019-05-06 15:22:53jklothsetmessages: + msg341526
2019-05-06 14:04:08CharlieClarksetfiles: + Report.wer

messages: + msg341503
2019-05-05 19:22:17jklothsetnosy: + jkloth
messages: + msg341473
2019-05-05 18:45:56CharlieClarksetmessages: + msg341470
2019-05-05 17:56:09Manjusakasetmessages: + msg341463
2019-05-05 17:51:12Dominik Geldmachersetmessages: + msg341462
2019-05-05 16:38:12Manjusakasetmessages: + msg341457
2019-05-05 16:06:38Dominik Geldmachersetnosy: + Dominik Geldmacher
messages: + msg341456
2019-05-05 13:23:38CharlieClarksetmessages: + msg341444
2019-05-05 13:13:59Manjusakasetmessages: + msg341442
2019-05-05 12:50:57CharlieClarksetmessages: + msg341438
2019-05-05 12:48:52Manjusakasetmessages: + msg341437
2019-05-05 12:45:39Manjusakasetnosy: + Manjusaka
messages: + msg341436
2019-05-04 13:25:05serhiy.storchakasetnosy: + vstinner
2019-05-04 12:33:18CharlieClarkcreate