
classification
Title: time.strftime('%a'), ValueError: embedded null byte, in ko locale
Type: behavior Stage: resolved
Components: Library (Lib), Unicode, Windows Versions: Python 3.5
process
Status: closed Resolution: third party
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, eryksun, ezio.melotti, lemburg, loewis, paul.moore, scw, steve.dower, sy LEE, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2015-09-08 05:32 by sy LEE, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (5)
msg250157 - Author: grizlupo (sy LEE) Date: 2015-09-08 05:32
>>> import locale, time
>>> locale.setlocale(locale.LC_ALL, 'en')
'en'
>>> time.strftime('%a')
'Tue'
>>> locale.setlocale(locale.LC_ALL, 'ko')
'ko'
>>> time.strftime('%a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: embedded null byte
>>>
msg250160 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-08 06:06
What is your OS?

On Ubuntu:
>>> import locale, time
>>> locale.setlocale(locale.LC_ALL, 'ko_KR.UTF-8')
'ko_KR.UTF-8'
>>> time.strftime('%a')
'화'
>>> locale.setlocale(locale.LC_ALL, 'ko_KR.eucKR')
'ko_KR.eucKR'
>>> time.strftime('%a')
'화'
>>> locale.setlocale(locale.LC_ALL, 'ko_KR')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/locale.py", line 595, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
>>> locale.setlocale(locale.LC_ALL, 'ko')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/locale.py", line 595, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
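
Because the accepted locale names differ between platforms, as the session above shows, a script can try several spellings in turn. This is only a sketch: the helper name `set_first_supported_locale` is hypothetical, and the candidate names are simply the ones mentioned in this thread plus `C` as a fallback.

```python
import locale


def set_first_supported_locale(candidates):
    """Try each locale name in order and return the first one that is accepted.

    Hypothetical helper for illustration; returns None if every name is rejected.
    """
    for name in candidates:
        try:
            # setlocale returns the new locale string on success.
            return locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            # "unsupported locale setting" on this platform; try the next name.
            continue
    return None


# Names taken from this thread; which one succeeds depends on the platform.
chosen = set_first_supported_locale(["ko_KR.UTF-8", "ko_KR.eucKR", "kor_kor", "ko", "C"])
print(chosen)
```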
msg250176 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-09-08 09:55
It seems VC 14 has a bug here. In the new C runtime, strftime is implemented by calling wcsftime as follows:

    size_t const result = _Wcsftime_l(wstring.get(), maxsize, wformat.get(), timeptr, lc_time_arg, locale);
    if (result == 0)
        return 0;

    // Copy output from wide char string
    if (!WideCharToMultiByte(lc_time_cp, 0, wstring.get(), -1, string, static_cast<int>(maxsize), nullptr, nullptr))
    {
        __acrt_errno_map_os_error(GetLastError());
        return 0;
    }

    return result;

The WideCharToMultiByte call returns the number of bytes in the converted string, but strftime doesn't update the value of "result". 
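
The difference between the two lengths can be reproduced from Python. This is a rough illustration, not the CRT code: Python's `cp949` codec is assumed to stand in for WideCharToMultiByte with code page 949, and the variable names are made up for the example.

```python
# Sketch only: the "cp949" codec stands in for WideCharToMultiByte with
# code page 949; the variable names are illustrative.
wide = "\ud654"                  # abbreviated Korean weekday name from this report
wide_len = len(wide)             # length in wide characters, as wcsftime reports it
encoded = wide.encode("cp949")   # bytes that end up in strftime's output buffer
byte_len = len(encoded)          # length in bytes, which strftime should return

print(wide_len, byte_len, encoded)
```

For any character that needs more than one byte in the locale encoding, the two lengths differ, so returning the wide-character count under-reports the size of the byte string.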

This worked correctly in the old CRT. For example, in 3.4 built with VC 10:

    >>> sys.version_info[:2]
    (3, 4)
    >>> locale.setlocale(locale.LC_ALL, 'kor_kor') 
    'Korean_Korea.949'
    >>> time.strftime('%a')
    '\ud654'

Here's an overview of the problem in 3.5, stepped through in the debugger:

    >>> sys.version_info[:2]
    (3, 5)
    >>> locale.setlocale(locale.LC_ALL, 'ko')
    'ko'
    >>> time.strftime('%a')
    Breakpoint 0 hit
    ucrtbase!Wcsftime_l:
    000007fe`e9e6fd74 48895c2410      mov     qword ptr [rsp+10h],rbx ss:00000000`003df6d8=0000000000666ce0

wcsftime returns the output buffer length in wide characters:

    0:000> pt; r rax
    rax=0000000000000001

WideCharToMultiByte is called to convert the wide-character string to the locale encoding:

    0:000> pc
    ucrtbase!Strftime_l+0x17f:
    000007fe`e9e6c383 ff15dfa00200    call    qword ptr [ucrtbase!_imp_WideCharToMultiByte (000007fe`e9e96468)] ds:000007fe`e9e96468={KERNELBASE!WideCharToMultiByte (000007fe`fd631be0)}
    0:000> p
    ucrtbase!Strftime_l+0x185:
    000007fe`e9e6c389 85c0            test    eax,eax

This returns the length of the converted string (including the null):

    0:000> r rax
    rax=0000000000000003

But strftime ignores this value, and instead returns the wide-character string length, which gets passed to PyUnicode_DecodeLocaleAndSize:

    0:000> bp python35!PyUnicode_DecodeLocaleAndSize
    0:000> g
    Breakpoint 1 hit
    python35!PyUnicode_DecodeLocaleAndSize:
    00000000`5ec15160 4053            push    rbx
    0:000> r rdx
    rdx=0000000000000001

U+D654 was converted correctly to '\xc8\xad' (code page 949):

    0:000> db @rcx l3
    00000000`007e5d20  c8 ad 00                                         ...

However, since (str[len] != '\0'), PyUnicode_DecodeLocaleAndSize errors out as follows:

    0:000> bd 0,1; g
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: embedded null byte

It works as expected if the length is manually changed to 2:

    >>> time.strftime('%a')
    Breakpoint 1 hit
    python35!PyUnicode_DecodeLocaleAndSize:
    00000000`5ec15160 4053            push    rbx
    0:000> r rdx=2
    0:000> g
    '\ud654'
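
The failure with length 1 and the success with length 2 can be modelled in a few lines of Python. This is a simplified model under stated assumptions, not CPython's implementation: `decode_locale_and_size` is a hypothetical stand-in that only reproduces the `str[len] != '\0'` check described above, and the `cp949` codec is assumed to match the locale encoding.

```python
def decode_locale_and_size(buf: bytes, size: int, encoding: str = "cp949") -> str:
    """Hypothetical model of the length check, not CPython's actual code."""
    # The caller claims the string is `size` bytes long, so the byte at
    # index `size` must be the terminating null.
    if buf[size] != 0:
        raise ValueError("embedded null byte")
    return buf[:size].decode(encoding)


buf = b"\xc8\xad\x00"  # bytes shown in the debugger dump

try:
    decode_locale_and_size(buf, 1)  # length returned by the faulty strftime
    error = None
except ValueError as exc:
    error = str(exc)

fixed = decode_locale_and_size(buf, 2)  # length after the manual correction
print(error, repr(fixed))
```

With length 1 the byte at index 1 is 0xad rather than the terminator, which is why the check fails even though the buffer itself holds a valid, null-terminated string.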

The string is null-terminated, so can time_strftime simply substitute PyUnicode_DecodeLocale in place of PyUnicode_DecodeLocaleAndSize?
msg253020 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-10-14 21:29
I can confirm that this is fixed in an upcoming Windows update:

Python 3.5.0 (v3.5.0:374f501f4567, Sep 13 2015, 02:27:37) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale, time
>>> locale.setlocale(locale.LC_ALL, 'ko')
'ko'
>>> time.strftime('%a')
'\uc218'
>>>
msg295208 - Author: Shaun Walbridge (scw) * Date: 2017-06-05 19:56
For reference, if anyone else still runs into this issue: the affected DLL is ucrtbase.dll, and the faulty version is 10.0.10240.0, which shipped with the 1507 release of Windows 10, the Windows 10 SDK, and Visual Studio 2015 RTM. This issue was resolved in the 1511 (10.0.10586.212) release and later, along with Visual Studio 2015 Update 3, which can be installed on Windows 10 via Windows Update. On Windows 7 and 8.1, Windows Update may update the files, but you also need to check for any local copies of the DLL in the same directory as the Python executable, because on these platforms per-application installs take priority over the copy within the Windows installation.
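
A small helper can make both checks easier to script. This is a sketch under assumptions: the function names are hypothetical, the version numbers are the two quoted in this message, and reading the version resource out of the DLL itself is not shown (that needs the Windows version APIs or the file's properties dialog).

```python
import sys
from pathlib import Path
from typing import Optional, Tuple

FAULTY_VERSION = (10, 0, 10240, 0)   # version named as faulty in this message
FIXED_VERSION = (10, 0, 10586, 212)  # version named as fixed in this message


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '10.0.10240.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_known_fixed(version: str) -> bool:
    """True if the version is at or after the release reported as fixed."""
    return parse_version(version) >= FIXED_VERSION


def app_local_ucrt(executable: str = sys.executable) -> Optional[Path]:
    """Return the path of a ucrtbase.dll sitting next to the executable, if any."""
    candidate = Path(executable).parent / "ucrtbase.dll"
    return candidate if candidate.is_file() else None


print(is_known_fixed("10.0.10240.0"), is_known_fixed("10.0.10586.212"))
print(app_local_ucrt())
```

Versions between the two quoted numbers are not covered by this message, so `is_known_fixed` treats them as not confirmed fixed rather than as known faulty.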

Currently, the Python distributed with Conda environments (where Py3.5+ is used) is affected by this issue[1] because of its app-local deployment of these DLLs on Windows 7/8.1. Any application which similarly bundles the UCRT DLLs alongside its runtime will also be affected.

1. Conda issue filed at: https://github.com/ContinuumIO/anaconda-issues/issues/1974
History
Date                 User              Action  Args
2022-04-11 14:58:20  admin             set     github: 69211
2017-06-05 19:56:59  scw               set     nosy: + scw; messages: + msg295208
2015-10-14 21:29:07  steve.dower       set     status: open -> closed; resolution: third party; messages: + msg253020; stage: resolved
2015-09-08 09:59:32  serhiy.storchaka  set     nosy: + belopolsky, - serhiy.storchaka
2015-09-08 09:55:47  eryksun           set     nosy: + paul.moore, tim.golden, eryksun, zach.ware, steve.dower; messages: + msg250176; components: + Windows
2015-09-08 06:06:58  serhiy.storchaka  set     nosy: + serhiy.storchaka, lemburg, loewis; messages: + msg250160; components: + Library (Lib); type: crash -> behavior
2015-09-08 05:32:07  sy LEE            create