classification
Title: [Windows] Python 3.7 initializes LC_CTYPE locale at startup, causing performance issue on msvcrt isdigit()
Type: performance Stage:
Components: IO Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Dragoljub, cgohlke, vstinner, xtreak
Priority: normal Keywords:

Created on 2018-11-08 22:56 by Dragoljub, last changed 2018-11-16 17:50 by Dragoljub.

Files
File name Uploaded Description Edit
test_isdigit.c cgohlke, 2018-11-13 01:33
setup.py cgohlke, 2018-11-13 01:36
Messages (21)
msg329490 - (view) Author: Dragoljub (Dragoljub) Date: 2018-11-08 22:56
xref: https://github.com/pandas-dev/pandas/issues/23516

Example:
import io
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(1000000, 10), columns=('COL{}'.format(i) for i in range(10)))
csv = io.StringIO(df.to_csv(index=False))
df2 = pd.read_csv(csv) #3.5X slower on Python 3.7.1

pd.read_csv() reads data at 30MB/sec on Python 3.7.1 while at 100MB/sec on Python 3.6.7.

This issue seems to be only present on Windows 10 Builds both x86 & x64. 

Possibly some IO changes in Python 3.7 could have contributed to this slowdown on Windows but not on Linux?
msg329545 - (view) Author: Dragoljub (Dragoljub) Date: 2018-11-09 18:29
After some more benchmarks I'm seeing this line of code called in Python 3.7 but not in Python 3.5:

{built-in method _thread.allocate_lock}
msg329573 - (view) Author: Dragoljub (Dragoljub) Date: 2018-11-10 00:12
After some more digging it appears that we see the 3.5x slowdown manifest in Python 3.7.0a4 and is not present in Python 3.7.0a3.

One guess is that 

https://docs.python.org/3.7/whatsnew/changelog.html#python-3-7-0-alpha-4

bpo-29240: Add a new UTF-8 mode: implementation of the PEP 540

may contribute to this slowdown on windows. Is there a way to ensure we disable any native to UTF conversion that may be happening in Python 3.7.a4?
msg329575 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-11-10 00:19
From the PEP 540

This mode is off by default, but is automatically activated when using the "POSIX" locale.

Add the -X utf8 command line option and PYTHONUTF8 environment variable to control UTF-8 Mode.

https://docs.python.org/3.7/using/cmdline.html#envvar-PYTHONUTF8

I think you can set it to 0 from the docs yo see if it has any effect.
msg329587 - (view) Author: Dragoljub (Dragoljub) Date: 2018-11-10 02:20
I tried playing around with the UTF-8 mode settings but did not get a speed improvement.

After reading through the PEP it appears that on Windoes:

"To allow for better cross-platform binary portability and to adjust automatically to future changes in locale availability, these checks will be implemented at runtime on all platforms other than Windows, rather than attempting to determine which locales to try at compile time."

So if i'm understanding this correctly the locale coercion would not be controllable from Windows after Python is compiled?
msg329589 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-11-10 02:36
I have limited understanding of Windows and I don't have access to a Windows machine to check this out. I am adding Victor who implemented the PEP and might help here. There also seems to be PYTHONLEGACYWINDOWSFSENCODING for windows specific use case. Some more notes on the original issue https://bugs.python.org/issue29240#msg285278.

https://bugs.python.org/issue29240#msg285325

> Handle PYTHONLEGACYWINDOWSFSENCODING: this env var now disables the UTF-8 mode and has the priority over -X utf8 and PYTHONUTF8
msg329592 - (view) Author: Dragoljub (Dragoljub) Date: 2018-11-10 05:14
I tested this at runtime with sys._enablelegacywindowsfsencoding()

Also this was new in 3.6 and Py 3.6 does not have the slowdown issue.

New in version 3.6: See PEP 529 for more details.
msg329641 - (view) Author: Dragoljub (Dragoljub) Date: 2018-11-10 20:02
@cgohlke compared the statement df2 = pd.read_csv(csv) on Python 3.7.0a3 and a4 in the Visual Studio profiler. The culprit is the isdigit function called in the parsers extension module. On 3.7.0a3 the function is fast at ~8% of samples. On 3.7.0a4 the function is slow at ~64% samples because it calls the _isdigit_l function, which seems to update and restore the locale in the current thread every time...

3.7.0a3:
Function Name	Inclusive Samples	Exclusive Samples	Inclusive Samples %	Exclusive Samples %	Module Name
 + [parsers.cp37-win_amd64.pyd]	705	347	28.52%	14.04%	parsers.cp37-win_amd64.pyd
   isdigit	207	207	8.37%	8.37%	ucrtbase.dll
 - _errno	105	39	4.25%	1.58%	ucrtbase.dll
   toupper	24	24	0.97%	0.97%	ucrtbase.dll
   isspace	21	21	0.85%	0.85%	ucrtbase.dll
   [python37.dll]	1	1	0.04%	0.04%	python37.dll
3.7.0a4:
Function Name	Inclusive Samples	Exclusive Samples	Inclusive Samples %	Exclusive Samples %	Module Name
 + [parsers.cp37-win_amd64.pyd]	8,613	478	83.04%	4.61%	parsers.cp37-win_amd64.pyd
 + isdigit	6,642	208	64.04%	2.01%	ucrtbase.dll
 + _isdigit_l	6,434	245	62.03%	2.36%	ucrtbase.dll
 + _LocaleUpdate::_LocaleUpdate	5,806	947	55.98%	9.13%	ucrtbase.dll
 + __acrt_getptd	2,121	1,031	20.45%	9.94%	ucrtbase.dll
   FlsGetValue	647	647	6.24%	6.24%	KernelBase.dll
 - RtlSetLastWin32Error	296	235	2.85%	2.27%	ntdll.dll
   _guard_dispatch_icall_nop	101	101	0.97%	0.97%	ucrtbase.dll
   GetLastError	46	46	0.44%	0.44%	KernelBase.dll
 + __acrt_update_multibyte_info	1,475	246	14.22%	2.37%	ucrtbase.dll
 - __crt_state_management::get_current_state_index	1,229	513	11.85%	4.95%	ucrtbase.dll
 + __acrt_update_locale_info	1,263	235	12.18%	2.27%	ucrtbase.dll
 - __crt_state_management::get_current_state_index	1,028	429	9.91%	4.14%	ucrtbase.dll
   _ischartype_l	383	383	3.69%	3.69%	ucrtbase.dll
msg329743 - (view) Author: Dragoljub (Dragoljub) Date: 2018-11-12 16:34
@Vstinner,

Any way you can help test out a config setting to avoid the locale changes on Python 2.7.0a4+? It is currently causing the isdigit() low-level function to call the local-specific function on windows and update locals each call slowing down CSV Paring on Windows 3.5X

How can we configure python to not be different than 3.6.7 when it come to locale behavior?
msg329745 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-12 16:50
Can someone please try to write an example which only uses the stdlib?

> The culprit is the isdigit function called in the parsers extension module.

What are these extensions? Where do them come from?

> Any way you can help test out a config setting to avoid the locale changes on Python 3.7.0a4+?

(I fixed 2.7 => 3.7)

I don't understand which "locale changes" you are talking about. You can change the locale using locale.setlocale().
msg329786 - (view) Author: Christoph Gohlke (cgohlke) Date: 2018-11-13 01:26
> Can someone please try to write an example which only uses the stdlib?

The simplest is to compare performance of the `windll.LoadLibrary('API-MS-WIN-CRT-STRING-L1-1-0.DLL')` function on Python 3.7.0a3 and 3.7.0a4, but that will mostly measure Python/ctypes overhead. I will post a minimal C extension instead.


> What are these extensions? Where do them come from?

The `isdigit` function is from the UCRT. The `parsers` Cython/C extension is part of the pandas wheel on PyPI. The context for this issue is at https://github.com/pandas-dev/pandas/issues/23516


> I don't understand which "locale changes" you are talking about. You can change the locale using locale.setlocale().

The `UCRT.isdigit` function, when run on Python >=3.7.0a4, calls the `_isdigit_l` function, which calls `_LocaleUpdate::_LocaleUpdate` (see the VS profiler output).
msg329787 - (view) Author: Dragoljub (Dragoljub) Date: 2018-11-13 01:39
Here is a simple pure python example:

digits = ''.join([str(i) for i in range(10)]*10000000)
%timeit digits.isdigit() # --> 2X+ slower on python 3.7.1

Basically in Pandas C-code parser we call the isdigit() function for each number that is to be parsed. so 12345.6789 calls isdigt() 9 times to determine if this is a digit character that can be converted to a float. The problem is in the latest version of Python with locale updates isdigit() takes a locale argument that seems to be passed over and over slowing down this check. Is it possible that we disable any local passing from Python down to lower-level C code, or simply set the default locale to 'C' to keep it from thrashing?
msg329789 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-13 01:45
> digits = ''.join([str(i) for i in range(10)]*10000000)
> %timeit digits.isdigit() # --> 2X+ slower on python 3.7.1

This code calls:

* (Python) str.isdigit()
* unicode_isdigit_impl()
* _PyUnicode_IsDigit()
* _PyUnicode_ToDigit() which uses Python internal Unicode database

This code doesn't depend on locales at all. It's pure Unicode.
msg329790 - (view) Author: Christoph Gohlke (cgohlke) Date: 2018-11-13 01:45
I attached a minimal C extension module that can be used to demonstrate the performance degradation from Python 3.7.0a3 to 3.7.0a4.

Build the extension with `py setup.py build_ext --inplace`, then run the following code on Python 3.7.0a3 to 3.7.0a4:

```
import time
from test_isdigit import test_isdigit

start_time = time.time()
test_isdigit()
print(time.time() - start_time)
```

On my Windows 10 Pro WS system, the timings are:

Python 3.7.0a3: ~0.0156
Python 3.7.0a4: ~0.3281


I would expect that other locale aware functions in the UCRT are also affected but I have not tested that.
msg329791 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-13 01:47
test_isdigit.c: Can you try to call locale.setlocale(locale.LC_CTYPE, "") before running your benchmark on Python 3.7.0?

This issue may be related to bpo-34485.
msg329792 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-13 01:50
> This issue may be related to bpo-34485.

I'm thinking to:

New changeset 177d921c8c03d30daa32994362023f777624b10d by Victor Stinner in branch 'master':
bpo-34485, Windows: LC_CTYPE set to user preference (GH-8988)
https://github.com/python/cpython/commit/177d921c8c03d30daa32994362023f777624b10d

Oh, I only made this change in the future Python 3.8 (master branch). So this change may be unrelated.

Note: Right now, my Windows VM is broken, so I cannot investigate this performance issue which seems to be specific to the msvcrt (libc of Microsoft VisualStudio).
msg329793 - (view) Author: Christoph Gohlke (cgohlke) Date: 2018-11-13 01:53
> test_isdigit.c: Can you try to call locale.setlocale(locale.LC_CTYPE, "") before running your benchmark on Python 3.7.0?

Yes, that slows down Python 3.7.0a3 to the 3.7.0a4 level.
msg329828 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-13 11:42
> Yes, that slows down Python 3.7.0a3 to the 3.7.0a4 level.

Ok, so something calls setlocale(LC_CTYPE, "") or setlocale(LC_ALL, "") in Python 3.7.0. I'm not interested to dig the Git history. It doesn't really matter at this point.

Can you try to get the current LC_CTYPE locale on Python 3.6 or 3.7.0a3? Example:

$ python3 -c 'import _locale; print(_locale.setlocale(_locale.LC_CTYPE, None))'
fr_FR.utf8

Or using the locale module (it should give the same result):

$ python3 -c 'import locale; print(locale.setlocale(locale.LC_CTYPE, None))'
fr_FR.utf8

Can also also try on 3.7.0a4 and newer (ex: Python 3.7.1): locale.setlocale(locale.LC_ALL, "C"), to see it does workaround your performance issue? I don't recall if "C" or "POSIX" locales are supported on Windows.
msg329854 - (view) Author: Dragoljub (Dragoljub) Date: 2018-11-13 17:31
On Python 3.7.1 and Windows 10:

I attempted locale.setlocale(locale.LC_ALL, "POSIX") --> Errors Out
---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
<ipython-input-8-f384c02ab238> in <module>
----> 1 locale.setlocale(locale.LC_ALL, "POSIX")

D:\Python37\lib\locale.py in setlocale(category, locale)
    602         # convert to string
    603         locale = normalize(_build_localename(locale))
--> 604     return _setlocale(category, locale)
    605 
    606 def resetlocale(category=LC_ALL):

Error: unsupported locale setting


I was able to set the loacle to "C" but that does not improve the parsing performance.

locale.setlocale(locale.LC_ALL, "C") --> returns 'C'
msg329855 - (view) Author: Dragoljub (Dragoljub) Date: 2018-11-13 17:42
This is the default LC_CTYPE locale type I see on Windows10 and Python 3.7.1 vs 3.5.2: Same: 'C'


'3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)]'

import _locale
_locale.setlocale(_locale.LC_CTYPE, None)

'C'


'3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)]'

import _locale
_locale.setlocale(_locale.LC_CTYPE, None)

'C'
msg330007 - (view) Author: Dragoljub (Dragoljub) Date: 2018-11-16 17:50
Do we know if its possible to prevent the initialize LC_CTYPE on startup? Is there some combination of ENV-Var or CMD-Args that can avoid this slowdown on Windows?

What are the next step to get the issue assigned?
History
Date User Action Args
2018-11-16 17:50:36Dragoljubsetmessages: + msg330007
2018-11-13 17:42:08Dragoljubsetmessages: + msg329855
2018-11-13 17:31:32Dragoljubsetmessages: + msg329854
2018-11-13 11:43:46vstinnersettitle: Pandas read_csv() is 3.5X Slower on Python 3.7.1 vs Python 3.6.7 & 3.5.2 On Windows 10 -> [Windows] Python 3.7 initializes LC_CTYPE locale at startup, causing performance issue on msvcrt isdigit()
2018-11-13 11:42:37vstinnersetmessages: + msg329828
2018-11-13 01:53:12cgohlkesetmessages: + msg329793
2018-11-13 01:50:16vstinnersetmessages: + msg329792
2018-11-13 01:47:22vstinnersetmessages: + msg329791
2018-11-13 01:45:32cgohlkesetmessages: + msg329790
2018-11-13 01:45:23vstinnersetmessages: + msg329789
2018-11-13 01:39:54Dragoljubsetmessages: + msg329787
2018-11-13 01:36:51cgohlkesetfiles: + setup.py
2018-11-13 01:33:34cgohlkesetfiles: + test_isdigit.c
2018-11-13 01:26:31cgohlkesetnosy: + cgohlke
messages: + msg329786
2018-11-12 16:50:20vstinnersetmessages: + msg329745
2018-11-12 16:34:13Dragoljubsetmessages: + msg329743
2018-11-10 20:02:15Dragoljubsetmessages: + msg329641
2018-11-10 05:14:39Dragoljubsetmessages: + msg329592
2018-11-10 02:36:27xtreaksetnosy: + vstinner
messages: + msg329589
2018-11-10 02:20:08Dragoljubsetmessages: + msg329587
2018-11-10 00:19:43xtreaksetmessages: + msg329575
2018-11-10 00:12:18Dragoljubsetmessages: + msg329573
2018-11-09 18:29:24Dragoljubsetmessages: + msg329545
2018-11-09 03:34:36xtreaksetnosy: + xtreak
2018-11-08 22:56:21Dragoljubcreate