classification
Title: _warnings: patch to avoid conversions from/to UTF-8
Type: Stage: resolved
Components: Unicode Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, ezio.melotti, haypo, python-dev, serhiy.storchaka, vajrasky, zach.ware
Priority: normal Keywords: patch

Created on 2013-10-28 17:43 by haypo, last changed 2013-11-04 10:29 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
warnings.patch haypo, 2013-10-28 17:43 review
remove_compiler_warning_pyunicode_compare_with_ascii_string.patch vajrasky, 2013-10-30 02:19 review
issue19424-windows-warnings.diff zach.ware, 2013-10-30 17:15 review
Messages (16)
msg201558 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-10-28 17:43
Attached patch removes usage of _PyUnicode_AsString() to not convert strings from/to UTF-8.
msg201655 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-29 18:53
I don't see a benefit from this patch. _PyUnicode_AsString() is very fast in most cases (because source lines are mostly ASCII only). On other hand, the patch replaces one-time _PyUnicode_AsString() by multiple less effective (even for ASCII strings) operations.
msg201657 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-10-29 19:19
> I don't see a benefit from this patch.

Oh, sorry, I forgot to explain the motivation. Performances of the warnings module are not critical module. The motivation here is to avoid to encoding string to UTF-8 for correctness. For example, _PyUnicode_AsString(filename) fails if the filename contains a surrogate character.

>>> warnings.warn_explicit("text", RuntimeError, "filename", 5)
filename:5: RuntimeError: text
>>> warnings.warn_explicit("text", RuntimeError, "filename\udc80", 5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 8: surrogates not allowed


Another example where a string to encoded to UTF-8 and decoded from UTF-8 a few instructions later:

PyObject *to_str = PyObject_Str(item);
err_str = _PyUnicode_AsString(to_str);
...
PyErr_Format(PyExc_RuntimeError,  "...%s", err_str);

Using "%R" avoids any encoding conversion.
msg201680 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-29 21:47
Well, it looks reasonable. But an action should be ASCII only string. So perhaps we should first check PyUnicode_IS_ASCII() and then use PyUnicode_1BYTE_DATA() and strcpy().
msg201688 - (view) Author: Roundup Robot (python-dev) Date: 2013-10-29 22:32
New changeset 34e166d60f37 by Victor Stinner in branch 'default':
Issue #19424: Optimize PyUnicode_CompareWithASCIIString()
http://hg.python.org/cpython/rev/34e166d60f37
msg201689 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-10-29 22:32
"Well, it looks reasonable. But an action should be ASCII only string. So perhaps we should first check PyUnicode_IS_ASCII() and then use PyUnicode_1BYTE_DATA() and strcpy()."

Such optimizations in warnings seem overkill, performances are not critical in this module.

I optimized PyUnicode_CompareWithASCIIString() instead.
msg201691 - (view) Author: Roundup Robot (python-dev) Date: 2013-10-29 22:44
New changeset c7326aa0b69c by Victor Stinner in branch 'default':
Issue #19424: Fix the warnings module to accept filename containing surrogate
http://hg.python.org/cpython/rev/c7326aa0b69c
msg201692 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-10-29 22:46
@Serhiy: Thanks for your review. I modified warnings.warn_explicit() to reject types different than str for the filename. I also removed the useless optimization for PyUnicode_Substring() when i=0.
msg201694 - (view) Author: Roundup Robot (python-dev) Date: 2013-10-29 22:58
New changeset 05e8dde3229c by Victor Stinner in branch 'default':
Issue #19424: Fix test_warnings for locale encoding unable to encode
http://hg.python.org/cpython/rev/05e8dde3229c
msg201697 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-10-29 23:37
Oops, while fixing this issue, I found a new issue related to warnings: I opened issue #19442 "Python crashes when a warning is emitted during shutdown".

This issue can now be fixed.
msg201703 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-10-30 02:19
Victor, I found out that this commit http://hg.python.org/cpython/rev/34e166d60f37 gives me compiler warning.

Objects/unicodeobject.c: In function ‘PyUnicode_CompareWithASCIIString’:
Objects/unicodeobject.c:10583:22: warning: pointer targets in initialization differ in signedness [-Wpointer-sign]
gcc -pthread -c -Wno-unused-result -Werror=declaration-after-statement -g -O0 -Wall -Wstrict-prototypes     -I. -IInclude -I./Include    -DPy_BUILD_CORE \

Attached the patch to remove the compiler warning.
msg201749 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2013-10-30 17:15
Adding to Vajrasky's report, the same commit also adds 3 warnings when building on Windows:

..\Objects\unicodeobject.c(10588): warning C4018: '>' : signed/unsigned mismatch [P:\Projects\OSS\Python\cpython\PCbuild\pythoncore.vcxproj]
..\Objects\unicodeobject.c(10592): warning C4018: '>' : signed/unsigned mismatch [P:\Projects\OSS\Python\cpython\PCbuild\pythoncore.vcxproj]
..\Objects\unicodeobject.c(10594): warning C4018: '>' : signed/unsigned mismatch [P:\Projects\OSS\Python\cpython\PCbuild\pythoncore.vcxproj]

Vajrasky's patch doesn't fix the Windows warnings (but doesn't create any new ones either); the attached patch does (but likely doesn't do anything for the other).

I don't know nearly enough C to say whether my patch is any good or not, just enough to say it compiles, doesn't break anything immediately obvious, and removes the warnings on Windows.
msg201752 - (view) Author: Roundup Robot (python-dev) Date: 2013-10-30 17:27
New changeset 52ec6a3eeda5 by Victor Stinner in branch 'default':
Issue #19424: Fix a compiler warning
http://hg.python.org/cpython/rev/52ec6a3eeda5
msg202013 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-11-03 11:55
Py_ssize_t is signed long. size_it is unsigned long. In this case, I suppose we should avoid unsigned as much as possible in comparison with signed.

So I think Zachary's patch is reasonable.

What do you think, Victor?
msg202022 - (view) Author: Roundup Robot (python-dev) Date: 2013-11-03 12:56
New changeset 2ed8d500e113 by Victor Stinner in branch 'default':
Issue #19424: Fix a compiler warning on comparing signed/unsigned size_t
http://hg.python.org/cpython/rev/2ed8d500e113
msg202113 - (view) Author: Roundup Robot (python-dev) Date: 2013-11-04 10:29
New changeset 494f736f5945 by Victor Stinner in branch 'default':
Issue #19424: PyUnicode_CompareWithASCIIString() normalizes memcmp() result
http://hg.python.org/cpython/rev/494f736f5945
History
Date User Action Args
2013-11-04 10:29:09python-devsetmessages: + msg202113
2013-11-03 12:56:27python-devsetmessages: + msg202022
2013-11-03 11:55:25vajraskysetmessages: + msg202013
2013-10-30 17:27:24python-devsetmessages: + msg201752
2013-10-30 17:15:12zach.waresetfiles: + issue19424-windows-warnings.diff
nosy: + zach.ware
messages: + msg201749

2013-10-30 02:19:29vajraskysetfiles: + remove_compiler_warning_pyunicode_compare_with_ascii_string.patch
nosy: + vajrasky
messages: + msg201703

2013-10-29 23:38:04Arfreversetstage: resolved
2013-10-29 23:37:08hayposetstatus: open -> closed
resolution: fixed
messages: + msg201697
2013-10-29 22:58:35python-devsetmessages: + msg201694
2013-10-29 22:46:47hayposetmessages: + msg201692
2013-10-29 22:44:44python-devsetmessages: + msg201691
2013-10-29 22:32:58hayposetmessages: + msg201689
2013-10-29 22:32:22python-devsetnosy: + python-dev
messages: + msg201688
2013-10-29 21:47:00serhiy.storchakasetmessages: + msg201680
2013-10-29 19:19:01hayposetmessages: + msg201657
2013-10-29 18:53:23serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg201655
2013-10-28 23:10:23Arfreversetnosy: + Arfrever
2013-10-28 17:43:22haypocreate