New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
_warnings: patch to avoid conversions from/to UTF-8 #63623
Comments
Attached patch removes usage of _PyUnicode_AsString() to not convert strings from/to UTF-8. |
I don't see a benefit from this patch. _PyUnicode_AsString() is very fast in most cases (because source lines are mostly ASCII only). On other hand, the patch replaces one-time _PyUnicode_AsString() by multiple less effective (even for ASCII strings) operations. |
Oh, sorry, I forgot to explain the motivation. Performances of the warnings module are not critical module. The motivation here is to avoid to encoding string to UTF-8 for correctness. For example, _PyUnicode_AsString(filename) fails if the filename contains a surrogate character. >>> warnings.warn_explicit("text", RuntimeError, "filename", 5)
filename:5: RuntimeError: text
>>> warnings.warn_explicit("text", RuntimeError, "filename\udc80", 5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 8: surrogates not allowed Another example where a string to encoded to UTF-8 and decoded from UTF-8 a few instructions later: PyObject *to_str = PyObject_Str(item);
err_str = _PyUnicode_AsString(to_str);
...
PyErr_Format(PyExc_RuntimeError, "...%s", err_str); Using "%R" avoids any encoding conversion. |
Well, it looks reasonable. But an action should be ASCII only string. So perhaps we should first check PyUnicode_IS_ASCII() and then use PyUnicode_1BYTE_DATA() and strcpy(). |
New changeset 34e166d60f37 by Victor Stinner in branch 'default': |
"Well, it looks reasonable. But an action should be ASCII only string. So perhaps we should first check PyUnicode_IS_ASCII() and then use PyUnicode_1BYTE_DATA() and strcpy()." Such optimizations in warnings seem overkill, performances are not critical in this module. I optimized PyUnicode_CompareWithASCIIString() instead. |
New changeset c7326aa0b69c by Victor Stinner in branch 'default': |
@serhiy: Thanks for your review. I modified warnings.warn_explicit() to reject types different than str for the filename. I also removed the useless optimization for PyUnicode_Substring() when i=0. |
New changeset 05e8dde3229c by Victor Stinner in branch 'default': |
Oops, while fixing this issue, I found a new issue related to warnings: I opened issue bpo-19442 "Python crashes when a warning is emitted during shutdown". This issue can now be fixed. |
Victor, I found out that this commit http://hg.python.org/cpython/rev/34e166d60f37 gives me compiler warning. Objects/unicodeobject.c: In function ‘PyUnicode_CompareWithASCIIString’: Attached the patch to remove the compiler warning. |
Adding to Vajrasky's report, the same commit also adds 3 warnings when building on Windows: ..\Objects\unicodeobject.c(10588): warning C4018: '>' : signed/unsigned mismatch [P:\Projects\OSS\Python\cpython\PCbuild\pythoncore.vcxproj] Vajrasky's patch doesn't fix the Windows warnings (but doesn't create any new ones either); the attached patch does (but likely doesn't do anything for the other). I don't know nearly enough C to say whether my patch is any good or not, just enough to say it compiles, doesn't break anything immediately obvious, and removes the warnings on Windows. |
New changeset 52ec6a3eeda5 by Victor Stinner in branch 'default': |
Py_ssize_t is signed long. size_it is unsigned long. In this case, I suppose we should avoid unsigned as much as possible in comparison with signed. So I think Zachary's patch is reasonable. What do you think, Victor? |
New changeset 2ed8d500e113 by Victor Stinner in branch 'default': |
New changeset 494f736f5945 by Victor Stinner in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: