This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Return singleton empty string in _PyUnicode_FromASCII
Type: enhancement Stage: patch review
Components: Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: serhiy.storchaka, vstinner, xiang.zhang
Priority: normal Keywords: patch

Created on 2016-10-09 15:49 by xiang.zhang, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description Edit
_PyUnicode_FromASCII.patch xiang.zhang, 2016-10-09 15:49 review
Messages (10)
msg278364 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-10-09 15:49
Right now in _PyUnicode_FromASCII, it only optimises for size 1 not size 0. All other places getting the same situation in unicodeobject.c do both.
msg278389 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-09 20:35
Additional check has a non-zero cost. What is the benefit of this change?
msg278398 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-10-10 02:44
The cost is really small (an integer compare vs memory malloc and copy). The advantages are fast path for empty strings and retaining consistency with other codes in unicodeobject.c. You can see other places use the same optimization, e.g. PyUnicode_FromUnicode, PyUnicode_FromUCS1, and some functions using unicode_result.
msg280141 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-06 15:44
The case for size=0 often is handled before calling _PyUnicode_FromASCII. In what circumstances _PyUnicode_FromASCII is called with size=0? Is this case covered by tests?
msg280188 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-11-07 05:23
IMHO, _PyUnicode_FromASCII is a private API and could be used in other places in future. We should not rely on the caller to check and return the singleton empty string.
msg281129 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-18 15:43
To reduce the memory footprint, it's important that we reuse internal Unicode singletons. For the empty string singleton, you have two choices:

* if length is equal to zero, return the singleton
* create a string and then call unicode_result()

unicode_result() should be used when the output length is not easy to compute or when the code is complex.

Here is the code is very simple and the output length is obvious.

Serhiy: "In what circumstances _PyUnicode_FromASCII is called with size=0?"

I checked the current code: I'm unable to see any function which can call _PyUnicode_FromASCII() with size=0.


Xiang: "IMHO, _PyUnicode_FromASCII is a private API and could be used in other places in future. We should not rely on the caller to check and return the singleton empty string."

I agree. Xiang's patch is short and makes sense, so I vote +1 on it.

I let Serhiy decides if the patch should be merged or not.

If we decide to reject the patch, I suggest to add the following code at the beginning of _PyUnicode_FromASCII:

/* Detect missed optimization */
assert(size != 0);
msg281140 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-18 16:29
I know that _PyUnicode_FromASCII() is called with size=0. But most callers call it only with size!=0. I didn't investigate further. We should know what is the benefit of the patch before committing it. Just "for consistency" is a weak argument itself.
msg281142 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-18 16:33
> We should know what is the benefit of the patch before committing it.

The purpose of the patch is to use the empty string singleton to reduce the memory footprint, instead of creating multiple empty (Unicode) string objects.
msg281147 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-18 16:52
My question is simple: in what circumstances the patch has an effect?
msg281148 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-11-18 17:12
> My question is simple: in what circumstances the patch has an effect?

My original intention is that there is no need for the caller to check for the empty string. Even there is no use case now, it could be in future.

But Serhiy, I am actually very glad to get a determined result, even it's rejected. If the issue is still open, I would think you are still open to discussion. Such a trivial patch isn't worth it.
History
Date User Action Args
2022-04-11 14:58:38adminsetgithub: 72584
2016-11-18 17:12:44xiang.zhangsetassignee: serhiy.storchaka
messages: + msg281148
2016-11-18 16:52:31serhiy.storchakasetmessages: + msg281147
2016-11-18 16:33:48vstinnersetmessages: + msg281142
2016-11-18 16:29:02serhiy.storchakasetmessages: + msg281140
2016-11-18 15:43:21vstinnersetmessages: + msg281129
2016-11-07 05:23:31xiang.zhangsetmessages: + msg280188
2016-11-06 15:44:18serhiy.storchakasetmessages: + msg280141
2016-10-10 02:44:22xiang.zhangsetmessages: + msg278398
2016-10-09 20:35:14serhiy.storchakasetmessages: + msg278389
2016-10-09 15:49:06xiang.zhangcreate