Issue 39109: [C-API] PyUnicode_FromString


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/83290

Created on 2019-12-20 12:33 by YannickSchmetz, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg358706 - (view)	Author: Yannick (YannickSchmetz)	Date: 2019-12-20 12:33
Python version: 3.5
Tested with Visual Studio 2017 in a C-API extension.

When you have a UTF-8 encoded char buffer that is either empty or contains only the character "0", and you call PyUnicode_FromString() on this buffer, you get a PyObject back. The content looks good, but the reference counter looks strange. With "0" in the buffer, the ob_refcnt field is set to 100; with an empty buffer, the ob_refcnt field is set to something around 9xx.

Example code:

    string s1 = u8"";
    string s2 = u8"0";
    PyObject *o1 = PyUnicode_FromString(s1.c_str()); // o1->ob_refcnt = 9xx
    PyObject *o2 = PyUnicode_FromString(s2.c_str()); // o2->ob_refcnt = 100

I think the ob_refcnt Field should be 1 in both cases. Or why is the refcnt here so high?
msg358707 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-12-20 13:07
> I think the ob_refcnt Field should be 1 in both cases. Or why is the refcnt here so high?

Python has singletons for short strings: the empty string and 1-character latin1 strings (Unicode range [U+0000; U+00FF]).

Examples:

>>> sys.getrefcount("")
103
>>> sys.getrefcount("a")
11

It's not a bug, but an optimization to reduce the memory footprint ;-)
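
The singleton behaviour described above can also be observed from pure Python. The following is only a minimal sketch, assuming CPython (other implementations may behave differently); the helper names `decode` and `shares_singleton` are made up for illustration and are not part of any API. Decoding the same short UTF-8 buffer twice should hand back the very same object, which is why its reference count is shared with the rest of the interpreter. The exact counts vary between versions and sessions, and recent CPython releases may report a very large value for such objects, so the sketch only checks that the count exceeds 1.

```python
import sys


def decode(raw: bytes) -> str:
    """Build a str at runtime from a UTF-8 buffer (hypothetical helper),
    loosely mirroring what PyUnicode_FromString() does in C."""
    return raw.decode("utf-8")


def shares_singleton(raw: bytes) -> bool:
    """Return True if two independent decodes yield the same object."""
    return decode(raw) is decode(raw)


# Empty string: CPython keeps a single shared instance.
print(shares_singleton(b""))

# One-character latin1 string: also served from a shared cache in CPython.
print(shares_singleton(b"0"))

# Longer strings are normally fresh objects, so identity is not expected.
print(shares_singleton(b"hello world"))

# The shared objects are referenced from many places, hence counts above 1.
print(sys.getrefcount(decode(b"")) > 1)
print(sys.getrefcount(decode(b"0")) > 1)
```

In a C extension the practical consequence is the same as for any other object returned by PyUnicode_FromString(): the caller owns one reference and releases it with Py_DECREF(), regardless of the absolute value of ob_refcnt.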

History
Date	User	Action	Args
2022-04-11 14:59:24	admin	set	github: 83290
2019-12-20 13:07:41	vstinner	set	status: open -> closed nosy: + vstinner messages: + msg358707 resolution: not a bug stage: resolved
2019-12-20 12:33:15	YannickSchmetz	create