classification
Title: [C-API] PyUnicode_FromString
Type: behavior Stage: resolved
Components: C API Versions: Python 3.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: YannickSchmetz, vstinner
Priority: normal Keywords:

Created on 2019-12-20 12:33 by YannickSchmetz, last changed 2019-12-20 13:07 by vstinner. This issue is now closed.

Messages (2)
msg358706 Author: Yannick (YannickSchmetz) Date: 2019-12-20 12:33
Python version: 3.5
Tested with Visual Studio 2017 in a C-API extension.

Suppose you have a UTF-8 encoded char buffer that is empty or contains only the character "0", and you call PyUnicode_FromString() on it. You get back a PyObject*. Its content looks correct, but its reference count looks strange.

For the buffer "0", the ob_refcnt field is 100. For the empty buffer, ob_refcnt is somewhere around 9xx.

Example Code: 
      #include <Python.h>
      #include <string>

      std::string s1 = u8"";
      std::string s2 = u8"0";

      PyObject *o1 = PyUnicode_FromString(s1.c_str());
      // o1->ob_refcnt == 9xx
      PyObject *o2 = PyUnicode_FromString(s2.c_str());
      // o2->ob_refcnt == 100
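You can reproduce the same observation from Python without building an extension. The sketch below assumes CPython, where reference counts and string caching are implementation details. It decodes UTF-8 bytes at runtime, which is roughly what PyUnicode_FromString() does in C. The variable names are illustrative only.

```python
import sys

# Build the strings at runtime so they are not compile-time constants.
# Decoding UTF-8 bytes roughly mirrors PyUnicode_FromString() in C.
empty = b"".decode("utf-8")               # like PyUnicode_FromString("")
zero = b"0".decode("utf-8")               # like PyUnicode_FromString("0")
fresh = b"hello, world".decode("utf-8")   # longer string, not cached

# Cached strings are shared, so their counts include references held
# elsewhere in the interpreter. A freshly built string only has the
# references this script holds (plus the temporary argument reference).
print("empty:", sys.getrefcount(empty))
print("zero: ", sys.getrefcount(zero))
print("fresh:", sys.getrefcount(fresh))
```

The exact numbers vary between CPython versions. In 3.12 and later, these cached strings are immortal and report a very large count. The relative ordering is what matters here.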

I expected ob_refcnt to be 1 in both cases. Why is the reference count so high?
msg358707 Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-20 13:07
> I expected ob_refcnt to be 1 in both cases. Why is the reference count so high?

Python has singletons for short strings: the empty string and every 1-character Latin-1 string (Unicode range [U+0000; U+00FF]).

Examples:

>>> import sys
>>> sys.getrefcount("")
103
>>> sys.getrefcount("a")
11

It's not a bug, but an optimization to reduce the memory footprint ;-)
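Comparing object identity shows the singletons more directly than reference counts do. This is a sketch that assumes CPython. The caching is an implementation detail, not a language guarantee, so real code should compare strings with `==`, never `is`.

```python
# Two independent ways of building the empty string at runtime.
empty_join = "".join([])
empty_decode = b"".decode("utf-8")

# One-character strings in U+0000..U+00FF come from a per-character cache.
zero_chr = chr(0x30)                  # "0"
zero_decode = b"0".decode("utf-8")    # "0", decoded from UTF-8
e_acute_1 = chr(0xE9)                 # "é", still inside Latin-1
e_acute_2 = chr(0xE9)

print(empty_join is empty_decode)
print(zero_chr is zero_decode)
print(e_acute_1 is e_acute_2)

# Outside Latin-1 there is no such cache: each call builds a new object.
print(chr(0x4E00) is chr(0x4E00))
```

On the C side the consequence is the same. PyUnicode_FromString() still returns a new reference. The caller owns exactly one of the many references to the shared object and must release it with Py_DECREF() as usual.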
History
Date | User | Action | Args
2019-12-20 13:07:41 | vstinner | set | status: open -> closed; nosy: + vstinner; messages: + msg358707; resolution: not a bug; stage: resolved
2019-12-20 12:33:15 | YannickSchmetz | create |