Message 241319 - Python tracker

Author	lemburg
Recipients	benjamin.peterson, georg.brandl, larry, lemburg
Date	2015-04-17.07:57:47
Message-id	<5530BCF2.2020203@egenix.com>
In-reply-to	<1429242641.5.0.823130320938.issue23980@psf.upfronthosting.co.za>

Content

On 17.04.2015 05:50, Larry Hastings wrote:
> 
> Larry Hastings added the comment:
> 
>> The "e" variants (typically) allocate a buffer for you, since it's pretty
>> much unknown how long the encoded data will be.
> 
> All four will do it if you pass in a NULL pointer.  "es#" and "et#" can reuse
> an existing buffer, because you can also pass in its size.

Right.
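
To make the allocate-or-reuse behaviour concrete, here is a minimal sketch in plain C. It is not the CPython implementation: fill_encoded() is a hypothetical stand-in that only models the convention described above for the "#" variants, where a NULL *buffer means "allocate for me" and a non-NULL *buffer is reused if the size passed in is large enough.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in, NOT part of the CPython API.
 * Models the "es#"-style convention:
 *   - *buffer == NULL: allocate storage of the needed size
 *   - *buffer != NULL: reuse it, treating *buffer_length as its capacity
 * On success, *buffer_length holds the data length without the NUL.
 * Returns 0 on success, -1 on failure. */
int fill_encoded(const char *encoded, char **buffer, size_t *buffer_length)
{
    size_t needed;

    assert(encoded != NULL && buffer != NULL && buffer_length != NULL);
    needed = strlen(encoded) + 1;          /* data plus NUL terminator */

    if (*buffer == NULL) {
        *buffer = malloc(needed);          /* caller must free() this */
        if (*buffer == NULL)
            return -1;
    } else if (*buffer_length < needed) {
        return -1;                         /* existing buffer too small */
    }

    memcpy(*buffer, encoded, needed);
    *buffer_length = needed - 1;
    return 0;
}
```

With the real API, a buffer allocated by PyArg_ParseTuple() is released with PyMem_Free() rather than free(); the sketch uses malloc/free only to stay self-contained.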

>> So I guess the "e" descriptions need to have the additional * removed
>> or the paragraph has to be updated and all other listings need
>> to be converted to precise types (that would be my preference).
> 
> Here's the problem with removing the "additional *".  The first argument
> to encoding is a static string.  You literally pass in the char * string,
> not a pointer to a variable containing the address of the string.  e.g.
> 
>   PyArg_ParseTuple("es", args, "utf-8", &buffer);
> 
> So how do we annotate that?  [char, char *]?

A literal static string is really a pointer as well. The compiler
will allocate and initialize the string and then provide a pointer
to reference it. It is also possible to pass in the pointer
directly - it just needs to be a const char *, i.e. one whose target
value doesn't change.
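
A small plain-C sketch of that point, with hypothetical helper names (is_utf8_name and literal_and_pointer_agree are illustrations, not CPython functions): a function that takes a const char * accepts a string literal and a pointer variable in exactly the same way, which is why the encoding argument can be annotated as const char *. (In the real API the argument tuple comes before the format string, i.e. PyArg_ParseTuple(args, "es", "utf-8", &buffer).)

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: receives the encoding name as a const char *,
 * the same type the encoding argument of the "e" units has. */
int is_utf8_name(const char *encoding)
{
    assert(encoding != NULL);
    return strcmp(encoding, "utf-8") == 0;
}

/* Passing a literal and passing a pointer variable are equivalent:
 * the literal decays to a pointer to its first character. */
int literal_and_pointer_agree(void)
{
    const char *enc = "utf-8";             /* pointer to static storage */
    return is_utf8_name("utf-8") && is_utf8_name(enc);
}
```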

So for "es" this would have to be [const char, char *], but I like
your & idea more: [& const char *, & char **]

>> I wonder why no one has noticed in all these years.
> 
> Because nobody ever freaking uses the "e" format units.  But by golly
> I'm going to see that the docs are right and Clinic supports them
> correctly--or die trying.

They are not used much, but there are a few cases in the posixmodule and
the Mac modules of the Python 2.7 stdlib, as well as in extension modules,
of course. The Python 3.4 stdlib does not seem to use them anymore
(the uses were replaced with custom solutions or the modules removed).

I guess people never found out about those parser markers or
simply always fetch Unicode as object rather than as encoded
string.

History
Date	User	Action	Args
2015-04-17 07:57:47	lemburg	set	recipients: + lemburg, georg.brandl, larry, benjamin.peterson
2015-04-17 07:57:47	lemburg	link	issue23980 messages
2015-04-17 07:57:47	lemburg	create