Message 106648 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	lemburg
Recipients	lemburg, vstinner
Date	2010-05-28.11:30:19
SpamBayes Score	9.202961e-05
Marked as misclassified	No
Message-id	<4BFFA949.2000405@egenix.com>
In-reply-to	<1275044642.5.0.15082943935.issue8839@psf.upfronthosting.co.za>

Content

STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
>> Given that "y#" is not (yet) in wide-spread use, ...
> 
> t# is only used once (in codecs.charbuffer_encode()), whereas y# is used by ossaudiodev, socket and mmap modules (there are 8 functions using y#). There are 46 functions using y* format. y format is not used in Python3.
> 
> To me, it looks easier to just drop t# and continue to use y, y* and y# formats in Python3.

You are forgetting our main target: to get extension writers to
port their extensions to Python3. Changes to the Python core are
a lot easier to implement than getting thousands of extensions
ported.

"t#" is in widespread use, since it's the only way a Python2
extension can request access to the text version of an object's data.

"y#" was introduced with Python3, and there are only very few
extensions written for it.

Given these facts, it's better to drop "y#" and replace it with
"t#". This is easily done for the core modules, and by adding
synonyms for "y#" we can also automatically take care of the
few Python3 extensions possibly using it.
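
To make the synonym idea concrete, here is a minimal sketch, assuming a parser that canonicalizes format codes before dispatching on them. The alias table and the helpers `split_codes` and `normalize_format` are hypothetical illustrations written in Python, not CPython's actual getargs.c logic. The splitter deliberately ignores multi-letter codes such as "es#" and the "(", "|", ":" and ";" markers.

```python
# Hypothetical alias table: the Python3-only "y" variants are rewritten
# to the "t" variants proposed in this message.
FORMAT_ALIASES = {"y#": "t#", "y*": "t*"}


def split_codes(fmt: str) -> list[str]:
    """Split a PyArg_ParseTuple-style format string into codes.

    Simplification: a code is one character optionally followed by
    '#' or '*'; multi-letter codes and grouping markers are not handled.
    """
    codes = []
    i = 0
    while i < len(fmt):
        code = fmt[i]
        if i + 1 < len(fmt) and fmt[i + 1] in "#*":
            code += fmt[i + 1]  # attach the '#' or '*' modifier
            i += 1
        codes.append(code)
        i += 1
    return codes


def normalize_format(fmt: str) -> str:
    """Replace every aliased code with its canonical spelling."""
    return "".join(FORMAT_ALIASES.get(c, c) for c in split_codes(fmt))


print(normalize_format("y#i"))    # t#i
print(normalize_format("y#y*i"))  # t#t*i
```

The point of the design is that extension code keeps its existing format strings while the parser sees only one canonical spelling per code.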

>> "y#" and "y*" could then be set up as synonyms for "t#" and "t*"
> 
> If we have to keep backward compatibility, yes, t# can be kept as a synonym for y#. But I don't think that backward compatibility of the C API is important in Python3 because only a few third-party modules are compatible with Python3.

True, and that's why we have to make it easier for extension writers
to port their extensions rather than making it harder.

It is not too difficult to adjust a Python2 extension to work
in Python3 as well, so that's most likely the route that
many extension writers will take, hence the need to reduce the
number of differences between the Python2 and Python3 C API.

> --
> 
> I prefer to use y, y* and y# formats because they target the *bytes* type (which is the Python3 type to store byte strings), whereas s# is used in Python2 to get text, *str* type, which are byte strings, but most Python2 programmers consider the str type to be the type for character strings. I see the change of s# to y# as the change from str to bytes (the strict separation between bytes and str).

That's not correct: "s#" is used in Python2 to get at the bytes
representation of an object, not the text version. "t#" was
specifically added to access a text version of the content.

In Python3, this distinction is no longer available (for whatever
reason), so only the bytes representation of the object remains.

Looking at the implementation again, I found that "y#" rejects
Unicode, while "s#" returns the default encoded version like
"t#" does in Python2.
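
The difference can be modeled in pure Python. This is a simplified sketch, not the real C implementation: `parse_y_hash` and `parse_s_hash` are hypothetical stand-ins that return a (bytes, length) pair in place of the C (const char *, Py_ssize_t) outputs, accept only `bytes` among buffer-providing objects, and assume UTF-8 as Python3's default encoding for str.

```python
def parse_y_hash(obj: object) -> tuple[bytes, int]:
    """Model of "y#": bytes are accepted, str (Unicode) is rejected."""
    if isinstance(obj, bytes):
        return obj, len(obj)
    raise TypeError(f"expected bytes, got {type(obj).__name__}")


def parse_s_hash(obj: object) -> tuple[bytes, int]:
    """Model of "s#": str is returned in its default (UTF-8) encoding."""
    if isinstance(obj, str):
        data = obj.encode("utf-8")  # default-encoded version of the text
        return data, len(data)
    if isinstance(obj, bytes):
        return obj, len(obj)
    raise TypeError(f"expected str or bytes, got {type(obj).__name__}")


print(parse_s_hash("é"))  # (b'\xc3\xa9', 2)
try:
    parse_y_hash("é")
except TypeError as exc:
    print(exc)  # expected bytes, got str
```

Under this model, a Python2 extension that passed text through "t#" keeps working when switched to "s#", whereas "y#" raises TypeError as soon as a str is passed.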

So I have to correct what I said earlier:

"y#" is not the right replacement for "t#" if we want to stay compatible
with its Python2 counterpart. The "t#" implementation in Python3 is not
compatible with the Python2 approach; it is, in fact, a totally
different parser, since Unicode no longer provides a buffer interface
and thus cannot be used as input for "t#".

The only compatible counterpart to the Python2 "t#" parser marker
in Python3 appears to be "s#".

I'll have to think about this some more, but seen in that light,
removing "t#" in Python3 may actually be a better strategy after
all, mostly to remove a misguided forward-porting attempt
and to reduce the number of surprises extension writers will
see when porting their apps to Python3.

History
Date	User	Action	Args
2010-05-28 11:30:22	lemburg	set	recipients: + lemburg, vstinner
2010-05-28 11:30:20	lemburg	link	issue8839 messages
2010-05-28 11:30:19	lemburg	create