Issue8839
Created on 2010-05-27 23:29 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files |
File name | Uploaded |
---|---|
getarg_remove_tdash.patch | vstinner, 2010-05-28 10:33 |
getarg_remove_tdash-2.patch | vstinner, 2010-06-06 21:09 |
Messages (11) | |||
---|---|---|---|
msg106627 - (view) | Author: STINNER Victor (vstinner) |
Date: 2010-05-27 23:29 | |
The "t#" format was introduced by r11803 (11 years ago): "Implement new format character 't#'. This is like s#, accepting an object that implements the buffer interface, but requires a buffer that contains 8-bit character data."

Python3 now has a strict separation between byte strings (bytes and bytearray types) and unicode strings (str), and has the PyBuffer and PyCapsule APIs. The "t#" format can be replaced by "y#" or "y*".

Extract of getargs.c:

    /*TEO: This can be eliminated --- here only for backward compatibility */
    case 't': { /* 8-bit character buffer, read-only access */

In Python, the last function using "t#" is _codecs.charbuffer_encode(), and I proposed to remove this function in #8838. We can also patch this function.

I don't know whether third party modules use this format. I don't know whether it can just be removed or whether it should raise a deprecation warning (but who would notice such a warning, since they are disabled by default?). |
|||
msg106641 - (view) | Author: Marc-Andre Lemburg (lemburg) |
Date: 2010-05-28 08:14 | |
STINNER Victor wrote:
> New submission from STINNER Victor <victor.stinner@haypocalc.com>:
>
> "t#" format was introduced by r11803 (11 years ago): "Implement new format character 't#'. This is like s#, accepting an object that implements the buffer interface, but requires a buffer that contains 8-bit character data."
>
> Python3 now has a strict separation between byte string (bytes and bytearray types) and unicode string (str), and has PyBuffer and PyCapsule APIs. "t#" format can be replaced by "y#" or "y*".
>
> Extract of getargs.c:
>
>     /*TEO: This can be eliminated --- here only for backward compatibility */
>     case 't': { /* 8-bit character buffer, read-only access */
>
> In Python, the last function using "t#" is _codecs.charbuffer_encode() and I proposed to remove this function in #8838. We can also patch this function.
>
> I don't know if third party modules use this format or not. I don't know if it can be just removed or if it should raise a deprecation warning (but who will notice such a warning, since they are disabled by default?).

Since Python3 completely removed the getcharbuffer interface that "t#" interfaces to in Python2, "t#" does indeed no longer serve any special purpose.

It's probably wise to just map "t#" to "y#" in order to ease porting extensions from 2.x to 3.x. |
|||
msg106642 - (view) | Author: STINNER Victor (vstinner) |
Date: 2010-05-28 10:33 | |
Patch to remove "t#":
- Update the c-api/arg.rst documentation
- Replace the "t#" format by "y#" in codecs.charbuffer_encode()
- Add a note in Doc/whatsnew/3.2.rst (in Porting to Python 3.2) |
|||
msg106643 - (view) | Author: Marc-Andre Lemburg (lemburg) |
Date: 2010-05-28 10:39 | |
STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> Patch to remove "t#":
> - Update c-api/arg.rst documentation
> - Replace "t#" format by "y#" in codecs.charbuffer_encode()
> - Add a note in Doc/whatsnew/3.2.rst (in Porting to Python 3.2)

Given that "y#" is not (yet) in wide-spread use, it may actually make more sense to replace "y#" with "t#" and introduce "t*" to replace "y*".

"y#" and "y*" could then be set up as synonyms for "t#" and "t*". |
|||
msg106644 - (view) | Author: STINNER Victor (vstinner) |
Date: 2010-05-28 11:04 | |
> Given that "y#" is not (yet) in wide-spread use, ...

t# is only used once (in codecs.charbuffer_encode()), whereas y# is used by the ossaudiodev, socket and mmap modules (there are 8 functions using y#). There are 46 functions using the y* format. The y format is not used in Python3.

To me, it looks easier to just drop t# and continue to use the y, y* and y# formats in Python3.

> "y#" and "y*" could then be setup as synonyms for "t#" and "t*"

If we have to keep backward compatibility, yes, t# can be kept as a synonym for y#. But I don't think that backward compatibility of the C API is important in Python3, because only a few 3rd party modules are compatible with Python3.

--

I prefer to use the y, y* and y# formats because they target the *bytes* type (which is the Python3 type to store byte strings), whereas s# is used in Python2 to get text from the *str* type. Python2 str objects are byte strings, but most Python2 programmers consider str to be the character string type. I see the change from s# to y# as the change from str to bytes (the strict separation between bytes and str). |
|||
msg106648 - (view) | Author: Marc-Andre Lemburg (lemburg) |
Date: 2010-05-28 11:30 | |
STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> > Given that "y#" is not (yet) in wide-spread use, ...
>
> t# is only used once (in codecs.charbuffer_encode()), whereas y# is used by ossaudiodev, socket and mmap modules (there are 8 functions using y#). There are 46 functions using y* format. y format is not used in Python3.
>
> To me, it looks easier to just drop t# and continue to use y, y* and y# formats in Python3.

You are forgetting our main target: to get extension writers to port their extensions to Python3.

Changes to the Python core are a lot easier to implement than getting thousands of extensions ported.

"t#" is in wide-spread use, since it's the only way a Python2 extension can request access to an object's text data version.

"y#" was introduced with Python3, and there are only very few extensions written for it.

Given these facts, it's better to drop "y#" and replace it with "t#". This is easily done for the core modules, and by adding synonyms for "y#" we can also automatically take care of the few Python3 extensions possibly using it.

> > "y#" and "y*" could then be setup as synonyms for "t#" and "t*"
>
> If we have to keep backward compatibility, yes, t# can be kept as a synonym for y#. But I don't think that backward compatibility of the C API is important in Python3 because only few 3rd party modules are compatible with Python3.

True, and that's why we have to make it easier for extension writers to port their extensions rather than making it harder.

It is not too difficult to adjust a Python2 extension to work in Python3 as well, so that's most likely the route that many extension writers will take, hence the need to reduce the number of differences between the Python2 and Python3 C API.

> --
>
> I prefer to use y, y* and y# formats because they target the *bytes* type (which is the Python3 type to store byte strings), whereas s# is used in Python2 to get text, *str* type.. which are byte strings, but most Python2 programmers consider that the str type is the type of character string. I see the change of s# to y#, as the change from str to bytes (the strict separation between bytes and str).

That's not correct: "s#" is used in Python2 to get at the bytes representation of an object, not the text version. "t#" was specifically added to access a text version of the content.

In Python3, this distinction is no longer available (for whatever reason), so only the bytes representation of the object remains.

Looking at the implementation again, I found that "y#" rejects Unicode, while "s#" returns the default encoded version like "t#" does in Python2.

So I have to correct what I said earlier:

"y#" is not the right replacement for "t#" in order to stay compatible with its Python2 counterpart. The "t#" implementation in Python3 is not compatible with the Python2 approach. It is in fact a totally different parser, since Unicode no longer provides a buffer interface and thus cannot be used as input for "t#".

The only compatible counterpart to the Python2 "t#" parser marker in Python3 appears to be "s#".

I'll have to think about this some more, but seen in that light, removing "t#" in Python3 may actually be a better strategy after all, mostly to remove a misguided forward-porting attempt and to reduce the number of surprises extension writers will see when porting their apps to Python3. |
|||
msg106652 - (view) | Author: STINNER Victor (vstinner) |
Date: 2010-05-28 12:08 | |
On Friday 28 May 2010 13:30:22, you wrote:
> Looking at the implementation again, I found that "y#" rejects
> Unicode, while "s#" returns the default encoded version like
> "t#" does in Python2.

Oh, I didn't notice that.

> So I have to correct what I said earlier:
>
> "y#" is not the right replacement for "t#" in order to stay compatible
> with its Python2 counterpart. The "t#" implementation in Python3 is not
> compatible with the Python2 approach - it's in fact, a totally
> different parser, since Unicode no longer provides a buffer interface
> and thus cannot be used as input for "t#".
>
> The only compatible counterpart to the Python2 "t#" parser marker
> in Python3 appears to be "s#".
>
> I'll have to think about this some more, but seen in that light,
> removing "t#" in Python3 may actually be a better strategy after
> all - mostly to remove a misguided forward-porting attempt
> and to reduce the number of surprises extension writers will
> see when porting their apps to Python3.

So t#, s# and y# are all different. I'm waiting for your final decision.

"reduce the number of surprises extension writers ..." is a good argument in favor of removing t# :-) |
|||
msg107229 - (view) | Author: STINNER Victor (vstinner) |
Date: 2010-06-06 21:09 | |
New version of the patch:
- charbuffer_encode() uses the y* format instead of y# to accept modifiable buffer objects (e.g. bytearray)
- Improve the documentation about the change

@lemburg: So, do you agree with my patch? |
|||
msg107284 - (view) | Author: Marc-Andre Lemburg (lemburg) |
Date: 2010-06-07 21:50 | |
STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> New version of the patch:
> - charbuffer_encode() uses y* instead of y# format to accept modifiable buffer objects (eg. bytearray)
> - Improve the documentation about the change
>
> @lemburg: So, do you agree with my patch?

No, because y*/y# are not correct replacements for t#. They don't accept Unicode objects.

t# was meant to provide access to text data, so replacing it with a parser code that is meant for binary data is not correct. The closest Python3 gets to t# from Python2 is s# or s*, so please use those in the NEWS entry and s* in charbuffer_encode(). |
|||
msg107362 - (view) | Author: STINNER Victor (vstinner) |
Date: 2010-06-08 23:02 | |
> t# was meant to provide access to text data, so replacing it with a
> parser code that is meant for binary data is not correct. The
> closest Python3 gets to t# from Python2 is s# or s*, so please use
> those in the NEWS entry and s* in charbuffer_encode().

Done. Patch committed as r81854 in 3.2: it also removes codecs.charbuffer_encode(). Commit blocked in 3.1 (r81855). |
|||
msg107450 - (view) | Author: Marc-Andre Lemburg (lemburg) |
Date: 2010-06-10 09:43 | |
STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> > t# was meant to provide access to text data, so replacing it with a
> > parser code that is meant for binary data is not correct. The
> > closest Python3 gets to t# from Python2 is s# or s*, so please use
> > those in the NEWS entry and s* in charbuffer_encode().
>
> Done. Patch committed as r81854 in 3.2: it also removes codecs.charbuffer_encode(). Commit blocked in 3.1 (r81855).

Thanks. |
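The difference the thread settles on, that the binary codes (y#, y*) reject str while the text-accepting codes (s#, s*) take it, can be observed from Python without writing a C extension. The following is only a rough sketch under assumptions: it presumes a current CPython 3 in which binascii.crc32() accepts bytes-like objects only (y*-style parsing) and codecs.readbuffer_encode() also accepts str (s*-style parsing). Picking these two functions is an illustration choice and not part of the patch discussed here; the `accepts` helper and the `samples` dictionary are hypothetical names.

```python
import binascii
import codecs


def accepts(func, value):
    """Return True if func(value) succeeds, False if it raises TypeError."""
    try:
        func(value)
    except TypeError:
        return False
    return True


# Hypothetical sample inputs: read-only bytes, modifiable bytearray, text.
samples = {"bytes": b"spam", "bytearray": bytearray(b"spam"), "str": "spam"}

# Assumption: binascii.crc32 parses its argument like "y*" (bytes-like only).
y_star = {name: accepts(binascii.crc32, value) for name, value in samples.items()}

# Assumption: codecs.readbuffer_encode parses its argument like "s*"
# (bytes-like objects and str, with str exposed as its UTF-8 encoding).
s_star = {name: accepts(codecs.readbuffer_encode, value) for name, value in samples.items()}

print(y_star)  # {'bytes': True, 'bytearray': True, 'str': False}
print(s_star)  # {'bytes': True, 'bytearray': True, 'str': True}

# A non-ASCII str comes back as UTF-8 bytes plus the byte length.
print(codecs.readbuffer_encode("\u00e9"))  # (b'\xc3\xa9', 2)
```

Both functions accept bytearray, which matches the reason given above for preferring the `*` variants over the `#` variants when modifiable buffers should be allowed.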
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:01 | admin | set | github: 53085 |
2010-06-10 09:43:45 | lemburg | set | messages: + msg107450 |
2010-06-08 23:13:56 | vstinner | set | status: open -> closed resolution: fixed |
2010-06-08 23:02:02 | vstinner | set | messages: + msg107362 |
2010-06-07 21:50:21 | lemburg | set | messages: + msg107284 |
2010-06-06 21:09:15 | vstinner | set | files: + getarg_remove_tdash-2.patch messages: + msg107229 |
2010-05-28 13:18:02 | pitrou | set | nosy: + loewis |
2010-05-28 12:08:46 | vstinner | set | messages: + msg106652 |
2010-05-28 11:30:20 | lemburg | set | messages: + msg106648 |
2010-05-28 11:04:00 | vstinner | set | messages: + msg106644 |
2010-05-28 10:39:50 | lemburg | set | messages: + msg106643 |
2010-05-28 10:33:30 | vstinner | set | files: + getarg_remove_tdash.patch keywords: + patch messages: + msg106642 |
2010-05-28 08:14:41 | lemburg | set | nosy: + lemburg messages: + msg106641 |
2010-05-27 23:29:07 | vstinner | create |