Message 410632 - Python tracker

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author	bar.harel
Recipients	aganders3, aviramha, bar.harel, benjamin.peterson, brandtbucher, bukzor, georg.brandl, levkivskyi, methane, miss-islington, pitrou, rhettinger, serhiy.storchaka
Date	2022-01-15.08:52:56
Message-id	<1642236776.63.0.841515965946.issue46376@roundup.psfhosted.org>
In-reply-to

Content
@rhettinger I completely understand what you're saying, and at first I agreed with you. Before I gave it a closer look, I thought the same thing: we want reliability. Reliability is important and avoids subtle bugs, which is why I was against this change for the exact reasons you mentioned: it is both breaking and unreliable.

I then realized that this change can be a reliable replacement for isinstance(obj, collections.abc.Sequence) at the C level. Let's use the broken sre_parse.SubPattern as an example: it neither registers with nor inherits from collections.abc.Sequence, so isinstance(obj, collections.abc.Sequence) returns False for a SubPattern instance. We cannot know programmatically whether SubPattern is a Sequence, we cannot type hint it as such, and apart from reading the documentation, we cannot treat the type differently in dynamic code that accepts either sequences or mappings. I dare say that counting on it being a sequence, especially in an LBYL language like C, is even less reliable. While SubPattern "embraces" the spirit of duck typing, it is very hard to fit in, in light of all the recent changes advocating for more structured and well-defined types. After all, this feature was requested in order to solve reliability issues in statically typed languages.
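To make the SubPattern point concrete, here is a minimal sketch. DuckSeq is a hypothetical stand-in I made up for illustration (it is not the real SubPattern): it implements only __len__ and __getitem__, so the ABC check cannot see it until someone registers it explicitly.

```python
from collections.abc import Sequence


class DuckSeq:
    """Hypothetical stand-in for a class like SubPattern: sequence protocol only."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


duck = DuckSeq("abc")

# It behaves like a sequence when used...
first, size = duck[0], len(duck)

# ...but the ABC machinery cannot tell, because nothing registered it.
before = isinstance(duck, Sequence)

# Explicit registration (or inheritance) is what makes the relationship visible.
Sequence.register(DuckSeq)
after = isinstance(duck, Sequence)
```

Here `before` is False and `after` is True, which is the gap I mean: code that has to choose between a sequence path and a mapping path has nothing to go on before the registration.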

Putting everything aside, the grand question still remains: do you think that there's a use for an efficient C-API isinstance check for Sequences and Mappings? I would presume the answer is yes. Would we encourage it? I have no clue. But if there's a need, we can either change this function as it has the same "spirit" or introduce a new one to prevent breaking existing code.

To answer your question: per specification, testing for Py_TPFLAGS_SEQUENCE using PyType_HasFeature does not take str, bytes and bytearray into consideration, and will not suffice. It is an incorrect solution that is even less reliable and falls into the exact pitfall of "guesswork" (for example, SubPattern currently doesn't work with it either). It is not encouraged, nor is it the first thing that comes to mind. PySequence_Check is much more intuitive, yet it doesn't work either, and that's where fixing it can have an edge.

A theoretical `PyIsInstance_Sequence` can check for Py_TPFLAGS_SEQUENCE and fall back to PyUnicode_Check, PyBytes_Check and PyByteArray_Check. If I'm not wrong, doing so will be 100% reliable, identical to isinstance(obj, Sequence), and very efficient.

As a side note, the C-API documentation for tp_flags is not clear at the moment. It mentions tp_as_sequence, for example, and says "if such a flag bit is clear, the type fields it guards must not be accessed and must be considered to have a zero or NULL value instead", yet Py_TPFLAGS_SEQUENCE does not actually coincide with sequences per specification. I know it has a different explanation as well and the flag has its own description, but it is still a bit misleading.

History
Date	User	Action	Args
2022-01-15 08:52:56	bar.harel	set	recipients: + bar.harel, georg.brandl, rhettinger, pitrou, benjamin.peterson, methane, bukzor, aganders3, serhiy.storchaka, levkivskyi, miss-islington, brandtbucher, aviramha
2022-01-15 08:52:56	bar.harel	set	messageid: <1642236776.63.0.841515965946.issue46376@roundup.psfhosted.org>
2022-01-15 08:52:56	bar.harel	link	issue46376 messages
2022-01-15 08:52:56	bar.harel	create