This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Introduce new data model method __iter_items__
Type: enhancement Stage:
Components: C API Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: bob.ippolito, conqp, eric.smith, rhettinger, steven.daprano
Priority: normal Keywords:

Created on 2020-12-28 15:26 by conqp, last changed 2022-04-11 14:59 by admin.

Messages (5)
msg383897 - (view) Author: Richard Neumann (conqp) * Date: 2020-12-28 15:26
I have use cases in which I use named tuples to represent data sets, e.g.:

class BasicStats(NamedTuple):
    """Basic statistics response packet."""

    type: Type
    session_id: BigEndianSignedInt32
    motd: str
    game_type: str
    map: str
    num_players: int
    max_players: int
    host_port: int
    host_ip: IPAddressOrHostname

I want them to keep behaving like tuples, i.e. unpacking them should work as expected of a tuple:

type, session_id, motd, … = BasicStats(…)

I also want to be able to serialize them to a JSON-ish dict.
The NamedTuple has an _asdict method that I could use.

json = BasicStats(…)._asdict()

But for the dict to be passed to JSON, I need to customize the dict representation, e.g. set host_ip to str(self.host_ip), since it might be a non-serializable ipaddress.IPv{4,6}Address. Doing this in an object hook of json.dumps() is a non-starter, since I cannot force users to remember which types need to be converted in the various data structures.
Also, using _asdict() seems strange as an exposed API, since it's an underscore method and users hence might not be inclined to use it.

So what I did is to add a method to_json() to convert the named tuple into a JSON-ish dict:

    def to_json(self) -> dict:
        """Returns a JSON-ish dict."""
        return {
            'type': self.type.value,
            'session_id': self.session_id,
            'motd': self.motd,
            'game_type': self.game_type,
            'map': self.map,
            'num_players': self.num_players,
            'max_players': self.max_players,
            'host_port': self.host_port,
            'host_ip': str(self.host_ip)
        }

It would be nicer to have my type just return this appropriate dict when invoking dict(BasicStats(…)). This would require me to override the __iter__() method to yield key / value tuples for the dict.
However, this would break the natural behaviour of tuple unpacking as described above.

Hence, I propose to add a method __iter_items__(self) to the Python data model with the following properties:

1) __iter_items__ is expected to return an iterator of 2-tuples representing key / value pairs.
2) the built-in function dict(), when called on an object, will attempt to create the dict from __iter_items__ first and fall back to __iter__.

Alternative names could also be __items__ or __iter_dict__.
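
A minimal sketch of the proposed lookup order, emulated in a helper function because dict() itself has no such hook. The names to_dict and Stats are hypothetical and exist only for this illustration; __iter_items__ is the method proposed above, not an existing protocol.

```python
from typing import Any, Iterator, NamedTuple, Tuple


def to_dict(obj: Any) -> dict:
    """Emulate the proposal: prefer __iter_items__, else plain dict()."""
    # Look the method up on the type, as is done for other dunder methods.
    iter_items = getattr(type(obj), '__iter_items__', None)
    if iter_items is not None:
        return dict(iter_items(obj))
    # Fall back to dict()'s existing behaviour.
    return dict(obj)


class Stats(NamedTuple):
    """Hypothetical, shortened stand-in for BasicStats."""

    motd: str
    num_players: int

    def __iter_items__(self) -> Iterator[Tuple[str, Any]]:
        yield 'motd', self.motd
        yield 'num_players', self.num_players


stats = Stats('hello', 3)
motd, num_players = stats   # tuple unpacking is unaffected
print(to_dict(stats))       # {'motd': 'hello', 'num_players': 3}
```

Because __iter__ is left untouched, unpacking keeps yielding the field values while the helper sees key / value pairs.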
msg383909 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-12-28 18:01
The core of this idea is plausible.  It is a common problem for people to want to teach a class how to convert itself to and from JSON.

Altering the API for dicts is a major step, so you would need to take this to python-ideas to start getting buy-in.   A much smaller API change would be to just teach the JSON module to recognize a __json__ method.
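
A rough sketch of what a __json__ protocol could look like today, assuming it is wired in through the existing default= parameter of json.dumps(). The json module does not recognize __json__ on its own; json_default and Host are hypothetical names used only here.

```python
import json
from ipaddress import IPv4Address


def json_default(obj):
    """Fallback for json.dumps(): delegate to a __json__ method if present."""
    method = getattr(type(obj), '__json__', None)
    if method is None:
        raise TypeError(
            f'Object of type {type(obj).__name__} is not JSON serializable'
        )
    return method(obj)


class Host:
    """Hypothetical plain class (deliberately not a tuple subclass)."""

    def __init__(self, ip, port):
        self.ip = ip
        self.port = port

    def __json__(self):
        # Convert the non-serializable address to a string here.
        return {'ip': str(self.ip), 'port': self.port}


text = json.dumps({'host': Host(IPv4Address('127.0.0.1'), 8080)},
                  default=json_default)
print(text)  # {"host": {"ip": "127.0.0.1", "port": 8080}}
```

Note that default= is only consulted for objects the encoder cannot handle natively. A named tuple is a tuple and is encoded as a JSON array before the hook is ever called, so this workaround does not help for the NamedTuple case without first converting the object.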

Presumably if a robust serialization solution is created, people will need a way to deserialize back into a named tuple, data class, or custom class.  Offhand, the only way I can think of to do this would be to add a field that could be recognized by json.load().  Some care would be needed to not create a pickle-like risk of arbitrary code execution.
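
One way such a marker field might work, sketched with the existing object_hook parameter of json.loads(). The '__type__' key, the REGISTRY whitelist, and the Point class are assumptions made for this illustration. Restricting reconstruction to explicitly registered classes is what keeps this from becoming a pickle-like way to instantiate arbitrary types.

```python
import json
from typing import NamedTuple


class Point(NamedTuple):
    """Hypothetical class used for the round trip."""

    x: int
    y: int


TYPE_KEY = '__type__'          # hypothetical marker field
REGISTRY = {'Point': Point}    # only these classes may be reconstructed


def encode(obj) -> str:
    """Serialize a registered named tuple together with its type marker."""
    return json.dumps({TYPE_KEY: type(obj).__name__, **obj._asdict()})


def object_hook(data: dict):
    """Rebuild registered classes; leave unmarked dicts untouched."""
    name = data.get(TYPE_KEY)
    if name is None:
        return data
    cls = REGISTRY[name]       # raises KeyError for unknown type names
    fields = {key: value for key, value in data.items() if key != TYPE_KEY}
    return cls(**fields)


restored = json.loads(encode(Point(1, 2)), object_hook=object_hook)
print(restored)  # Point(x=1, y=2)
```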
msg383933 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-12-28 21:49
Hi Richard,


> Also, using _asdict() seems strange as an exposed API, since it's an underscore method and users hence might not be inclined to use it.

I don't consider this a strong argument. Named tuple in general has to use a naming convention for public methods that cannot clash with field names, hence the single underscore, but your concrete named tuple class can offer any methods you like since you know which field names are used and which are not. Just add a public method "asdict" or any name you prefer, and delegate to the single underscore method.
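
A minimal sketch of that delegation, assuming a shortened, hypothetical Server class in place of BasicStats. The public method can also apply the host_ip conversion mentioned in the original report.

```python
from ipaddress import IPv4Address
from typing import NamedTuple


class Server(NamedTuple):
    """Hypothetical, shortened stand-in for BasicStats."""

    host_ip: IPv4Address
    host_port: int

    def asdict(self) -> dict:
        """Public wrapper around the underscore method."""
        data = self._asdict()
        # Make the address JSON-friendly.
        data['host_ip'] = str(self.host_ip)
        return data


server = Server(IPv4Address('127.0.0.1'), 25565)
print(server.asdict())  # {'host_ip': '127.0.0.1', 'host_port': 25565}
```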



> It would be nicer to have my type just return this appropriate dict when invoking dict(BasicStats(…)).

As we speak there is a discussion on Python-Ideas about this.

https://mail.python.org/archives/list/python-ideas@python.org/thread/2HMRGJ672NDZJZ5PVLMNVW6KP7OHMQDI/#UYDIPMY2HXGL4OLEEFXBTZ2T4CK6TSVU

Your input would be appreciated.


> This would require me to override the __iter__() method to yield key / value tuples for the dict.

The dict constructor does not require that. See discussion on the thread above.

If you search the Python-Ideas archives, I am sure you will find past proposals for a `__json__` protocol. If I recall correctly, there was some concern about opening the flood-gates for dunder protocols (will this be followed with demands for __yaml__, __xml__, __cson__, __toml__, etc?) but perhaps the time is right to revisit this idea.
msg384486 - (view) Author: Richard Neumann (conqp) * Date: 2021-01-06 10:53
Thank you all for your input.
I had a look at the aforementioned discussion and learned something new.
So I tried to implement the dict data model by implementing keys() and __getitem__() accordingly:

from typing import NamedTuple


class Spamm(NamedTuple):

    foo: int
    bar: str

    def __getitem__(self, item):
        if isinstance(item, str):
            try:
                return getattr(self, item)
            except AttributeError:
                raise KeyError(item) from None

        return super().__getitem__(item)

    def keys(self):
        yield 'foo'
        yield 'bar'


def main():

    spamm = Spamm(12, 'hello')
    print(spamm.__getitem__)
    print(spamm.__getitem__(1))
    d = dict(spamm)


if __name__ == '__main__':
    main()


Unfortunately this will result in an error:

Traceback (most recent call last):
  File "/home/neumann/test.py", line 4, in <module>
    class Spamm(NamedTuple):
RuntimeError: __class__ not set defining 'Spamm' as <class '__main__.Spamm'>. Was __classcell__ propagated to type.__new__?

Which seems to be caused by the __getitem__ implementation.
I found a corresponding issue here: https://bugs.python.org/issue41629
Can I assume that this is a pending bug and that I therefore cannot implement the desired behaviour until it is fixed?
msg384488 - (view) Author: Richard Neumann (conqp) * Date: 2021-01-06 10:58
Okay, I found the solution. Not using super() works:

from typing import NamedTuple


class Spamm(NamedTuple):

    foo: int
    bar: str

    def __getitem__(self, index_or_key):
        if isinstance(index_or_key, str):
            try:
                return getattr(self, index_or_key)
            except AttributeError:
                raise KeyError(index_or_key) from None

        return tuple.__getitem__(self, index_or_key)

    def keys(self):
        yield 'foo'
        yield 'bar'


def main():

    spamm = Spamm(12, 'hello')
    print(spamm.__getitem__)
    print(spamm.__getitem__(1))
    d = dict(spamm)
    print(d)


if __name__ == '__main__':
    main()

Result:

<bound method Spamm.__getitem__ of Spamm(foo=12, bar='hello')>
hello
{'foo': 12, 'bar': 'hello'}
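
A possible variant of the class above that derives the keys from the named tuple's _fields attribute instead of listing them by hand, so the mapping view stays in sync when fields are added. This is a sketch, not part of the original report; it also restricts string lookups to field names, so that method names such as 'keys' are not accepted as keys.

```python
from typing import NamedTuple


class Spamm(NamedTuple):

    foo: int
    bar: str

    def __getitem__(self, index_or_key):
        if isinstance(index_or_key, str):
            # Only real field names count as keys.
            if index_or_key not in self._fields:
                raise KeyError(index_or_key)
            return getattr(self, index_or_key)

        # Integer and slice access keep the normal tuple behaviour.
        return tuple.__getitem__(self, index_or_key)

    def keys(self):
        return iter(self._fields)


spamm = Spamm(12, 'hello')
foo, bar = spamm      # unpacking still yields the values
print(dict(spamm))    # {'foo': 12, 'bar': 'hello'}
```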
History
Date                 User            Action  Args
2022-04-11 14:59:39  admin           set     github: 86931
2021-01-06 10:58:49  conqp           set     messages: + msg384488
2021-01-06 10:53:10  conqp           set     messages: + msg384486
2020-12-29 00:56:45  eric.smith      set     nosy: + eric.smith
2020-12-28 21:49:50  steven.daprano  set     nosy: + steven.daprano; messages: + msg383933
2020-12-28 18:02:53  rhettinger      set     nosy: + bob.ippolito
2020-12-28 18:01:33  rhettinger      set     messages: + msg383909
2020-12-28 15:37:30  xtreak          set     nosy: + rhettinger
2020-12-28 15:26:40  conqp           create