classification
Title: asdict/astuple Dataclass methods
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: eric.smith Nosy List: eric.smith, gsakkis, matrixise, rhettinger, xtreak
Priority: normal Keywords:

Created on 2019-04-18 20:31 by gsakkis, last changed 2019-04-19 10:53 by gsakkis.

Messages (4)
msg340511 - Author: George Sakkis (gsakkis) Date: 2019-04-18 20:31
I'd like to propose two new optional boolean parameters to the @dataclass() decorator, `asdict` and `astuple`. If true, the respective methods are generated, equivalent to the module-level functions of the same name.

In addition to saving an extra imported name, the main benefit is performance. Because the decorator has access to the specific fields of the decorated class, it should be possible to generate a more efficient implementation than the generic module-level function. To illustrate the difference in performance, a hand-written asdict method is about 28 times faster than the function in the following PEP 557 example:


	@dataclass
	class InventoryItem:
	    '''Class for keeping track of an item in inventory.'''
	    name: str
	    unit_price: float
	    quantity_on_hand: int = 0

	    def asdict(self):
	        return {
	            'name': self.name,
	            'unit_price': self.unit_price,
	            'quantity_on_hand': self.quantity_on_hand,
	        }


	In [4]: i = InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10)

	In [5]: asdict(i) == i.asdict()
	Out[5]: True

	In [6]: %timeit asdict(i)
	5.45 µs ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

	In [7]: %timeit i.asdict()
	193 ns ± 0.443 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Thoughts?
msg340523 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-04-19 05:01
The asdict method in the benchmark constructs the dictionary directly, whereas dataclasses.asdict does more work (see https://github.com/python/cpython/blob/e8113f51a8bdf33188ee30a1c038a298329e7bfa/Lib/dataclasses.py#L1023 ). Hence i.asdict() and asdict(i) in the example are not equivalent.

import timeit
from dataclasses import dataclass, asdict

@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def asdict(self):
        data = {'name': self.name,
                'unit_price': self.unit_price,
                'quantity_on_hand': self.quantity_on_hand,
        }
        return data

i = InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10)
setup = """from dataclasses import dataclass, asdict;
@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def asdict(self):
        data = {'name': self.name,
                'unit_price': self.unit_price,
                'quantity_on_hand': self.quantity_on_hand,
        }
        return data

i = InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10)"""

print("asdict(i)")
print(timeit.Timer("asdict(i)", setup=f"{setup}").timeit(number=1_000_000))
print("i.asdict()")
print(timeit.Timer("i.asdict()", setup=f"{setup}").timeit(number=1_000_000))
print("i.inlined_asdict()")
print(timeit.Timer("i.inlined_asdict(i)", setup=f"{setup}; i.inlined_asdict = asdict").timeit(number=1_000_000))

i.inlined_asdict = asdict
assert asdict(i) == i.asdict() == i.inlined_asdict(i)


./python.exe ../backups/bpo36662.py
asdict(i)
11.585838756000001
i.asdict()
0.44129350699999925
i.inlined_asdict()
11.858042807999999
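
One visible consequence of the extra work done by dataclasses.asdict is that list values are rebuilt rather than shared. The following is a minimal sketch of that difference; the Basket class and its flat_asdict method are hypothetical names chosen for illustration and do not appear in the report.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Basket:
    owner: str
    items: list = field(default_factory=list)

    def flat_asdict(self):
        # Direct dictionary construction, as in the benchmark's method:
        # the list object itself is placed in the result.
        return {'owner': self.owner, 'items': self.items}


b = Basket('ann', ['apple'])
copied = asdict(b)          # dataclasses.asdict builds a new list
shared = b.flat_asdict()    # the hand-written method shares b.items

# Mutate the original after both dicts were created.
b.items.append('pear')

print(copied['items'])
print(shared['items'])
```

Only the dictionary built by the hand-written method sees the later mutation, so the two calls can be compared for speed only when that difference does not matter to the caller.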
msg340532 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-04-19 08:55
I think the best thing to do is write another decorator that adds this method. I've often thought that having a dataclasses_tools third-party module would be a good idea. It could include my add_slots decorator in https://github.com/ericvsmith/dataclasses/blob/master/dataclass_tools.py

Such a decorator could then deal with all the complications that I don't want to add to @dataclass. For example, choosing a method name. @dataclass doesn't inject any non-dunder names in the class, but the new decorator could, or it could provide a way to customize the member name.

Also, note that your example asdict method doesn't do the same thing as dataclasses.asdict. While you get some speedup by knowing the field names in advance, you also don't do the recursive generation that dataclasses.asdict does. In order to skip the recursive dict generation, you'd either have to test the type of each member (using some heuristic about what doesn't need recursion), or assume the member type matches the type defined in the class. I don't want dataclasses.asdict to make the assumption that the member type matches the declared type. There's nowhere else it does this.
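
To make the recursion point concrete, here is a small sketch with a nested dataclass. Point, Segment and flat_asdict are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Segment:
    start: Point
    end: Point

    def flat_asdict(self):
        # Hand-written, non-recursive method in the style of the example
        # from the original report.
        return {'start': self.start, 'end': self.end}


s = Segment(Point(0, 0), Point(1, 1))

recursive = asdict(s)     # nested Point instances become dicts
flat = s.flat_asdict()    # nested Point instances are left as they are

print(recursive)
print(flat)
```

For a class whose fields are all plain values, such as InventoryItem, the two results happen to be equal, which is why the equality check in the original benchmark passes.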

I'm not sure how much of the speedup you're seeing is the result of hard-coding the member names, and how much comes from avoiding recursion. If all of the improvement comes from eliminating recursion, then it's not worth doing.
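
One way to separate the two effects would be to time a third variant that looks up the fields at call time but does not recurse. The sketch below assumes a hypothetical helper, shallow_asdict, written for this comparison; it is not part of the dataclasses module, and no timings are claimed here since they vary by machine and Python version.

```python
import timeit
from dataclasses import dataclass, asdict, fields


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def hardcoded_asdict(self):
        # Field names fixed when the class is written; no recursion.
        return {
            'name': self.name,
            'unit_price': self.unit_price,
            'quantity_on_hand': self.quantity_on_hand,
        }


def shallow_asdict(obj):
    # Hypothetical helper: discovers fields at call time, like
    # dataclasses.asdict, but neither recurses into nor copies the values.
    return {f.name: getattr(obj, f.name) for f in fields(obj)}


i = InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10)
assert asdict(i) == shallow_asdict(i) == i.hardcoded_asdict()

for label, func in [
    ('dataclasses.asdict', lambda: asdict(i)),
    ('shallow_asdict', lambda: shallow_asdict(i)),
    ('hardcoded_asdict', i.hardcoded_asdict),
]:
    seconds = timeit.timeit(func, number=100_000)
    print(f'{label:20s} {seconds:.3f} s')
```

Roughly speaking, the gap between the first two rows reflects the cost of recursion and copying, and the gap between the last two reflects the cost of looking up field names at call time.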

I'm not saying the existing dataclasses.asdict can't be sped up: surely it can. But I don't want to remove features or add complexity to do so.
msg340537 - Author: George Sakkis (gsakkis) Date: 2019-04-19 10:53
> I think the best thing to do is write another decorator that adds this method. I've often thought that having a dataclasses_tools third-party module would be a good idea.

I'd be happy with a separate decorator in the standard library for adding these methods. I'm not so sure about a third-party module: the added value is probably not high enough to justify an extra dependency (assuming one is aware it exists in the first place).

> or assume the member type matches the type defined in the class. 

This doesn't seem an unreasonable assumption to me. If I'm using a dataclass, I probably care enough about its member types to bother declaring them and I wouldn't mind if a particular method expects that the members actually match the types. This behaviour would be clearly documented. 

Alternatively, if we go with a separate decorator, whether this assumption holds could be a parameter, something like:

    def add_asdict(cls, name='asdict', strict=True): ...
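
A minimal sketch of what such a decorator might look like follows. It is an illustration under stated assumptions, not a proposed implementation: the meaning given to strict (trust the declared type for a small set of plain types and skip conversion for those fields), the _ATOMIC tuple, and the simplified _convert helper are all hypothetical, and _convert does not handle every case that dataclasses.asdict does (for example namedtuples or dataclass keys in dicts).

```python
import copy
from dataclasses import dataclass, fields, is_dataclass, asdict

# Assumption: declared types treated as needing no conversion in strict mode.
_ATOMIC = (str, int, float, bool, bytes, type(None))


def _convert(value):
    # Simplified fallback conversion used when the declared type is not trusted.
    if is_dataclass(value) and not isinstance(value, type):
        return asdict(value)
    if type(value) in (list, tuple):
        return type(value)(_convert(v) for v in value)
    if isinstance(value, dict):
        return {k: _convert(v) for k, v in value.items()}
    return copy.deepcopy(value)


def add_asdict(cls=None, name='asdict', strict=True):
    def wrap(cls):
        items = []
        for f in fields(cls):
            if strict and f.type in _ATOMIC:
                # Trust the annotation: emit a plain attribute access.
                items.append(f'{f.name!r}: self.{f.name}')
            else:
                items.append(f'{f.name!r}: _convert(self.{f.name})')
        body = ', '.join(items)
        source = f'def {name}(self):\n    return {{{body}}}\n'
        namespace = {'_convert': _convert}
        exec(source, namespace)  # generate the method from the field list
        setattr(cls, name, namespace[name])
        return cls
    return wrap if cls is None else wrap(cls)


@add_asdict(name='to_dict')   # must be applied after @dataclass has run
@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0


@add_asdict(strict=False)
@dataclass
class Shelf:
    label: str
    items: list


item = InventoryItem('widget', 3.0, 10)
shelf = Shelf('A', [item])
print(item.to_dict())
print(shelf.asdict())
```

In this sketch, a field annotated as str that actually holds a dataclass instance would be returned unconverted in strict mode, which is the trade-off discussed above.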
History
Date                 User        Action  Args
2019-04-19 10:53:51  gsakkis     set     messages: + msg340537
2019-04-19 08:55:08  eric.smith  set     assignee: eric.smith; messages: + msg340532
2019-04-19 08:37:53  matrixise   set     nosy: + matrixise
2019-04-19 05:01:37  xtreak      set     nosy: + xtreak; messages: + msg340523
2019-04-19 02:35:44  xtreak      set     nosy: + rhettinger, eric.smith
2019-04-18 20:31:10  gsakkis     create