Issue 31085: Add option for namedtuple to name its result type automatically

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/75268

classification

Title:	Add option for namedtuple to name its result type automatically
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.7

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	Isaac Morland, ethan.furman, methane, r.david.murray, rhettinger, steven.daprano
Priority:	normal	Keywords:

Created on 2017-07-31 01:05 by Isaac Morland, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (16)
msg299532 - (view)	Author: Isaac Morland (Isaac Morland)	Date: 2017-07-31 01:05
I would like to have the possibility of creating a namedtuple type without explicitly giving it a name. I see two major use cases for this:

1) Automatic creation of namedtuples for things like CSV files with headers (see #1818) or SQL results (see #13299). In this case at the point of calling namedtuple I have column headings (or otherwise automatically-determined attribute names), but there probably isn't a specific class name that makes sense to use.

2) Subclassing from a namedtuple invocation; I obviously need to name my subclass, but the name passed to the namedtuple invocation is essentially useless.

My idea is to allow giving None for the typename parameter of namedtuple, like this:

    class MyCustomBehaviourNamedtuple (namedtuple (None, ['a', 'b'])):
        ...

In this case namedtuple will generate a name based on the field names.

This should be backward compatible because right now passing None raises a TypeError. So there is no change if a non-None typename is passed, and an exception is replaced by computing a default typename if None is passed.

Patch to follow.
msg299533 - (view)	Author: Isaac Morland (Isaac Morland)	Date: 2017-07-31 01:10
I'm hoping to make a pull request but while I figure that out here is the diff:

diff --git a/Lib/collections/__init__.py b/Lib/collections/__init__.py
index 8408255..62cf708 100644
--- a/Lib/collections/__init__.py
+++ b/Lib/collections/__init__.py
@@ -384,7 +384,6 @@ def namedtuple(typename, field_names, *, verbose=False, rename=False, module=Non
     if isinstance(field_names, str):
         field_names = field_names.replace(',', ' ').split()
     field_names = list(map(str, field_names))
-    typename = str(typename)
     if rename:
         seen = set()
         for index, name in enumerate(field_names):
@@ -394,6 +393,10 @@ def namedtuple(typename, field_names, *, verbose=False, rename=False, module=Non
                 or name in seen):
                 field_names[index] = '_%d' % index
             seen.add(name)
+    if typename is None:
+        typename = '__'.join (field_names)
+    else:
+        typename = str(typename)
     for name in [typename] + field_names:
         if type(name) is not str:
             raise TypeError('Type names and field names must be strings')
msg299536 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2017-07-31 04:14
If you don't care about the name, just pass '_' for it.
msg299541 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-07-31 08:41
I concur with Steven.
msg299552 - (view)	Author: Isaac Morland (Isaac Morland)	Date: 2017-07-31 12:28
I want a meaningful name to appear in debugging output generated by repr() or str(), not just _ all over the place. I just don't want to specifically come up with the meaningful name myself. Right now I pass in the same generated name ('__'.join (field_names)) to the constructor, but this means I need to repeat that logic in any other similar application, and I would have to put in special handling if any of my attribute names required renaming. I would rather be explicit that I'm not providing a specific name. With your '_' suggestion it looks like a magic value - why '_'? By specifying None, it's obvious at the call point that I'm explicitly declining to provide a name, and then the code generates a semi-meaningful name automatically. Also, please note that I moved the place where typename is assigned to after the part where it handles the rename stuff, so the generated names automatically incorporate a suitable default and remain valid identifiers. I'm having trouble seeing the downside here. I'm adding one "is None" check and one line of code to the existing procedure. I can't believe I'm the only person who has wanted to skip making up a type name but still wanted something vaguely meaningful in debug output.
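To make the difference in debugging output concrete, here is a small sketch comparing the two spellings discussed so far. The field names and values are made up for illustration.

```python
from collections import namedtuple

headings = ['name', 'rank']

# Steven D'Aprano's suggestion: a placeholder typename.
Anonymous = namedtuple('_', headings)
# What Isaac Morland currently does by hand: a name built from the fields.
Joined = namedtuple('__'.join(headings), headings)

anonymous_repr = repr(Anonymous('x', 1))
joined_repr = repr(Joined('x', 1))
print(anonymous_repr)  # _(name='x', rank=1)
print(joined_repr)     # name__rank(name='x', rank=1)
```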
msg299554 - (view)	Author: Inada Naoki (methane) *	Date: 2017-07-31 12:53
When subclassing, the current __repr__ uses `self.__class__.__name__`, so you already get a meaningful name. For automatic generation, I recommend using a wrapper that caches each namedtuple type, since creating a namedtuple on the fly is costly. I'm afraid an "unnamed" namedtuple may lead people to create namedtuples on the fly, like lambdas.
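Both points can be sketched briefly. `Point` and `row_type` are hypothetical names, and `functools.lru_cache` is only one possible way to build the caching wrapper mentioned above; it requires the field names to be passed as a hashable tuple.

```python
from collections import namedtuple
from functools import lru_cache


# Subclass case: the typename given to namedtuple ('_') never shows up,
# because __repr__ uses the name of the instance's actual class.
class Point(namedtuple('_', ['a', 'b'])):
    __slots__ = ()


@lru_cache(maxsize=None)
def row_type(field_names):
    """Hypothetical cached factory; field_names must be a hashable tuple."""
    return namedtuple('_', field_names)


print(repr(Point(1, 2)))                             # Point(a=1, b=2)
print(row_type(('a', 'b')) is row_type(('a', 'b')))  # True
```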
msg299619 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2017-08-01 15:02
I think the "vaguely" pretty much says it, and you are at least the first person who has requested it :) This is one of those cost-versus-benefit calculations. It is a specialized use case, and in other specialized use cases the "automatically generated" name that makes the most sense is likely to be something different than an amalgamation of the field names. So I vote -0.5. I don't think even the small complication of the existing code is worth it, but I'm not strongly opposed.
msg299623 - (view)	Author: Isaac Morland (Isaac Morland)	Date: 2017-08-01 17:39
First, another note I would like to point out: this is much nicer to write within namedtuple than as a wrapper function because it is trivial to use the existing rename logic when needed, as seen in the diff I provided. I suppose I could write a wrapper which calls namedtuple and then changes the class name after creation but that just feels icky. The only other alternatives would be to duplicate the rename logic or have the wrapper not work with rename.

By way of response to R. David Murray: Every use case, of everything, is specialized. Another way of thinking of what I'm suggesting is that I would like to make providing a typename optional, and have the library do its best based on the other information provided in the call to namedtuple. This pretty well has to mean mashing the fieldnames together in some way because no other information about the contents of the namedtuple is provided. So I think this is a very natural feature: what else could it possibly mean to pass None for the typename? If for a particular application some other more meaningful auto-generated name is needed, that could still be provided to namedtuple(). For example, an ORM that uses the underlying table name.

In response to other suggestions, I don't see how one can prefer "_" all over the place in debugging output to a string that identifies the fieldnames involved. Or really, just the option of having a string that identifies the fieldnames: I'm not forcing anyone to stop passing '_'.

To INADA Naoki: thanks for pointing that out. I agree that in the subclass case it no longer matters what typename is used for the namedtuple itself. But isn't that a good reason to allow skipping the parameter, or (since you can't just skip positional parameters) passing an explicit None?
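For reference, the "change the class name after creation" wrapper that the message calls icky could look roughly like this. It is a sketch under the assumption that overwriting `__name__` and `__qualname__` is acceptable; `namedtuple_renamed_afterwards` is a hypothetical name.

```python
from collections import namedtuple


def namedtuple_renamed_afterwards(field_names, *, rename=False):
    """Create the type under a placeholder name, then rename the class."""
    cls = namedtuple('_', field_names, rename=rename)
    # cls._fields already reflects any renaming done by namedtuple.
    generated = '__'.join(cls._fields)
    cls.__name__ = cls.__qualname__ = generated
    return cls


T = namedtuple_renamed_afterwards(['def', '-'], rename=True)
print(repr(T(1, 2)))  # _0___1(_0=1, _1=2)
print(T.__doc__)      # _(_0, _1)
```

The repr picks up the new name, but the docstring was fixed at creation time and still shows the placeholder, which is one reason this approach feels like a workaround.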
msg299627 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2017-08-01 18:32
The specialized use case is wanting to autogenerate a name with no other information provided. You suggested csv as one example where this would be used, but even in that case I'd rather see something based on the filename than a mashup of field names. I would also personally rather see '_' than a long string of field names (it would make the debug output prettier because the lines would be shorter). In contrast, being able to specify a name satisfies a wide variety of use cases, including that of autogenerating names with no other information provided. Which is why that is included in the API. I hear you about the rename logic. But for myself, since I don't like the idea of the name being a mashup of the field names, it isn't convincing :) I wrote a "parameterized tests" extension for unittest, and it has the option of autogenerating the test name from the parameter names and values. I've never used that feature, and I am considering ripping it out before I release the package, to simplify the code. If I do I might replace it with a hook for generating the test name so that the user can choose their own auto-naming scheme. Perhaps that would be an option here: a hook for generating the name, that would be called where you want your None processing to be? That would not be simpler than your proposal, but it would be more general (satisfy more use cases) and might be worth the cost. On the other hand, other developers might not like the API bloat ;)
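The hook idea floated above was never specified in any detail, so the following is purely a hypothetical sketch of one shape it could take, written as a wrapper rather than as a change to namedtuple. The name `namedtuple_with_name_hook` and the choice to pass the post-rename field names to the hook are assumptions.

```python
from collections import namedtuple


def namedtuple_with_name_hook(field_names, name_hook, *, rename=False):
    """Hypothetical wrapper: name_hook(fields) returns the typename."""
    # Build a probe type first so the hook sees the final field names.
    probe = namedtuple('_', field_names, rename=rename)
    typename = name_hook(probe._fields)
    return namedtuple(typename, probe._fields)


# Two possible naming schemes a caller might choose.
ByFields = namedtuple_with_name_hook('a b c', '__'.join)
ByCount = namedtuple_with_name_hook('a b c', lambda f: 'Row%d' % len(f))
print(ByFields.__name__)  # a__b__c
print(ByCount.__name__)   # Row3
```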
msg299643 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-08-02 05:12
[R David Murray]
> So I vote -0.5.

Put me down for a full -1:

* This would be a potentially confusing addition to the API.
* It may also encourage bad practices that we don't want to see in real code.
* We want to be able to search for the namedtuple definition, want to have a meaningful repr, and want pickling to be easy.
* This doesn't have to be shoe-horned into the namedtuple API. If an actual need did arise, it is trivial to write a wrapper that specifies whatever auto-naming logic happens to make sense for a particular application:

    >>> from collections import namedtuple
    >>> def auto_namedtuple(*attrnames, **kwargs):
    ...     typename = '_'.join(attrnames)
    ...     return namedtuple(typename, attrnames, **kwargs)
    ...
    >>> NT = auto_namedtuple('name', 'rank', 'serial')
    >>> print(NT.__doc__)
    name_rank_serial(name, rank, serial)
msg299672 - (view)	Author: Isaac Morland (Isaac Morland)	Date: 2017-08-02 20:50
On 1 August 2017 at 14:32, R. David Murray <report@bugs.python.org> wrote:

> I wrote a "parameterized tests" extension for unittest, and it has the option of autogenerating the test name from the parameter names and values. I've never used that feature, and I am considering ripping it out before I release the package, to simplify the code. If I do I might replace it with a hook for generating the test name so that the user can choose their own auto-naming scheme.
>
> Perhaps that would be an option here: a hook for generating the name, that would be called where you want your None processing to be? That would not be simpler than your proposal, but it would be more general (satisfy more use cases) and might be worth the cost. On the other hand, other developers might not like the API bloat ;)

It's August, not April. Raymond Hettinger is accusing my proposed API of being potentially confusing, while you're suggesting providing a hook? All I want is the option of telling namedtuple() to make up its own typename, for situations where there should be one but I don't want to provide it.

Having said that, if people really think a hook like this is worth doing, I'll implement it. But I agree that it seems excessively complicated. Let's see if auto-generation is useful first, then if somebody wants a different auto-generation, provide the capability.
msg299673 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2017-08-02 21:00
Yeah, different developers have different opinions. We discuss (I'd say argue, which is accurate, but has acquired negative connotations) until we reach a consensus. And if we don't reach a consensus we leave it alone ("status quo wins a stalemate").
msg299674 - (view)	Author: Isaac Morland (Isaac Morland)	Date: 2017-08-02 21:05
OK, so it's pretty clear this is heading towards a rejection, but I can't help but respond to your points:

On 2 August 2017 at 01:12, Raymond Hettinger <report@bugs.python.org> wrote:

> * This would be a potentially confusing addition to the API.

I'm giving a natural meaning to providing a None where it is not permitted now. The meaning is to provide a reasonable value for the missing parameter. How could that be confusing? Also it's completely ignorable - people don't have to pass None and get the auto-generated typename if they don't want to.

> * It may also encourage bad practices that we don't want to see in real code.

What bad practices? There are lots of times when providing an explicit name is a waste of effort. This provides a simple way of telling the library to figure it out. Aren't there supposedly just two hard things in computer science? Naming things, and cache invalidation. An opportunity to avoid naming things that don't need to be specifically named is something worth taking.

> * We want to be able to search for the namedtuple definition, want to have a meaningful repr, and want pickling to be easy.

You mean by searching for the typename in the source code? In my primary usecase, the typename is computed regardless, so it doesn't appear in the source code and can't be searched for. The other suggestion which appeared at one point was passing "_" as the typename. This is going to be somewhat challenging to search for also.

As to the meaningful repr, that is why I want auto-generation of the typename. This is not for uses like this:

    MyType = namedtuple ('MyType', ['a', 'b', 'c'])

It is for ones more like this:

    rowtype = namedtuple (None, row_headings)

Or as it currently has to be:

    rowtype = namedtuple ('rowtype', row_headings)

(leading to all the rowtypes being the same name, so less meaningful)

Or:

    rowtype = namedtuple ('__'.join (row_headings), row_headings)

(which repeats the irrelevant-in-its-details computation wherever it is needed and doesn't support rename=True, unless a more complicated computation that duplicates code inside of namedtuple() is repeated)

Finally I'm not clear on how pickling is made more difficult by having namedtuple() generate a typename. The created type still has a typename. But I'm interested - this is the only point I don't think I understand.

> * This doesn't have to be shoe-horned into the namedtuple API. If an actual need did arise, it is trivial to write a wrapper that specifies whatever auto-naming logic happens to make sense for a particular application:
>
>     >>> from collections import namedtuple
>     >>> def auto_namedtuple(*attrnames, **kwargs):
>     ...     typename = '_'.join(attrnames)
>     ...     return namedtuple(typename, attrnames, **kwargs)
>     ...
>     >>> NT = auto_namedtuple('name', 'rank', 'serial')
>     >>> print(NT.__doc__)
>     name_rank_serial(name, rank, serial)

Your code will not work if rename=True is needed. I don't want to repeat the rename logic as doing so is a code smell.

In short, I'm disappointed. I'm not surprised to make a suggestion, and have people point out problems. For example, my original proposal ignored the difficulties of creating the C implementation, and the issue of circular imports, and I very much appreciated those criticisms. But I am disappointed at the quality of the objections to these modified proposals.
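The thread never spells out the pickling concern. One plausible reading, offered here only as an assumption, is that pickle stores a class by module and name and looks it up again on load, so a type whose generated name is not bound in its module cannot be pickled. The sketch below uses a throwaway module called `rows_demo` (a made-up name) so that it does not depend on how the snippet is run.

```python
import pickle
import sys
import types
from collections import namedtuple

# Throwaway module that will "own" the generated type.
demo = types.ModuleType('rows_demo')
sys.modules['rows_demo'] = demo

headings = ['a', 'b']
RowType = namedtuple('__'.join(headings), headings, module='rows_demo')
row = RowType(1, 2)

# The class is not yet reachable as rows_demo.a__b, so pickling fails.
try:
    pickle.dumps(row)
    pickled_before_binding = True
except (pickle.PicklingError, AttributeError):
    pickled_before_binding = False

# Bind the class under its generated name; now pickle can find it.
setattr(demo, RowType.__name__, RowType)
restored = pickle.loads(pickle.dumps(row))
```

The same binding step is needed with a hand-written typename whenever the variable name and the typename differ, so this is not specific to auto-generated names.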
msg299675 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-08-02 21:51
> Your code will not work if rename=True is needed.

It works just fine:

    >>> NT = auto_namedtuple('name', 'name', 'def', rename=True)
    >>> print(NT.__doc__)
    name_name_def(name, _1, _2)
msg299679 - (view)	Author: Isaac Morland (Isaac Morland)	Date: 2017-08-03 00:38
Not if one of the attributes is something that cannot be part of a typename:

    >>> fields = ['def', '-']
    >>> namedtuple ('test', fields, rename=True).__doc__
    'test(_0, _1)'
    >>> namedtuple ('__'.join (fields), fields, rename=True).__doc__
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 339, in namedtuple
        'alphanumeric characters and underscores: %r' % name)
    ValueError: Type names and field names can only contain alphanumeric characters and underscores: 'def_-'
    >>>

Which I admit is a weird thing to be doing, but duplicating attribute names or trying to use a keyword as an attribute name (or anything else that requires rename=True) is also weird.

Also it's far from clear that the pre-renaming field names are what is wanted in the auto-generated typename. If I was actually using attribute names that required renaming I would want the auto-generated typename to match the renamed attributes. The original fieldnames play no part in the operation of the namedtuple class or its instances once it has been created: only the renamed fieldnames even remain reachable from the namedtuple object.

Anyway I think I'm probably out at this point. I think Python development is not a good cultural fit for me, based on this discussion. Which is weird, since I love working in Python. I even like the whitespace indentation, although admittedly not quite as much as I thought I would before I tried it. I hugely enjoy the expressiveness of the language features, combined with the small but useful set of immediately-available library functions, together with the multitude of importable standard modules backing it all up. But I should have known when functools.compose (which ought to be almost the first thing in any sort of "functional programming" library) was rejected that I should stay away from attempting to get involved in the enhancement side of things.
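The session above was run on Python 2.7. The same point can be checked on Python 3 with the sketch below, which relies only on the exception type being ValueError (the message text differs between versions) and on `_fields` holding the renamed names.

```python
from collections import namedtuple

fields = ['def', '-']

# With an explicit, valid typename, rename=True repairs both field names.
renamed = namedtuple('test', fields, rename=True)

# Joining the original field names gives a typename that is not an
# identifier, and rename=True does not apply to the typename.
try:
    namedtuple('__'.join(fields), fields, rename=True)
    joined_name_accepted = True
except ValueError:
    joined_name_accepted = False

# Only the renamed field names are reachable from the created type, and
# joining those always yields a valid identifier here.
post_rename_name = '__'.join(renamed._fields)
print(renamed._fields)        # ('_0', '_1')
print(joined_name_accepted)   # False
print(post_rename_name)       # _0___1
```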
msg299680 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-08-03 02:11
> Also it's far from clear that the pre-renaming field names are what is wanted in the auto-generated typename.

I concur.

> Anyway I think I'm probably out at this point.

Okay, marking this as closed. Thank you for the suggestion. Sorry this didn't pan out.

> I think Python development is not a good cultural fit for me, based on this discussion.

This particular proposal didn't seem compelling to us. Other suggestions are welcome. If you're the same Isaac Morland who participated in the initial development of namedtuple() ten years ago, then you should know that the design of the _replace() method was principally due to your suggestion.

History
Date	User	Action	Args
2022-04-11 14:58:49	admin	set	github: 75268
2017-08-03 02:11:10	rhettinger	set	status: open -> closed resolution: rejected messages: + msg299680 stage: resolved
2017-08-03 00:38:23	Isaac Morland	set	messages: + msg299679
2017-08-02 21:51:26	rhettinger	set	messages: + msg299675
2017-08-02 21:05:10	Isaac Morland	set	messages: + msg299674
2017-08-02 21:00:06	r.david.murray	set	messages: + msg299673
2017-08-02 20:50:38	Isaac Morland	set	messages: + msg299672
2017-08-02 05:12:44	rhettinger	set	messages: + msg299643
2017-08-01 18:32:16	r.david.murray	set	messages: + msg299627
2017-08-01 17:39:14	Isaac Morland	set	messages: + msg299623
2017-08-01 15:02:43	r.david.murray	set	nosy: + r.david.murray messages: + msg299619
2017-07-31 12:53:04	methane	set	nosy: + methane messages: + msg299554
2017-07-31 12:28:23	Isaac Morland	set	messages: + msg299552
2017-07-31 08:41:49	rhettinger	set	assignee: rhettinger messages: + msg299541
2017-07-31 05:26:22	ethan.furman	set	nosy: + ethan.furman
2017-07-31 04:14:24	steven.daprano	set	nosy: + steven.daprano messages: + msg299536
2017-07-31 01:11:49	ned.deily	set	nosy: + rhettinger
2017-07-31 01:10:38	Isaac Morland	set	messages: + msg299533
2017-07-31 01:05:54	Isaac Morland	create