classification
Title: Add option for namedtuple to name its result type automatically
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Isaac Morland, ethan.furman, inada.naoki, r.david.murray, rhettinger, steven.daprano
Priority: normal Keywords:

Created on 2017-07-31 01:05 by Isaac Morland, last changed 2017-08-03 02:11 by rhettinger. This issue is now closed.

Messages (16)
msg299532 - Author: Isaac Morland (Isaac Morland) Date: 2017-07-31 01:05
I would like to have the possibility of creating a namedtuple type without explicitly giving it a name.  I see two major use cases for this:

1) Automatic creation of namedtuples for things like CSV files with headers (see #1818) or SQL results (see #13299).  In this case at the point of calling namedtuple I have column headings (or otherwise automatically-determined attribute names), but there probably isn't a specific class name that makes sense to use.

2) Subclassing from a namedtuple invocation; I obviously need to name my subclass, but the name passed to the namedtuple invocation is essentially useless.

My idea is to allow giving None for the typename parameter of namedtuple, like this:

class MyCustomBehaviourNamedtuple (namedtuple (None, ['a', 'b'])):
    ...

In this case namedtuple will generate a name based on the field names.

This should be backward compatible because right now passing None raises a TypeError.  So there is no change if a non-None typename is passed, and an exception is replaced by computing a default typename if None is passed.

Patch to follow.
msg299533 - Author: Isaac Morland (Isaac Morland) Date: 2017-07-31 01:10
I'm hoping to make a pull request, but while I figure that out, here is the diff:

diff --git a/Lib/collections/__init__.py b/Lib/collections/__init__.py
index 8408255..62cf708 100644
--- a/Lib/collections/__init__.py
+++ b/Lib/collections/__init__.py
@@ -384,7 +384,6 @@ def namedtuple(typename, field_names, *, verbose=False, rename=False, module=Non
     if isinstance(field_names, str):
         field_names = field_names.replace(',', ' ').split()
     field_names = list(map(str, field_names))
-    typename = str(typename)
     if rename:
         seen = set()
         for index, name in enumerate(field_names):
@@ -394,6 +393,10 @@ def namedtuple(typename, field_names, *, verbose=False, rename=False, module=Non
                 or name in seen):
                 field_names[index] = '_%d' % index
             seen.add(name)
+    if typename is None:
+        typename = '__'.join (field_names)
+    else:
+        typename = str(typename)
     for name in [typename] + field_names:
         if type(name) is not str:
             raise TypeError('Type names and field names must be strings')
msg299536 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2017-07-31 04:14
If you don't care about the name, just pass '_' for it.
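
A minimal sketch of what this suggestion looks like in practice (the names `Pair` and `p` are illustrative only): `'_'` is a valid identifier, so namedtuple() accepts it, and it is also the name repr() will display.

```python
from collections import namedtuple

# '_' is a valid identifier, so namedtuple() accepts it as a throwaway typename.
Pair = namedtuple('_', ['a', 'b'])
p = Pair(1, 2)
print(repr(p))    # _(a=1, b=2)
```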
msg299541 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-07-31 08:41
I concur with Steven.
msg299552 - Author: Isaac Morland (Isaac Morland) Date: 2017-07-31 12:28
I want a meaningful name to appear in debugging output generated by repr() or str(), not just _ all over the place.  I just don't want to specifically come up with the meaningful name myself.

Right now I pass in the same generated name ('__'.join (field_names)) to the constructor, but this means I need to repeat that logic in any other similar application, and I would have to put in special handling if any of my attribute names required renaming.

I would rather be explicit that I'm not providing a specific name.  With your '_' suggestion it looks like a magic value - why '_'?  By specifying None, it's obvious at the call point that I'm explicitly declining to provide a name, and then the code generates a semi-meaningful name automatically.

Also, please note that I moved the place where typename is assigned to after the part where it handles the rename stuff, so the generated names automatically incorporate a suitable default and remain valid identifiers.

I'm having trouble seeing the downside here.  I'm adding one "is None" check and one line of code to the existing procedure.  I can't believe I'm the only person who has wanted to skip making up a type name but still wanted something vaguely meaningful in debug output.
msg299554 - Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2017-07-31 12:53
When subclassing, the current __repr__ uses `self.__class__.__name__`, so you already get a meaningful name.
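
A minimal illustration of this point, using a hypothetical `Point` subclass: the base class is named `'_'`, but repr() reports the subclass name because the generated `__repr__` reads the name from the instance's class.

```python
from collections import namedtuple


class Point(namedtuple('_', ['x', 'y'])):
    __slots__ = ()                # no per-instance __dict__, like the base tuple

    def norm2(self):
        return self.x ** 2 + self.y ** 2


p = Point(3, 4)
print(repr(p))     # Point(x=3, y=4)
print(p.norm2())   # 25
```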

For automatic generation, I recommend using a wrapper that caches and reuses the same namedtuple class, since creating a namedtuple on the fly is a costly job.
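
One way such a caching wrapper might look, as a sketch that assumes `functools.lru_cache` is an acceptable cache (the helper name `cached_namedtuple` is hypothetical):

```python
from collections import namedtuple
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_namedtuple(typename, field_names):
    # The arguments form the cache key, so field_names must be hashable:
    # pass a tuple (or a space-separated string), not a list.
    return namedtuple(typename, field_names)


headings = ('id', 'name')
A = cached_namedtuple('Row', headings)
B = cached_namedtuple('Row', headings)
print(A is B)    # True: the class is built once and then reused
```

Because lru_cache hashes its arguments, passing a list of headings raises TypeError; converting them with tuple() first avoids that.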

I'm afraid an "unnamed" namedtuple may lead people to create namedtuples on the fly, like lambdas.
msg299619 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-08-01 15:02
I think the "vaguely" pretty much says it, and you are at least the first person who has *requested* it :)

This is one of those cost-versus-benefit calculations.  It is a specialized use case, and in other specialized use cases the "automatically generated" name that makes the most sense is likely to be something different than an amalgamation of the field names.

So I vote -0.5.  I don't think even the small complication of the existing code is worth it, but I'm not strongly opposed.
msg299623 - Author: Isaac Morland (Isaac Morland) Date: 2017-08-01 17:39
First, another note I would like to point out: this is much nicer to write
within namedtuple than as a wrapper function because it is trivial to use
the existing rename logic when needed, as seen in the diff I provided. I
suppose I could write a wrapper which calls namedtuple and then changes the
class name after creation but that just feels icky. The only other
alternatives would be to duplicate the rename logic or have the wrapper not
work with rename.

By way of response to R. David Murray: Every use case, of everything, is
specialized. Another way of thinking of what I'm suggesting is that I would
like to make providing a typename optional, and have the library do its
best based on the other information provided in the call to namedtuple.
This pretty well has to mean mashing the fieldnames together in some way
because no other information about the contents of the namedtuple is
provided. So I think this is a very natural feature: what else could it
possibly mean to pass None for the typename?

If for a particular application some other more meaningful auto-generated
name is needed, that could still be provided to namedtuple(). For example,
an ORM that uses the underlying table name.

In response to other suggestions, I don't see how one can prefer "_" all
over the place in debugging output to a string that identifies the
fieldnames involved. Or really, just the option of having a string that
identifies the fieldnames: I'm not forcing anyone to stop passing '_'.

To INADA Naoki: thanks for pointing that out. I agree that in the subclass
case it no longer matters what typename is used for the namedtuple itself.
But isn't that a good reason to allow skipping the parameter, or (since you
can't just skip positional parameters) passing an explicit None?

On 1 August 2017 at 11:02, R. David Murray <report@bugs.python.org> wrote:

> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue31085>
> _______________________________________
>
msg299627 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-08-01 18:32
The specialized use case is wanting to autogenerate a name with no other information provided.  You suggested csv as one example where this would be used, but even in that case I'd rather see something based on the filename than a mashup of field names.  I would also personally rather see '_' than a long string of field names (it would make the debug output prettier because the lines would be shorter).
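
A rough sketch of naming the row type after the CSV file instead of the field names. The helper names (`typename_from_filename`, `read_csv_rows`) and the sanitizing rules are assumptions for illustration, not something specified in this thread.

```python
import csv
import io
import keyword
import re
from collections import namedtuple
from pathlib import PurePath


def typename_from_filename(filename):
    """Hypothetical naming scheme: 'sales-2017.csv' -> 'sales_2017'."""
    name = re.sub(r'\W', '_', PurePath(filename).stem)  # non-word chars -> '_'
    if not name or name[0].isdigit():
        name = '_' + name          # identifiers cannot start with a digit
    if keyword.iskeyword(name):
        name += '_'                # e.g. 'class.csv' -> 'class_'
    return name


def read_csv_rows(filename, stream):
    reader = csv.reader(stream)
    headings = next(reader)
    Row = namedtuple(typename_from_filename(filename), headings, rename=True)
    return [Row._make(values) for values in reader]


# io.StringIO stands in for an opened file so the sketch is self-contained.
rows = read_csv_rows('sales-2017.csv', io.StringIO('name,rank\nSmith,Captain\n'))
print(rows)    # [sales_2017(name='Smith', rank='Captain')]
```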

In contrast, being able to specify a name satisfies a wide variety of use cases, including that of autogenerating names with no other information provided.  Which is why that is included in the API.

I hear you about the rename logic.  But for myself, since I don't like the idea of the name being a mashup of the field names, it isn't convincing :)

I wrote a "parameterized tests" extension for unittest, and it has the option of autogenerating the test name from the parameter names and values.  I've never used that feature, and I am considering ripping it out before I release the package, to simplify the code.  If I do I might replace it with a hook for generating the test name so that the user can choose their own auto-naming scheme.

Perhaps that would be an option here: a hook for generating the name, that would be called where you want your None processing to be?  That would not be simpler than your proposal, but it would be more general (satisfy more use cases) and might be worth the cost.  On the other hand, other developers might not like the API bloat ;)
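
A rough sketch of the hook idea, written as a wrapper outside the standard library (`namedtuple_with_namer` and `join_fields` are hypothetical names, not a proposed API). It calls namedtuple() twice: once with a throwaway name so namedtuple() performs its own validation and `rename=True` handling, and once with the name returned by the hook, which therefore sees the final field names.

```python
from collections import namedtuple


def join_fields(fields):
    # Default naming scheme (an assumption): join the final field names.
    return '__'.join(fields)


def namedtuple_with_namer(field_names, namer=join_fields, **kwargs):
    """Hypothetical wrapper: *namer* receives the final field names (after
    any rename=True processing) and returns the typename to use."""
    # First call: let namedtuple() split, validate and, if requested, rename
    # the fields.  The placeholder typename '_' is discarded.
    draft = namedtuple('_', field_names, **kwargs)
    # Second call: build the real class under the name chosen by the hook.
    return namedtuple(namer(draft._fields), draft._fields, **kwargs)


NT = namedtuple_with_namer(['def', '-'], rename=True)
print(NT.__doc__)                   # _0___1(_0, _1)

Emp = namedtuple_with_namer('name rank', namer=lambda fields: 'Employee')
print(Emp('Smith', 'Captain'))      # Employee(name='Smith', rank='Captain')
```

Building the class twice costs an extra namedtuple() call, but it avoids duplicating the rename logic inside the wrapper.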
msg299643 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-08-02 05:12
[R David Murray]
> So I vote -0.5.

Put me down for a full -1:

* This would be a potentially confusing addition to the API.

* It may also encourage bad practices that we don't want to see in real code. 

* We want to be able to search for the namedtuple definition, want to have a meaningful repr, and want pickling to be easy.

* This doesn't have to be shoe-horned into the namedtuple API.  If an actual need did arise, it is trivial to write a wrapper that specifies whatever auto-naming logic happens to make sense for a particular application:

    >>> from collections import namedtuple
    >>> def auto_namedtuple(*attrnames, **kwargs):
    ...     typename = '_'.join(attrnames)
    ...     return namedtuple(typename, attrnames, **kwargs)
    ...

    >>> NT = auto_namedtuple('name', 'rank', 'serial')
    >>> print(NT.__doc__)
    name_rank_serial(name, rank, serial)
msg299672 - Author: Isaac Morland (Isaac Morland) Date: 2017-08-02 20:50
On 1 August 2017 at 14:32, R. David Murray <report@bugs.python.org> wrote:

>
> R. David Murray added the comment:
>
> I wrote a "parameterized tests" extension for unittest, and it has the
> option of autogenerating the test name from the parameter names and
> values.  I've never used that feature, and I am considering ripping it out
> before I release the package, to simplify the code.  If I do I might
> replace it with a hook for generating the test name so that the user can
> choose their own auto-naming scheme.
>
> Perhaps that would be an option here: a hook for generating the name, that
> would be called where you want your None processing to be?  That would not
> be simpler than your proposal, but it would be more general (satisfy more
> use cases) and might be worth the cost.  On the other hand, other
> developers might not like the API bloat ;)
>

It's August, not April. Raymond Hettinger is accusing my proposed API of
being potentially confusing, while you're suggesting providing a hook? All
I want is the option of telling namedtuple() to make up its own typename,
for situations where there should be one but I don't want to provide it.

Having said that, if people really think a hook like this is worth doing,
I'll implement it. But I agree that it seems excessively complicated. Let's
see if auto-generation is useful first, then if somebody wants a different
auto-generation, provide the capability.
msg299673 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-08-02 21:00
Yeah, different developers have different opinions.  We discuss (I'd say argue, which is accurate, but has acquired negative connotations) until we reach a consensus.  And if we don't reach a consensus we leave it alone ("status quo wins a stalemate").
msg299674 - Author: Isaac Morland (Isaac Morland) Date: 2017-08-02 21:05
OK, so it's pretty clear this is heading towards a rejection, but I can't
help but respond to your points:

On 2 August 2017 at 01:12, Raymond Hettinger <report@bugs.python.org> wrote:

> * This would be a potentially confusing addition to the API.
>

I'm giving a natural meaning to providing a None where it is not permitted
now. The meaning is to provide a reasonable value for the missing
parameter. How could that be confusing? Also it's completely ignorable -
people don't have to pass None and get the auto-generated typename if they
don't want to.

> * It may also encourage bad practices that we don't want to see in real
> code.
>

What bad practices? There are lots of times when providing an explicit name
is a waste of effort. This provides a simple way of telling the library to
figure it out. Aren't there supposedly just two hard things in computer
science? Naming things, and cache invalidation. An opportunity to avoid
naming things that don't need to be specifically named is something worth
taking.

> * We want to be able to search for the namedtuple definition, want to have
> a meaningful repr, and want pickling to be easy.
>

You mean by searching for the typename in the source code? In my primary
use case, the typename is computed regardless, so it doesn't appear in the
source code and can't be searched for. The other suggestion which appeared
at one point was passing "_" as the typename. This is going to be somewhat
challenging to search for also.

As to the meaningful repr, that is why I want auto-generation of the
typename. This is not for uses like this:

MyType = namedtuple ('MyType', ['a', 'b', 'c'])

It is for ones more like this:

rowtype = namedtuple (None, row_headings)

Or as it currently has to be:

rowtype = namedtuple ('rowtype', row_headings)

(leading to all the rowtypes being the same name, so less meaningful)

Or:

rowtype = namedtuple ('__'.join (row_headings), row_headings)

(which repeats the irrelevant-in-its-details computation wherever it is
needed and doesn't support rename=True, unless a more complicated
computation that duplicates code inside of namedtuple() is repeated)

Finally I'm not clear on how pickling is made more difficult by having
namedtuple() generate a typename. The created type still has a typename.
But I'm interested - this is the only point I don't think I understand.
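
A small sketch of the pickling concern, assuming a typename generated by joining field names as in the patch (`headings`, `RowType` and `row` are illustrative names). pickle stores classes by reference, as a module name plus a qualified class name, so a class whose generated name is not bound under that name in its module cannot be pickled.

```python
import pickle
from collections import namedtuple

headings = ['name', 'rank']

# Typename generated from the field names, as in the proposed patch, while
# the class itself is bound to a different module-level name.
RowType = namedtuple('__'.join(headings), headings)
row = RowType('Smith', 'Captain')
print(RowType.__qualname__)     # name__rank

try:
    pickle.dumps(row)
except pickle.PicklingError as exc:
    # The class is recorded by reference, so pickle looks up 'name__rank'
    # in RowType.__module__, finds nothing bound there, and gives up.
    print('PicklingError:', exc)
```

Binding the class under the name it reports (for example `name__rank = RowType` at module level) satisfies the lookup, which is why a typename that matches the variable name keeps pickling easy.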

> * This doesn't have to be shoe-horned into the namedtuple API.  If an
> actual need did arise, it is trivial to write a wrapper that specifies
> whatever auto-naming logic happens to make sense for a particular
> application:
>
>     >>> from collections import namedtuple
>     >>> def auto_namedtuple(*attrnames, **kwargs):
>     ...     typename = '_'.join(attrnames)
>     ...     return namedtuple(typename, attrnames, **kwargs)
>     ...
>
>     >>> NT = auto_namedtuple('name', 'rank', 'serial')
>     >>> print(NT.__doc__)
>     name_rank_serial(name, rank, serial)

Your code will not work if rename=True is needed. I don't want to repeat
the rename logic as doing so is a code smell.

In short, I'm disappointed. I'm not surprised to make a suggestion, and
have people point out problems. For example, my original proposal ignored
the difficulties of creating the C implementation, and the issue of
circular imports, and I very much appreciated those criticisms. But I am
disappointed at the quality of the objections to these modified proposals.
msg299675 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-08-02 21:51
> Your code will not work if rename=True is needed.

It works just fine:
        
>>> NT = auto_namedtuple('name', 'name', 'def', rename=True)
>>> print(NT.__doc__)
name_name_def(name, _1, _2)
msg299679 - Author: Isaac Morland (Isaac Morland) Date: 2017-08-03 00:38
Not if one of the attributes is something that cannot be part of a typename:

>>> fields = ['def', '-']
>>> namedtuple ('test', fields, rename=True).__doc__
'test(_0, _1)'
>>> namedtuple ('__'.join (fields), fields, rename=True).__doc__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 339, in namedtuple
    'alphanumeric characters and underscores: %r' % name)
ValueError: Type names and field names can only contain alphanumeric characters and underscores: 'def_-'
>>>

Which I admit is a weird thing to be doing, but duplicating attribute names
or trying to use a keyword as an attribute name (or anything else that
requires rename=True) is also weird.

Also it's far from clear that the pre-renaming field names are what is
wanted in the auto-generated typename. If I was actually using attribute
names that required renaming I would want the auto-generated typename to
match the renamed attributes. The original fieldnames play no part in the
operation of the namedtuple class or its instances once it has been
created: only the renamed fieldnames even remain reachable from the
namedtuple object.

Anyway I think I'm probably out at this point. I think Python development
is not a good cultural fit for me, based on this discussion. Which is
weird, since I love working in Python. I even like the whitespace
indentation, although admittedly not quite as much as I thought I would
before I tried it. I hugely enjoy the expressiveness of the language
features, combined with the small but useful set of immediately-available
library functions, together with the multitude of importable standard
modules backing it all up. But I should have known when functools.compose
(which ought to be almost the first thing in any sort of "functional
programming" library) was rejected that I should stay away from attempting
to get involved in the enhancement side of things.
msg299680 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-08-03 02:11
> Also it's far from clear that the pre-renaming field names are 
> what is wanted in the auto-generated typename.

I concur.


> Anyway I think I'm probably out at this point.

Okay, marking this as closed.  Thank you for the suggestion.  Sorry this didn't pan out.


> I think Python development is not a good cultural fit
> for me, based on this discussion. 

This particular proposal didn't seem compelling to us.  Other suggestions are welcome.  If you're the same Isaac Morland who participated in the initial development of namedtuple() ten years ago, then you should know that the design of the _replace() method was principally due to your suggestion.
History
Date                 User            Action  Args
2017-08-03 02:11:10  rhettinger      set     status: open -> closed
                                             resolution: rejected
                                             messages: + msg299680
                                             stage: resolved
2017-08-03 00:38:23  Isaac Morland   set     messages: + msg299679
2017-08-02 21:51:26  rhettinger      set     messages: + msg299675
2017-08-02 21:05:10  Isaac Morland   set     messages: + msg299674
2017-08-02 21:00:06  r.david.murray  set     messages: + msg299673
2017-08-02 20:50:38  Isaac Morland   set     messages: + msg299672
2017-08-02 05:12:44  rhettinger      set     messages: + msg299643
2017-08-01 18:32:16  r.david.murray  set     messages: + msg299627
2017-08-01 17:39:14  Isaac Morland   set     messages: + msg299623
2017-08-01 15:02:43  r.david.murray  set     nosy: + r.david.murray
                                             messages: + msg299619
2017-07-31 12:53:04  inada.naoki     set     nosy: + inada.naoki
                                             messages: + msg299554
2017-07-31 12:28:23  Isaac Morland   set     messages: + msg299552
2017-07-31 08:41:49  rhettinger      set     assignee: rhettinger
                                             messages: + msg299541
2017-07-31 05:26:22  ethan.furman    set     nosy: + ethan.furman
2017-07-31 04:14:24  steven.daprano  set     nosy: + steven.daprano
                                             messages: + msg299536
2017-07-31 01:11:49  ned.deily       set     nosy: + rhettinger
2017-07-31 01:10:38  Isaac Morland   set     messages: + msg299533
2017-07-31 01:05:54  Isaac Morland   create