Title: str.translate needs a mapping example
Created on 2016-11-04 17:02 by Jim.Jewett, last changed 2016-12-30 23:40 by Chris.Barker.

Author: Jim Jewett (Jim.Jewett) Date: 2016-11-04 17:02
One commonly needed string transformation is stripping out certain characters (or keeping only certain characters).  This is common enough that it might be worth a dedicated method, except that, as Stephen J. Turnbull wrote in

So really translate with defaultdict is a specialized loop that
marries an algorithmic body (which could do things like look up the
original script or other character properties to decide on the
replacement for the generic case) with a (usually "small") table of
exceptions.  That seems like inspired design to me.

Alas, while inspired, it isn't obvious to someone who isn't yet used to the power of Python's custom classes.

The documentation (such as ) should include such an example.

One possible example would be a defaultdict that says to discard any characters except lower case ASCII letters.
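
A minimal sketch of what such an example might look like (this is only an illustration, not the wording or code of any patch attached to this issue; the sample string and the variable names are made up):

```python
from collections import defaultdict
import string

# Any code point not in the table maps to None, which str.translate()
# treats as "delete this character".
table = defaultdict(lambda: None)

# Lower case ASCII letters map to themselves, so they are kept.
table.update({ord(c): c for c in string.ascii_lowercase})

result = "Hello, World! 123".translate(table)
print(result)  # elloorld
```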
Author: Jim Jewett (Jim.Jewett) Date: 2016-11-04 18:20
Chris Barker points out that a custom object (which doesn't ever store the missing "keys") may be better still... though I'm not sure it is better enough to complicate the docs.
Author: Chris Barker (ChrisBarker) Date: 2016-11-04 19:12

the custom dict type would be nice for a recipe or blog post or...

but not for the docs.

I'll note that the other trick to this recipe is that you need to know to use lambda to make a "None factory" for defaultdict -- though maybe that's a ToDo for the defaultdict docs...
Author: Gaurav Tatke (Gaurav Tatke) Date: 2016-12-29 10:38

I am new to Python and want to contribute. I am attaching a patch with the requested example of using defaultdict with translate. Please let me know if anything needs to be changed. I have tested the example and also built the HTML docs locally.

Author: Raymond Hettinger (rhettinger) Date: 2016-12-30 05:23
I like the idea of adding a mapping example but don't want to encourage use of defaultdict in contexts like this one.  A defaultdict usefully specifies a default but has the unpleasant side-effect of altering the dictionary (adding new keys) during the look-up phase.  This has bitten a lot of people (including famous ones like Peter Norvig).
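
The side effect can be seen in a short sketch (the table contents and the sample string are illustrative assumptions, not taken from the attached patch):

```python
from collections import defaultdict
import string

# Keep lower case ASCII letters; everything else defaults to None (deleted).
table = defaultdict(lambda: None,
                    {ord(c): c for c in string.ascii_lowercase})

size_before = len(table)   # 26 entries, one per kept letter

"Hello, World! 123".translate(table)

# Every distinct character that was missing has now been stored as a
# new key with the value None, even though we only did look-ups.
size_after = len(table)    # 34: 'H', ',', ' ', 'W', '!', '1', '2', '3' were added

print(size_before, size_after)
```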
Author: Serhiy Storchaka (serhiy.storchaka) Date: 2016-12-30 05:57
If the side effect of defaultdict is unpleasant, the correct way is to combine the translation mapping with the custom mapping using ChainMap. But this example is too complex for the documentation of str.translate(). On the other hand, it is trivial for more experienced users and doesn't need special mention.

I think other resources (ActiveState Code Recipes [1] or books) are better places for this example.
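
One possible reading of the ChainMap suggestion, sketched under assumptions: the AlwaysNone helper class and the variable names are hypothetical and do not appear in the thread. The idea is that ChainMap tries each mapping in turn, so a plain dict of characters to keep can be followed by a fallback that answers None for everything else, and nothing is ever written back into the dict.

```python
from collections import ChainMap
import string

class AlwaysNone:
    """Hypothetical fallback: answers None for every key, stores nothing."""
    def __getitem__(self, key):
        return None

# Plain dict of the characters to keep (lower case ASCII letters).
keep = {ord(c): c for c in string.ascii_lowercase}

# Look-ups try `keep` first and fall through to AlwaysNone on a miss.
table = ChainMap(keep, AlwaysNone())

result = "Hello, World! 123".translate(table)
print(result)     # elloorld
print(len(keep))  # 26, the dict was not modified by the look-ups
```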

Author: Gaurav Tatke (Gaurav Tatke) Date: 2016-12-30 20:15

Pardon my ignorance, I am new to this, but I have the following questions/thoughts:

1. Why would we say that adding new keys during the lookup phase is an unpleasant side-effect? From what I understood from the docs, one of the main reasons to use a defaultdict is to be able to insert a missing key and give it a default value. The defaultdict documentation itself suggests that doing this is cleaner and faster than using dict.setdefault().

2. I believe defaultdict fits perfectly in this context of creating a translation table for str.translate(). Even if we have a very large string containing all characters from 4-5 languages, our defaultdict will still be comparatively small. It is easier to create a translation table using defaultdict when we have to strip most characters out of a string, as in the example requested in the issue. Creating a translation table using str.maketrans() or a user-defined function is tricky in this use case.

3. If the argument for using defaultdict in this context is not convincing, shall I just submit a patch giving an example of str.translate() using str.maketrans()?
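
For comparison, a rough sketch of the str.maketrans() route mentioned in points 2 and 3 (the sample strings and variable names are illustrative assumptions). Deleting a known set of characters is straightforward; keeping only a small set is the tricky case, because every character to delete has to be enumerated up front.

```python
import string

# Easy case: delete a known, listed set of characters.
delete_punct = str.maketrans("", "", string.punctuation)
no_punct = "Hello, World!".translate(delete_punct)
print(no_punct)  # Hello World

# Harder case: keep only lower case ASCII letters. With maketrans() we
# must list everything to delete, here limited to the ASCII range.
keep = set(string.ascii_lowercase)
to_delete = "".join(chr(i) for i in range(128) if chr(i) not in keep)
ascii_table = str.maketrans("", "", to_delete)

kept = "Hello, World!".translate(ascii_table)
print(kept)  # elloorld

# Characters outside the enumerated range are not in the table, so
# translate() leaves them untouched.
leaked = "naïve".translate(ascii_table)
print(leaked)  # naïve
```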

Author: Gaurav Tatke (Gaurav Tatke) Date: 2016-12-30 20:17
Should we suggest str.translate() to a user whose use case is keeping only certain characters and stripping off everything else?
Author: Christopher Barker (Chris.Barker) Date: 2016-12-30 23:40
This all came out of a thread on python-ideas, starting here:

the thread kind of petered out, but it seems there was a kinda-sorta
consensus that we didn't need any new string methods, but rather some notes
in the docs on how to use .translate() to remove "all but these" was in order.

And the defaultdict method was proposed as the easiest / most pythonic.

As it happens, I didn't like the fact that defaultdict will build up a
big(ish) dict of Nones for no reason, and thus suggested a NoneDict option:

class NoneDict(dict):
    """
    Dictionary implementation that always returns None when a key is not in
    the dict, rather than raising a KeyError
    """
    def __getitem__(self, key):
        try:
            # Normal look-up; missing keys are not stored.
            val = dict.__getitem__(self, key)
        except KeyError:
            val = None
        return val

Though maybe that's a bit much for the docs.
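
A short usage sketch of the NoneDict idea with str.translate(), assuming the same "keep only lower case ASCII letters" task discussed in this issue (the class is repeated so the snippet runs on its own; the sample string is made up):

```python
import string

class NoneDict(dict):
    """dict that returns None for missing keys without storing them."""
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return None

# Keep lower case ASCII letters; every other character looks up as None
# and is therefore deleted by translate().
table = NoneDict({ord(c): c for c in string.ascii_lowercase})

result = "Hello, World! 123".translate(table)
print(result)      # elloorld
print(len(table))  # 26, no None entries were accumulated
```

Unlike the defaultdict version, the table stays the same size no matter how many distinct characters are looked up.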

However, in short:

either the defaultdict approach is simple and pythonic enough to be in the
docs, or we SHOULD add something new to the string object.

(or maybe someone has another nifty pythonic way to do this with the stdlib
that's better than defaultdict?)


