
classification
Title: str.translate needs a mapping example
Type: enhancement Stage: needs patch
Components: Documentation Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Chris.Barker, ChrisBarker, Gaurav Tatke, Jim.Jewett, cheryl.sabella, docs@python, rhettinger, serhiy.storchaka
Priority: normal Keywords: easy, patch

Created on 2016-11-04 17:02 by Jim.Jewett, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description Edit
translateexample_issue28612.patch Gaurav Tatke, 2016-12-29 10:38 review
Messages (10)
msg280061 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2016-11-04 17:02
One commonly needed string transformation is stripping out certain characters (or keeping only certain characters).  This is common enough that it might be worth a dedicated method, except that, as Stephen J. Turnbull wrote in https://mail.python.org/pipermail/python-ideas/2016-November/043501.html

"""
So really translate with defaultdict is a specialized loop that
marries an algorithmic body (which could do things like look up the
original script or other character properties to decide on the
replacement for the generic case) with a (usually "small") table of
exceptions.  That seems like inspired design to me.
"""

Alas, while inspired, it isn't obvious to someone who isn't yet used to the power of Python custom classes.

The documentation (such as https://docs.python.org/3/library/stdtypes.html?highlight=translate#str.translate ) should include such an example.

One possible example would be a defaultdict that says to discard any characters except lowercase ASCII letters.
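
A minimal sketch of what such an example might look like (the variable names are illustrative, not taken from any patch). It relies on str.translate() deleting any character whose table entry is None:

```python
from collections import defaultdict
import string

# Each lowercase ASCII letter maps to itself; every other code point
# falls through to the default factory, which returns None (= delete).
keep_lowercase = defaultdict(lambda: None)
keep_lowercase.update((ord(c), c) for c in string.ascii_lowercase)

result = "Hello, World! 123".translate(keep_lowercase)
print(result)  # elloorld
```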
msg280063 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2016-11-04 18:20
https://mail.python.org/pipermail/python-ideas/2016-November/043539.html by Chris Barker points out that a custom object (which doesn't ever store the missing "keys") may be better still... though I'm not sure it is better enough to complicate the docs.
msg280070 - (view) Author: Chris Barker (ChrisBarker) * Date: 2016-11-04 19:12
Agreed:

the custom dict type would be nice for a recipe or blog post or...

but not for the docs.

I'll note that the other trick to this recipe is that you need to know to use lambda to make a "None factory" for defaultdict -- though maybe that's a ToDo for the defaultdict docs...
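
To illustrate the "None factory" point, here is a small sketch (names are illustrative). Passing None itself to defaultdict means "no factory at all", so a zero-argument callable that returns None is needed instead:

```python
from collections import defaultdict

# defaultdict(None) has no default factory: missing keys raise KeyError,
# which str.translate treats as "leave the character unchanged".
no_factory = defaultdict(None)
unchanged = "abc".translate(no_factory)

# A factory is any zero-argument callable; this one returns None,
# which str.translate treats as "delete the character".
none_factory = defaultdict(lambda: None)
deleted = "abc".translate(none_factory)

print(repr(unchanged), repr(deleted))  # 'abc' ''
```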
msg284244 - (view) Author: Gaurav Tatke (Gaurav Tatke) * Date: 2016-12-29 10:38
Hi,

I am new to Python and want to contribute. I am attaching a patch having required example of using defaultdict with translate. Please let me know if anything needs to be changed. I have tested the example and also the html doc in my local.

Regards,
Gaurav
msg284309 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-12-30 05:23
I like the idea of adding a mapping example but don't want to encourage use of defaultdict in contexts like this one.  A defaultdict usefully specifies a default but has the unpleasant side-effect of altering the dictionary (adding new keys) during the look-up phase.  This has bitten a lot of people (including famous ones like Peter Norvig).
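
The side effect can be seen in a short sketch (illustrative names): every character that is looked up and not found gets stored in the table as a new key.

```python
from collections import defaultdict
import string

table = defaultdict(lambda: None, {ord(c): c for c in string.ascii_lowercase})
size_before = len(table)

"Hello, World!".translate(table)

# The lookups inserted the five distinct missing characters:
# 'H', ',', ' ', 'W' and '!'.
size_after = len(table)
print(size_before, size_after)  # 26 31
```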
msg284316 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-30 05:57
If the side effect of defaultdict is unpleasant, the correct way is to combine the translation mapping with the custom mapping using ChainMap. But this example is too complex for the documentation of str.translate(). On the other hand, it is trivial for more experienced users and doesn't need special mention.
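
One possible reading of the ChainMap suggestion, as a hedged sketch: the DeleteEverything class below is a hypothetical fallback mapping, not something from the standard library or from this issue. ChainMap tries each mapping in order, so the plain dict is consulted first and is never modified.

```python
from collections import ChainMap
import string

class DeleteEverything:
    """Hypothetical fallback: answers None for any key and stores nothing."""
    def __getitem__(self, key):
        return None

keep = {ord(c): c for c in string.ascii_lowercase}

# Only item lookup is used here; the fallback does not support len() or iteration.
table = ChainMap(keep, DeleteEverything())

result = "Hello, World!".translate(table)
print(result)  # elloorld
```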

I think other resources (ActiveState Code Recipes [1] or books) are better places for this example.

[1] http://code.activestate.com/recipes/popular/
msg284342 - (view) Author: Gaurav Tatke (Gaurav Tatke) * Date: 2016-12-30 20:15
Hi,

Pardon my ignorance, I am new to this but have the queries/thoughts below -

1. Why would we say that adding new keys during the lookup phase is an unpleasant side-effect? From what I understood from the docs, one of the main reasons to use defaultdict is to be able to insert a missing key and give a default value to it. The 'defaultdict' doc itself suggests that doing this is cleaner and faster than using dict.setdefault().

2. I believe defaultdict fits perfectly in this context of creating a translation table for str.translate(). Even if we have a very large string containing all characters from 4-5 languages, our defaultdict will still be comparatively small. It is easier to create a translation table using defaultdict when we have to strip most characters out of a string, as in the example requested in the issue. Creating a translation table using str.maketrans() or a user-defined function is tricky in this use case.

3. If the argument for using defaultdict in this context is not convincing, shall I just give a patch with an example of str.translate() using str.maketrans()?
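
For comparison, a small sketch of the str.maketrans() route (illustrative names). Its three-argument form handles "remove these characters" directly, but every character to delete has to be listed, which is why the "keep only these" case is awkward:

```python
import string

# Three-argument form: the third string lists the characters to delete.
drop_punctuation = str.maketrans("", "", string.punctuation)

result = "Hello, World!".translate(drop_punctuation)
print(result)  # Hello World
```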

Regards,
msg284343 - (view) Author: Gaurav Tatke (Gaurav Tatke) * Date: 2016-12-30 20:17
Should users be advised to use str.translate() when they only want to keep certain characters and strip off everything else?
msg284354 - (view) Author: Christopher Barker (Chris.Barker) Date: 2016-12-30 23:40
This all came out of a thread on python-ideas, starting here:

https://mail.python.org/pipermail/python-ideas/2016-October/043284.html

the thread kind of petered out, but it seems there was a kinda-sorta
consensus that we didn't need any new string methods, but rather some notes
in the docs on how to use .translate() to remove "all but these" were in
order.

And the defaultdict method was proposed as the easiest / most pythonic.

As it happens, I didn't like the fact that defaultdict will build up a
big(ish) dict of Nones for no reason, and thus suggested a NoneDict option:

class NoneDict(dict):
    """
    Dictionary implementation that always returns None when a key is not in
    the dict, rather than raising a KeyError.
    """
    def __getitem__(self, key):
        try:
            val = dict.__getitem__(self, key)
        except KeyError:
            val = None
        return val

Though maybe that's a bit much for the docs.
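
A hedged usage sketch for the NoneDict idea (the class is repeated in condensed form so the snippet runs on its own). Unlike defaultdict, the table does not grow when missing characters are looked up:

```python
import string

class NoneDict(dict):
    """dict that returns None for missing keys instead of raising KeyError."""
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return None

table = NoneDict((ord(c), c) for c in string.ascii_lowercase)
result = "Hello, World!".translate(table)

print(result, len(table))  # elloorld 26
```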

However, in short:

either the defaultdict approach is simple and pythonic enough to be in the
docs, or we SHOULD add something new to the string object.

(or maybe someone has another nifty pythonic way to do this with the stdlib
that's better than defaultdict?)

-CHB

msg313796 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2018-03-13 23:47
IDLE just added similar functionality to pyparse (issue 32940) using:

    class ParseMap(dict):
        def __missing__(self, key):
            return 120 # ord('x')

    # Map all ascii to 120 to avoid __missing__ call, then replace some.
    trans = ParseMap.fromkeys(range(128), 120)
    trans.update((ord(c), ord('(')) for c in "({[")  # open brackets => '(';
    trans.update((ord(c), ord(')')) for c in ")}]")  # close brackets => ')'.
    trans.update((ord(c), ord(c)) for c in "\"'\\\n#") # Keep these.

    code = code.translate(trans)

Of course, all that is probably too much for a docs example, but it uses a mapping without the side effect of defaultdict.  I wonder if defining the dict subclass with __missing__ and then the example of keeping only lowercase letters would work for the docs?
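
A sketch of what that simpler variant might look like. The class name KeepOnly is hypothetical; the technique is a dict subclass whose __missing__ returns None, so unknown characters are deleted without being stored:

```python
import string

class KeepOnly(dict):
    """Hypothetical helper: missing code points map to None (deleted)."""
    def __missing__(self, key):
        # Returning without assigning to self[key] leaves the dict unchanged.
        return None

lowercase_only = KeepOnly((ord(c), c) for c in string.ascii_lowercase)
result = "Hello, World! 123".translate(lowercase_only)

print(result, len(lowercase_only))  # elloorld 26
```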
History
Date User Action Args
2022-04-11 14:58:39 admin set github: 72798
2018-03-13 23:47:12 cheryl.sabella set nosy: + cheryl.sabella; messages: + msg313796
2016-12-30 23:40:16 Chris.Barker set nosy: + Chris.Barker; messages: + msg284354
2016-12-30 20:17:59 Gaurav Tatke set messages: + msg284343
2016-12-30 20:15:33 Gaurav Tatke set messages: + msg284342
2016-12-30 05:57:49 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg284316
2016-12-30 05:23:24 rhettinger set nosy: + rhettinger; messages: + msg284309
2016-12-29 10:38:14 Gaurav Tatke set files: + translateexample_issue28612.patch; nosy: + Gaurav Tatke; messages: + msg284244; keywords: + patch
2016-11-04 19:12:27 ChrisBarker set nosy: + ChrisBarker; messages: + msg280070
2016-11-04 18:20:23 Jim.Jewett set messages: + msg280063
2016-11-04 17:02:34 Jim.Jewett create