str.translate needs a mapping example #72798

Open
jimjjewett mannequin opened this issue Nov 4, 2016 · 10 comments
Labels
3.7 (EOL) end of life docs Documentation in the Doc dir easy type-feature A feature request or enhancement

Comments

@jimjjewett
Mannequin

jimjjewett mannequin commented Nov 4, 2016

BPO 28612
Nosy @rhettinger, @JimJJewett, @serhiy-storchaka, @PythonCHB, @csabella
Files
  • translateexample_issue28612.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2016-11-04.17:02:34.846>
    labels = ['easy', 'type-feature', '3.7', 'docs']
    title = 'str.translate needs a mapping example'
    updated_at = <Date 2018-03-13.23:47:12.065>
    user = 'https://github.com/JimJJewett'

    bugs.python.org fields:

    activity = <Date 2018-03-13.23:47:12.065>
    actor = 'cheryl.sabella'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation']
    creation = <Date 2016-11-04.17:02:34.846>
    creator = 'Jim.Jewett'
    dependencies = []
    files = ['46072']
    hgrepos = []
    issue_num = 28612
    keywords = ['patch', 'easy']
    message_count = 10.0
    messages = ['280061', '280063', '280070', '284244', '284309', '284316', '284342', '284343', '284354', '313796']
    nosy_count = 8.0
    nosy_names = ['rhettinger', 'Chris.Barker', 'docs@python', 'Jim.Jewett', 'serhiy.storchaka', 'ChrisBarker', 'Gaurav Tatke', 'cheryl.sabella']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue28612'
    versions = ['Python 3.5', 'Python 3.6', 'Python 3.7']

    @jimjjewett
    Mannequin Author

    jimjjewett mannequin commented Nov 4, 2016

    One commonly needed string transformation is stripping out certain characters (or keeping only certain characters). This is common enough that it might be worth a dedicated method, except that, as Stephen J. Turnbull wrote in https://mail.python.org/pipermail/python-ideas/2016-November/043501.html

    """
    So really translate with defaultdict is a specialized loop that
    marries an algorithmic body (which could do things like look up the
    original script or other character properties to decide on the
    replacement for the generic case) with a (usually "small") table of
    exceptions. That seems like inspired design to me.
    """

    Alas, while inspired, it isn't obvious to someone who isn't yet used to the power of Python custom classes.

    The documentation (such as https://docs.python.org/3/library/stdtypes.html?highlight=translate#str.translate ) should include such an example.

    One possible example would be a defaultdict that says to discard any characters except lowercase ASCII letters.
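
A minimal sketch of the kind of example requested here, not the wording of any patch: the name `keep_lower` and the sample string are assumptions made for illustration. Each lowercase ASCII letter maps to itself, and the default factory returns None, which `str.translate` treats as "delete this character".

```python
from collections import defaultdict
import string

# Hypothetical table name: lowercase ASCII letters map to themselves;
# every other code point falls through to the factory, which returns None.
keep_lower = defaultdict(lambda: None,
                         {ord(c): c for c in string.ascii_lowercase})

result = "Hello, World! 123".translate(keep_lower)
print(result)  # elloorld
```

The `lambda: None` factory is needed because defaultdict calls its factory with no arguments to produce the value for a missing key.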

    @jimjjewett jimjjewett mannequin added the 3.7 (EOL) end of life label Nov 4, 2016
    @jimjjewett jimjjewett mannequin assigned docspython Nov 4, 2016
    @jimjjewett jimjjewett mannequin added docs Documentation in the Doc dir easy type-feature A feature request or enhancement labels Nov 4, 2016
    @jimjjewett
    Mannequin Author

    jimjjewett mannequin commented Nov 4, 2016

    https://mail.python.org/pipermail/python-ideas/2016-November/043539.html by Chris Barker points out that a custom object (which doesn't ever store the missing "keys") may be better still... though I'm not sure it is better enough to complicate the docs.

    @PythonCHB
    Mannequin

    PythonCHB mannequin commented Nov 4, 2016

    Agreed:

    the custom dict type would be nice for a recipe or blog post or...

    but not for the docs.

    I'll note that the other trick to this recipe is that you need to know to use lambda to make a "None factory" for defaultdict -- though maybe that's a ToDo for the defaultdict docs...

    @GauravTatke
    Mannequin

    GauravTatke mannequin commented Dec 29, 2016

    Hi,

    I am new to Python and want to contribute. I am attaching a patch with the requested example of using defaultdict with translate. Please let me know if anything needs to be changed. I have tested the example and also the HTML doc locally.

    Regards,
    Gaurav

    @rhettinger
    Contributor

    I like the idea of adding a mapping example but don't want to encourage use of defaultdict in contexts like this one. A defaultdict usefully specifies a default but has the unpleasant side-effect of altering the dictionary (adding new keys) during the look-up phase. This has bitten a lot of people (including famous ones like Peter Norvig).
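
The side effect described above can be observed directly. This is a small illustration with a made-up table, not code from the thread: translating a string silently grows the defaultdict by one entry for every character that was not already a key.

```python
from collections import defaultdict

# Hypothetical one-entry table: only "a" is kept.
table = defaultdict(lambda: None, {ord("a"): "a"})

size_before = len(table)
result = "abc".translate(table)
size_after = len(table)

print(result)                   # a
print(size_before, size_after)  # 1 3
```

The lookups for "b" and "c" each inserted a new key mapped to None, so the table changed even though the code only read from it.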

    @serhiy-storchaka
    Member

    If the side effect of defaultdict is unpleasant, the correct way is to combine the translation mapping with the custom mapping using ChainMap. But this example is too complex for the documentation of str.translate(). On the other hand, it is trivial for more experienced users and doesn't need special mention.

    I think other resources (ActiveState Code Recipes [1] or books) are better places for this example.

    [1] http://code.activestate.com/recipes/popular/
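
One possible reading of the ChainMap suggestion, offered as a sketch rather than as what was actually meant: the class `DeleteEverything` and the variable names are assumptions. ChainMap tries each mapping in order, so a plain dict of characters to keep can be backed by a fallback object whose lookup always returns None. Neither mapping is modified by lookups.

```python
from collections import ChainMap
import string

class DeleteEverything:
    """Hypothetical fallback: every lookup yields None (delete)."""
    def __getitem__(self, key):
        return None

# The plain dict is consulted first; misses fall through to the fallback.
keep = {ord(c): c for c in string.ascii_lowercase}
table = ChainMap(keep, DeleteEverything())

result = "Hello, World! 123".translate(table)
print(result)  # elloorld
```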

    @GauravTatke
    Mannequin

    GauravTatke mannequin commented Dec 30, 2016

    Hi,

    Pardon my ignorance, I am new to this, but I have the following questions and thoughts:

    1. Why would we say that adding new keys during the lookup phase is an unpleasant side effect? From what I understood from the docs, one of the main reasons to use defaultdict is to be able to insert a missing key and give it a default value. The defaultdict doc itself suggests that doing this is cleaner and faster than using dict.setdefault().

    2. I believe defaultdict fits perfectly in this context of creating a translation table for str.translate(). Even if we have a very large string containing all characters from 4-5 languages, our defaultdict will still be comparatively small. It is easier to create a translation table using defaultdict when we have to strip most characters out of a string, as in the example requested in the issue. Creating a translation table using str.maketrans() or a user-defined function is tricky in this use case.

    3. If the argument for using defaultdict in this context is not convincing, shall I just provide a patch giving an example of str.translate() using str.maketrans()?

    Regards,
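
For contrast with point 2, here is a short sketch of what `str.maketrans()` handles well; the sample string is made up. Its three-argument form maps every character in the third argument to None, which is convenient when the set of characters to delete is small and known. For "keep only these", the table would instead have to list every other code point, which is why that case is awkward.

```python
import string

# Three-argument form: each character in the third argument maps to None.
strip_punctuation = str.maketrans("", "", string.punctuation)

cleaned = "Hello, World!".translate(strip_punctuation)
print(cleaned)  # Hello World
```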

    @GauravTatke
    Mannequin

    GauravTatke mannequin commented Dec 30, 2016

    Should users be advised to use str.translate() when they only want to keep certain characters and strip out everything else?

    @ChrisBarker
    Mannequin

    ChrisBarker mannequin commented Dec 30, 2016

    This all came out of a thread on python-ideas, starting here:

    https://mail.python.org/pipermail/python-ideas/2016-October/043284.html

    The thread kind of petered out, but it seems there was a kinda-sorta
    consensus that we didn't need any new string methods, but rather that some notes
    in the docs on how to use .translate() to remove "all but these" were in
    order.

    And the defaultdict method was proposed as the easiest / most pythonic.

    As it happens, I didn't like the fact that defaultdict will build up a
    big(ish) dict of Nones for no reason, and thus suggested a NoneDict option:

    class NoneDict(dict):
        """
        Dictionary implementation that always returns None when a key is not in
        the dict, rather than raising a KeyError.
        """
        def __getitem__(self, key):
            try:
                val = dict.__getitem__(self, key)
            except KeyError:
                val = None
            return val

    Though maybe that's a bit much for the docs.
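
A usage sketch for the NoneDict idea, with the class repeated in condensed form so the block runs on its own; the table contents and sample string are assumptions. Unlike defaultdict, the table keeps its original size after translating.

```python
import string

class NoneDict(dict):
    """Return None for missing keys instead of raising KeyError."""
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return None

# Hypothetical table: keep lowercase ASCII letters, drop everything else.
table = NoneDict({ord(c): c for c in string.ascii_lowercase})

result = "Hello, World! 123".translate(table)
print(result)      # elloorld
print(len(table))  # 26
```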

    However, in short:

    Either the defaultdict approach is simple and pythonic enough to be in the
    docs, or we SHOULD add something new to the string object.

    (or maybe someone has another nifty pythonic way to do this with the stdlib
    that's better than defaultdict?)

    -CHB



    @csabella
    Contributor

    IDLE just added similar functionality to pyparse (bpo-32940) using:

        class ParseMap(dict):
            def __missing__(self, key):
                return 120 # ord('x')
    
        # Map all ascii to 120 to avoid __missing__ call, then replace some.
        trans = ParseMap.fromkeys(range(128), 120)
        trans.update((ord(c), ord('(')) for c in "({[")  # open brackets => '(';
        trans.update((ord(c), ord(')')) for c in ")}]")  # close brackets => ')'.
        trans.update((ord(c), ord(c)) for c in "\"'\\\n#") # Keep these.
    
        code = code.translate(trans)

    Of course, all that is probably too much for a docs example, but it uses a mapping without the side effect of defaultdict. I wonder if defining the dict subclass with __missing__ and then the example of keeping only lowercase letters would work for the docs?
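
A sketch of the variant floated above, assuming a dict subclass named `KeepOnly` (a hypothetical name) and a made-up sample string. Defining `__missing__` supplies a value for absent keys without storing them, so lookups leave the table unchanged.

```python
import string

class KeepOnly(dict):
    """Hypothetical helper: characters not in the table are deleted."""
    def __missing__(self, key):
        return None  # None tells str.translate to drop the character

# Keep only lowercase ASCII letters.
table = KeepOnly((ord(c), c) for c in string.ascii_lowercase)

result = "Hello, World! 123".translate(table)
print(result)      # elloorld
print(len(table))  # 26
```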

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022