Issue28612
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2016-11-04 17:02 by Jim.Jewett, last changed 2022-04-11 14:58 by admin.
Files
File name | Uploaded
---|---
translateexample_issue28612.patch | Gaurav Tatke, 2016-12-29 10:38
Messages (10)
msg280061 | Author: Jim Jewett (Jim.Jewett) * | Date: 2016-11-04 17:02
One commonly needed string transformation is stripping out certain characters (or keeping only certain characters). This is common enough that it might be worth a dedicated method, except that, as Stephen J. Turnbull wrote in https://mail.python.org/pipermail/python-ideas/2016-November/043501.html:

"""
So really translate with defaultdict is a specialized loop that marries an algorithmic body (which could do things like look up the original script or other character properties to decide on the replacement for the generic case) with a (usually "small") table of exceptions. That seems like inspired design to me.
"""

Alas, while inspired, it isn't obvious to someone who isn't yet used to the power of Python custom classes. The documentation (such as https://docs.python.org/3/library/stdtypes.html?highlight=translate#str.translate) should include such an example. One possible example would be a defaultdict that discards all characters except lowercase ASCII letters.
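A minimal sketch of the proposed example, assuming Python 3 and only the stdlib collections.defaultdict; the name keep_lower is illustrative. str.translate() looks up each character's code point in the table, and a value of None deletes that character, so a defaultdict whose factory returns None discards everything not listed explicitly.

```python
from collections import defaultdict
import string

# Illustrative table: each lowercase ASCII letter maps to itself (kept).
# Any other code point is missing, so the factory supplies None, and
# str.translate() deletes characters mapped to None.
keep_lower = defaultdict(
    lambda: None, {ord(c): ord(c) for c in string.ascii_lowercase}
)

text = "Hello, World! 123 abc"
result = text.translate(keep_lower)
print(result)  # elloorldabc
```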
msg280063 | Author: Jim Jewett (Jim.Jewett) * | Date: 2016-11-04 18:20
https://mail.python.org/pipermail/python-ideas/2016-November/043539.html by Chris Barker points out that a custom object (which never stores the missing "keys") may be better still, though I'm not sure it is enough better to justify complicating the docs.
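A sketch of the "custom object" idea, relying only on the documented requirement that the table passed to str.translate() implement indexing via __getitem__(). The class name KeepOnly is hypothetical. Because the answer is computed on each lookup, nothing is ever stored for missing keys.

```python
import string


class KeepOnly:
    """Hypothetical translation table: keep the given characters, drop the rest."""

    def __init__(self, allowed):
        # Store the allowed characters as code points, which is what
        # str.translate() passes to __getitem__.
        self.allowed = frozenset(map(ord, allowed))

    def __getitem__(self, codepoint):
        # Returning the code point itself keeps the character;
        # returning None deletes it. Nothing is cached or stored.
        return codepoint if codepoint in self.allowed else None


table = KeepOnly(string.ascii_lowercase)
result = "Hello, World! 123 abc".translate(table)
print(result)  # elloorldabc
```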
msg280070 | Author: Chris Barker (ChrisBarker) * | Date: 2016-11-04 19:12
Agreed: the custom dict type would be nice for a recipe or blog post, but not for the docs. I'll note that the other trick to this recipe is that you need to know to use a lambda to make a "None factory" for defaultdict, though maybe that's a to-do for the defaultdict docs.
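A short sketch of why the "None factory" matters, assuming Python 3; the table and variable names are illustrative. Without a factory, defaultdict raises KeyError for missing keys, and str.translate() treats any LookupError as "leave the character unchanged", so nothing is deleted.

```python
from collections import defaultdict

# Illustrative table: only "a" has an explicit mapping.
table = {ord("a"): ord("A")}

# Without a factory (default_factory is None), a missing key raises
# KeyError; str.translate() treats LookupError as "leave unchanged".
no_factory = defaultdict(None, table)
unchanged = "abc!".translate(no_factory)
print(unchanged)  # Abc!

# With a "None factory", every missing key maps to None, i.e. delete.
none_factory = defaultdict(lambda: None, table)
deleted = "abc!".translate(none_factory)
print(deleted)  # A
```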
msg284244 | Author: Gaurav Tatke (Gaurav Tatke) * | Date: 2016-12-29 10:38
Hi, I am new to Python and want to contribute. I am attaching a patch with the requested example of using defaultdict with translate. Please let me know if anything needs to be changed. I have tested the example and also the HTML docs locally.
Regards, Gaurav
msg284309 | Author: Raymond Hettinger (rhettinger) * | Date: 2016-12-30 05:23
I like the idea of adding a mapping example but don't want to encourage use of defaultdict in contexts like this one. A defaultdict usefully specifies a default but has the unpleasant side effect of altering the dictionary (adding new keys) during the lookup phase. This has bitten a lot of people (including famous ones like Peter Norvig).
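A short sketch of the side effect described above, assuming CPython 3 and an illustrative keep_lower table: after a single translate() call, the defaultdict has gained one None entry for every distinct character it was asked about but did not contain.

```python
from collections import defaultdict
import string

# Illustrative table: lowercase ASCII letters map to themselves.
keep_lower = defaultdict(
    lambda: None, {ord(c): ord(c) for c in string.ascii_lowercase}
)
before = len(keep_lower)  # 26 explicit entries

"Hello, World! 123".translate(keep_lower)

# Each distinct missing character has now been inserted with the
# value None by defaultdict.__missing__.
after = len(keep_lower)
added = sorted(chr(k) for k in keep_lower if keep_lower[k] is None)
print(before, after)  # 26 34
print(added)
```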
msg284316 | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2016-12-30 05:57
If the side effect of defaultdict is unpleasant, the correct way is to combine the translation mapping with a custom mapping using ChainMap. But this example is too complex for the documentation of str.translate(). On the other hand, it is trivial for more experienced users and doesn't need special mention. I think other resources (ActiveState Code Recipes [1] or books) are better places for this example.
[1] http://code.activestate.com/recipes/popular/
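One possible reading of the ChainMap suggestion, sketched under the assumption that the "custom mapping" is a throwaway defaultdict. Lookups that miss the real table fall through to the scratch mapping, so any keys defaultdict inserts end up there, and the translation table itself is never modified. The names table, scratch, and keep_lower are illustrative.

```python
from collections import ChainMap, defaultdict
import string

# Illustrative translation table: lowercase ASCII letters map to themselves.
table = {ord(c): ord(c) for c in string.ascii_lowercase}

# Scratch fallback: supplies None (delete) for anything not in `table`.
# It still stores missing keys, but only in this throwaway mapping.
scratch = defaultdict(lambda: None)
keep_lower = ChainMap(table, scratch)

result = "Hello, World!".translate(keep_lower)
print(result)  # elloorld
print(len(table))  # 26: the real table is untouched
```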
msg284342 | Author: Gaurav Tatke (Gaurav Tatke) * | Date: 2016-12-30 20:15
Hi,
Pardon my ignorance; I am new to this, but I have the following questions/thoughts:
1. Why would we say that adding new keys during the lookup phase is an unpleasant side effect? From what I understood from the docs, one of the main reasons to use defaultdicts is to be able to insert a missing key and give it a default value. The 'defaultdict' docs themselves suggest that doing this is cleaner and faster than using dict.setdefault().
2. I believe defaultdict fits perfectly in this context of creating a translation table for str.translate(). Even if we have a very large string containing all the characters from 4-5 languages, our defaultdict will still be comparatively small. It is easier to create a translation table using defaultdict when we have to strip most characters out of a string, as in the example requested in the issue. Creating a translation table using str.maketrans() or a user-defined function is tricky in this use case.
3. If the argument for using defaultdict in this context is not convincing, shall I just submit a patch giving an example of str.translate() using str.maketrans()?
Regards,
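A sketch of the str.maketrans() alternative raised in point 3, illustrating why "keep only" is awkward with it: the third argument lists characters to delete, so keeping a few characters means enumerating every other one. To keep the table small, the sketch below assumes the input is ASCII and enumerates only the 128 ASCII code points, so non-ASCII characters pass through unchanged. The variable names are illustrative.

```python
import string

# Deleting a known set is easy: the third argument of str.maketrans()
# lists the characters to remove.
drop_punct = str.maketrans("", "", string.punctuation)
no_punct = "Hello, World!".translate(drop_punct)
print(no_punct)  # Hello World

# "Keep only lowercase ASCII" has to be phrased as "delete everything
# else", which here covers only the 128 ASCII code points.
others = "".join(
    chr(i) for i in range(128) if chr(i) not in string.ascii_lowercase
)
keep_ascii_lower = str.maketrans("", "", others)
partial = "Héllo, World!".translate(keep_ascii_lower)
print(partial)  # élloorld (the non-ASCII 'é' slipped through)
```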
msg284343 | Author: Gaurav Tatke (Gaurav Tatke) * | Date: 2016-12-30 20:17
Should users be advised to use str.translate() for the use case where they only want to keep certain characters and strip off everything else?
msg284354 | Author: Christopher Barker (Chris.Barker) | Date: 2016-12-30 23:40
This all came out of a thread on python-ideas, starting here: https://mail.python.org/pipermail/python-ideas/2016-October/043284.html

The thread kind of petered out, but there seemed to be a kinda-sorta consensus that we didn't need any new string methods, but rather that some notes in the docs on how to use .translate() to remove "all but these" were in order. And the defaultdict method was proposed as the easiest / most pythonic.

As it happens, I didn't like the fact that defaultdict will build up a big(ish) dict of Nones for no reason, and thus suggested a NoneDict option:

    class NoneDict(dict):
        """
        Dictionary implementation that always returns None
        when a key is not in the dict, rather than raising a KeyError
        """
        def __getitem__(self, key):
            try:
                val = dict.__getitem__(self, key)
            except KeyError:
                val = None
            return val

Though maybe that's a bit much for the docs.

However, in short: either the defaultdict approach is simple and pythonic enough to be in the docs, or we SHOULD add something new to the string object. (Or maybe someone has another nifty pythonic way to do this with the stdlib that's better than defaultdict?)

-CHB

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov
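A usage sketch for the NoneDict class quoted above (reproduced so the block is self-contained), assuming Python 3; the name keep_lower is illustrative. Because the overridden __getitem__ catches the KeyError and returns None without assigning anything, the table stays the same size after translation, unlike the defaultdict version.

```python
import string


class NoneDict(dict):
    """Dictionary that returns None for missing keys instead of raising KeyError."""

    def __getitem__(self, key):
        try:
            val = dict.__getitem__(self, key)
        except KeyError:
            val = None  # returned, but never stored in the dict
        return val


# Illustrative table: lowercase ASCII letters map to themselves.
keep_lower = NoneDict((ord(c), ord(c)) for c in string.ascii_lowercase)
result = "Hello, World! 123 abc".translate(keep_lower)
print(result)  # elloorldabc
print(len(keep_lower))  # 26: no keys were added
```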
msg313796 | Author: Cheryl Sabella (cheryl.sabella) * | Date: 2018-03-13 23:47
IDLE just added similar functionality to pyparse (issue 32940) using:

    class ParseMap(dict):
        def __missing__(self, key):
            return 120  # ord('x')

    # Map all ascii to 120 to avoid __missing__ call, then replace some.
    trans = ParseMap.fromkeys(range(128), 120)
    trans.update((ord(c), ord('(')) for c in "({[")  # open brackets => '(';
    trans.update((ord(c), ord(')')) for c in ")}]")  # close brackets => ')'.
    trans.update((ord(c), ord(c)) for c in "\"'\\\n#")  # Keep these.
    code = code.translate(trans)

Of course, all that is probably too much for a docs example, but it uses a mapping without the side effect of defaultdict. I wonder if defining the dict subclass with __missing__ and then the example of keeping only lowercase letters would work for the docs?
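A sketch of the closing suggestion, pairing a dict subclass that defines __missing__ with the keep-only-lowercase example; the class name DeleteMissing and the name keep_lower are hypothetical. dict.__getitem__ calls __missing__ for absent keys, and unlike defaultdict, this __missing__ only returns a value without storing it.

```python
import string


class DeleteMissing(dict):
    """Hypothetical table: any code point not listed maps to None (deleted)."""

    def __missing__(self, key):
        # Called by dict.__getitem__ for absent keys; returning None
        # tells str.translate() to drop the character. Nothing is stored.
        return None


# Illustrative table: lowercase ASCII letters map to themselves.
keep_lower = DeleteMissing((ord(c), ord(c)) for c in string.ascii_lowercase)
result = "Hello, World! 123 abc".translate(keep_lower)
print(result)  # elloorldabc
print(len(keep_lower))  # 26: __missing__ did not add any keys
```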
History
Date | User | Action | Args
---|---|---|---
2022-04-11 14:58:39 | admin | set | github: 72798
2018-03-13 23:47:12 | cheryl.sabella | set | nosy: + cheryl.sabella; messages: + msg313796
2016-12-30 23:40:16 | Chris.Barker | set | nosy: + Chris.Barker; messages: + msg284354
2016-12-30 20:17:59 | Gaurav Tatke | set | messages: + msg284343
2016-12-30 20:15:33 | Gaurav Tatke | set | messages: + msg284342
2016-12-30 05:57:49 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg284316
2016-12-30 05:23:24 | rhettinger | set | nosy: + rhettinger; messages: + msg284309
2016-12-29 10:38:14 | Gaurav Tatke | set | files: + translateexample_issue28612.patch; nosy: + Gaurav Tatke; messages: + msg284244; keywords: + patch
2016-11-04 19:12:27 | ChrisBarker | set | nosy: + ChrisBarker; messages: + msg280070
2016-11-04 18:20:23 | Jim.Jewett | set | messages: + msg280063
2016-11-04 17:02:34 | Jim.Jewett | create |