
Author terry.reedy
Recipients cheryl.sabella, rhettinger, serhiy.storchaka, terry.reedy
Date 2019-03-09.01:04:15
Message-id <1552093456.21.0.157545746068.issue36219@roundup.psfhosted.org>
Content
I support adding a new function, with these notes.

1. Let's limit the scope to actual, reversible bugs introduced by third-party software we care about.  Let's not try to anticipate every possible issue.  Also, once we have a function to replace some Unicode chars, I can imagine users requesting replacement of other Unicode chars, such as replacing the X-like math multiplication sign with '*'.  I am pretty sure that encouraging intentional Unicode extensions would not pass core-dev review. ;-)

Raymond, do users encounter all of the characters and combinations Cheryl suggested?  Serhiy, do you know if real PDFs make the other changes you pointed out?  Can you provide or suggest a specific test string?
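To make the limited scope concrete, here is a minimal sketch, not a proposed implementation.  The table contents and the names ASCII_FIXES and replace_mangled are assumptions for illustration only; the real set of characters depends on the answers to the questions above.

```python
# Hypothetical table: only characters that third-party software is
# believed to substitute for ASCII.  The exact set is still undecided.
ASCII_FIXES = str.maketrans({
    '\u2018': "'",   # left single quotation mark
    '\u2019': "'",   # right single quotation mark
    '\u201c': '"',   # left double quotation mark
    '\u201d': '"',   # right double quotation mark
})

def replace_mangled(text):
    """Return text with only the characters in ASCII_FIXES replaced."""
    return text.translate(ASCII_FIXES)

# Curly quotes are reverted; the multiplication sign is deliberately
# not in the table, so it is left alone.
fixed = replace_mangled('print(\u201chi\u201d)  # 2 \u00d7 3')
```

Keeping the table small and explicit means that anything not listed, such as the multiplication sign, passes through unchanged.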

2. I want to put the new feature on the Format menu.  (A) The Edit menu is already overly long, and (B) the other items on Format already do various selection or whole-text fixups (inserts, replacements, and deletions).  Possible menu entry: 'Replace non-ascii chars'.  This is 23 chars; the current longest entry is 25.  A 'hotkey' is not needed for something so rarely used.  (Some of the other items on Format don't need them either.)

I think including Format on the Shell menu, with a subset of entries active, should be a follow-up issue.  Another possible follow-up is to check pasted or opened text and offer to edit if appropriate.  I am wary of doing so automatically, especially to start.

3. We should not replace within strings and comments, but mangled strings may be hard to recognize as such.  Suppose '’' is mangled to ‘’’ (\u2018\u2019\u2019, open-close-close).  I am not sure how we would recognize that the middle character should be left as is, except by rejecting anything that results in a syntax error.  I would rather make too few edits than too many.  I will be happy if we can start with something useful, not wrong, tested, and documented.
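The "reject anything that results in a syntax error" idea could be sketched roughly as follows.  This is only an illustration under assumptions: QUOTE_FIXES and fix_if_compilable are hypothetical names, and the check uses the builtin compile() on the whole text.

```python
# Hypothetical replacement table, limited to curly quotes.
QUOTE_FIXES = str.maketrans({
    '\u2018': "'", '\u2019': "'",
    '\u201c': '"', '\u201d': '"',
})

def fix_if_compilable(source):
    """Return the edited source only if it compiles; else the original."""
    candidate = source.translate(QUOTE_FIXES)
    try:
        # Parse only; nothing is executed.
        compile(candidate, '<candidate>', 'exec')
    except SyntaxError:
        return source  # too few edits rather than too many
    return candidate

# The open-close-close case: blind replacement yields x = ''' which is
# an unterminated triple-quoted string, so the edit is rejected.
mangled = 'x = \u2018\u2019\u2019\n'
unchanged = fix_if_compilable(mangled)
```

This check only catches edits that break the syntax.  A curly quote inside an otherwise valid string or comment would still be replaced, so by itself it does not satisfy "do not replace within strings and comments".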
History
Date                 User         Action  Args
2019-03-09 01:04:16  terry.reedy  set     recipients: + terry.reedy, rhettinger, serhiy.storchaka, cheryl.sabella
2019-03-09 01:04:16  terry.reedy  set     messageid: <1552093456.21.0.157545746068.issue36219@roundup.psfhosted.org>
2019-03-09 01:04:16  terry.reedy  link    issue36219 messages
2019-03-09 01:04:15  terry.reedy  create