Title: small cleanups in Unicode normalization code
Type:
Stage: patch review
Components: Unicode
Versions: Python 3.9
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Greg Price, benjamin.peterson, ezio.melotti, rhettinger
Priority: normal
Keywords: patch

Created on 2019-09-06 04:46 by Greg Price, last changed 2019-09-10 19:31 by rhettinger.

Pull Requests
URL Status Linked
PR 15711 merged Greg Price, 2019-09-06 05:22
PR 15712 merged Greg Price, 2019-09-06 05:24
PR 15558 Greg Price, 2019-09-06 05:27
Messages (4)
msg351229 - Author: Greg Price (Greg Price) * Date: 2019-09-06 04:46
Benjamin noticed in reviewing GH-15558 (for #37966) several points where the existing code around Unicode normalization can be improved:

* on the `QuickcheckResult` enum:
  > Maybe `` should output this enum (with better name namespacing)

* > merging `test_normalization` into this file [i.e. ``] for clarity

* > These "boolean int" parameters could be actual `bool`s. [sc. the `nfc` and `k` parameters to `is_normalized_quickcheck`]

None of these are super hard, so good to knock them out while we're thinking of them.
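
For readers unfamiliar with the two flags mentioned in the last point, here is a minimal Python sketch of how a normalization form name can be split into a composed/decomposed flag and a compatibility flag. The helper names `form_flags` and `check_all_forms` are hypothetical and are not part of CPython; the mapping is an assumption about the intent of the `nfc` and `k` parameters, not a copy of the C code. Only `unicodedata.is_normalized` (available since Python 3.8) is a real library call.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def form_flags(form):
    """Hypothetical helper: split a form name into two boolean flags.

    Assumed meaning, mirroring the `nfc` and `k` parameters discussed above:
    nfc -> composed form (NFC, NFKC) rather than decomposed (NFD, NFKD)
    k   -> compatibility form (NFKC, NFKD) rather than canonical (NFC, NFD)
    """
    if form not in FORMS:
        raise ValueError(f"invalid normalization form: {form!r}")
    nfc = form.endswith("C")
    k = "K" in form
    return nfc, k


def check_all_forms(text):
    """Hypothetical helper: report whether text is normalized in each form."""
    return {form: unicodedata.is_normalized(form, text) for form in FORMS}


if __name__ == "__main__":
    # U+00E9 is the precomposed "e with acute"; "e\u0301" is the decomposed spelling.
    for sample in ("\u00e9", "e\u0301"):
        print(ascii(sample), check_all_forms(sample))
```

Using real booleans for the two flags, as in this sketch, makes the four combinations self-describing at the call site, which is the readability argument behind the third point.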
msg351373 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2019-09-09 09:16
New changeset 7669cb8b21c7c9cef758609c44017c09d1ce4658 by Benjamin Peterson (Greg Price) in branch 'master':
bpo-38043: Use `bool` for boolean flags on is_normalized_quickcheck. (GH-15711)
msg351600 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2019-09-10 09:29
New changeset 1ad0c776cb640be9f19c8019bbf34bb4aba312ad by Benjamin Peterson (Greg Price) in branch 'master':
bpo-38043: Move unicodedata.normalize tests into test_unicodedata. (GH-15712)
msg351740 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-09-10 19:31
This is mostly harmless but I'm concerned that we're encouraging a new Python developer to:

* churn code in mostly minor ways, irrelevant to users

* alter code long known to be stable, increasing
  the risk of introducing new bugs or performance changes

* alter code in ways that are atypical for our
  code base (i.e. the bool type isn't a norm in our
  code, we mostly use int for that)

* alter code without communicating with the developer
  who originally wrote that code (if they are still active)

* consume the time of reviewers when they could be working
  on known bugs, legitimate feature requests, or documentation

* make one-off or drive-by code alterations rather than what
  Guido calls "holistic refactoring" where we do clean-ups
  while understanding and thinking about the module as a
  whole and focusing on the user experience.

* unfortunately, making lots of random, minor changes to
  a code base in a major project is an addictive experience
  and IMO it would be best to re-channel it early, particularly
  if the changes are motivated by "I like my style of coding
  more than that of the original contributor".  Style changes
  are highly subjective and usually we defer to the original
  contributor who was closest to the problem being solved.
Date User Action Args
2019-09-10 19:31:45 rhettinger set nosy: + rhettinger; messages: + msg351740
2019-09-10 14:51:13 vstinner set nosy: - vstinner
2019-09-10 09:29:29 benjamin.peterson set messages: + msg351600
2019-09-09 09:16:34 benjamin.peterson set messages: + msg351373
2019-09-06 05:27:57 Greg Price set pull_requests: + pull_request15368
2019-09-06 05:24:24 Greg Price set pull_requests: + pull_request15367
2019-09-06 05:22:27 Greg Price set keywords: + patch; stage: patch review; pull_requests: + pull_request15366
2019-09-06 04:46:54 Greg Price create