
classification
Title: Possible subtle bug when normalizing and str.translate()ing
Type: behavior
Stage: resolved
Components:
Versions: Python 3.4
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: SilentGhost, mark, peter.otten
Priority: normal
Keywords:

Created on 2016-01-15 18:04 by mark, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
normbug.py mark, 2016-01-15 18:04 Run this program in an xterm multiple times to see the two different behaviors.
Messages (5)
msg258314 - (view) Author: Mark Summerfield (mark) * Date: 2016-01-15 18:04
I am using Python 3.4.3 on Xubuntu 14.04 LTS 64-bit.

I have a program that, when run repeatedly, sometimes does what I expect and sometimes does not:

$ ~/tmp/normbug.py 
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py 
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py 
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py 
BUG ('The aenid oevre', '!=', 'The AEnid oevre')
$ ~/tmp/normbug.py 
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py 
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py 
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py 
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py 
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py 
BUG ('The aenid oevre', '!=', 'The AEnid oevre')
$ ~/tmp/normbug.py 
BUG ('The aenid oevre', '!=', 'The AEnid oevre')
$ ~/tmp/normbug.py 
OK ('The AEnid oevre', '==', 'The AEnid oevre')

As you can see, sometimes the left (actual) value is case-folded and sometimes it isn't, which is surprising given the code (which is attached).

Of course this could be a mistake on my part; maybe I've misunderstood how Unicode normalization works.
msg258315 - (view) Author: Peter Otten (peter.otten) * Date: 2016-01-15 18:17
There seems to be a connection to hash randomization. I consistently get

$ PYTHONHASHSEED=1 python3.6 ./normbug.py 
BUG ('The aenid oevre', '!=', 'The AEnid oevre')
$ PYTHONHASHSEED=0 python3.6 ./normbug.py 
OK ('The AEnid oevre', '==', 'The AEnid oevre')
msg258316 - (view) Author: Peter Otten (peter.otten) * Date: 2016-01-15 18:34
Not a bug. Your XFORMS dict has both "Æ" and 0xC6 as keys, and they refer to the same character:

>>> ord("Æ") == 0xC6
True

Whether str.maketrans() uses the value for "Æ" or the value for 0xC6 depends on the order of the dict entries, which in turn is determined by the keys' hashes. Remove one of the two and you should see consistent results.
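The effect can be reproduced in isolation with a minimal sketch. The two small dicts below are hypothetical stand-ins for XFORMS (the real one is in the attached normbug.py), and the "AE" value for the "Æ" key is an assumption inferred from the expected output above. On Python 3.7 and later dicts keep insertion order, so the sketch sets the order explicitly instead of relying on hash randomization:

```python
# Hypothetical reduction of XFORMS; the real dict is in normbug.py.
AE = "\u00c6"  # LATIN CAPITAL LETTER AE, the same code point as 0xC6

# Same two entries, inserted in opposite orders.
str_key_first = {AE: "AE", 0xC6: "ae"}
int_key_first = {0xC6: "ae", AE: "AE"}

# str.maketrans() converts one-character string keys to ordinals, so both
# keys land on 0xC6 and the entry seen last overwrites the earlier one.
table_a = str.maketrans(str_key_first)
table_b = str.maketrans(int_key_first)

result_a = "The \u00c6nid".translate(table_a)
result_b = "The \u00c6nid".translate(table_b)

print(result_a)  # The aenid
print(result_b)  # The AEnid
```

With hash-dependent dict ordering, as in Python 3.4, which of the two orders you get can change from run to run, which would match the alternating OK/BUG output.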
msg258317 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2016-01-15 18:37
Mark, your XFORMS dictionary contains this entry: 0x00C6: "ae"
It should be "AE". The same applies to 0x0152: "oe", which should be "OE".
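For a larger translation dict, a check along these lines might help spot such entries before they reach str.maketrans(). This is only a sketch: `colliding_keys` is a hypothetical helper, not part of the standard library, and the sample dict is made up for illustration.

```python
def colliding_keys(mapping):
    """Return (first_key, later_key) pairs that refer to the same code point.

    Hypothetical helper: one-character string keys are compared by their
    ordinal, which is how str.maketrans() treats them.
    """
    seen = {}
    collisions = []
    for key in mapping:
        ordinal = ord(key) if isinstance(key, str) else key
        if ordinal in seen:
            collisions.append((seen[ordinal], key))
        else:
            seen[ordinal] = key
    return collisions


# Made-up sample: "\u00c6" and 0x00C6 are the same character.
xforms = {"\u00c6": "AE", 0x00C6: "ae", 0x0152: "OE"}
collisions = colliding_keys(xforms)
print(collisions)
```

An empty result means no two keys map to the same code point, so the outcome of str.maketrans() does not depend on dict order.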
msg258368 - (view) Author: Mark Summerfield (mark) * Date: 2016-01-16 07:57
Thanks for looking at this. My full translation dict had some other case mistakes, which are now all fixed :-)
History
Date User Action Args
2022-04-11 14:58:26 admin set github: 70314
2016-01-16 07:57:21 mark set messages: + msg258368
2016-01-15 18:37:19 SilentGhost set status: open -> closed
2016-01-15 18:37:04 SilentGhost set nosy: + SilentGhost; messages: + msg258317; resolution: not a bug; stage: resolved
2016-01-15 18:34:14 peter.otten set messages: + msg258316
2016-01-15 18:17:49 peter.otten set nosy: + peter.otten; messages: + msg258315
2016-01-15 18:04:50 mark create