Author mattheww
Recipients benjamin.peterson, lemburg, mattheww, ncoghlan, serhiy.storchaka
Date 2017-09-25.22:37:16
Message-id <1506379036.8.0.0300622709681.issue30755@psf.upfronthosting.co.za>
Content
I've investigated a bit more.

First, I've tried with Python 3.7.0a1. As you'd expect, PEP 538 (coercion
of the legacy C locale) means this behaviour now also occurs when no
locale environment variables are set at all.


Second, I've looked through locale.py a bit. I believe what it calls the
"aliasing engine" is applied for:

 - getlocale()
 - getdefaultlocale()
 - setlocale() when passed a tuple, but not when passed a string
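
To make the tuple/string difference concrete, here is a rough sketch of the two paths. It is a simplification, not a copy of locale.py: the helper names are hypothetical, and I'm assuming that joining the tuple with "." and calling the public locale.normalize() approximates what setlocale() does internally for a tuple. The result for C.UTF-8 depends on the alias table in the Python version you run, so no output is shown.

```python
import locale


def name_via_tuple_path(language, encoding):
    """Approximate the name setlocale() builds when given a tuple.

    Assumption: joining with "." and normalizing mirrors the internal
    tuple handling closely enough for illustration.
    """
    return locale.normalize(f"{language}.{encoding}")


def name_via_string_path(name):
    """A plain string is handed to the C library as-is (no aliasing)."""
    return name


if __name__ == "__main__":
    # If the two differ, the tuple form is asking for a different locale
    # than the string form would.
    print("tuple path: ", name_via_tuple_path("C", "UTF-8"))
    print("string path:", name_via_string_path("C.UTF-8"))
```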


This leads to some rather odd results.

With 3.7.0a1 and no locale environment variables:

  >>> import locale
  >>> locale.getlocale()
  ('en_US', 'UTF-8')

  # getlocale() is lying: the effective locale is really C.UTF-8
  >>> sorted("abcABC", key=locale.strxfrm)
  ['A', 'B', 'C', 'a', 'b', 'c']
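
One way to see the mismatch directly is to compare getlocale() with the string that setlocale() returns when it is only queried (called with no second argument), which comes straight from the C library. This is a minimal sketch; report_locale is a hypothetical helper name, and what it prints depends entirely on your platform and environment.

```python
import locale


def report_locale(category=locale.LC_CTYPE):
    """Return (raw, parsed) views of the current setting for one category."""
    # Query only: with no locale argument the current setting is returned
    # unchanged, exactly as the C library reports it.
    raw = locale.setlocale(category)
    try:
        # getlocale() parses and normalizes that string.
        parsed = locale.getlocale(category)
    except ValueError:
        # getlocale() can reject strings it cannot parse.
        parsed = None
    return raw, parsed


if __name__ == "__main__":
    raw, parsed = report_locale()
    print("C library says: ", raw)
    print("getlocale() says:", parsed)
```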


Third, I've checked on a system which does have en_US.UTF-8 installed,
and (as you'd expect) instead of crashing it gives wrong results:

  >>> import locale
  >>> locale.setlocale(locale.LC_ALL, ('C', 'UTF-8'))
  'en_US.UTF-8'
  >>> locale.getlocale()
  ('en_US', 'UTF-8')

  # now getlocale() is telling the truth, and the user isn't getting the
  # collation they requested
  >>> sorted("abcABC", key=locale.strxfrm)
  ['a', 'A', 'b', 'B', 'c', 'C']
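
Given the list above, passing a string rather than a tuple should sidestep the aliasing. The following is only a sketch of that idea, not a proposed fix: set_collation_by_string is a hypothetical helper, and whether "C.UTF-8" is available at all depends on the system, so the failure case is handled.

```python
import locale


def set_collation_by_string(name="C.UTF-8"):
    """Request a collation locale by string, bypassing tuple normalization.

    Returns the name reported by the C library, or None if the locale
    is not available on this system.
    """
    try:
        return locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None


if __name__ == "__main__":
    result = set_collation_by_string()
    if result is None:
        print("C.UTF-8 is not available here")
    else:
        print("collation locale:", result)
        print(sorted("abcABC", key=locale.strxfrm))
```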