
classification
Title: Clarify SortingHOWTO regarding locale aware string sorting
Type:
Stage:
Components: Documentation
Versions: Python 3.11, Python 3.10

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: CendioOssman, rhettinger, steven.daprano
Priority: normal
Keywords:

Created on 2022-04-08 10:31 by CendioOssman, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg416972 - Author: Pierre Ossman (CendioOssman) Date: 2022-04-08 10:31
There is a big gotcha in Python that is easily overlooked and should at the very least be more prominently pointed out in the documentation.

Sorting strings will produce results that are very confusing for humans.

It happens to work for ASCII, but will generally produce bad results for anything else, as code points do not always follow alphabetical order.
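
A minimal sketch of the behaviour described above. The word list is made up for illustration and is not part of the original report. Default string comparison goes by Unicode code point, so uppercase ASCII letters sort before lowercase ones and accented letters sort after "z":

```python
# Illustrative word list (an assumption, not taken from the report).
words = ["zebra", "Äpfel", "apple", "Zulu", "éclair"]

# sorted() compares strings code point by code point.
by_code_point = sorted(words)

# Print each word with the code point of its first character
# to make the resulting order easier to follow.
for word in by_code_point:
    print(f"{word!r}: U+{ord(word[0]):04X}")
```

Here "Zulu" comes first and "Äpfel" and "éclair" come last, which is not what a reader of most alphabets would expect.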

The expressions chapter¹ mentions this fact, but you have to dig quite a bit to reach that. It also mentions that normalization is an issue, but it never mentions the issue about code point order versus alphabetical order.

The sorting tutorial mentions under "Odds and ends"² that you need to use a special key or comparison function to get locale aware sorting. It doesn't mention that this also includes respecting alphabetical order, which might be overlooked unless you are very familiar with how the sorting works. The tutorial is also something you have to dig a bit to reach.
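
For reference, the approach the tutorial points to can be sketched roughly as follows, using `locale.strxfrm` as a key function or `locale.strcoll` as a comparison function. The word list is illustrative, and the resulting order is deliberately not shown because it depends on which locales are installed and active on the system:

```python
import locale
from functools import cmp_to_key

# Illustrative word list (an assumption, not taken from the report).
words = ["zebra", "Äpfel", "apple", "Zulu"]

try:
    # Adopt the user's default collation rules. This changes
    # process-wide state, and the outcome depends on the system.
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    # Keep whatever collation locale is already active.
    pass

# Key-function form: each string is transformed once.
by_key = sorted(words, key=locale.strxfrm)

# Comparison-function form: pairs of strings are compared directly.
by_cmp = sorted(words, key=cmp_to_key(locale.strcoll))

print(by_key)
print(by_cmp)
```

Under the plain "C" locale this typically falls back to code point order, so the effect is only visible once a language-specific locale is in use.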

Ideally string comparison would always be locale aware in a high level language such as Python. However, a smaller step would be a note on sorted()³ that extra care needs to be taken for strings as the default behaviour will produce unexpected results once your strings include anything outside the English alphabet.

¹ https://docs.python.org/3/reference/expressions.html
² https://docs.python.org/3/howto/sorting.html#odd-and-ends
³ https://docs.python.org/3/library/functions.html#sorted
msg416997 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-04-08 17:45
I don't think splashing this everywhere else in the docs would be helpful.  Tools like list.sort, sorted, min, max, nlargest, nsmallest use whatever sort order is provided by the underlying object whether it be a string, tuple, float, or int.
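
As a small illustration of that point (the word list is made up for this sketch), `min`, `max`, and `heapq.nsmallest` all follow the same code point ordering that `sorted` uses for strings, and all of them accept a `key` function to change it:

```python
import heapq

# Illustrative word list (an assumption, not taken from the thread).
words = ["zebra", "Äpfel", "apple", "Zulu"]

# All of these rely on the ordering defined by str itself.
smallest = min(words)
largest = max(words)
two_smallest = heapq.nsmallest(2, words)

# A key function changes the ordering for any of these tools.
# str.casefold ignores case but is still not locale aware.
smallest_casefolded = min(words, key=str.casefold)

print(smallest, largest, two_smallest, smallest_casefolded)
```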

The section on expressions is the intended place to cover how comparisons are defined for core objects: https://docs.python.org/3/reference/expressions.html#value-comparisons

As suggested, I will edit the sorting howto to make it clearer that locale aware sort ordering refers to alphabetical orderings, which can vary (for example, the Spanish ll sorts differently in different locales).

History
Date                 User            Action  Args
2022-04-11 14:59:58  admin           set     github: 91415
2022-04-08 17:49:04  rhettinger      set     title: string sorting often incorrect -> Clarify SortingHOWTO regarding locale aware string sorting
                                             versions: + Python 3.10, Python 3.11
2022-04-08 17:45:22  rhettinger      set     assignee: rhettinger
                                             messages: + msg416997
                                             components: + Documentation, - Interpreter Core
2022-04-08 12:12:34  rhettinger      set     nosy: + rhettinger
2022-04-08 10:46:02  steven.daprano  set     nosy: + steven.daprano
2022-04-08 10:31:59  CendioOssman    create