This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Programming FAQ about "How do you remove duplicates from a list?" -- Improve the examples + Mention possible caveats
Type: enhancement
Stage:
Components: Documentation
Versions: Python 3.9, Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Dominik V., docs@python
Priority: normal
Keywords:

Created on 2020-04-20 21:53 by Dominik V., last changed 2022-04-11 14:59 by admin.

Messages (1)
msg366893 - Author: Dominik Vilsmeier (Dominik V.) Date: 2020-04-20 21:53
https://docs.python.org/3/faq/programming.html#how-do-you-remove-duplicates-from-a-list

At the beginning, the entry points to the recipe at https://code.activestate.com/recipes/52560/, which mentions various caveats such as

> [...] whether [elements are] hashable, and whether they support full comparisons.

It then shows a concrete example implementation which, however, requires that the elements define a total ordering. The code for that example is fairly long, so it might discourage new programmers before they even discover what is most likely the best solution, which comes at the end of the section:

    list(set(mylist))

This seems to be by far the most useful solution, as evidenced by this StackOverflow question: https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists
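
For illustration, here is a minimal sketch of the `set` approach and its hashability caveat. The sample values and the names `nested` and `set_failed` are made up for this example; only `mylist` comes from the FAQ.

```python
mylist = [3, 1, 3, 2, 1]

# Hashable elements: converting to a set drops the duplicates.
# Note that the original order of the elements is not guaranteed to survive.
unique = list(set(mylist))

# Unhashable elements (e.g. lists) cannot be stored in a set at all,
# so the same one-liner raises TypeError.
nested = [[1, 2], [1, 2], [3]]
set_failed = False
try:
    list(set(nested))
except TypeError:
    set_failed = True
```

So the one-liner is short and easy to discover, but only applies when every element is hashable and the order of the result does not matter.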

Hence I propose two changes:

1. Quote the first sentence of the abstract of the recipe at https://code.activestate.com/recipes/52560/ at the beginning of the FAQ entry, so that the possible caveats are mentioned up front: "The fastest way to remove duplicates from a sequence depends on some pretty subtle properties of the sequence elements, such as whether they're hashable, and whether they support full comparisons."
2. Either remove the code example relying on `sort` or move it further down, to give more visibility to the solution using `set`, which is most likely the more relevant one. The `set` solution already carries the disclaimer about hashability, so it won't trick people into believing it works in all cases.

If the `sort` example is not removed, its description should at least mention that the elements must define a total ordering (e.g. it won't generally work if the elements are sets).
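
To make the total-ordering caveat concrete, here is a rough sketch of a sort-based approach. The helper name `dedupe_by_sorting` and the sample data are hypothetical, and the code is a simplified stand-in for this kind of technique, not the FAQ's exact example. For sets, `<` means "is a proper subset of", which is only a partial order.

```python
def dedupe_by_sorting(items):
    """Hypothetical helper: sort a copy, then drop adjacent equal elements."""
    result = sorted(items)
    # Walk backwards so deleting an element does not shift unvisited indices.
    for i in range(len(result) - 1, 0, -1):
        if result[i] == result[i - 1]:
            del result[i]
    return result

# Integers are totally ordered, so equal values end up adjacent after sorting.
ints_result = dedupe_by_sorting([3, 1, 3, 2, 1])

# {1} and {2} are not comparable by "<", so sorting does not bring the
# two equal sets together and the duplicate is not removed.
sets_result = dedupe_by_sorting([{1}, {2}, {1}])
```

With integers the duplicates are removed as expected, while with the sets above the sort raises no error and the list still contains `{1}` twice, which is arguably worse than an exception.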
History
Date                 User        Action  Args
2022-04-11 14:59:29  admin       set     github: 84527
2020-04-20 21:53:49  Dominik V.  create