
Author ncoghlan
Recipients Arfrever, ezio.melotti, lemburg, martin.panter, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, sjt, steven.daprano, vstinner
Date 2018-03-30.06:52:39
Message-id <1522392759.88.0.467229070634.issue18814@psf.upfronthosting.co.za>
In-reply-to
Content
With PEPs 538 and 540 implemented for 3.7, my thinking on this has evolved a bit.

A recent discussion on python-ideas [1] also introduced me to the third-party library "ftfy", which offers a wide range of tools for cleaning up improperly decoded data: https://ftfy.readthedocs.io/en/latest/

That includes a lone surrogate fixer: https://ftfy.readthedocs.io/en/latest/#ftfy.fixes.fix_surrogates

So a potential way to go here would be to add a section on "Handling Improperly Decoded Text Data" to the codecs module documentation, and include ftfy as a See Also link in that new section.
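
As a rough illustration of the kind of thing such a section might cover, here is a minimal standard-library-only sketch. It shows how lone surrogates end up in a str when bytes are decoded with the "surrogateescape" error handler, how to detect them, and two ways of dealing with them. The helper names (has_lone_surrogates, replace_lone_surrogates) are hypothetical and made up for this example; this is not ftfy's implementation, and the choice of Latin-1 as the "real" encoding is an assumption of the example.

```python
import re

# Matches any code point in the surrogate range U+D800..U+DFFF.
# A Python 3 str stores astral characters as single code points, so any
# match here is a lone surrogate rather than half of a valid pair.
_SURROGATE_RE = re.compile("[\ud800-\udfff]")


def has_lone_surrogates(text):
    """Return True if text contains any surrogate code point (hypothetical helper)."""
    return _SURROGATE_RE.search(text) is not None


def replace_lone_surrogates(text, replacement="\ufffd"):
    """Replace each surrogate with U+FFFD by default (hypothetical helper)."""
    return _SURROGATE_RE.sub(replacement, text)


# Latin-1 encoded bytes that are not valid UTF-8.
raw = b"caf\xe9"

# surrogateescape smuggles the undecodable byte 0xE9 through as U+DCE9.
decoded = raw.decode("utf-8", errors="surrogateescape")

# Option 1: round-trip back to the original bytes, then decode them with
# the encoding we assume they were really in (Latin-1 for this example).
recovered = decoded.encode("utf-8", errors="surrogateescape").decode("latin-1")

# Option 2: give up on the original bytes and substitute a replacement
# character so the text can be encoded strictly later on.
cleaned = replace_lone_surrogates(decoded)
```

The first option is lossless but only works if the correct encoding is known or can be guessed; the second is lossy but always leaves a str that can be safely encoded as UTF-8.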

If folks think that would be a reasonable way to go, then I think the clearest way to handle it would be to close this issue as "later" (which still implies "maybe never", but not as strongly as "rejected" does), and open a new issue for the suggested new section in the docs.

[1] https://mail.python.org/pipermail/python-ideas/2018-January/048583.html
History
Date                 User      Action  Args
2018-03-30 06:52:40  ncoghlan  set     recipients: + ncoghlan, lemburg, pitrou, vstinner, ezio.melotti, Arfrever, steven.daprano, r.david.murray, sjt, martin.panter, serhiy.storchaka
2018-03-30 06:52:39  ncoghlan  set     messageid: <1522392759.88.0.467229070634.issue18814@psf.upfronthosting.co.za>
2018-03-30 06:52:39  ncoghlan  link    issue18814 messages
2018-03-30 06:52:39  ncoghlan  create