
Author steve.dower
Recipients methane, ncoghlan, steve.dower, vstinner
Date 2019-03-06.18:07:14
Message-id <1551895634.42.0.584100112577.issue36204@roundup.psfhosted.org>
Content
> If you want to force the usage of UTF-8, you can opt-in for UTF-8 mode: call putenv("PYTHONUTF8=1") before Py_UnixMain() for example.

I'm not talking about forcing UTF-8, I'm talking about *assuming* it (and letting "someone else" worry about forcing it).

As I understand it, UTF-8 mode is about overriding the environment's apparent encoding and saying "skip our detection logic and always encode/decode via UTF-8". That is itself part of the encoding detection logic.
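
To make the opt-in concrete, here is a rough sketch that starts a child interpreter with PYTHONUTF8 set and reads back `sys.flags.utf8_mode` (available since Python 3.7). The helper name `utf8_mode_flag` is made up for illustration; it is not part of any embedding API.

```python
import os
import subprocess
import sys


def utf8_mode_flag(enabled: bool) -> str:
    """Run a child interpreter and return its sys.flags.utf8_mode as text.

    Hypothetical helper, for illustration only.
    """
    env = dict(os.environ)
    # PYTHONUTF8=1 forces UTF-8 mode on; PYTHONUTF8=0 explicitly disables it.
    env["PYTHONUTF8"] = "1" if enabled else "0"
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("with PYTHONUTF8=1:", utf8_mode_flag(True))
    print("with PYTHONUTF8=0:", utf8_mode_flag(False))
```

The point of the sketch is only that the switch is an input to the interpreter's own startup logic: the caller flips an environment variable and Python decides what to do with it.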

Our embedding APIs currently accept "whatever" and try to figure out the encoding on the inside. I'm proposing that they should accept "UTF-8" and the caller has to figure out the encoding (maybe with our helper functions).

That way embedders can just worry about UTF-8 consistently, instead of having to work around our workarounds for encoding detection.
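
A minimal sketch of that division of labour, written in Python rather than C for brevity. Both `to_utf8` and `accept_utf8` are hypothetical names standing in for "our helper functions" and "an embedding API"; neither exists in CPython.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Hypothetical caller-side helper: re-encode bytes from a known encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def accept_utf8(arg: bytes) -> str:
    """Hypothetical API entry point: assumes UTF-8 and performs no detection.

    Strict decoding means non-UTF-8 input raises UnicodeDecodeError
    instead of being guessed at.
    """
    return arg.decode("utf-8")


# The caller knows where its bytes came from, so the caller converts them.
latin1_arg = "café".encode("latin-1")

if __name__ == "__main__":
    print(accept_utf8(to_utf8(latin1_arg, "latin-1")))
    try:
        accept_utf8(latin1_arg)  # not valid UTF-8, so the API rejects it
    except UnicodeDecodeError as exc:
        print("rejected:", exc.reason)
```

In this shape the API side has a single, predictable contract, and any knowledge of the environment's encoding stays with the code that actually obtained the bytes.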
History
Date                 User         Action  Args
2019-03-06 18:07:14  steve.dower  set     recipients: + steve.dower, ncoghlan, vstinner, methane
2019-03-06 18:07:14  steve.dower  set     messageid: <1551895634.42.0.584100112577.issue36204@roundup.psfhosted.org>
2019-03-06 18:07:14  steve.dower  link    issue36204 messages
2019-03-06 18:07:14  steve.dower  create