Message 389098 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	lemburg
Recipients	eryksun, lemburg, methane, vstinner
Date	2021-03-19.14:56:53
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<ad56a593-943f-af5f-3da0-f65f2ca89eab@egenix.com>
In-reply-to	<1616161656.73.0.605355864914.issue43552@roundup.psfhosted.org>

Content

On 19.03.2021 14:47, STINNER Victor wrote:
> 
> STINNER Victor <vstinner@python.org> added the comment:
> 
>> - If you add "current", people will rightly ask: then what do all the
>> other APIs in the locale module return ? Of course, they all return
>> the current state of settings :-) So this is unnecessary as well.
> 
> The problem is that there are two different "locale encodings", what I call:
> 
> * "current locale encoding": nl_langinfo(CODESET) in short
> * "Python locale encoding": "UTF-8" in some cases, nl_langinfo(CODESET) otherwise

UTF-8 Mode is a Python invention. It has nothing to do with the
C library (libc) locale functions, which this module addresses and
interfaces to.

Please don't mix the two.
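To make the distinction concrete, here is a minimal sketch that computes both values side by side. The function names `current_locale_encoding()` and `python_locale_encoding()` are hypothetical, chosen only to mirror Victor's terminology. `locale.nl_langinfo()` is assumed to be available, which holds only on Unix. Elsewhere the sketch falls back to `getpreferredencoding(False)`, which may itself be affected by UTF-8 Mode.

```python
import locale
import sys


def current_locale_encoding():
    """Hypothetical helper: encoding of libc's current LC_CTYPE locale."""
    if hasattr(locale, "nl_langinfo"):  # Unix only
        return locale.nl_langinfo(locale.CODESET)
    # Fallback (e.g. Windows): note this one honours UTF-8 Mode.
    return locale.getpreferredencoding(False)


def python_locale_encoding():
    """Hypothetical helper: the encoding Python itself assumes by default."""
    if sys.flags.utf8_mode:  # set by -X utf8 or PYTHONUTF8=1
        return "UTF-8"
    return current_locale_encoding()


print("current locale encoding:", current_locale_encoding())
print("Python locale encoding: ", python_locale_encoding())
```

When UTF-8 Mode is on under a non-UTF-8 locale, the two lines differ. That divergence between Python's view and libc's view is the one at issue here.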

In fact, in order to avoid issues, Python should probably set the locale
encoding to UTF-8 as well when run in UTF-8 mode. It's dangerous to
have Python and the C library use different assumptions about the
encoding, especially in embedded applications.
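One way to picture that suggestion is a sketch that switches libc's `LC_CTYPE` to a UTF-8 locale whenever UTF-8 Mode is active, so both sides agree. The helper `align_libc_with_utf8_mode()` and the candidate locale names are assumptions for illustration. Which UTF-8 locales exist depends on the system, and this is not how CPython actually behaves.

```python
import locale
import sys

# Hypothetical candidates; availability varies by OS and installed locales.
_UTF8_LOCALE_CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8", "en_US.UTF-8")


def align_libc_with_utf8_mode():
    """Hypothetical helper: make libc's LC_CTYPE use UTF-8 in UTF-8 Mode.

    Returns the locale string that was set, or None if UTF-8 Mode is off
    or none of the candidate locales is available.
    """
    if not sys.flags.utf8_mode:
        return None  # nothing to align: Python follows the libc locale
    for name in _UTF8_LOCALE_CANDIDATES:
        try:
            return locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            continue  # locale not installed here; try the next one
    return None
```

`setlocale()` changes process-wide state and is not thread-safe. An embedding application would therefore need to do this early, before other threads or libraries read the locale.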

> It is unfortunate that the Python UTF-8 Mode, which "ignores the locale", changes the behavior of the locale module, specifically of the locale.getpreferredencoding() function. But the ship has sailed.
> 
> People are used to looking into the "locale" module to get the "locale" encoding. So I prefer to put the function to get the "Python locale encoding" in the locale module.
> 
> I propose to add "current" to the name, since this encoding is usually not the one you are looking for.
> 
> An alternative is to have a single function with an optional parameter. Example:
> 
> * get_locale_encoding() or get_locale_encoding(True) returns the locale encoding
> * get_locale_encoding(False) returns the current locale encoding

-1 on both the names and the idea of once again adding parameters that
change a function's meaning. We should have one function per meaning,
and we really only need the getencoding() interface, since UTF-8 mode
doesn't fit into the locale module's scope.
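The "one function per meaning" position can be sketched as a single `getencoding()` that always reports libc's locale encoding and never consults UTF-8 Mode. This avoids a boolean argument such as `get_locale_encoding(False)` that flips the result's meaning. Python 3.11 later added `locale.getencoding()` with this meaning, and the sketch delegates to it when present. On older versions it assumes a Unix system with `nl_langinfo()`.

```python
import locale


def getencoding():
    """Return libc's current locale encoding, ignoring Python's UTF-8 Mode.

    Sketch only: uses locale.getencoding() on Python 3.11+, otherwise
    nl_langinfo(CODESET), which is available on Unix only.
    """
    if hasattr(locale, "getencoding"):  # added in Python 3.11
        return locale.getencoding()
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET)
    raise NotImplementedError("no libc locale encoding query on this platform")


print(getencoding())
```

Callers who want the encoding Python uses by default would then look elsewhere (e.g. `sys.flags.utf8_mode`), keeping the locale module a thin interface to libc.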

History
Date	User	Action	Args
2021-03-19 14:56:53	lemburg	set	recipients: + lemburg, vstinner, methane, eryksun
2021-03-19 14:56:53	lemburg	link	issue43552 messages
2021-03-19 14:56:53	lemburg	create