
Author vstinner
Recipients vstinner
Date 2011-06-30.12:20:49
Message-id <1309436451.46.0.759652312058.issue12451@psf.upfronthosting.co.za>
Content
In Python 3, open() implicitly uses the locale encoding when opening a text file if the encoding argument is not specified. Some functions rely on this implicit locale encoding even though it is not the right encoding for them. I see at least three cases where the encoding should be changed:

 - UTF-8 should be used instead, for portability: using the locale encoding is a bug in the module
 - ASCII must be used instead: the module doesn't support non-ASCII characters (old file formats, old network protocols, some fields of a document, etc.)
 - ASCII can be used instead: it's just a micro-optimization, since the ASCII encoding is a little bit faster
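
As a rough illustration of the cases above (a sketch, not code from the attached patch), the snippet below opens a scratch file three ways: with no encoding argument, with an explicit UTF-8 encoding, and with an explicit ASCII encoding that rejects non-ASCII text. The scratch file and the variable names are made up for the example.

```python
import os
import tempfile

# Create a scratch file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Implicit: no encoding argument, so the interpreter picks one from
# its environment. The result can differ from machine to machine.
with open(path, "w") as f:
    implicit_encoding = f.encoding

# Explicit UTF-8: the portable choice, same bytes on every platform.
with open(path, "w", encoding="utf-8") as f:
    f.write("caf\u00e9\n")
    utf8_encoding = f.encoding

# Explicit ASCII: non-ASCII text fails loudly instead of being
# written in an environment-dependent encoding.
ascii_rejected = False
try:
    with open(path, "w", encoding="ascii") as f:
        f.write("caf\u00e9\n")
except UnicodeEncodeError:
    ascii_rejected = True

os.remove(path)
print(implicit_encoding, utf8_encoding, ascii_rejected)
```

The value printed for the implicit case depends on the platform and interpreter configuration, which is exactly the portability problem described above.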

To detect the usage of the implicit locale encoding, some functions can be monkeypatched:

 - builtins.open, io.open, _pyio.open
 - io.TextIOWrapper, _pyio.TextIOWrapper
 - other functions that use open/TextIOWrapper directly or indirectly may also be patched, so that the warning is emitted earlier

The attached open_hook.patch implements these hooks (hacks?) in the site module: it emits a ResourceWarning. Use python -Werror to raise an error whenever the locale encoding is used implicitly. If you really want to use the locale encoding, pass encoding='locale' to silence the warning.
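
To make the -Werror behaviour and the encoding='locale' opt-in concrete, here is a hedged, self-contained sketch. The checked_open function is a hypothetical stand-in for the patched open(), not the code from the patch, and the warnings filter is an in-process approximation of -Werror restricted to ResourceWarning.

```python
import locale
import os
import tempfile
import warnings

def checked_open(file, mode="r", encoding=None, **kwargs):
    """Hypothetical stand-in for the patched open(); not the real patch."""
    if "b" not in mode:
        if encoding is None:
            warnings.warn("implicit locale encoding",
                          ResourceWarning, stacklevel=2)
        elif encoding == "locale":
            # Explicit opt-in: resolve the locale encoding without warning.
            encoding = locale.getpreferredencoding(False)
    return open(file, mode, encoding=encoding, **kwargs)

fd, path = tempfile.mkstemp()
os.close(fd)

with warnings.catch_warnings():
    # Turn ResourceWarning into an exception, roughly what -Werror does
    # for every warning category.
    warnings.simplefilter("error", ResourceWarning)

    implicit_raised = False
    try:
        checked_open(path).close()
    except ResourceWarning:
        implicit_raised = True

    # The explicit request for the locale encoding passes silently.
    checked_open(path, encoding="locale").close()
    explicit_ok = True

os.remove(path)
print(implicit_raised, explicit_ok)
```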

Nearly all functions in Python use the implicit locale encoding. For example, Python itself doesn't start with the patch applied and -Werror. If you use -Werror, you have to fix *all* calls to open()/TextIOWrapper before you can locate real bugs, because otherwise the program stops before it reaches the real problems. Each time, you have to check what the expected encoding really is, which takes a lot of time.

I have started this huge project. I'm using ASCII most of the time (especially in Python tests), but I don't know if that is correct. A second step will be required to ensure that these functions really don't use or support non-ASCII characters.

I will use this issue for my commits, attach patches, and more generally discuss this topic.
History
Date User Action Args
2011-06-30 12:20:51 vstinner set recipients: + vstinner
2011-06-30 12:20:51 vstinner set messageid: <1309436451.46.0.759652312058.issue12451@psf.upfronthosting.co.za>
2011-06-30 12:20:50 vstinner link issue12451 messages
2011-06-30 12:20:50 vstinner create