Message 105010 - Python tracker



Author	lemburg
Recipients	Arfrever, lemburg, loewis, pitrou, vstinner
Date	2010-05-05.10:07:00
SpamBayes Score	5.595341e-05
Marked as misclassified	No
Message-id	<4BE14342.2030502@egenix.com>
In-reply-to	<1273051865.07.0.321860913525.issue8610@psf.upfronthosting.co.za>

Content

STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
>> I think that using ASCII is a safer choice in case of errors.
> 
> I chose UTF-8 to keep backward compatibility: PyUnicode_DecodeFSDefaultAndSize() uses UTF-8 if Py_FileSystemDefaultEncoding==NULL. If the OS has no nl_langinfo(CODESET) function at all, Python 3 uses UTF-8.

Ouch, that was a poor choice. In Python we have a tradition to
avoid guessing, if possible. Since we cannot guarantee that the
file system will indeed use UTF-8, it would have been safer to
use ASCII. Not sure why this reasoning wasn't applied to
the file system encoding.

Nothing we can do about it now, though.
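
The fallback rule quoted above can be modeled in a few lines of Python. This is a minimal sketch, not CPython's C implementation: `choose_fs_encoding()` and `platform_codeset_getter()` are hypothetical names, and the codeset lookup is passed in as a function so that a platform without nl_langinfo(CODESET) can be simulated by passing `None`. The only open decision is the `fallback` argument, which is exactly the UTF-8 vs. ASCII question.

```python
import locale
from typing import Callable, Optional

def platform_codeset_getter() -> Optional[Callable[[], str]]:
    """Return a function reading the locale codeset, or None if the
    platform has no nl_langinfo (e.g. Windows)."""
    nl_langinfo = getattr(locale, "nl_langinfo", None)
    if nl_langinfo is None:
        return None
    return lambda: nl_langinfo(locale.CODESET)

def choose_fs_encoding(get_codeset: Optional[Callable[[], str]],
                       fallback: str) -> str:
    """Use the locale codeset if it can be determined, else `fallback`."""
    if get_codeset is not None:
        codeset = get_codeset()
        if codeset:  # treat an empty answer like a missing one
            return codeset
    return fallback

print(choose_fs_encoding(platform_codeset_getter(), "utf-8"))  # this machine's codeset, or the fallback
print(choose_fs_encoding(None, "utf-8"))  # no nl_langinfo: guesses utf-8
print(choose_fs_encoding(None, "ascii"))  # no nl_langinfo: stays strict
```

Passing "ascii" instead of "utf-8" is the change argued for here; the rest of the lookup is identical.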

>> Using UTF-8 may be safe for reading file names, but it's not
>> safe for creating files or directories.
> 
> Well, I don't know. You may be right. And which encoding should be used if the nl_langinfo(CODESET) function is missing: ASCII or UTF-8?
> 
> UTF-8 is also an optimistic choice: I bet that more and more operating systems will move to UTF-8.
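
The difference between reading and creating can be illustrated under one hypothetical assumption: the file system actually stores names in Latin-1, while Python fell back to another encoding because nl_langinfo(CODESET) was unavailable. Strict UTF-8 decoding often (not always) rejects Latin-1 bytes, so a wrong guess tends to surface when reading; encoding a new name as UTF-8 always succeeds, so the wrong guess silently produces a name that Latin-1 tools display differently. A strict ASCII fallback refuses instead. The helper names below are illustrative only.

```python
# Hypothetical scenario: the file system really stores names as Latin-1,
# but Python fell back to another encoding (nl_langinfo(CODESET) missing).
ACTUAL_FS_ENCODING = "latin-1"

def read_name(raw: bytes, fallback: str) -> str:
    """Decode an existing directory entry with the fallback (strict errors)."""
    return raw.decode(fallback)

def create_name(name: str, fallback: str) -> bytes:
    """Encode a new file name with the fallback (strict errors)."""
    return name.encode(fallback)

def fails(func, *args) -> bool:
    """True if func(*args) raises UnicodeError (decode or encode error)."""
    try:
        func(*args)
    except UnicodeError:
        return True
    return False

name = "caf\u00e9"                          # "café"
existing = name.encode(ACTUAL_FS_ENCODING)  # b'caf\xe9' as stored on disk

# Reading: the strict UTF-8 decoder detects that these bytes are not UTF-8.
print("UTF-8 read fails:", fails(read_name, existing, "utf-8"))  # True

# Creating: UTF-8 encoding succeeds, so the wrong guess goes unnoticed...
created = create_name(name, "utf-8")                             # b'caf\xc3\xa9'
print("Latin-1 tools see:", created.decode(ACTUAL_FS_ENCODING))  # cafÃ©

# ...while ASCII refuses to create the name instead of guessing.
print("ASCII create fails:", fails(create_name, name, "ascii"))  # True
```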

I think we should also add a new environment variable to override
the automatic determination of the file system encoding, much like
what we have for the I/O encoding:

PYTHONFSENCODING: Encoding[:errors] used for the file system.

(that would need to go on a new ticket, though)
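
A parser for the proposed variable could look like the sketch below. It is hypothetical: PYTHONFSENCODING is only a proposal in this message, and `parse_fsencoding()` and its defaults are assumptions. It follows the same Encoding[:errors] form as PYTHONIOENCODING and validates both names through the `codecs` registry, so an unknown encoding or error handler raises `LookupError` instead of being guessed.

```python
import codecs
from typing import Mapping, Tuple

def parse_fsencoding(environ: Mapping[str, str],
                     default_encoding: str,
                     default_errors: str = "strict") -> Tuple[str, str]:
    """Parse the proposed PYTHONFSENCODING=Encoding[:errors] variable.

    Hypothetical sketch: an empty or missing part falls back to the
    automatically determined encoding / the given error handler.
    """
    value = environ.get("PYTHONFSENCODING", "")
    encoding, _, errors = value.partition(":")
    encoding = encoding or default_encoding
    errors = errors or default_errors
    # Fail early on unknown names instead of guessing.
    encoding = codecs.lookup(encoding).name  # LookupError if unknown; normalizes the name
    codecs.lookup_error(errors)              # LookupError if unknown
    return encoding, errors

print(parse_fsencoding({}, "ascii"))  # ('ascii', 'strict')
print(parse_fsencoding({"PYTHONFSENCODING": "UTF-8:surrogateescape"}, "ascii"))  # ('utf-8', 'surrogateescape')
```

At startup this would be called as `parse_fsencoding(os.environ, detected)`, where `detected` (a hypothetical name) is the result of the automatic determination.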

History
Date	User	Action	Args
2010-05-05 10:07:03	lemburg	set	recipients: + lemburg, loewis, pitrou, vstinner, Arfrever
2010-05-05 10:07:00	lemburg	link	issue8610 messages
2010-05-05 10:07:00	lemburg	create