Message 105007 - Python tracker



Author	lemburg
Recipients	Arfrever, lemburg, loewis, pitrou, vstinner
Date	2010-05-05.09:16:42
Message-id	<4BE13778.3080108@egenix.com>
In-reply-to	<1273049882.6.0.568296637609.issue8610@psf.upfronthosting.co.za>

Content

STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
>> manpage for nl_langinfo() doesn't mention any errors that could
>> be raised by it
> 
> It's more about get_codeset(). This function can fail for different reasons:
> 
>  - nl_langinfo() result is an empty string: "If item is not valid, a pointer to an empty string is returned," says the manpage
>  - _PyCodec_Lookup() failed: unable to import the encoding codec module, there is no such codec, codec machinery is broken, etc.
>  - the codec has no "name" attribute
>  - strdup() failure (no more memory)
> 
> Do you think that we should fall back to ASCII if the nl_langinfo() result is an empty string, and to UTF-8 otherwise? A get_codeset() failure is very unlikely, and I think that falling back to UTF-8 is just fine. A warning is printed to stderr, so the user should try to understand why get_codeset() failed.

I think that using ASCII is a safer choice in case of errors.
Using UTF-8 may be safe for reading file names, but it's not
safe for creating files or directories.
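
One possible reading of this point, sketched in Python: if UTF-8 is guessed wrongly, creating a file succeeds silently and leaves a name that a differently configured system displays as garbage, whereas an ASCII guess fails loudly before anything is written. The file name and the Latin-1 "real" encoding below are assumptions chosen purely for illustration.

```python
# Hypothetical file name containing non-ASCII characters.
name = "r\u00e9sum\u00e9.txt"

# Wrong guess of UTF-8: encoding succeeds without any error ...
utf8_bytes = name.encode("utf-8")
# ... but a system whose real encoding is Latin-1 (assumed here)
# would show these bytes as mojibake.
seen_by_latin1_system = utf8_bytes.decode("latin-1")

# Guess of ASCII: the same name is rejected up front, so no
# wrongly encoded name is ever written to disk.
try:
    name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Reading existing names is less risky in either case, since a wrong guess there only affects what the program sees, not what ends up on disk.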

I also think that an application should be able to update the
file system encoding in such an error case (and only in such a case).
The application may have better knowledge about how it's being
used and can provide correct encoding information by other means.
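
A minimal Python sketch of the policy proposed here, under stated assumptions: `choose_fs_encoding` and its `app_override` parameter are hypothetical names, not part of CPython, and the real get_codeset() is C code with additional failure modes (such as strdup() running out of memory) that are not modelled.

```python
import codecs
import locale
import sys


def choose_fs_encoding(codeset, app_override=None):
    """Hypothetical helper: pick a file system encoding.

    codeset is the string reported by nl_langinfo(CODESET).
    app_override is consulted only when detection fails.
    """
    if codeset:
        try:
            # Normalise through the codec registry, as get_codeset() does.
            return codecs.lookup(codeset).name
        except LookupError:
            pass  # unknown codec: treat as a detection failure
    # Error case: warn, then let the application supply better knowledge.
    print("warning: unable to detect the locale encoding", file=sys.stderr)
    if app_override is not None:
        return codecs.lookup(app_override).name
    return "ascii"  # the conservative fallback argued for above


# nl_langinfo() does not exist on every platform, so guard the call.
codeset = locale.nl_langinfo(locale.CODESET) if hasattr(locale, "nl_langinfo") else ""
fs_encoding = choose_fs_encoding(codeset)
```

The override is deliberately ignored when detection succeeds, matching the "only in such a case" restriction.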

History
Date	User	Action	Args
2010-05-05 09:16:44	lemburg	set	recipients: + lemburg, loewis, pitrou, vstinner, Arfrever
2010-05-05 09:16:43	lemburg	link	issue8610 messages
2010-05-05 09:16:42	lemburg	create