
Author lemburg
Recipients amaury.forgeotdarc, doerwalter, eric.smith, ezio.melotti, flox, lemburg, vstinner
Date 2010-02-24.10:02:41
Message-id <4B84F940.40800@egenix.com>
In-reply-to <1267004642.18.0.456785037879.issue7649@psf.upfronthosting.co.za>
Content
Amaury Forgeot d'Arc wrote:
> 
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
> 
>> Could you please check for chars above 0x7f first and then use
>> PyUnicode_Decode() instead of the PyUnicode_FromStringAndSize() API
> 
> I concur: PyUnicode_FromStringAndSize() decodes with utf-8 whereas the expected conversion char->unicode should use the default encoding (ascii).
> But why is it necessary to check for chars above 0x7f?

The Python default encoding has to be ASCII compatible,
so it's better to use a short-cut for pure-ASCII characters
and avoid the complete round-trip via a temporary Unicode
object.
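
The short-cut described above can be sketched in plain C as follows. This is only an illustration under stated assumptions: is_pure_ascii() and widen_ascii() are hypothetical helper names, not part of the CPython API, and the output buffer of unsigned int stands in for a Unicode buffer. In real extension code, the -1 result would be the point where the caller falls back to PyUnicode_Decode() with the default encoding.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if every byte in s[0..len) is
   at most 0x7f (pure ASCII), otherwise return 0. */
static int is_pure_ascii(const char *s, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if ((unsigned char)s[i] > 0x7f)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: copy pure-ASCII bytes straight into a buffer
   of code points, with no temporary object in between.
   Returns 0 on success, or -1 if a byte above 0x7f is present, in
   which case the caller is expected to run a full decode instead.
   out must have room for at least len elements. */
static int widen_ascii(const char *s, size_t len, unsigned int *out)
{
    size_t i;
    if (!is_pure_ascii(s, len))
        return -1;
    for (i = 0; i < len; i++)
        out[i] = (unsigned char)s[i];
    return 0;
}
```

Because any valid default encoding has to be ASCII compatible, the fast path gives the same result as a full decode for such input, so only strings with bytes above 0x7f pay for the slower route.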

>> (this API should not have been backported from the Python 3.x
>> in Python 2.6,
> This function is still useful when the chars come from a C string literal in the source code (btw there should be something about the encoding used in C files). But it's not always correctly used even in 3.x, in posixmodule.c for example.

The function is really just yet another interface to the
PyUnicode_DecodeUTF8() API, and its name is misleading in that:

Python 2.x uses the default encoding for converting strings without
a known encoding to Unicode; the docs for the API say that
it decodes Latin-1 (!); and the interface makes it look like
a drop-in replacement for PyString_FromStringAndSize(), which
it isn't for Python 2.x.
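
To make the mismatch concrete, here is a small stand-alone C sketch of how the same bytes come out under Latin-1 and under UTF-8. The names latin1_first() and utf8_first() are hypothetical and exist only for this illustration; the UTF-8 routine is deliberately simplified and does not reject overlong forms or surrogates, so it is not a substitute for PyUnicode_DecodeUTF8().

```c
#include <stddef.h>

/* Hypothetical helper. Latin-1: each byte maps to the code point
   with the same value. Returns -1 for empty input. */
static long latin1_first(const unsigned char *s, size_t len)
{
    return len > 0 ? (long)s[0] : -1;
}

/* Hypothetical helper. UTF-8: decode the first code point, or
   return -1 if the sequence is truncated or ill-formed.
   Simplified: overlong forms and surrogates are not rejected. */
static long utf8_first(const unsigned char *s, size_t len)
{
    size_t need, i;
    long cp;

    if (len == 0)
        return -1;
    if (s[0] < 0x80)
        return s[0];                      /* ASCII: one byte */
    else if ((s[0] & 0xE0) == 0xC0) { need = 1; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 2; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 3; cp = s[0] & 0x07; }
    else
        return -1;                        /* stray continuation or invalid lead byte */

    if (len < need + 1)
        return -1;                        /* truncated sequence */
    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;                    /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    return cp;
}
```

The single byte 0xE9 is U+00E9 under Latin-1 but an error under UTF-8, while the pair 0xC3 0xA9 is U+00E9 under UTF-8 and starts with U+00C3 under Latin-1. Only bytes up to 0x7f give the same answer under both, which is why an API that silently picks one codec is not a drop-in replacement for a byte-string constructor.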

For Python 3.x, the default encoding is fixed to UTF-8, so the
situation is different (though the docs are still wrong).
However, I don't see the advantage of using a less explicit
name over the direct use of PyUnicode_DecodeUTF8().
History
Date User Action Args
2010-02-24 10:02:43 lemburg set recipients: + lemburg, doerwalter, amaury.forgeotdarc, vstinner, eric.smith, ezio.melotti, flox
2010-02-24 10:02:42 lemburg link issue7649 messages
2010-02-24 10:02:41 lemburg create