Author brett.cannon
Recipients amaury.forgeotdarc, brett.cannon, sjmachin
Date 2009-01-03.01:13:44
Message-id <1230945228.36.0.298587550738.issue4626@psf.upfronthosting.co.za>
Content
Here is what I have found out so far.
Python/bltinmodule.c:builtin_compile takes in a PyObject, gets the
char * representation of that object, and passes it to
Python/pythonrun.c:Py_CompileStringFlags. Unfortunately, nothing else
is passed along in the call, so the callee never learns what the
encoding is. That information is lost even though builtin_compile
makes sure that the char * data is encoded using the default encoding
before calling Py_CompileStringFlags.
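
To illustrate why losing the encoding matters, here is a minimal Python sketch (not taken from the C code above; the variable names are made up for the example). It shows that the same source text produces different byte buffers depending on the encoding, and that a consumer that only receives the bytes and guesses wrong ends up with different text.

```python
# Illustration only: a bare byte buffer does not say how it was encoded.
source_text = 'name = "\u00e9"'  # the literal contains a single e-acute

# The same text as two different byte buffers.
as_utf8 = source_text.encode("utf-8")
as_latin1 = source_text.encode("latin-1")

# A consumer that wrongly assumes latin-1 for the UTF-8 buffer
# silently gets different text instead of an error.
misread = as_utf8.decode("latin-1")

# Decoding with the encoding that was actually used round-trips cleanly.
roundtrip = as_utf8.decode("utf-8")
```

The wrong guess does not raise; it just yields a different string, which is why the encoding has to travel with the char * data somehow.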

I just tried setting a PyCF flag to denote that the char * data is
encoded using the default encoding, but Parser/tokenizer.c is not
compiled against unicodeobject.c, so the tokenizer cannot call
PyUnicode_GetDefaultEncoding() to find out which encoding the data is
stored in.

I'm going to try to explicitly convert to UTF-8 and see if that works.
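
As a rough Python-level picture of what is at stake with an explicit UTF-8 conversion, the sketch below feeds UTF-8 bytes to compile() with and without a conflicting coding cookie. This is an illustration under assumptions, not the change being tried in the C code: run_source is a hypothetical helper, and the behaviour described in the comments is what I understand current CPython 3 to do.

```python
def run_source(source):
    """Compile and execute source (str or bytes), return its 'value' global."""
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["value"]

plain = 'value = "\u00e9"\n'
with_cookie = "# -*- coding: latin-1 -*-\n" + plain

# UTF-8 bytes with no cookie: UTF-8 is the default source encoding,
# so the literal survives intact.
utf8_no_cookie = run_source(plain.encode("utf-8"))

# UTF-8 bytes whose cookie claims latin-1: the tokenizer trusts the
# cookie, so the two UTF-8 bytes of the e-acute become two characters.
utf8_conflicting_cookie = run_source(with_cookie.encode("utf-8"))
```

The second case is why converting to UTF-8 alone may not be enough: if the text still carries a cookie naming another encoding, the tokenizer also needs some way to know that the cookie should not be applied to the already-converted data.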