Author Suzumizaki
Recipients Suzumizaki, amaury.forgeotdarc, brett.cannon, eric.snow, ncoghlan, serhiy.storchaka, vstinner
Date 2014-02-05.08:52:27
Message-id <1391590348.73.0.252941573893.issue20485@psf.upfronthosting.co.za>
In-reply-to
Content
Thank you, Victor, for msg210125. I read the May 2011 discussion on the mailing list.

That thread refers to an earlier discussion on the tracker:
"On Windows, don't encode filenames in the import machinery"
http://bugs.python.org/issue11619

Here are my notes, which might be helpful when reviewing those discussions.

-- About Windows CE --
* Windows CE originally had only GetProcAddress(), which takes a wide-character (LPCWSTR) name.
* GetProcAddressA() was added in Windows CE 3.0.
* but the Python community chose the 'A' version to support Windows CE.
* Windows CE continues today as Windows Embedded Compact.
* but no Python 3 build for Windows CE seems to be distributed.

-- About Windows Desktop and Servers --
* Windows desktop and server editions have only GetProcAddress(), with neither an A nor a W suffix.
* GetProcAddress() on Windows desktop and server editions takes an LPCSTR as its 2nd parameter.
* but in this case the parameter is just a null-terminated block of bytes. It is interpreted as neither MBCS nor UTF-8.
* Visual C++ 2010 encodes non-ASCII export symbols as UTF-8.
* Because of the two points directly above, we can pass a UTF-8 encoded string to GetProcAddress().
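
To illustrate the points above, here is a minimal Python sketch of the byte block that would be handed to GetProcAddress(). It does not call the Windows API. The helper export_name_bytes() and the symbol name are hypothetical, and cp932 merely stands in for the MBCS code page of a Japanese Windows installation. The point is only that the lookup compares raw bytes, so the name must be encoded the same way the compiler encoded the export.

```python
def export_name_bytes(name, encoding):
    # Hypothetical helper: the null-terminated byte block for a symbol name.
    return name.encode(encoding) + b"\x00"

# Hypothetical init-function name containing U+65E5 U+672C U+8A9E.
symbol = "PyInit_\u65e5\u672c\u8a9e"

utf8_block = export_name_bytes(symbol, "utf-8")
mbcs_block = export_name_bytes(symbol, "cp932")  # assumed Japanese MBCS code page

print(utf8_block)
print(mbcs_block)
# The two blocks differ, so only one of them can match the exported name.
print(utf8_block == mbcs_block)
```

For a pure ASCII name both encodings give the same bytes, which is why the question only comes up for non-ASCII module names.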

I checked the last point on my Japanese editions of Windows:
* XP Home Edition (32bit)
* Vista Home Premium (64bit)
* Windows 8.1 Pro (64bit)

GetProcAddress (Windows CE)
The type of the 2nd parameter is LPC"W"STR, and the documentation says an LPCSTR version was added in CE 3.0.
http://msdn.microsoft.com/en-us/library/ms885634.aspx

GetProcAddress (Windows Desktop/Server)
The type of the 2nd parameter is LPCSTR, neither LPC"T"STR nor LPC"W"STR.
Note that the example there seems to be wrong in using the TEXT macro.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683212(v=vs.85).aspx

PythonCE (development seems to have stopped at Python 2.5 compatibility)
http://pythonce.sourceforge.net/

Symbols seem to be encoded as UTF-8 inside Windows executables
https://mail.python.org/pipermail/python-dev/2011-May/111325.html

-- About C/C++ Standards --
* C99 guarantees at least 63 significant characters for internal identifiers (only 31 for external ones).
* C99 allows Unicode characters in identifiers.
* but it does not define how the \uNNNN or \UNNNNNNNN forms used inside "quotations" are translated.
* C++11 defines u8"" literals. We can build a UTF-8 char* string inside u8"quotes" with \u escapes.
* but the encoding of source files is platform dependent.
* also, how symbols are exported is platform dependent.
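
As a rough illustration of the u8"" point, the following Python sketch computes the bytes I would expect a C++11 compiler to store for a u8"" literal written with \u escapes. The helper u8_literal_bytes() is hypothetical and handles only the \uNNNN and \UNNNNNNNN forms. It is a model of the expected result, not a replacement for checking a real compiler.

```python
import re

# Matches \uNNNN (4 hex digits) or \UNNNNNNNN (8 hex digits).
_UCN = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

def u8_literal_bytes(body):
    # Hypothetical helper: bytes expected for the C++11 literal u8"<body>".
    # Other escape sequences are not interpreted and pass through unchanged.
    def expand(match):
        return chr(int(match.group(1) or match.group(2), 16))
    return _UCN.sub(expand, body).encode("utf-8")

# A raw string keeps the backslashes, as they would appear in C++ source.
print(u8_literal_bytes(r"PyInit_\u65e5\u672c\u8a9e"))
```

Because the escapes name code points rather than bytes, the result does not depend on the encoding of the source file.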

-- About C/C++ tool kits --
* A Windows executable can hold up to 2048 characters per exported symbol.
* Visual C++ 2010 seems to encode exported symbols as UTF-8.
* gcc has no built-in limit on the length of identifiers.
* Currently, Visual C++ 2010 and LLVM/Clang support using UTF-8 throughout the source code.
* gcc supports only the \uNNNN or \UNNNNNNNN forms.
* For the GetProcAddress() functions, see the notes about Windows above.
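
Since gcc accepts only the escaped forms, one possible workaround is to rewrite a non-ASCII identifier into that spelling before compiling. The sketch below shows the conversion in Python. The helper to_ucn() is hypothetical, and it does not check whether each character is actually permitted in a C identifier.

```python
def to_ucn(identifier):
    # Hypothetical helper: spell non-ASCII characters as \uNNNN or \UNNNNNNNN.
    parts = []
    for ch in identifier:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)                  # ASCII stays as written
        elif cp <= 0xFFFF:
            parts.append("\\u%04X" % cp)      # 4-digit form
        else:
            parts.append("\\U%08X" % cp)      # 8-digit form
    return "".join(parts)

print(to_ucn("PyInit_\u65e5\u672c\u8a9e"))
```

The same source then names the same identifier for compilers that read UTF-8 directly and for gcc, although how each toolchain writes the exported symbol remains platform dependent.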
History
Date                 User        Action  Args
2014-02-05 08:52:28  Suzumizaki  set     recipients: + Suzumizaki, brett.cannon, amaury.forgeotdarc, ncoghlan, vstinner, eric.snow, serhiy.storchaka
2014-02-05 08:52:28  Suzumizaki  set     messageid: <1391590348.73.0.252941573893.issue20485@psf.upfronthosting.co.za>
2014-02-05 08:52:28  Suzumizaki  link    issue20485 messages
2014-02-05 08:52:27  Suzumizaki  create