Author Suzumizaki
Recipients Suzumizaki, amaury.forgeotdarc, brett.cannon, eric.snow, ncoghlan, serhiy.storchaka, vstinner
Date 2014-02-05.08:52:27
Thank you, Victor, for msg210125. I have read the mailing-list discussion from May 2011.

Those posts point to an earlier discussion on the tracker:
"On Windows, don't encode filenames in the import machinery"

Here are my notes, which may be helpful when reviewing those discussions.

-- About Windows CE --
* Windows CE originally had only GetProcAddress().
* GetProcAddressA() was added in Windows CE 3.0.
* but the Python community chose the 'A' version to support Windows CE.
* Windows CE continues today as Windows Embedded Compact.
* but Python 3 does not seem to be distributed for Windows CE.

-- About Windows Desktop and Servers --
* Windows desktop and server editions have only GetProcAddress(), with neither an A nor a W suffix.
* GetProcAddress() on Windows desktop and server editions takes an LPCSTR as the 2nd parameter.
* but in this case the parameter is just a null-terminated block of bytes; it is treated as neither MBCS nor UTF-8.
* Visual C++ 2010 encodes non-ASCII export symbols as UTF-8.
* Because of the two points just above, we can pass a UTF-8 encoded string to GetProcAddress().

I confirmed the last point on my Japanese editions of Windows:
* XP Home Edition (32bit)
* Vista Home Premium (64bit)
* Windows 8.1 Pro (64bit)
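
As a rough illustration of the point that GetProcAddress() treats its 2nd parameter as a plain null-terminated block of bytes, the sketch below builds the bytes that would be passed for a non-ASCII symbol name. The helper name export_symbol_bytes and the sample symbol are hypothetical and not taken from CPython, and the actual GetProcAddress() call is left out so the snippet runs on any platform.

```python
def export_symbol_bytes(symbol: str) -> bytes:
    """Return the null-terminated UTF-8 bytes for an export symbol name.

    Hypothetical helper, for illustration only.
    """
    return symbol.encode("utf-8") + b"\x00"


# Hypothetical symbol: "PyInit_" followed by a Japanese module name
# (U+65E5 U+672C U+8A9E), written with escapes to keep the source ASCII.
name = "PyInit_\u65e5\u672c\u8a9e"

utf8_bytes = export_symbol_bytes(name)

# The same name encoded with the Japanese ANSI code page (cp932) yields
# different bytes, so a lookup using MBCS bytes would not match a symbol
# that the compiler exported as UTF-8.
cp932_bytes = name.encode("cp932") + b"\x00"

print(utf8_bytes)
print(utf8_bytes != cp932_bytes)
```

If the export table really holds UTF-8 bytes, as the observations above suggest, then only utf8_bytes would match byte for byte.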

GetProcAddress (Windows CE)
The type of the 2nd parameter is LPC"W"STR, and the documentation says the LPCSTR version was added in CE 3.0.

GetProcAddress (Windows Desktop/Server)
The type of the 2nd parameter is LPCSTR, neither LPC"T"STR nor LPC"W"STR.
Note that the example there seems to use the TEXT macro incorrectly.

PythonCE (development seems to have stopped at Python 2.5 compatibility)

Symbols seem to be encoded as UTF-8 inside Windows executables

-- About C/C++ Standards --
* C99 says at least 63 initial characters of an internal identifier are significant (only 31 for external identifiers).
* C99 allows Unicode characters in identifiers.
* but it does not define how the \uNNNN or \UNNNNNNNN forms used inside "quoted strings" are translated.
* C++11 defines u8"" literals, so we can build a UTF-8 char* string inside u8"quotes" using the \u forms.
* but the encoding of the source file is platform dependent.
* also, how symbols are exported is platform dependent.
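
To make the escape forms concrete, here is a small sketch of the bytes that a UTF-8 encoding of the 4-digit and 8-digit escapes produces. It uses Python only because its escape syntax mirrors the C one; it says nothing about what a particular C compiler emits. The helper utf8_hex is hypothetical.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))


four_digit = "\u65e5"        # 4-digit form, a character inside the BMP
eight_digit = "\U0001F600"   # 8-digit form, a character outside the BMP

print(utf8_hex(four_digit))   # e6 97 a5
print(utf8_hex(eight_digit))  # f0 9f 98 80
```

These are the same byte sequences a C++11 u8"" literal would contain for the corresponding escapes.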

-- About C/C++ tool kits --
* A Windows executable can contain up to 2048 characters per exported symbol.
* Visual C++ 2010 seems to encode exported symbols as UTF-8.
* gcc has no built-in limit on the length of identifiers.
* Currently, Visual C++ 2010 and LLVM/Clang support UTF-8 throughout the source code.
* gcc supports only the \uNNNN or \UNNNNNNNN forms.
* For the GetProcAddress() functions, see the notes on Windows above.