classification
Title: console encode is not utf-8!!
Type: behavior Stage: resolved
Components: Versions: Python 3.9
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, eryksun, twoone3, vstinner
Priority: normal Keywords:

Created on 2021-02-01 14:15 by twoone3, last changed 2021-02-01 23:46 by twoone3. This issue is now closed.

Files
File name Uploaded Description Edit
Screenshot_2021_0201_225300.png twoone3, 2021-02-01 14:53
389e661314157b8f.jpg twoone3, 2021-02-01 15:10
Messages (12)
msg386069 - (view) Author: twoone3 (twoone3) Date: 2021-02-01 14:15
https://docs.python.org/3/c-api/init_config.html?highlight=pypreconfig_initpythonconfig#c.PyPreConfig
When I use this API, the console encoding does not change, even though utf8_mode is 1.
This is my code:
PyPreConfig_InitPythonConfig(&cfg);
cfg.utf8_mode = -1;
Py_PreInitialize(&cfg);
msg386070 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-02-01 14:18
What's the result of the Py_PreInitialize(&cfg) call?
msg386071 - (view) Author: twoone3 (twoone3) Date: 2021-02-01 14:25
PyPreConfig cfg;
PyPreConfig_InitPythonConfig(&cfg);
cfg.utf8_mode = -1;
PyStatus status = Py_PreInitialize(&cfg);
if (PyStatus_Exception(status)) {
    Py_ExitStatusException(status);
}

I used this to test; no exception is reported.
msg386072 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-02-01 14:30
"utf8_mode = -1" falls back to command line, env vars, locales, and eventually disables UTF-8 mode. Try "cfg.utf8_mode = 1" as documented at https://docs.python.org/3/c-api/init_config.html?highlight=pypreconfig_initpythonconfig#c.Py_PreInitializeFromArgs
msg386073 - (view) Author: twoone3 (twoone3) Date: 2021-02-01 14:51
After I changed it to 1, the console encoding remained unchanged.
I run 'chcp 65001' first,
then I use PyPreConfig and Py_Initialize,
then I 'cout << u8"Chinese中文" << endl;'.
The output is not UTF-8.
msg386074 - (view) Author: twoone3 (twoone3) Date: 2021-02-01 14:53
This is the output (see the attached screenshot).
msg386075 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-02-01 14:56
PyPreConfig.utf8_mode=1 enables the *Python* UTF-8 Mode:
* https://docs.python.org/dev/c-api/init_config.html?highlight=pypreconfig_initpythonconfig#c.PyPreConfig.utf8_mode
* https://docs.python.org/dev/library/os.html#utf8-mode

> then i 'cout << u8"Chinese中文" << endl;

This is C++. C++ is not aware of the Python UTF-8 Mode.

You misunderstood the purpose of the Python UTF-8 Mode.

std::cout must be configured differently. This is not a Python problem. I suggest closing the issue.
msg386076 - (view) Author: twoone3 (twoone3) Date: 2021-02-01 15:10
Look at this!
It really is Python's problem.
msg386077 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-02-01 15:13
It's not a Python problem. The Python configuration API only configures Python's input/output API to UTF-8 mode. It does not affect the C++ input/output cout API.
msg386078 - (view) Author: twoone3 (twoone3) Date: 2021-02-01 15:15
When I use Python 3.6.5, this problem does not occur.
msg386105 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-01 20:35
I assume you're linking to the CRT dynamically, which is shared with "python39.dll", which means you're sharing the configured locale with Python. Since you're not using an isolated configuration, the LC_CTYPE locale will be set to the current user's default locale (configured in "HKCU\Control Panel\International"). 

If the STDOUT low I/O file is in ANSI text mode, and the LC_CTYPE locale is not the default "C" locale, and it's a console file, then C write() does a double translated write. First, the UTF-8 byte string is decoded to wide-character UTF-16 using the current LC_CTYPE locale encoding. Then the wide-character string is encoded back to a byte string using the console output code page. The first step leads to mojibake if the locale encoding isn't UTF-8.

At a minimum, you'll need to add `cfg.configure_locale = 0` in order to prevent Python from configuring the LC_CTYPE locale to the default user locale. 

That said, your code should be written to work in locales other than the default "C" locale. For the past few years, Windows ucrt has supported UTF-8 as a locale encoding, such as via setlocale(LC_CTYPE, ".utf8"). Alternatively, or in addition to the latter, you can use std::wcout with wide-character strings and switch stdout to UTF-8 Unicode mode via _setmode(_fileno(stdout), _O_U8TEXT). In this case, the CRT writes to the console via putwch(), which calls the wide-character WinAPI function WriteConsoleW(). If your code uses UTF-8 byte strings, you'll have to decode them to UTF-16 wide-character strings before writing to stdout.
msg386122 - (view) Author: twoone3 (twoone3) Date: 2021-02-01 23:46
Thank you for solving my problem
History
Date User Action Args
2021-02-01 23:46:11 twoone3 set messages: + msg386122
2021-02-01 20:35:35 eryksun set nosy: + eryksun; messages: + msg386105
2021-02-01 15:15:55 twoone3 set messages: + msg386078
2021-02-01 15:13:58 christian.heimes set status: open -> closed; resolution: not a bug; messages: + msg386077; stage: resolved
2021-02-01 15:10:13 twoone3 set files: + 389e661314157b8f.jpg; messages: + msg386076
2021-02-01 14:56:49 vstinner set nosy: + vstinner; messages: + msg386075
2021-02-01 14:53:55 twoone3 set files: + Screenshot_2021_0201_225300.png; messages: + msg386074
2021-02-01 14:51:10 twoone3 set messages: + msg386073
2021-02-01 14:30:51 christian.heimes set messages: + msg386072
2021-02-01 14:25:52 twoone3 set messages: + msg386071
2021-02-01 14:18:47 christian.heimes set nosy: + christian.heimes; messages: + msg386070
2021-02-01 14:15:27 twoone3 create