classification
Title: Printing specific Unicode characters causes unwanted beeping in Windows 7 console
Type: behavior
Stage: resolved
Components: Unicode, Windows
Versions: Python 3.8
process
Status: closed
Resolution: third party
Dependencies:
Superseder:
Assigned To:
Nosy List: eryksun, ezio.melotti, john_miller, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords:

Created on 2020-09-20 10:10 by john_miller, last changed 2020-09-23 10:53 by john_miller. This issue is now closed.

Messages (6)
msg377214 - Author: (john_miller) Date: 2020-09-20 10:10
I noticed beeping when printing data to the Windows console (cmd) and narrowed the problem down to three specific Unicode characters that produce a beep when printed. This also happens in interactive mode. (I checked every Unicode character up to sys.maxunicode, except the values between 0xD800 and 0xE000, by bisecting in ever finer steps until only those three characters were left producing unwanted beeps.)

Isolated example of bug: for i in ['\u2022', '\u2024', '\u2219']: print(i)

\u0007 beeps too, but this works as intended.

Python version: 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:43:08) [MSC v.1926 32 bit (Intel)]
Windows version: Microsoft Windows 7 Professional (6.1.7601 Service Pack 1 Build 7601)
CMD active code page: 850 (According to chcp-command.) 
sys.stdout.encoding: utf-8

Might be loosely related to PEP 528 (https://www.python.org/dev/peps/pep-0528/)

Addendum:
Redirecting the output, as in 'python.exe script.py >> testfile.txt', produced an error.
This might happen with more Unicode characters but I haven't checked.

Addendum-Example:
Traceback (most recent call last):
  File "pytest2.py", line 38, in <module>
    print(hex(i), ":", chr(i), end='\n')
  File "C:\Python38-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2023' in position 0: character maps to <undefined>
msg377218 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-09-20 15:24
If you've selected an OEM raster font in the console properties instead of a TrueType font (Consolas, Lucida Console, etc), then the console implements some magic to support the OEM character mapping in the raster font. The following shows that encoding to codepage 850 (the system OEM codepage in the UK and Canada) maps the characters that you've flagged, and only those characters, to an ASCII bell (0x07):

    >>> import codecs
    >>> [hex(x) for x in range(0xFFFF)
    ...   if codecs.code_page_encode(850, chr(x), 'replace')[0] == b'\x07']
    ['0x7', '0x2022', '0x2024', '0x2219']

Please confirm whether you're using an OEM raster font. If you are, then this issue can be closed.

---

> UnicodeEncodeError: 'charmap' codec can't encode character 
> '\u2023' in position 0: character maps to <undefined>

Windows Python defaults to using the process active codepage (WinAPI GetACP) for non-console I/O. In Windows 7, the process active codepage is limited to the system ANSI encoding, which is limited to a legacy codepage such as 1252. You can force standard I/O to use UTF-8 via PYTHONIOENCODING, or force all I/O to default to UTF-8 by enabling UTF-8 mode via PYTHONUTF8.
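
To illustrate the redirected case, here is a minimal sketch (not part of the original report). It first confirms that cp1252 has no mapping for U+2023, then runs a child interpreter whose stdout is a pipe, with PYTHONIOENCODING set to utf-8. The helper name run_child is hypothetical and only exists for this example.

```python
import os
import subprocess
import sys


def run_child(code, extra_env):
    """Run `code` in a child interpreter with stdout redirected to a pipe."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Strip the trailing newline that print() adds ("\n" or "\r\n").
    return result.returncode, result.stdout.rstrip(b"\r\n")


# cp1252 has no mapping for U+2023, which is what the traceback reports.
try:
    "\u2023".encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# With PYTHONIOENCODING=utf-8, the redirected stdout is encoded as UTF-8.
returncode, output = run_child(r"print('\u2023')", {"PYTHONIOENCODING": "utf-8"})
print(cp1252_ok, returncode, output)
```

Passing {"PYTHONUTF8": "1"} instead should have the same effect on standard I/O, while also changing the default encoding for other I/O such as open().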

---

For future reference, a console session is hosted by conhost.exe. The cmd.exe shell is a console client application, which uses standard I/O just like python.exe. If you run python.exe from Explorer, you won't see cmd.exe as a parent or in any way involved with the console session.

The confusion stems from how the system pretends that the process that allocates a console session owns the console window. This is a convenience to allow easily identifying and closing a console session in Task Manager, or via taskkill.exe. But it also leads people to confuse the console with the shell that they run -- such as CMD or PowerShell.
msg377231 - Author: (john_miller) Date: 2020-09-20 19:15
It is set to use an OEM raster font.

(Should the information about the font and possible unwanted beeping be mentioned on https://docs.python.org/3/using/windows.html ? Perhaps even a short link regarding "Windows issues" under the print()-entry?)

Thanks for your help.

I was a bit lost on what happens under the hood encoding-wise as sys.stdout.encoding just returns 'utf-8'.

Is CP850 required to encode those characters via CP850 to the bell character?

I'm not clear on how setting Python to use UTF-8 interacts with the regular console.

Basically, there are two fixes for printing arbitrary text to the console without beeping when the font is at its default raster-font setting:
1. Set the font to a TrueType font in the console settings
2. Filter or replace the offending Unicode characters (\u2022, \u2024, \u2219) in the Unicode string before printing it to "sys.stdout".
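
The second option could be sketched roughly as follows, using str.translate. The replacement characters and the names BELL_MAPPED and strip_bell_chars are assumptions chosen for illustration; the three code points are the ones found above with the console set to codepage 850.

```python
# Hypothetical replacements; any ASCII stand-in that does not end up
# as 0x07 in the console's OEM codepage would do.
BELL_MAPPED = {
    0x2022: "*",  # BULLET
    0x2024: ".",  # ONE DOT LEADER
    0x2219: ".",  # BULLET OPERATOR
}


def strip_bell_chars(text):
    """Replace the characters that a raster-font console turns into a bell."""
    return text.translate(BELL_MAPPED)


print(strip_bell_chars("a \u2022 b \u2024 c \u2219 d"))
```

An intentional '\a' (U+0007) is left untouched, since that one beeps as intended.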

With UTF-8 mode enabled (environment variable PYTHONUTF8=1, or "python.exe -X utf8 script.py" called within the console), it still seems to beep, both in interactive mode and when entering the command in the console.

Both these cases still beep.
python.exe -X utf8 -c "print('\u2022')"
set /A PYTHONUTF8=1 & python.exe -c "print('\u2022')" & set PYTHONUTF8=

Running python.exe from Explorer, rather than from a console window opened via cmd.exe, also beeps.

So I assume that with the UTF-8 flag set the font still has to be set to a TTF-Font so it does not beep?

----
As for redirecting STDOUT in the console, the UTF-8 option seems to be sufficient, simply printing the character utf-8 encoded into the file:
python.exe -X utf8 -c "print('\u2023')" >> testfile.txt

A quick test for writing bytes to a file this way (see also Issue4571):
python.exe -X utf8 -c "import sys; sys.stdout.buffer.raw.write(b'\x20\x20a')" >> testfile.txt
msg377235 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-09-20 22:05
> It is set to use an OEM raster font.

Okay, then the issue can either be closed as third-party or changed to a documentation enhancement. It could be documented that Unicode support requires selecting a TrueType font in the console properties. "Raster Fonts" uses OEM with a lossy best-fit conversion (e.g. "α" -> "a").  

> So I assume that with the UTF-8 flag set the font still has to be 
> set to a TTF-Font so it does not beep?

Yes, the conversion to OEM for "Raster Fonts" occurs in the console itself. 

> Is CP850 required to encode those characters via CP850 to the 
> bell character?

Other OEM codepages have a different best-fit mapping. For example, codepage 437 maps U+2022 to ASCII bell, but not U+2024 or U+2219.
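
A rough way to compare codepages is sketched below. It reuses codecs.code_page_encode with the 'replace' handler, as in the earlier check, but only for the three flagged characters. The function name bell_mapped and the CANDIDATES list are hypothetical, and since codecs.code_page_encode exists only on Windows, the sketch returns None elsewhere.

```python
import codecs

# The three characters flagged in this issue.
CANDIDATES = ["\u2022", "\u2024", "\u2219"]


def bell_mapped(code_page, chars=CANDIDATES):
    """Return the characters from `chars` that best-fit to 0x07.

    Returns None where codecs.code_page_encode is unavailable
    (it is only provided on Windows).
    """
    encode = getattr(codecs, "code_page_encode", None)
    if encode is None:
        return None
    return [c for c in chars if encode(code_page, c, "replace")[0] == b"\x07"]


for cp in (437, 850):
    result = bell_mapped(cp)
    print(cp, None if result is None else [hex(ord(c)) for c in result])
```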

> I was a bit lost on what happens under the hood encoding-wise as 
> sys.stdout.encoding just returns 'utf-8'.

For a console file, the io module uses io._WindowsConsoleIO for the raw layer instead of io.FileIO. _WindowsConsoleIO internally uses the console's wide-character (UTF-16) API and converts to and from UTF-8. 

Overriding the standard I/O encoding to UTF-8 isn't needed for _WindowsConsoleIO. It's needed for regular files and pipes when standard I/O is redirected, since those still default to the process active codepage (usually system ANSI).
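
To see which raw layer a given stream uses, something like the following sketch may help. The helper describe_stream is hypothetical. On a Windows console one would expect sys.stdout to report _WindowsConsoleIO, per the explanation above, while a regular file reports FileIO.

```python
import sys
import tempfile


def describe_stream(stream):
    """Report a text stream's encoding and the class name of its raw layer."""
    buffer = getattr(stream, "buffer", None)
    raw = getattr(buffer, "raw", None)
    return {
        "encoding": getattr(stream, "encoding", None),
        "raw": type(raw).__name__ if raw is not None else None,
    }


# Depends on whether stdout is a console, a pipe, or a file.
print(describe_stream(sys.stdout))

# A regular file opened in text mode sits on top of io.FileIO.
with tempfile.TemporaryFile("w", encoding="utf-8") as f:
    file_info = describe_stream(f)
print(file_info)
```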
msg377247 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-09-21 10:29
I've decided to close this issue since extending the "Using Python on Windows" section of the docs to recommend using a TrueType font in the console would only be generally useful for people using Python 3.8 with Windows 7. Python 3.9+ requires Windows 8.1+, which defaults to using a TrueType font in console windows. In Windows 10, the console doesn't even support OEM "Raster Fonts", unless the option is selected to use the legacy console.
msg377366 - Author: (john_miller) Date: 2020-09-23 10:53
> In Windows 10, the console doesn't even support OEM "Raster Fonts", unless the option is selected to use the legacy console.

So a user could still change the setting in order to run some legacy software.

A single sentence about the beeping and a recommendation to use a TrueType font instead of the OEM raster font in the console section of "Using Windows" might not hurt.

Seeing as there are lots of legacy features in the console that still work, setting a raster font may well remain a compatibility option in future Windows versions.
History
Date                 User         Action  Args
2020-09-23 10:53:34  john_miller  set     messages: + msg377366
2020-09-21 10:29:44  eryksun      set     status: open -> closed; resolution: third party; messages: + msg377247; stage: resolved
2020-09-20 22:05:16  eryksun      set     messages: + msg377235
2020-09-20 19:15:10  john_miller  set     messages: + msg377231
2020-09-20 15:24:48  eryksun      set     nosy: + eryksun; messages: + msg377218
2020-09-20 13:02:38  vstinner     set     nosy: - vstinner
2020-09-20 10:10:35  john_miller  create