classification
Title: pip3 show causing Error for ConfigParser
Type: behavior Stage: resolved
Components: Versions:
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, mdileep, xtreak
Priority: normal Keywords:

Created on 2018-11-02 17:27 by mdileep, last changed 2019-04-21 20:46 by eryksun. This issue is now closed.

Messages (5)
msg329144 - Author: Dileep (mdileep) Date: 2018-11-02 17:27
I'm receiving an error while viewing the package info of ConfigParser.

The command `pip3 show ConfigParser` doesn't cause any error on its own, whereas the following batch script results in this error:

`for /f "tokens=1,2 delims=:" %a in ('pip3 show ConfigParser') do echo %a : %b`

It seems to have something to do with the author name or with pip3's Unicode handling.
 
Traceback (most recent call last):
  File "c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\logging\__init__.py", line 995, in emit
    stream.write(msg)
  File "c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages\pip\_vendor\colorama\ansitowin32.py", line 141, in write
    self.write_and_convert(text)
  File "c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages\pip\_vendor\colorama\ansitowin32.py", line 169, in write_and_convert
    self.write_plain_text(text, cursor, len(text))
  File "c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages\pip\_vendor\colorama\ansitowin32.py", line 174, in write_plain_text
    self.wrapped.write(text[start:end])
  File "c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0141' in position 8: character maps to <undefined>
Call stack:
  File "c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\Scripts\pip3.exe\__main__.py", line 9, in <module>
    sys.exit(main())
  File "c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages\pip\_internal\__init__.py", line 78, in main
    return command.main(cmd_args)
  File "c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages\pip\_internal\cli\base_command.py", line 143, in main
    status = self.run(options, args)
  File "c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages\pip\_internal\commands\show.py", line 47, in run
    results, list_files=options.files, verbose=options.verbose):
  File "c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages\pip\_internal\commands\show.py", line 145, in print_results
    logger.info("Author: %s", dist.get('author', ''))
Message: 'Author: %s'
Arguments: ('Łukasz Langa',)
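The failing step in the traceback above is the encode call in `encodings\cp1252.py`. As a minimal sketch (the helper name `try_encode` is made up for illustration and is not part of pip or CPython), the same error can be reproduced without pip by encoding the logged line by hand:

```python
# The line pip was trying to log; U+0141 is the "Ł" in the author's name.
line = "Author: \u0141ukasz Langa"


def try_encode(text, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


# cp1252 has no mapping for U+0141, so this yields the exception object.
result = try_encode(line, "cp1252")
print(type(result).__name__, "at position", getattr(result, "start", None))

# UTF-8 can represent every Unicode character, so this yields bytes.
print(try_encode(line, "utf-8"))
```

The reported position counts characters in the formatted message, so the prefix `Author: ` accounts for the offset before the "Ł".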
msg329667 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-11-11 06:53
Thanks for the report. This tracker is for issues related to CPython. I think this is an issue with pip, so please file it at https://github.com/pypa/pip/issues. Searching for the error in your report turns up some related issues, specifically on Windows machines, when processing non-ASCII characters in metadata.

* https://github.com/dib-lab/khmer/issues/1565
* https://github.com/pypa/pip/issues/1291
* https://github.com/pypa/pip/issues?q=is%3Aissue+sort%3Aupdated-desc+unicode+is%3Aopen

I propose closing this as third party.
msg329670 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-11-11 08:45
Sorry, I overlooked the part where it says this works on the command line but throws an error in a batch script.
msg340620 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-04-21 19:26
I am able to reproduce this by explicitly setting PYTHONIOENCODING to ascii. But I am not sure about the defaults on Windows (are they related to your environment, or only to the OS?), whether something has changed since 3.6 given that it fails when executed under a batch script, or whether to close this as a pip-related third-party issue.

(py37-venv) ➜  cpython git:(master) ✗ PYTHONIOENCODING=ascii pip3 show ConfigParser
Name: configparser
Version: 3.7.4
Summary: Updated configparser from Python 3.7 for Python 2.6+.
Home-page: https://github.com/jaraco/configparser/
--- Logging error ---
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/logging/__init__.py", line 1036, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u0141' in position 8: ordinal not in range(128)
Call stack:
  File "/Users/karthikeyansingaravelan/stuff/python/py37-venv/bin/pip3", line 11, in <module>
    sys.exit(main())
  File "/Users/karthikeyansingaravelan/stuff/python/py37-venv/lib/python3.7/site-packages/pip/_internal/__init__.py", line 78, in main
    return command.main(cmd_args)
  File "/Users/karthikeyansingaravelan/stuff/python/py37-venv/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 176, in main
    status = self.run(options, args)
  File "/Users/karthikeyansingaravelan/stuff/python/py37-venv/lib/python3.7/site-packages/pip/_internal/commands/show.py", line 47, in run
    results, list_files=options.files, verbose=options.verbose):
  File "/Users/karthikeyansingaravelan/stuff/python/py37-venv/lib/python3.7/site-packages/pip/_internal/commands/show.py", line 145, in print_results
    logger.info("Author: %s", dist.get('author', ''))
Message: 'Author: %s'
Arguments: ('\u0141ukasz Langa',)
Author-email: lukasz@langa.pl
License: UNKNOWN
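The effect of PYTHONIOENCODING in the reproduction above can also be checked without pip. The following is only a sketch: the helper `run_child` is a hypothetical name, and the child process simply prints the author's first name to a pipe. Unlike pip, which logs the failure and carries on, a bare `print` makes the child exit with an error.

```python
import os
import subprocess
import sys


def run_child(io_encoding):
    """Run a child Python that prints a name containing U+0141 to a pipe."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", "print('\\u0141ukasz')"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


ascii_run = run_child("ascii")
utf8_run = run_child("utf-8")

# With ascii the child fails while writing to stdout.
print(ascii_run.returncode, b"UnicodeEncodeError" in ascii_run.stderr)

# With utf-8 the child succeeds; the raw bytes are printed to keep this
# demo independent of the parent's own stdout encoding.
print(utf8_run.returncode, utf8_run.stdout)
```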
msg340621 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-04-21 20:46
In Windows, Python defaults to the system ANSI codepage (e.g. 1252 in the West) for non-console standard I/O. For the case of a `for /f` loop in CMD, stdout is a pipe, so Python defaults to writing ANSI encoded text to its end of the pipe. I recommend overriding the encoding to UTF-8 using the PYTHONIOENCODING environment variable. 

CMD uses the console's output codepage to decode bytes read from its end of the pipe, so the batch script should temporarily change the console codepage to UTF-8 via `chcp.com 65001`. Note that this won't work if CMD is running without a console (i.e. a DETACHED_PROCESS), in which case it defaults to ANSI. (I don't recommend running without a console. If no window is required, use CREATE_NO_WINDOW or a hidden window instead.) First save the current console codepage, parsed from the output of running `chcp.com` without arguments. Then you can restore the original console codepage after the loop.

After decoding the text, CMD's `echo` command writes to the console using the wide-character WriteConsoleW function, so there's no problem at this stage -- up to the limits of the console's text support. FYI, in lieu of Python getting the blame for this too, the Windows console can only render Basic Multilingual Plane (i.e. UCS-2) text, and it doesn't support automatic font fallback or complex scripts. If the console can't display a character, it displays the font's default glyph (e.g. an empty rectangle), or two default glyphs for a surrogate pair. However, we can still copy text from the console in this case.
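For readers who would rather avoid the codepage juggling in CMD, the same "pipe the output and split each line" idea can be sketched in Python instead of batch. This is an illustration under assumptions, not a recommendation from the thread: `pip_show` and `parse_fields` are hypothetical helper names, pip is assumed to be runnable as `python -m pip`, and the split happens at the first colon only, which differs from `tokens=1,2 delims=:` (that form would cut a URL value at its second colon).

```python
import os
import subprocess
import sys


def pip_show(package):
    """Run `pip show` with UTF-8 standard I/O and return the decoded text."""
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    completed = subprocess.run(
        [sys.executable, "-m", "pip", "show", package],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    # The child wrote UTF-8 to its end of the pipe, so decode it as UTF-8.
    return completed.stdout.decode("utf-8")


def parse_fields(text):
    """Split each 'Key: value' line at the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Sample text shaped like the output quoted earlier in this issue.
sample = (
    "Name: configparser\n"
    "Home-page: https://github.com/jaraco/configparser/\n"
    "Author: \u0141ukasz Langa\n"
)
for key, value in parse_fields(sample).items():
    # The !a conversion escapes non-ASCII, so this prints on any stdout.
    print(f"{key} : {value!a}")
```

On a machine with the package installed, `parse_fields(pip_show("configparser"))` would be the analogue of the original `for /f` loop.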
History
Date                 User     Action  Args
2019-04-21 20:46:42  eryksun  set     status: open -> closed
                                      type: behavior
                                      messages: + msg340621
                                      resolution: not a bug
                                      stage: resolved
2019-04-21 19:26:20  xtreak   set     nosy: + eryksun
                                      messages: + msg340620
2018-11-11 08:45:45  xtreak   set     messages: + msg329670
2018-11-11 06:53:20  xtreak   set     nosy: + xtreak
                                      messages: + msg329667
2018-11-02 17:27:26  mdileep  create