classification
Title: [Python 2] pip error on windows whose current user name contains non-ascii characters
Type: crash Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: third party
Dependencies: Superseder:
Assigned To: Nosy List: Marcus.Smith, Suzumizaki, dstufft, ncoghlan, paul.moore, serhiy.storchaka, tanbro-liu, vstinner, xtreak
Priority: normal Keywords: patch

Created on 2015-05-28 02:20 by tanbro-liu, last changed 2018-12-22 10:13 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 9911 closed xtreak, 2018-10-16 12:15
Messages (7)
msg244244 - Author: (tanbro-liu) Date: 2015-05-28 02:20
On Windows 8.1 x64, when the current user name contains non-ASCII characters, running ``pip`` on the command line fails with the following error::

	C:\Users\雪彦>pip
	Traceback (most recent call last):
	  File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
		"__main__", fname, loader, pkg_name)
	  File "C:\Python27\lib\runpy.py", line 72, in _run_code
		exec code in run_globals
	  File "C:\Python27\Scripts\pip.exe\__main__.py", line 9, in <module>
	  File "C:\Python27\lib\site-packages\pip\__init__.py", line 210, in main
		cmd_name, cmd_args = parseopts(args)
	  File "C:\Python27\lib\site-packages\pip\__init__.py", line 165, in parseopts
		parser.print_help()
	  File "C:\Python27\lib\optparse.py", line 1676, in print_help
		file.write(self.format_help().encode(encoding, "replace"))
	  File "C:\Python27\lib\optparse.py", line 1656, in format_help
		result.append(self.format_option_help(formatter))
	  File "C:\Python27\lib\optparse.py", line 1639, in format_option_help
		result.append(group.format_help(formatter))
	  File "C:\Python27\lib\optparse.py", line 1120, in format_help
		result += OptionContainer.format_help(self, formatter)
	  File "C:\Python27\lib\optparse.py", line 1091, in format_help
		result.append(self.format_option_help(formatter))
	  File "C:\Python27\lib\optparse.py", line 1080, in format_option_help
		result.append(formatter.format_option(option))
	  File "C:\Python27\lib\optparse.py", line 322, in format_option
		help_text = self.expand_default(option)
	  File "C:\Python27\lib\site-packages\pip\baseparser.py", line 110, in expand_default
		return optparse.IndentedHelpFormatter.expand_default(self, option)
	  File "C:\Python27\lib\optparse.py", line 288, in expand_default
		return option.help.replace(self.default_tag, str(default_value))
	UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-10: ordinal not in range(128)

I think we can modify line 288 of /lib/optparse.py to avoid this error on Windows::

  -- return option.help.replace(self.default_tag, str(default_value))
  ++ return option.help.replace(
  ++     self.default_tag,
  ++     default_value.encode(sys.getfilesystemencoding())
  ++     if isinstance(default_value, unicode)
  ++     else str(default_value)
  ++ )
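
The failure at `str(default_value)` can be reproduced in isolation. The following is a minimal sketch, written for Python 3 so that it can be run today; it mimics Python 2's implicit ASCII encoding with an explicit `.encode("ascii")`. The path is an assumption inferred from the prompt shown in the traceback (pip's actual default value is not shown in this report), and `ascii_encode_error` is a hypothetical helper name.

```python
# Assumed default value: some path under the user's home directory.
# Only the prefix matters for the error position.
default_value = "C:\\Users\\雪彦"


def ascii_encode_error(text):
    """Return the UnicodeEncodeError raised by ASCII-encoding text, or None."""
    try:
        # Python 2's str(u"...") performs this encoding implicitly.
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


err = ascii_encode_error(default_value)
# err.start is the index of the first offending character;
# err.end is one past the last one.
print(err.start, err.end - 1)  # prints: 9 10
```

The nine ASCII characters of `C:\Users\` come before the two non-ASCII characters, which is consistent with "position 9-10" in the traceback.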
msg326134 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-09-23 05:48
Thanks for the patch. Would you like to make a GitHub PR? I think this is a general problem in optparse when a default value contains Unicode characters and the help string uses %default. The same code is present in Python 3, but there strings are Unicode by default. Example code is below:

# -*- coding: utf-8 -*-

from optparse import OptionParser

parser = OptionParser()
parser.add_option("-f", "--file", dest="filename",
                  help="write to FILE. Default value %default", metavar="FILE", default="早上好")

(options, args) = parser.parse_args()

$ python3.6 ../backups/bpo24307.py --help
Usage: bpo24307.py [options]

Options:
  -h, --help            show this help message and exit
  -f FILE, --file=FILE  write to FILE. Default value 早上好

$ python2.7 ../backups/bpo24307.py --help
Traceback (most recent call last):
  File "../backups/bpo24307.py", line 9, in <module>
    (options, args) = parser.parse_args()
  File "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 1400, in parse_args
    stop = self._process_args(largs, rargs, values)
  File "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 1440, in _process_args
    self._process_long_opt(rargs, values)
  File "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 1515, in _process_long_opt
    option.process(opt, value, values, self)
  File "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 789, in process
    self.action, self.dest, opt, value, values, parser)
  File "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 811, in take_action
    parser.print_help()
  File "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 1670, in print_help
    file.write(self.format_help().encode(encoding, "replace"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 148: ordinal not in range(128)


Thanks
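
For comparison, here is a minimal sketch of how the same option renders under Python 3, where `str` is already Unicode. It builds the help text with `format_help()` instead of parsing `sys.argv`; the `build_parser` helper, the `prog` name and the fixed formatter width are illustrative choices, not part of the script above.

```python
from optparse import IndentedHelpFormatter, OptionParser


def build_parser():
    # A fixed width keeps the wrapping independent of the terminal size.
    parser = OptionParser(prog="bpo24307.py",
                          formatter=IndentedHelpFormatter(width=80))
    parser.add_option("-f", "--file", dest="filename", metavar="FILE",
                      help="write to FILE. Default value %default",
                      default="早上好")
    return parser


# %default is expanded via str(default_value), which is a no-op for a
# Python 3 str, so no implicit ASCII conversion takes place.
help_text = build_parser().format_help()
print(help_text)
```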
msg327322 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-10-08 06:01
Since @tanbro-liu hasn't responded, I am proposing this as an easy issue. The issue is that %default in optparse doesn't handle unicode values. The fix would be to turn the patch in msg244244 into a PR, attributing it to the original author, and to add a test called test_unicode_default with a unicode default value, similar to test_float_default [1], which uses a float default value.

[1] https://github.com/python/cpython/blob/4a7dd30f5810e8861a3834159a222ab32d5c97d0/Lib/test/test_optparse.py#L607
msg327881 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-10-17 09:45
I suppose there is a similar issue in Python 3 with bytes default.

Using unicode() in Python 2 will make the help string a Unicode string, which can cause an issue with translated help strings. It will also cause an issue with non-ASCII 8-bit strings.

Using repr() looks like the right way to solve such issues, but it will change the output for 8-bit strings.
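
One way to try out the `repr()` idea without touching optparse itself is a formatter subclass. The sketch below targets Python 3 and mirrors the logic of `expand_default`; `ReprDefaultFormatter` and `render_help` are hypothetical names, and this illustrates the trade-off rather than proposing a patch.

```python
import optparse


class ReprDefaultFormatter(optparse.IndentedHelpFormatter):
    """Expand %default with repr() instead of str()."""

    def expand_default(self, option):
        if self.parser is None or not self.default_tag:
            return option.help
        default_value = self.parser.defaults.get(option.dest)
        if default_value is optparse.NO_DEFAULT or default_value is None:
            # Keep the stock placeholder text for options without a default.
            text = self.NO_DEFAULT_VALUE
        else:
            text = repr(default_value)
        return option.help.replace(self.default_tag, text)


def render_help(default):
    # A fixed width keeps the wrapping independent of the terminal size.
    parser = optparse.OptionParser(
        prog="demo", formatter=ReprDefaultFormatter(width=80))
    parser.add_option("-f", "--file", dest="filename", metavar="FILE",
                      help="write to FILE. Default value %default",
                      default=default)
    return parser.format_help()


print(render_help("out.txt"))
print(render_help(b"\xe6\x97\xa9"))
```

With this formatter a plain string default is shown with quotes (`'out.txt'`), which is the change in output mentioned above.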
msg327883 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-10-17 10:02
pip is not part of Python 2, so I suggest closing this issue as "third party".

I dislike changing optparse just for pip. For me, the bug should be fixed in pip, not in optparse. I see a high risk of breaking applications which currently work as expected. If the default value is a non-ASCII string, unicode() will raise a UnicodeDecodeError.
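
The `UnicodeDecodeError` described here can be emulated in Python 3. In Python 2, calling `unicode()` on an 8-bit string decodes it with the default ASCII codec; the sketch below makes that decoding explicit. The sample bytes are the UTF-8 encoding of the default used in msg326134, and `ascii_decode_error` is a hypothetical helper name.

```python
# UTF-8 bytes of the default value from msg326134, standing in for a
# Python 2 non-ASCII 8-bit str.
raw_default = "早上好".encode("utf-8")


def ascii_decode_error(data):
    """Return the UnicodeDecodeError raised by ASCII-decoding data, or None."""
    try:
        # Python 2's unicode(data) decodes with the ASCII codec by default.
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


err = ascii_decode_error(raw_default)
# The first offending byte and its offset within the data.
print(hex(raw_default[err.start]), err.start)  # prints: 0xe6 0
```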
msg327887 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-10-17 10:16
@Victor I think this is an issue with optparse, exposed by pip, where it can't handle non-ASCII strings for %default. I can see similar places where non-ASCII strings can cause issues, such as unicode choices in argparse (issue35009). I think this is a general issue where str() is used and non-ASCII strings throw this error. I am quite new to unicode, so I don't know whether this needs to be fixed in Python 2.7 or whether it's an error on the user's end where their script needs to be fixed.
msg332337 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-22 08:35
Even if you encode the Unicode default for output, the user cannot specify the same value unless you use a custom converter. For example, if you encode u"早上好" as the string "\xe6\x97\xa9\xe4\xb8\x8a\xe5\xa5\xbd" (in UTF-8), the user can only specify the argument as an 8-bit string "\xe6\x97\xa9\xe4\xb8\x8a\xe5\xa5\xbd", which differs from the Unicode string u"早上好".

Even if you use a custom converter which decodes 8-bit strings to Unicode, it makes sense to specify the default value as an encoded string, because it will be passed to the converter.

Non-ASCII unicode values were never supported as default values. This issue is a feature request rather than a bug report. It is too late to add new features to 2.7. The right solution is to upgrade to Python 3. After all, solving issues like this was one of the purposes of creating Python 3.
History
Date User Action Args
2018-12-22 10:13:09 serhiy.storchaka set status: open -> closed; resolution: third party; stage: patch review -> resolved
2018-12-22 08:35:07 serhiy.storchaka set messages: + msg332337
2018-10-17 10:16:11 xtreak set messages: + msg327887
2018-10-17 10:02:53 vstinner set nosy: + vstinner; messages: + msg327883
2018-10-17 09:55:45 vstinner set title: pip error on windows whose current user name contains non-ascii characters -> [Python 2] pip error on windows whose current user name contains non-ascii characters
2018-10-17 09:45:07 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg327881
2018-10-16 12:15:11 xtreak set keywords: + patch; stage: patch review; pull_requests: + pull_request9269
2018-10-08 06:01:58 xtreak set messages: + msg327322
2018-09-23 05:48:25 xtreak set messages: + msg326134
2018-09-22 08:04:15 xtreak set nosy: + xtreak
2015-06-11 07:34:54 Suzumizaki set nosy: + Suzumizaki
2015-05-28 02:27:57 ned.deily set nosy: + paul.moore, ncoghlan, dstufft, Marcus.Smith
2015-05-28 02:20:59 tanbro-liu create