classification
Title: optparse: various problems with unicode and gettext
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: Sean.Wang, akuchling, aronacher, ash, bethard, eric.araujo, ezio.melotti, gward, iritkatriel, ivilata, loewis, sampablokuper, vstinner
Priority: normal Keywords: patch

Created on 2008-05-20 15:31 by ash, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
test_optparse.py ash, 2008-05-20 15:31
optparse.py.patch ash, 2008-05-20 15:42
optparse_unicode.py ivilata, 2008-06-18 08:13 Show optparse's string type inconsistency.
optparse_unicode2.py ivilata, 2008-06-18 08:26 Optparse's string type inconsistency.
issue2931.txt akuchling, 2015-04-13 17:52
Messages (21)
msg67129 - (view) Author: Alexey Shamrin (ash) Date: 2008-05-20 15:31
In the process of trying to use optparse with Russian messages, I found
several problems with gettext and unicode handling:

1. optparse.OptionParser.error function doesn't work with unicode argument
2. optparse doesn't work when its error messages are gettext-translated
3. optparse fails running 'prog.py --help > out.txt' with unicode help
(at least on my system: Windows XP, Russian)

I have attached a file demonstrating these problems: test_optparse.py.
You can run it either using nose[1] or directly, manually uncommenting
test_* functions one-by-one.

[1]: http://www.somethingaboutorange.com/mrl/projects/nose/

Here's the result of running `nosetests test_optparse.py`:

EEF
======================================================================
ERROR: OptionParser.error function doesn't work with unicode argument
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python25\lib\site-packages\nose-0.10.2-py2.5.egg\nose\case.py", line 182, in runTest
    self.test(*self.arg)
  File "C:\work\test_optparse.py", line 10, in test_unicode_error
    optparse.OptionParser().error(russian_text)
  File "C:\Python25\lib\optparse.py", line 1562, in error
    self.exit(2, "%s: error: %s\n" % (self.get_prog_name(), msg))
  File "C:\Python25\lib\optparse.py", line 1551, in exit
    sys.stderr.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 28-34: ordinal not in range(128)

======================================================================
ERROR: optparse doesn't work when its error messages are gettext-translated
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python25\lib\site-packages\nose-0.10.2-py2.5.egg\nose\case.py", line 182, in runTest
    self.test(*self.arg)
  File "C:\work\test_optparse.py", line 25, in test_translated_unicode_error_message
    optparse.OptionParser().parse_args(["--unknown"])
  File "C:\Python25\lib\optparse.py", line 1380, in parse_args
    self.error(str(err))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-6: ordinal not in range(128)

======================================================================
FAIL: optparse fails running 'prog.py --help > out.txt' with unicode help
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python25\lib\site-packages\nose-0.10.2-py2.5.egg\nose\case.py", line 182, in runTest
    self.test(*self.arg)
  File "C:\work\test_optparse.py", line 42, in test_redirected_unicode_help
    assert '?????' not in dummy_stdout.getvalue()
AssertionError

----------------------------------------------------------------------
Ran 3 tests in 0.000s

FAILED (errors=2, failures=1)
msg67130 - (view) Author: Alexey Shamrin (ash) Date: 2008-05-20 15:42
I've also attached a patch that fixes all these issues and also allows
the word "error" to be translated with gettext.

Regarding the use of `locale.getpreferredencoding` instead of
`sys.getdefaultencoding`: on my system (Windows XP, Russian) I get:

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, locale
>>> sys.getdefaultencoding()
'ascii'
>>> locale.getpreferredencoding()
'cp1251'

Using cp1251 on my system makes much more sense. It is the default
encoding throughout the system, for example in Notepad.
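
For readers on Python 3 (an assumption; the session above is Python 2.5), the difference between the two calls can be sketched roughly as follows. The helper name `encode_for_output` is hypothetical and is not part of optparse or of the attached patch; it only illustrates encoding with the locale's preferred encoding and the "replace" error handler.

```python
import locale
import sys


def encode_for_output(text, encoding=None):
    """Encode text for a byte stream, replacing unencodable characters.

    Hypothetical helper: when no encoding is given, fall back to the
    locale's preferred encoding rather than sys.getdefaultencoding().
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return text.encode(encoding, "replace")


# On Python 3 this is always 'utf-8'; on Python 2 it was 'ascii'.
default_encoding = sys.getdefaultencoding()

# The preferred encoding depends on the system locale (e.g. 'cp1251').
preferred_encoding = locale.getpreferredencoding(False)

# cp1251 can represent Cyrillic text, so it survives a round trip.
cyrillic_bytes = encode_for_output("Русский", "cp1251")

# ASCII cannot, so every character is replaced by a question mark.
ascii_bytes = encode_for_output("Русский", "ascii")
```

With "replace", unencodable characters degrade to `?` instead of raising `UnicodeEncodeError`, which is why choosing an encoding that matches the system matters for readable output.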
msg68329 - (view) Author: Sam Pablo Kuper (sampablokuper) Date: 2008-06-17 15:59
Using non-ASCII characters in an optparse help string also causes 
UnicodeDecodeErrors. Here's the relevant part of the traceback:

File "/home/spk30/opt/ActivePython-2.5/lib/python2.5/optparse.py", line 1655, in print_help
    file.write(self.format_help().encode(encoding, "replace"))

NB. Adding an encoding declaration at the beginning of the Python
script which used a non-ASCII character in an optparse help string
didn't solve the problem.
msg68354 - (view) Author: Alexey Shamrin (ash) Date: 2008-06-17 23:29
sampablokuper, I don't think your problem is relevant to this issue. In
addition to the encoding declaration, you should use unicode strings:
u"your non-ASCII text". Or wait for Python 3.0, where strings will be
unicode by default.
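
A minimal sketch of what this looks like on Python 3, where every string literal is already unicode. The option name `--show` and the help text are made up for illustration, and an in-memory buffer stands in for a redirected stdout.

```python
import io
import optparse

RUSSIAN_HELP = "Справка"  # sample non-ASCII help text (illustrative only)

parser = optparse.OptionParser(prog="prog")
parser.add_option("--show", action="store_true", help=RUSSIAN_HELP)

# Stand-in for 'prog.py --help > out.txt': write the help to a text buffer.
buffer = io.StringIO()
parser.print_help(buffer)
help_output = buffer.getvalue()
```

Here `help_output` is a `str`, so no implicit ASCII conversion takes place; any encoding happens only when the text is written to a real file or terminal.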
msg68357 - (view) Author: Sam Pablo Kuper (sampablokuper) Date: 2008-06-18 03:43
ash, you are correct; my bad. Thanks for the heads-up.
msg68359 - (view) Author: Ivan Vilata i Balaguer (ivilata) Date: 2008-06-18 08:13
What I find most bothersome is that ``optparse`` is being inconsistent
in the types of localised strings it expects. It needs Unicode strings
for snippets forming part of the help message, while it expects normal
strings in other places like ``OptionParser.error()``, a fact which
isn't documented at all, BTW.

I've been developing a medium-sized app lately with localised messages all
over the place using several packages in the standard library and
``optparse``'s help messages are the only place where Unicode strings
have been required. I'm not saying that ``optparse`` shouldn't use
Unicode, but it'd be nice if it was consistent and the fact was documented.

I'm attaching a tiny script which uses ``optparse``.  Just try to change
any occurrence of the ``s`` normal string to Unicode ``us`` or
vice-versa, then call the program with ``--help`` or no arguments (it
requires one) and you get a ``UnicodeError``. Thanks!
msg68360 - (view) Author: Ivan Vilata i Balaguer (ivilata) Date: 2008-06-18 08:26
The attached version of ``optparse_unicode.py`` doesn't depend on a
UTF-8 locale, sorry.
msg90253 - (view) Author: Alexey Shamrin (ash) Date: 2009-07-08 07:30
More than a year has passed since I reported this... Could someone suggest
how to move this forward? If needed, I can try to improve the patch, the
test, or the description of this issue. Should I, for example, split this
into separate issues?
msg110638 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-18 11:55
Alexey, there would be a much better chance of getting this accepted if you could supply a patch file that also included unit tests.
msg120534 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-11-05 20:49
It would be nice to test argparse for the same behavior.
msg120598 - (view) Author: Steven Bethard (bethard) * (Python committer) Date: 2010-11-06 08:34
Yep, argparse almost certainly has the same kind of problems - I basically copied the optparse gettext behavior into argparse because I don't really know how that stuff works but figured people must have wanted what was in there. ;-)
msg130737 - (view) Author: Ivan Vilata i Balaguer (ivilata) Date: 2011-03-13 11:34
After so much time I've checked again with the little script I sent and I see that it doesn't happen under Python 2.7 (2.7.1+), but it does under 2.6 (2.6.6) and 2.5 (2.5.5).
msg130745 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-13 15:08
I’m afraid 2.5 and 2.6 don’t get bug fixes any more, only security fixes.  For 2.7 and 3.x, even if your bug can’t be reproduced, I think it would be useful to add the test to prevent a regression.
msg130747 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-03-13 15:16
+1
msg240683 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2015-04-13 17:52
I've turned ash's test program into a bunch of test cases against Python 3.5 trunk.  Is it worth committing them?
msg241097 - (view) Author: Greg Ward (gward) (Python committer) Date: 2015-04-15 13:15
> I've turned ash's test program into a bunch of test cases against 
> Python 3.5 trunk.  Is it worth committing them?

Yeah, probably. Review comments...

+        try:
+            self.parser.error(RUSSIAN_TEXT)
+        except InterceptedError:
+            pass

Why not self.assertRaises()?

Also, when I run the test on its own, it prints

'''
Usage: regrtest.py [options]

regrtest.py: error: Русский текст --unknown
'''

to stderr. Probably need to fiddle with sys.stderr to fix that. Blech.

Finally:

+        try:
+            import optparse
+            old_gettext = optparse._
+            optparse._ = dummy_gettext
+
+            try:
+                OptionParser().parse_args(["--unknown"])
+            except SystemExit:
+                pass
+        finally:
+            optparse._ = old_gettext

This is a lot easier with mock.
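
A rough sketch of how the review suggestions above could fit together on Python 3: `assertRaises` instead of try/except, a redirected `sys.stderr`, and `unittest.mock` instead of the try/finally swap. The class name, test name, and the prefixing `dummy_gettext` are assumptions made for illustration; they are not taken from issue2931.txt.

```python
import contextlib
import io
import optparse
import unittest
from unittest import mock


def dummy_gettext(message):
    # Stand-in translation: add a non-ASCII prefix and keep the original
    # message so that any %s placeholders still work.
    return "Перевод: " + message


class TranslatedErrorTest(unittest.TestCase):
    def test_translated_unknown_option(self):
        stderr = io.StringIO()
        # patch.object restores optparse._ on exit, replacing try/finally;
        # redirect_stderr keeps the usage and error text off the console.
        with mock.patch.object(optparse, "_", dummy_gettext), \
                contextlib.redirect_stderr(stderr), \
                self.assertRaises(SystemExit) as caught:
            optparse.OptionParser(prog="prog").parse_args(["--unknown"])
        self.assertEqual(caught.exception.code, 2)
        self.assertIn("Перевод: no such option: --unknown",
                      stderr.getvalue())
```

Saved in a test module, this can be run with `python -m unittest`. It relies on optparse looking up its module-level `_` at call time, which is the same hook the snippet under review replaces by hand.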
msg258739 - (view) Author: Sean Wang (Sean.Wang) Date: 2016-01-21 08:01
This bug still exists in Python 2.7.10 with optparse version 1.5.3.
When the default_value cannot be encoded as ASCII, it raises `UnicodeEncodeError: 'ascii' codec can't encode characters`

This error is due to the `str` call in the `expand_default` method:

    def expand_default(self, option):
        if self.parser is None or not self.default_tag:
            return option.help

        default_value = self.parser.defaults.get(option.dest)
        if default_value is NO_DEFAULT or default_value is None:
            default_value = self.NO_DEFAULT_VALUE

        return option.help.replace(self.default_tag, str(default_value))
msg258741 - (view) Author: Sean Wang (Sean.Wang) Date: 2016-01-21 08:05
Sorry, missed one condition:
I used `unicode_literals` in Python 2.7.10, example below:

>>> from __future__ import unicode_literals
>>> str('api名称')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)
msg258744 - (view) Author: Sean Wang (Sean.Wang) Date: 2016-01-21 08:22
When a unicode option.default_value cannot be encoded as ASCII, an exception is raised. The detailed traceback is below:
  File "/Users/seanwang/Documents/dev/foo/bar.py", line 119, in main
    parser.print_help()
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 1670, in print_help
    file.write(self.format_help().encode(encoding, "replace"))
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 1650, in format_help
    result.append(self.format_option_help(formatter))
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 1630, in format_option_help
    result.append(OptionContainer.format_option_help(self, formatter))
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 1074, in format_option_help
    result.append(formatter.format_option(option))
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 316, in format_option
    help_text = self.expand_default(option)
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 288, in expand_default
    return option.help.replace(self.default_tag, str(default_value))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)
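
For comparison, a small sketch of the same `%default` expansion on Python 3, where `str()` of a text default involves no ASCII encoding step. The option name `--name` and its help text are assumptions chosen for illustration.

```python
import optparse


def build_parser():
    # A fixed width keeps the wrapped help text predictable.
    formatter = optparse.IndentedHelpFormatter(width=78)
    parser = optparse.OptionParser(prog="prog", formatter=formatter)
    parser.add_option("--name", default="api名称",
                      help="name to use [default: %default]")
    return parser


# format_help() expands %default through the formatter's expand_default().
help_text = build_parser().format_help()
```

On Python 3 the non-ASCII default appears verbatim in `help_text`. The failure reported above is specific to Python 2, where `str()` of a `unicode` object implicitly encodes it as ASCII.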
msg380485 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2020-11-07 01:30
The tests have not been merged yet.
msg380591 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-09 14:55
This issue was reported in 2008 on Python 2.5. The latest comment is about Python 2.

The latest Python version is now Python 3.9 and uses Unicode by default.

I am closing the issue since there has been no activity for 4 years. More tests are always welcome, so someone can still add new tests. Note that the optparse module has been deprecated since Python 3.2.
History
Date User Action Args
2022-04-11 14:56:34 admin set github: 47180
2020-11-09 14:55:16 vstinner set status: open -> closed
    resolution: out of date
    messages: + msg380591
    stage: patch review -> resolved
2020-11-07 01:30:48 iritkatriel set nosy: + iritkatriel
    messages: + msg380485
    versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.1, Python 2.7, Python 3.2
2016-01-21 08:22:27 Sean.Wang set messages: + msg258744
2016-01-21 08:05:08 Sean.Wang set messages: + msg258741
2016-01-21 08:01:36 Sean.Wang set nosy: + Sean.Wang
    messages: + msg258739
2015-04-15 13:15:10 gward set messages: + msg241097
2015-04-13 17:52:58 akuchling set files: + issue2931.txt
    nosy: + akuchling
    messages: + msg240683
2014-02-03 19:17:25 BreamoreBoy set nosy: - BreamoreBoy
2011-03-13 15:16:22 ezio.melotti set nosy: loewis, gward, bethard, ivilata, vstinner, aronacher, ezio.melotti, ash, eric.araujo, sampablokuper, BreamoreBoy
    messages: + msg130747
2011-03-13 15:08:42 eric.araujo set nosy: loewis, gward, bethard, ivilata, vstinner, aronacher, ezio.melotti, ash, eric.araujo, sampablokuper, BreamoreBoy
    messages: + msg130745
2011-03-13 11:34:51 ivilata set nosy: loewis, gward, bethard, ivilata, vstinner, aronacher, ezio.melotti, ash, eric.araujo, sampablokuper, BreamoreBoy
    messages: + msg130737
2010-11-06 08:34:02 bethard set messages: + msg120598
2010-11-05 20:49:02 eric.araujo set nosy: + eric.araujo, bethard
    messages: + msg120534
2010-07-18 11:55:05 BreamoreBoy set nosy: + BreamoreBoy, aronacher
    messages: + msg110638
    versions: + Python 3.1, Python 3.2, - Python 2.6
2009-07-10 04:53:26 ash set versions: + Python 2.7
2009-07-08 07:30:33 ash set messages: + msg90253
2009-05-16 18:18:06 ajaksu2 set versions: + Python 2.6, - Python 2.5
    nosy: + loewis, vstinner, ezio.melotti
    priority: normal
    type: behavior
    stage: patch review
2008-06-18 08:26:58 ivilata set files: + optparse_unicode2.py
    messages: + msg68360
2008-06-18 08:13:49 ivilata set files: + optparse_unicode.py
    nosy: + ivilata
    messages: + msg68359
2008-06-18 03:43:25 sampablokuper set messages: + msg68357
2008-06-17 23:29:16 ash set messages: + msg68354
2008-06-17 15:59:33 sampablokuper set nosy: + sampablokuper
    messages: + msg68329
    versions: + Python 2.5
2008-05-20 15:42:26 ash set files: + optparse.py.patch
    keywords: + patch
    messages: + msg67130
2008-05-20 15:31:18 ash create