msg67129 - (view) |
Author: Alexey Shamrin (ash) |
Date: 2008-05-20 15:31 |
While trying to use optparse with Russian messages, I found
several problems with gettext and unicode handling:
1. optparse.OptionParser.error function doesn't work with unicode argument
2. optparse doesn't work when its error messages are gettext-translated
3. optparse fails running 'prog.py --help > out.txt' with unicode help
(at least on my system: Windows XP, Russian)
I have attached a file demonstrating these problems: test_optparse.py.
You can run it either using nose[1] or directly, manually uncommenting
test_* functions one-by-one.
[1]: http://www.somethingaboutorange.com/mrl/projects/nose/
Here's the result of running `nosetests test_optparse.py`:
EEF
======================================================================
ERROR: OptionParser.error function doesn't work with unicode argument
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python25\lib\site-packages\nose-0.10.2-py2.5.egg\nose\case.py", line 182, in runTest
    self.test(*self.arg)
  File "C:\work\test_optparse.py", line 10, in test_unicode_error
    optparse.OptionParser().error(russian_text)
  File "C:\Python25\lib\optparse.py", line 1562, in error
    self.exit(2, "%s: error: %s\n" % (self.get_prog_name(), msg))
  File "C:\Python25\lib\optparse.py", line 1551, in exit
    sys.stderr.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 28-34: ordinal not in range(128)
======================================================================
ERROR: optparse doesn't work when its error messages are gettext-translated
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python25\lib\site-packages\nose-0.10.2-py2.5.egg\nose\case.py", line 182, in runTest
    self.test(*self.arg)
  File "C:\work\test_optparse.py", line 25, in test_translated_unicode_error_message
    optparse.OptionParser().parse_args(["--unknown"])
  File "C:\Python25\lib\optparse.py", line 1380, in parse_args
    self.error(str(err))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-6: ordinal not in range(128)
======================================================================
FAIL: optparse fails running 'prog.py --help > out.txt' with unicode help
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python25\lib\site-packages\nose-0.10.2-py2.5.egg\nose\case.py", line 182, in runTest
    self.test(*self.arg)
  File "C:\work\test_optparse.py", line 42, in test_redirected_unicode_help
    assert '?????' not in dummy_stdout.getvalue()
AssertionError
----------------------------------------------------------------------
Ran 3 tests in 0.000s
FAILED (errors=2, failures=1)
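For reference, a minimal Python 3 sketch of the third problem (help text redirected to a file). It is not the attached test_optparse.py: the option name, the help text, and the choice of cp1251 are assumptions for illustration, and an in-memory buffer stands in for out.txt.

```python
import io
import optparse

# Hypothetical parser with a non-ASCII help string. A fixed width keeps
# the help text on one line regardless of the COLUMNS environment variable.
parser = optparse.OptionParser(
    prog="prog",
    formatter=optparse.IndentedHelpFormatter(width=80),
)
parser.add_option("--name", help="Имя пользователя")

# Stand-in for 'prog.py --help > out.txt': a byte buffer wrapped in a text
# stream with an explicit encoding (cp1251 is assumed here).
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp1251", errors="replace")
parser.print_help(file=stream)
stream.flush()

# Decode what was written and check that the Cyrillic text survived
# instead of being replaced by question marks.
written = raw.getvalue().decode("cp1251")
print("Имя пользователя" in written)  # True
```

If the stream encoding cannot represent the help text (for example "ascii" with errors="replace"), the same code writes question marks instead, which is the symptom the third test checks for.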
|
msg67130 - (view) |
Author: Alexey Shamrin (ash) |
Date: 2008-05-20 15:42 |
I've also attached a patch that fixes all these issues and also allows
the word "error" to be translated with gettext.
Regarding the use of `locale.getpreferredencoding` instead of
`sys.getdefaultencoding`: on my system (Windows XP, Russian) I get:
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, locale
>>> sys.getdefaultencoding()
'ascii'
>>> locale.getpreferredencoding()
'cp1251'
Using cp1251 on my system makes much more sense: it is the default
encoding everywhere in the system, for example in Notepad.
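A small Python 3 sketch of the idea, not the attached patch: `encode_for_console` is a hypothetical helper that encodes with the locale's preferred encoding and replaces characters that encoding cannot represent, rather than raising UnicodeEncodeError.

```python
import locale
import sys

RUSSIAN = "Русский текст"


def encode_for_console(text, encoding=None):
    """Encode text for a byte stream, replacing unencodable characters.

    Hypothetical helper: falls back to the locale's preferred encoding
    when no encoding is given.
    """
    if encoding is None:
        # do_setlocale=False avoids changing the process locale as a side effect.
        encoding = locale.getpreferredencoding(False) or sys.getdefaultencoding()
    return text.encode(encoding, "replace")


# The two encodings the message compares (values depend on the system).
print(sys.getdefaultencoding())
print(locale.getpreferredencoding(False))

# With cp1251 the text round-trips; with ASCII every Cyrillic letter is lost.
print(encode_for_console(RUSSIAN, "cp1251").decode("cp1251") == RUSSIAN)  # True
print(encode_for_console(RUSSIAN, "ascii"))  # b'??????? ?????'
```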
|
msg68329 - (view) |
Author: Sam Pablo Kuper (sampablokuper) |
Date: 2008-06-17 15:59 |
Using non-ASCII characters in an optparse help string also causes
UnicodeDecodeErrors. Here's the relevant part of the traceback:
  File "/home/spk30/opt/ActivePython-2.5/lib/python2.5/optparse.py", line 1655, in print_help
    file.write(self.format_help().encode(encoding, "replace"))
NB. Adding an encoding declaration at the beginning of the Python
script which used a non-ASCII character in an optparse help string
didn't solve the problem.
|
msg68354 - (view) |
Author: Alexey Shamrin (ash) |
Date: 2008-06-17 23:29 |
sampablokuper, I don't think your problem is relevant to this issue. In
addition to the encoding declaration, you should use unicode strings:
u"your non-ASCII text". Or wait for Python 3.0, where strings will be
unicode by default.
|
msg68357 - (view) |
Author: Sam Pablo Kuper (sampablokuper) |
Date: 2008-06-18 03:43 |
ash, you are correct; my bad. Thanks for the heads-up.
|
msg68359 - (view) |
Author: Ivan Vilata i Balaguer (ivilata) |
Date: 2008-06-18 08:13 |
What I find most bothersome is that ``optparse`` is being inconsistent
in the types of localised strings it expects. It needs Unicode strings
for snippets forming part of the help message, while it expects normal
strings in other places like ``OptionParser.error()`` --a fact which
isn't documented at all, BTW.
I've been developing a medium-sized app lately with localised messages all
over the place using several packages in the standard library, and
``optparse``'s help messages are the only place where Unicode strings
have been required. I'm not saying that ``optparse`` shouldn't use
Unicode, but it'd be nice if it was consistent and the fact was documented.
I'm attaching a tiny script which uses ``optparse``. Just try to change
any appearance of the ``s`` normal string to Unicode ``us`` or
vice-versa, then call the program with ``--help`` or no arguments (it
requires one) and you get a ``UnicodeError``. Thanks!
|
msg68360 - (view) |
Author: Ivan Vilata i Balaguer (ivilata) |
Date: 2008-06-18 08:26 |
The attached version of ``optparse_unicode.py`` doesn't depend on a
UTF-8 locale, sorry.
|
msg90253 - (view) |
Author: Alexey Shamrin (ash) |
Date: 2009-07-08 07:30 |
More than a year has passed since I reported this... Could someone suggest
how to move this forward? If needed, I can try to improve the patch, the
tests, or the description of this issue. Should I, for example, split this
into separate issues?
|
msg110638 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2010-07-18 11:55 |
Alexey, there would be a much better chance of getting this accepted if you could supply a patch file that also included unit tests.
|
msg120534 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2010-11-05 20:49 |
It would be nice to test argparse for the same behavior.
|
msg120598 - (view) |
Author: Steven Bethard (bethard) * |
Date: 2010-11-06 08:34 |
Yep, argparse almost certainly has the same kind of problems - I basically copied the optparse gettext behavior into argparse because I don't really know how that stuff works but figured people must have wanted what was in there. ;-)
|
msg130737 - (view) |
Author: Ivan Vilata i Balaguer (ivilata) |
Date: 2011-03-13 11:34 |
After so much time I've checked again with the little script I sent and I see that it doesn't happen under Python 2.7 (2.7.1+), but it does under 2.6 (2.6.6) and 2.5 (2.5.5).
|
msg130745 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-03-13 15:08 |
I’m afraid 2.5 and 2.6 don’t get bug fixes any more, only security fixes. For 2.7 and 3.x, even if your bug can’t be reproduced, I think it would be useful to add the test to prevent a regression.
|
msg130747 - (view) |
Author: Ezio Melotti (ezio.melotti) * |
Date: 2011-03-13 15:16 |
+1
|
msg240683 - (view) |
Author: A.M. Kuchling (akuchling) * |
Date: 2015-04-13 17:52 |
I've turned ash's test program into a bunch of test cases against Python 3.5 trunk. Is it worth committing them?
|
msg241097 - (view) |
Author: Greg Ward (gward) |
Date: 2015-04-15 13:15 |
> I've turned ash's test program into a bunch of test cases against
> Python 3.5 trunk. Is it worth committing them?
Yeah, probably. Review comments...
+ try:
+     self.parser.error(RUSSIAN_TEXT)
+ except InterceptedError:
+     pass
Why not self.assertRaises()?
Also, when I run the test on its own, it prints
'''
Usage: regrtest.py [options]
regrtest.py: error: Русский текст --unknown
'''
to stderr. Probably need to fiddle with sys.stderr to fix that. Blech.
Finally:
+ try:
+     import optparse
+     old_gettext = optparse._
+     optparse._ = dummy_gettext
+
+     try:
+         OptionParser().parse_args(["--unknown"])
+     except SystemExit:
+         pass
+ finally:
+     optparse._ = old_gettext
This is a lot easier with mock.
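A rough sketch of what the three review comments could look like together on Python 3: `assertRaises` instead of try/except, `contextlib.redirect_stderr` to keep the usage message out of the test output, and `mock.patch.object` to swap `optparse._`. This is not the reviewed patch. The class name and `fake_gettext` are hypothetical, and the sketch catches `SystemExit` from a plain `OptionParser` rather than the `InterceptedError` raised by the test suite's own parser subclass.

```python
import contextlib
import io
import optparse
import unittest
from unittest import mock

RUSSIAN_TEXT = "Русский текст"


def fake_gettext(message):
    # Hypothetical stand-in for a translation catalogue: prefixes every
    # message with non-ASCII text and keeps its %-placeholders intact.
    return RUSSIAN_TEXT + ": " + message


class UnicodeErrorTests(unittest.TestCase):
    def test_error_with_non_ascii_message(self):
        parser = optparse.OptionParser(prog="prog")
        stderr = io.StringIO()
        # Capture stderr so the usage and error lines do not leak into
        # the test runner's output.
        with contextlib.redirect_stderr(stderr):
            with self.assertRaises(SystemExit) as caught:
                parser.error(RUSSIAN_TEXT)
        self.assertEqual(caught.exception.code, 2)
        self.assertIn(RUSSIAN_TEXT, stderr.getvalue())

    def test_translated_error_message(self):
        stderr = io.StringIO()
        # patch.object restores optparse._ automatically on exit, which
        # replaces the manual try/finally bookkeeping.
        with mock.patch.object(optparse, "_", fake_gettext), \
                contextlib.redirect_stderr(stderr):
            with self.assertRaises(SystemExit):
                optparse.OptionParser(prog="prog").parse_args(["--unknown"])
        self.assertIn(RUSSIAN_TEXT, stderr.getvalue())
        self.assertIn("--unknown", stderr.getvalue())
```

Saved in a file, the tests can be run with `python -m unittest`.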
|
msg258739 - (view) |
Author: Sean Wang (Sean.Wang) |
Date: 2016-01-21 08:01 |
This bug still exists in Python 2.7.10 with optparse version 1.5.3.
When a default value cannot be encoded as ASCII, optparse raises `UnicodeEncodeError: 'ascii' codec can't encode characters`.
The error comes from the `str` call in the `expand_default` method:
def expand_default(self, option):
    if self.parser is None or not self.default_tag:
        return option.help
    default_value = self.parser.defaults.get(option.dest)
    if default_value is NO_DEFAULT or default_value is None:
        default_value = self.NO_DEFAULT_VALUE
    return option.help.replace(self.default_tag, str(default_value))
|
msg258741 - (view) |
Author: Sean Wang (Sean.Wang) |
Date: 2016-01-21 08:05 |
Sorry, missed one condition:
I used `unicode_literals` in Python 2.7.10, example below:
>>> from __future__ import unicode_literals
>>> str('api名称')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)
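One possible workaround, sketched under assumptions: a formatter subclass that copies `expand_default` and converts the default with the text type instead of `str`. `UnicodeSafeHelpFormatter` and `text_type` are hypothetical names, not part of optparse. The block is written and checked for Python 3, where `str` is already Unicode and the override changes nothing; its behaviour on Python 2.7 is untested here.

```python
import optparse

# type(u"") is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u"")


class UnicodeSafeHelpFormatter(optparse.IndentedHelpFormatter):
    """Hypothetical formatter that avoids str() on the default value."""

    def expand_default(self, option):
        if self.parser is None or not self.default_tag:
            return option.help
        default_value = self.parser.defaults.get(option.dest)
        if default_value is optparse.NO_DEFAULT or default_value is None:
            default_value = self.NO_DEFAULT_VALUE
        # text_type() does not force an ASCII encode the way str() does
        # on Python 2 with unicode_literals.
        return option.help.replace(self.default_tag, text_type(default_value))


# A fixed width keeps the help line from being wrapped unexpectedly.
parser = optparse.OptionParser(
    prog="prog",
    formatter=UnicodeSafeHelpFormatter(width=80),
)
parser.add_option("--name", default="api名称",
                  help="resource name [default: %default]")
help_text = parser.format_help()
print("[default: api名称]" in help_text)  # True
```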
|
msg258744 - (view) |
Author: Sean Wang (Sean.Wang) |
Date: 2016-01-21 08:22 |
When a unicode option default value cannot be encoded as ASCII, optparse raises an exception. Detailed traceback below:
  File "/Users/seanwang/Documents/dev/foo/bar.py", line 119, in main
    parser.print_help()
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 1670, in print_help
    file.write(self.format_help().encode(encoding, "replace"))
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 1650, in format_help
    result.append(self.format_option_help(formatter))
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 1630, in format_option_help
    result.append(OptionContainer.format_option_help(self, formatter))
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 1074, in format_option_help
    result.append(formatter.format_option(option))
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 316, in format_option
    help_text = self.expand_default(option)
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/optparse.py", line 288, in expand_default
    return option.help.replace(self.default_tag, str(default_value))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)
|
msg380485 - (view) |
Author: Irit Katriel (iritkatriel) * |
Date: 2020-11-07 01:30 |
The tests have not been merged yet.
|
msg380591 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-11-09 14:55 |
This issue was reported in 2008 on Python 2.5. The latest comment is about Python 2.
The latest Python version is now Python 3.9, which uses Unicode by default.
I am closing the issue since there has been no activity for 4 years. More tests are always welcome, so someone can still add new tests. Note that the optparse module has been deprecated since Python 3.2.
|
|
Date | User | Action | Args |
2022-04-11 14:56:34 | admin | set | github: 47180 |
2020-11-09 14:55:16 | vstinner | set | status: open -> closed; resolution: out of date; messages: + msg380591; stage: patch review -> resolved |
2020-11-07 01:30:48 | iritkatriel | set | nosy: + iritkatriel; messages: + msg380485; versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.1, Python 2.7, Python 3.2 |
2016-01-21 08:22:27 | Sean.Wang | set | messages: + msg258744 |
2016-01-21 08:05:08 | Sean.Wang | set | messages: + msg258741 |
2016-01-21 08:01:36 | Sean.Wang | set | nosy: + Sean.Wang; messages: + msg258739 |
2015-04-15 13:15:10 | gward | set | messages: + msg241097 |
2015-04-13 17:52:58 | akuchling | set | files: + issue2931.txt; nosy: + akuchling; messages: + msg240683 |
2014-02-03 19:17:25 | BreamoreBoy | set | nosy: - BreamoreBoy |
2011-03-13 15:16:22 | ezio.melotti | set | nosy: loewis, gward, bethard, ivilata, vstinner, aronacher, ezio.melotti, ash, eric.araujo, sampablokuper, BreamoreBoy; messages: + msg130747 |
2011-03-13 15:08:42 | eric.araujo | set | nosy: loewis, gward, bethard, ivilata, vstinner, aronacher, ezio.melotti, ash, eric.araujo, sampablokuper, BreamoreBoy; messages: + msg130745 |
2011-03-13 11:34:51 | ivilata | set | nosy: loewis, gward, bethard, ivilata, vstinner, aronacher, ezio.melotti, ash, eric.araujo, sampablokuper, BreamoreBoy; messages: + msg130737 |
2010-11-06 08:34:02 | bethard | set | messages: + msg120598 |
2010-11-05 20:49:02 | eric.araujo | set | nosy: + eric.araujo, bethard; messages: + msg120534 |
2010-07-18 11:55:05 | BreamoreBoy | set | nosy: + BreamoreBoy, aronacher; messages: + msg110638; versions: + Python 3.1, Python 3.2, - Python 2.6 |
2009-07-10 04:53:26 | ash | set | versions: + Python 2.7 |
2009-07-08 07:30:33 | ash | set | messages: + msg90253 |
2009-05-16 18:18:06 | ajaksu2 | set | versions: + Python 2.6, - Python 2.5; nosy: + loewis, vstinner, ezio.melotti; priority: normal; type: behavior; stage: patch review |
2008-06-18 08:26:58 | ivilata | set | files: + optparse_unicode2.py; messages: + msg68360 |
2008-06-18 08:13:49 | ivilata | set | files: + optparse_unicode.py; nosy: + ivilata; messages: + msg68359 |
2008-06-18 03:43:25 | sampablokuper | set | messages: + msg68357 |
2008-06-17 23:29:16 | ash | set | messages: + msg68354 |
2008-06-17 15:59:33 | sampablokuper | set | nosy: + sampablokuper; messages: + msg68329; versions: + Python 2.5 |
2008-05-20 15:42:26 | ash | set | files: + optparse.py.patch; keywords: + patch; messages: + msg67130 |
2008-05-20 15:31:18 | ash | create | |