classification
Title: optparse: various problems with unicode and gettext
Type: behavior
Stage: patch review
Components: Library (Lib)
Versions: Python 3.2, Python 3.1, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: aronacher, ash, bethard, eric.araujo, ezio.melotti, gward, haypo, ivilata, loewis, sampablokuper
Priority: normal
Keywords: patch

Created on 2008-05-20 15:31 by ash, last changed 2014-02-03 19:17 by BreamoreBoy.

Files
File name Uploaded Description
test_optparse.py ash, 2008-05-20 15:31
optparse.py.patch ash, 2008-05-20 15:42
optparse_unicode.py ivilata, 2008-06-18 08:13 Show optparse's string type inconsistency.
optparse_unicode2.py ivilata, 2008-06-18 08:26 Optparse's string type inconsistency.
Messages (14)
msg67129 - Author: Alexey Shamrin (ash) Date: 2008-05-20 15:31
While trying to use optparse with Russian messages, I found
several problems with gettext and Unicode handling:

1. optparse.OptionParser.error function doesn't work with unicode argument
2. optparse doesn't work when its error messages are gettext-translated
3. optparse fails running 'prog.py --help > out.txt' with unicode help
(at least on my system: Windows XP, Russian)

I have attached a file demonstrating these problems: test_optparse.py.
You can run it either using nose[1] or directly, manually uncommenting
test_* functions one-by-one.

[1]: http://www.somethingaboutorange.com/mrl/projects/nose/

Here's the result of running `nosetests test_optparse.py`:

EEF
======================================================================
ERROR: OptionParser.error function doesn't work with unicode argument
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python25\lib\site-packages\nose-0.10.2-py2.5.egg\nose\case.py", line 182, in runTest
    self.test(*self.arg)
  File "C:\work\test_optparse.py", line 10, in test_unicode_error
    optparse.OptionParser().error(russian_text)
  File "C:\Python25\lib\optparse.py", line 1562, in error
    self.exit(2, "%s: error: %s\n" % (self.get_prog_name(), msg))
  File "C:\Python25\lib\optparse.py", line 1551, in exit
    sys.stderr.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 28-34: ordinal not in range(128)

======================================================================
ERROR: optparse doesn't work when its error messages are gettext-translated
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python25\lib\site-packages\nose-0.10.2-py2.5.egg\nose\case.py", line 182, in runTest
    self.test(*self.arg)
  File "C:\work\test_optparse.py", line 25, in test_translated_unicode_error_message
    optparse.OptionParser().parse_args(["--unknown"])
  File "C:\Python25\lib\optparse.py", line 1380, in parse_args
    self.error(str(err))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-6: ordinal not in range(128)

======================================================================
FAIL: optparse fails running 'prog.py --help > out.txt' with unicode help
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python25\lib\site-packages\nose-0.10.2-py2.5.egg\nose\case.py", line 182, in runTest
    self.test(*self.arg)
  File "C:\work\test_optparse.py", line 42, in test_redirected_unicode_help
    assert '?????' not in dummy_stdout.getvalue()
AssertionError

----------------------------------------------------------------------
Ran 3 tests in 0.000s

FAILED (errors=2, failures=1)
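
The first traceback above comes down to writing non-ASCII text to a stream whose encoding is ASCII. The following is a minimal sketch of that failure in Python 3 terms, not the attached test_optparse.py; the helper name `can_write` and the sample message are assumptions made for illustration.

```python
import io


def can_write(text, encoding):
    """Return True if text can be written to a text stream with this encoding."""
    # Hypothetical helper: an in-memory stream stands in for sys.stderr.
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        return False
    return True


message = u"prog: error: ошибка\n"
print(can_write(message, "ascii"))   # the ASCII stream rejects the Russian text
print(can_write(message, "cp1251"))  # a Cyrillic code page accepts it
```

The point of the sketch is only that the failure depends on the encoding of the output stream, not on optparse itself.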
msg67130 - Author: Alexey Shamrin (ash) Date: 2008-05-20 15:42
I've also attached a patch that fixes all these issues and also allows
the word "error" to be translated with gettext.

Regarding the use of `locale.getpreferredencoding` instead of
`sys.getdefaultencoding`: on my system (Windows XP, Russian) I get:

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, locale
>>> sys.getdefaultencoding()
'ascii'
>>> locale.getpreferredencoding()
'cp1251'

Using cp1251 makes much more sense on my system: it is the default
encoding everywhere, for example in Notepad.
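
A minimal sketch of the idea, written for Python 3 and not taken from the attached patch: encode the message with the locale's preferred encoding and the "replace" error handler, so that unencodable characters degrade to "?" instead of raising. The helper name `encode_for_console` is hypothetical.

```python
import locale


def encode_for_console(text, encoding=None):
    """Encode text for a byte-oriented console, never raising on bad characters."""
    # Assumption: fall back to the locale's preferred encoding
    # (for example cp1251 on a Russian Windows system).
    if encoding is None:
        encoding = locale.getpreferredencoding() or "ascii"
    # "replace" substitutes "?" for characters the encoding cannot represent.
    return text.encode(encoding, "replace")


print(encode_for_console(u"ошибка", "cp1251"))
print(encode_for_console(u"ошибка", "ascii"))
```

With an explicit `"ascii"` argument every Cyrillic letter is replaced, while `"cp1251"` round-trips the text unchanged.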
msg68329 - Author: Sam Pablo Kuper (sampablokuper) Date: 2008-06-17 15:59
Using non-ASCII characters in an optparse help string also causes
UnicodeDecodeErrors. Here's the relevant part of the traceback:

File "/home/spk30/opt/ActivePython-2.5/lib/python2.5/optparse.py", line 1655, in print_help
    file.write(self.format_help().encode(encoding, "replace"))

NB. Adding an encoding declaration at the beginning of the Python
script that used a non-ASCII character in an optparse help string
didn't solve the problem.
msg68354 - Author: Alexey Shamrin (ash) Date: 2008-06-17 23:29
sampablokuper, I don't think your problem is relevant to this issue. In
addition to the encoding declaration, you should use unicode strings:
u"your non-ASCII text". Or wait for Python 3.0, where strings will be
unicode by default.
msg68357 - Author: Sam Pablo Kuper (sampablokuper) Date: 2008-06-18 03:43
ash, you are correct; my bad. Thanks for the heads-up.
msg68359 - Author: Ivan Vilata i Balaguer (ivilata) Date: 2008-06-18 08:13
What I find most bothersome is that ``optparse`` is inconsistent
in the types of localised strings it expects. It needs Unicode strings
for snippets forming part of the help message, while it expects normal
strings in other places like ``OptionParser.error()``, a fact which
isn't documented at all, BTW.

I've been developing a medium-sized app lately with localised messages
all over the place, using several packages in the standard library, and
``optparse``'s help messages are the only place where Unicode strings
have been required. I'm not saying that ``optparse`` shouldn't use
Unicode, but it'd be nice if it were consistent and the fact were documented.

I'm attaching a tiny script which uses ``optparse``.  Just try to change
any occurrence of the ``s`` normal string to Unicode ``us`` or
vice versa, then call the program with ``--help`` or no arguments (it
requires one) and you get a ``UnicodeError``. Thanks!
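
The ``UnicodeError`` described above comes from Python 2 implicitly decoding a byte string as ASCII whenever it is combined with a Unicode string. Here is a rough sketch of that coercion in Python 3 terms, where the decode has to be spelled out; it is not the attached script, and ``combine`` and ``localised`` are hypothetical names.

```python
def combine(prefix, byte_message, encoding="ascii"):
    """Join text with a byte string, decoding it first.

    The "ascii" default mimics the implicit coercion Python 2 performed
    when a normal string met a Unicode string.
    """
    return prefix + byte_message.decode(encoding)


# A "normal" (byte) string holding non-ASCII text, as a translation might return.
localised = u"Ошибка".encode("utf-8")

try:
    combine(u"prog: ", localised)
except UnicodeDecodeError as exc:
    print("mixing failed:", exc.reason)

# Decoding with the encoding the bytes were actually written in works.
print(combine(u"prog: ", localised, "utf-8"))
```

Using one string type throughout, or decoding at the boundary with a known encoding, avoids the implicit ASCII step entirely.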
msg68360 - Author: Ivan Vilata i Balaguer (ivilata) Date: 2008-06-18 08:26
The attached version of ``optparse_unicode.py`` doesn't depend on a
UTF-8 locale, sorry.
msg90253 - Author: Alexey Shamrin (ash) Date: 2009-07-08 07:30
More than a year has passed since I reported this... Could someone suggest
how to move this forward? If needed, I can try to improve the patch, the
test, or the description of this issue. Should I, for example, split this
into separate issues?
msg110638 - Author: Mark Lawrence (BreamoreBoy) Date: 2010-07-18 11:55
Alexey, there would be a much better chance of getting this accepted if you could supply a patch file that also included unit tests.
msg120534 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-11-05 20:49
It would be nice to test argparse for the same behavior.
msg120598 - Author: Steven Bethard (bethard) * (Python committer) Date: 2010-11-06 08:34
Yep, argparse almost certainly has the same kind of problems - I basically copied the optparse gettext behavior into argparse because I don't really know how that stuff works but figured people must have wanted what was in there. ;-)
msg130737 - Author: Ivan Vilata i Balaguer (ivilata) Date: 2011-03-13 11:34
After so much time I've checked again with the little script I sent and I see that it doesn't happen under Python 2.7 (2.7.1+), but it does under 2.6 (2.6.6) and 2.5 (2.5.5).
msg130745 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-13 15:08
I’m afraid 2.5 and 2.6 don’t get bug fixes any more, only security fixes.  For 2.7 and 3.x, even if your bug can’t be reproduced, I think it would be useful to add the test to prevent a regression.
msg130747 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-03-13 15:16
+1
History
Date User Action Args
2014-02-03 19:17:25 BreamoreBoy set nosy: - BreamoreBoy
2011-03-13 15:16:22 ezio.melotti set nosy: loewis, gward, bethard, ivilata, haypo, aronacher, ezio.melotti, ash, eric.araujo, sampablokuper, BreamoreBoy
    messages: + msg130747
2011-03-13 15:08:42 eric.araujo set nosy: loewis, gward, bethard, ivilata, haypo, aronacher, ezio.melotti, ash, eric.araujo, sampablokuper, BreamoreBoy
    messages: + msg130745
2011-03-13 11:34:51 ivilata set nosy: loewis, gward, bethard, ivilata, haypo, aronacher, ezio.melotti, ash, eric.araujo, sampablokuper, BreamoreBoy
    messages: + msg130737
2010-11-06 08:34:02 bethard set messages: + msg120598
2010-11-05 20:49:02 eric.araujo set nosy: + eric.araujo, bethard
    messages: + msg120534
2010-07-18 11:55:05 BreamoreBoy set nosy: + BreamoreBoy, aronacher
    messages: + msg110638
    versions: + Python 3.1, Python 3.2, - Python 2.6
2009-07-10 04:53:26 ash set versions: + Python 2.7
2009-07-08 07:30:33 ash set messages: + msg90253
2009-05-16 18:18:06 ajaksu2 set versions: + Python 2.6, - Python 2.5
    nosy: + loewis, haypo, ezio.melotti
    priority: normal
    type: behavior
    stage: patch review
2008-06-18 08:26:58 ivilata set files: + optparse_unicode2.py
    messages: + msg68360
2008-06-18 08:13:49 ivilata set files: + optparse_unicode.py
    nosy: + ivilata
    messages: + msg68359
2008-06-18 03:43:25 sampablokuper set messages: + msg68357
2008-06-17 23:29:16 ash set messages: + msg68354
2008-06-17 15:59:33 sampablokuper set nosy: + sampablokuper
    messages: + msg68329
    versions: + Python 2.5
2008-05-20 15:42:26 ash set files: + optparse.py.patch
    keywords: + patch
    messages: + msg67130
2008-05-20 15:31:18 ash create