Title: Bug in argparse - not supporting utf8
Type: compile error Stage: resolved
Components: Versions:
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Ali Razmjoo, josh.r, methane, r.david.murray
Priority: normal Keywords:

Created on 2017-09-14 17:48 by Ali Razmjoo, last changed 2017-09-30 13:38 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 3577 closed Ali Razmjoo, 2017-09-14 17:48
Messages (7)
msg302190 - (view) Author: Ali Razmjoo (Ali Razmjoo) * Date: 2017-09-14 17:48
Regarding the discussion in #3468: the same bug was present in argparse (and optparse), and it is fixed in this PR. utf8 is not supported in the argparse module.

current pr:

msg302191 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-09-14 17:51
As I requested in the PR, please provide a way to reproduce the bug you are reporting.
msg302192 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-09-14 17:56
Note that, as far as I can tell without a reproducer, it is confusing to talk about argparse supporting or not supporting utf8. It deals only with text strings, which are unicode. Or is this a 2.7-only bug report? (Although even there it would be a question of unicode support, not utf8.)
msg302379 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-09-17 18:01
You posted a stack trace on the GitHub pull request, but the discussion should take place here, not in the pull request.

Judging from the traceback, your problem is already solved in Python 3.6 by PEP 528.
msg302910 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-09-25 03:51
May I close this issue and pull request?
msg302964 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2017-09-25 17:00
Based on the OP's patch, it looks like they have a problem where they have non-ASCII text in their output strings (either due to using non-ASCII switches, or using non-ASCII help documentation), but sys.stdout/sys.stderr are configured for some encoding that doesn't support said characters, so they're getting exceptions when the help message is sent to the screen automatically (e.g. by running with --help).
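The failure mode described above can be illustrated without changing the locale. This is only a rough sketch: the ASCII-only stream and the `help_text` line are stand-ins invented for illustration, playing the role of a `sys.stdout` whose locale-derived encoding cannot represent the help text.

```python
import io

# Stand-in (assumption) for a sys.stdout configured with an ASCII-only encoding.
ascii_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

# A help line containing 'é', similar to what argparse would emit.
help_text = "  -f F        " + chr(233) + "\n"

try:
    ascii_stdout.write(help_text)
    failed = False
except UnicodeEncodeError as exc:
    # The stream's codec cannot represent 'é', so the write is rejected.
    failed = True
    reason = str(exc)
```

The text itself is a perfectly valid str; the error comes only from the stream's encoding, which is why this is a locale/output issue rather than a matter of argparse "supporting utf8".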

It's only sort of a bug in Python: Fundamentally, the problem is a script that assumes arbitrary Unicode support being run under a locale that doesn't provide it. The solution provided is bad though: It shouldn't be trying to force UTF8 output regardless of locale.

A simple repro, at least on Linux-like systems, would be to run Python with LANG=C (and no other LC variables set), then do:

    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('-f', help=chr(233)) # help is 'é'
    parser.print_help() # raises UnicodeEncodeError when stdout cannot encode 'é'

While the patch as given is wrong (with the exception of Windows weirdness, blithely ignoring/second-guessing the locale is a terrible idea), it's not a terrible idea to fix this in some way; if nothing else, it might make sense to have some fallback approach when the exception is raised (e.g. encoding the output with errors='ignore' or the like) so running --help at least provides *some* output even with incompatible locales, rather than dying with an error in the help message handling code itself.
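One possible shape for such a fallback, as a minimal sketch and not anything argparse actually does: the helper name `write_with_fallback` is hypothetical, and the ASCII-only stream merely simulates an incompatible locale. It uses errors='replace' rather than errors='ignore' so that unencodable characters show up as '?' instead of vanishing.

```python
import argparse
import io
import sys


def write_with_fallback(text, stream=None):
    """Write text; if the stream cannot encode it, degrade instead of failing.

    Hypothetical helper for illustration only; not part of argparse.
    """
    stream = sys.stdout if stream is None else stream
    try:
        stream.write(text)
    except UnicodeEncodeError:
        encoding = getattr(stream, "encoding", None) or "ascii"
        # Round-trip through the stream's own encoding, replacing every
        # character it cannot represent with '?'.
        degraded = text.encode(encoding, errors="replace").decode(encoding)
        stream.write(degraded)


parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("-f", help=chr(233))  # help is 'é'

# Simulated ASCII-only output stream (assumption standing in for LANG=C).
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii", errors="strict")

write_with_fallback(parser.format_help(), ascii_stream)
ascii_stream.flush()
help_bytes = raw.getvalue()
```

The locale's encoding is still respected here; only the characters it cannot express are degraded, which is different from forcing UTF-8 output regardless of locale.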
msg303407 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-09-30 12:00
We have already accepted PEP 529 and PEP 538.
There is also #15216 (PR 2343).
So I don't think we need more solutions.
Date User Action Args
2017-09-30 13:38:44 serhiy.storchaka set status: open -> closed; resolution: rejected; stage: resolved
2017-09-30 12:00:59 methane set messages: + msg303407
2017-09-25 17:00:43 josh.r set status: pending -> open; nosy: + josh.r; messages: + msg302964
2017-09-25 05:19:24 serhiy.storchaka set status: open -> pending
2017-09-25 03:51:06 methane set messages: + msg302910
2017-09-17 18:01:01 methane set nosy: + methane; messages: + msg302379
2017-09-14 17:56:25 r.david.murray set messages: + msg302192
2017-09-14 17:51:37 r.david.murray set nosy: + r.david.murray; messages: + msg302191
2017-09-14 17:48:43 Ali Razmjoo create