Title: argparse breaks long lines on NO-BREAK SPACE
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.5
Status: closed Resolution: fixed
Assigned To: xiang.zhang Nosy List: martin.panter, paul.j3, peter.otten, python-dev, roysmith, serhiy.storchaka, steven.daprano, wim.glenn, xiang.zhang
Priority: normal Keywords: patch

Created on 2017-01-17 03:08 by steven.daprano, last changed 2022-04-11 14:58 by admin.

Messages (11)
msg285607 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2017-01-17 03:08
argparse help incorrectly breaks long lines on U+00A0 NO-BREAK SPACE.

The attached script has been run on Python 3.5.3rc1 in a terminal window 80 columns wide, and it produces output::

    usage: [-h] [--no-condensedxxxx]

    optional arguments:
      -h, --help          show this help message and exit
      --no-condensedxxxx  Disable default font-style: condensed. Also disables "M+
                          1M" condensed monospace.

I expected the line to break just before "M+ 1M" rather than in the middle of it.

See also #20491.
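The attached script is not reproduced here. The following is a minimal sketch of an equivalent reproduction, assuming the option name and help text shown in the output above. The formatter width is pinned to 78 columns (an 80-column terminal minus the 2 columns argparse subtracts by default) through the `width` argument of `argparse.HelpFormatter`, so the result does not depend on the terminal. The `prog='demo'` name is a hypothetical placeholder. On an interpreter that includes the fix, `"M+\xa01M"` should stay together on the continuation line. On an affected version, the break falls after `"M+`, as in the report.

```python
import argparse

# Help text from the report; \N{NO-BREAK SPACE} is U+00A0.
HELP = ('Disable default font-style: condensed. '
        'Also disables "M+\N{NO-BREAK SPACE}1M" condensed monospace.')


def make_parser():
    # Pin the width to 78 columns (80-column terminal minus argparse's
    # 2-column margin) so the wrapping is reproducible anywhere.
    parser = argparse.ArgumentParser(
        prog='demo',  # hypothetical program name
        formatter_class=lambda prog: argparse.HelpFormatter(prog, width=78),
    )
    parser.add_argument('--no-condensedxxxx', action='store_true', help=HELP)
    return parser


if __name__ == '__main__':
    print(make_parser().format_help())
```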
msg285608 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2017-01-17 03:33
Here's a slightly simpler demo, without the (fortunately harmless) typo.
msg285609 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-01-17 03:39
Possibly a duplicate of issue #16623.
msg285610 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-17 04:09
textwrap has been fixed in #20491, but this problem still exists. The reason seems to be that argparse replaces the non-breaking spaces with ordinary spaces before wrapping:

before self.whitespace_matcher.sub
'Disable default font-style: condensed.  Also disables "M+\\xa01M" condensed monospace.'
after self.whitespace_matcher.sub
'Disable default font-style: condensed. Also disables "M+ 1M" condensed monospace.'
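The effect can be reproduced with `re` and `textwrap` alone. This is a minimal sketch that assumes a plain `re.sub(r'\s+', ' ', ...)` stands in for argparse's whitespace-collapsing step. It uses an assumed wrap width of 56 characters, chosen because that reproduces the break in the original report (78 columns minus the 22-column help indent).

```python
import re
import textwrap

# The help string as argparse receives it (note the double space and U+00A0).
raw = ('Disable default font-style: condensed.  '
       'Also disables "M+\xa01M" condensed monospace.')

# In Python 3, str patterns use Unicode matching by default, so \s also
# matches U+00A0 NO-BREAK SPACE and the collapse turns it into a plain space.
collapsed = re.sub(r'\s+', ' ', raw).strip()
print(repr(collapsed))

# textwrap itself no longer breaks at U+00A0, but after the collapse there is
# no U+00A0 left to protect, so the line breaks right after "M+.
for line in textwrap.wrap(collapsed, 56):
    print(line)
```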
msg285611 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-17 06:06
I think this is a regression introduced in 3.x. In 2.7, r'\s+' matches only ASCII whitespace by default and won't match Unicode non-breaking spaces. In 3.x, str patterns use Unicode matching by default, so non-breaking spaces are replaced by ordinary spaces. I think we can just use [ \t\n\r\f\v]+ instead.
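The following is a minimal sketch comparing the Python 3 default `\s+` with the explicit ASCII class proposed above. `re.ASCII` is shown as an equivalent way to restrict `\s` to the same six characters. This is an illustration of the idea, not necessarily the exact patch that was committed. The 56-character wrap width is the same assumption used to reproduce the original report.

```python
import re
import textwrap

raw = ('Disable default font-style: condensed.  '
       'Also disables "M+\xa01M" condensed monospace.')

unicode_ws = re.compile(r'\s+')               # Python 3 default: Unicode-aware \s
explicit_ws = re.compile(r'[ \t\n\r\f\v]+')   # the ASCII class proposed above
ascii_ws = re.compile(r'\s+', re.ASCII)       # same character set via re.ASCII

# Only the Unicode-aware pattern treats U+00A0 as whitespace.
assert unicode_ws.fullmatch('\xa0')
assert explicit_ws.fullmatch('\xa0') is None
assert ascii_ws.fullmatch('\xa0') is None

fixed = explicit_ws.sub(' ', raw).strip()
assert fixed == ascii_ws.sub(' ', raw).strip()

# The double space is still collapsed, but U+00A0 survives, so textwrap
# moves the whole "M+\xa01M" token to the next line instead of splitting it.
for line in textwrap.wrap(fixed, 56):
    print(line)
```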

Since more core devs are active here, I am going to close #16623 and continue the work here.
msg285984 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-22 04:02
v2 addresses the review comments. I didn't receive the review notification email, so I only saw them today. :-(
msg285991 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-22 06:26
Perhaps you need to change your email provider, Xiang. It fails too often.
msg285994 - Author: Roundup Robot (python-dev) (Python triager) Date: 2017-01-22 06:43
New changeset 98cde683b9c6 by Xiang Zhang in branch '3.5':
Issue #29290: argparse help messages won't wrap at non-breaking spaces.

New changeset 1754722ec296 by Xiang Zhang in branch '3.6':
Issue #29290: Merge 3.5.

New changeset c47a72627f0c by Xiang Zhang in branch 'default':
Issue #29290: Merge 3.6.
msg285995 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-01-22 06:45
Thanks Serhiy. 

BTW, #16623 is about 2.7, where the cause is that textwrap.wrap() doesn't handle Unicode non-breaking spaces correctly. So it's not the same problem as this one.
msg293472 - Author: wim glenn (wim.glenn) * Date: 2017-05-11 04:16
The test "test_help_non_breaking_spaces" from Zhang's commit fails on my platform (the other 1563 tests in the module all pass).

Interestingly, it doesn't fail when the entire test suite is run; the failure only occurs when the module is executed directly.

I guess there is some shared mutable state, or the test isn't fully set up.
msg293473 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-05-11 05:18
I can't reproduce the failure in any way. :-( Could you investigate further and provide more information?
