classification
Title: textwrap: Non-breaking space not honored
Type: behavior
Stage: resolved
Components: Library (Lib), Unicode
Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Mariatta, benjamin.peterson, dbudinova, eric.araujo, ezio.melotti, georg.brandl, joebauer, kunkku, lemburg, loewis, maatt, mcepl, pitrou, python-dev, r.david.murray, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2014-02-02 20:46 by kunkku, last changed 2017-05-10 23:06 by mcepl. This issue is now closed.

Files
File name Uploaded Description
textwrap-honor-non-breaking-spaces.patch kunkku, 2014-02-02 20:46 Suggested correction
textwrap-honor-non-breaking-spaces.patch kunkku, 2014-02-03 21:32 Correction with test cases added
honor-non-breaking-spaces.patch kunkku, 2014-02-04 17:39 Correction with C-style formatting
new_textwrap.patch dbudinova, 2014-03-18 21:21 patch with test for NARROW NO-BREAK SPACE
issue20491_verbose.patch maatt, 2014-04-14 17:40 Verbose regex patch.
honor-non-breaking-spaces2.patch serhiy.storchaka, 2016-10-05 07:39
Pull Requests
URL Status Linked
PR 552 closed dstufft, 2017-03-31 16:36
Messages (13)
msg210013 - Author: Kaarle Ritvanen (kunkku) * Date: 2014-02-02 20:46
The textwrap module does not distinguish non-breaking space (\xa0) from other whitespace when determining word boundaries.

At the beginning of the module, the _whitespace variable is defined to address this issue, but it is not used in the regular expressions that determine the splitting rules.
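The root cause can be reproduced with the `re` module alone. In Python 3, `\s` in a str pattern matches every Unicode whitespace character, NO-BREAK SPACE included, whereas a character class built from an explicit ASCII set does not. This sketch assumes `_whitespace` holds `'\t\n\x0b\x0c\r '` (its value in CPython's textwrap; it is private, so the string is copied here rather than imported), and the sample text is made up.

```python
import re

# ASCII-only whitespace, assumed to mirror textwrap's private _whitespace.
ASCII_WHITESPACE = '\t\n\x0b\x0c\r '

text = 'Price: 100\N{NO-BREAK SPACE}EUR'

# \s matches NO-BREAK SPACE, so "100" and "EUR" end up as separate chunks.
unicode_split = re.split(r'(\s+)', text)

# A class built from the explicit ASCII set leaves NO-BREAK SPACE inside the word.
ascii_ws = '[%s]' % re.escape(ASCII_WHITESPACE)
ascii_split = re.split('(%s+)' % ascii_ws, text)

print(unicode_split)  # ['Price:', ' ', '100', '\xa0', 'EUR']
print(ascii_split)    # ['Price:', ' ', '100\xa0EUR']
```

A wrapper working from the first chunk list is free to put "100" and "EUR" on different lines; one working from the second is not.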
msg210026 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-02-02 22:00
Thanks for the patch, Kaarle. Could you add some tests in Lib/test/test_textwrap?

Also, for your contribution to be integrated, we'll need you to sign a contributor's agreement: http://www.python.org/psf/contrib/contrib-form/
msg210187 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-04 09:41
It looks to me like the code could be a little clearer with C-style formatting.
msg213605 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2014-03-15 00:46
Using a multiline regex (with re.VERBOSE) would also avoid the clutter of parens and quotes.
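Serhiy's C-style formatting suggestion (msg210187) and this re.VERBOSE suggestion combine naturally. The pattern below is a simplified, hypothetical chunker, not textwrap's actual `wordsep_re` (which also handles hyphenated words and em-dashes); it only shows how `%`-interpolating a whitespace class into a verbose pattern avoids the clutter of concatenated string pieces. `_whitespace` is again assumed to be the ASCII-only set.

```python
import re

# ASCII-only breakable whitespace (assumed to match textwrap's private _whitespace).
_whitespace = '\t\n\x0b\x0c\r '

whitespace = '[%s]' % re.escape(_whitespace)   # class of the escaped ASCII whitespace
nowhitespace = '[^' + whitespace[1:]           # the complementary class

# re.VERBOSE ignores the layout spaces and the "#" comments in the pattern,
# so each alternative can sit on its own annotated line.
chunk_re = re.compile(r'''
      %(ws)s+     # a run of breakable whitespace
    | %(nws)s+    # a run of anything else (no-break spaces stay inside it)
''' % {'ws': whitespace, 'nws': nowhitespace}, re.VERBOSE)


def chunks(text):
    """Split text into alternating word and whitespace chunks."""
    return chunk_re.findall(text)


print(chunks('1\N{NARROW NO-BREAK SPACE}000 km'))  # ['1\u202f000', ' ', 'km']
```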
msg213642 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-15 06:16
What about other spaces: '\N{OGHAM SPACE MARK}', '\N{EN QUAD}', '\N{EM QUAD}', '\N{EN SPACE}', '\N{EM SPACE}', '\N{THREE-PER-EM SPACE}', '\N{FOUR-PER-EM SPACE}', '\N{SIX-PER-EM SPACE}', '\N{FIGURE SPACE}', '\N{PUNCTUATION SPACE}', '\N{THIN SPACE}', '\N{HAIR SPACE}', '\N{LINE SEPARATOR}', '\N{PARAGRAPH SEPARATOR}', '\N{NARROW NO-BREAK SPACE}', '\N{MEDIUM MATHEMATICAL SPACE}', '\N{IDEOGRAPHIC SPACE}'? In Python 2 textwrap supported only 8-bit spaces, but Python 3 should support full Unicode. From this point of view, the proposed patch is a regression.
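The regression argument can be checked directly. Every character in the list above matches `\s`, which the pre-patch regexes used, so the old code treated all of them as break points; a wrapper restricted to ASCII whitespace treats none of them as break points. This sketch uses only `unicodedata` and `re`, with the names copied from the message; the extra `noBreak` column reads the `<noBreak>` decomposition tag from the Unicode Character Database.

```python
import re
import unicodedata

# The characters listed in msg213642, by Unicode name.
NAMES = [
    'OGHAM SPACE MARK', 'EN QUAD', 'EM QUAD', 'EN SPACE', 'EM SPACE',
    'THREE-PER-EM SPACE', 'FOUR-PER-EM SPACE', 'SIX-PER-EM SPACE',
    'FIGURE SPACE', 'PUNCTUATION SPACE', 'THIN SPACE', 'HAIR SPACE',
    'LINE SEPARATOR', 'PARAGRAPH SEPARATOR', 'NARROW NO-BREAK SPACE',
    'MEDIUM MATHEMATICAL SPACE', 'IDEOGRAPHIC SPACE',
]

rows = []
for name in NAMES:
    ch = unicodedata.lookup(name)
    rows.append((
        'U+%04X' % ord(ch),
        name,
        unicodedata.category(ch),                               # Zs, Zl or Zp
        re.fullmatch(r'\s', ch) is not None,                    # matched by \s?
        unicodedata.decomposition(ch).startswith('<noBreak>'),  # tagged no-break in the UCD?
    ))

for row in rows:
    print('%s %-26s %s  \\s=%-5s noBreak=%s' % row)
```

The `\s` column shows why an ASCII-only `_whitespace` changes behavior for these characters; the `noBreak` column separates genuine no-break characters such as NARROW NO-BREAK SPACE from ordinary breaking spaces such as EM SPACE.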
msg213942 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2014-03-18 04:17
NON-BREAKING SPACE and NARROW NON-BREAKING SPACE are characters whose intent is clear and which are used by knowledgeable users and smart software, for example LibreOffice with an fr_FR locale. I don’t know about the other characters listed by Serhiy, and I wouldn’t worry about them unless users requested support for them or another core dev explained why they should be supported.

A comment at the start of the module (where _whitespace, used in the patch here, is defined) even talks about NBSP; it is focused on bytes, though, and should be updated for the Python 3 Unicode world.
msg214019 - Author: dani (dbudinova) * Date: 2014-03-18 21:21
changed honor-non-breaking-spaces.patch:
used \N{NO-BREAK SPACE} instead of \xa0

added test for \N{NARROW NO-BREAK SPACE}
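A test along the lines described here might look like the following sketch. The class and method names are hypothetical (not taken from the patch), and the expected results assume an interpreter that already includes this fix (3.6 or later). Run it with `python -m unittest`.

```python
import textwrap
import unittest


class NonBreakingSpaceTest(unittest.TestCase):
    """Hypothetical tests: no-break spaces must never become line breaks."""

    def test_no_break_spaces_are_not_break_points(self):
        for nbsp in ('\N{NO-BREAK SPACE}', '\N{NARROW NO-BREAK SPACE}'):
            with self.subTest(char=hex(ord(nbsp))):
                text = 'aaa bbb' + nbsp + 'ccc'
                # "bbb<nbsp>ccc" is 7 characters wide and must stay on one line.
                self.assertEqual(textwrap.wrap(text, width=7),
                                 ['aaa', 'bbb' + nbsp + 'ccc'])

    def test_ordinary_space_is_a_break_point(self):
        # Same layout with a plain space: the wrapper may break after "bbb".
        self.assertEqual(textwrap.wrap('aaa bbb ccc', width=7),
                         ['aaa bbb', 'ccc'])
```

Pairing each no-break case with an ordinary-space control makes it clear that the difference in output comes from the character, not from the width chosen.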
msg214032 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2014-03-18 22:27
Thank you, this looks really good. I left some comments on Rietveld.
msg216130 - Author: Matt Chaput (maatt) Date: 2014-04-14 17:40
Patch on top of dbudinova's that attempts to replace the concatenation of strings with a verbose regex.
msg277969 - Author: Johannes Bauer (joebauer) Date: 2016-10-03 17:55
Hey there,

wanted to follow up on the state of this... is there a reason why this has not made it into vanilla Python yet? If so, I'd like to help clear any impediments.

This issue is *really*, really, really annoying me. I posted on python-list about a year ago (http://code.activestate.com/lists/python-list/685604/), was referred to this bug, and thought I'd wait it out. But the last change here was 2 years ago and there is no relief in sight.

So if nothing else, please take this as a gentle reminder that this bug really does affect real-world use and is annoying as hell. Especially since the whole point of a non-breaking space is *not* to break when wrapping text.

If there's anything I can contribute to get things going again, by all means please let me know. All hands on deck!

Cheers,
Johannes
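On an interpreter without the fix, one workaround for this is to hide the no-break spaces from textwrap: swap each one for a private-use character that is not whitespace, wrap, then swap it back. Because each placeholder is a single code point, line widths are unchanged. The helper name `wrap_keep_nbsp` and the choice of U+E000/U+E001 as placeholders are assumptions of this sketch, which refuses (with ValueError) any input that already contains those code points.

```python
import textwrap

# Hypothetical placeholders: private-use code points, which are not whitespace,
# so neither \s-based nor ASCII-only splitting will break at them.
_PLACEHOLDERS = {
    '\N{NO-BREAK SPACE}': '\uE000',
    '\N{NARROW NO-BREAK SPACE}': '\uE001',
}
_TO_PLACEHOLDER = str.maketrans(_PLACEHOLDERS)
_FROM_PLACEHOLDER = str.maketrans({v: k for k, v in _PLACEHOLDERS.items()})


def wrap_keep_nbsp(text, width=70, **kwargs):
    """Like textwrap.wrap(), but never breaks at a no-break space."""
    for placeholder in _PLACEHOLDERS.values():
        if placeholder in text:
            raise ValueError('text already contains placeholder %r' % placeholder)
    lines = textwrap.wrap(text.translate(_TO_PLACEHOLDER), width=width, **kwargs)
    return [line.translate(_FROM_PLACEHOLDER) for line in lines]


print(wrap_keep_nbsp('Price: 100\N{NO-BREAK SPACE}EUR', width=12))
# ['Price:', '100\xa0EUR']
```

On interpreters that include the fix the substitution is redundant but harmless.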
msg278094 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-10-05 00:02
It probably just got forgotten.  If you want to help move it forward please do a review of the patch (see https://docs.python.org/devguide/tracker.html#reviewing-patches), including whether or not all outstanding review comments have been addressed, and post your recommendations here.
msg278114 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-05 07:39
The textwrap module's code has changed since the last patch was published. The proposed patch resolves the conflicts and addresses Eric's comments.

Maybe add breaking Unicode spaces (OGHAM SPACE MARK, EN QUAD, etc.) to _whitespace?

I think in the future we should implement the Unicode line breaking algorithm [1].

[1] http://www.unicode.org/reports/tr14/
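The idea of adding breaking Unicode spaces to `_whitespace` can be tried without modifying the module. This sketch is hypothetical: it defines the breaking spaces as every character `str.isspace()` accepts minus those whose Unicode decomposition is tagged `<noBreak>`, and it overrides `wordsep_simple_re`, an undocumented CPython class attribute that `TextWrapper` consults only when `break_on_hyphens` is false. It is not an implementation of the UAX #14 algorithm referenced above.

```python
import re
import sys
import textwrap
import unicodedata


def breaking_whitespace():
    """Every character str.isspace() accepts, minus those whose Unicode
    decomposition is tagged <noBreak> (e.g. NO-BREAK SPACE)."""
    return ''.join(
        ch for ch in map(chr, range(sys.maxunicode + 1))
        if ch.isspace()
        and not unicodedata.decomposition(ch).startswith('<noBreak>')
    )


BREAKING_WHITESPACE = breaking_whitespace()


class UnicodeSpaceWrapper(textwrap.TextWrapper):
    """Hypothetical TextWrapper that also breaks at spaces such as EM SPACE.

    Relies on a CPython implementation detail: TextWrapper splits text with
    the class attribute wordsep_simple_re whenever break_on_hyphens is false.
    """

    wordsep_simple_re = re.compile('([%s]+)' % re.escape(BREAKING_WHITESPACE))

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Force the code path that uses wordsep_simple_re.
        self.break_on_hyphens = False


wrapper = UnicodeSpaceWrapper(width=5)
print(wrapper.wrap('aaa\N{EM SPACE}bbb\N{NO-BREAK SPACE}c'))
# ['aaa', 'bbb\xa0c']
```

The trade-off is that hyphenated words are no longer split, since the override only takes effect on the `break_on_hyphens=False` path.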
msg279395 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-25 11:48
New changeset fcabef0ce773 by Serhiy Storchaka in branch '3.5':
Issue #20491: The textwrap.TextWrapper class now honors non-breaking spaces.
https://hg.python.org/cpython/rev/fcabef0ce773

New changeset bfa400108fc5 by Serhiy Storchaka in branch '3.6':
Issue #20491: The textwrap.TextWrapper class now honors non-breaking spaces.
https://hg.python.org/cpython/rev/bfa400108fc5

New changeset b86dacb9e668 by Serhiy Storchaka in branch 'default':
Issue #20491: The textwrap.TextWrapper class now honors non-breaking spaces.
https://hg.python.org/cpython/rev/b86dacb9e668
History
Date User Action Args
2017-05-10 23:06:51 mcepl set nosy: + mcepl
2017-03-31 16:36:33 dstufft set pull_requests: + pull_request1067
2016-10-25 11:48:32 serhiy.storchaka set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2016-10-25 11:48:11 python-dev set nosy: + python-dev
messages: + msg279395
2016-10-05 07:39:21 serhiy.storchaka set files: + honor-non-breaking-spaces2.patch
messages: + msg278114
2016-10-05 00:36:08 Mariatta set nosy: + Mariatta
2016-10-05 00:02:55 r.david.murray set versions: + Python 3.6, Python 3.7, - Python 2.7, Python 3.4
2016-10-05 00:02:36 r.david.murray set nosy: + r.david.murray
messages: + msg278094
2016-10-03 17:55:57 joebauer set nosy: + joebauer
messages: + msg277969
2014-04-14 17:40:13 maatt set files: + issue20491_verbose.patch
nosy: + maatt
messages: + msg216130
2014-03-18 22:27:24 eric.araujo set stage: test needed -> patch review
messages: + msg214032
versions: + Python 3.5, - Python 3.3
2014-03-18 21:21:27 dbudinova set files: + new_textwrap.patch
nosy: + dbudinova
messages: + msg214019
2014-03-18 04:17:53 eric.araujo set messages: + msg213942
2014-03-15 06:18:48 serhiy.storchaka set nosy: + loewis
2014-03-15 06:17:25 serhiy.storchaka set nosy: + lemburg, vstinner, benjamin.peterson, ezio.melotti
components: + Unicode
2014-03-15 06:16:35 serhiy.storchaka set messages: + msg213642
2014-03-15 00:46:37 eric.araujo set nosy: + eric.araujo
messages: + msg213605
2014-02-04 17:39:39 kunkku set files: + honor-non-breaking-spaces.patch
2014-02-04 09:41:03 serhiy.storchaka set messages: + msg210187
2014-02-03 21:32:46 kunkku set files: + textwrap-honor-non-breaking-spaces.patch
2014-02-02 22:00:49 pitrou set nosy: + pitrou
messages: + msg210026
2014-02-02 21:16:47 serhiy.storchaka set nosy: + georg.brandl, serhiy.storchaka
stage: test needed
versions: + Python 2.7, - Python 3.5
2014-02-02 20:46:37 kunkku create