textwrap: Non-breaking space not honored #64690

Closed
kunkku mannequin opened this issue Feb 2, 2014 · 13 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@kunkku
Mannequin

kunkku mannequin commented Feb 2, 2014

BPO 20491
Nosy @malemburg, @loewis, @birkenfeld, @pitrou, @vstinner, @benjaminp, @mcepl, @ezio-melotti, @merwok, @bitdancer, @serhiy-storchaka, @Mariatta
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
Files
  • textwrap-honor-non-breaking-spaces.patch: Suggested correction
  • textwrap-honor-non-breaking-spaces.patch: Correction with test cases added
  • honor-non-breaking-spaces.patch: Correction with C-style formatting
  • new_textwrap.patch: patch with test for NARROW NO-BREAK SPACE
  • issue20491_verbose.patch: Verbose regex patch.
  • honor-non-breaking-spaces2.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-10-25.11:48:32.559>
    created_at = <Date 2014-02-02.20:46:37.754>
    labels = ['3.7', 'type-bug', 'library', 'expert-unicode']
    title = 'textwrap: Non-breaking space not honored'
    updated_at = <Date 2017-05-10.23:06:51.432>
    user = 'https://bugs.python.org/kunkku'

    bugs.python.org fields:

    activity = <Date 2017-05-10.23:06:51.432>
    actor = 'mcepl'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-10-25.11:48:32.559>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)', 'Unicode']
    creation = <Date 2014-02-02.20:46:37.754>
    creator = 'kunkku'
    dependencies = []
    files = ['33872', '33890', '33911', '34497', '34827', '44968']
    hgrepos = []
    issue_num = 20491
    keywords = ['patch']
    message_count = 13.0
    messages = ['210013', '210026', '210187', '213605', '213642', '213942', '214019', '214032', '216130', '277969', '278094', '278114', '279395']
    nosy_count = 17.0
    nosy_names = ['lemburg', 'loewis', 'georg.brandl', 'pitrou', 'vstinner', 'benjamin.peterson', 'mcepl', 'ezio.melotti', 'eric.araujo', 'r.david.murray', 'python-dev', 'joebauer', 'serhiy.storchaka', 'kunkku', 'dbudinova', 'maatt', 'Mariatta']
    pr_nums = ['552']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue20491'
    versions = ['Python 3.5', 'Python 3.6', 'Python 3.7']

    @kunkku
    Mannequin Author

    kunkku mannequin commented Feb 2, 2014

    The textwrap module does not distinguish non-breaking space (\xa0) from other whitespace when determining word boundaries.

    At the beginning of the module, the _whitespace variable is defined to address this issue, but it is not used in the regular expressions that determine the splitting rules.
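The reported behaviour can be checked with a minimal sketch. It assumes an interpreter that already contains the fix recorded at the end of this thread; the sample sentence and the width are made up for illustration.

```python
import textwrap

# Illustrative input: "100" and "km" are joined by NO-BREAK SPACE (U+00A0).
text_nbsp = "Pay 100\N{NO-BREAK SPACE}km now"
text_plain = "Pay 100 km now"

# With a fixed textwrap, the no-break space is not a break opportunity,
# so "100\xa0km" is kept together on one line.
wrapped_nbsp = textwrap.wrap(text_nbsp, width=8)

# With an ordinary space, the wrapper is free to break between "100" and "km".
wrapped_plain = textwrap.wrap(text_plain, width=8)

print(wrapped_nbsp)
print(wrapped_plain)
```

On an interpreter without the fix, the first call would be expected to treat the no-break space like any other space.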

    @kunkku kunkku mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Feb 2, 2014
    @pitrou
    Member

    pitrou commented Feb 2, 2014

    Thanks for the patch, Kaarle. Could you add some tests in Lib/test/test_textwrap?

    Also, for your contribution to be integrated, we'll need you to sign a contributor's agreement: http://www.python.org/psf/contrib/contrib-form/

    @serhiy-storchaka
    Member

    It looks to me that the code could be a little clearer if it used C-style formatting.

    @merwok
    Member

    merwok commented Mar 15, 2014

    Using a multiline regex (with re.VERBOSE) would also avoid the clutter of parens and quotes.
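To illustrate the two style suggestions (C-style formatting and re.VERBOSE), here is a simplified sketch. The pattern is a hypothetical stand-in and not the actual textwrap regex; the names breaking_ws, chunk_re and split_chunks are assumptions made for this example.

```python
import re

# ASCII whitespace only; NO-BREAK SPACE is deliberately left out.
breaking_ws = '\t\n\x0b\x0c\r '

# Simplified, hypothetical splitter (not the real textwrap pattern).
# The named placeholder is filled in with C-style formatting, and
# re.VERBOSE allows layout and comments inside the pattern.
chunk_re = re.compile(r'''
    (            # capturing group, so re.split keeps the separators
      %(ws)s+    # a run of breaking whitespace
    )''' % {'ws': '[%s]' % re.escape(breaking_ws)}, re.VERBOSE)


def split_chunks(text):
    """Split text into words and whitespace runs, dropping empty strings."""
    return [chunk for chunk in chunk_re.split(text) if chunk]


chunks = split_chunks("Pay 100\xa0km  now")
print(chunks)
```

Because the character class is built from an explicit list rather than from \s, the no-break space stays inside the chunk "100\xa0km".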

    @serhiy-storchaka
    Member

    What about other spaces: '\N{OGHAM SPACE MARK}', '\N{EN QUAD}', '\N{EM QUAD}', '\N{EN SPACE}', '\N{EM SPACE}', '\N{THREE-PER-EM SPACE}', '\N{FOUR-PER-EM SPACE}', '\N{SIX-PER-EM SPACE}', '\N{FIGURE SPACE}', '\N{PUNCTUATION SPACE}', '\N{THIN SPACE}', '\N{HAIR SPACE}', '\N{LINE SEPARATOR}', '\N{PARAGRAPH SEPARATOR}', '\N{NARROW NO-BREAK SPACE}', '\N{MEDIUM MATHEMATICAL SPACE}', '\N{IDEOGRAPHIC SPACE}'? In Python 2 textwrap supported only 8-bit spaces, but Python 3 should support full Unicode. From this point of view, the proposed patch is a regression.
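As a rough illustration of why these characters matter, the sketch below looks up a handful of them and reports how Python classifies each. It shows that neither str.isspace() nor the \s regex class separates breaking from non-breaking spaces, so a pattern built on \s alone cannot honor NO-BREAK SPACE. The selection of names is a sample, not the full list.

```python
import re
import unicodedata

# A sample of the characters discussed; the first two are non-breaking.
names = [
    'NO-BREAK SPACE', 'NARROW NO-BREAK SPACE',
    'OGHAM SPACE MARK', 'EN QUAD', 'EM SPACE', 'THIN SPACE',
    'LINE SEPARATOR', 'PARAGRAPH SEPARATOR', 'IDEOGRAPHIC SPACE',
]

report = {}
for name in names:
    ch = unicodedata.lookup(name)
    report[name] = (
        unicodedata.category(ch),            # general category, e.g. 'Zs'
        ch.isspace(),                        # str-level whitespace test
        re.fullmatch(r'\s', ch) is not None, # regex-level whitespace test
    )

for name, row in report.items():
    print(f'{name:24} {row}')
```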

    @merwok
    Member

    merwok commented Mar 18, 2014

    NO-BREAK SPACE and NARROW NO-BREAK SPACE are characters whose intent is clear and which are used by knowledgeable users and smart software, for example LibreOffice with an fr_FR locale. I don’t know about the other characters listed by Serhiy, and I wouldn’t worry about them unless users requested support for them or another core dev explained why they should be supported.

    A comment at the start of the module (where _whitespace, used in the patch here, is defined) even talks about NBSP; it is focused on bytes though and should be updated for the Python 3 unicode world.

    @dbudinova
    Mannequin

    dbudinova mannequin commented Mar 18, 2014

    changed honor-non-breaking-spaces.patch:
    used \N{NO-BREAK SPACE} instead of \xa0

    added test for \N{NARROW NO-BREAK SPACE}

    @merwok
    Member

    merwok commented Mar 18, 2014

    Thank you, this looks really good. I left some comments on rietveld.

    @maatt
    Mannequin

    maatt mannequin commented Apr 14, 2014

    Patch on top of dbudinova's that attempts to replace the concatenation of strings with a verbose regex.

    @joebauer
    Mannequin

    joebauer mannequin commented Oct 3, 2016

    Hey there,

    wanted to follow up on the state of this... is there a reason why this has not made it into vanilla yet? If so, I'd like to try to help clear any impediments if I can.

    This issue is *really*, really, really annoying me. I posted about a year ago on python-list (http://code.activestate.com/lists/python-list/685604/), was referred to this bug, and thought I'd wait it out. But now the last change was 2 years ago and there is no relief in sight.

    So if nothing else, please take it as a gentle reassurance that this bug really is affecting real-world scenarios and is annoying as hell. Especially since the semantics of a non-breaking space are pretty much exactly to *not* break on text wrapping.

    If there's anything I can contribute to get things going again, by all means please let me know. All hands on deck!

    Cheers,
    Johannes

    @bitdancer
    Member

    It probably just got forgotten. If you want to help move it forward, please review the patch (see https://docs.python.org/devguide/tracker.html#reviewing-patches), check whether all outstanding review comments have been addressed, and post your recommendations here.

    @bitdancer bitdancer added the 3.7 (EOL) end of life label Oct 5, 2016
    @serhiy-storchaka
    Member

    The code of the textwrap module has changed since the last patch was published. The proposed patch resolves conflicts and addresses Eric's comments.

    Maybe add breaking Unicode spaces (OGHAM SPACE MARK, EN QUAD, etc) to _whitespace?

    I think in future we should implement the Unicode line breaking algorithm [1].

    [1] http://www.unicode.org/reports/tr14/
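Until something like that exists, a caller who wants breaks at other Unicode spaces could normalize them before wrapping. The following is only a user-side sketch under stated assumptions: the helper wrap_unicode_spaces and the table _BREAKING_SPACES are hypothetical names, the character selection is illustrative, and none of this is what the committed patch does or an implementation of the Unicode line breaking algorithm.

```python
import textwrap

# Hypothetical table: a few breaking Unicode spaces mapped to a plain space.
# The selection is illustrative, not a complete or official list.
_BREAKING_SPACES = {
    ord(ch): ' '
    for ch in (
        '\N{OGHAM SPACE MARK}\N{EN QUAD}\N{EM QUAD}\N{EN SPACE}'
        '\N{EM SPACE}\N{THIN SPACE}\N{IDEOGRAPHIC SPACE}'
    )
}


def wrap_unicode_spaces(text, width=70):
    """Wrap text after turning selected Unicode spaces into ASCII spaces."""
    return textwrap.wrap(text.translate(_BREAKING_SPACES), width=width)


lines = wrap_unicode_spaces('aaa\N{EN SPACE}bbb', width=3)
print(lines)
```

NO-BREAK SPACE and NARROW NO-BREAK SPACE are left out of the table on purpose, so they keep their non-breaking meaning.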

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 25, 2016

    New changeset fcabef0ce773 by Serhiy Storchaka in branch '3.5':
    Issue bpo-20491: The textwrap.TextWrapper class now honors non-breaking spaces.
    https://hg.python.org/cpython/rev/fcabef0ce773

    New changeset bfa400108fc5 by Serhiy Storchaka in branch '3.6':
    Issue bpo-20491: The textwrap.TextWrapper class now honors non-breaking spaces.
    https://hg.python.org/cpython/rev/bfa400108fc5

    New changeset b86dacb9e668 by Serhiy Storchaka in branch 'default':
    Issue bpo-20491: The textwrap.TextWrapper class now honors non-breaking spaces.
    https://hg.python.org/cpython/rev/b86dacb9e668

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022