TextWrapper break_long_words=True, break_on_hyphens=True on long words #72846

Closed
peterjc mannequin opened this issue Nov 10, 2016 · 5 comments
Labels
3.7 (EOL), stdlib, type-bug

Comments

@peterjc
Mannequin

peterjc mannequin commented Nov 10, 2016

BPO 28660
Nosy @birkenfeld, @peterjc, @serhiy-storchaka, @iritkatriel
PRs
  • bpo-28660: make TextWrapper break long words on hyphens  #22721
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-10-18.17:14:00.555>
    created_at = <Date 2016-11-10.16:35:01.718>
    labels = ['3.7', 'type-bug', 'library']
    title = 'TextWrapper break_long_words=True, break_on_hyphens=True on long words'
    updated_at = <Date 2020-10-18.17:14:00.555>
    user = 'https://github.com/peterjc'

    bugs.python.org fields:

    activity = <Date 2020-10-18.17:14:00.555>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-10-18.17:14:00.555>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2016-11-10.16:35:01.718>
    creator = 'maubp'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 28660
    keywords = ['patch']
    message_count = 5.0
    messages = ['280522', '304806', '378718', '378724', '378878']
    nosy_count = 4.0
    nosy_names = ['georg.brandl', 'maubp', 'serhiy.storchaka', 'iritkatriel']
    pr_nums = ['22721']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue28660'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7']

    @peterjc
    Mannequin Author

    peterjc mannequin commented Nov 10, 2016

    Quoting https://docs.python.org/2/library/textwrap.html

    width (default: 70) The maximum length of wrapped lines. As long as there are no individual words in the input text longer than width, TextWrapper guarantees that no output line will be longer than width characters.

    It appears that with break_long_words=True and break_on_hyphens=True, any hyphenated term longer than the specified width does not get preferentially broken at a hyphen.

    Example input:

    We used the enyzme 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase.

    Using break_long_words=True, break_on_hyphens=True
    ==================================================
    We used the enyzme 2-succinyl-6-hydroxy-2,4-cycloh
    exadiene-1-carboxylate synthase.
    ==================================================

    Expected result using break_long_words=True, break_on_hyphens=True
    ==================================================
    We used the enyzme 2-succinyl-6-hydroxy-2,4-
    cyclohexadiene-1-carboxylate synthase.
    ==================================================

    Given width=50, the 53-character "word" "2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate" must be split somewhere, and since break_on_hyphens=True it should be broken at a hyphen, as in the expected result above.

    Sample code:

    import textwrap
    w = 50
    text = "We used the enyzme 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase."
    print("Input:")
    print("=" * w)
    print(text)
    print("=" * w)
    print("Using break_long_words=True, break_on_hyphens=True")
    print("=" * w)
    print(textwrap.fill(text, width=w, break_long_words=True, break_on_hyphens=True))
    print("=" * w)

    @serhiy-storchaka added the stdlib, 3.7 (EOL), and type-bug labels Nov 10, 2016
    @serhiy-storchaka
    Member

    This is because the current hyphen-breaking algorithm allows a break only between letters, which prevents dates and times from being broken. Perhaps it should be made more lenient when a word is too long to fit.
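The letters-only rule described above can be observed through the public API. This is a minimal sketch: the helper name `wrap_on_hyphens` and the sample strings "well-known" and "2016-11-10" are illustrative assumptions, not taken from the issue. It passes break_long_words=False on purpose, so the result depends only on the hyphen-splitting rule and not on how over-long words are chopped.

```python
import textwrap

# break_long_words=False keeps the result independent of how over-long
# words are chopped, so only the hyphen-splitting rule is exercised.
def wrap_on_hyphens(text, width):
    return textwrap.wrap(text, width=width,
                         break_on_hyphens=True, break_long_words=False)

letters = wrap_on_hyphens("well-known", 6)
date = wrap_on_hyphens("2016-11-10", 6)
name = "2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate"
chemical = wrap_on_hyphens(name, 50)

print(letters)   # hyphen sits between letters, so it is a break point
print(date)      # hyphens sit between digits, so the date stays whole
print(chemical)  # every hyphen touches a digit, so no break point exists
```

Because every hyphen in the chemical name is adjacent to a digit, the name reaches the long-word handling as a single unbreakable chunk.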

    @iritkatriel
    Member

    textwrap does not actually apply the break-on-hyphen algorithm at all to long words. It just chops them up into width-sized pieces.

    The PR I just submitted looks for hyphens and uses them as cut points if they exist, without any attempt to understand their context.
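The idea of using hyphens as cut points without looking at their context can be sketched as follows. This is only an illustration under stated assumptions, not the code from the PR: `split_long_word` is a hypothetical helper, and `space_left` stands for the room remaining on the current line.

```python
def split_long_word(word, space_left):
    """Return (head, tail), preferring to cut just after a hyphen."""
    end = space_left
    if len(word) > space_left:
        # Look for the last hyphen that still fits on the current line.
        hyphen = word.rfind("-", 0, space_left)
        # Skip a hyphen at index 0, which would leave a head of just "-".
        if hyphen > 0:
            end = hyphen + 1
    return word[:end], word[end:]


name = "2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate"
prefix = "We used the enyzme "
# With width=50, the prefix leaves 50 - 19 = 31 columns on the first line.
head, tail = split_long_word(name, 50 - len(prefix))
print(prefix + head)
print(tail)
```

For the example from the report, the first line ends at "2,4-" and the remainder starts with "cyclohexadiene", matching the expected result given by the reporter. A word with no hyphen in range falls back to a plain cut at `space_left`.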

    @iritkatriel
    Member

    Actually I see what Serhiy meant about the hyphen algorithm - the regex breaking up words. Yes, this is applied to long words and the reason he stated for this issue is correct.

    It is probably possible to make that regex aware of width and long words, but it would be more complicated and would need to be recompiled for each width. I think long words are not the typical input, so it's better to handle them separately and keep the rest simple.

    @serhiy-storchaka
    Member

    New changeset b81c833 by Irit Katriel in branch 'master':
    bpo-28660: Make TextWrapper break long words on hyphens (GH-22721)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022