Author samwyse
Recipients samwyse
Date 2015-11-29.02:07:23
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1448762844.39.0.812229060367.issue25760@psf.upfronthosting.co.za>
In-reply-to
Content
Single-character words in a hyphenated phrase are not split correctly. The root cause is the wordsep_re class variable of textwrap.TextWrapper. To reproduce, run the following:

>>> import textwrap
>>> textwrap.TextWrapper.wordsep_re.split('two-and-a-half-hour')
['', 'two-', 'and-a', '-half-', 'hour']

It works if 'a' is replaced with two or more alphabetic characters:

>>> textwrap.TextWrapper.wordsep_re.split('two-and-aa-half-hour')
['', 'two-', '', 'and-', '', 'aa-', '', 'half-', 'hour']

The problem is in this part of the pattern:  (?=\w+[^0-9\W])
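To see why a one-letter word trips this fragment, the lookahead can be tested on its own. The sketch below compiles only the quoted `(?=\w+[^0-9\W])` fragment, not the full wordsep_re, and checks whether it succeeds on the text that follows a hyphen. Because `\w+` must consume at least one word character before `[^0-9\W]` matches another, the lookahead needs at least two word characters, the last of which is not a digit 0-9. After 'two-and-', the remaining text is 'a-half-hour', so the lookahead fails and no split happens there. The helper name `lookahead_ok` is hypothetical, for illustration only.

```python
import re

# Only the lookahead quoted in the report, compiled on its own
# (an assumption for illustration: this is not the full wordsep_re).
# \w+ must consume at least one word character, and [^0-9\W] must then
# match one more word character that is not a digit 0-9.
LOOKAHEAD = re.compile(r'(?=\w+[^0-9\W])')

def lookahead_ok(rest):
    """Hypothetical helper: True if the lookahead succeeds at the start
    of `rest`, i.e. at the text immediately following a hyphen."""
    return LOOKAHEAD.match(rest) is not None

for rest in ['a-half-hour', 'aa-half-hour', 'half-hour', '9-lives', 'x9']:
    print(repr(rest), lookahead_ok(rest))
```

The 'x9' case shows the second constraint: two word characters are present, but the last one is a digit, so the lookahead still fails.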

I confess that I don't understand the situation that would require so complicated a pattern. Why wouldn't (?=\w) work?
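One way to explore the question is with a deliberately simplified, hypothetical splitter that is NOT textwrap's wordsep_re: a chunk is a run of word characters ending in a hyphen, followed by the suggested `(?=\w)` lookahead. A second variant keeps only the "next character is a letter" part of the original class, `(?=[^0-9\W])`. Both split 'two-and-a-half-hour' at every hyphen, matching the shape of the 'aa' output above. They differ on digits: with `(?=\w)`, '1-2' is also split after the hyphen. Whether that is the reason the original pattern excludes digits is not stated in this report; the pattern names here are assumptions for illustration.

```python
import re

# Hypothetical, simplified splitter: NOT textwrap's wordsep_re.
# A chunk is a run of word characters ending in a hyphen, and the
# lookahead only asks that the next character be a word character.
SIMPLE_WORDSEP = re.compile(r'(\w+-(?=\w))')

# Variant whose lookahead accepts only a word character that is not
# a digit 0-9, borrowing the [^0-9\W] class from the report.
LETTER_WORDSEP = re.compile(r'(\w+-(?=[^0-9\W]))')

for text in ['two-and-a-half-hour', 'two-and-aa-half-hour', '1-2']:
    print(text)
    print('  (?=\\w)       ', SIMPLE_WORDSEP.split(text))
    print('  (?=[^0-9\\W])', LETTER_WORDSEP.split(text))
```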
History
Date                 User     Action  Args
2015-11-29 02:07:24  samwyse  set     recipients: + samwyse
2015-11-29 02:07:24  samwyse  set     messageid: <1448762844.39.0.812229060367.issue25760@psf.upfronthosting.co.za>
2015-11-29 02:07:23  samwyse  link    issue25760 messages
2015-11-29 02:07:23  samwyse  create