Message 255558 - Python tracker

Author	samwyse
Recipients	samwyse
Date	2015-11-29.02:07:23
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1448762844.39.0.812229060367.issue25760@psf.upfronthosting.co.za>
In-reply-to

Content
Single-character words in a hyphenated phrase are not split correctly.  The root issue is the wordsep_re class variable.  To reproduce, run the following:

>>> import textwrap
>>> textwrap.TextWrapper.wordsep_re.split('two-and-a-half-hour')
['', 'two-', 'and-a', '-half-', 'hour']

It works if 'a' is replaced with two or more alphabetic characters.

>>> textwrap.TextWrapper.wordsep_re.split('two-and-aa-half-hour')
['', 'two-', '', 'and-', '', 'aa-', '', 'half-', 'hour']

The problem is in this part of the pattern:  (?=\w+[^0-9\W])

I confess that I don't understand what situation would require such a complicated pattern.  Why wouldn't (?=\w) work?
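
To isolate the lookahead from the rest of wordsep_re, here is a minimal sketch that applies only the two lookaheads discussed above to a bare hyphen. The names STRICT, LOOSE and hyphen_positions are made up for this illustration and are not part of textwrap; the patterns are simplified and are not the full wordsep_re.

```python
import re

# Hypothetical, simplified patterns: a hyphen followed by each lookahead.
# These are NOT the full textwrap wordsep_re, only the part under discussion.
STRICT = re.compile(r'-(?=\w+[^0-9\W])')  # lookahead quoted in this report
LOOSE = re.compile(r'-(?=\w)')            # the simpler alternative asked about


def hyphen_positions(pattern, text):
    """Return the indexes of hyphens where the pattern's lookahead succeeds."""
    return [m.start() for m in pattern.finditer(text)]


if __name__ == '__main__':
    for text in ('two-and-a-half-hour', 'two-and-aa-half-hour', 'catch-22'):
        print(text, hyphen_positions(STRICT, text), hyphen_positions(LOOSE, text))
```

With the stricter lookahead, the hyphen before the lone 'a' is rejected because \w+[^0-9\W] needs at least two word characters, the second of which must not be a digit. The same requirement also rejects the hyphen in 'catch-22', which (?=\w) would accept, so the two lookaheads are not interchangeable for hyphens followed by digits. Whether that difference is the reason for the original pattern is an assumption, not something confirmed here.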

History
Date	User	Action	Args
2015-11-29 02:07:24	samwyse	set	recipients: + samwyse
2015-11-29 02:07:24	samwyse	set	messageid: <1448762844.39.0.812229060367.issue25760@psf.upfronthosting.co.za>
2015-11-29 02:07:23	samwyse	link	issue25760 messages
2015-11-29 02:07:23	samwyse	create