Message 291107 - Python tracker

Author	Robert Lujo
Recipients	Robert Lujo, ezio.melotti, mrabarnett
Date	2017-04-04.07:00:40
Message-id	<1491289242.07.0.894132585391.issue29977@psf.upfronthosting.co.za>

Content
Hello,

I believe I have hit a bug or misbehaviour in the re module. Here is a "working" example that reproduces it:

import re
RE_C_COMMENTS = re.compile(r"/\*(.|\s)*?\*/", re.MULTILINE|re.DOTALL|re.UNICODE)
text = "Special section /* valves:\n\n\nsilicone\n\n\n\n\n\n\nHarness:\n\n\nmetal and plastic fibre\n\n\n\n\n\n\nInner frame:\n\n\nmultibutylene\n\n\n\n\n\n\nWeight:\n\n\n147 g\n\n\n\n\n\n\n\n\n\n\n\n\n\nSelection guide\n"

This command then takes forever:
RE_C_COMMENTS.sub(" ", text, re.MULTILINE|re.DOTALL|re.UNICODE)

The same problem shows up on just the first 90 characters, which take about 10 s on my machine:
RE_C_COMMENTS.sub(" ", text[:90], re.MULTILINE|re.DOTALL|re.UNICODE)

Some clarification: I am trying to remove C-style comments from the text with a non-greedy regular expression. In this case the start of a comment (/*) is found, but the end of the comment (*/) cannot be found. Note the MULTILINE and other re flags.
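
A possible explanation and workaround, offered as a sketch rather than a confirmed diagnosis: with re.DOTALL set, "." already matches newlines, so every whitespace character can be consumed by either branch of "(.|\s)". When no closing "*/" exists, the engine appears to try both branches for each whitespace character before giving up, which would account for the runtime growing so quickly on this whitespace-heavy text. Dropping the redundant alternation removes that ambiguity. The names RE_C_COMMENTS_LINEAR and strip_c_comments below are hypothetical and not part of the original report.

```python
import re

# Assumption: re.DOTALL is kept, so "." matches newlines and the "(.|\s)"
# alternation from the original pattern is redundant and can be dropped.
RE_C_COMMENTS_LINEAR = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_c_comments(text):
    """Replace each terminated /* ... */ comment with a single space."""
    # Flags are given to re.compile(); the third positional argument of
    # the compiled pattern's sub() is "count", not "flags".
    return RE_C_COMMENTS_LINEAR.sub(" ", text)


if __name__ == "__main__":
    unterminated = "Special section /* valves:\n\n\nsilicone\n\n\n\n\n\n\nHarness:\n"
    # There is no closing "*/", so the text is expected to come back unchanged.
    print(strip_c_comments(unterminated) == unterminated)
    # A terminated comment spanning two lines is replaced by one space.
    print(strip_c_comments("int a; /* first\n comment */ int b;"))
```

Note that in the calls above the flags are passed as the third positional argument of sub(), which is the count parameter; the flags given to re.compile() are the ones that actually apply.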

Python versions used: 

'2.7.11 (default, Jan 22 2016, 16:30:50) \n[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]' / macOS 10.12.13

and:
'2.7.12 (default, Nov 19 2016, 06:48:10) \n[GCC 5.4.0 20160609]' -> Linux 84-Ubuntu SMP Wed Feb 1 17:20:32 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

History
Date	User	Action	Args
2017-04-04 07:00:42	Robert Lujo	set	recipients: + Robert Lujo, ezio.melotti, mrabarnett
2017-04-04 07:00:42	Robert Lujo	set	messageid: <1491289242.07.0.894132585391.issue29977@psf.upfronthosting.co.za>
2017-04-04 07:00:42	Robert Lujo	link	issue29977 messages
2017-04-04 07:00:40	Robert Lujo	create