Issue1054564
Created on 2004-10-26 12:55 by rwhent, last changed 2005-02-14 11:35 by effbot. This issue is now closed.
| Messages (5) | |||
|---|---|---|---|
| msg22865 - (view) | Author: Rob (rwhent) | Date: 2004-10-26 12:55 | |
Whilst parsing some extremely long strings I found that
re.match causes segmentation faults on Solaris 2.8
when the regex contains '.*?' and the part of the
string matched by that portion of the regex becomes
very long (I first guessed 10000 chars, but it seemed
to be exactly at 8192 chars).
This is the regex used:
if re.match('^.*?\[\s*[A-Za-z_0-9]+\s*\].*',string):
This regex looks for '[alphaNum_]' present in a large
string
When it failed the string was 8192 chars long with no
matching '[alphaNum_]' present. If I reduce the length
of the string below 8192 it works ok.
This is a major issue for my application, as some strings
to be parsed are very large. I saw some discussion on
another bulletin board of a similar issue.
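A minimal sketch of the input shape described above, for anyone trying to reproduce it. The filler character, the sample token, and the helper name `matches_report_pattern` are assumptions; the report does not include the actual data. Whether the call crashes, raises, or simply returns depends on the Python version and platform.

```python
import re

# Pattern from the report, written as a raw string so the backslashes
# reach the re module unchanged.
PATTERN = r'^.*?\[\s*[A-Za-z_0-9]+\s*\].*'

def matches_report_pattern(text):
    """Return True if the reported pattern matches at the start of text."""
    return re.match(PATTERN, text) is not None

# 8192 characters with no '[name]' token, the failing case in the report.
no_token = 'x' * 8192
# The same string with a token appended (hypothetical sample data).
with_token = no_token + '[name_1]'

print(matches_report_pattern(no_token))
print(matches_report_pattern(with_token))
```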
|
|||
| msg22866 - (view) | Author: Fredrik Lundh (effbot) |
Date: 2004-10-26 13:20 | |
Logged In: YES user_id=38376 The max recursion limit problem in the re module is well-known. Until this limitation in the implementation is removed, see the following for ways to work around it: http://www.python.org/dev/doc/devel/lib/module-re.html http://python.org/sf/493252 |
|||
| msg22867 - (view) | Author: Fredrik Lundh (effbot) |
Date: 2004-10-26 13:24 | |
Logged In: YES
user_id=38376
btw, if you're searching for things, why not use the "search"
method?
if re.search('\[\s*[A-Za-z_0-9]+\s*\]', string):
(also, "[A-Za-z_0-9]" is better spelled "\w")
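A rough sketch of that suggestion in one place, assuming a current Python 3 interpreter. The names `BRACKETED` and `has_bracketed_name` are made up for illustration. On Python 3 str patterns, "\w" also matches non-ASCII word characters, so the `re.ASCII` flag is used here to keep it equivalent to "[A-Za-z_0-9]".

```python
import re

# Unanchored pattern: search() scans for the bracketed token directly,
# so the leading '^.*?' and trailing '.*' are not needed.
BRACKETED = re.compile(r'\[\s*\w+\s*\]', re.ASCII)

def has_bracketed_name(text):
    """Return True if text contains something like '[ name_1 ]'."""
    return BRACKETED.search(text) is not None

print(has_bracketed_name('x' * 8192))
print(has_bracketed_name('x' * 8192 + '[ name_1 ]'))
```

Dropping the lazy '.*?' prefix means the engine no longer has to extend a non-greedy match one character at a time across the whole string.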
|
|||
| msg22868 - (view) | Author: Josiah Carlson (josiahcarlson) | Date: 2004-10-30 15:44 | |
Logged In: YES
user_id=341410
In the case of this particular search, you could write your
own little searcher. The following could likely be done
better, but this is a quick 5-minute job that won't core on
you unless something is really wrong with Python, and may be
a reasonable stopgap until someone re-does the regular
expression library.
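import string
def find_thing(s):
    sp = 0
    # Characters allowed between the brackets.
    d = dict.fromkeys(string.letters + string.digits + '_')
    while sp < len(s):
        # Find the next '[' at or after sp.
        start = None
        for i in xrange(sp, len(s)):
            if s[i] == '[':
                start = i
                break
        if start is None:
            return
        for i in xrange(start+1, len(s)):
            if s[i] in d:
                continue
            elif s[i] == ']':
                return s[start:i+1]
            else:
                # Not a candidate; resume scanning from here.
                sp = i
                break
        else:
            # Reached the end without a ']': no match, so stop
            # instead of looping forever.
            return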
It returns None on failure to find, and the matched substring otherwise.
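For reference, a sketch of the same idea for Python 3, where `xrange` and `string.letters` are no longer available. It is not the code posted above: the name `find_bracketed` is hypothetical, and unlike the original it also skips optional whitespace inside the brackets and requires at least one name character, roughly mirroring the regex from the report.

```python
import string

# ASCII letters, digits and underscore, as in '[A-Za-z_0-9]'.
WORD_CHARS = frozenset(string.ascii_letters + string.digits + '_')

def find_bracketed(s):
    """Return the first '[ name ]' substring of s, or None if there is none."""
    n = len(s)
    start = s.find('[')
    while start != -1:
        i = start + 1
        # Optional whitespace after '['.
        while i < n and s[i].isspace():
            i += 1
        name_start = i
        # One or more name characters.
        while i < n and s[i] in WORD_CHARS:
            i += 1
        if i > name_start:
            # Optional whitespace before ']'.
            while i < n and s[i].isspace():
                i += 1
            if i < n and s[i] == ']':
                return s[start:i + 1]
        # No match at this '['; try the next one.
        start = s.find('[', start + 1)
    return None
```

It uses only plain loops and `str.find`, so it does not depend on the regular expression engine at all.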
|
|||
| msg22869 - (view) | Author: Fredrik Lundh (effbot) |
Date: 2005-02-14 11:35 | |
Logged In: YES user_id=38376 closing, due to lack of feedback. suggested workarounds should solve the problem. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2004-10-26 12:55:58 | rwhent | create | |
