Author vbr
Recipients akitada, akuchling, amaury.forgeotdarc, collinwinter, doerwalter, ezio.melotti, georg.brandl, gregory.p.smith, jaylogan, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, rsc, sjmachin, timehorse, vbr
Date 2009-08-10.19:27:46
Message-id <1249932468.1.0.843595392458.issue2636@psf.upfronthosting.co.za>
Content
I'd like to confirm that the error reported above is fixed in
issue2636-20090810#2.zip.
While testing the new features a bit, I noticed some irregularities in
the handling of Unicode character properties. I tried a random selection
of those listed at http://www.regular-expressions.info/unicode.html,
using a simple findall like the one above.

It seems that only the short, abbreviated forms of the property names
are supported, while the long variants are handled in different ways.
In particular, property names containing whitespace or other non-letter
characters cause what is probably an unintended exception:

>>> regex.findall(ur"\p{Ll}", u"abcDEF")
[u'a', u'b', u'c']
# works ok
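
For comparison, \p{Ll} selects characters whose Unicode general category
is Ll (lowercase letter). A minimal Python 3 sketch using the standard
unicodedata module (not the regex module) reproduces the findall result
above; chars_in_category is a hypothetical helper name.

```python
import unicodedata


def chars_in_category(text, category):
    """Return the characters of text whose Unicode general category equals category."""
    return [ch for ch in text if unicodedata.category(ch) == category]


print(chars_in_category("abcDEF", "Ll"))  # ['a', 'b', 'c']
print(chars_in_category("abcDEF", "Lu"))  # ['D', 'E', 'F']
```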

\p{LowercaseLetter} isn't supported, but this case seems to be handled
deliberately: the traceback ends with "error: undefined property name".

\p{Lowercase Letter}, \p{Lowercase_Letter} and \p{Lowercase-Letter}, on
the other hand, probably don't fail as intended; the traceback is:

>>> regex.findall(ur"\p{Lowercase_Letter}", u"abcDEF")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Python25\lib\regex.py", line 194, in findall
    return _compile(pattern, flags).findall(string)
  File "C:\Python25\lib\regex.py", line 386, in _compile
    parsed = _parse_pattern(source, info)
  File "C:\Python25\lib\regex.py", line 465, in _parse_pattern
    branches = [_parse_sequence(source, info)]
  File "C:\Python25\lib\regex.py", line 477, in _parse_sequence
    item = _parse_item(source, info)
  File "C:\Python25\lib\regex.py", line 485, in _parse_item
    element = _parse_element(source, info)
  File "C:\Python25\lib\regex.py", line 610, in _parse_element
    return _parse_escape(source, info, False)
  File "C:\Python25\lib\regex.py", line 844, in _parse_escape
    return _parse_property(source, ch == "p", here, in_set)
  File "C:\Python25\lib\regex.py", line 983, in _parse_property
    if info.local_flags & IGNORECASE and not in_set:
NameError: global name 'info' is not defined
>>> 
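
The NameError suggests that _parse_property reads a variable info that
its caller never passes in: _parse_escape calls
_parse_property(source, ch == "p", here, in_set) without it. The toy
sketch below reproduces that bug pattern and the usual fix of passing
the parser state explicitly. The names Info, IGNORECASE and
parse_property_* are hypothetical and are not the regex module's code.

```python
IGNORECASE = 2  # hypothetical flag value, chosen only for this sketch


class Info:
    """Hypothetical stand-in for the parser state object seen in the traceback."""

    def __init__(self, local_flags=0):
        self.local_flags = local_flags


def parse_property_broken(name, in_set):
    # Bug pattern: 'info' is neither a parameter nor a module-level name,
    # so this line raises NameError when it runs, not when it is defined.
    ignore_case = bool(info.local_flags & IGNORECASE) and not in_set
    return name, ignore_case


def parse_property_fixed(info, name, in_set):
    # Fix pattern: the caller passes its state in explicitly.
    ignore_case = bool(info.local_flags & IGNORECASE) and not in_set
    return name, ignore_case


try:
    parse_property_broken("Lowercase_Letter", False)
except NameError as exc:
    print("broken:", exc)

print(parse_property_fixed(Info(IGNORECASE), "Ll", False))  # ('Ll', True)
```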

Arbitrary strings other than property names are, of course, handled the
same way.
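
One way a parser could treat the long forms consistently is to
normalize the name before looking it up, in the spirit of the loose
matching rule of Unicode UAX #44 (ignore case, whitespace, underscores
and hyphens), and to raise a clean error for anything unknown. This is
only a sketch: lookup_property, PropertyError and the small alias table
are hypothetical; a real implementation would load the aliases from the
Unicode Character Database.

```python
import re

# Hypothetical, deliberately tiny alias table (normalized key -> short name).
_CATEGORY_ALIASES = {
    "ll": "Ll", "lowercaseletter": "Ll",
    "lu": "Lu", "uppercaseletter": "Lu",
    "nd": "Nd", "decimalnumber": "Nd",
}


class PropertyError(ValueError):
    """Raised for property names that are not recognised."""


def normalize_property_name(name):
    # Drop whitespace, underscores and hyphens, then ignore case.
    return re.sub(r"[\s_-]", "", name).lower()


def lookup_property(name):
    key = normalize_property_name(name)
    try:
        return _CATEGORY_ALIASES[key]
    except KeyError:
        raise PropertyError("undefined property name: %r" % name) from None


for spelling in ["Ll", "LowercaseLetter", "Lowercase Letter",
                 "Lowercase_Letter", "Lowercase-Letter"]:
    print(spelling, "->", lookup_property(spelling))  # each maps to Ll
```

With this approach every spelling tried above resolves to the same
property, and an unknown name fails with a single, predictable error.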

Python 2.6.2 behaves the same as 2.5.4.

vbr
History
Date                 User  Action  Args
2009-08-10 19:27:48  vbr   set     recipients: + vbr, loewis, akuchling, doerwalter, georg.brandl, collinwinter, gregory.p.smith, jimjjewett, sjmachin, amaury.forgeotdarc, pitrou, nneonneo, rsc, timehorse, mark, ezio.melotti, mrabarnett, jaylogan, akitada, moreati
2009-08-10 19:27:48  vbr   set     messageid: <1249932468.1.0.843595392458.issue2636@psf.upfronthosting.co.za>
2009-08-10 19:27:46  vbr   link    issue2636 messages
2009-08-10 19:27:46  vbr   create