Message91917
I'd like to add some detail to the previous msg91473.
The current behaviour of the character properties is sometimes a bit
surprising:
>>>
>>> regex.findall(ur"\p{UppercaseLetter}", u"QW\p{UppercaseLetter}as")
[u'Q', u'W', u'U', u'L']
>>> regex.findall(ur"\p{Uppercase Letter}", u"QW\p{Uppercase Letter}as")
[u'\\p{Uppercase Letter}']
>>> regex.findall(ur"\p{UppercaseÄÄÄLetter}", u"QW\p{UppercaseÄÄÄLetter}as")
[u'\\p{Uppercase\xc4\xc4\xc4Letter}']
>>> regex.findall(ur"\p{UppercaseQQQLetter}", u"QW\p{UppercaseQQQLetter}as")
Traceback (most recent call last):
  File "<pyshell#34>", line 1, in <module>
    regex.findall(ur"\p{UppercaseQQQLetter}", u"QW\p{UppercaseQQQLetter}as")
  ...
  File "C:\Python26\lib\regex.py", line 1178, in _parse_property
    raise error("undefined property name '%s'" % name)
error: undefined property name 'UppercaseQQQLetter'
>>>
That is, potential property names consisting only of ASCII letters
(plus _ and -) are looked up and either used or an error is raised.
Other names (containing whitespace or non-ASCII letters) aren't treated
as a special expression; hence they either match their literal value
or simply don't match (without errors).
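To make the observed rule concrete, here is a minimal sketch in plain Python 3 using only the standard `re` module. It is not the regex module's actual parser; `classify_property`, `KNOWN_PROPERTIES` and the name pattern are hypothetical names chosen for illustration, and the rule itself is only inferred from the session above.

```python
import re

# Hypothetical stand-in for the module's property table (the real one is far larger).
KNOWN_PROPERTIES = {"UppercaseLetter"}

# Assumption inferred from the session above: only names made of
# ASCII letters, "_" and "-" are treated as candidate property names.
_LOOKUP_NAME = re.compile(r"[A-Za-z_\-]+\Z")


def classify_property(name):
    """Return 'property' or 'literal', or raise ValueError for an unknown name."""
    if _LOOKUP_NAME.match(name):
        if name in KNOWN_PROPERTIES:
            return "property"
        # Looks like a property name but is not in the table.
        raise ValueError("undefined property name %r" % name)
    # Contains whitespace or non-ASCII characters: falls back to a literal.
    return "literal"
```

Under these assumptions, `classify_property("Uppercase Letter")` gives `'literal'`, while `classify_property("UppercaseQQQLetter")` raises, which mirrors the asymmetry shown in the session.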
Is this the intended behaviour?
I am not sure whether this is defined somewhere, or whether there are
some de-facto standards for it...
I guess a space in property names might be allowed (unless there are
implications for the parser...); otherwise, falling back to treating
invalid property names as normal strings is probably the expected
behaviour.
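If spaces were to be allowed, one possible approach (purely a sketch, not something the regex module is known to do) would be to normalise the name before the lookup. The helper name `normalize_property_name` is hypothetical, and also dropping `_` and `-` and folding case is an extra assumption beyond the space case discussed here.

```python
import re


def normalize_property_name(name):
    """Drop whitespace, '_' and '-' and fold case (hypothetical helper).

    Spelling variants of the same name then compare equal before lookup.
    """
    return re.sub(r"[\s_\-]+", "", name).lower()
```

With this, "Uppercase Letter", "Uppercase_Letter" and "UppercaseLetter" would all map to the same lookup key, while a name containing other characters would still differ and could fall back to the literal handling.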
vbr

Date | User | Action | Args
2009-08-24 12:55:52 | vbr | set | recipients: + vbr, loewis, akuchling, georg.brandl, collinwinter, gregory.p.smith, jimjjewett, sjmachin, amaury.forgeotdarc, pitrou, nneonneo, rsc, timehorse, mark, ezio.melotti, mrabarnett, jaylogan, akitada, moreati, r.david.murray
2009-08-24 12:55:51 | vbr | set | messageid: <1251118551.67.0.263524283717.issue2636@psf.upfronthosting.co.za>
2009-08-24 12:55:50 | vbr | link | issue2636 messages
2009-08-24 12:55:49 | vbr | create |