
Author vbr
Recipients akitada, akuchling, amaury.forgeotdarc, collinwinter, ezio.melotti, georg.brandl, gregory.p.smith, jaylogan, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, r.david.murray, rsc, sjmachin, timehorse, vbr
Date 2009-08-24.12:55:49
Message-id <1251118551.67.0.263524283717.issue2636@psf.upfronthosting.co.za>
In-reply-to
Content
I'd like to add some detail to the previous message, msg91473.

The current behaviour of the character properties is sometimes a bit
surprising:

>>> 
>>> regex.findall(ur"\p{UppercaseLetter}", u"QW\p{UppercaseLetter}as")
[u'Q', u'W', u'U', u'L']
>>> regex.findall(ur"\p{Uppercase Letter}", u"QW\p{Uppercase Letter}as")
[u'\\p{Uppercase Letter}']
>>> regex.findall(ur"\p{UppercaseÄÄÄLetter}", u"QW\p{UppercaseÄÄÄLetter}as")
[u'\\p{Uppercase\xc4\xc4\xc4Letter}']
>>> regex.findall(ur"\p{UppercaseQQQLetter}", u"QW\p{UppercaseQQQLetter}as")

Traceback (most recent call last):
  File "<pyshell#34>", line 1, in <module>
    regex.findall(ur"\p{UppercaseQQQLetter}", u"QW\p{UppercaseQQQLetter}as")
...
  File "C:\Python26\lib\regex.py", line 1178, in _parse_property
    raise error("undefined property name '%s'" % name)
error: undefined property name 'UppercaseQQQLetter'
>>> 

I.e. potential property names consisting only of ASCII letters
(plus _ and -) are looked up, and either used or an error is raised;
other names (containing whitespace or non-ASCII letters) aren't treated
as a special expression, hence they either match their literal value
or simply don't match (without errors).
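
The rule as I observe it can be sketched roughly as below. This is only
my reading of the behaviour shown above, not the actual code in
regex.py; the names KNOWN_PROPERTIES and classify are made up for the
illustration, and the real property table is of course much larger.

```python
import re

# Hypothetical stand-in for the module's property table (assumption).
KNOWN_PROPERTIES = {"UppercaseLetter", "LowercaseLetter"}

# Only names made up of ASCII letters, "_" and "-" are candidates for lookup.
_CANDIDATE = re.compile(r"\\p\{([A-Za-z_-]+)\}")


def classify(pattern):
    """Say how a single property-style escape seems to be handled.

    Returns "property" for a known name, "error" for an unknown name
    that still looks like a property name, and "literal" otherwise.
    """
    m = _CANDIDATE.fullmatch(pattern)
    if m is None:
        # Whitespace or non-ASCII letters: not parsed as a property at all.
        return "literal"
    if m.group(1) in KNOWN_PROPERTIES:
        return "property"
    return "error"


for p in (r"\p{UppercaseLetter}", r"\p{Uppercase Letter}",
          "\\p{Uppercase\u00c4\u00c4\u00c4Letter}", r"\p{UppercaseQQQLetter}"):
    print(p, "->", classify(p))
```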

Is this the intended behaviour?
I am not sure whether this is defined somewhere, or whether there are
de-facto standards for it...
I guess spaces in property names could be allowed (unless that has
implications for the parser...); otherwise, falling back to treating
invalid property names as normal strings is probably the expected
behaviour.
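
If spaces were to be allowed, one possible approach (purely a sketch on
my side; KNOWN_PROPERTIES and lookup_property are hypothetical names,
not anything in regex.py) would be to drop whitespace from the name
before the lookup:

```python
# Hypothetical property table keyed by the space-free form of each name.
KNOWN_PROPERTIES = {"UppercaseLetter", "LowercaseLetter"}


def lookup_property(name):
    """Drop all whitespace from name, then look it up.

    Returns the normalized name if it is known, otherwise None.
    """
    normalized = "".join(name.split())
    return normalized if normalized in KNOWN_PROPERTIES else None


print(lookup_property("Uppercase Letter"))
print(lookup_property("UppercaseQQQLetter"))
```

Whether an unknown name should then raise an error or fall back to a
literal match is a separate decision that this sketch leaves open.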
vbr
History
Date                 User  Action  Args
2009-08-24 12:55:52  vbr   set     recipients: + vbr, loewis, akuchling, georg.brandl, collinwinter, gregory.p.smith, jimjjewett, sjmachin, amaury.forgeotdarc, pitrou, nneonneo, rsc, timehorse, mark, ezio.melotti, mrabarnett, jaylogan, akitada, moreati, r.david.murray
2009-08-24 12:55:51  vbr   set     messageid: <1251118551.67.0.263524283717.issue2636@psf.upfronthosting.co.za>
2009-08-24 12:55:50  vbr   link    issue2636 messages
2009-08-24 12:55:49  vbr   create