Message 129041
The re docs state the following about the conditional regular expression syntax:
(?(id/name)yes-pattern|no-pattern)
Will try to match with yes-pattern if the group with given id or name exists, and with no-pattern if it doesn’t. no-pattern is optional and can be omitted. For example, (<)?(\w+@\w+(?:\.\w+)+)(?(1)>) is a poor email matching pattern, which will match with '<user@host.com>' as well as 'user@host.com', but not with '<user@host.com'.
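The id/name form can also reference a named group. As a sketch, here is the same documented example with the optional '<' group given a name; the name `open` is an illustrative choice, not something the docs use:

```python
import re

# Same pattern as the doc example, but the conditional refers to the
# optional '<' group by name ('open') instead of by number (1).
pattern = re.compile(r'(?P<open><)?(\w+@\w+(?:\.\w+)+)(?(open)>)')

for s in ['<user@host.com>', 'user@host.com', '<user@host.com']:
    m = pattern.match(s)
    # When the '<' group did not participate, group('open') is None.
    print(repr(s), bool(m), m.group('open') if m else None)
```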
The documented regex is incomplete, as it also matches 'user@host.com>':
>>> import re
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', '<user@host.com>'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', 'user@host.com'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', '<user@host.com'))
False
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', 'user@host.com>'))
True
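The reason is that re.match anchors only at the start of the string, and when group 1 does not participate the no-pattern is empty, so nothing forces the match to reach the end of the input. A minimal sketch inspecting the match object shows the trailing '>' is simply left unconsumed:

```python
import re

# The pattern as currently documented: the no-pattern branch is empty.
DOC_PATTERN = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)'

m = re.match(DOC_PATTERN, 'user@host.com>')
print(m.group(0))  # user@host.com  (the '>' is not part of the match)
print(m.group(1))  # None: the '<' group did not participate
print(m.end(), len('user@host.com>'))  # 13 14
```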
This error has existed since the feature was added in 2.4:
http://docs.python.org/release/2.4.4/lib/re-syntax.html
and is still present in the 3.3 docs:
http://docs.python.org/dev/py3k/library/re.html#regular-expression-syntax
The fix is to add the end anchor '$' as the no-pattern, so that all four cases behave as documented:
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', '<user@host.com>'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', 'user@host.com'))
True
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', '<user@host.com'))
False
>>> bool(re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', 'user@host.com>'))
False
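The four cases can also be checked in one go. This is a sketch; `CASES` and `mismatches` are illustrative names, not part of the re module. It lists the inputs for which re.match disagrees with the behaviour the docs describe, for both the documented and the patched pattern:

```python
import re

OLD = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)'
NEW = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)'  # '$' added as the no-pattern

# Input -> whether the documentation says the pattern should match it.
CASES = {
    '<user@host.com>': True,
    'user@host.com': True,
    '<user@host.com': False,
    'user@host.com>': False,
}

def mismatches(pattern):
    """Return the inputs where re.match disagrees with the documented behaviour."""
    return [s for s, expected in CASES.items()
            if bool(re.match(pattern, s)) != expected]

print('old:', mismatches(OLD))  # old: ['user@host.com>']
print('new:', mismatches(NEW))  # new: []
```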
If this is accepted, I propose the following patch (also attached):
$ svn diff re.rst
Index: re.rst
===================================================================
--- re.rst (revision 88499)
+++ re.rst (working copy)
@@ -297,9 +297,9 @@
``(?(id/name)yes-pattern|no-pattern)``
Will try to match with ``yes-pattern`` if the group with given *id* or *name*
exists, and with ``no-pattern`` if it doesn't. ``no-pattern`` is optional and
- can be omitted. For example, ``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)`` is a poor email
+ can be omitted. For example, ``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)`` is a poor email
matching pattern, which will match with ``'<user@host.com>'`` as well as
- ``'user@host.com'``, but not with ``'<user@host.com'``.
+ ``'user@host.com'``, but not with ``'<user@host.com'`` nor ``'user@host.com>'``.

Date | User | Action | Args
2011-02-22 08:48:20 | wesley.chun | set | recipients: + wesley.chun, docs@python
2011-02-22 08:48:20 | wesley.chun | set | messageid: <1298364500.67.0.815268640848.issue11283@psf.upfronthosting.co.za>
2011-02-22 08:48:19 | wesley.chun | link | issue11283 messages
2011-02-22 08:48:19 | wesley.chun | create |