
Author ezio.melotti
Recipients belopolsky, eric.araujo, ezio.melotti, fdrake, pluskid, python-dev, r.david.murray, v+python
Date 2011-04-05.18:51:55
Message-id <1302029515.95.0.81455137429.issue7311@psf.upfronthosting.co.za>
In-reply-to
Content
With 3.2 the situation is more complicated because there is a strict and a non-strict mode.
The strict mode uses:
attrfind = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?')

and the tolerant mode uses:
attrfind_tolerant = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[^>\s]*))?')

This means that the strict mode doesn't allow valid non-ASCII characters in unquoted attribute values, and that the tolerant mode is a little too permissive.
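A quick way to see the difference is to run both patterns on an unquoted attribute value. The sketch below copies the two regexes quoted above; the sample strings are made up for illustration and do not come from the issue.

```python
import re

# Patterns copied verbatim from the message above.
attrfind = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?')
attrfind_tolerant = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[^>\s]*))?')

# Hypothetical input: an unquoted value containing a non-ASCII character.
non_ascii = ' title=caf\u00e9>'
strict_value = attrfind.match(non_ascii).group(3)
tolerant_value = attrfind_tolerant.match(non_ascii).group(3)
print(repr(strict_value))    # the strict pattern stops before the non-ASCII char
print(repr(tolerant_value))  # the tolerant pattern keeps the whole value

# Hypothetical input: an unquoted value containing a stray quote and '<'.
odd = ' href=a"b<c>'
strict_odd = attrfind.match(odd).group(3)
tolerant_odd = attrfind_tolerant.match(odd).group(3)
print(repr(strict_odd))      # the strict pattern stops at the quote
print(repr(tolerant_odd))    # the tolerant pattern accepts everything up to '>'
```

In the first case the strict pattern silently truncates the value; in the second the tolerant pattern accepts characters that arguably should not appear in an unquoted value.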

The attached patch makes the strict regex more permissive and leaves the tolerant regex unchanged. The differences between the two are now so small that the tolerant version could be removed, except that re.search is used instead of re.match when the tolerant regex is used.
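The re.search versus re.match distinction matters when something unexpected sits in front of the attribute. A minimal sketch, using the tolerant pattern quoted above and a made-up input with a stray quote before the attribute (the input is an assumption, not taken from the parser's test suite):

```python
import re

# Tolerant pattern copied verbatim from the message above.
attrfind_tolerant = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[^>\s]*))?')

# Hypothetical input: junk (a stray double quote) precedes the attribute.
rawdata = '" title=x'

# match() is anchored at the start of the string, so the stray quote
# prevents any match at all.
anchored = attrfind_tolerant.match(rawdata)

# search() scans forward and finds the attribute after the junk.
scanned = attrfind_tolerant.search(rawdata)

print(anchored)                            # None
print(scanned.group(1), scanned.group(3))  # title x
```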
History
Date User Action Args
2011-04-05 18:51:56 ezio.melotti set recipients: + ezio.melotti, fdrake, belopolsky, eric.araujo, v+python, r.david.murray, pluskid, python-dev
2011-04-05 18:51:55 ezio.melotti set messageid: <1302029515.95.0.81455137429.issue7311@psf.upfronthosting.co.za>
2011-04-05 18:51:55 ezio.melotti link issue7311 messages
2011-04-05 18:51:55 ezio.melotti create