
Author vbr
Recipients akitada, akuchling, amaury.forgeotdarc, collinwinter, ezio.melotti, georg.brandl, gregory.p.smith, jaylogan, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, r.david.murray, rsc, sjmachin, timehorse, vbr
Date 2010-02-17.23:43:24
Message-id <1266450207.35.0.249463945425.issue2636@psf.upfronthosting.co.za>
Content
I just tested the fix for unicode tracebacks and found some possibly weird results (not sure how/whether it should be fixed, as these inputs are indeed rather artificial...).
(Win XP SP3 Czech, Python 2.6.4)

Using the cmd console, the output is fine (for the characters it can accept and display):

>>> regex.findall(ur"\p{InBasicLatinĚ}", u"aé")
Traceback (most recent call last):
...
  File "C:\Python26\lib\regex.py", line 1244, in _parse_property
    raise error("undefined property name '%s'" % name)
regex.error: undefined property name 'InBasicLatinĚ'
>>>

(same result for other distorted "property names" containing e.g. ěščřžýáíéúůßäëiöüîô ...)

However, in IDLE the output differs depending on the characters present:

>>> regex.findall(ur"\p{InBasicLatinÉ}", u"ab c")
yields the expected
...
  File "C:\Python26\lib\regex.py", line 1244, in _parse_property
    raise error("undefined property name '%s'" % name)
error: undefined property name 'InBasicLatinÉ'

but

>>> regex.findall(ur"\p{InBasicLatinĚ}", u"ab c")

Traceback (most recent call last):
...
  File "C:\Python26\lib\regex.py", line 1244, in _parse_property
    raise error("undefined property name '%s'" % name)
  File "C:\Python26\lib\regex.py", line 167, in __init__
    message = message.encode(sys.stdout.encoding)
  File "C:\Python26\lib\encodings\cp1250.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xcc' in position 37: character maps to <undefined>
>>> 

This might be surprising, as cp1250 should be able to encode "Ě"; maybe there is some intermediate ASCII step?
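
One possible explanation, offered as an assumption rather than a confirmed diagnosis: in cp1250 "Ě" is the single byte 0xCC, and if those bytes are later decoded as Latin-1 the result is U+00CC, which matches the u'\xcc' in the error above and which cp1250 cannot encode. The Python 3 sketch below only reproduces that round trip; it does not show what IDLE does internally, and the helper name `encodes_in_cp1250` is made up for the illustration.

```python
# Sketch (Python 3): how "Ě" could become u'\xcc' before being re-encoded.
# Assumption: the typed text arrives as cp1250 bytes and is decoded as Latin-1.

def encodes_in_cp1250(text):
    """Return True if text can be encoded in cp1250, False otherwise."""
    try:
        text.encode("cp1250")
    except UnicodeEncodeError:
        return False
    return True

typed = "\u011a"                      # Ě
raw = typed.encode("cp1250")          # a single byte, 0xCC
misdecoded = raw.decode("latin-1")    # U+00CC, no longer Ě

print(encodes_in_cp1250(typed))       # the real character
print(encodes_in_cp1250(misdecoded))  # the mis-decoded character

# É is 0xC9 in both cp1250 and Latin-1, so it survives the same round trip.
survivor = "\u00c9".encode("cp1250").decode("latin-1")

# The mis-decoded character lands at the position reported in the traceback.
message = "undefined property name '%s'" % ("InBasicLatin" + misdecoded)
position = message.index(misdecoded)
```

If this is what happens, it would also account for "É" working while "Ě" fails in the same shell.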

Using the wxPython PyShell, I get an error specific to that shell:

regex.findall(ur"\p{InBasicLatinÉ}", u"ab c")
Traceback (most recent call last):
...
  File "C:\Python26\lib\regex.py", line 1102, in _parse_escape
    return _parse_property(source, info, in_set, ch)
  File "C:\Python26\lib\regex.py", line 1244, in _parse_property
    raise error("undefined property name '%s'" % name)
  File "C:\Python26\lib\regex.py", line 167, in __init__
    message = message.encode(sys.stdout.encoding)
AttributeError: PseudoFileOut instance has no attribute 'encoding'

(the same for \p{InBasicLatinĚ} etc.)
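
The AttributeError above comes from assuming that whatever replaces sys.stdout has an `encoding` attribute. A minimal sketch of a more defensive lookup follows (Python 3 syntax); `stream_encoding` and the stand-in classes are hypothetical names for illustration, not part of the regex module or wxPython.

```python
import sys

def stream_encoding(stream, default="ascii"):
    """Return the stream's encoding, or default if it is missing or None."""
    return getattr(stream, "encoding", None) or default

class PseudoFileOut:
    """Stand-in for a shell's stdout replacement that has no .encoding."""
    def write(self, text):
        pass

class NoneEncodingOut:
    """Stand-in for a stream whose .encoding exists but is None."""
    encoding = None

print(stream_encoding(PseudoFileOut()))
print(stream_encoding(sys.stdout))
```

Falling back to ASCII is only one possible choice of default; it is the safest one if the result is then used for encoding with escaping.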


In Python 3.1 in IDLE, all of these exceptions are displayed correctly, including with other scripts and special characters.

Maybe in Python 2.x something like repr(...) of the unicode error messages could be used to avoid these problems, but I don't know what the conventions are in these cases.
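
As a rough illustration of the escaping idea (written in Python 3, where the `backslashreplace` error handler gives a repr-like result for unencodable characters), the sketch below encodes a message so that it never raises. `escaped_message` is a hypothetical helper, not something the regex module provides.

```python
def escaped_message(message, encoding="ascii"):
    """Return message with characters the codec cannot encode backslash-escaped."""
    data = message.encode(encoding, errors="backslashreplace")
    return data.decode(encoding)

message = "undefined property name '%s'" % "InBasicLatin\u011a"

print(escaped_message(message))            # escapes anything outside ASCII
print(escaped_message(message, "cp1250"))  # leaves Ě alone, cp1250 has it
```

The trade-off is readability: characters the console could not show appear as escapes instead of causing a second exception while the first one is being reported.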


Another issue I found here (unrelated to tracebacks) is that backslashes or punctuation (other than the handled - and _) in property names just lead to failed matches, with no exception about an unknown property name:

regex.findall(u"\p{InBasic.Latin}", u"ab c")
[]
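
For comparison, a parser could reject such names explicitly instead of falling through to a failed match. The sketch below is not the regex module's code: `parse_property_name`, the set of allowed characters, and the error wording are all assumptions made for illustration.

```python
import re

# Assumption: a property name may contain ASCII letters, digits, spaces, "-" and "_".
_VALID_NAME = re.compile(r"[A-Za-z0-9 _-]+")

def parse_property_name(pattern, pos):
    """Parse '{name}' at pos (just after the p escape); return (name, next_pos)."""
    if pattern[pos:pos + 1] != "{":
        raise ValueError("expected '{' at position %d" % pos)
    end = pattern.find("}", pos)
    if end == -1:
        raise ValueError("missing '}' in property")
    name = pattern[pos + 1:end]
    if not _VALID_NAME.fullmatch(name):
        # Raise instead of silently treating the text as something else.
        raise ValueError("bad property name %r" % name)
    return name, end + 1

print(parse_property_name(r"\p{InBasicLatin}", 2))

try:
    parse_property_name(r"\p{InBasic.Latin}", 2)
except ValueError as exc:
    print(exc)
```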


I was also surprised by the added pos/endpos parameters, as I pass flags as a positional third argument to the re functions in my code (probably my fault ...):

re.findall(pattern, string, flags=0)

regex.findall(pattern, string, pos=None, endpos=None, flags=0, overlapped=False)

(Is there a specific reason for this order, or could it be changed to maintain compatibility with the current re module?)
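
The effect of the two parameter orders quoted above can be checked without either module by binding a positional call against stub functions. The stubs below only copy the two signatures; `re_style_findall`, `regex_style_findall` and `bind_positionally` are hypothetical names used for this sketch.

```python
import inspect
import re

def re_style_findall(pattern, string, flags=0):
    """Stub with the re.findall parameter order quoted above."""

def regex_style_findall(pattern, string, pos=None, endpos=None,
                        flags=0, overlapped=False):
    """Stub with the regex.findall parameter order quoted above."""

def bind_positionally(func, *args):
    """Return a dict mapping parameter names to values for a positional call."""
    bound = inspect.signature(func).bind(*args)
    bound.apply_defaults()
    return dict(bound.arguments)

call = ("a", "A a", re.IGNORECASE)  # flags passed as the third positional argument

old = bind_positionally(re_style_findall, *call)
new = bind_positionally(regex_style_findall, *call)

print(old["flags"])              # the flag arrives where it was intended
print(new["pos"], new["flags"])  # the flag ends up in pos; flags keeps its default
```

Passing `flags=...` by keyword binds the same way under both signatures, so it is the portable spelling regardless of which order is kept.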

I hope at least some of these remarks make sense;
  thanks for the continued work on this module!

   vbr