Message57710
The following error is uncatchable:
>>> try: ur'\U0010FFFF'
... except UnicodeDecodeError: pass
...
UnicodeDecodeError: 'rawunicodeescape' codec can't decode byte 0x5c
in position 0: \Uxxxxxxxx out of range
This is in a narrow unicode build:
>>> sys.version_info, hex(sys.maxunicode)
((2, 5, 1, 'final', 0), '0xffff')
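A probable reason the except clause never fires is that the literal is decoded while the statement is being compiled, before the try block starts running. That is an assumption about the cause, not something verified against the 2.5 source. The sketch below is written for Python 3, where ur'' no longer exists and the failure surfaces as a SyntaxError; it uses \U00110000 (out of range on every build) as a stand-in for the narrow-build case, and the helper name literal_compiles is made up for illustration. It shows that the error becomes catchable once compilation itself happens inside the try:

```python
def literal_compiles(source):
    """Return True if `source` compiles as an expression, False otherwise."""
    try:
        # The escape sequences in the literal are decoded here, at compile time.
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True

# Source text of two string literals; the doubled backslash keeps the
# escape intact until compile() sees it.
bad = "'\\U00110000'"   # beyond the Unicode range on any build
good = "'\\U0010FFFF'"  # highest valid code point

print(literal_compiles(bad), literal_compiles(good))
```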
Of course the r in ur'...' is redundant in the test case above, but
there are cases in which it isn't...
>>> ur'\U0010FFFF\test'
u'\U0010ffff\\test'
- from a wide unicode build
>>> ur'\U0010FFFF\test'
UnicodeDecodeError: 'rawunicodeescape' codec can't decode byte 0x5c
in position 0: \Uxxxxxxxx out of range
- from the narrow unicode build
The problem occurs with .decode('raw-unicode-escape') too.
>>> '\U0010FFFF\test'.decode('raw-unicode-escape')
Traceback (most recent call last):
[&c.]
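For comparison, here is a sketch of what the two escape codecs produce when the full code point range is available. It assumes Python 3.3 or later, where sys.maxunicode is always 0x10FFFF and there is no narrow/wide distinction; that version choice is mine, not the report's. The names RAW and decode_both are illustrative only.

```python
import sys

# The ten-plus-five bytes of the text \U0010FFFF\test, backslashes included.
RAW = b"\\U0010FFFF\\test"

def decode_both(data):
    """Decode the same bytes with the raw and the non-raw escape codec."""
    raw = data.decode("raw-unicode-escape")  # only \uXXXX and \UXXXXXXXX are interpreted
    cooked = data.decode("unicode-escape")   # \t and friends are interpreted too
    return raw, cooked

raw, cooked = decode_both(RAW)
print(hex(sys.maxunicode))
print(ascii(raw))     # backslash before "test" is kept
print(ascii(cooked))  # "\t" has become a tab
```

The difference between the two results is the reason the r in ur'...' is not always redundant.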
Most surprisingly of all, however, the problem does not occur with a
non-raw string:
>>> u'\U0010ffff\\test'
u'\U0010ffff\\test'
So there is at least a workaround for all cases, which is why this bug
is marked as Severity: minor. It did take a while to work out that what
manifests with ur might not apply to u, however; one's first thought is
usually that the bug is in one's own code, not in Python.
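A variant of the workaround, for cases where doubling every backslash is tedious (a long regular expression, say), is to keep only the \U escape in a non-raw literal and concatenate the rest as a raw one. This is a suggestion rather than something tested on a narrow 2.5 build; in Python 2 it would be spelled u'\U0010ffff' + ur'\test', and the sketch below uses the Python 3 spelling so it runs on current interpreters.

```python
# Non-raw literal for the \U escape, raw literal for everything else.
combined = "\U0010ffff" + r"\test"

# The form from the report: one non-raw literal with the backslash doubled.
doubled = "\U0010ffff\\test"

print(combined == doubled)
```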

Date | User | Action | Args
2007-11-20 21:17:39 | sbp | set | spambayes_score: 0.0933549 -> 0.09335492; recipients: + sbp
2007-11-20 21:17:39 | sbp | set | spambayes_score: 0.0933549 -> 0.0933549; messageid: <1195593459.02.0.388666867716.issue1477@psf.upfronthosting.co.za>
2007-11-20 21:17:38 | sbp | link | issue1477 messages
2007-11-20 21:17:38 | sbp | create |