
Author sbp
Recipients sbp
Date 2007-11-20.21:17:38
Message-id <1195593459.02.0.388666867716.issue1477@psf.upfronthosting.co.za>
In-reply-to
Content
The following error is uncatchable:

>>> try: ur'\U0010FFFF'
... except UnicodeDecodeError: pass
... 
UnicodeDecodeError: 'rawunicodeescape' codec can't decode byte 0x5c 
in position 0: \Uxxxxxxxx out of range
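
A likely explanation for why the try/except above never gets a chance to run: string literals are decoded when the source is compiled, before any of the surrounding code executes. The sketch below illustrates that general point on Python 3, not on the 2.5.1 build from this report. The escape `\U00110000` is used because it is out of range in every build, and `compile_error` is a hypothetical helper name introduced only for this example.

```python
# Python 3 sketch; the report itself concerns a narrow Python 2.5.1 build.
# The source below wraps a bad literal in try/except, just like the report.
SOURCE = (
    "try:\n"
    "    '\\U00110000'\n"      # escape beyond U+10FFFF, invalid in any build
    "except Exception:\n"
    "    pass\n"
)


def compile_error(source):
    """Return the name of the exception raised while compiling, or None."""
    try:
        compile(source, "<example>", "exec")
    except Exception as exc:
        # The failure happens here, at compile time, so the try/except
        # inside SOURCE is never executed and cannot catch it.
        return type(exc).__name__
    return None


print(compile_error(SOURCE))      # the bad literal fails during compilation
print(compile_error("x = 1\n"))   # valid source compiles cleanly -> None
```

Moving the compilation step to run time (via `compile` or `eval`) is what makes the error catchable in this sketch.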

This is in a narrow unicode build:

>>> import sys
>>> sys.version_info, hex(sys.maxunicode)
((2, 5, 1, 'final', 0), '0xffff')
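
For context, a narrow build can still hold U+10FFFF; it stores such code points as a UTF-16 surrogate pair. The arithmetic is sketched below, assuming the standard UTF-16 encoding rules. `to_surrogate_pair` is a hypothetical helper written for illustration, not part of the standard library.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in the supplementary planes")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x10FFFF)
print(hex(high), hex(low))  # 0xdbff 0xdfff
```

So sys.maxunicode being 0xffff does not by itself rule out \U0010FFFF, which is consistent with the non-raw literal working further down.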

Of course the r in ur'...' is redundant in the test case above, but
there are cases in which it isn't...

>>> ur'\U0010FFFF\test'
u'\U0010ffff\\test'
- from a wide unicode build

>>> ur'\U0010FFFF\test'
UnicodeDecodeError: 'rawunicodeescape' codec can't decode byte 0x5c 
in position 0: \Uxxxxxxxx out of range
- from the narrow unicode build

The problem occurs with .decode('raw-unicode-escape') too.

>>> r'\U0010FFFF\test'.decode('raw-unicode-escape')
Traceback (most recent call last):
[&c.]
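
Unlike the literal case, the codec call happens at run time, so it can be wrapped in try/except. A minimal sketch on Python 3 follows, where bytes must be decoded explicitly and every build covers the full range. `decode_raw` and `try_decode` are hypothetical names used only for this example; the behaviour on the narrow 2.5.1 build from this report is not reproduced here.

```python
def decode_raw(data):
    """Decode bytes with the raw-unicode-escape codec."""
    return data.decode("raw-unicode-escape")


def try_decode(data):
    """Return the decoded text, or None if the codec rejects the input."""
    try:
        return decode_raw(data)
    except UnicodeDecodeError:
        return None


# Only \uXXXX and \UXXXXXXXX are interpreted; the backslash in \test
# is kept as a literal backslash.
text = decode_raw(rb"\U0010FFFF\test")
print(text == "\U0010ffff\\test")   # True

# An escape beyond U+10FFFF is rejected, and the error is catchable.
print(try_decode(rb"\U00110000"))   # None
```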

Most surprisingly of all, however, this problem doesn't occur when you
don't use a raw string:

>>> u'\U0010ffff\\test'
u'\U0010ffff\\test'

So there is at least a workaround for all cases, which is why this bug
is marked as Severity: minor. It did take a while to work out that what
manifests with ur mightn't apply to u, however; one's first thought is
usually that the bug is in one's own code, not in Python.