classification
Title: unicode_escape encoding fails for '\\Upsilon'
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 2.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Edward.K..Ream, edreamleo, ezio.melotti, r.david.murray
Priority: normal
Keywords:

Created on 2013-04-26 13:44 by Edward.K..Ream, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg187852 - Author: Edward K. Ream (Edward.K..Ream) Date: 2013-04-26 13:44
On both Windows and Linux, the following fails on Python 2.7:

   s = '\\Upsilon'
   unicode(s,"unicode_escape")

UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-7: end of string in escape sequence

BTW, the six.py package uses this call.  If this call doesn't work, six is broken.
msg187853 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-04-26 13:47
This is not a bug: \U must be followed by exactly 8 hex digits, which together denote a Unicode code point:
>>> '\\u0065'.decode('unicode_escape')
u'e'
>>> '\\U00000065'.decode('unicode_escape')
u'e'
>>> '\\Upsilon'.decode('unicode_escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-7: end of string in escape sequence
>>> u'\Upsilon'
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-7: end of string in escape sequence
>>> u'\U00000065'
u'e'
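The digit-count rule above can also be checked on Python 3, where the codec is applied to a bytes object rather than a 2.7 str. The following is a minimal sketch under that assumption; `try_decode` is a hypothetical helper introduced only for illustration and is not part of the standard library or of six.

```python
def try_decode(escaped: bytes):
    """Hypothetical helper: decode with unicode_escape, or return None on failure."""
    try:
        return escaped.decode('unicode_escape')
    except UnicodeDecodeError:
        # Raised when an escape such as \U is not followed by enough hex digits.
        return None


# \u takes exactly 4 hex digits, \U takes exactly 8.
short_form = try_decode(b'\\u0065')
long_form = try_decode(b'\\U00000065')

# Neither of these supplies 8 hex digits after \U, so both are rejected.
word_after_U = try_decode(b'\\Upsilon')
too_few_digits = try_decode(b'\\U0065')
```

Returning None instead of re-raising keeps the sketch short; real code would probably want to let the UnicodeDecodeError propagate.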
msg187854 - Author: Edward K. Ream (Edward.K..Ream) Date: 2013-04-26 13:51
Thanks for your quick reply.

If this is not a bug, why does six define six.u as unicode(s,"unicode_escape") for *all* u constants??
msg187855 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-04-26 13:54
Because as Ezio demonstrated, it produces the same result as using the 'u' prefix on the same string.
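The equivalence described here can be sketched on Python 3 by comparing the two routes directly: parsing the text of a string literal with `ast.literal_eval`, and decoding the same characters with the `unicode_escape` codec. This is an illustration only; the variable names are assumptions, and it does not reproduce how six itself is implemented.

```python
import ast

# Source text of a string literal, exactly as it would appear in a .py file.
literal_source = r"'\u0065 \U00000065 \n'"

# Route 1: let the compiler parse the literal.
from_literal = ast.literal_eval(literal_source)

# Route 2: strip the quotes and run the body through the codec.
from_codec = literal_source[1:-1].encode('ascii').decode('unicode_escape')

# An incomplete \U escape is rejected by the compiler as well.
try:
    ast.literal_eval(r"'\Upsilon'")
    literal_rejected = False
except SyntaxError:
    literal_rejected = True
```

Both routes accept the same escapes and reject the same malformed ones, which is why a codec-based helper can stand in for the literal prefix.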
msg187858 - Author: Edward K Ream (edreamleo) * Date: 2013-04-26 14:26
On Fri, Apr 26, 2013 at 8:51 AM, Edward K. Ream <report@bugs.python.org> wrote:

>
> If this is not a bug, why does six define six.u as
> unicode(s,"unicode_escape") for *all* u constants??
>

Oops.  The following works::

    s = r'\\Upsilon'
    unicode(s,"unicode_escape")

My apologies for the noise.

Edward
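The working form above relies on the decoder reading a doubled backslash as one literal backslash, so the U that follows is no longer treated as the start of an escape. A minimal Python 3 sketch of that idea follows; `escape_backslashes` is a hypothetical helper, not a six or standard library function.

```python
def escape_backslashes(text: str) -> bytes:
    """Hypothetical helper: double each backslash so unicode_escape keeps it literal."""
    return text.replace('\\', '\\\\').encode('ascii')


original = '\\Upsilon'                    # one backslash followed by "Upsilon"
encoded = escape_backslashes(original)    # two backslashes followed by "Upsilon"
round_tripped = encoded.decode('unicode_escape')

# The raw-literal spelling from the message, written as Python 3 bytes.
raw_form = rb'\\Upsilon'.decode('unicode_escape')
```

Doubling every backslash also disables escapes that were intended, such as \u0065, so this only suits text where backslashes are meant literally.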
History
Date                 User            Action  Args
2022-04-11 14:57:44  admin           set     github: 62050
2013-04-26 14:26:16  edreamleo       set     nosy: + edreamleo; messages: + msg187858
2013-04-26 13:54:43  r.david.murray  set     nosy: + r.david.murray; messages: + msg187855
2013-04-26 13:51:11  Edward.K..Ream  set     messages: + msg187854
2013-04-26 13:47:57  ezio.melotti    set     status: open -> closed; type: crash -> behavior; nosy: + ezio.melotti; messages: + msg187853; resolution: not a bug; stage: resolved
2013-04-26 13:44:53  Edward.K..Ream  create