
Author ezio.melotti
Recipients Adam.Matan, ezio.melotti
Date 2011-04-02.20:34:00
Message-id <1301776441.04.0.155816606179.issue11741@psf.upfronthosting.co.za>
Content
Mixing byte and unicode strings should always be avoided, because the implicit coercion to unicode works only if the byte string contains only ASCII characters, and fails otherwise.
Several modules -- including shutil, glob, and os.path -- have APIs that work with either byte or unicode strings, but fail when you mix the two:
>>> os.path.join('א', 'א')  # both byte strings -- works
'\xd7\x90/\xd7\x90'
>>> os.path.join(u'א', u'א')  # both unicode -- works
u'\u05d0/\u05d0'
>>> os.path.join('a', u'א')  # mixed, ASCII-only byte string -- works
u'a/\u05d0'

>>> os.path.join(u'א', 'א')  # mixed, non-ASCII -- fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/posixpath.py", line 70, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 1: ordinal not in range(128)
>>> os.path.join('א', u'א')  # mixed, non-ASCII -- fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/posixpath.py", line 70, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 0: ordinal not in range(128)
>>>
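
One way to avoid the mixed case is to decode every byte string explicitly before joining, so that only unicode strings reach os.path.join. The following is a minimal sketch, not part of the original report: the helpers to_text and join_text are hypothetical names, and UTF-8 is an assumed encoding (it matches the '\xd7\x90' bytes shown above, but the right encoding depends on where the byte strings come from). It uses posixpath.join so the separator is always '/'.

```python
# -*- coding: utf-8 -*-
import posixpath


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as a unicode string.

    Byte strings are decoded with `encoding` (UTF-8 is an assumption);
    unicode strings are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def join_text(*parts):
    """Hypothetical helper: join path parts after decoding any byte strings."""
    return posixpath.join(*[to_text(part) for part in parts])


# Mixed input that fails with a plain join: non-ASCII bytes plus unicode.
mixed = join_text(b"\xd7\x90", u"\u05d0")
print(repr(mixed))
```

Decoding at the boundary, where the byte strings enter the program, keeps the rest of the code working with a single string type, so the implicit ASCII coercion never takes place.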