classification
Title: ntpath.join() error with Chinese character Path
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: StupidHod, ezio.melotti, vstinner, zach.ware
Priority: normal Keywords:

Created on 2014-07-21 00:49 by StupidHod, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg223552 - (view) Author: StupidHod (StupidHod) Date: 2014-07-21 00:49
When ntpath.join() is called with a path that contains Chinese characters, a UnicodeDecodeError is raised.
Details:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb2 in position 0: ordinal not in range(128)

According to the interpreter, it happened at
result_path = result_path + p_path (line 84).

When I changed this expression to "result_path = str(result_path) + str(p_path)", it worked.
msg223553 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2014-07-21 02:08
What type are your arguments, str, unicode, or a mix?  I can reproduce your issue using a unicode and a str containing a non-ASCII character, while any other combination "works":

>>> import os
>>> os.path.join('test', 'test\x85')
'test\\test\x85'
>>> os.path.join('test', u'test\x85')
u'test\\test\x85'
>>> os.path.join(u'test', 'test\x85')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\ntpath.py", line 84, in join
    result_path = result_path + p_path
UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 4: ordinal not in range(128)
>>> os.path.join(u'test', u'test\x85')
u'test\\test\x85'

The fact that any mixed-type combination works is sheer accident.  This is just a side effect of Python 2's 'bolted-on' approach to Unicode, and the fix is to upgrade to Python 3.  If you have to stay with Python 2, you can try to fix your code by making sure you decode all input to unicode as soon as you get it, and only encode to str when you have to (which is basically what you need to do in Python 3, but Python won't give you helpful exceptions at the source of the problem in 2.x).

I don't believe there's anything that should be changed in ntpath.join.
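
To make the failure mode concrete, here is a minimal sketch of what happens on the failing line. When Python 2 concatenates a unicode and a str, it decodes the str implicitly with the ASCII codec. The code below performs that decode explicitly, so it also runs on Python 3 and produces the same message as the traceback above. The second half applies the decode-early advice; the codec name 'cp1252' is only an assumption for illustration, so substitute whatever encoding your bytes really use.

```python
import ntpath

raw = b'test\x85'  # same bytes as the str argument in the session above

# Python 2 performs this ASCII decode implicitly when a str meets a unicode.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)

# Decoding explicitly, with the encoding the bytes were written in,
# avoids the implicit step. 'cp1252' is an assumed encoding.
text = raw.decode('cp1252')
joined = ntpath.join(u'test', text)
print(repr(joined))
```

Once both arguments are text, the join no longer depends on the default codec.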
msg223554 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2014-07-21 02:15
Agreed.  Make sure that the arguments you are passing to ntpath.join() have the same type (i.e. either both unicode, or both string).
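
One way to keep the argument types consistent is to normalize every path component before joining. The following is a sketch only: to_text and join_text are hypothetical helper names, not part of ntpath, and the 'gbk' encoding and the directory name are assumptions chosen for illustration.

```python
import ntpath


def to_text(value, encoding):
    """Return value as text, decoding byte strings with the given encoding."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias of str
        return value.decode(encoding)
    return value


def join_text(encoding, *parts):
    """Join path components after converting each of them to text."""
    return ntpath.join(*[to_text(part, encoding) for part in parts])


folder = u'\u6d4b\u8bd5'         # hypothetical Chinese directory name
raw_name = folder.encode('gbk')  # byte string, e.g. as read from a GBK source

# Mixed input (text plus bytes) is normalized to text before the join.
joined = join_text('gbk', u'C:\\data', raw_name)
print(repr(joined))
```

The same helper works when every component is already text, since to_text returns those values unchanged.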
msg223555 - (view) Author: StupidHod (StupidHod) Date: 2014-07-21 02:36
You are correct: the result path is unicode and the path is str. Thanks for your comments.

History
Date                 User          Action  Args
2022-04-11 14:58:06  admin         set     github: 66218
2014-07-21 02:36:12  StupidHod     set     messages: + msg223555
2014-07-21 02:15:42  ezio.melotti  set     status: open -> closed; nosy: + ezio.melotti; messages: + msg223554; resolution: not a bug; stage: resolved
2014-07-21 02:08:19  zach.ware     set     nosy: + zach.ware; messages: + msg223553
2014-07-21 01:55:18  pitrou        set     nosy: + vstinner
2014-07-21 00:49:15  StupidHod     create