classification
Title: os.path.relpath returns inconsistent types
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 2.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Matt.Bachmann, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2014-04-24 06:40 by Matt.Bachmann, last changed 2014-05-28 15:57 by serhiy.storchaka. This issue is now closed.

Files
File name      Uploaded                          Description
reldir.patch   Matt.Bachmann, 2014-04-24 06:40   Patch to described issue
Messages (9)
msg217119 - Author: Matt Bachmann (Matt.Bachmann) Date: 2014-04-24 06:40
I noticed an issue when passing unicode to os.path.relpath.

Specifically, when I pass in unicode I sometimes get back unicode and sometimes get back a str. Below I demonstrate the issue. I have also attached a patch.

Is this an issue, or am I misunderstanding something? Is the patch reasonable? I am totally willing to improve it, and I'll admit I cannot test the ntpath version.

Python 2.7.6 (default, Apr  9 2014, 11:48:52)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.relpath(u'.', u'.')
'.'
>>> os.path.relpath(u'.', u'../')
u'bachmann'
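
For reference, a simplified model of the logic that seems to produce this. It is written from memory of posixpath.relpath in 2.7, not copied from it, and the names relpath_sketch, CURDIR, PARDIR and SEP are made up for illustration. It only handles already-absolute POSIX paths. What it shows: when the two paths are equal, the result is a module-level constant (a str literal on Python 2), while every other result is built from the input components and so keeps their type.

```python
CURDIR = "."   # module-level constant; a byte-string (str) literal on Python 2
PARDIR = ".."
SEP = "/"


def relpath_sketch(abs_path, abs_start):
    """Simplified model of relpath for absolute POSIX paths (illustration only)."""
    start_list = [x for x in abs_start.split(SEP) if x]
    path_list = [x for x in abs_path.split(SEP) if x]

    # Count the leading components shared by both paths.
    shared = 0
    for a, b in zip(start_list, path_list):
        if a != b:
            break
        shared += 1

    rel_list = [PARDIR] * (len(start_list) - shared) + path_list[shared:]
    if not rel_list:
        # Same directory: the constant is returned, so the result type
        # does not depend on the type of the arguments.
        return CURDIR
    # Otherwise the result is assembled from the input components; on
    # Python 2, joining unicode components yields unicode.
    return SEP.join(rel_list)
```

Under this model, the first call in the session above takes the CURDIR branch and the second takes the join branch.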
msg218516 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-05-14 10:47
I think this is not a bug and shouldn't be fixed in 2.7.
msg218528 - Author: Matt Bachmann (Matt.Bachmann) Date: 2014-05-14 13:28
Can you help me understand why not?

If I give it two unicode strings, it sometimes gives me back unicode and sometimes gives me back a str.

In Python 3 this does what I expect.

In Python 2.7 I now have to check the type of the result, because I cannot be sure which type I will get back.
msg219074 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-05-25 07:58
Why should you check the type? There is no difference between '.' and u'.'.
msg219092 - Author: Matt Bachmann (Matt.Bachmann) Date: 2014-05-25 14:55
There is a difference! '.' is a byte string and u'.' is a unicode one!

I found this problem because I work on a project that supports both Python 2 and Python 3.

In Python 3, when I pass in unicode I get back unicode. In Python 2.7, when I pass in unicode I can get back a byte string. We need to ensure that all data in the system is unicode.

Under 2.7 I get unicode sometimes and bytes other times, so I need to do this ugly check:

    root_rel_path = os.path.relpath(self._cwd, self._root)
    if isinstance(root_rel_path, six.binary_type):
        root_rel_path = root_rel_path.decode()

in order to ensure that my string is once again of the correct type.
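
The same check can be wrapped once instead of being repeated at each call site. A minimal sketch that does not need six; relpath_text and its encoding parameter are hypothetical names, and falling back to UTF-8 when the filesystem encoding is unknown is an assumption:

```python
import os
import sys


def relpath_text(path, start=None, encoding=None):
    """Call os.path.relpath and always hand back a text string."""
    if encoding is None:
        # Assumption: the filesystem encoding is the right codec for paths.
        encoding = sys.getfilesystemencoding() or "utf-8"
    if start is None:
        result = os.path.relpath(path)
    else:
        result = os.path.relpath(path, start)
    # `bytes` is an alias of `str` on Python 2.7, so this catches the
    # byte-string '.' there and genuine bytes results on Python 3.
    if isinstance(result, bytes):
        result = result.decode(encoding)
    return result
```

With such a helper the call above would read root_rel_path = relpath_text(self._cwd, self._root).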
msg219094 - Author: Matt Bachmann (Matt.Bachmann) Date: 2014-05-25 15:21
Perhaps this is the bug I should be filing, but here is why this comes up for me.

I get different output from this function if I pass in two different types.

On my machine:
os.path.relpath(u'test_srcl.txt', u'.') returns u'test_src.txt'
os.path.relpath(u'test_srcl.txt', '.') returns u'../../Users/bachmann/Code/diff-cover/diff_cover/tests/fixtures/test_src.txt'

I make a couple of calls to this function. If the first call gives me back a byte string and I pass it to the second call, I get the incorrect result, so I need to decode.

If the function always gave back the same type as I gave it I would not have this issue.
msg219097 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-05-25 16:27
In Python 2 str is coerced to unicode, so most functions should return the same (or a compatible) result for str and unicode arguments if they contain only 7-bit ASCII characters. Of course there are several obvious exceptions, such as type() or repr(). And presumably there are several bugs.
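
A small illustration of that coercion, written to run on both 2.7 and 3. On Python 3 both literals already have the same type, so the comparisons hold trivially there:

```python
from __future__ import print_function

native = '.'   # byte string (str) on Python 2, text on Python 3
text = u'.'    # unicode on Python 2, plain str on Python 3

# ASCII-only values compare equal and hash alike, so they are
# interchangeable in comparisons and as dict keys.
print(native == text)
print({native: 'found'}[text])

# type() is one of the exceptions: on 2.7 the two type names differ.
print(type(native).__name__, type(text).__name__)
```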

Apparently the actual bug in your case is that os.path.relpath(u'test_srcl.txt', u'.') and os.path.relpath(u'test_srcl.txt', '.') return totally different results.

What are os.getcwd(), os.getcwdu(), ntpath.abspath(ntpath.normpath(p)) for p in [u'test_srcl.txt', 'test_srcl.txt', u'.', '.'] in your case?
msg219126 - Author: Matt Bachmann (Matt.Bachmann) Date: 2014-05-26 04:55
Looking into the project I'm working on, I discovered why relpath was acting strangely.

It is because the project mocks os.getcwd but not os.getcwdu. Your request helped me track that down :-)
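
One way to avoid that kind of mismatch in tests is to patch every current-directory getter the interpreter has, not just one. A rough sketch, assuming an ASCII path given as a native string; fake_cwd is a hypothetical helper name, and on 2.7 the import falls back to the third-party mock package:

```python
import contextlib
import os

try:
    from unittest import mock  # standard library on Python 3.3+
except ImportError:
    import mock  # assumption: the third-party backport is installed on 2.7


@contextlib.contextmanager
def fake_cwd(path):
    """Patch all cwd getters that exist so str and unicode paths agree."""
    patches = [mock.patch("os.getcwd", return_value=path)]
    if hasattr(os, "getcwdu"):  # Python 2 only: unicode variant
        patches.append(
            mock.patch("os.getcwdu", return_value=path.decode("ascii")))
    if hasattr(os, "getcwdb"):  # Python 3 only: bytes variant
        patches.append(
            mock.patch("os.getcwdb", return_value=path.encode("ascii")))
    for patcher in patches:
        patcher.start()
    try:
        yield
    finally:
        # Undo every patch, even if the body raised.
        for patcher in patches:
            patcher.stop()
```

Inside a with fake_cwd("/fake/cwd"): block, each getter reports the same directory, so relative paths of either string type should resolve against the same base.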

So that is not an issue. However, the issue described in the original ticket definitely happens in a clean python shell.

I still think it is bad that the method sometimes returns str and sometimes returns unicode, but I see your point that ultimately the byte strings that do come out of here coerce into unicode cleanly.

Thanks for working through this with me.
msg219284 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-05-28 15:57
OK. I say the original issue is not a bug.
History
Date                 User              Action  Args
2014-05-28 15:57:53  serhiy.storchaka  set     status: open -> closed; resolution: not a bug; messages: + msg219284; stage: resolved
2014-05-26 04:55:06  Matt.Bachmann     set     messages: + msg219126
2014-05-25 16:27:46  serhiy.storchaka  set     messages: + msg219097
2014-05-25 15:21:42  Matt.Bachmann     set     messages: + msg219094
2014-05-25 14:55:53  Matt.Bachmann     set     messages: + msg219092
2014-05-25 07:58:36  serhiy.storchaka  set     messages: + msg219074
2014-05-14 13:28:06  Matt.Bachmann     set     messages: + msg218528
2014-05-14 10:47:05  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg218516
2014-04-24 06:40:22  Matt.Bachmann     create