
Author rbcollins
Recipients Jarek.Śmiejczak, Lin.Wei, docs@python, jhonglei, ncoghlan, rbcollins, serhiy.storchaka, steve.dower, vinay.sajip, vstinner
Date 2016-09-12.23:28:55
Message-id <1473722936.47.0.0654204582231.issue20140@psf.upfronthosting.co.za>
Content
Given two (or more) parameters where one is unicode and one is not, upcasting will occur multiple times in path.join on Windows:
 - '\\' is a str and will upcast safely under any codec
 - the other str (or bytes) parameter will be upcast using sys.getdefaultencoding(), which is usually ASCII on Windows

This will then fail when the str parameter is not valid ASCII.
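Python 3 no longer coerces implicitly, so the failure can't be reproduced there as-is. A minimal sketch, assuming Python 2's rule that `unicode + str` decodes the `str` operand with `sys.getdefaultencoding()` (ASCII unless site customisation changed it); `py2_concat` is a hypothetical helper that performs that decode explicitly, with Python 3 `bytes` standing in for a Python 2 `str`:

```python
DEFAULT_ENCODING = "ascii"  # assumption: Python 2's stock default encoding


def py2_concat(left, right):
    """Hypothetical helper emulating Python 2 unicode/str concatenation.

    Python 3 bytes stand in for a Python 2 str. If either operand is text,
    any bytes operand is first decoded with the default encoding, which is
    the implicit upcast path.join relies on.
    """
    if isinstance(left, str) or isinstance(right, str):
        if isinstance(left, bytes):
            left = left.decode(DEFAULT_ENCODING)
        if isinstance(right, bytes):
            right = right.decode(DEFAULT_ENCODING)
    return left + right


# The ASCII separator upcasts safely...
print(py2_concat(u"C:\\Users", b"\\"))

# ...but a byte string containing non-ASCII bytes does not.
try:
    py2_concat(u"C:\\Users\\", b"\xe1\xbd\x84D")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
```

Only the mixed case triggers a decode; two byte strings concatenate without touching any codec, just as two Python 2 `str` values would.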

From this we can conclude that this is a failure to use path.join correctly: if all the parameters passed in were unicode, no error would occur, as only '\\' would be coerced to unicode.
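For comparison, a short sketch using Python 3's `ntpath` (the Windows implementation of `os.path`, importable on any platform): all-text arguments join cleanly, and mixing text with bytes is rejected with `TypeError` rather than decoded with a guessed encoding. The paths are illustrative; `'\xe1\xbd\x84'` is the UTF-8 encoding of U+1F44, so the text form is `u'\u1f44D'`:

```python
import ntpath

# All-text arguments: only the ASCII separator is inserted, nothing is decoded.
joined = ntpath.join(u"C:\\Users\\fred", u"\u1f44D")
print(ascii(joined))

# Mixing text and bytes is refused outright instead of being implicitly decoded.
try:
    ntpath.join(u"C:\\Users\\fred", b"\xe1\xbd\x84D")
except TypeError as exc:
    print("TypeError:", exc)
```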

The interesting question is why there was a str parameter that wasn't valid ASCII, and the answer lies with path.expanduser(), which returns a str for a non-ASCII home directory.

Changing that to return unicode, rather than a str with no specified encoding, when HOME, HOMEPATH, etc. contain non-ASCII characters is a change that would worry me: specifically, we'd encounter code that assumes the result is always a str, e.g. by calling path.join(expanduser('~fred'), '\xe1\xbd\x84D'), which would then blow up.

Worth noting too is that 

 expanduser(u'~user/\u14ffd')

will also blow up in the same way in the same situation, as it ends up decoding the user's home path when it concatenates userhome and path[i:].
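A rough sketch of that code path, assuming the 2.7 `ntpath.expanduser` shape: find the first separator at index `i`, replace the last component of the home directory for `~user`, then return `userhome + path[i:]`. `expand_with_home` is a hypothetical helper, not the stdlib function; it takes the home directory as bytes (a Python 2 `str`) and makes the implicit ASCII decode explicit. The home paths below are illustrative:

```python
import ntpath


def expand_with_home(path, home_bytes, encoding="ascii"):
    """Hypothetical stand-in for 2.7 expanduser on a unicode path.

    home_bytes plays the role of the str read from HOME/USERPROFILE/HOMEPATH.
    Python 2 would decode it implicitly with the default encoding when it
    meets unicode; here that decode is written out.
    """
    if not path.startswith(u"~"):
        return path
    # Locate the end of the '~' or '~user' prefix: the first separator.
    i, n = 1, len(path)
    while i < n and path[i] not in u"/\\":
        i += 1
    userhome = home_bytes.decode(encoding)  # the implicit upcast in Python 2
    if i != 1:  # '~user': swap the last component of the home dir for 'user'
        userhome = ntpath.join(ntpath.dirname(userhome), path[1:i])
    return userhome + path[i:]


# ASCII home directory: works.
print(ascii(expand_with_home(u"~user/\u14ffd", b"C:\\Users\\me")))

# Non-ASCII home directory (latin-1 bytes here): the decode fails.
try:
    expand_with_home(u"~user/\u14ffd", b"C:\\Users\\Jos\xe9")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
```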

So, what to do:
 - It might be worth testing a patch that changes expanduser to decode the environment variables. I'm not sure whether we'd want sys.getfilesystemencoding() or sys.getdefaultencoding() for handling these environment variables; Steve Dower probably knows :).
 - Or we say 'sorry, too hard in 2.7' and move on: join *itself* is fine here, given the limits of 2.7.
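A minimal sketch of the first option, assuming environment values arrive as bytes (as a Python 2 `os.environ` would supply them) and leaving the codec as a parameter, since the choice between the filesystem encoding and the default encoding is still open. `home_from_environ` is a hypothetical helper, not a stdlib function, and the lookup order (HOME, USERPROFILE, then HOMEDRIVE plus HOMEPATH) mirrors the variables named above:

```python
import ntpath
import sys


def home_from_environ(environ, encoding=None):
    """Hypothetical helper: return the home directory as text, or None.

    environ maps str keys to bytes values (standing in for a Python 2
    os.environ). Each value is decoded explicitly, so later joins with
    unicode never need an implicit ASCII upcast.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()  # one of the two candidates

    def get(name):
        value = environ.get(name)
        return None if value is None else value.decode(encoding)

    for name in ("HOME", "USERPROFILE"):
        value = get(name)
        if value is not None:
            return value
    homepath = get("HOMEPATH")
    if homepath is None:
        return None
    return ntpath.join(get("HOMEDRIVE") or u"", homepath)


# latin-1 stands in for a Windows ANSI code page in this illustration.
env = {"HOMEDRIVE": b"C:", "HOMEPATH": b"\\Users\\Jos\xe9"}
home = home_from_environ(env, encoding="latin-1")
print(ascii(ntpath.join(home, u"\u1f44D")))
```

Because the home directory is already text, joining it with another unicode component involves no codec at all; the open question is only which encoding to pass in.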