classification
Title: Cryptic traceback from os.path.join when mixing str & bytes
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, eric.araujo, georg.brandl, hynek, ncoghlan, pitrou, python-dev, r.david.murray, terry.reedy
Priority: normal Keywords: needs review, patch

Created on 2012-06-25 11:45 by ncoghlan, last changed 2012-07-15 15:04 by hynek. This issue is now closed.

Files
File name Uploaded Description Edit
nicer-error-on-str-bytes-mix.diff hynek, 2012-06-25 13:35 review
Messages (14)
msg163945 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-06-25 11:45
As seen in #4489, the traceback when mixing str and bytes in os.path.join is rather cryptic and hard to decipher if you've never encountered it before:

>>> import os.path
>>> os.path.join(b'', '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/posixpath.py", line 78, in join
    if b.startswith(sep):
TypeError: startswith first arg must be str or a tuple of str, not bytes

>>> os.path.join('', b'')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/posixpath.py", line 78, in join
    if b.startswith(sep):
TypeError: startswith first arg must be bytes or a tuple of bytes, not str

While it's slightly less cryptic with a real source file (since you can at least see the os.path.join call), you have to have some how idea of how os.path.join works to realise that:
- type(sep) == type(args[0])
- b in args[1:]

The challenge is to generate a more user friendly error message without making the normal case of correct types any slower.
msg163948 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-06-25 11:47
Catch TypeError, check whether bytes & str are being mixed, re-raise if not, say something user-friendly if yes?
msg163953 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-06-25 11:57
Yeah, that should do it - I just hadn't looked at the structure of the code to see how annoying it will be to separate the check out from the if statement.

Good use case for PEP 409, too :)
msg163967 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-06-25 13:35
Here's a fix. It would be nice if it could make for 3.3 final I guess. Tests pass for Linux and OS X.
msg163982 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-25 15:31
Just a nit, you can use double quotes instead of escaping apostrophes ;)
msg164442 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-07-01 01:12
This strikes me as a bugfix that does not get backported because code might depend on the bug. If the policy for exception messages, such as it is, documented somewhere?
msg164538 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-07-02 19:51
> This strikes me as a bugfix that does not get backported because code might depend on
> the bug. If the policy for exception messages, such as it is, documented somewhere?

I don’t know if it’s written anywhere, but the rule I follow is that even though exact error messages are not specified by the Python language, they should not be changed in bugfix releases (I seem to remember bugs for argparse and tarfile where this was invoked).  The commit policy seems in flux these days however, with code cleanups going in stable versions (urllib and collections recently).
msg164540 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-02 19:53
> If the policy for exception messages, such as it is, documented
> somewhere?

I think we are quite free to change exception messages, as long as the change is motivated. They are not part of the API definitions, and people shouldn't rely on them (the exception may for KeyError and such exceptions where the message is actually the value of the failed key).
msg164541 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-07-02 20:13
Patch looks good.  Left some comments on Rietveld and +1 to Antoine’s note about quotes :)
msg164562 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-07-02 23:02
This is definitely a motivated change ;-). I would like this re-typed as behavior and applied to 2.7, 3.2, and 3.3. I see the current message as being much like a garbled sentence in the docs that people can only understand if they already understand the background.
msg164597 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-07-03 12:32
Cool, I'll finish it up in the sprints on Saturday.
msg165297 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-07-12 11:25
I didn’t, because I pulled an Antoine and enjoyed weather, people and Florence. :)

I understand it’s okay to apply this towards 3.2 & 3.3 (with the proposed fixes)?

2.7 is not necessary because this problem doesn’t exist there:

> os.path.join(b'foo', u'bar')
u'foo/bar'

Introducing explicit sanity checks there would cause people wanting to eat our hearts. :)
msg165529 - (view) Author: Roundup Robot (python-dev) Date: 2012-07-15 14:47
New changeset cf1ac0c9e753 by Hynek Schlawack in branch '3.2':
#15180: Clarify posixpath.join() error message when mixing str & bytes
http://hg.python.org/cpython/rev/cf1ac0c9e753

New changeset 1462b963e5ce by Hynek Schlawack in branch 'default':
#15180: Clarify posixpath.join() error message when mixing str & bytes
http://hg.python.org/cpython/rev/1462b963e5ce
msg165530 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-07-15 15:04
Fixed for 3.2 & default.
History
Date User Action Args
2012-07-15 15:04:24hyneksetstatus: open -> closed
resolution: fixed
messages: + msg165530

stage: patch review -> resolved
2012-07-15 14:47:56python-devsetnosy: + python-dev
messages: + msg165529
2012-07-12 11:25:15hyneksetmessages: + msg165297
2012-07-03 12:32:19hyneksetmessages: + msg164597
2012-07-02 23:02:21terry.reedysetmessages: + msg164562
2012-07-02 20:13:36eric.araujosetmessages: + msg164541
2012-07-02 19:53:13pitrousetmessages: + msg164540
2012-07-02 19:51:03eric.araujosetnosy: + eric.araujo, r.david.murray, georg.brandl
messages: + msg164538
2012-07-01 01:12:37terry.reedysetnosy: + terry.reedy
messages: + msg164442
2012-06-25 15:31:42pitrousetnosy: + pitrou
messages: + msg163982
2012-06-25 14:19:01Arfreversetnosy: + Arfrever
2012-06-25 13:35:10hyneksetkeywords: + needs review, patch
files: + nicer-error-on-str-bytes-mix.diff
messages: + msg163967

stage: needs patch -> patch review
2012-06-25 11:57:10ncoghlansetmessages: + msg163953
2012-06-25 11:47:20hyneksetnosy: + hynek
messages: + msg163948
2012-06-25 11:45:37ncoghlancreate