classification
Title: BytesWarning annoyances {'key': 'value'}.get(b'key')
Type: behavior Stage:
Components: Interpreter Core, Unicode Versions: Python 3.2
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, flox, mark.dickinson, pitrou, vstinner
Priority: normal Keywords:

Created on 2010-08-19 00:47 by vstinner, last changed 2010-08-19 17:48 by flox. This issue is now closed.

Messages (8)
msg114314 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-19 00:47
With python3 -bb, {'key': 'value'}[b'key'] raises a BytesWarning, but {'key': 'value'}[b'missing_key'] doesn't. The warning is unexpected here because the comparison is implicit (unlike an explicit 'key' == b'key'), and we cannot check that the dict keys are all bytes or all unicode (at least, I don't want to). So I think it should be fixed.

First, lookdict_unicode() is used because all dict keys are unicode, but it falls back to lookdict() because the looked-up key is not unicode.

lookdict() first checks the hashes, which match: hash('key') == hash(b'key'). It then compares the two key objects with PyObject_RichCompareBool(startkey, key, Py_EQ). PyUnicode_RichCompare() returns NotImplemented, so bytes_richcompare() is called. Finally, bytes_richcompare() raises the BytesWarning.
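
The lookup order described above (hash check first, equality test only on a hash match) can be sketched in pure Python. This is a simplified illustration, not the actual lookdict() code: toy_hash() and compared_keys() are hypothetical names, and toy_hash() is a stand-in chosen only because it gives a str and its ASCII bytes form the same value.

```python
def toy_hash(key):
    """Hypothetical hash: identical for a str and its ASCII bytes form."""
    raw = key.encode("ascii") if isinstance(key, str) else bytes(key)
    return sum(raw)


def compared_keys(stored_keys, wanted):
    """Return the stored keys that would be compared to `wanted` with ==.

    Like a hash table, skip any stored key whose hash differs; only a
    hash match leads to an equality test, which is where a bytes/str
    comparison (and hence a BytesWarning under -b) would happen.
    """
    wanted_hash = toy_hash(wanted)
    return [k for k in stored_keys if toy_hash(k) == wanted_hash]


print(compared_keys(["key"], b"key"))          # ['key']
print(compared_keys(["key"], b"missing_key"))  # []
```

With b'key' the stored 'key' reaches the equality test; with b'missing_key' the hashes differ, so no comparison (and no warning) occurs.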
msg114315 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-19 00:51
I found this problem while running test_os with python -bb: os.get_exec_path() fails because it checks whether the b'PATH' key exists in the input dictionary. Excerpt from the function:

def get_exec_path(env=None):
    if env is None:
        env = environ

    try:
        path_list = env.get('PATH')
    except TypeError:
        path_list = None

    if supports_bytes_environ:
        try:
            path_listb = env[b'PATH']
        except (KeyError, TypeError):
            pass
        else:
    (...)
msg114337 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-19 10:25
Oh. flox told me that there are other cases raising BytesWarning:

'abc' in {b'abc'}
'abc' in (b'xxx',)
'abc' in [b'xxx']

I suppose the behaviour differs (the set fails only when the values match, whereas the tuple and list fail even with different values) because set first compares the hashes (as dict does), whereas tuple and list compare each element directly.
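
One way to check these cases without restarting the current interpreter is to evaluate each expression in a child process started with -bb. The following is a rough sketch: raises_bytes_warning() is a hypothetical helper, and it assumes a Python version where subprocess.run() accepts capture_output (3.7 or later). No expected results are given for the three expressions because the set case depends on the bytes and str hashes matching.

```python
import subprocess
import sys


def raises_bytes_warning(expression):
    """Evaluate `expression` in a child interpreter started with -bb.

    Returns True if the child exited with an error mentioning BytesWarning.
    """
    proc = subprocess.run(
        [sys.executable, "-bb", "-c", expression],
        capture_output=True,
        text=True,
    )
    return proc.returncode != 0 and "BytesWarning" in proc.stderr


if __name__ == "__main__":
    # The three membership tests quoted in this message.
    for expr in ("'abc' in {b'abc'}", "'abc' in (b'xxx',)", "'abc' in [b'xxx']"):
        print(expr, "->", raises_bytes_warning(expr))
```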
msg114338 - Author: Florent Xicluna (flox) * (Python committer) Date: 2010-08-19 10:26
Various annoyances:

>>> some_set = {'oOO', b'oOO'}
BytesWarning: Comparison between bytes and string

>>> 'a' in {b'', ''}
BytesWarning: Comparison between bytes and string

>>> 'abc' in (b'def', 123)
BytesWarning: Comparison between bytes and string

>>> 'abc' in {b'abc', 123}
BytesWarning: Comparison between bytes and string

>>> {42: 'abc'} == {42: b'def'}
BytesWarning: Comparison between bytes and string
msg114357 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-19 12:36
Well, that's what BytesWarning is for. I agree it is annoying in normal use, but it is meant to ease porting of 2.x code. That's why it is only enabled when you use the corresponding command-line switch.

The warning in the dict case is especially important: otherwise it is easy to get a dict with duplicate bytes and unicode keys (say b"xxx" and "xxx"), and potentially different values.
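
A minimal illustration of the duplicate-key situation mentioned above, run without -b (the values are made up for the example):

```python
d = {}
d["xxx"] = "text value"
d[b"xxx"] = "bytes value"   # a second, distinct key: "xxx" != b"xxx" in 3.x

print(len(d))               # 2
print(d["xxx"], "/", d[b"xxx"])
```

Code ported from 2.x that mixes the two key types can end up reading one entry and writing the other without any error.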
msg114364 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-08-19 14:02
> The warning in the dict case is especially important

It's worth noting that this warning is dependent on hash() producing the same values for 'equivalent' bytes and str instances.  This seems a bit fragile, and is something that could potentially change in the future---with bytes and str comparing unequal, there's no reason for the hashes to correspond.

(It might even make sense to deliberately change the hash for either  bytes or str so that it doesn't match the other, just to expose any bugs that rely on the hashes being identical.)
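
To see whether the hashes correspond on a given interpreter, a quick check along these lines can be used. hashes_match() is a hypothetical helper; its result is an implementation detail, so no expected value is shown for it.

```python
def hashes_match(text):
    """Report whether hash(str) == hash(bytes) for the ASCII string `text`."""
    return hash(text) == hash(text.encode("ascii"))


print(hashes_match("key"))   # implementation-dependent
print("key" == b"key")       # False: bytes and str never compare equal in 3.x
```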
msg114380 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-19 17:18
> It's worth noting that this warning is dependent on hash() producing
> the same values for 'equivalent' bytes and str instances.  This seems
> a bit fragile, and is something that could potentially change in the
> future---with bytes and str comparing unequal, there's no reason for
> the hashes to correspond.
> 
> (It might even make sense to deliberately change the hash for either
> bytes or str so that it doesn't match the other, just to expose any
> bugs that rely on the hashes being identical.)

Actually, no, the "consistency" of hashes is necessary for the
BytesWarning to be useful with dicts. Because the situations it is meant
to uncover are those where e.g. you have "A" as a key and you are
looking up b"A".

(you don't really care, on the other hand, if you are looking up b"A" in
a dict which has only "B"; and, yes, I know there will still be false
positives :-))
msg114383 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-08-19 17:42
> Actually, no, the "consistency" of hashes is necessary for the
> BytesWarning to be useful with dicts.

Yes.  That's precisely the point I was trying to make.  (Probably badly.)
That's why I was calling the usefulness of BytesWarning with dicts 'fragile'.
History
Date                 User            Action  Args
2010-08-19 17:48:53  flox            set     title: {'key': 'value'} -> BytesWarning annoyances {'key': 'value'}.get(b'key')
2010-08-19 17:42:38  mark.dickinson  set     messages: + msg114383
2010-08-19 17:18:40  pitrou          set     messages: + msg114380; title: {'key': 'value'}[b'key'] raises a BytesWarning -> {'key': 'value'}
2010-08-19 14:02:19  mark.dickinson  set     nosy: + mark.dickinson; messages: + msg114364
2010-08-19 12:36:08  pitrou          set     status: open -> closed; nosy: + pitrou; messages: + msg114357; resolution: rejected
2010-08-19 12:31:47  ezio.melotti    set     nosy: + ezio.melotti
2010-08-19 10:26:56  flox            set     type: behavior
2010-08-19 10:26:46  flox            set     nosy: + flox; messages: + msg114338
2010-08-19 10:25:29  vstinner        set     messages: + msg114337
2010-08-19 00:51:10  vstinner        set     messages: + msg114315
2010-08-19 00:47:25  vstinner        create