
classification
Title: bug in accessing bytes, inconsistent with normal strings and python 2.7
Type: behavior
Stage: resolved
Components: Interpreter Core
Versions: Python 3.4, Python 2.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: kevinbhendricks, r.david.murray
Priority: normal
Keywords:

Created on 2014-10-03 17:36 by kevinbhendricks, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL | Status | Linked
PR 241 | merged | marco.buttu, 2017-02-22 21:34
Messages (3)
msg228348 - Author: Kevin Hendricks (kevinbhendricks) Date: 2014-10-03 17:36
Hi,

I am working on porting my ebook code from Python 2.7 so that it runs on both Python 2.7 and Python 3.4, and I have found the following inconsistency, which I think is a bug:

KevinsiMac:~ kbhend$ python3
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> o = '123456789'

>>> o[-3]
'7'

>>> type(o[-3])
<class 'str'>

>>> type(o)
<class 'str'>

The above is what I expected, but under Python 3, indexing bytes gives the following instead:

>>> o = b'123456789'

>>> o[-3]
55

>>> type(o[-3])
<class 'int'>

>>> type(o)
<class 'bytes'>
 


When I compare this to Python 2.7, I see the expected behaviour for both byte strings and unicode strings:

Python 2.7.7 (v2.7.7:f89216059edf, May 31 2014, 12:53:48) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.


>>> o = '123456789'

>>> o[-3]
'7'

>>> type(o[-3])
<type 'str'>

>>> type(o)
<type 'str'>


>>> o = u'123456789'

>>> o[-3]
u'7'

>>> type(o[-3])
<type 'unicode'>

>>> type(o)
<type 'unicode'>


I consider this a bug, as it makes it much harder to write Python code that works on both Python 2.7 and Python 3.4.
msg228363 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2014-10-03 19:06
Agreed, but that is a design decision that was taken long ago (regretted by more than a few but defended by others).  You can find a number of discussions of this by searching the python-dev archives, including some more recent discussions on possibilities for lessening the pain, but I don't remember if any of those turned into real proposals.

For now, you can find some helpers in six, or you can write your code using slice notation (b'abc'[1:2] == b'b').
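A minimal sketch of the slice-notation workaround, assuming code that must run unchanged on both Python 2.7 and 3.x. The helper names `byte_at` and `int_at` are hypothetical, not part of any library. Slicing returns the same type as the input (a `str` on 2.7, where `b'...'` literals are plain `str`, and `bytes` on 3.x), while indexing a `bytearray` returns an integer on both versions.

```python
def byte_at(data, index):
    """Hypothetical helper: the byte at `index` as a length-1 byte string."""
    # Normalise negative indices first: data[-1:0] would be empty.
    if index < 0:
        index += len(data)
    if not 0 <= index < len(data):
        raise IndexError("byte index out of range")
    # A slice keeps the input type: str on 2.7, bytes on 3.x.
    return data[index:index + 1]


def int_at(data, index):
    """Hypothetical helper: the byte at `index` as an integer."""
    # bytearray indexing yields an int under both Python 2.7 and 3.x.
    return bytearray(data)[index]


o = b'123456789'
print(repr(byte_at(o, -3)))  # b'7' on 3.x, '7' on 2.7
print(int_at(o, -3))         # 55 on both
```

`int_at` copies the whole buffer on every call, so when reading many bytes it is cheaper to convert once with `bytearray(data)` and index that. The six package's `six.indexbytes(buf, i)` covers the same integer-returning case.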
msg228385 - Author: Kevin Hendricks (kevinbhendricks) Date: 2014-10-03 21:20
Thanks for letting me know this was expected behaviour.  I see the same "issue" holds true while using:

for c in b'0123456789':
    print(ord(c))  # fine on 2.7; on 3.x c is already an int, so ord(c) raises TypeError
 
I ended up using slices nearly everywhere, and still ran into iterator issues. A horrible hack, really.
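For the iteration case, a minimal sketch of two version-neutral loops, assuming the goal is code that behaves identically on 2.7 and 3.x. The generator names `iter_bytes` and `iter_ints` are hypothetical. The first yields length-1 byte strings via slicing; the second relies on the fact that iterating a `bytearray` yields integers on both versions, which removes the need for `ord(c)`.

```python
def iter_bytes(data):
    """Hypothetical helper: yield each byte as a length-1 byte string."""
    for i in range(len(data)):
        # Slicing keeps the input type (str on 2.7, bytes on 3.x).
        yield data[i:i + 1]


def iter_ints(data):
    """Hypothetical helper: yield each byte as an integer."""
    # Iterating a bytearray yields ints on both 2.7 and 3.x.
    for value in bytearray(data):
        yield value


for value in iter_ints(b'0123456789'):
    print(value)  # prints 48 through 57, one per line, on both versions
```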

I think I will spend some time reading the python-dev archives to figure out how anyone could defend this approach.

FWIW, introducing a bytes class that works exactly like byte strings (non-unicode strings) in Python 2.x, but disallows any automatic up-conversion to full unicode (as happens during concatenation), would have been a useful step.
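The kind of class described above can be roughly approximated on Python 3 with a `bytes` subclass. This is a minimal sketch only, not anything CPython provides: the class name `ByteStr` is hypothetical, and only integer indexing and iteration are changed. Concatenation is inherited from `bytes`, so mixing it with `str` still raises `TypeError` instead of up-converting.

```python
class ByteStr(bytes):
    """Hypothetical bytes subclass: indexing and iteration return
    length-1 byte strings, as Python 2's str did."""

    def __getitem__(self, index):
        if isinstance(index, int):
            # Normalise negative indices, then take a length-1 slice.
            if index < 0:
                index += len(self)
            if not 0 <= index < len(self):
                raise IndexError("index out of range")
            return bytes.__getitem__(self, slice(index, index + 1))
        # Slices keep the normal bytes behaviour (and return plain bytes).
        return bytes.__getitem__(self, index)

    def __iter__(self):
        for i in range(len(self)):
            yield bytes.__getitem__(self, slice(i, i + 1))


o = ByteStr(b'123456789')
print(o[-3])          # b'7'
print(list(o[:3]))    # [49, 50, 51]: slices are plain bytes again
```

The second print shows a limitation of the sketch: slicing and the inherited methods return plain `bytes`, so a complete version would also need to wrap those results.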

I decode binary-format ebook files all the time, and Python 3's second-class treatment of bytes makes no sense to me. Perfectly valid code can be written using only UTF-8 and Latin-1 encoded byte strings, with no need to up-convert to anything. It is practically impossible to support code like that in Python 3.

Boggles the mind.

Thanks again for the fast response.

Kevin
History
Date | User | Action | Args
2022-04-11 14:58:08 | admin | set | github: 66739
2017-02-22 21:34:27 | marco.buttu | set | pull_requests: + pull_request203
2014-10-03 21:20:15 | kevinbhendricks | set | messages: + msg228385
2014-10-03 19:06:21 | r.david.murray | set | status: open -> closed; nosy: + r.david.murray; messages: + msg228363; resolution: not a bug; stage: resolved
2014-10-03 17:36:30 | kevinbhendricks | create |