This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: ‘tokenize.detect_encoding’ is confused between text and bytes: no ‘startswith’ method on a byte string
Type: Stage:
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Clarify error when ‘tokenize.detect_encoding’ receives text
View: 23297
Assigned To: Nosy List: bignose
Priority: normal Keywords:

Created on 2015-01-22 04:40 by bignose, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (1)
msg234470 - Author: Ben Finney (bignose) Date: 2015-01-22 04:40
In `tokenize.detect_encoding` is the following code::

    first = read_or_stop()
    if first.startswith(BOM_UTF8):
        # …

The `read_or_stop` function is defined as::

    def read_or_stop():
        try:
            return readline()
        except StopIteration:
            return b''

So, on catching ``StopIteration``, the return value will be a byte string. The `detect_encoding` code then immediately calls `startswith`, which fails::

    File "/usr/lib/python3.4/tokenize.py", line 409, in detect_encoding
      if first.startswith(BOM_UTF8):
  TypeError: startswith first arg must be str or a tuple of str, not bytes

One or both of those locations in the code is wrong. Either `read_or_stop` should never return a byte string; or `detect_encoding` should not assume it can call `startswith` on the result.
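
For illustration, a minimal sketch of which operand types produce the message in the traceback. It rests on an assumption that is not stated in the report but matches the superseder's title: that the `readline` callable returned text (`str`) rather than bytes. The sample line ``x = 1`` and the variable names are hypothetical; only `tokenize.detect_encoding`, `codecs.BOM_UTF8`, and `io.BytesIO` come from the standard library.

```python
import io
import tokenize
from codecs import BOM_UTF8  # a bytes constant; tokenize compares against the same BOM

# The empty byte string that read_or_stop() returns on StopIteration
# supports startswith() with a bytes argument, so this does not raise.
empty_ok = b"".startswith(BOM_UTF8)

# A str line, as a text-mode readline would return, rejects a bytes
# argument. Assumption: this is the situation behind the traceback.
try:
    "x = 1\n".startswith(BOM_UTF8)
    text_error = None
except TypeError as exc:
    text_error = str(exc)

# With a readline that yields bytes, detect_encoding() works as documented.
encoding, lines = tokenize.detect_encoding(io.BytesIO(b"x = 1\n").readline)

print(empty_ok)
print(text_error)
print(encoding, lines)
```

If that assumption holds, opening the source in binary mode (or otherwise passing a `readline` that returns bytes) avoids the error.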
History
Date User Action Args
2022-04-11 14:58:12 | admin | set | github: 67485
2015-01-22 04:58:02 | benjamin.peterson | set | status: open -> closed; superseder: Clarify error when ‘tokenize.detect_encoding’ receives text; resolution: duplicate
2015-01-22 04:40:12 | bignose | create |