This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: linecache .updatecache fails on utf8 encoded files
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.0
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: georg.brandl Nosy List: ajaksu2, amaury.forgeotdarc, benjamin.peterson, georg.brandl, orivej, pitrou, ptn, pyscripter
Priority: critical Keywords: easy, patch

Created on 2007-12-22 05:17 by pyscripter, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
linecache.py.patch pyscripter, 2007-12-29 04:36
line.py ajaksu2, 2009-04-25 13:01
Messages (9)
msg58958 - Author: PyScripter (pyscripter) Date: 2007-12-22 05:17
linecache.updatecache works as follows after it finds a module name:

fp = open(fullname, 'rU')
lines = fp.readlines()
fp.close()

It then tries to detect a source encoding comment (coding cookie) in the
lines it has already decoded, which is too late to help.

The problem is that readlines fails with a UnicodeDecodeError if the
file is UTF-8 encoded, the preferred locale encoding is something else,
and the file contains bytes that cannot be decoded in the locale encoding.
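A minimal illustration of the failure mode, assuming ASCII as a stand-in for a locale encoding that cannot decode the UTF-8 bytes of "ψ" (b"\xcf\x88"). An `io.TextIOWrapper` over `io.BytesIO` plays the role of a text-mode `open()` so no file is needed; the helper names `text_readlines` and `bytes_readlines` are hypothetical.

```python
import io

# Bytes of a UTF-8 module like the one in this report; line 2 holds "ψ".
raw = '# -*- coding: utf-8 -*-\nprint("ψ")\n'.encode("utf-8")

def text_readlines(data, encoding):
    """Read lines the way a text-mode open() would, using `encoding`."""
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding) as fp:
        return fp.readlines()

def bytes_readlines(data):
    """Read lines in binary mode: nothing is decoded yet."""
    with io.BytesIO(data) as fp:
        return fp.readlines()

# 'ascii' stands in for a locale encoding that cannot decode these bytes;
# the read fails before any coding cookie could be consulted.
try:
    text_readlines(raw, "ascii")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(decode_failed)         # True
print(bytes_readlines(raw))  # [b'# -*- coding: utf-8 -*-\n', b'print("\xcf\x88")\n']
```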

Instead the function should:
a) read the raw data into a bytes object,
b) then search for a file encoding comment, and
c) use that encoding if one is found, else use UTF-8, since this is now the
default source encoding.
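Steps a) to c) can be sketched with the standard library's `tokenize.detect_encoding()`, which looks for a UTF-8 BOM or a PEP 263 coding cookie in the first two lines and otherwise reports `'utf-8'`. This is a sketch, not the patch attached to this issue, and `decode_source_lines` is a hypothetical name. For comparison, linecache in current CPython reads source through `tokenize.open()`, which performs the same detection.

```python
import io
import tokenize

def decode_source_lines(raw):
    """Hypothetical helper: decode module source following steps a)-c)."""
    # a) `raw` is the undecoded file content.
    buffer = io.BytesIO(raw)
    # b) + c) BOM or coding cookie if present, otherwise 'utf-8'.
    encoding, _ = tokenize.detect_encoding(buffer.readline)
    buffer.seek(0)
    # Decode with universal newlines, as a text-mode file would.
    with io.TextIOWrapper(buffer, encoding=encoding) as fp:
        return fp.readlines()

lines = decode_source_lines(b'# coding: latin-1\nx = "\xe9"\n')
print(lines[1])  # x = "é"
```

A file without any cookie decodes as UTF-8, and a file with a UTF-8 BOM is detected as `'utf-8-sig'`, so the BOM is stripped from the first line.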
msg59031 - Author: PyScripter (pyscripter) Date: 2007-12-29 04:14
To reproduce the error:

a) Save the following file in utf-8 format as c:\temp\module1.py
# -*- coding: utf-8 -*-
print("ψ")

b) Run the following script:
import pdb
d = pdb.Pdb()
filename = r"c:\Temp\module1.py"
print(d.set_break(filename,1))

Expected result:
None

Actual result:
Line c:\temp\module1.py:1 does not exist
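A self-contained version of the reproduction above, as a sketch: it writes the module to a temporary directory instead of `c:\temp` (an assumption made so it runs on any platform) and also reads line 2 through `linecache.getline()` directly. On an interpreter that includes the fix, `set_break` returns `None`.

```python
import linecache
import os
import pdb
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "module1.py")
    # Write the module from the report explicitly as UTF-8.
    with open(filename, "w", encoding="utf-8") as f:
        f.write('# -*- coding: utf-8 -*-\nprint("ψ")\n')

    d = pdb.Pdb()
    result = d.set_break(filename, 1)    # None on success, else a message
    line2 = linecache.getline(filename, 2)
    d.clear_all_breaks()                 # leave no breakpoints behind

print(result)  # None
```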
msg59032 - Author: PyScripter (pyscripter) Date: 2007-12-29 04:35
Patch file attached
msg66690 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2008-05-11 22:57
This should be fixed differently (directly applying the RE to bytes
objects), but it needs a re that handles bytes first.
msg70477 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2008-07-31 02:11
re now handles bytes, so what's next?
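Georg's suggestion of matching the coding cookie on undecoded bytes could look like the following sketch. The cookie pattern is the regular expression given in PEP 263, compiled as a bytes pattern; only the first two lines are examined, and line 2 only if line 1 is blank or a comment, as the tokenizer does. `find_coding` is a hypothetical helper.

```python
import re

# PEP 263 coding-cookie pattern as a *bytes* pattern, so it can be
# matched against undecoded source lines.
COOKIE_RE = re.compile(rb'^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)')
# A line that is blank or only a comment.
BLANK_RE = re.compile(rb'^[ \t\f]*(?:[#\r\n]|$)')

def find_coding(raw_lines, default="utf-8"):
    """Hypothetical helper: return the declared encoding, else `default`."""
    for index, line in enumerate(raw_lines[:2]):
        match = COOKIE_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
        if index == 0 and not BLANK_RE.match(line):
            break  # line 2 only counts if line 1 is blank or a comment
    return default

raw_lines = [b'# -*- coding: latin-1 -*-\n', b'x = "\xe9"\n']
coding = find_coding(raw_lines)
lines = [line.decode(coding) for line in raw_lines]
print(coding)  # latin-1
```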
msg70779 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2008-08-06 10:16
After a look at the patch and at linecache.py, some comments:
- 'rbU' is strange: the 'U' flag has no effect on binary files, so it
should just be 'rb' instead
- I'm surprised we don't have a test_linecache.py in Lib/test
- The following lines at the end of updatecache() deserve a cleanup:

    try:
        lines = [line if isinstance(line, str) else str(line, coding)
                 for line in lines]
    except:
        pass  # Hope for the best

- The overly broad "except Exception as msg" should also be restricted
to (IOError, OSError) IMHO.
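One way the review points could be addressed, sketched with hypothetical helper names: open with `'rb'`, catch only `OSError` (in Python 3.3 and later `IOError` is an alias of it), and replace the bare `except: pass` with an explicit decoding policy. Falling back to UTF-8 for an unknown codec name and decoding with `errors="replace"` are assumptions of this sketch, not what linecache actually does.

```python
import codecs

def read_raw_lines(fullname):
    """Hypothetical helper: read undecoded lines, or [] if unreadable."""
    try:
        with open(fullname, "rb") as fp:  # 'rb': the 'U' flag is moot here
            return fp.readlines()
    except OSError:  # narrower than `except Exception`
        return []

def decode_lines(lines, coding):
    """Hypothetical cleanup of the list comprehension quoted above."""
    try:
        codecs.lookup(coding)
    except LookupError:
        coding = "utf-8"  # unknown codec name: assume the default
    # errors="replace" maps undecodable bytes to U+FFFD instead of
    # raising, so no bare `except: pass` is needed.
    return [line if isinstance(line, str) else str(line, coding, "replace")
            for line in lines]
```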
msg86503 - Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-04-25 13:01
Cannot reproduce with py3k on Linux; attaching the file I tested against.

I'll try to come up with a rough test_linecache to help fixing a couple
of issues :)
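A rough regression test along the lines proposed here might look like this. The class and method names are hypothetical, and this is a sketch rather than CPython's actual test_linecache.py.

```python
import linecache
import os
import tempfile
import unittest

class LinecacheUTF8Test(unittest.TestCase):
    """Hypothetical regression test for this report."""

    def test_getline_non_ascii_utf8_source(self):
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "module1.py")
            with open(path, "w", encoding="utf-8") as f:
                f.write('# -*- coding: utf-8 -*-\nprint("ψ")\n')
            # The non-ASCII line must come back decoded, not missing.
            self.assertEqual(linecache.getline(path, 2), 'print("ψ")\n')
            # A line past the end of the file comes back as ''.
            self.assertEqual(linecache.getline(path, 3), '')
```

Run it with `python -m unittest` on the file that contains it.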
msg89466 - Author: Pablo Torres Navarrete (ptn) Date: 2009-06-17 16:09
Cannot reproduce on Ubuntu 9.04 with py3k at revision 73267:

$ ./python
Python 3.1rc1+ (py3k:73267, Jun  7 2009, 14:45:03) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pdb
[49404 refs]
>>> d = pdb.Pdb()
[49446 refs]
>>> fname = r"temp/test.py"
[49448 refs]
>>> print(d.set_break(fname,1))
None
[49505 refs]
msg89506 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2009-06-18 14:17
This was probably fixed by issue4016 (r70587)
History
Date  User  Action  Args
2022-04-11 14:56:29  admin  set  github: 46026
2009-06-18 14:17:40  amaury.forgeotdarc  set  status: open -> closed
    nosy: + amaury.forgeotdarc
    messages: + msg89506
    resolution: out of date
2009-06-17 16:09:06  ptn  set  nosy: + ptn
    messages: + msg89466
2009-04-25 13:01:19  ajaksu2  set  files: + line.py
    nosy: + ajaksu2
    messages: + msg86503
2008-08-06 10:16:29  pitrou  set  nosy: + pitrou
    messages: + msg70779
2008-07-31 02:11:04  benjamin.peterson  set  nosy: + benjamin.peterson
    messages: + msg70477
2008-05-11 22:57:22  georg.brandl  set  priority: high -> critical
    assignee: georg.brandl
    messages: + msg66690
    nosy: + georg.brandl
2008-01-30 09:58:05  christian.heimes  set  priority: high
    keywords: + patch, easy
    type: behavior
2008-01-29 17:18:12  orivej  set  nosy: + orivej
2007-12-29 04:36:00  pyscripter  set  files: + linecache.py.patch
    messages: + msg59032
2007-12-29 04:14:09  pyscripter  set  messages: + msg59031
2007-12-22 05:17:14  pyscripter  create