classification
Title: clarify that linecache only works on files that can be decoded successfully
Type: behavior Stage: needs patch
Components: Documentation, Library (Lib) Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, eric.araujo, python-dev, r.david.murray, takluyver, terry.reedy, vstinner, ztane
Priority: high Keywords: patch

Created on 2011-03-31 10:07 by vstinner, last changed 2017-02-09 12:06 by ztane. This issue is now closed.

Files
File name Uploaded Description
linecache-encoding-doc.patch takluyver, 2015-03-10 17:45
Messages (14)
msg132641 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-03-31 10:07
The linecache documentation doesn't say that the module reads the #coding:xxx cookie to get the encoding of the Python file. linecache has read this cookie since 41665 (May 09 2007).

"The linecache module allows one to get any line from any file, ..."

=> "any file"!

And the example uses /etc/passwd which is not a Python file.

Not only does it read the #coding:xxx cookie, but updatecache() also tries a PEP 302 loader to read the file.

linecache should be marked as very specific to Python scripts, or it should be patched to become more generic (don't read the cookie / use loader by default).

Note: the locale encoding may change between two calls to the linecache module.
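
For illustration, here is a minimal sketch of the cookie behavior described above. The file name and contents are hypothetical; the only thing it relies on is that linecache.getline() decodes the file using the encoding named in the coding cookie.

```python
import linecache
import os
import tempfile

# Hypothetical file: Python-style source saved as latin-1, with a coding
# cookie on the first line so that the encoding can be detected.
raw = b"# -*- coding: latin-1 -*-\nname = 'L\xf6wis'\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "cookie_demo.py")
    with open(path, "wb") as f:
        f.write(raw)

    # The cookie tells linecache to decode the bytes as latin-1, so the
    # 0xf6 byte on line 2 comes back as the character "ö".
    second_line = linecache.getline(path, 2)
    linecache.clearcache()

print(repr(second_line))
```

Without the cookie on the first line, the same bytes would not be valid UTF-8, which is the situation this issue is about.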
msg132782 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-04-02 01:08
The help(linecache) Description is more specific as to the intention (based on traceback usage):

"This is intended to read lines from modules imported -- hence if a filename is not found, it will look down the module search path for a file by that name."

My experiments show that this is too specific. It *can* read any file that it can find and decode as utf-8 (the default, or, as you say, the locale encoding or the coding in the cookie line).

Find = absolute path
>>> linecache.getline('c:/programs/pydev/py32/LICENSE', 1)
'A. HISTORY OF THE SOFTWARE\n'

or relative path on sys.path
>>> linecache.getline('idlelib/ChangeLog', 1)
'Please refer to the IDLEfork and IDLE CVS repositories for\n'
>>> linecache.getline('idlelib/extend.txt', 1)
'Writing an IDLE extension\n'

Decoding fails on a byte that is illegal in utf-8:
>>> linecache.getline('idlelib/CREDITS.txt', 1)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 1566: invalid start byte

(It reads and decodes the entire file even though only line 1 was requested. It choked on Löwis. I believe Py3 distributed text files should be utf-8 instead of latin-1.)

If I got the rules right, the doc should say "Filename must be an absolute path or relative path that can be found on sys.path." and "File must be utf-8 encoded or locale encoded or a Python file with a coding cookie."

(If you tried /etc/passwd, how did it fail?)
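
A small sketch of the two lookup rules above, using a throwaway UTF-8 file. The file name and the temporary directory are assumptions made for the example; the sys.path step relies on the documented fallback for relative file names.

```python
import linecache
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name, chosen to be unlikely to exist in the
    # current directory or elsewhere on sys.path.
    name = "linecache_demo_notes.txt"
    path = os.path.join(tmpdir, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\nthird line\n")

    # 1. Absolute path: the file is found directly.
    by_abs_path = linecache.getline(path, 1)

    # The whole file was read and cached, although one line was requested.
    cached_lines = linecache.getlines(path)

    # 2. Relative name: found once its directory is on sys.path.
    linecache.clearcache()
    sys.path.append(tmpdir)
    try:
        by_sys_path = linecache.getline(name, 2)
    finally:
        sys.path.remove(tmpdir)
        linecache.clearcache()

print(repr(by_abs_path), repr(by_sys_path), len(cached_lines))
```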
msg178884 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-01-03 01:44
Well, my initial message doesn't convince me anymore today (especially after reading Terry's message), so I prefer to close the issue as rejected. I don't think that it's really a problem :-)
msg237685 - Author: Thomas Kluyver (takluyver) * Date: 2015-03-09 18:07
Someone on reddit ran into this, expecting that linecache can be used for an arbitrary text file:
http://www.reddit.com/r/Python/comments/2yetxc/utf8_encoding_problems/

I was quite surprised that the docs say "allows one to get any line from any file." I've always understood that linecache is specifically for Python files, and the use of tokenize.open() means that it will only work for files that are UTF-8 or have the #coding: magic comment in the first two lines.

I think the docs should at least mention this; I'm happy to work on a patch for it at some point if people agree.
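
To make the tokenize.open() point concrete, here is a rough sketch that calls it directly on two hypothetical files. It uses tokenize.open() rather than linecache because how linecache itself reports a decoding failure may differ between Python versions. The helper name read_like_linecache is made up for this example.

```python
import os
import tempfile
import tokenize


def read_like_linecache(raw):
    """Write raw bytes to a temporary file and read it via tokenize.open()."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "sample.txt")
        with open(path, "wb") as f:
            f.write(raw)
        with tokenize.open(path) as fp:
            return fp.readlines()


# A line stored as latin-1; the byte 0xf6 is not valid UTF-8.
latin1_body = "Martin v. L\u00f6wis\n".encode("latin-1")

# No cookie: UTF-8 is assumed, so reading the third line fails.
try:
    read_like_linecache(b"Credits\n\n" + latin1_body)
    outcome = "decoded"
except UnicodeDecodeError:
    outcome = "UnicodeDecodeError"

# Cookie within the first two lines: the declared encoding is used instead.
with_cookie = read_like_linecache(b"# coding: latin-1\n\n" + latin1_body)

print(outcome, repr(with_cookie[2]))
```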
msg237686 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-03-09 18:20
Sure, clarifying the docs seems sensible.  "Any file" is slightly different from the reality.
msg237786 - Author: Thomas Kluyver (takluyver) * Date: 2015-03-10 17:45
First attempt at describing this attached.
msg238242 - Author: Thomas Kluyver (takluyver) * Date: 2015-03-16 21:39
Anything else I should be doing here?
msg238427 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-03-18 13:15
New changeset 51341af466e3 by Victor Stinner in branch '3.4':
Issue #11726: clarify linecache doc: linecache is written to cache Python
https://hg.python.org/cpython/rev/51341af466e3
msg238428 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-03-18 13:19
New changeset 01cb2107cbc3 by Victor Stinner in branch '3.4':
Issue #11726: Fix linecache example in the doc
https://hg.python.org/cpython/rev/01cb2107cbc3
msg238429 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-18 13:21
4 years to fix this minor documentation issue, I feel ashamed...
msg238689 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-03-20 15:07
I think that the patch that Victor committed is incorrect, and that Thomas's patch is closer to correct. People *do* use linecache with files other than Python source files, and as far as I can see we are not going to stop supporting that. Given the original docs, the intent clearly was that the interface be general, not Python-file-specific.
msg238690 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-03-20 15:20
OK, on further investigation I guess it wasn't intended to be so general :)  But I still think we should make a nod to the reality that it can be used on other text files.  I'll re-close the issue but I may add a sentence to the docs.
msg238692 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-03-20 15:32
New changeset ceb14ecc1942 by R David Murray in branch '3.4':
#11726: Make linecache docs reflect that all files are treated the same.
https://hg.python.org/cpython/rev/ceb14ecc1942

New changeset 1a5c72f9ff53 by R David Murray in branch 'default':
Merge: #11726: Make linecache docs reflect that all files are treated the same.
https://hg.python.org/cpython/rev/1a5c72f9ff53
msg287408 - Author: Antti Haapala (ztane) * Date: 2017-02-09 12:06
Every now and then there are new questions and answers on Stack Overflow about using the `linecache` module for random access to text files, even though the documentation states that it is meant for Python source code files.

One problem is that the title still states: "11.9. linecache — Random access to text lines"; the title should really be changed to "Random access to Python source code lines" so that the title wouldn't imply that this is a general-purpose random access library for text files.
History
Date User Action Args
2017-02-09 12:06:56 ztane set nosy: + ztane
messages: + msg287408
2015-03-20 15:32:45 python-dev set messages: + msg238692
2015-03-20 15:20:13 r.david.murray set status: open -> closed
messages: + msg238690
2015-03-20 15:07:23 r.david.murray set status: closed -> open
messages: + msg238689
2015-03-18 13:21:11 vstinner set status: open -> closed
resolution: fixed
messages: + msg238429
2015-03-18 13:19:56 python-dev set messages: + msg238428
2015-03-18 13:15:07 python-dev set nosy: + python-dev
messages: + msg238427
2015-03-16 21:39:39 takluyver set messages: + msg238242
2015-03-10 17:45:48 takluyver set files: + linecache-encoding-doc.patch
keywords: + patch
messages: + msg237786
2015-03-09 18:20:09 r.david.murray set status: closed -> open
type: behavior
title: linecache becomes specific to Python scripts in Python 3 -> clarify that linecache only works on files that can be decoded successfully
nosy: + r.david.murray
versions: + Python 3.4, Python 3.5, - Python 3.2, Python 3.3
messages: + msg237686
resolution: rejected -> (no value)
stage: needs patch
2015-03-09 18:07:37 takluyver set nosy: + takluyver
messages: + msg237685
2013-01-03 01:44:26 vstinner set status: open -> closed
resolution: rejected
messages: + msg178884
2011-06-26 18:59:24 terry.reedy set versions: - Python 3.1
2011-04-28 00:20:25 eric.araujo set nosy: + eric.araujo
2011-04-02 01:08:39 terry.reedy set nosy: + terry.reedy
messages: + msg132782
2011-03-31 16:03:08 rhettinger set priority: normal -> high
2011-03-31 10:07:51 vstinner create