
classification
Title: improvements for linecache
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.0
process
Status: closed Resolution: later
Dependencies: Superseder:
Assigned To: Nosy List: pitrou, r.david.murray, umaxx
Priority: low Keywords: patch

Created on 2007-12-30 14:18 by umaxx, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
linecache.py.diff umaxx, 2007-12-30 14:18
Messages (7)
msg59041 - (view) Author: (umaxx) Date: 2007-12-30 14:18
here is a simple patch for the linecache core module, which does the
following:

- remove double comment
- instead of adding all lines with readlines() to the cache, just add
seek points for every line
- return lines from the cached seek points instead of directly from the dict cache
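
The attached diff is not reproduced on this page, so the following is only a minimal sketch of the seek-point idea described in the list above, not the patch itself. The helper names `build_offsets` and `get_line` are hypothetical, and the file is opened in binary mode as an assumption so that the recorded offsets are real byte positions.

```python
import os
import tempfile


def build_offsets(path):
    """Return the byte offset at which each line of *path* starts (hypothetical helper)."""
    offsets = []
    position = 0
    # Binary mode: len(line) is the on-disk byte count, so the running
    # total is a valid argument for seek().
    with open(path, "rb") as fp:
        for line in fp:
            offsets.append(position)
            position += len(line)
    return offsets


def get_line(path, offsets, lineno):
    """Return line *lineno* (1-based) as bytes, or b"" if it is out of range."""
    if not 1 <= lineno <= len(offsets):
        return b""
    # This is the extra open() call mentioned under "disadvantages".
    with open(path, "rb") as fp:
        fp.seek(offsets[lineno - 1])
        return fp.readline()


# Small demonstration on a throwaway file.
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b"first\nsecond\nthird\n")
    demo_path = tmp.name

demo_offsets = build_offsets(demo_path)
second_line = get_line(demo_path, demo_offsets, 2)
os.remove(demo_path)
```

Only one integer per line is kept in memory, which is what makes the approach attractive for very large files; the cost is one open/seek/readline per lookup.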

advantages of this patch:

- reading lines from very big files (>1GB) is no problem anymore
- linecache can handle a large number of large files now 
- updatecache() is faster now because "for line in fp:" is faster than
readlines()

disadvantages:

- reading a single line from the cache will be a little bit slower than
before because of the extra open() call to the file

summary:

- this diff presents a different caching approach which is able to
handle a lot of large files too

__future__-work:

- the code is ugly and unstructured, someone needs to beautify it
- an extra function: get_list_of_lines_from_list_of_linenumbers() would
be nice to have
- test cases for cache consistency would be nice to have
msg59106 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-02 22:18
I'll look at this when I have time. If you find someone else interested
in reviewing, please give them the patch!
msg79284 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-06 20:32
Looking at the patch, the recorded seek points will probably be wrong if
some newlines were translated (e.g. '\r\n' -> '\n') when reading the file.
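
A small sketch of the concern, under the assumption that the seek points are computed by summing the lengths of lines read in text mode (the patch itself is not shown here, and `offsets_from_lengths` is a hypothetical helper). With newline translation, each '\r\n' is counted as one character, so the offsets drift away from the real byte positions.

```python
import os
import tempfile


def offsets_from_lengths(lines):
    """Accumulate line lengths into start offsets (hypothetical helper)."""
    offsets, position = [], 0
    for line in lines:
        offsets.append(position)
        position += len(line)
    return offsets


# A file with Windows-style line endings.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"a\r\nb\r\nc\r\n")
    path = tmp.name

# Text mode with the default newline=None translates '\r\n' to '\n'.
with open(path, "r", encoding="ascii") as fp:
    text_offsets = offsets_from_lengths(fp)

# Binary mode sees the bytes exactly as stored.
with open(path, "rb") as fp:
    byte_offsets = offsets_from_lengths(fp)

with open(path, "rb") as fp:
    fp.seek(text_offsets[1])
    wrong = fp.readline()   # lands in the middle of the first line ending
    fp.seek(byte_offsets[1])
    right = fp.readline()   # lands at the start of the second line

os.remove(path)
```

Computing offsets from binary reads (or from `tell()` before each line) avoids the drift, at the price of having to decode the returned bytes separately.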

I'm also not sure what the use case for very big files is. linecache
is primarily used for printing tracebacks, the API isn't really
general-purpose.
msg79288 - (view) Author: (umaxx) Date: 2009-01-06 21:27
> Looking at the patch, the recorded seek points will probably be wrong if
> some newlines were translated (e.g. '\r\n' -> '\n') when reading the file.

ack, this could be a problem.

> I'm also not sure what the use case for very big files is.

this is easy to answer: i used it, for example, for parsing big (still
growing) log files from mail servers. parsing the whole file the first
time, and then later starting from line xyz+1 (xyz was the last line
recorded after the first parse) *without* parsing the whole file
again. especially useful for growing log files >1GB

just try to get line number 1234567 from a 2.3GB log file with the
current linecache implementation :)
the main idea behind the patch is to cache the seek points to save a lot
of time on big files.
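
The log-file use case can be sketched without linecache at all, by remembering the byte offset where the previous pass stopped. This is an illustration under assumptions, not the code that was actually used: `read_new_lines` is a hypothetical helper, and treating a final line without a trailing newline as "still being written" is a design choice made for this example.

```python
import os
import tempfile


def read_new_lines(path, start_offset=0):
    """Return (complete_lines, next_offset) for data appended since start_offset."""
    lines = []
    position = start_offset
    with open(path, "rb") as fp:
        fp.seek(start_offset)
        for line in fp:
            if not line.endswith(b"\n"):
                break  # partial line, probably still being written; retry later
            lines.append(line)
            position += len(line)
    return lines, position


# Demonstration: parse once, let the log grow, then parse only the new part.
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b"one\ntwo\n")
    log_path = tmp.name

first_lines, first_offset = read_new_lines(log_path)

with open(log_path, "ab") as fp:
    fp.write(b"three\nfo")  # the last line is incomplete on purpose

second_lines, second_offset = read_new_lines(log_path, first_offset)
os.remove(log_path)
```

The saved offset plays the role of "line xyz+1": the second pass never touches the part of the file that was already parsed.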

> linecache is primarily used for printing tracebacks, the API 
> isn't really general-purpose.

i know :)
msg116823 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-09-18 18:36
@umaxx are you interested in taking this forward?
msg118174 - (view) Author: (umaxx) Date: 2010-10-08 10:11
@BreamoreBoy: what do you mean by taking this forward?

The patch is there. It has been there for three years now, and no one else seems to be interested.

I personally have no interest in this anymore, as I have not used Python for this kind of thing for a long time, so I do not care whether Python's linecache is going to be improved or not.

IMHO, things like a slow linecache could be a reason for people to switch to languages with faster string operations and caches, like Perl.

If you like, just close the bug report or commit the patch or whatever - I do not care anymore.
msg118428 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-12 15:16
I am indeed going to close this.  The patch isn't complete, since there's the line ending issue Antoine pointed out, which implies that there are also some missing tests.  

I doubt that linecache performance is something that affects very many people, but if someday someone wants to pick this up and finish it, it sounds like there's no objection in principle to the change.
History
Date                 User            Action  Args
2022-04-11 14:56:29  admin           set     github: 46049
2010-10-12 15:16:11  r.david.murray  set     status: open -> closed; nosy: + r.david.murray, - BreamoreBoy; messages: + msg118428; resolution: later; stage: resolved
2010-10-08 10:11:45  umaxx           set     messages: + msg118174
2010-09-18 18:36:33  BreamoreBoy     set     nosy: + BreamoreBoy; messages: + msg116823
2009-01-06 21:27:43  umaxx           set     messages: + msg79288
2009-01-06 20:32:46  pitrou          set     nosy: + pitrou; messages: + msg79284
2009-01-06 05:05:29  gvanrossum      set     nosy: - gvanrossum
2009-01-06 05:05:23  gvanrossum      set     assignee: gvanrossum ->
2008-01-02 22:18:02  gvanrossum      set     priority: low; assignee: gvanrossum; messages: + msg59106; keywords: + patch; nosy: + gvanrossum
2007-12-30 14:18:38  umaxx           create