classification
Title: difflib should accept arbitrary line iterators
Type: enhancement
Stage: test needed
Components: Library (Lib)
Versions: Python 3.3
process
Status: closed
Resolution: works for me
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.araujo, giampaolo.rodola, gruszczy, techtonik, terry.reedy, tim.peters
Priority: normal
Keywords:

Created on 2010-06-05 11:45 by techtonik, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg107130 - Author: anatoly techtonik (techtonik) Date: 2010-06-05 11:45
difflib operates on lists, but it should be possible to pass arbitrary generators. This would require an internal limit on buffer size, which has the side benefit of keeping difflib within available memory.
msg107246 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-06 22:28
If you allow me to rephrase your feature request to “difflib should allow arbitrary iterators that yield lines”, I’m +1.

Adjusting the version and adding tim_one to nosy as per py3k/Misc/maintainers.rst
msg132872 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-04-03 19:46
A quick look at the code doesn’t immediately tell me that difflib accepts sequences, not only lists.  I’m not sure whether iterators are accepted too.  What specific functions or methods have you found too strict?
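
As a rough illustration of the "sequences, not only lists" point, the sketch below passes tuples of lines to SequenceMatcher. The sample data is made up for this example, and the snippet only shows that an indexable, non-list sequence of hashable items can be used; it makes no claim about every difflib function.

```python
import difflib

# Hypothetical sample inputs: tuples rather than lists.
a = ("alpha\n", "beta\n", "gamma\n")
b = ("alpha\n", "gamma\n", "delta\n")

# SequenceMatcher only needs indexable sequences of hashable items.
matcher = difflib.SequenceMatcher(None, a, b)

# Each opcode is (tag, i1, i2, j1, j2), describing how a[i1:i2] maps to b[j1:j2].
tags = [tag for tag, i1, i2, j1, j2 in matcher.get_opcodes()]
print(tags)
```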
msg184619 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-03-19 08:20
'difflib' is a module that defines three classes and some functions. It does not do anything in itself. SequenceMatcher, which is the basis for the other functions, operates on sequences of hashable objects. The inputs must be concrete, random-access, indexed sequences because SequenceMatcher scans the inputs and then jumps around finding good matches and the complementary differences. It is completely unlike functions that take iterables as arguments and then call iter to get an iterator. If SequenceMatcher took iterables as inputs, it would have to either copy all inputs with list (bad) or figure out whether or not to copy. I think users can continue to pass file.readlines() or list(iterable) as needed.

The two functions that mistakenly say inputs are 'lists' instead of 'sequences' will be fixed as part of another issue.
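
The list(iterable) workaround mentioned above can be sketched as follows. The generator numbered_lines is a hypothetical stand-in for any lazy source of lines (a file object, a network stream, and so on); the only point is that each iterator is materialized into a list before being handed to difflib.

```python
import difflib


def numbered_lines(count, changed=None):
    """Hypothetical generator standing in for any lazy source of lines."""
    for i in range(count):
        if i == changed:
            yield f"line {i} (edited)\n"
        else:
            yield f"line {i}\n"


# Materialize each iterator so difflib receives an indexable sequence.
old = list(numbered_lines(5))
new = list(numbered_lines(5, changed=2))

# unified_diff itself returns a generator of diff lines.
diff = list(difflib.unified_diff(old, new, fromfile="old", tofile="new"))
print("".join(diff), end="")
```

The cost is that both inputs are held in memory at once, which is the trade-off the original request hoped to avoid.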
History
Date  User  Action  Args
2022-04-11 14:57:01  admin  set  github: 53151
2013-03-19 08:20:15  terry.reedy  set  status: open -> closed; nosy: + terry.reedy; messages: + msg184619; resolution: works for me
2011-04-03 19:46:36  eric.araujo  set  stage: needs patch -> test needed; messages: + msg132872; versions: + Python 3.3, - Python 3.2
2011-04-02 14:02:23  gruszczy  set  nosy: + gruszczy
2010-07-05 21:18:24  brian.curtin  set  nosy: - brian.curtin
2010-06-07 20:02:34  giampaolo.rodola  set  nosy: + giampaolo.rodola
2010-06-07 06:34:53  techtonik  set  title: difflib: support input generators -> difflib should accept arbitrary line iterators
2010-06-07 00:29:55  brian.curtin  set  nosy: + brian.curtin; stage: needs patch
2010-06-06 22:28:52  eric.araujo  set  versions: - Python 3.1, Python 3.3; nosy: + eric.araujo, tim.peters; messages: + msg107246; type: resource usage -> enhancement
2010-06-05 11:45:39  techtonik  create