Author nadeem.vawda
Recipients Michael.Fox, nadeem.vawda, pitrou, rhettinger, serhiy.storchaka, vstinner
Date 2013-05-19.18:50:56
Message-id <1368989456.71.0.0894411600818.issue18003@psf.upfronthosting.co.za>
> I agree that making lzma.open() wrap its return value in a BufferedReader
> (or BufferedWriter, as appropriate) is the way to go.

On second thoughts, there's no need to change the behavior for mode='wb'.
We can just return a BufferedReader for mode='rb', and leave the current
behavior (returning a raw LZMAFile) in place for mode='wb'.
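Here is a minimal sketch of that behavior. It is written as a standalone helper rather than as a patch to lzma.open() itself. The name `open_lzma` is hypothetical, and only the binary modes are handled: reads get an io.BufferedReader wrapped around an LZMAFile, and writes get the LZMAFile unchanged.

```python
import io
import lzma


def open_lzma(filename, mode="rb"):
    """Hypothetical helper: buffer reads, leave writes unchanged.

    Only binary modes ("r", "rb", "w", "wb", ...) are supported here,
    since they are passed straight through to lzma.LZMAFile.
    """
    f = lzma.LZMAFile(filename, mode)
    if mode in ("r", "rb"):
        # Wrap the decompressing file so that readline() and line
        # iteration are served from an in-memory buffer.
        # Closing the BufferedReader also closes the LZMAFile.
        return io.BufferedReader(f)
    # Writing: return the raw LZMAFile, as proposed for mode='wb'.
    return f
```

Callers that only write data see no change. Callers that read get the fast line iteration measured below.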


I also ran benchmarks for the bz2 and gzip modules. It looks like those
two modules would also benefit from having their open() functions wrap
their return values in io.BufferedReader:

[lzma]

  $ time xzcat src.xz | wc -l
  1057980

  real    0m0.543s
  user    0m0.556s
  sys     0m0.024s
  $ ../cpython/python -m timeit -s 'import lzma, io' 'f = lzma.open("src.xz", "r")' 'for line in f: pass'
  10 loops, best of 3: 2.01 sec per loop
  $ ../cpython/python -m timeit -s 'import lzma, io' 'f = io.BufferedReader(lzma.open("src.xz", "r"))' 'for line in f: pass'
  10 loops, best of 3: 795 msec per loop

[bz2]

  $ time bzcat src.bz2 | wc -l
  1057980

  real    0m1.322s
  user    0m1.324s
  sys     0m0.044s
  $ ../cpython/python -m timeit -s 'import bz2, io' 'f = bz2.open("src.bz2", "r")' 'for line in f: pass'
  10 loops, best of 3: 3.71 sec per loop
  $ ../cpython/python -m timeit -s 'import bz2, io' 'f = io.BufferedReader(bz2.open("src.bz2", "r"))' 'for line in f: pass'
  10 loops, best of 3: 2.04 sec per loop

[gzip]

  $ time zcat src.gz | wc -l
  1057980

  real    0m0.310s
  user    0m0.296s
  sys     0m0.028s
  $ ../cpython/python -m timeit -s 'import gzip, io' 'f = gzip.open("src.gz", "r")' 'for line in f: pass'
  10 loops, best of 3: 1.94 sec per loop
  $ ../cpython/python -m timeit -s 'import gzip, io' 'f = io.BufferedReader(gzip.open("src.gz", "r"))' 'for line in f: pass'
  10 loops, best of 3: 556 msec per loop
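The same comparison can also be run from a single Python script. Below is a rough sketch that uses `timeit.timeit()` as the timer. The helper names `OPENERS`, `count_lines` and `compare` are hypothetical. Unlike the `python -m timeit` runs above, it reports the total time for `number` passes rather than the best of 3, and it closes each file after every pass.

```python
import bz2
import gzip
import io
import lzma
import timeit

# Hypothetical mapping from module name to that module's open() function.
OPENERS = {"lzma": lzma.open, "bz2": bz2.open, "gzip": gzip.open}


def count_lines(opener, path, buffered):
    """Iterate over a compressed file line by line; return the line count."""
    f = opener(path, "rb")
    if buffered:
        # Same wrapping as in the second command of each benchmark above.
        f = io.BufferedReader(f)
    with f:
        return sum(1 for _ in f)


def compare(opener, path, number=3):
    """Return (unbuffered_seconds, buffered_seconds) for `number` full passes."""
    plain = timeit.timeit(lambda: count_lines(opener, path, False), number=number)
    wrapped = timeit.timeit(lambda: count_lines(opener, path, True), number=number)
    return plain, wrapped
```

For example, `compare(lzma.open, "src.xz")` would time both variants against the file used above. The absolute numbers will depend on the Python version and the machine.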