Message 255813 - Python tracker


Author	serhiy.storchaka
Recipients	Arfrever, docs@python, flox, jaraco, jgeralnik, kristjan.jonsson, pitrou, r.david.murray, serhiy.storchaka, zach.ware
Date	2015-12-03.10:35:04
Message-id	<1449138906.42.0.648784807131.issue15068@psf.upfronthosting.co.za>

Content
The use of readlines() instead of readline() was introduced in 4dbbf322a9df for performance, but it looks like this is no longer needed. A naive implementation with readline() is about 2 times slower, but with careful optimization we can achieve the same performance (or better).

Here are the benchmark results.

Unpatched:

$ mkdir testdir
$ for i in `seq 10`; do for j in `seq 1000`; do echo "$j"; done >"testdir/file$i"; done
$ ./python -m timeit -s "import fileinput, glob; files = glob.glob('testdir/*')" -- "f = fileinput.input(files)" "while f.readline(): pass"
10 loops, best of 3: 56.4 msec per loop
$ ./python -m timeit -s "import fileinput, glob; files = glob.glob('testdir/*')" -- "list(fileinput.input(files))"
10 loops, best of 3: 68.4 msec per loop

Patched:

$ ./python -m timeit -s "import fileinput, glob; files = glob.glob('testdir/*')" -- "f = fileinput.input(files)" "while f.readline(): pass"
10 loops, best of 3: 47.4 msec per loop
$ ./python -m timeit -s "import fileinput, glob; files = glob.glob('testdir/*')" -- "list(fileinput.input(files))"
10 loops, best of 3: 63.1 msec per loop
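
For anyone who wants to repeat the measurement without the shell loop, here is a rough Python sketch of the same setup: 10 files of 1000 lines each, read once through readline() and once through iteration. The helper names (make_test_files, read_with_readline, read_with_iteration) are made up for illustration, and the sketch uses fileinput.FileInput directly rather than fileinput.input() so that repeated runs do not touch the module-level state. Timings will of course differ from the numbers above.

```python
import fileinput
import glob
import os
import tempfile
import timeit


def make_test_files(directory, n_files=10, n_lines=1000):
    """Create n_files files, each holding the numbers 1..n_lines, one per line."""
    for i in range(1, n_files + 1):
        path = os.path.join(directory, "file%d" % i)
        with open(path, "w") as f:
            for j in range(1, n_lines + 1):
                f.write("%d\n" % j)
    return sorted(glob.glob(os.path.join(directory, "*")))


def read_with_readline(files):
    """Consume every line through readline(); return the number of lines."""
    count = 0
    with fileinput.FileInput(files=files) as f:
        while f.readline():
            count += 1
    return count


def read_with_iteration(files):
    """Consume every line through iteration; return the number of lines."""
    with fileinput.FileInput(files=files) as f:
        return len(list(f))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as testdir:
        files = make_test_files(testdir)
        for func in (read_with_readline, read_with_iteration):
            # Same shape as "10 loops, best of 3" in the timeit output above.
            best = min(timeit.repeat(lambda: func(files), number=10, repeat=3))
            print("%s: %.1f msec per loop" % (func.__name__, best / 10 * 1000))
```

Run it once with the unpatched interpreter and once with the patched one to compare.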

The patch also fixes the original issue.

It also fixes one more issue. Currently lines are buffered, so you need to enter many lines before you get the first one:

>>> import fileinput
>>> fi = fileinput.input()
>>> line = fi.readline()
qwerty
asdfgh
zxcvbn
^D
>>> line
'qwerty\n'

With the patch you get each line as soon as it is entered.
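
To illustrate why readlines()-based buffering delays the first line on interactive input, here is a small sketch. It is not the fileinput code: CountingSource is a hypothetical stand-in for a terminal that counts how many lines had to be typed, and the 8 KiB buffer size is an assumption chosen only for the demonstration.

```python
class CountingSource:
    """Hypothetical stand-in for an interactive stream.

    `consumed` records how many lines had to be supplied so far.
    """

    def __init__(self, lines):
        self._lines = list(lines)
        self.consumed = 0

    def readline(self):
        if not self._lines:
            return ""  # end of input (the ^D in the session above)
        self.consumed += 1
        return self._lines.pop(0)

    def readlines(self, hint=-1):
        # Like io.IOBase.readlines(): keep reading lines until the total
        # size reaches `hint`, or until end of input.
        result = []
        total = 0
        while True:
            line = self.readline()
            if not line:
                break
            result.append(line)
            total += len(line)
            if 0 < hint <= total:
                break
        return result


def first_line_buffered(source, bufsize=8 * 1024):
    """Fill a buffer with readlines() first, then hand out the first line."""
    buffer = source.readlines(bufsize)
    return buffer[0] if buffer else ""


def first_line_direct(source):
    """Hand out the first line as soon as it is available."""
    return source.readline()


typed = ["qwerty\n", "asdfgh\n", "zxcvbn\n"]

buffered = CountingSource(typed)
line_buffered = first_line_buffered(buffered)

direct = CountingSource(typed)
line_direct = first_line_direct(direct)

print(repr(line_buffered), buffered.consumed)
print(repr(line_direct), direct.consumed)
```

Both calls return the same first line, but the buffered variant has to consume all three lines (up to end of input) before returning, while the direct variant consumes only one.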

History
Date	User	Action	Args
2015-12-03 10:35:07	serhiy.storchaka	set	recipients: + serhiy.storchaka, jaraco, pitrou, kristjan.jonsson, Arfrever, r.david.murray, flox, docs@python, zach.ware, jgeralnik
2015-12-03 10:35:06	serhiy.storchaka	set	messageid: <1449138906.42.0.648784807131.issue15068@psf.upfronthosting.co.za>
2015-12-03 10:35:06	serhiy.storchaka	link	issue15068 messages
2015-12-03 10:35:05	serhiy.storchaka	create