Issue 20992: reading individual bytes of multiple binary files using the Python module fileinput

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/65191

classification

Title:	reading individual bytes of multiple binary files using the Python module fileinput
Type:	enhancement	Stage:	needs patch
Components:	Library (Lib)	Versions:	Python 3.5

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	Tommy.Carstensen, josh.r
Priority:	normal	Keywords:

Created on 2014-03-20 09:39 by Tommy.Carstensen, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (6)
msg214195 - (view)	Author: Tommy Carstensen (Tommy.Carstensen)	Date: 2014-03-20 09:39
This is my first post on bugs.python.org. I hope I abide by the rules. It was suggested to me on stackoverflow.com that I request an enhancement to the module fileinput here:
http://stackoverflow.com/questions/22510123/reading-individual-bytes-of-multiple-binary-files-using-the-python-module-filein

I can read the first byte of a binary file like this:

    with open(my_binary_file,'rb') as f:
        f.read(1)

But when I run this code:

    import fileinput
    with fileinput.FileInput(my_binary_file,'rb') as f:
        f.read(1)

then I get this error:

    AttributeError: 'FileInput' object has no attribute 'read'

I would like to propose an enhancement to fileinput, which makes it possible to read binary files byte by byte.

I posted this solution to my problem:

    def process_binary_files(list_of_binary_files):
        for file in list_of_binary_files:
            with open(file,'rb') as f:
                yield f.read(1)
        return

    list_of_binary_files = ['f1', 'f2']
    generate_byte = process_binary_files(list_of_binary_files)
    byte = next(generate_byte)
msg214739 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2014-03-24 21:44
fileinput's semantics are heavily tied to lines, not bytes. And processing binary files byte by byte is rather inefficient; can you explain why this feature would be of general utility such that it would be worth including it in the standard library?

It's not hard to just get a byte at a time using existing parts:

    def bytefileinput():
        return (bytes((b,)) for line in fileinput.input() for b in line)

There are ways to do similar things without using fileinput at all. But it really depends on your use case.

Giving fileinput a read() method isn't a bad idea, assuming some reasonable behavior is defined for the various line-oriented methods, but making it iterate binary mode input byte by byte would be a breaking change of limited utility in my view.
msg214741 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2014-03-24 21:48
That example should have included mode="rb" when using fileinput.input(); oops. Pretend I didn't forget it.
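
For reference, a minimal sketch of the fileinput-based approach with the mode="rb" correction applied. It differs from the snippet above in two ways that are assumptions of this sketch, not part of the original message: it takes an explicit files argument instead of reading sys.argv, and it uses a FileInput instance in a with block so the module-level fileinput state is left untouched.

```python
import fileinput


def bytefileinput(files):
    """Yield every byte of the given files as a length-1 bytes object."""
    # mode="rb" makes FileInput yield bytes "lines" (split on b"\n").
    with fileinput.FileInput(files=files, mode="rb") as lines:
        for line in lines:
            # Iterating a bytes object yields ints, so wrap each one
            # back into a single-byte bytes object.
            for b in line:
                yield bytes((b,))
```

Because the data is still read line by line, a file without any newline bytes is loaded as one single "line".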
msg214752 - (view)	Author: Tommy Carstensen (Tommy.Carstensen)	Date: 2014-03-24 22:32
I read the fileinput code and realized how heavily tied it is to line input. Will reading individual bytes as suggested not be very memory intensive, if each line is billions of characters?

    def bytefileinput():
        return (bytes((b,)) for line in fileinput.input() for b in line)

I posted my workaround on stackoverflow (see link earlier in thread), which does not make use of the fileinput module at all.

After having read through the fileinput code I agree that the module should only support reading lines and this enhancement request should be closed.
msg214758 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2014-03-24 23:18
On memory: Yeah, it could be if the file didn't include any newline characters. The same problem could apply if a text input file relied on word wrap in an editor and included very few or no newlines itself. There are non-fileinput ways of doing this, like I said; if you want consistent performance, you'd probably use one of them. For example, using the two-arg form of iter:

    from functools import partial

    def bytefileinput(files):
        for file in files:
            with open(filename, "rb") as f:
                yield from iter(partial(f.read, 1), b'')

Still kind of slow, but predictable on memory usage and not too complex.
msg214759 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2014-03-24 23:18
And of course, missed another typo. open's first arg should be file, not filename.
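
With that correction applied, the two-arg iter approach might be sketched as below. The second function is not from the thread: bytefileinput_chunked and its chunk_size parameter are hypothetical names, added only to illustrate that reading fixed-size blocks keeps memory bounded while issuing far fewer read calls than f.read(1).

```python
from functools import partial


def bytefileinput(files):
    """Yield every byte of each file, one read(1) call per byte."""
    for file in files:
        with open(file, "rb") as f:
            # iter(callable, sentinel) keeps calling f.read(1) until it
            # returns b"", which signals end of file.
            yield from iter(partial(f.read, 1), b"")


def bytefileinput_chunked(files, chunk_size=64 * 1024):
    """Hypothetical variant: read fixed-size blocks, then yield single bytes."""
    for file in files:
        with open(file, "rb") as f:
            # At most chunk_size bytes are held in memory at a time,
            # regardless of whether the file contains newlines.
            for chunk in iter(partial(f.read, chunk_size), b""):
                for b in chunk:
                    yield bytes((b,))
```

Both generators produce the same sequence of length-1 bytes objects; the chunked one simply trades a small buffer for fewer calls into the file object.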

History
Date	User	Action	Args
2022-04-11 14:58:00	admin	set	github: 65191
2014-03-24 23:18:59	josh.r	set	messages: + msg214759
2014-03-24 23:18:06	josh.r	set	messages: + msg214758
2014-03-24 22:32:04	Tommy.Carstensen	set	status: open -> closed resolution: rejected messages: + msg214752
2014-03-24 21:48:56	josh.r	set	messages: + msg214741
2014-03-24 21:44:39	josh.r	set	nosy: + josh.r messages: + msg214739
2014-03-24 15:06:03	berker.peksag	set	stage: needs patch versions: + Python 3.5, - Python 3.3, Python 3.4
2014-03-20 09:39:18	Tommy.Carstensen	create