This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Add getsize() to io instances
Type: enhancement
Components: Library (Lib)
Versions: Python 3.0

process
Status: closed
Resolution: rejected
Nosy List: christian.heimes, gvanrossum, loewis
Priority: low
Keywords: patch

Created on 2007-10-28 11:09 by christian.heimes, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
py3k_sizeinfo.patch (uploaded by christian.heimes, 2007-10-28 11:09)
Messages (8)
msg56877 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2007-10-28 11:09
I have always missed a getsize() method on file objects. The patch adds a
getsize() method to all io instances. The method returns a SizeInfo
object, which can print a human-readable size or the bare size in bytes.
The method uses os.fstat and falls back to the seek(0, 2), tell()
pattern.

>>> f = open("/etc/passwd")
>>> f.getsize()
<SizeInfo 1.7 KiB>
>>> int(f.getsize())
1721
>>> str(f.getsize())
'1.7 KiB'
>>> (f.getsize().sizeinfo())
(1.681, 1)

I'm going to provide unit tests and documentation if you like the feature.
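The patch itself is not reproduced on this page. A minimal sketch of the strategy described above (os.fstat first, then the seek-to-end/tell fallback) could look like this. It is written as a standalone function named getsize rather than an io method, and it returns a plain int instead of a SizeInfo object; both choices are assumptions for illustration, not the patch's code.

```python
import io
import os


def getsize(f):
    """Return the size of stream f in bytes (hypothetical helper, not the patch).

    Tries os.fstat() on the underlying file descriptor first and falls
    back to seeking to the end and reading the position with tell().
    """
    try:
        return os.fstat(f.fileno()).st_size
    except (AttributeError, OSError):
        # No usable descriptor: io.BytesIO and io.StringIO raise
        # io.UnsupportedOperation (a subclass of OSError) from fileno().
        pass
    pos = f.tell()                # remember where the caller was
    f.seek(0, io.SEEK_END)        # the seek(0, 2) part of the pattern
    size = f.tell()               # position at the end is the size
    f.seek(pos)                   # restore the original position
    return size
```

Unlike the SizeInfo object in the transcript, this returns the bare count, so formatting such as '1.7 KiB' is left to the caller.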
msg56887 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2007-10-28 17:44
I'm skeptical:

- If you add getsize, why not getlastchangeddate, getowner, getpermissions?

- in general, streams (which really is the interface for file-like
objects) don't have the notion of "size"; only some do.

- what is the purpose of the f.tell fragment? i.e. why could that work
when fstat doesn't?
msg56888 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2007-10-28 18:14
Martin v. Löwis wrote:
> I'm skeptical:
> 
> - If you add getsize, why not getlastchangeddate, getowner, getpermissions?

getowner() etc. work only with file-based streams and not with memory
buffers. getsize() works with every concrete class in io.py.

> - in general, streams (which really is the interface for file-like
> objects) don't have the notion of "size"; only some do.

I understand that getsize() doesn't make sense for e.g. a socket-based
stream. However, the implementation of getsize() works with memory
buffers and file descriptors.

> - what is the purpose of the f.tell fragment? i.e. why could that work
> when fstat doesn't?

The tell()/seek(0, 2) sequence is a generic fallback for io instances that
aren't based on a file descriptor. It's required for BytesIO and
StringIO. However, I could come up with an implementation for BytesIO
that queries the buffer directly.
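In-memory streams have no descriptor to hand to os.fstat, which is why the fallback is needed. The sketch below checks for that and shows one way to query a BytesIO buffer directly. The helper names are hypothetical, and BytesIO.getbuffer() only exists from Python 3.2 onward, so it was not available when this issue was filed.

```python
import io


def has_fileno(stream) -> bool:
    """True if the stream exposes an OS file descriptor usable with os.fstat()."""
    try:
        stream.fileno()
    except (AttributeError, OSError):
        # BytesIO and StringIO raise io.UnsupportedOperation, an OSError subclass.
        return False
    return True


def bytesio_size(buf: io.BytesIO) -> int:
    """Size of a BytesIO's contents, read straight from its buffer.

    getbuffer() returns a memoryview of the internal buffer without copying
    and without moving the stream position; the with-block releases the view
    so the BytesIO can be resized again afterwards.
    """
    with buf.getbuffer() as view:
        return view.nbytes
```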
msg56928 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2007-10-29 20:49
I'm -1 myself.  I've rarely needed this -- if I wanted to know the size,
I was almost always going to read the data into memory anyway, so why
not just read it and then ask how much you got?  For files on the
filesystem there's os.path.getsize().
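Both alternatives mentioned here need only standard calls. A short sketch with hypothetical helper names: read the data and take its length, or ask the filesystem for the size of a path without opening it.

```python
import os


def read_all(path):
    """Read a whole file in binary mode; the size is the length of what was read."""
    with open(path, "rb") as f:
        data = f.read()
    return data, len(data)


def size_on_disk(path):
    """Size in bytes without opening the file, via os.path.getsize()."""
    return os.path.getsize(path)
```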

If I ever were to let this in, here's some more criticism:

(a) the SizeInfo class is overkill.  getsize() should just return an int.

(b) getsize() should check self.seekable() first and raise the
appropriate error if the file isn't seekable.

(c) os.fstat() is much less likely to work than the tell-seek-tell-seek
sequence, so why not use that everywhere?

(d) people will expect to use this on text files, but of course the
outcome will be in bytes, hence useless.
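A sketch of what points (a) to (c) would imply, under stated assumptions: the name stream_size is hypothetical, and raising io.UnsupportedOperation is an assumed reading of "the appropriate error", chosen because io already raises it for unsupported operations. The test also illustrates point (d): on a UTF-8 text file the result counts bytes, not characters (this relies on CPython's TextIOWrapper, whose tell() at end of file equals the byte offset even though the documentation calls text-mode tell() values opaque), while io.StringIO counts characters.

```python
import io


def stream_size(f) -> int:
    """Size of a seekable stream as a plain int (hypothetical helper).

    Follows (a)-(c): returns an int, checks seekable() first, and uses
    tell/seek/tell/seek everywhere instead of os.fstat().
    """
    if not f.seekable():
        raise io.UnsupportedOperation("stream is not seekable")
    pos = f.tell()             # tell: remember the current position
    f.seek(0, io.SEEK_END)     # seek: jump to the end
    size = f.tell()            # tell: the end position is the size
    f.seek(pos)                # seek: put the position back
    return size
```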
msg57253 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2007-11-08 14:59
> (a) the SizeInfo class is overkill.  getsize() should just return an int.

But I like overkill :)

> (b) getsize() should check self.seekable() first and raise the
> appropriate error if the file isn't seekable.

That's easy to implement.

> (c) os.fstat() is much less likely to work than the tell-seek-tell-seek
> sequence, so why not use that everywhere?

fstat doesn't have concurrency problems in multithreaded apps. I
haven't profiled it, but I would guess that fstat is also faster than
tell/seek.
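The concurrency point can be made concrete. os.fstat reads metadata and leaves the shared file position alone, while the seek/tell approach moves the position temporarily and therefore needs coordination between threads. The lock below is a hypothetical convention that only helps if every thread using the file takes it. One trade-off worth knowing: fstat reports what the OS sees, so bytes still sitting in Python's write buffer are not counted until they are flushed. No performance claim is made here.

```python
import os
import threading

# Hypothetical convention: every thread holds this lock while it uses
# the shared file's position.
position_lock = threading.Lock()


def size_via_fstat(f) -> int:
    # Reads metadata only; the file position is never touched.
    return os.fstat(f.fileno()).st_size


def size_via_seek(f, lock=position_lock) -> int:
    # Moves the shared position; without the lock, another thread's read()
    # between these calls could start at end of file.
    with lock:
        pos = f.tell()
        size = f.seek(0, os.SEEK_END)
        f.seek(pos)
        return size
```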

> (d) people will expect to use this on text files, but of course the
> outcome will be in bytes, hence useless.

I could rename the method to getfssize, getbytesize, getsizeb ... to
make clear that it returns the number of bytes used rather than the
number of characters.
msg57289 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2007-11-09 00:28
Sorry, I still don't like it.  You'll have to come up with a darned good
use case to justify this.
msg57309 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2007-11-09 15:16
Does "it's convenient and I'm too lazy to address it in my code whenever
the problem arises?" count as a darn good use case?

No?

Mh, I thought so :)
msg57341 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2007-11-09 23:55
Ok, I'm rejecting it now based on the YAGNI argument Guido brought up,
and based on my own concerns.
History
Date                 User              Action  Args
2022-04-11 14:56:27  admin             set     github: 45692
2008-01-06 22:29:45  admin             set     keywords: - py3k
                                               versions: Python 3.0
2007-11-09 23:55:06  loewis            set     status: open -> closed
                                               resolution: rejected
                                               messages: + msg57341
2007-11-09 15:16:36  christian.heimes  set     messages: + msg57309
2007-11-09 00:28:55  gvanrossum        set     messages: + msg57289
2007-11-08 14:59:56  christian.heimes  set     priority: low
                                               keywords: + py3k, patch
                                               messages: + msg57253
2007-10-29 20:49:11  gvanrossum        set     nosy: + gvanrossum
                                               messages: + msg56928
2007-10-28 18:14:43  christian.heimes  set     messages: + msg56888
2007-10-28 17:44:25  loewis            set     nosy: + loewis
                                               messages: + msg56887
2007-10-28 11:09:43  christian.heimes  create