
Author maenpaa
Date 2007-04-27.11:57:44
> import urllib

> urlobj = urllib.urlopen("someurl")
> header = urlobj.read(1)
> # some other operations (no other urlobj.read())

> contents = header + urlobj.read()

This is effectively buffering the output, which is a perfectly acceptable solution... although I'd write it like this:

import urllib

class BufferedReader(object):
    def __init__(self, fileobj, buffsize = 8192):
        self.fileobj = fileobj
        self.buffer = ''  # bytes retained since the last mark()
        self.pos = 0      # replay position within the buffer

    def mark(self, maxbytes = 8192):
        # Retain bytes from here on so a later seek() can rewind to this point.
        self.buffer = self.buffer[self.pos:]
        self.pos = 0

    def seek(self):
        self.pos = 0  # rewind to the mark; reads replay the buffer first

    def read(self, size = -1):
        if size < 0:
            self.buffer += self.fileobj.read()
            data = self.buffer[self.pos:]
        else:
            need = size - (len(self.buffer) - self.pos)
            if need > 0:
                self.buffer += self.fileobj.read(need)
            data = self.buffer[self.pos:self.pos + size]
        self.pos += len(data)
        return data

br = BufferedReader(urllib.urlopen("someurl"))
br.mark()
header = br.read(1)

br.seek()
contents = br.read()

That way you store all the bytes that have been read, rather than hoping nobody else calls read() in between.


> On the contrary I'm pretty sure using a sequential access this can be done
> without doing these workarounds.

Right now sequential access is provided without keeping a copy in memory.  The issue arises when you want random access: urlobjs have no indication of whether you're going to call seek(), so to provide the method they must assume you will.  Therefore, whether seek() is actually called or not, a copy must be kept to preserve the *possibility* that it can be called.
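To make that cost concrete, here is a small illustrative sketch (my own, not from the original message) of what supporting seek() implies: the whole payload gets copied into a local container whether or not seek() is ever used.  io.BytesIO stands in for the remote response object.

```python
import io

# Stand-in for a urlobj: a forward-only stream of response bytes.
remote = io.BytesIO(b"HTTP response body")

# To offer seek() at all, the payload must be copied into a local container.
local_copy = io.BytesIO(remote.read())  # this copy is the price of seekability

header = local_copy.read(1)
local_copy.seek(0)            # random access works...
contents = local_copy.read()  # ...only because the full copy is kept
```

The copy exists from the moment the object is constructed, even for a caller that never rewinds.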

You can work around this by offering the degenerate seek() provided by BufferedReader, but that's functionality that belongs in its own class anyway.


> anyway I don't understand a thing: HTTP can't delegate the server to
> seek() the file?

For one thing, it's not supported by the standard.  For another, it would waste server resources and bandwidth, and on top of that it would be really slow... even slower than using StringIO.  HTTP resources are not simply files served up by httpd; they can also be dynamically generated content.  How is an HTTP server supposed to seek backward and forward in a page that is programmatically generated?  Go try and tell web developers that they need to keep a copy of every page requested indefinitely, in case you send a SEEK request.

HTTP resources are not local.  To treat them as local you must make them local by putting them in a container, such as StringIO, a buffer, or a local file.  It's that simple.  Trying to abstract this fact away would result in major performance issues, unreliability, or both.
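As a sketch of the "local file" flavor of that container idea (the names and the use of a temporary file are my own illustration, not from the original message), spilling the remote stream into a local file makes it fully seekable:

```python
import io
import tempfile

remote = io.BytesIO(b"response payload")  # stand-in for a non-seekable urlobj

# Make the remote data local: write it to a temporary file, then seek freely.
local = tempfile.TemporaryFile()
local.write(remote.read())
local.seek(0)
first = local.read(8)   # random access: read the front...
local.seek(0)           # ...rewind...
everything = local.read()  # ...and read it all again
```

A file-backed container trades the memory cost of StringIO for disk I/O, which may be preferable for large responses.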