Message 31547 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	maenpaa
Date	2007-03-21.01:39:53

Content

> I use the method you wrote, but this must be done manually,
> and I don't know why.
read() is a stream-processing method, whereas seek() is a random-access method.  HTTP resources are in essence streams, so they implement read() but not seek().  Trying to shoehorn a stream into acting like a random-access file has some rather important technical implications.  For example: what happens when an HTTP resource is larger than available memory and we try to maintain a full-featured seek() implementation?
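To make the memory question concrete, here is a minimal Python 3 sketch under stated assumptions. `ForwardOnlyStream` is a hypothetical stand-in for an HTTP response body (read() only, no seek()), and `CachingSeeker` is a hypothetical adapter that emulates seek() by keeping every byte it has pulled from the stream. Jumping near the end of the resource forces everything before that point into memory.

```python
class ForwardOnlyStream:
    """Hypothetical stand-in for an HTTP response body: read() only, no seek()."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read(self, size=-1):
        # Hand out the next bytes; once consumed they cannot be revisited.
        if size is None or size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


class CachingSeeker:
    """Hypothetical adapter that emulates seek() by keeping every byte read so far."""

    def __init__(self, stream):
        self._stream = stream
        self._cache = bytearray()  # grows with the furthest position ever read
        self._pos = 0

    def _fill_to(self, offset):
        # Pull from the underlying stream until `offset` bytes are cached
        # or the stream is exhausted.
        while len(self._cache) < offset:
            chunk = self._stream.read(offset - len(self._cache))
            if not chunk:
                break
            self._cache += chunk

    def seek(self, offset):
        # Only absolute (whence=0) seeks are sketched here.
        self._pos = offset
        return self._pos

    def tell(self):
        return self._pos

    def read(self, size):
        # Expects a non-negative size; data is always served from the cache.
        self._fill_to(self._pos + size)
        chunk = bytes(self._cache[self._pos:self._pos + size])
        self._pos += len(chunk)
        return chunk

    def cached_bytes(self):
        return len(self._cache)


if __name__ == "__main__":
    s = CachingSeeker(ForwardOnlyStream(b"x" * 10_000 + b"END"))
    s.seek(10_000)           # jump near the end...
    print(s.read(3))         # b'END'
    print(s.cached_bytes())  # 10003: all the skipped bytes are now in memory
    s.seek(0)
    print(s.read(1))         # b'x', served from the cache
```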

> so what is urlopen() for?
Fetching a webpage or RSS feed and feeding it to a parser, for example.
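As a minimal Python 3 sketch of that use case: `xml.etree.ElementTree.parse()` accepts any object with a read() method, so a forward-only body is enough. To keep the example offline, `body` is a hypothetical stand-in exposing only read(), and the feed contents are made up; with a live connection you could pass the response object from `urllib.request.urlopen()` instead.

```python
import io
import types
import xml.etree.ElementTree as ET

# Made-up RSS document used only for illustration.
FEED = b"""<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item><title>First post</title></item>
  <item><title>Second post</title></item>
</channel></rss>"""

# Hypothetical stand-in for an HTTP response: it has read() but no seek().
body = types.SimpleNamespace(read=io.BytesIO(FEED).read)

# The parser pulls data through read() only, so no random access is needed.
tree = ET.parse(body)
titles = [title.text for title in tree.getroot().iter("title")]
print(titles)  # ['First post', 'Second post']
```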

StringIO is a class that was designed to provide feature-complete, random-access, file-like behavior, and it can be filled with the data read from a stream.  StringIO can and should be used as an adapter whenever you have a stream that you need random access to.  This leaves designers free to implement just a good read() and lets clients wrap the output in a StringIO if needed.

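If your application always needs random access and you don't have to deal with large files:
import StringIO, urllib2  # Python 2 modules
def my_urlopen(*args, **kwargs):
    # Buffer the whole response in memory so the result supports seek()/tell().
    return StringIO.StringIO(urllib2.urlopen(*args, **kwargs).read())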
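The snippet above targets Python 2. On Python 3, where the StringIO and urllib2 modules no longer exist, a rough equivalent might look like the sketch below: it uses io.BytesIO (the body is bytes) and urllib.request.urlopen, and closes the connection once the body is buffered. The `opener` keyword is a hypothetical hook added only so the function can be exercised without network access.

```python
import io
import urllib.request


def my_urlopen(*args, opener=urllib.request.urlopen, **kwargs):
    """Fetch a URL and return a seekable in-memory copy of its body.

    Only suitable when the whole response comfortably fits in memory.
    `opener` is a hypothetical testing hook; it defaults to urlopen.
    """
    with opener(*args, **kwargs) as response:
        return io.BytesIO(response.read())


if __name__ == "__main__":
    # Offline demo: a fake opener standing in for urlopen().
    fake = lambda url: io.BytesIO(b"<html>hello</html>")
    f = my_urlopen("http://example.invalid/", opener=fake)
    f.seek(6)
    print(f.read(5))  # b'hello'
```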

Python makes delegation trivially easy.

In essence, urlfiles (the result of urllib2.urlopen()) and regular files (the result of open()) behave differently because they implement different interfaces.  If you use the common interface (read), you can treat them equally.  If you use the specialized interface (seek, tell, etc.), you'll have trouble.  The solution is to wrap the general object in a specialized object that implements the desired interface: StringIO.
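That advice can be sketched in Python 3 with two hypothetical helpers. `count_bytes` relies only on the common read() interface, so a regular file and a read()-only stream work alike; `ensure_seekable` copies the data into an io.BytesIO only when the object does not report itself as seekable. Probing `seekable()` via getattr is an assumption: io-based files provide it, but arbitrary duck-typed streams may not.

```python
import io
import tempfile
import types


def count_bytes(f, chunk_size=8192):
    """Consume f through read() alone, so any file-like object works."""
    total = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)


def ensure_seekable(f):
    """Return f unchanged if it reports seekable(); otherwise an in-memory copy."""
    seekable = getattr(f, "seekable", None)
    if seekable is not None and seekable():
        return f
    # Buffer the rest of the stream; only sensible when it fits in memory.
    return io.BytesIO(f.read())


if __name__ == "__main__":
    data = b"0123456789"
    with tempfile.TemporaryFile() as regular:  # a real, seekable file
        regular.write(data)
        regular.seek(0)
        print(count_bytes(regular))                 # 10
        print(ensure_seekable(regular) is regular)  # True
    # Hypothetical read()-only stream standing in for a urlfile.
    print(count_bytes(types.SimpleNamespace(read=io.BytesIO(data).read)))  # 10
    wrapped = ensure_seekable(types.SimpleNamespace(read=io.BytesIO(data).read))
    wrapped.seek(5)
    print(wrapped.read())                           # b'56789'
```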

History
Date	User	Action	Args
2007-08-23 14:52:32	admin	link	issue1682241 messages
2007-08-23 14:52:32	admin	create