
Author maenpaa
Date 2007-04-27.03:36:58
> In my opinion it's not complicated, it's convoluted. I must use two
> object to handle one data stream.

seek() is not a stream operation; it is a random-access operation (file-like != stream). If you were using only stream operations, you wouldn't be running into these problems.

Each class provides a separate piece of functionality: urllib gets the file, while StringIO stores it.  The fact that these responsibilities are given to different classes should not be surprising, since they represent separately useful concepts that abstract different things.  It's not convoluted, it's good design.  If every class tried to do everything, pretty soon you'd be adding solve_my_business_problem_using_SOA() to __builtins__, and nobody wants that.
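To make the division of labor concrete, here is a minimal sketch of the composition being discussed: one object fetches, the other stores and gives you random access.  (This uses Python 3's io.BytesIO, the modern counterpart of StringIO; make_seekable is a hypothetical helper name, and the stand-in stream takes the place of what urlopen would return.)

```python
import io

def make_seekable(resp):
    """Copy a read()-able, non-seekable stream (e.g. what urlopen
    returns) into an in-memory buffer that supports seek()/tell()."""
    # One object fetched the bytes; this one stores them.
    return io.BytesIO(resp.read())

# Stand-in for a network response: any object with read() works.
resp = io.BufferedReader(io.BytesIO(b"hello world"))
f = make_seekable(resp)
f.seek(6)          # random access now works...
print(f.read())    # ...because the whole payload is buffered locally
```

Note that the cost of this design is exactly the copy the next paragraph complains about: the buffer holds the entire payload.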


> Furthermore it's a waste of resources. I must copy data to another
> object. Luckily in my script I download and handle only little files. But what if
> a python program must handle big files?

This is exactly why urllib *doesn't* provide seek. Deep down in the networking library there's a socket with an 8 KiB buffer talking to the HTTP server. No matter how big the file you're getting with urllib, once that buffer fills up, TCP flow control makes the server stop sending until you read some of it.

To provide seek(), urllib would need to keep an entire copy of the file that was retrieved (or provide mark()/seek(), but those have wildly different semantics from the seek()s we're used to in Python, and besides, they're too Java).  Keeping a full copy works fine if you're only working with small files, but you raise a good point: "But what if a python program must handle big files?"  What about really big files (say, a Knoppix DVD ISO)?  Sure, you could use urlretrieve, but what if urlretrieve is implemented in terms of urlopen?

Sure, urllib could implement seek (with the same semantics as file.seek()), but that would mean buffering the entire download, breaking urllib for any resource big enough that you don't want the whole thing in memory.


>> You can check the type of the response content before you try
>> to uncompress it via the Content-Encoding header of the
>> response

>It's not a generic solution

The point of this suggestion is not that it is the be-all and end-all solution, but that code that *needs* seek can probably be rewritten so that it does not.  Either that, or you could implement a BufferedReader with the methods mark() and seek() and wrap the result of urlopen.
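The header-check idea above can be sketched in a few lines: inspect Content-Encoding first, and only then decide whether to decompress, with no seek() anywhere.  (maybe_decompress is a hypothetical name; headers stands in for the response's header mapping, and the gzip module's streaming GzipFile does the decompression.)

```python
import gzip
import io

def maybe_decompress(headers, body_stream):
    """Wrap the body in a streaming decompressor only when the server
    declared the payload gzipped; otherwise pass it through untouched."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.GzipFile(fileobj=body_stream)  # streams; no seek needed
    return body_stream

payload = gzip.compress(b"payload")       # pretend server response body
headers = {"Content-Encoding": "gzip"}    # pretend response headers
out = maybe_decompress(headers, io.BytesIO(payload))
print(out.read())  # b'payload'
```

The check happens before any bytes are consumed, which is why no rewinding is required.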
History
Date                 User   Action  Args
2007-08-23 14:52:32  admin  link    issue1682241 messages
2007-08-23 14:52:32  admin  create