
Author hunteke
Recipients amaury.forgeotdarc, georg.brandl, hunteke, pitrou
Date 2010-09-25.14:26:03
Message-id <1285424764.69.0.831923069805.issue9942@psf.upfronthosting.co.za>
Content
> Well, first, this would only work for large objects. [...]
> Why do you think you might have such duplication in your workload?

Some of the projects I work with involve multiple manipulations of large datasets.  Often, we use Python scripts as the "first and third" stages in a pipeline.  For example, in one current workflow, we read a large file into a cStringIO object, do a few manipulations with it, pass it off to a second process, and await the results.  Meanwhile, the large file sits in memory because we need to do more manipulations after we get results back from the second application in the pipeline.  "Graphically":

Python Script A    ->    External App    ->    Python Script A
read large data          process data          more manipulations
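A minimal sketch of this pattern, using Python 3's io.BytesIO in place of the cStringIO object mentioned above; the external application is hypothetical and is simulated here with "cat" (so this assumes a Unix-like system):

```python
import io
import subprocess

# Stage 1: read the "large" data into an in-memory buffer
# (io.BytesIO is the Python 3 analogue of cStringIO).
buf = io.BytesIO(b"large data set\n" * 3)  # stand-in for a real file read

# A first manipulation on the in-memory copy.
payload = buf.getvalue().upper()

# Stage 2: hand the data to the external app and wait for results
# ("cat" stands in for the real second-stage application).
result = subprocess.run(["cat"], input=payload,
                        capture_output=True, check=True)

# Stage 3: the original buffer is still held in memory, because the
# third stage needs it after the external app returns.
combined = buf.getvalue() + result.stdout
print(len(combined))
```

The point is that `buf` stays resident for the entire lifetime of the pipeline, even while the external app holds its own copy of the same bytes.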

Within a single process, I don't see any gain to be had.  However, in this use case, several copies of this pipeline run concurrently, each with slightly different command-line parameters.