This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author larry
Recipients Dennis Sweeney, eric.smith, larry, pablogsal, tim.peters
Date 2022-03-30.06:29:14
Message-id <1648621754.87.0.937860195959.issue47145@roundup.psfhosted.org>
In-reply-to
Content
> Assuming we do want to be able to add() after a get_ready(), is there
> a reason that "forgetting" already-produced nodes is the correct
> behavior, as opposed to remembering all nodes ever added, and
> raising iff the addition creates a cycle among all nodes ever
> added or depends on an already-yielded node?

I'm not sure "correct" applies here, because I don't have a sense that one behavior is conceptually more correct than the other.  But in implementing my API change, forgetting about the done nodes seemed more practical.

The "benefit" to remembering done nodes: the library can ignore dependencies on them in the future, forever.  Adding a dependency on a node that's already been marked as "done" doesn't make much conceptual sense to me, but as a practical thing maybe it's useful?  I'm not sure, but it doesn't seem all that useful.

I can only come up with a marginal reason why remembering done nodes is useful.  Let's say all your tasks fan out from some fundamental task, like "download master list of work" or something.  One of your tasks might discover additional tasks that need to run, and conceptually those tasks might depend on your "download master list of work" task.  If the graph remembers the done list forever, then adding that dependency is harmless.  If the graph forgets about done nodes, then adding that dependency could re-introduce that task to the graph, which could goof things up.  So maybe it's a little bit of a footgun?  But on the other hand: you already know you're running, and you're a task that was dependent on the master list of work, which means you implicitly know that dependency has been met.  So just skip adding the redundant dependency and you're fine.
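To make the scenario above concrete, here's a rough sketch of the two policies.  ToyGraph is a hypothetical, stripped-down scheduler written only for illustration; it is not graphlib.TopologicalSorter and not the proposed patch, and the remember_done flag and the task names are made up.  Unlike the real get_ready(), this one just lists every unfinished node that has no unmet dependencies.

```python
class ToyGraph:
    """Hypothetical toy scheduler; NOT graphlib.TopologicalSorter."""

    def __init__(self, remember_done):
        self.remember_done = remember_done  # illustrative policy switch
        self.pending = {}        # node -> set of unmet predecessors
        self._finished = set()   # only populated when remember_done is True

    def add(self, node, *predecessors):
        unmet = self.pending.setdefault(node, set())
        for pred in predecessors:
            if pred in self._finished:
                continue  # remembered as done: the dependency is already met
            unmet.add(pred)
            # A predecessor the graph doesn't know about becomes new work.
            self.pending.setdefault(pred, set())

    def get_ready(self):
        # Every unfinished node with no unmet predecessors, sorted for stable output.
        return sorted(n for n, unmet in self.pending.items() if not unmet)

    def done(self, node):
        del self.pending[node]
        for unmet in self.pending.values():
            unmet.discard(node)
        if self.remember_done:
            self._finished.add(node)


def run(remember_done):
    g = ToyGraph(remember_done)
    g.add("process", "download")
    assert g.get_ready() == ["download"]
    g.done("download")
    # "process" is now running and discovers extra work that
    # conceptually also depends on "download".
    g.add("extra", "download")
    return g.get_ready()


print(run(remember_done=True))   # ['extra', 'process']
print(run(remember_done=False))  # ['download', 'process']
```

With remembering, the redundant dependency is ignored and "extra" is ready right away.  With forgetting, "download" comes back as a brand-new node and "extra" waits behind it, which is the footgun.  Leaving the redundant dependency off the add() call avoids it under either policy.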

On the other hand, forgetting about the nodes has a definite practical benefit: the graph consumes less memory.  If you use a graph object for a long time, the list of done nodes it's holding references to would continue to grow and grow and grow.  If we forget about done nodes, we free up all that memory, and testing whether a node is done maybe gets faster.

I guess I'm not married to the behavior.  If someone had a great conceptual or practical reason why remembering the done nodes forever was better, I'd be willing to listen to reason.
History
Date                 User   Action  Args
2022-03-30 06:29:14  larry  set     recipients: + larry, tim.peters, eric.smith, pablogsal, Dennis Sweeney
2022-03-30 06:29:14  larry  set     messageid: <1648621754.87.0.937860195959.issue47145@roundup.psfhosted.org>
2022-03-30 06:29:14  larry  link    issue47145 messages
2022-03-30 06:29:14  larry  create