This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Graphlib documentation (edge direction)
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.11
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Dennis Sweeney, eric.smith, python-dev, rhettinger, tim.peters
Priority: normal Keywords: patch

Created on 2021-12-14 15:20 by dam1784, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 30102 closed python-dev, 2021-12-14 15:22
PR 30223 closed dam1784, 2021-12-21 15:00
Messages (17)
msg408535 - (view) Author: David Mc Dougall (dam1784) * Date: 2021-12-14 15:20
The documentation for graphlib encourages users to represent their graphs in a awkward format.

Graphs are currently represented using dictionaries of nodes, for example:
graph["end_node"] = ["start_node"]

And this is unintuitive because you can't use the graph traverse edges in their forward direction.
If you look up a node in the graph, you get all of the nodes that point to the key,
as opposed to all of the nodes that the key points to.

The solution is to rewrite the documentation such that all of the edge directions are reversed.
The final topologically sorted list will be in the "opposite" direction as a consequence.

This will cause no functional changes.
msg411047 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-20 21:57
For the purpose of topological sorting, yes, it's by far most natural to give, for each node, the collection of that node's predecessors. And that's the way topsort applications typically collect their data to begin with, like, "here, for each recipe, is a list of ingredients needed first" or "before we can begin this job, here are the other jobs that need to complete first". or, as in makefiles, "here's a list of the targets that need to be built before we can start building the current target".

I definitely don't want to change the wording, because it's bog standard as is. For example, Wikipedia's "topological sorting" entry is typical:

"""
In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. 
"""

It's what topsort _means_. We did not intend to implement a "reverse topsort" algorithm, and I see no attraction to doing so, neither in pretending we did so.

If the way the user collects their data stores only successor links (which, as above, seems odd in applications that actually use topsorts), then they need something like this instead:

ts = TopologicalSorter()
for u, successors in my_forward_graph.items():
    for v in successors:
        ts.add(v, u)

But that's something I never needed.
msg411060 - (view) Author: David Mc Dougall (dam1784) * Date: 2022-01-20 23:42
The "reverse-toposort" is actually quite a good idea. The end-user is usually going to want to iterate over the sorted output in the "reverse" order anyways, especially if they're doing task ordering / dependency resolution.

Also, the underlying algorithm produces the "reverse" ordering by default. In my experience from writing and using my own topological sorting programs using the "correct" definition: the toposorter reverses the list, and then the users iterates over it in reverse order.
msg411062 - (view) Author: David Mc Dougall (dam1784) * Date: 2022-01-21 00:26
> If the way the user collects their data stores only successor links (which, as above, seems odd in applications that actually use topsorts), then they need something like this instead:

Actually they only need to do this:

ts = TopologicalSorter(my_forward_graph).static_order()
ts = reversed(ts)


I think part of the issue here is that the document uses two terms to describe the same thing: "predecessor" and "edge-direction". Everywhere it discusses predecessors it gets it right, but the edge direction is hopelessly confused because they tried to use the "normal" definition of topo-sort and the only way to make that work is to also reverse the direction of the graph's edges to compensate (and the two reversals cancel each other out).

The edge direction is only mentioned twice in the whole document, once to define topo-sort and again to define the graph format.

If the users problem fits into the task dependency paradigm, then this library makes a lot of sense. But for users who just have a directed graph, the documentation is really really confusing. 

I think it would be much better if the document explained that this was a "reversed" topological sort with a slightly different definition, and used a "bog standard" graph format like for example in this tutorial: https://www.python.org/doc/essays/graphs/
msg411068 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-21 01:37
I'm going to leave this to Pablo - adding the `graph` argument was his idea ;-)

It would, I think, have been better if this argument had been named, say, "preds" instead of "graph".

The docs, to my eyes, are entirely clear about that `graph` is a representation based on mapping a node to its predecessors:

"""
If the optional graph argument is provided it must be a dictionary representing a directed acyclic graph where the keys are nodes and the values are iterables of all predecessors of that node in the graph (the nodes that have edges that point to the value in the key). 
"""

but it could perhaps be usefully emphasized that "the usual" graph representation in Python maps a node to its successors instead.

The stuff about "direction" continues to make no sense to me, though. The class computes the (forward!) topsort of the graph passed to it, given that the graph is presented as mapping nodes to predecessors. It's only "reversed" if you refuse to believe the docs, and insist that `graph` is mapping a node to its successors. But it's not.

>>> import graphlib
>>> ts = graphlib.TopologicalSorter({'A' : ['B']})
>>> list(ts.static_order())
['B', 'A']

Nothing is "reversed". The argument passed is the predecessor form of the graph B -> A, and [B, A] is the forward topsort of that graph.
msg411074 - (view) Author: David Mc Dougall (dam1784) * Date: 2022-01-21 02:31
> The argument passed is the predecessor form of the graph B -> A

where graph = {'A' : ['B']}

This is part that I'm objecting to. The form of the graph should be A -> B, not B -> A.

The issue with the current form is that you can not traverse the graph, at least not forwards. When I say traverse forwards I mean that you follow the edges in the direction of the arrows. If you look up 'A' in the current graph you get  all of the nodes that point  *to* A, but that doesn't help you get *from* A to anywhere else.

There are two conventions:
1) Graphs should be traverse-able, by following the arrows.
2) Topological sorting makes the arrows point to the right.

Convention #1 was broken to satisfy convention #2.

What is important about the topo-sort is that it makes all of the edges point in the *same* direction. It doesn't actually matter which direction that is. And credit where due, the library picked the more-useful direction. It was a pragmatic choice, driven by the real use case of dependency resolution.

But having the graph with arrows pointing in the wrong direction is going to cause endless confusion.

For an example of the confusion this causes, look at the API itself. The "add" method explicitly calls out the fact that you can add nodes with no predecessors. This is obviously also possible with the graph argument as it currently is.

> It is possible to add a node with no dependencies (predecessors is not provided)
   https://docs.python.org/3/library/graphlib.html#graphlib.TopologicalSorter.add
msg411078 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-21 04:04
I think you should give up on this. But suit yourself ;-)

Exactly the same information is conveyed whether representing a graph by successor or predecessor dicts. Some operations are more convenient in one representation than the other, but each is easily derived from the other.

For your

> The issue with the current form is that you can not traverse
> the graph, at least not forwards.

is the equal and opposite complaint that with "your" (successor) form, you cannot traverse the graph, at least not backwards.

What of it? _Some_ packages maintain both predecessor and successor dicts in lockstep. For example, Kosaraju's elegant algorithm for finding strongly connected components requires efficient traversal in both directions.

The topsort algorithm couldn't care less how you want to represent your graphs. But it does have a specific representation it requires if you want to use the optional `graph=` constructor argument. It's documented and works fine if used as documented.

> What is important about the topo-sort is that it makes all of the
> edges point in the *same* direction.

Not following that. topsort applies to directed graphs, where the meanings of "predecessor" and "successor" are universally agreed upon. By definition, a topsort must begin with an element having no predecessors. The directions of the arrows are crucial to that.

> It doesn't actually matter which direction that is. And credit where due, the
> library picked the more-useful direction. It was a pragmatic choice, driven
> by the real use case of dependency resolution.

Not really. It was driven by the definition of what "topological sort" means. Most of the API was actually driven by making it easy to use correctly (race-free, deadlock-free) in a dynamic (there's no way to guess from one run to the next which order will end up being used) multiprocessing context, something few topsort packages even attempt to address.

> But having the graph with arrows pointing in the wrong direction is
> going to cause endless confusion.

Starting when? ;-) Seriously, this is the first complaint I've seen, and you're not actually confused.

> For an example of the confusion this causes, look at the API itself. The "add"
> method explicitly calls out the fact that you can add nodes with no predecessors.

And what about that is confusing? If every node has to have a predecessor, a topsort can't even exist!

More generally, any usable implementation of "directed graph" has to allow for isolated nodes too. Historically, the docs pointed that out to contrast it with other APIs under consideration that required special tricks to convince the function that a graph had isolated nodes. In this API, it's dead easy: just add the node with no predecessors.

> This is obviously also possible with the graph argument as it currently is.

Of course. Are you going to seriously contend this is "a problem", but the obvious workalike for successor dicts would not be "a problem"? (In a successor dict, it's equally easy to specify a node with no successors. Good! For a total ordering to exist, there must be at least one node with no predecessor and at least one with no successor - these aren't "problems", they're essentials.)

In any case, the docs say what was intended and implemented from the start: the optional `graph` argument is a predecessor dict representation. You're not required to like that decision, but there's no case here - to my eyes - made for saying "it's a bug".
msg411085 - (view) Author: David Mc Dougall (dam1784) * Date: 2022-01-21 05:30
> you're not actually confused.

I was when I first read it!


> the meanings of "predecessor" and "successor" are universally agreed upon

I disagree. The universally agreed upon terms are "directed edge u -> v". It's not obvious if the "predecessor" should be the start or the end point of the edge, and this is why the docs explicitly state the edge direction:

> If the optional graph argument is provided it must be a dictionary representing a directed acyclic graph where the keys are nodes and the values are iterables of all [...] the nodes that have edges that point to the value in the key.
msg411086 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2022-01-21 05:35
> It's not obvious if the "predecessor" should be the start or the end point of the edge

https://en.wikipedia.org/wiki/Directed_graph#Basic_terminology :
"""If a path leads from x to y, then y is said to be a successor of x and reachable from x, and x is said to be a predecessor of y."""
msg411087 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-21 05:46
>> the meanings of "predecessor" and "successor" are
>> universally agreed upon

> I disagree.

I can post literally hundreds of citations that all agree: in u -> v, u is a direct predecessor of v, and v is a direct successor of u.

Here's one:

http://pages.cs.wisc.edu/~vernon/cs367/notes/13.GRAPH.html

Here's another from a Python context:

https://networkx.org/documentation/stable/reference/classes/generated/networkx.DiGraph.predecessors.html

"""
A predecessor of n is a node m such that there exists a directed edge from m to n.
"""

On & on & on. Hence my "universal".

Can you link to any that disagree?

As to the meaning of "point to", in "u -> v" it's obvious that the arrow points _from_ u _to_ v. I very strongly doubt you can find a credible source disputing that either.
msg411119 - (view) Author: David Mc Dougall (dam1784) * Date: 2022-01-21 12:32
I can post literally hundreds of examples of directed graphs that are traversable in the forward direction. This might be the only one which is *only* traversable backwards.


> As to the meaning of "point to"

Here is one: If I have a pointer in memory, I might represent that with an arrow,
Like: ADDRESS -> VALUE

Or if I have a dictionary I might write:
KEY -> VALUE

But in the graph the edges are directed the opposite way:
KEY <- VALUE

The edges in the graph point in the opposite direction as the underlying memory pointers. This is unexpected and confusing.
msg411161 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-21 17:36
Now you may be getting somewhere ;-) Complaining about the docs wasn't getting traction because they're using standard terminology with the standard meanings, and tell the plain truth about what the class requires and delivers.

You wish it required something else instead. Fine: be direct about that! Suggest a feature enhancement rather than contorting the docs to pretend the class "really" requires what you _wish_ it required and is computing "the reverse" of what it claims to be computing. That won't be accepted  because it's not true.

How about, e.g., suggesting instead a new optional constructor argument, like `successor_graph=False`, which can be passed as `True` to say that the `graph` argument is a successor dict rather than the default predecessor dict?

I wouldn't oppose that, and it would be easy to implement (indeed, way back when here I already posted code to do it).
msg411179 - (view) Author: David Mc Dougall (dam1784) * Date: 2022-01-21 20:06
No, the code works fine. I just wish the docs weren't so muddled.

I wish the docs started by saying:

> The graph is a dict of {'start_node': ['end_nodes',]}
> The topological sorter puts the end_nodes before their start_nodes.
   [note: this is what the code currently does]

Then the docs could introduce and use different terminology: "tasks" and their "dependencies".
Everyone understands what those are.


And nowhere does it need to say the word "predecessors".


But honestly, I think I'm going to throw in the towel and give up on this endeavor.
msg411183 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2022-01-21 20:45
I think I can summarize:

Tim + Current Docs:

* static_order to return [A, B]
* Conceptual/abstract directed graph direction is A --> B
* A is a predecessor of B
* predecessor mapping to be passed is {B: [A]}

David suggests:

* static_order returns [A, B]
* Conceptual/abstract directed graph direction is B --> A
* mapping to be passed is {B: [A]}

It seems David places more value on the idea of the concrete mapping "pointing forwards" with respect to the abstract directed graph, while it seems Tim places more value on the idea of the abstract mapping direction corresponding to the final static order. These two goals are in conflict, assuming we don't want to change the behavior.

I think I agree more with Tim, since abstract graphs can be represented in lots of ways (set of pairs? adjacency matrix or its transpose? adjacency list/dict? Bi-directional mapping?), so the translation from the abstract to the concrete is a decent place for the "reversal".

It's very very standard for a topological sort to be conceptualized as taking a picture like
    A → B
    ↓   ↓
    C → D
and transforming it into a list/iterator like [A, B, C, D], and I think the conceptual consistency there should be preserved, even if the concrete representation of the abstract graph "points backwards".
msg411200 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-21 22:15
Perhaps you've overlooking something so obvious to the module authors that I haven't thought to mention it?

The purpose here is to compute a linear order. Now not even you ;-) can pretend to be confused about what "predecessor" and "successor" mean in a linear order. There's no graph involved in the final result (unless you mentally impose a trivial linear graph on the result sequence).

And that's how to view the `graph` argument _as intended_. It maps an element to the elements that must precede it _in the result_. The element's mandatory predecessors in any acceptable imposed total ordering.

From that intended view, this is just waaaay off base:

> I wish the docs started by saying:
>
> The graph is a dict of {'start_node': ['end_nodes',]}
> The topological sorter puts the end_nodes before their start_nodes.
>   [note: this is what the code currently does]

The words "start" and "end" there would be clearer as "banana" and "beeswax" ;-) End of what? Start of what? The "start" node has to appear _after_ all the "end" nodes? In what other context does the "start" appear after the "end"?

As is, the code puts the "start" node after all its declared mandatory predecessors. Exactly as the word "predecessors" suggests should happen.


[Dennis]
> David suggests:
>
> * static_order returns [A, B]
> * Conceptual/abstract directed graph direction is B --> A
> * mapping to be passed is {B: [A]}

IF that's what he's suggesting (I don't know), that's not a matter of preference, it's just plain wrong. The only possible topsort of the DAG  B --> A is the sequence [B, A]. For which see absolutely any text defining the terms.

There are many ways the "mapping to be passed" could be defined. David is certainly right that, by convention, most graphs in Python are represented by successor dicts. But he apparently doesn't want an option for the topsorter to accept such a dict either. (An irony in passing: under the covers, the class converts the predecessor-dict `graph` argument to a successor-dict representation.)

At this point, I not only don't know what he wants, his position gets less clear to me over time.
msg411215 - (view) Author: David Mc Dougall (dam1784) * Date: 2022-01-22 00:38
> It seems David places more value on the idea of the concrete mapping "pointing forwards" with respect to the abstract directed graph, while it seems Tim places more value on the idea of the abstract mapping direction corresponding to the final static order. These two goals are in conflict, assuming we don't want to change the behavior.

Yes, that's right. But the good news is that if you're willing to rewrite all of the documentation you probably can explain this in a clear and simple way, and without breaking compatibility.

I say *you* because I'm not going to argue with you all about this anymore, especially with Tim being...

---

Tim: you're conflating the words "predecessors" and "dependency".
In some contexts they can be synonymous, but they are not the same thing.
 * Predecessor refers to one of the sides of a directed edge.
 * Dependency refers to a semantic relationship between two of the users' things.



> The only possible topsort [...] For which see absolutely any text defining the terms.

From wiki: "Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited."

This definition doesn't say anything about the "predecessors" or how the graph is stored,
or anything about "edge direction". I like this definition.



> that's not a matter of preference, it's just plain wrong

I know that there are many different ways to represent a graph, but your graph format *is just plain wrong.*

Goodbye
msg411216 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-22 01:16
> I know that there are many different ways to represent
> a graph, but your graph format *is just plain wrong.*

Yet when I offered to support an option to support the graph format you insist is uniquely "right", you poo-poo'ed the idea. So what is your real complaint? I don't know. Here you once again complain about the graph format, after rejecting an idea to address that directly.

Sorry, but you don't make sense to me. 

> if you're willing to rewrite all of the documentation you
> probably can explain this in a clear and simple way, and
> without breaking compatibility.

I don't see that happening. The docs are quite clear to me already, but I recognize that I'm too close to them to be objective about that. Instead I'm going on ample evidence from mailing lists, blogs, and StackOverflow that yours is the first complaint I've seen that the basic functionality here is any way unclear or ill-documented. People pick it up and use it productively at once. Nobody else has appeared to be confused.

Where things get unclear is limited to novel applications of multiprocessing. But people always stumble over multiprocessing, even for something as simple and bulletproof as a multiprocessing.Queue. Comes with the territory.

Finally, the distinction between "predecessors" and "dependencies" here strikes me as being uselessly pedantic. Nobody on the planet has an actual problem with recognizing that, e.g., "a" is predecessor of "n" in the string "pedantic". That's so in both technical language and plain English. Same here.

Still, if you want to rewrite the docs, go for out. I'll stay out of it completely.
History
Date User Action Args
2022-04-11 14:59:53adminsetgithub: 90229
2022-01-22 04:18:07dam1784setstatus: open -> closed
stage: patch review -> resolved
2022-01-22 01:16:15tim.peterssetmessages: + msg411216
2022-01-22 00:38:11dam1784setnosy: - dam1784
2022-01-22 00:38:03dam1784setmessages: + msg411215
2022-01-21 22:15:53tim.peterssetmessages: + msg411200
2022-01-21 20:45:33Dennis Sweeneysetmessages: + msg411183
2022-01-21 20:06:54dam1784setmessages: + msg411179
2022-01-21 17:36:19tim.peterssetmessages: + msg411161
2022-01-21 12:32:01dam1784setmessages: + msg411119
2022-01-21 05:46:25tim.peterssetmessages: + msg411087
2022-01-21 05:35:01Dennis Sweeneysetnosy: + Dennis Sweeney
messages: + msg411086
2022-01-21 05:30:24dam1784setmessages: + msg411085
2022-01-21 04:04:20tim.peterssetmessages: + msg411078
2022-01-21 02:31:16dam1784setmessages: + msg411074
2022-01-21 01:37:32tim.peterssetmessages: + msg411068
2022-01-21 00:26:51dam1784setmessages: + msg411062
2022-01-20 23:42:51dam1784setmessages: + msg411060
2022-01-20 21:57:14tim.peterssetmessages: + msg411047
2021-12-30 13:40:20dam1784settitle: Graphlib documentation -> Graphlib documentation (edge direction)
2021-12-21 15:00:56dam1784setpull_requests: + pull_request28446
2021-12-17 18:49:19eric.araujosetassignee: docs@python ->

nosy: - docs@python
components: + Library (Lib), - Documentation
versions: - Python 3.9, Python 3.10
2021-12-14 15:49:56eric.smithsetnosy: + tim.peters, rhettinger, eric.smith
2021-12-14 15:22:49python-devsetkeywords: + patch
nosy: + python-dev

pull_requests: + pull_request28324
stage: patch review
2021-12-14 15:20:03dam1784create