To **traverse** a graph is to visit every node and/or edge systematically.

This seems boring, but it’s actually an important part of many things we want to do with graphs: finding connected components, finding paths between nodes, calculating graph statistics, and much more. Even “finding paths between nodes” is useful for an incredible number of problems, from Google Maps to internet routing, and even tasks as plainly statistical as kernel density estimation can be phrased in terms of traversals of graphs (or trees).

There are several strategies for traversing a graph, each with different useful properties.

Start with an undirected, simple graph.

Q: If I showed you a large, undirected, simple graph and asked you to count the nodes exactly, how would you do it?

## Strategies #

A traversal strategy produces a sequence of nodes and edges, with each node and edge listed exactly once. There are two very commonly used strategies and one general strategy:

- **Breadth-First Search** (BFS) - visit all neighbors of the current node before visiting any of *their* neighbors.
- **Depth-First Search** (DFS) - visit all neighbors of the next visited node before visiting the other neighbors of the *current* node.
- **Priority Search** - visit nodes in priority order, adding neighbors at each stage.

These strategies differ in the order in which nodes are visited, which corresponds to how the collection of pending nodes is handled.

## Aside: Stacks, Queues, and Priority Queues #

A brief refresher from our discussion of data structures.

### Stack #

A **stack** is a data structure for processing objects in a
*Last In-First Out* manner. The most recent object added is said to be
at the “top” of the stack.

Stacks support two primary (and one optional) operations:

- `push(obj)` - add object to the top of the stack
- `pop()` - remove the object from the top of the stack and return it
- `peek()` - return the object from the top of the stack without removing it

It is an error to pop an empty stack.
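As a minimal sketch in Python (this `Stack` wrapper is illustrative, not from any particular library), a plain list already provides LIFO behavior:

```python
class Stack:
    """Minimal LIFO stack backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, obj):
        # Add object to the top of the stack.
        self._items.append(obj)

    def pop(self):
        # Remove and return the top object; raises IndexError if empty.
        return self._items.pop()

    def peek(self):
        # Return the top object without removing it.
        return self._items[-1]

    def is_empty(self):
        return not self._items
```

Pushing 1, 2, 3 and then popping three times yields 3, 2, 1.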

### Queue #

A **queue** is a data structure for processing objects in a *First In-First Out* manner.
The most recent object added is said to be at the “back” of the queue;
the next object to be processed is said to be at the “front” of the queue.

Queues support three primary operations:

- `enqueue(obj)` - add object to the back of the queue
- `dequeue()` - remove the object at the front of the queue and return it
- `is_empty()` - do any objects remain in the queue?

It is an error to dequeue an empty queue. (Don’t confuse dequeue with deque,
a different but related data structure.) Sometimes a `front()` operation is
provided, which, like `peek()`, looks at the front object without removing it.
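A matching sketch in Python: `collections.deque` gives constant-time appends and pops at both ends, so FIFO behavior is one `popleft` away (this `Queue` wrapper is illustrative):

```python
from collections import deque


class Queue:
    """Minimal FIFO queue backed by collections.deque."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, obj):
        # Add object to the back of the queue.
        self._items.append(obj)

    def dequeue(self):
        # Remove and return the front object; raises IndexError if empty.
        return self._items.popleft()

    def front(self):
        # Look at the front object without removing it.
        return self._items[0]

    def is_empty(self):
        return not self._items
```

Enqueuing 1, 2, 3 and then dequeuing three times yields 1, 2, 3.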

### Priority Queue #

A **priority queue** is a data structure for processing objects
in a specialized order determined by priorities attached
to the objects. Objects are processed from highest to lowest
priority. A priority queue generalizes stacks and queues.
A priority queue with priorities increasing as objects
are added gives a stack; a priority queue with priorities
decreasing as objects are added gives a queue.

Priority queues support three primary (and one optional) operations:

- `extract_highest()` - remove the object of highest priority and return it
- `insert(obj, p)` - add object with specified priority `p`
- `is_empty()` - do any objects remain in the queue?
- `peek_highest()` - examine but do not remove the highest priority object
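A sketch in Python using the standard `heapq` module. `heapq` implements a *min*-heap, so we negate priorities to extract the highest first; the counter breaks ties by insertion order:

```python
import heapq
import itertools


class PriorityQueue:
    """Max-priority queue backed by heapq (a min-heap), so priorities are negated."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: stable insertion order

    def insert(self, obj, p):
        # Negate p so that the min-heap pops the highest priority first.
        heapq.heappush(self._heap, (-p, next(self._counter), obj))

    def extract_highest(self):
        # Remove and return the object of highest priority.
        return heapq.heappop(self._heap)[2]

    def peek_highest(self):
        # Examine but do not remove the highest priority object.
        return self._heap[0][2]

    def is_empty(self):
        return not self._heap
```

Inserting with ever-increasing priorities makes `extract_highest` behave like `pop` on a stack; ever-decreasing priorities make it behave like `dequeue` on a queue.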

## Traversal States #

Let’s describe a *general* strategy for traversing a graph. This general
strategy traverses a graph in any order, and can be adjusted to do whatever
task you want: find paths between nodes, calculate statistics of the graph, do
coloring…

During traversal, we will assign nodes two possible states:

- `fresh` - the node has not yet been considered or processed
- `processed` - the node has been visited and its neighbors added to the queue/stack to consider

Because the state of an edge can be inferred from its surrounding nodes, it is usually sufficient to keep track of the node states.

We will maintain a **traversal state** object that contains the current state of
all nodes and a record of the history of traversal, including the node from which
each node was visited, the “times” at which each node was processed, and an
indicator (which we can set) of whether the search should stop.

As part of this state, we will also track an arbitrary **accumulator**
object, which we will use to store any data or results that
we accumulate over the course of the traversal.

## Traversal Power-Ups #

To make the traversal flexible, we will pass two *processing
functions* to the algorithms:

- `on_node(graph, node, state)` - called on fresh nodes when they are visited
- `on_edge(graph, from_node, to_node, state)` - called on an edge when it is first traversed

These will be able to access the current node or edge as well as the current traversal state. By specifying different choices of these functions, we can make our graph traversal algorithm perform many different tasks.

## A Traversal Template #

```
from TraversalState import TraversalState

def noop(graph, node, state):
    return state

def traverse(graph, start, acc, on_node=noop, on_edge=noop, state=None):
    if state is None:
        state = TraversalState(graph.nodes(), acc)
    # We'll use a queue, stack, or priority queue, depending on the order of
    # traversal we want to use
    remaining_nodes = Queue_or_Stack()
    remaining_nodes.push(start)
    while not remaining_nodes.is_empty() and not state.finished:
        current_node = remaining_nodes.pop()
        # Don't want to visit nodes twice, e.g. if there are several edges
        # pointing in to them
        if state.fresh(current_node):
            state.process(current_node)
            state = on_node(graph, current_node, state)
            for neighbor in graph.neighbors(current_node):
                if state.fresh(neighbor):
                    remaining_nodes.push(neighbor)
                    state = on_edge(graph, current_node, neighbor, state)
    return state
```
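The `TraversalState` module itself is not shown here; a hypothetical minimal version, consistent with how the template uses it, might look like this:

```python
class TraversalState:
    """Hypothetical minimal traversal state (illustrative, not the course library)."""

    def __init__(self, nodes, acc):
        self._fresh = set(nodes)   # nodes not yet processed
        self.order = []            # processing history; index = processing "time"
        self.parents = {}          # filled in by hooks such as track_parents
        self.accumulator = acc     # arbitrary user data
        self.finished = False      # a hook can set this to stop the traversal early

    def fresh(self, node):
        return node in self._fresh

    def process(self, node, time=None):
        # Mark node processed; its "time" is its position in self.order.
        self._fresh.discard(node)
        self.order.append(node)
```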

## Breadth-First Search (BFS) #

We will maintain a **queue** of nodes, initialized with the start node.
We successively dequeue a node and process it, enqueueing all
of that node’s fresh neighbors (processing all the edges along the
way).

Inputs:

- `graph` - an undirected graph
- `start` - a node at which to start the search
- `acc` - an accumulator object of arbitrary type
- `on_node(graph, node, state)` - a function called when a node is visited
- `on_edge(graph, from_node, to_node, state)` - a function called when an edge is traversed
- `ts` - a traversal state object (newly initialized if `None`)

Output:

- An updated traversal state `ts'`

The algorithm is as above, using a **queue**.

Assume we also have a function `bfs_all` that calls `bfs`
on successive fresh nodes, updating the same traversal
state and returning it.

```
def bfs_all(graph, acc, on_node=None, on_edge=None):
    ts = None
    for node in graph.nodes():
        if ts is None or ts.fresh(node):
            ts = bfs(graph, node, acc, on_node, on_edge, ts)
    return ts
```

Question: What if we used the function below as the `on_edge` function, meaning
we called `bfs` with `on_edge=track_parents`? Suppose that `state.parents`
starts as an empty dictionary.

```
def track_parents(graph, from_node, to_node, state):
    state.parents[to_node] = from_node
    return state
```

### Exercises on using BFS #

- You have a function

  ```
  def inc(graph, node, ts):
      ts.accumulator += 1
  ```

  and call `tstate = bfs(graph, start, 0, on_node=inc)`. What is
  `tstate.accumulator` after the call?

- You have a function

  ```
  def inc_if_blue(graph, node, ts):
      props = graph.get_node_properties(node)
      if props['color'] == 'blue':
          ts.accumulator += 1
  ```

  and call `tstate = bfs(graph, start, 0, inc_if_blue)`. What is
  `tstate.accumulator` after the call?

- You have a function

  ```
  def parents(graph, node, ts):
      parent = ts.parent[node]
      my_name = graph.get_node_properties(node, 'label')
      if parent:
          p_name = graph.get_node_properties(parent, 'label')
      else:
          p_name = None
      ts.accumulator[my_name] = p_name
  ```

  and call `tstate = bfs(graph, start, {}, parents)`. What is
  `tstate.accumulator` after the call?

- You have a function

  ```
  def blue_labels(graph, node, ts):
      props = graph.get_node_properties(node)
      if props['color'] == 'blue':
          ts.accumulator.append(props['label'])
  ```

  and call `tstate = bfs(graph, start, [], on_node=blue_labels)`. What is
  `tstate.accumulator` after the call?

- Write a function `find_path(parents, start, end)` that takes the BFS tree
  (through the parent pointers) and returns a list of node IDs giving a path
  from `start` to `end`, or `None` if there is no such path.

- Configure BFS to find the **connected components** of a graph, the sets of
  nodes such that within each set there is a path between any two nodes.

- Configure BFS to determine if the graph can be *two-colored*, meaning that we
  can assign one of two colors to every node without two nodes of the same
  color sharing an edge between them. A two-colorable graph is said to be
  **bipartite**. Find the two-coloring or return None/null/NA if the graph is
  not bipartite.

## Depth-First Search (DFS) #

In contrast to BFS, in DFS we will maintain a **stack** of nodes,
initialized with the start node.

For our algorithm, we take the inputs:

- `graph` - an undirected graph
- `start` - a node at which to start the search
- `acc` - an accumulator object of arbitrary type
- `on_node(graph, node, state)` - a function called when a node is visited
- `on_edge(graph, from_node, to_node, state)` - a function called when an edge is traversed
- `ts` - a traversal state object (newly initialized if `None`)

We output an updated traversal state `ts'`.

We successively pop a node and process it and push all of that node’s fresh neighbors (processing all the edges along the way). There is a recursive logic to DFS: for each fresh neighbor, call DFS on it (maintaining state).

```
DFS(start):
    mark start as PROCESSED
    for neighbor in neighbors(start):
        if neighbor is FRESH:
            DFS(neighbor)
```
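A runnable version of this recursion, assuming an adjacency-dict graph; marking nodes as visited plays the role of the FRESH check:

```python
def dfs_recursive(graph, start, visited=None, order=None):
    """Recursive DFS over an adjacency dict; returns nodes in visit order."""
    if visited is None:
        visited, order = set(), []
    visited.add(start)      # mark start as no longer fresh
    order.append(start)
    for neighbor in graph[start]:
        if neighbor not in visited:     # only recurse into fresh neighbors
            dfs_recursive(graph, neighbor, visited, order)
    return order


g = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['a'], 'd': ['b']}
print(dfs_recursive(g, 'a'))  # ['a', 'b', 'd', 'c']
```

Note that `d` is visited before `c`: DFS dives through `b` to `d` before returning to `a`'s other neighbor.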

Wait, where’s the stack?

Alternatively, we can explicitly use a stack, looping until the stack is empty. We reuse the template shown above, but with a stack.

Again, suppose we have `dfs_all`, which continues searching until no fresh nodes
are found:

```
def dfs_all(graph, acc, on_node, on_edge):
    ts = None
    for node in graph.nodes():
        if ts is None or ts.fresh(node):
            ts = dfs(graph, node, acc, on_node, on_edge, ts)
    return ts
```

### Example Configuring DFS #

Task: Print traversal history as DFS runs

Basic idea: Print each node as it is being visited and processed, and print each edge as it is being traversed. Here, we will use node labels to keep track.

Task: Detect cycles in a graph with DFS.

Basic Idea: In an undirected graph, if there are no back edges, we have a tree – hence, no cycles. But any back edge creates a cycle. So, look for back edges.
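A sketch of this idea for a simple undirected graph (adjacency-dict representation assumed): track each node's DFS parent, and report a cycle when an edge reaches an already-visited node other than the parent:

```python
def has_cycle(graph):
    """Cycle detection in a simple undirected graph via DFS back edges."""
    visited = set()

    def dfs(node, parent):
        visited.add(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                if dfs(neighbor, node):
                    return True
            elif neighbor != parent:
                return True     # back edge to a non-parent node: cycle found
        return False

    # Check every component, not just the one containing an arbitrary start.
    return any(node not in visited and dfs(node, None) for node in graph)


tree = {'a': ['b', 'c'], 'b': ['a'], 'c': ['a']}
triangle = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b']}
print(has_cycle(tree), has_cycle(triangle))  # False True
```

The parent check matters because the graph is undirected: the edge back to the node we just came from is not a cycle.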

### DAGs and Topological Sort #

A **topological sort** of a DAG is a linear ordering of the DAG’s nodes
such that if \((u,v)\) is a directed edge in the graph, node \(u\) comes
before node \(v\) in the ordering.

Given a DAG, how do we use DFS to do a topological sort?
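One standard answer, sketched under the adjacency-dict assumption: run DFS, record each node as it *finishes* (after all of its out-neighbors are fully explored), and reverse the finish order.

```python
def topological_sort(graph):
    """Topological sort of a DAG: reverse of the DFS finishing order."""
    visited, finish_order = set(), []

    def dfs(node):
        visited.add(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                dfs(neighbor)
        finish_order.append(node)   # node finishes after all its descendants

    for node in graph:
        if node not in visited:
            dfs(node)
    return finish_order[::-1]


# Hypothetical "getting dressed" DAG: edge u -> v means u must precede v.
dag = {'shirt': ['tie'], 'tie': ['jacket'], 'pants': ['shoes', 'jacket'],
       'jacket': [], 'shoes': []}
order = topological_sort(dag)
```

A node is appended only after every node reachable from it, so reversing the finish order puts each edge's source before its target.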

### Other Applications and Exercises #

- Configure `dfs` to count the number of “descendants” of a node.
- Configure `dfs` to compute a path between two nodes.

### Application: Directed Graphs and Strongly Connected Components #

A directed graph is strongly connected if there is a *directed* path
between any two nodes.

The strongly connected components of a directed graph are its maximal, strongly connected subgraphs.

We can find the strongly connected components with two DFS’s.
For an arbitrary node *v*, the graph is strongly connected if
we can find a directed path from *v* to any other node *u* **and**
a directed path from any node *w* to *v*.

Let G be a directed graph and let G’ have the same nodes
and edges with all the edges *reversed*. Pick an arbitrary
node *v*.

The algorithm for *detecting* strong connectivity is
basically as follows:

1. Do `DFS(G, v)`.
2. If the traversal does not contain all nodes, then there are nodes we cannot reach from *v*. Hence, G cannot be strongly connected.
3. Do `DFS(G', v)`.
4. If this traversal does not contain all nodes, then there are nodes in G from which we cannot reach *v*. Hence, G cannot be strongly connected.
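A sketch of these steps, assuming adjacency-dict graphs; `reverse_graph` and `reachable` are helper names introduced here for illustration:

```python
def reverse_graph(graph):
    """G': same nodes, every edge reversed."""
    rev = {node: [] for node in graph}
    for u, targets in graph.items():
        for v in targets:
            rev[v].append(u)
    return rev


def reachable(graph, start):
    """Set of nodes reachable from start (iterative DFS with an explicit stack)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def is_strongly_connected(graph):
    v = next(iter(graph))   # an arbitrary node v
    # Every node reachable from v, and v reachable from every node (via G').
    return (reachable(graph, v) == set(graph) and
            reachable(reverse_graph(graph), v) == set(graph))


cycle3 = {'a': ['b'], 'b': ['c'], 'c': ['a']}   # strongly connected
chain = {'a': ['b'], 'b': ['c'], 'c': []}       # not: cannot return to 'a'
print(is_strongly_connected(cycle3), is_strongly_connected(chain))  # True False
```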

To find the strongly connected components, we just do a little processing.

In step 1, record the `processed_times`.
In step 3, do `DFS_ALL(G')` with the nodes
ordered as the reversal of the `processed_times`.
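Putting the pieces together, a hypothetical sketch of the full procedure (adjacency-dict graphs assumed): the first pass records finish order, the second pass runs on the reversed graph in reverse finish order, and each second-pass tree is one strongly connected component:

```python
def strongly_connected_components(graph):
    """Kosaraju-style SCC sketch: DFS finish times, then DFS on the reversal."""
    visited, finish_order = set(), []

    def dfs1(node):
        visited.add(node)
        for nb in graph[node]:
            if nb not in visited:
                dfs1(nb)
        finish_order.append(node)   # record "processed time" as finish order

    for node in graph:              # pass 1: DFS_ALL on G
        if node not in visited:
            dfs1(node)

    rev = {n: [] for n in graph}    # build G' with all edges reversed
    for u, targets in graph.items():
        for v in targets:
            rev[v].append(u)

    components, assigned = [], set()
    for node in reversed(finish_order):     # pass 2: DFS_ALL on G'
        if node not in assigned:
            comp, stack = set(), [node]     # iterative DFS on the reversed graph
            while stack:
                n = stack.pop()
                if n not in assigned:
                    assigned.add(n)
                    comp.add(n)
                    stack.extend(rev[n])
            components.append(comp)
    return components


g = {'a': ['b'], 'b': ['c'], 'c': ['a', 'd'], 'd': ['e'], 'e': ['d']}
print(strongly_connected_components(g))  # [{'a', 'b', 'c'}, {'d', 'e'}]
```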

## Priority Search #

Priority search operates just like BFS and DFS; the only difference is that each time we visit a node, we add its neighbors to the queue with associated priorities.
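A sketch using Python's `heapq` (a min-heap, so here "priority order" means smallest key first); the `priority(from_node, to_node)` function is an assumed parameter supplied by the caller:

```python
import heapq


def priority_search(graph, start, priority):
    """Same loop as BFS/DFS, but fresh neighbors go into a heap keyed by
    priority(from_node, to_node); heapq pops the smallest key first."""
    visited, order = set(), []
    heap = [(0, start)]
    while heap:
        _, node = heapq.heappop(heap)
        if node not in visited:         # skip stale duplicate heap entries
            visited.add(node)
            order.append(node)
            for neighbor in graph[node]:
                if neighbor not in visited:
                    heapq.heappush(heap, (priority(node, neighbor), neighbor))
    return order


# Prioritizing neighbors alphabetically (via ord) visits 'b' before 'c'.
g = {'a': ['c', 'b'], 'b': ['a', 'd'], 'c': ['a', 'd'], 'd': ['b', 'c']}
print(priority_search(g, 'a', lambda u, v: ord(v)))  # ['a', 'b', 'c', 'd']
```

With edge weights as priorities and cumulative distances as keys, this same loop becomes the skeleton of Dijkstra's shortest-path algorithm.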

Exercise: How do we modify BFS/DFS to get priority search?