I wrote some notes for anyone implementing Breadth First Search as described on my Tower Defense Pathfinding page. However, these notes are out of date. See this code instead for both Python and C++ versions.

## Internal state#

The article uses *external* state for the visited/came_from/distance sets and hash tables. An alternative is to use *internal* state, where the data is kept inside the graph node data structure. Why use internal vs external data? Although internal is slightly faster (avoids the hash table), external is cleaner, by not modifying the graph data structure when you search. It also supports multiple simultaneous searches, whether multithreaded or with coroutines. Here’s an example node with an internal visited flag:

class Node: def __init__(self, name): self.visited = False self.name = name self._neighbors = [] def neighbors(self): return self._neighbors

And here’s a sample graph:

A = Node("A") B = Node("B") C = Node("C") D = Node("D") E = Node("E") A._neighbors = [B] B._neighbors = [A, C] C._neighbors = [D] D._neighbors = [E] E._neighbors = [] start = A

If we’re using internal state, we have to reset `visited`

back to False in order to run the algorithm again. We can do that *before* the algorithm runs:

ALL_NODES = [A, B, C, D, E] frontier = Queue() frontier.put(start)for node in ALL_NODES:node.visited = Falsestart.visited = True while not frontier.empty(): current = frontier.get() callback(current) for next in current.neighbors(): if not next.visited: next.visited = True frontier.put(next)

Or we can reset *after* the algorithm runs, by keeping a list of all nodes we’ve visited:

frontier = Queue() frontier.put(start) start.visited = Truevisited_nodes = [start]while not frontier.empty(): current = frontier.get() for next in current.neighbors(): if not next.visited: next.visited = Truevisited_nodes.append(next)frontier.put(next)for node in visited_nodes:node.visited = False

Why choose one or the other? It doesn’t matter much here because we visit all nodes. If you’re visiting all nodes, the first approach is slightly faster. However, if you’re using *early exit* then you’re not visiting all nodes, and the second approach can be much faster. The first way has to go through *all* nodes every time. The second way only goes through the nodes that were visited.

## Node IDs

(Note: most of you won’t need this optimization.) For the hash table approaches to work, your nodes need to be hashable. An alternative is to give each node an integer ID. You can then use a bit array to store `visited`

, and a regular array of ints to store `distance`

and `came_from`

.

If you’re using C or C++, you can even get away with leaving the distance and came_from arrays uninitialized (potentially a big win!). You only have to initialize the bit array, which squeezes 64 ids into a single (long) int. Only if the bit is set does the value in the distance or came_from array get initialized. You can either put the distance/came_from arrays on the stack, if you have enough stack space, or you can use statically allocated arrays that aren’t re-initialized after each search. Be careful though to weigh the cost of initializing the visited bit array versus the cost of using a hash table. If the fraction of nodes visited in each search is small, it may be better to use the hash table.