A* Heuristics

from Red Blob Games
DRAFT
13 Dec 2018

{{ *None of this is specific to grids but I'm using grid examples* :-( }}

The A* heuristic is an estimate on the remaining cost to the goal. A* uses it to decide which paths to evaluate first. When the heuristic is zero, A* has no information, and explores all directions equally. When the heuristic is exact, A* has complete information, and explores the best paths first. The higher the value, the fewer nodes A* will explore, but if it goes above the exact cost, A* may not find the shortest path. On game maps we usually choose distance because it's easy to implement and gives a reasonable estimate.

{{ need separate diagram for 0-1 on exact heuristic and 0-2 on distance heuristic }}

{{ if I want to show a chart with the tradeoff between explored space and path length, I think I need to precompute the results, which means I need to make the start and end point fixed. or maybe I can do it in webworkers }}
{{ can I "ladder up" by making the boolean explored/not into a color scale showing which areas are explored at each heuristic multiplier? }}

If the exact heuristic is so great, why don't we use it? We don't use it because it's expensive. First, we have to calculate it using Floyd-Warshall or Johnson's Algorithm, and then we have to store it in an N ✕ N table, where N is the number of nodes. If N is small, you wouldn't need to use A*. Instead, you'd save the pathfinding table. You only need A* when N is large. A large map might be over 100,000 nodes, which would require an array of 10,000,000,000 integers (40 GB).

Most of us using A* on maps will use distance heuristics. They're cheap and work reasonably well. They use 0 storage.

I'm going to write cost(A,B) for the cost of the shortest path from A to B. In simple cases, the cost is the length of the path, but we can also use A* for variable cost movement. For example, in Civilization, moving on grassland/desert tiles costs 1 and moving on forest/hill tiles costs 2.

I'm going to write distance(A,B) for the map distance between A and B. The main difference between map distance and cost is that map distance ignores obstacles and slow movement tiles (like forests). On a map with grid movement, we can use a grid distance function. For 4-way square tile movement, use Manhattan distance; for 8-way tile movement, use Diagonal distance; for 6-way hex grids, use Hex distance.

{{ diagram to show the difference between cost and distance – show N, G, the shortest path, its cost, and the distance }}

The graph cost and map distance may be measured in different "units". To construct a heuristic, we need to convert the map distance into the same units as the graph cost. Examples:

• If each step on a grid takes 10 movement points, and the map distance is measured in tiles, then the heuristic would be map_distance * (10 movement points / tile).
• If movement cost on a map is measured in hours, and the map distance is measured in kilometers, and the speed is 20 kilometers/hour, then the heuristic would be map_distance ÷ (20 kilometers/hour).

Keeping the units in mind helps me find bugs.

Grids#

In a game that uses grid movement, we can specialize the distance function to match the type of movement.

{{ diagram that lets you try different movement types (4-way, 8-way, 8-way with sqrt(2)) and different heuristics (manhattan, chebyshev, diagonal, euclidean, euclidean squared, exact}}

{{ I should probably color code the first chart so that slow vs optimal vs unoptimal are different colors, and then I can fill this table with the same colors }}

{{ chart for intuition about g/h relationship, and why they need to match: show x-axis as time on resulting path, y-axis being h, g, f values. The f value should rise over time. If you set heuristic to 0, h is always 0, f rises with g. If you set heuristic to exact cost, then f never rises. }}

{{ chart for intuition about performance: x-axis is g value, y-axis is number of nodes explored. When heuristic is 0, this will be y=x^2 I think, with few obstacles. When heuristic is exact, this will be a constant line x=1 or maybe x=4 depending on how I count. Keeping this line low is the key to performance, so the closer we get to exact, the better we do }}

TODO

1. Inadmissible heuristics, and whether you need to drop the reprioritize check
2. Tie breaking

Tricks

Tricks I used for these diagrams:

• Add a tiny random cost to each edge so that the shortest path is unique. Otherwise, when you change a parameter it might switch to a different path of the same cost. These ties occur a lot with grids. Normally it doesn't matter because all the paths are the same cost, but they do matter for aesthetics.
• Add a tiny bonus to the heuristic so instead of the priority being cost_so_far + heuristic, it's cost_so_far + heuristic*1.001. This means if there are lots of elements in the frontier with the same priority, it will favor the ones closest to the goal. Essentially it breaks ties by asking what greedy best first search would do.