In game programming we often want to work with x,y coordinates and transform them in various ways. This article covers:


introduce: x,y (use pixel coordinates)
In most systems, 0,0 is at the top left, but in some systems 0,0 is at the bottom left.

Camera pan

introduce: translation

A translation in geometry slides the coordinates left, right, up, down. For example, if we want to make 0,0 the center of the screen we can slide the map down and right:

world → translate(      ,      ) → screen

The code for this is simple:

screen.x = world.x + 
screen.y = world.y +       
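As a minimal sketch of the translation above, assuming pixel coordinates with 0,0 at the top left: the function name `translate` and the 640x480 screen size are illustrative choices, not values from the demo.

```python
def translate(point, dx, dy):
    """Slide a point by (dx, dy). With 0,0 at the top left,
    positive dx moves right and positive dy moves down."""
    x, y = point
    return (x + dx, y + dy)

# Hypothetical 640x480 screen: slide the world so that
# world 0,0 lands in the center of the screen.
screen = translate((0, 0), 320, 240)
print(screen)  # (320, 240)
```

Every world point gets the same offsets added, which is why the whole map appears to slide as one piece.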

World and view coordinates

There can be more than one coordinate system in a game. Here there are world x-coordinates in blue and screen x-coordinates in red. They’re the same until you scroll the map:

For any position world.x in the world we can calculate the position on the screen:

world.x = 
screen.x = world.x + 

Camera scrolling with player

Here’s an example with a player:

As before, for any world position world.x we can calculate the position on the screen:

screen.x = world.x + 
screen.y = world.y

The interesting thing here is that when the player moves right, the world moves to the left. Why is this?

When the camera follows the player, the player’s screen position will remain the same. In this case I want the screen position of the player to be at 250. Let’s work through the math:

screen.x = 250      
screen.x = world.x + 

# therefore
world.x = 250 -  = 

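Notice the minus sign? When the player moves to the right, world.x increases. We know screen.x is fixed at 250, so the amount we add to world.x has to decrease by the same amount, and that smaller amount is added to everything else in the world too. { not sure how to explain this yet } When the player moves to the right, the world moves to the left!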

This is tricky so let’s pause. {another example?}

{ “natural” scrolling on ipad vs the traditional scrolling on Windows reflects this difference -- are you moving the world or the view? will minimap help explain this? }

{minimap would be a reason to use “view” instead of “screen” -- or maybe we introduce view vs screen somewhere else -- whereas regular view is all of the view coordinates and a part of the world coordinates, the minimap shows all of the world coordinates and part of the screen is the view coordinates}

The fundamental relationship between the two is: view + camera = world. You can also express this as view = world - camera. In this demo, the camera is centered on the player character.
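Here is a small sketch of view = world - camera, assuming the camera position is simply the player's world position. The names `world_to_view`, `view_to_world`, and the stationary `tree_x` are made up for illustration.

```python
def world_to_view(world_x, camera_x):
    # view = world - camera
    return world_x - camera_x

def view_to_world(view_x, camera_x):
    # view + camera = world
    return view_x + camera_x

tree_x = 400  # hypothetical object that never moves in the world

for player_x in (300, 350):
    camera_x = player_x  # camera centered on the player
    print(world_to_view(player_x, camera_x), world_to_view(tree_x, camera_x))
# 0 100
# 0 50
```

The player's view position stays at 0 while the tree's view position drops from 100 to 50: the player moved right in the world, so the tree moved left in the view.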

{ should the narrative introduce scrolling first, and then camera pan / recentering? } { is this a good time to introduce reversing a transform? } { separate diagram to focus on camera object? maybe better in the minimap }

Camera zoom

introduce: scale
world → scale(      ) → screen
screen.x = world.x * 
screen.y = world.y *       
{ addition becomes translate() for geometry; multiplication becomes scale() for geometry }

Object coordinates

May be useful to show translate, rotate, scale for an object on the map, or the player sprite

World and view coordinates, again

introduce: chaining
world → translate(      ,      ) → r → scale(      ) → screen
r.x = world.x + 
r.y = world.y + 
screen.x = r.x * 
screen.y = r.y * 
order matters! {show the other order}
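A quick sketch of why order matters, using made-up offsets (5, 5) and a made-up scale of 2; the function names are illustrative.

```python
def translate(p, dx, dy):
    return (p[0] + dx, p[1] + dy)

def scale(p, s):
    return (p[0] * s, p[1] * s)

world = (10, 20)

# world -> translate -> r -> scale -> screen
a = scale(translate(world, 5, 5), 2)

# The other order: world -> scale -> r -> translate -> screen
b = translate(scale(world, 2), 5, 5)

print(a)  # (30, 50)
print(b)  # (25, 45)
```

When the translate comes first, its offsets get scaled too. When it comes last, they don't.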

Mouse clicks

introduce: reversal; reinforce: chaining
world ← translate(      ,      ) ← r ← scale(      ) ← screen

The last thing we did (scale) is the first thing we undo; the first thing we did is the last thing we undo. The order is reversed.
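To make the reversal concrete, here is a sketch that turns a mouse click back into a world position. The offsets and scale are hypothetical values, and the scale is assumed to be non-zero so that it can be undone.

```python
def world_to_screen(p, dx, dy, s):
    rx, ry = p[0] + dx, p[1] + dy   # first: translate
    return (rx * s, ry * s)         # last: scale

def screen_to_world(p, dx, dy, s):
    rx, ry = p[0] / s, p[1] / s     # undo the last step first: scale
    return (rx - dx, ry - dy)       # then undo the first step: translate

click = world_to_screen((10, 20), 5, 5, 2)
print(click)                            # (30, 50)
print(screen_to_world(click, 5, 5, 2))  # (10.0, 20.0)
```

Each step is undone by its opposite: addition by subtraction, multiplication by division.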

#Object zoom

reinforce: scaling, chaining

#Camera rotation

introduce: rotate
world → rotate(      ) → screen
{ but we might want to rotate around the center of the screen → let’s translate first } → reinforce chaining
q.x = p.x * cos() + p.y * sin()
q.y = p.y * cos() - p.x * sin()
{ How about reversal? }
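One way to sketch rotation around the center of the screen is to chain translate, rotate, translate, using the same formulas for q.x and q.y as above. The function names and the idea of reversing by rotating with the negated angle are my framing here; angles are assumed to be in radians.

```python
import math

def rotate(p, angle):
    """Same formulas as above: q.x = p.x*cos + p.y*sin, q.y = p.y*cos - p.x*sin."""
    c, s = math.cos(angle), math.sin(angle)
    return (p[0] * c + p[1] * s, p[1] * c - p[0] * s)

def rotate_around(p, angle, center):
    cx, cy = center
    # translate so the center is at 0,0, rotate, then translate back
    x, y = rotate((p[0] - cx, p[1] - cy), angle)
    return (x + cx, y + cy)

def unrotate_around(q, angle, center):
    # Reversal: rotating by the negated angle undoes the rotation.
    return rotate_around(q, -angle, center)

center = (320, 240)  # hypothetical screen center
q = rotate_around((400, 240), math.radians(30), center)
p = unrotate_around(q, math.radians(30), center)  # back to about (400, 240)
```

The center itself doesn't move, and the round trip only returns the original point up to floating point error.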

Object rotation

reinforce: rotation, chaining

#Oblique projection

introduce: skew
world → skew(      ) → screen
screen.x = world.x + world.y * tan()
screen.y = world.y

(NOTE: it’s unclear to me exactly what the difference between skew and shear is but I’m guessing that skew is expressed as angles and shear is expressed with the tangent of the angle, and they become the same thing in the end.)
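Under the guess in the note above, a skew by an angle and a shear by a factor are the same operation once the factor is tan(angle). A sketch, with illustrative function names and the angle in radians:

```python
import math

def skew_x(p, angle):
    """screen.x = world.x + world.y * tan(angle); screen.y = world.y"""
    return (p[0] + p[1] * math.tan(angle), p[1])

def shear_x(p, k):
    """The same slide, written with a factor k instead of an angle."""
    return (p[0] + p[1] * k, p[1])

def unskew_x(q, angle):
    # Reversal: y is unchanged, so subtract the same amount from x.
    return (q[0] - q[1] * math.tan(angle), q[1])

angle = math.radians(30)
print(skew_x((10, 20), angle) == shear_x((10, 20), math.tan(angle)))  # True
```

Points with y = 0 don't move at all, and points farther from the x-axis slide farther sideways.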

Isometric projection

reinforce: shearing, rotation, chaining, reversal
world → rotate(   45°   ) → r → scaleY(      ) → screen
r.x = world.x * cos(45°) + world.y * sin(45°)
r.y = world.y * cos(45°) - world.x * sin(45°)
screen.x = r.x
screen.y = r.y *
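A sketch of the isometric chain and its reversal. The scaleY value of 0.5 is a placeholder assumption, not the value from the demo, and the function names are illustrative.

```python
import math

ANGLE = math.radians(45)
SCALE_Y = 0.5  # hypothetical; pick whatever vertical squash looks right

def world_to_screen(p):
    c, s = math.cos(ANGLE), math.sin(ANGLE)
    rx = p[0] * c + p[1] * s      # rotate(45°)
    ry = p[1] * c - p[0] * s
    return (rx, ry * SCALE_Y)     # scaleY

def screen_to_world(q):
    rx, ry = q[0], q[1] / SCALE_Y  # undo scaleY first
    c, s = math.cos(ANGLE), math.sin(ANGLE)
    # then undo the rotation by rotating the other way
    return (rx * c - ry * s, ry * c + rx * s)

screen = world_to_screen((3, 4))
world = screen_to_world(screen)  # back to about (3, 4)
```

As with mouse clicks, the reversal undoes scaleY before it undoes the rotation.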

Aspect ratio

With a wide variety of screen aspect ratios, how do you make your game world fit?

{Cover extending world, black bars, and inner/outer boundaries. Show them all visually}

#Aspect ratio

Especially for mobile games, your aspect ratio may not be the same as the aspect ratio of the screen (or window). You can either preserve the aspect ratio by adding black bars to the top/bottom (“letterboxing” in movies) or left/right (“pillarboxing”). Alternatively, you can scale your content.
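Preserving the aspect ratio is one scale followed by one translate. Here is a sketch that picks both, assuming a hypothetical 320x240 game shown on a 1280x720 screen; `fit_preserving_aspect` is an illustrative name.

```python
def fit_preserving_aspect(game_w, game_h, screen_w, screen_h):
    """Return (scale, offset_x, offset_y) so the game area fits inside
    the screen without distortion, centered, with bars on the leftover sides."""
    s = min(screen_w / game_w, screen_h / game_h)
    offset_x = (screen_w - game_w * s) / 2
    offset_y = (screen_h - game_h * s) / 2
    return s, offset_x, offset_y

print(fit_preserving_aspect(320, 240, 1280, 720))  # (3.0, 160.0, 0.0)
```

Here the bars end up on the left and right (pillarboxing), each 160 pixels wide. A game position then maps to the screen as screen.x = game.x * scale + offset_x, and likewise for y.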

#OpenGL coordinates

OpenGL sets up the coordinate system to be -1 to +1 along both axes. Your screen/window likely isn’t square. How should we think about this?

OpenGL is applying a transform to the coordinates you give to it. It applies both scale and translate.

{ show the transform }

You need to take into account what OpenGL is doing to your coordinates. If you draw a square, it won’t look square on the screen because OpenGL is scaling your coordinates. You will want to scale your own coordinates first before passing them to OpenGL. { work through the math }
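A sketch of the scale and translate involved, assuming no other transforms are applied, a viewport that covers the whole window, and pixel coordinates with 0,0 at the top left (so y has to be flipped, since +1 is the top in OpenGL's coordinates). The function names are illustrative and are not OpenGL calls.

```python
def pixel_to_ndc(px, py, width, height):
    """Map pixel coordinates (0,0 at top left) to the -1..+1 range."""
    return (2 * px / width - 1, 1 - 2 * py / height)

def ndc_to_pixel(nx, ny, width, height):
    """The reverse: -1..+1 back to pixel coordinates."""
    return ((nx + 1) * width / 2, (1 - ny) * height / 2)

print(pixel_to_ndc(0, 0, 800, 600))      # (-1.0, 1.0)
print(pixel_to_ndc(400, 300, 800, 600))  # (0.0, 0.0)
print(pixel_to_ndc(800, 600, 800, 600))  # (1.0, -1.0)
```

The x scale is 2 / width and the y scale is 2 / height. In a non-square window those differ, which is why a square in -1..+1 coordinates doesn't look square on screen.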

Hexagonal grids

All of these operations chain together. If you’re using hex grids, you can use a hex-to-cartesian operation (and its inverse, cartesian-to-hex) along with any of the others listed above. For example, if you want isometric hex grids, you would chain hex-to-cartesian with the isometric operations (rotate and scaleY). Mouse clicks would be processed in reverse: invert scaleY, then invert rotate, then invert hex-to-cartesian (by using cartesian-to-hex).
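As a sketch of one possible hex-to-cartesian operation and its inverse, here is a version that assumes pointy-top hexagons addressed with axial coordinates (q, r) and a `size` measured from center to corner. Those are my assumptions, not choices made in this article. The inverse returns fractional hex coordinates; rounding to the nearest hex is a separate step not shown.

```python
import math

def hex_to_cartesian(q, r, size):
    # Assumed layout: pointy-top hexes, axial coordinates.
    x = size * math.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return (x, y)

def cartesian_to_hex(x, y, size):
    # Undo y first to recover r, then use r to recover q.
    r = y / (size * 1.5)
    q = x / (size * math.sqrt(3)) - r / 2
    return (q, r)

x, y = hex_to_cartesian(2, 3, 10)
q, r = cartesian_to_hex(x, y, 10)  # back to about (2, 3)
```

Because both functions take and return plain x,y or q,r pairs, they slot into a chain before or after translate, rotate, and scale.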


motivation: pattern for all of the transformations; introduce: matrix

Matrices aren’t transforms. Matrices are representations of transforms.

Matrices allow you to optimize a bit. A chain of transforms might be q = f(g(h(p))). In math we can “compose” functions: we can combine f, g, h ahead of time into q = (f ∘ g ∘ h)(p). In most programming languages we use, composing functions at run time just gives us another chain of calls; it doesn’t collapse f, g, h into a single cheaper operation.

All of our transforms happen to be representable as matrix multiplies, q = F * G * H * p. Matrix multiply is associative so F * (G * (H * p)) = ((F * G) * H) * p. By representing our functions as matrix operations, we can compose them ahead of time. Instead of applying a long chain of 8 operations to every point p, we can first combine those 8 operations into 1, and then apply that to p. This is common in 3D programming, and there is both CPU and GPU acceleration for 4x4 matrix operations.
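A sketch of combining two operations into one matrix ahead of time. Translation isn't a multiply on plain x,y, so this assumes 3x3 matrices with each point extended to (x, y, 1). The helper names and the offsets and scale are illustrative.

```python
def matmul(a, b):
    """Multiply two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translation(dx, dy):
    return [[1, 0, dx], [0, 1, dy], [0, 0, 1]]

def scaling(s):
    return [[s, 0, 0], [0, s, 0], [0, 0, 1]]

def apply(m, p):
    """Apply a 3x3 matrix to the point (x, y), treated as (x, y, 1)."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Translate, then scale: q = S * T * p. Combine S * T once, up front.
combined = matmul(scaling(2), translation(5, 5))
print(apply(combined, (10, 20)))  # (30, 50)
```

The matrix for the last operation goes on the left. Once `combined` is built, each point costs one matrix application no matter how many operations went into it.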

u,v vectors; show how they are transformed; show how they are in the matrix; show a unit circle too

More reading on matrices: