#+TITLE: Probability and Games: Loot Drops
#+begin_export html
Let's see what it looks like with N = chests and p = chance of finding a gem. How many gems k do we find?
#+end_export Play with this. What's the range of $k$? It can be as little as $0$ drops and as much as $N$ drops. What's the most common outcome? It will be $k = N p$. How much variation is there? Try doubling the number of chests, but keep the probability the same. Notice that the variation decreases. { *TODO*: figure out better chart scaling, because right now as you drag p around, it rescales and you /completely miss/ that p is affecting the peak -- maybe pick a scale range based on p=0.02, and maybe average the calculated max with the desired max so that you get some rescaling } ** Probability 2 :PROPERTIES: :CUSTOM_ID: probability-2 :END: In some situations you know how many chests you're going to open, and want to know how many gems you're going to get. In other situations you know how many gems you need, and want to know how many chests you have to open. In the code, all I've done is change =chests < 10= to =gems < 5=: #+begin_src python def roll(): if random.uniform(0, 1) < 0.2: return 'gem' else: return 'empty' gems = 0 chests = 0 while gems < 5: chests += 1 if roll() == 'gem': gems += 1 #+end_src { *TODO* make a diagram where you can click to generate a new random sample k=5, p=0.2; N is added to histogram; after several clicks, add a button to generate more of samples, and change histogram from discrete units to bars, and add a button to generate lots more, and then reveal this text: } Each player will have a different experience. Lucky players will get all the gems opening only $N=5$ chests; unlucky players might have to open $N=20$ or more. This histogram shows how many players will have each type of experience. The way to read it is to pick the number on the x-axis that represents the number of chests opened, and then read the value from the y-axis that represents how many players experienced that. For example, ??13%?? of players had to open 10 chests to get their 5 gems. ** Statistics 2 :PROPERTIES: :CUSTOM_ID: statistics-2 :END: Instead of running our simulation thousands of times, we can calculate the histogram exactly. As before, $p$ is the probability that you'll find a gem in a chest. Let's call $X$ the number of chests we need to open to get $k$ gems. Then the probability that you have to open $N$ chests is called a /negative binomial distribution/ (in the special case of one gem, it's also called a /geometric distribution/): \[ P(X=N) = {N-1\choose k-1}p^k (1-p)^{N-k} \] #+begin_export htmlLet's see what it looks like for k = gems and p = chance of finding a gem. How many chests N do we open?
#+end_export Play with this. What's the range of $N$? It can be as little as $k$ chests and as much as $\infty$ chests. What's the most common outcome? It will be $N = k/p$. How much variation is there? ** Problem :PROPERTIES: :CUSTOM_ID: problem :END: The big danger here is that $N$ has no limit. Players have a chance of having an extremely high $N$, which will lead to an unpleasant playing experience. If there are lots of players, then it's likely that /some/ players will have a bad experience. Let's try to quantify this. How many players will have to open at most $j$ chests? \[ \sum\limits_{N=k}^{j} P(X=N) \] { *TODO*: diagram showing cumulative distribution } For example, even though the most common scenario is that the player has to open 10 chests, 3% of players have to open more than 50 chests. Ugh. Can we do better? Yes. But it means we have to change something about the way we're picking random numbers. ** Probability 3 :PROPERTIES: :CUSTOM_ID: probability-3 :END: One option is to /cap/ the number of chests needed. How can we do this? Here's one way to think about it: Let's say we want a cap of $j=20$ treasure chests, and you want to find $k=5$ gems. Let's make a set of chests and randomly shuffle them: { *TODO* diagram showing gems and chests, with shuffle button, and show how many chests have to be opened, with histogram } ** Statistics 3 :PROPERTIES: :CUSTOM_ID: statistics-3 :END: How many chests $N$ have to be opened? There's a hard cap of 20, but most of the time you have to open close to 20 before you get 5 gems. Is this related to hypergeometric? It seems so but I'm not sure how. { *TODO* diagram with controls for p, j, k } How much variance is there? It's a lot more when $k$ is a lot smaller than $j$. ** Implementation 3 :PROPERTIES: :CUSTOM_ID: implementation-3 :END: We could implement this system by randomly shuffling an array: #+begin_src python outcomes = ['gem'] * k + ['empty'] * (j-k) random.shuffle(outcomes) def roll(): return outcomes.pop() #+end_src Those of you have taken a probability course have seen this as "sampling without replacement" which is slightly different from "sampling with replacement". However, this requires us to keep a potentially large array. Can we do better? Yes. Instead of deciding ahead of time what each chest contains, we can keep track of how many gems need to be in the remaining chests, and decide on gem placement one chest at a time. #+begin_src python remaining_chests = j remaining_gems = k def roll(): global remaining_chests, remaining_gems p = remaining_gems / remaining_chests if random.random() < p: outcome = 'gem' remaining_gems -= 1 else: outcome = 'chest' remaining_chsts -= 1 return outcome #+end_src The probability that /this/ chest contains a gem depends on how many gems and chests remain. This approach generates the same distribution of events as the randomly shuffled array, but we have to keep only two numbers. ** Better implementation :PROPERTIES: :CUSTOM_ID: better-implementation :END: Looking at the last implementation, when you open a chest and don't find a gem, =remaining_gems= stays the same but =remaining_chests= decreases, so the probability of a gem increases for the next chest. If you do find a gem, the probability decreases. Eventually, if you have enough bad luck, the probability will reach 100%. The probability of one chest having a gem is no longer /independent/ of another chest having a gem. This strategy could work in general. We can increase the probability over time, so that if you're having a streak of bad luck, the increased probability can help you get out. Are there other ways of changing the probability based on the kind of luck you've had? ** Spacing - a whole 'nuther topic :PROPERTIES: :CUSTOM_ID: spacing-a-whole-nuther-topic :END: So far we've only been looking at /how many/ gems drop in $N$ chests, and /how many/ chests are needed to get $k$ gems. For game design, we also want to consider how they're distributed. Both of these sequences produce 3 gems in 10 chests but they're probably not equally fun: #+begin_export html #+end_export Which of these is more fun? Will the player /enjoy/ the game less knowing that there's a nonuniform probability? Is there a /fair/ way to have the player experience more fun sequences? Do we need to /explain/ the nonuniformity in the game? If so, is there an in-game mechanism that would produce these sequences? For example, if you want more loot drops towards the end of the sequence, maybe the player's character is getting better at spotting treasures. If you want more loot drops towards the beginning of the sequence, maybe the monsters are getting better at hiding treasures. ** Repeating shuffles :PROPERTIES: :CUSTOM_ID: repeating-shuffles :END: If you have a never-ending selection, such as picking music from your music library, the shuffle approach runs into a problem: you might by chance shuffle the array [A, B, Z] and then next time [Z, B, A], so you play Z twice in a row. Another approach is to use /filtered randomness/, where you generate a sequence but then you remove things that you don't want. https://engineering.atspotify.com/2014/02/how-to-shuffle-songs/ #+begin_example TODO Sampling with and without replacement Hybrid with/out replacement with delay D D = 0 means with replacement D = max means without replacement array { a b c d e f g h } put a partition at array[D] pick random number, put it on the right partition cycle through right partition somehow (it's a circular queue) why not just two arrays? could be easier, except for the initial state at D = 0, D = max, does it become what we want? what are the properties of this? #+end_example * System libraries :PROPERTIES: :CUSTOM_ID: system-libraries :END: - [[https://en.cppreference.com/w/cpp/numeric/random][C++