A Simple Discrete-Event Simulation: Part 16

So far I’ve been discussing ways of generating arrivals according to a specific schedule. There’s another way to do it, based on the Poisson distribution. From Wikipedia: The Poisson distribution is an appropriate model if the following assumptions are true.

  • K is the number of times an event occurs in an interval and K can take values 0, 1, 2, …
  • The occurrence of one event does not affect the probability that a second event will occur. That is, events occur independently.
  • The rate at which events occur is constant. The rate cannot be higher in some intervals and lower in other intervals.
  • Two events cannot occur at exactly the same instant.
  • The probability of an event in an interval is proportional to the length of the interval.

If these conditions are true, then K is a Poisson random variable, and the distribution of K is a Poisson distribution.

The article gives a more complete history of the concept, but I encountered it when doing reliability engineering for the maintenance of groups of aircraft.

The (normalized, e.g., expressed as a value between 0 and 1) probability of k events occurring in a specified interval is:

P(k events in interval) = λke / k!

where:

  • λ is the average number of events per interval
  • e is the number 2.71828… (Euler’s number) the base of the natural logarithms
  • k takes values 0, 1, 2, …
  • k! = k × (k − 1) × (k − 2) × … × 2 × 1 is the factorial of k.

The sum of the probabilities of every possible result occurring is equal to one. I always find it helpful to show how this works in a table. For a lambda value of 3 we get:

k P(k) Cumulative
0 0.04978706837 0.04978706837
1 0.14936120510 0.19914827347
2 0.22404180766 0.42319008113
3 0.22404180766 0.64723188878
4 0.16803135574 0.81526324452
5 0.10081881344 0.91608205797
6 0.05040940672 0.96649146469
7 0.02160403145 0.98809549614
8 0.00810151179 0.99619700794
9 0.00270050393 0.99889751187
10 0.00081015118 0.99970766305
11 0.00022095032 0.99992861337
12 0.00005523758 0.99998385095
13 0.00001274713 0.99999659809
14 0.00000273153 0.99999932961
15 0.00000054631 0.99999987592
16 0.00000010243 0.99999997835
17 0.00000001808 0.99999999643
18 0.00000000301 0.99999999944
19 0.00000000048 0.99999999992
20 0.00000000007 0.99999999999
21 0.00000000001 1.00000000000

The series is formally infinite, but there are practical limits to how many rows you might want to use. This could be determined by the maximum results you might want to allow, the historical sample size (the total number of NFL games played is roughly 15,000, suggesting that we shouldn’t go beyond five 9s of accuracy), or the granularity of the random number generator. It may also be possible to set the upper bounds based on a known historical maximum of occurrences.

So how do we use this animal, anyway?

If we look at the table above, built from an average rate of occurrence of 3, we basically roll a die (generate a random number between zero and one (non-inclusive). The result is the determined by the first value in column 3 (the cumulative column) that is larger than the result of the die roll. For example, if the random number generated is 0.01, then the result is zero occurrences during the interval (0.01 < 0.04978706837). If the random number is 0.5, the result is 3 occurrences (0.5 < 0.64723188878). As you can see, the majority of results will be clustered close to the average rate, with fewer below, and a long tail of more arrivals happening far less often.

This makes sense if you think about it. A shopping center might serve a typical number of customers on most days, but very large numbers of customers on a few days (say, around Christmas and the day after Thanksgiving). Ports of entry on the southern border of the U.S. see very high traffic around major Mexican holidays. A major league baseball player usually hits zero home runs per game, might hit one, two or three will be increasingly rare, and only a few players have ever hit four in a single game. No player has ever hit five.

To use these concepts in code we have to do a couple of things. First, we have to build a lookup table before we run the simulation. This requires that we actually calculate values for the Poisson function probabilities. The Poisson function itself requires a function to calculate factorials.

The function uses the above to return an array representing column three of the table shown above.

This function uses a given cumulation array to generate a whole number result based on an input value between zero and one, which is usually generated randomly.

Call the functions and generate a result.

Note that if we use this form we cannot control the number of occurrences that are actually generated. Thus we might not want to use this method for generating arrivals according to a schedule. Alternatively, we could break an arrival interval into sub-intervals and generate individual Poisson arrival rates for each interval. That would allow for some variability with the majority of results close to the expected value over the entire interval.

The point of introducing randomness into simulations is to explore the range of results the modeled system might generate under different conditions. Very subtle variations can have major effects on the output. The randomness is what makes something a Monte Carlo simulation.

This entry was posted in Simulation and tagged , , , . Bookmark the permalink.

Leave a Reply