Suppose we have a collection of $N$ objects which can be classified into two distinct categories. Denote the categories 'success' ($S$) and 'failure' ($F$).
Suppose that within these objects there exist $K$ of type $S$ and therefore $N - K$ of type $F$. We choose $n$ objects without replacement, that is, we remove items in succession from the original set of $N$. Let $X$ be a random variable representing the number of successes obtained. Then $X$ has a hypergeometric distribution.
We require three values. Refer to the primer on combinations for a discussion on counting rules and notation.
First, we require the total possible number of ways that $n$ distinct items can be chosen from $N$ without replacement and where order does not matter. This is '$N$ choose $n$',
$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$
Second, we wish to know the number of ways that $k$ successes can be drawn from a total of $K$,
$$\binom{K}{k}$$
Third, we are left to select $n - k$ failures from a total of $N - K$,
$$\binom{N-K}{n-k}$$
Then the probability of selecting $k$ successes is
$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$
A deck of playing cards contains 4 suits (hearts, spades, diamonds, clubs) for each of 13 types {Ace, 2, 3, …, 10, Jack, Queen, King}. Deal 5 cards from the deck. The probability of selecting up to four Aces from the deck follows a hypergeometric distribution with $N = 52$, $K = 4$, and $n = 5$:
$$P(X = k) = \frac{\binom{4}{k}\binom{48}{5-k}}{\binom{52}{5}}, \qquad k = 0, 1, \ldots, 4$$
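This example is easy to reproduce with scipy; the sketch below is not part of the original text, and simply plugs in the deck values $N = 52$, $K = 4$, $n = 5$ used above.

```python
# Sketch: probabilities of k aces in a 5-card hand via the hypergeometric distribution.
from scipy.stats import hypergeom

N, K, n = 52, 4, 5
rv = hypergeom(N, K, n)   # scipy argument order: population size, successes in population, draws

for k in range(5):
    print(f"P(X = {k}) = {rv.pmf(k):.6f}")
print("sum =", sum(rv.pmf(k) for k in range(5)))  # the five probabilities sum to 1
```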
Consider an experiment in which we have two distinct types of outcomes that can be categorized as 'success' ($S$) or 'failure' ($F$).
Suppose the probability of success is $p$ and of failure is $q = 1 - p$. Repeat the experiment $n$ independent times. If $X$ is the number of successes then it has a binomial distribution denoted $X \sim \text{Bin}(n, p)$.
We require three values. Refer to the primer on combinations for a discussion on counting rules and notation.
First, we require the total possible number of ways that $k$ successes can be arranged within $n$ experiments, which is
$$\binom{n}{k}$$
Then the probability of each arrangement is $p$ multiplied $k$ times for the successes and likewise $q = 1 - p$ multiplied $n - k$ times for the failures,
$$p^k q^{n-k}$$
Therefore,
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
The mean of a binomial distribution of sample size $n$ and probability $p$ is
$$E[X] = np$$
The variance is
$$\text{Var}(X) = np(1-p)$$
The key difference between the hypergeometric and the binomial distribution is that the hypergeometric involves the probability of an event when selection is made without replacement. In other words, the hypergeometric setup assumes some dependence amongst the selection of successes and failures. For example, choosing an ace from a deck and removing it reduces the probability of selecting a remaining ace. In contrast, the binomial distribution assumes independence and can be viewed as appropriate when event selection is made with replacement.
There are limiting cases where the hypergeometric can be approximated by the binomial. Consider the hypergeometric case where the total number of possible successes $K$ and failures $N - K$ is large compared to the number of selections $n$. Then the probability of success does not change appreciably upon selection without replacement. That is, for $N \gg n$ and $p = K/N$,
$$P(X = k) \approx \binom{n}{k} p^k (1-p)^{n-k}$$
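A short numerical sketch of this limit (the population and sample sizes below are illustrative, not from the text): with a population much larger than the sample, the hypergeometric and binomial pmfs nearly coincide.

```python
# Sketch: hypergeometric pmf vs. its binomial approximation with p = K/N.
from scipy.stats import hypergeom, binom

N, K, n = 10_000, 2_000, 10   # large population relative to the sample
p = K / N

for k in range(n + 1):
    hg = hypergeom(N, K, n).pmf(k)
    bi = binom(n, p).pmf(k)
    print(f"k={k:2d}  hypergeometric={hg:.5f}  binomial={bi:.5f}")
```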
Toss a fair die 10 times and let $X$ be the number of sixes; then $X \sim \text{Bin}(10, 1/6)$.
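The die example can be checked directly; this is a sketch using scipy, and the printed quantities follow from the formulas above.

```python
# Sketch: number of sixes in 10 tosses of a fair die, X ~ Bin(10, 1/6).
from scipy.stats import binom

n, p = 10, 1 / 6
X = binom(n, p)

print("P(X = 2) =", X.pmf(2))   # probability of exactly two sixes
print("mean     =", X.mean())   # n * p
print("variance =", X.var())    # n * p * (1 - p)
```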
Consider a limiting case of the binomial distribution as $n \to \infty$ and $p \to 0$ but $\lambda = np$ is fixed. This means that the event of interest is relatively rare. Then $X$ has a Poisson distribution $X \sim \text{Pois}(\lambda)$.
Since $\lambda = np$, then $p = \lambda/n$ and
$$P(X = k) = \lim_{n \to \infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^{k} \left(1 - \frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^k e^{-\lambda}}{k!}$$
The mean of a Poisson distribution arising from sampling size $n$ and probability $p$ is
$$E[X] = np = \lambda$$
The variance is also $\lambda$, since
$$\text{Var}(X) = np(1-p) \to np = \lambda \quad \text{as } p \to 0$$
Suppose that 200 people are at a party. What is the probability that 2 of them were born on December 25th? In this case $n = 200$ and, assuming birthdays are independent, $p = 1/365$, so the mean is
$$\lambda = np = \frac{200}{365} \approx 0.548$$
and
$$P(X = 2) = \frac{\lambda^2 e^{-\lambda}}{2!} \approx 0.087$$
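The same answer, and the exact binomial it approximates, can be sketched with scipy; the party numbers are the ones used above.

```python
# Sketch: Poisson approximation to the birthday question, n = 200, p = 1/365.
from scipy.stats import binom, poisson

n, p = 200, 1 / 365
lam = n * p                                         # ~0.548

print("Poisson  P(X = 2) =", poisson(lam).pmf(2))   # ~0.087
print("Binomial P(X = 2) =", binom(n, p).pmf(2))    # nearly identical
```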
Definition. The gamma function of $\alpha$ is
$$\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\, dx$$
There are two nice properties of the gamma function that we will use:
$$\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha)$$
and, for a positive integer $n$,
$$\Gamma(n) = (n-1)!$$
Let $X$ be a non-negative continuous random variable. Then if the probability density function is of the form
$$f(x) = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}}\, x^{\alpha - 1} e^{-x/\theta}, \qquad x \ge 0$$
then $X$ has a gamma distribution $X \sim \text{Gamma}(\alpha, \theta)$. Typically, $\alpha$ is called the 'shape' parameter and $\theta$ the 'scale' (its inverse $1/\theta$ is often called the 'rate').
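A brief sketch of these pieces (the shape and scale values are illustrative): scipy's gamma distribution takes the shape as `a` and the scale as $\theta$.

```python
# Sketch: the two gamma-function properties and the Gamma(alpha, theta) density.
from math import factorial
from scipy.special import gamma as gamma_fn
from scipy.stats import gamma

print(gamma_fn(5), factorial(4))            # Gamma(n) = (n-1)!  ->  24.0  24
print(gamma_fn(5.5), 4.5 * gamma_fn(4.5))   # Gamma(a+1) = a * Gamma(a)

alpha, theta = 2.0, 3.0
rv = gamma(a=alpha, scale=theta)
print(rv.pdf(1.0))                          # density f(x) at x = 1
print(rv.mean(), rv.var())                  # alpha*theta and alpha*theta**2
```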
The setup is very similar to the binomial. Consider an experiment in which we have two distinct types of outcomes that can be categorized as 'success' ($S$) or 'failure' ($F$).
Suppose the probability of success is $p$ and of failure is $1 - p$. Repeat the experiment until a pre-specified number $r$ of failures has been obtained. Let $X$ be the number of successes before the $r$-th failure. Then $X$ has a negative binomial distribution denoted $X \sim \text{NB}(r, p)$.
There will be $k + r$ total trials, but the last event is a failure, so we really care about the arrangement of the first $k + r - 1$ trials. There will be $k$ successes and $r - 1$ failures in any order, giving $\binom{k + r - 1}{k}$ arrangements. Each arrangement has a probability identical to a binomial trial, $p^k (1-p)^r$ (including the final failure). Therefore,
$$P(X = k) = \binom{k + r - 1}{k}\, p^k (1-p)^r$$
The mean of a negative binomial distribution is
$$E[X] = \frac{rp}{1-p}$$
The variance is
$$\text{Var}(X) = \frac{rp}{(1-p)^2}$$
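scipy can verify these moments, with one caveat worth flagging (this is a sketch with assumed example values): scipy's `nbinom` counts failures before the $n$-th success, so the roles of success and failure are swapped relative to the convention above.

```python
# Sketch: X = number of successes before the r-th failure, success prob p.
# In scipy's convention this is nbinom(n=r, p=1-p).
from scipy.stats import nbinom

r, p = 5, 0.3
X = nbinom(r, 1 - p)

print("P(X = 2) =", X.pmf(2))
print("mean     =", X.mean())   # r * p / (1 - p)
print("variance =", X.var())    # r * p / (1 - p)**2
```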
Count data such as RNA sequencing mapped reads is often modeled with a Poisson distribution where the mean and variance are both equal to $\lambda$. However, there are cases where the variance exceeds that specified by the mean. To account for this 'overdispersed' data, the negative binomial distribution can be utilized. As we will show below, the negative binomial arises as a Poisson distribution where the Poisson parameter is itself a random variable distributed according to a gamma distribution.
Let us state this in a more precise fashion. Suppose that we have a distribution of counts $K$ that follows a Poisson distribution indexed by the parameter $\lambda$. Now suppose that $\lambda$ is itself the realization of another random variable $\Lambda$. Then the conditional distribution of the random variable of counts is
$$P(K = k \mid \Lambda = \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$$
Let $\Lambda$ follow a gamma distribution with shape $\alpha$ and scale $\theta$,
$$f(\lambda) = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}}\, \lambda^{\alpha - 1} e^{-\lambda/\theta}$$
The joint density of $K$ and $\Lambda$ is
$$f(k, \lambda) = P(K = k \mid \Lambda = \lambda)\, f(\lambda) = \frac{\lambda^k e^{-\lambda}}{k!} \cdot \frac{\lambda^{\alpha - 1} e^{-\lambda/\theta}}{\Gamma(\alpha)\,\theta^{\alpha}}$$
Derive the marginal distribution of $K$ by integrating over the values of $\lambda$,
$$P(K = k) = \int_0^\infty f(k, \lambda)\, d\lambda = \frac{1}{k!\,\Gamma(\alpha)\,\theta^{\alpha}} \int_0^\infty \lambda^{k + \alpha - 1} e^{-\lambda(1 + 1/\theta)}\, d\lambda$$
The key here is to transform the integrand into a gamma distribution with shape parameter $k + \alpha$ and scale $\theta/(1 + \theta)$, noting that the integral of a density over all values is unity. This yields
$$P(K = k) = \frac{\Gamma(k + \alpha)}{k!\,\Gamma(\alpha)} \left(\frac{\theta}{1 + \theta}\right)^{k} \left(\frac{1}{1 + \theta}\right)^{\alpha}$$
It is simple to see that this result is the negative binomial with $r = \alpha$ and $p = \theta/(1 + \theta)$. In this case the moments can be stated using these new variables.
From the moments of the negative binomial stated above, the mean is
$$E[K] = \frac{rp}{1-p} = \alpha\theta$$
The variance is
$$\text{Var}(K) = \frac{rp}{(1-p)^2} = \alpha\theta(1 + \theta)$$
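A simulation sketch of the mixture (the shape and scale values are illustrative): draw $\lambda$ from a gamma, draw counts from a Poisson with that $\lambda$, and compare against the closed-form negative binomial.

```python
# Sketch: Gamma-Poisson mixture vs. negative binomial with
# r = alpha and p = theta / (1 + theta).
import numpy as np
from scipy.stats import gamma, poisson, nbinom

rng = np.random.default_rng(0)
alpha, theta = 2.0, 3.0

lam = gamma(a=alpha, scale=theta).rvs(size=200_000, random_state=rng)
k = poisson(lam).rvs(random_state=rng)   # one count per sampled lambda

r, p = alpha, theta / (1 + theta)
X = nbinom(r, 1 - p)                     # scipy parameterization, as noted earlier

for j in range(5):
    print(j, round((k == j).mean(), 4), round(X.pmf(j), 4))
print("sample mean/var:", k.mean(), k.var())
print("theory mean/var:", alpha * theta, alpha * theta * (1 + theta))
```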
This mixture model will be important in our discussion of RNA sequencing data differential expression testing. In this case the notation is altered: the mean is written $\mu_i$ and $\phi_i$ (corresponding to $1/\alpha$ here) is the 'dispersion' parameter for the counts of an RNA species $i$. Also, since $r = \alpha$ need not be an integer, the gamma function is used to replace the binomial coefficients,
$$\binom{k + r - 1}{k} = \frac{\Gamma(k + r)}{k!\,\Gamma(r)}$$
From the above discussion, we can restate the mean and variance,
$$E[K_i] = \mu_i, \qquad \text{Var}(K_i) = \mu_i + \phi_i \mu_i^2$$
Note that as the dispersion parameter approaches zero the negative binomial variance approaches the mean. Thus the dispersion parameter accounts for the extra variability over and above that expected with a Poisson.
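A final sketch in the mean/dispersion parameterization (the values of $\mu$ and $\phi$ are hypothetical, chosen only to illustrate the variance formula):

```python
# Sketch: negative binomial in mean/dispersion form,
# mu = alpha * theta and phi = 1 / alpha, so Var = mu + phi * mu**2.
from scipy.stats import nbinom

mu, phi = 50.0, 0.1            # hypothetical counts for one RNA species
r = 1 / phi                    # gamma shape alpha
p = mu / (mu + r)              # equals theta / (1 + theta)

X = nbinom(r, 1 - p)
print("mean     =", X.mean())  # mu
print("variance =", X.var())   # mu + phi * mu**2 = 300.0
```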