Conditional Probability and Bayes' Theorem

Probability and Statistics
Author

Anushka Dhiman

Published

January 11, 2025


So far in probability, we have discussed methods for finding the probabilities of events. If we have two events from the same sample space, does information about the occurrence of one event affect the probability of the other?

Conditional Probability

Let us try to answer this question by taking up a random experiment in which the outcomes are equally likely to occur.

Consider the experiment of tossing three fair coins. The sample space of the experiment is S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}

Since the coins are fair, we can assign the probability \(\frac{1}{8}\) to each sample point. Let \(E\) be the event “at least two heads appear” and \(F\) be the event “first coin shows tail”. Then:

  • \(E = \{HHH, HHT, HTH, THH\}\)

  • \(F = \{THH, THT, TTH, TTT\}\)

Therefore:

  • The probability of \(E\) is the sum of the probabilities of its sample points:

    \[ P(E) = P(\{HHH\}) + P(\{HHT\}) + P(\{HTH\}) + P(\{THH\})\]

    \[= \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{4}{8} = \frac{1}{2} \]

    The reason for this calculation is that each of these outcomes is equally likely, and there are four outcomes where at least two heads appear.

  • The probability of \(F\) is the sum of the probabilities of its sample points:

    \[ P(F) = P(\{THH\}) + P(\{THT\}) + P(\{TTH\}) + P(\{TTT\})\]

    \[= \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{4}{8} = \frac{1}{2} \]

  • The intersection of \(E\) and \(F\), denoted \(E \cap F\), contains only the outcome where the first coin shows a tail and at least two heads appear:

    \[ E \cap F = \{THH\} \]

    Therefore, the probability of \(E \cap F\) is:

    \[ P(E \cap F) = P(\{THH\}) = \frac{1}{8}\]
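These three probabilities can be checked with a short Python sketch that enumerates the sample space directly, using exact fractions rather than floats:

```python
from fractions import Fraction
from itertools import product

# Enumerate the sample space of three fair coin tosses.
S = [''.join(t) for t in product('HT', repeat=3)]

E = {s for s in S if s.count('H') >= 2}   # at least two heads appear
F = {s for s in S if s[0] == 'T'}          # first coin shows tail

p = Fraction(1, len(S))                    # each outcome has probability 1/8
P_E = p * len(E)
P_F = p * len(F)
P_EF = p * len(E & F)

print(P_E, P_F, P_EF)   # 1/2 1/2 1/8
```

Enumerating outcomes like this only works because the sample space is small and every outcome is equally likely, which is exactly the setting of this example.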

Now, suppose we are told that the first coin shows a tail, i.e. F occurs. What then is the probability that E occurs?

Given that F has occurred, the cases in which the first coin does not show a tail should no longer be considered when finding the probability of E. This information reduces our sample space from the set S to its subset F.

In other words, the additional information amounts to saying that the situation may be treated as a new random experiment whose sample space consists only of those outcomes favourable to the occurrence of F.

Among the sample points of F, the only one favourable to E is THH.

Thus, since \(F\) contains four equally likely outcomes and only one of them (THH) is favourable to \(E\), the probability of \(E\) considering \(F\) as the sample space is \(\frac{1}{4}\). This probability of the event \(E\), given that \(F\) has already occurred, is called the conditional probability of \(E\) given \(F\), and is denoted by \(P(E|F)\). Thus,

\[ P(E|F) = \frac{1}{4} \]

Note that the elements of \(F\) which favor the event \(E\) are the common elements of \(E\) and \(F\), i.e., the sample points of \(E \cap F\). Thus, we can also write the conditional probability of \(E\) given that \(F\) has occurred as:

\[ P(E|F) = \frac{\text{Number of elementary events favorable to } E \cap F}{\text{Number of elementary events favorable to } F} \]

Dividing the numerator and the denominator by the total number of elementary events of the sample space, we see that \(P(E|F)\) can also be written as:

\[ P(E|F) = \frac{n(E \cap F)}{n(F)} = \frac{n(E \cap F) / n(S)}{n(F) / n(S)} = \frac{P(E \cap F)}{P(F)} \]

Note that this formula is valid only when \(P(F) \neq 0\), i.e., \(F \neq \phi\). This is because division by zero is undefined, and if \(P(F) = 0\), it means that the event \(F\) is impossible, so it cannot be used as a condition for calculating conditional probability.

Thus, we can define the conditional probability as follows:

If \(E\) and \(F\) are two events associated with the same sample space of a random experiment, the conditional probability of the event \(E\) given that \(F\) has occurred, i.e., \(P(E|F)\), is given by:

\[ P(E|F) = \frac{P(E \cap F)}{P(F)} \]

provided that \(P(F) \neq 0\).
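This definition translates directly into code. The sketch below wraps the formula in a small function and checks it against the three-coin example, where \(P(E \cap F) = \frac{1}{8}\) and \(P(F) = \frac{1}{2}\):

```python
from fractions import Fraction

def conditional_probability(p_e_and_f, p_f):
    """P(E|F) = P(E ∩ F) / P(F), defined only when P(F) != 0."""
    if p_f == 0:
        raise ValueError("P(F) must be nonzero")
    return p_e_and_f / p_f

# Values from the three-coin example above.
print(conditional_probability(Fraction(1, 8), Fraction(1, 2)))  # 1/4
```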

Bayes’ Theorem

Consider that there are two bags I and II. Bag I contains 2 white and 3 red balls and Bag II contains 4 white and 5 red balls.

One ball is drawn at random from one of the bags.

We can find the probability of selecting either bag (i.e. \(\frac{1}{2}\) for each), or the probability of drawing a ball of a particular colour (say, white) from a particular bag (say, Bag I).

In other words, we can find the probability that the ball drawn is of a particular colour, if we are given the bag from which the ball is drawn. But, can we find the probability that the ball drawn is from a particular bag (say Bag II), if the colour of the ball drawn is given?

Here, we have to find the reverse probability: the probability that Bag II was selected, given that we know the outcome (the colour of the ball) of the event that occurred after the selection.

The famous mathematician Thomas Bayes solved the problem of finding such reverse probabilities by using conditional probability. The formula he developed is known as ‘Bayes’ theorem’ and was published posthumously in 1763. Before stating and proving Bayes’ theorem, let us first take up a definition and some preliminary results.

A set of events \(E_1, E_2, \ldots, E_n\) is said to represent a partition of the sample space \(S\) if:

  1. \(E_i \cap E_j = \phi\), for \(i \neq j\), where \(i, j = 1, 2, 3, \ldots, n\). This means that the events are pairwise disjoint.

  2. \(E_1 \cup E_2 \cup \ldots \cup E_n = S\). This means that the events are exhaustive.

  3. \(P(E_i) > 0\) for all \(i = 1, 2, \ldots, n\). This means that each event has a nonzero probability.

In other words, the events \(E_1, E_2, \ldots, E_n\) represent a partition of the sample space \(S\) if they are pairwise disjoint, exhaustive, and have nonzero probabilities.
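For a finite, equally likely sample space, these three conditions can be checked mechanically. In the sketch below, nonemptiness stands in for the nonzero-probability condition, since every outcome has positive probability:

```python
from itertools import combinations

def is_partition(events, sample_space):
    """Check that the events are pairwise disjoint, exhaustive, and nonempty."""
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(events, 2))
    exhaustive = set().union(*events) == set(sample_space)
    nonempty = all(events)
    return pairwise_disjoint and exhaustive and nonempty

# The three-coin sample space, split into "at least two heads" and its complement.
S = {'HHH', 'HHT', 'HTH', 'THH', 'HTT', 'THT', 'TTH', 'TTT'}
E = {s for s in S if s.count('H') >= 2}
print(is_partition([E, S - E], S))  # True
```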

As an example, we see that any event \(E\) with \(0 < P(E) < 1\) and its complement \(E'\) form a partition of the sample space \(S\), since they satisfy \(E \cap E' = \phi\) and \(E \cup E' = S\).

From the Venn diagram, one can easily observe that if \(E\) and \(F\) are any two events associated with a sample space \(S\), then the set \(\{E \cap F', E \cap F, E' \cap F, E' \cap F'\}\) is a partition of the sample space \(S\).

It may be mentioned that the partition of a sample space is not unique. There can be several partitions of the same sample space. We shall now prove a theorem known as Theorem of total probability.

Theorem of Total Probability:

Let \(\{E_1, E_2, \ldots, E_n\}\) be a partition of the sample space \(S\), and suppose that each of the events \(E_1, E_2, \ldots, E_n\) has a nonzero probability of occurrence. Let \(A\) be any event associated with \(S\). Then, the probability of \(A\) can be expressed as:

\[ P(A) = P(E_1)P(A|E_1) + P(E_2)P(A|E_2) + \ldots + P(E_n)P(A|E_n) \]

Alternatively, using summation notation:

\[ P(A) = \sum_{i=1}^{n} P(E_i)P(A|E_i) \]
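Applying this theorem to the two-bag example introduced earlier (Bag I: 2 white, 3 red; Bag II: 4 white, 5 red, each bag chosen with probability \(\frac{1}{2}\)), the total probability of drawing a white ball works out as follows:

```python
from fractions import Fraction

# Bag I: 2 white of 5 balls; Bag II: 4 white of 9 balls.
priors = [Fraction(1, 2), Fraction(1, 2)]       # P(E1), P(E2): which bag is chosen
likelihoods = [Fraction(2, 5), Fraction(4, 9)]  # P(white | bag)

P_white = sum(p * l for p, l in zip(priors, likelihoods))
print(P_white)  # 19/45
```

Here the events "Bag I is chosen" and "Bag II is chosen" form the partition, and the white-ball probabilities from each bag are the conditional probabilities \(P(A|E_i)\).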

We shall now state and prove Bayes’ Theorem.

Bayes’ Theorem:

If \(E_1, E_2, \ldots, E_n\) are \(n\) non-empty events that constitute a partition of the sample space \(S\), i.e., \(E_1, E_2, \ldots, E_n\) are pairwise disjoint and \(E_1 \cup E_2 \cup \ldots \cup E_n = S\), and \(A\) is any event of nonzero probability, then:

\[ P(E_i|A) = \frac{P(E_i)P(A|E_i)}{\sum_{j=1}^{n} P(E_j)P(A|E_j)} \]

for any \(i = 1, 2, 3, \ldots, n\).

Proof:

By the formula of conditional probability, we know that:

\[ P(E_i|A) = \frac{P(A \cap E_i)}{P(A)} \]

Using the multiplication rule of probability, we have:

\[ P(A \cap E_i) = P(E_i)P(A|E_i) \]

Thus,

\[ P(E_i|A) = \frac{P(E_i)P(A|E_i)}{P(A)} \]

By the theorem of total probability, we know that:

\[ P(A) = \sum_{j=1}^{n} P(E_j)P(A|E_j) \]

Substituting this into the expression for \(P(E_i|A)\) gives:

\[ P(E_i|A) = \frac{P(E_i)P(A|E_i)}{\sum_{j=1}^{n} P(E_j)P(A|E_j)} \]
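The formula just derived can be wrapped in a small generic function. The fair-versus-biased-coin demo below is a made-up illustration (priors \(\frac{1}{2}\) each, heads probabilities \(\frac{1}{2}\) and \(\frac{3}{4}\)), not part of the bag example that follows:

```python
from fractions import Fraction

def bayes(priors, likelihoods, i):
    """Posterior P(E_i | A), given priors P(E_j) and likelihoods P(A | E_j)."""
    numerators = [p * l for p, l in zip(priors, likelihoods)]
    return numerators[i] / sum(numerators)

# Hypothetical demo: was the coin biased, given that it landed heads?
posterior_biased = bayes([Fraction(1, 2), Fraction(1, 2)],
                         [Fraction(1, 2), Fraction(3, 4)], 1)
print(posterior_biased)  # 3/5
```

Note that the denominator is exactly the theorem of total probability, so the posteriors over all hypotheses always sum to 1.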

Remark:

The following terminology is generally used when Bayes’ theorem is applied. The events \(E_1, E_2, \ldots, E_n\) are called hypotheses. The probability \(P(E_i)\) is called the a priori probability of the hypothesis \(E_i\). The conditional probability \(P(E_i|A)\) is called the a posteriori probability of the hypothesis \(E_i\). Bayes’ theorem is also called the formula for the probability of “causes”. Since the \(E_i\)’s are a partition of the sample space \(S\), one and only one of the events \(E_i\) occurs (i.e., one of the events \(E_i\) must occur and only one can occur). Hence, the above formula gives us the probability of a particular \(E_i\) (i.e., a “cause”), given that the event \(A\) has occurred. Bayes’ theorem has its applications in a variety of situations, a few of which are illustrated in the following examples.

Let’s consider an example,

Bag I contains 3 red and 4 black balls, while another Bag II contains 5 red and 6 black balls. One ball is drawn at random from one of the bags and it is found to be red. We need to find the probability that it was drawn from Bag II.

Take a shot at solving it yourself first!

The moment of truth: here’s the solution!

Let \(E_1\) be the event of choosing Bag I, \(E_2\) the event of choosing Bag II, and \(A\) be the event of drawing a red ball. Then:

  • \(P(E_1) = P(E_2) = \frac{1}{2}\)
  • \(P(A|E_1) = P(\text{drawing a red ball from Bag I}) = \frac{3}{7}\)
  • \(P(A|E_2) = P(\text{drawing a red ball from Bag II}) = \frac{5}{11}\)

Now, the probability of drawing a ball from Bag II, given that it is red, is \(P(E_2|A)\). By using Bayes’ theorem, we have:

\[ P(E_2|A) = \frac{P(E_2)P(A|E_2)}{P(E_1)P(A|E_1) + P(E_2)P(A|E_2)} \]

Substituting the given values:

\[ P(E_2|A) = \frac{\frac{1}{2} \times \frac{5}{11}}{\frac{1}{2} \times \frac{3}{7} + \frac{1}{2} \times \frac{5}{11}} \]

Simplifying:

\[ P(E_2|A) = \frac{\frac{5}{22}}{\frac{3}{14} + \frac{5}{22}} = \frac{\frac{5}{22}}{\frac{33 + 35}{154}} = \frac{\frac{5}{22}}{\frac{68}{154}} = \frac{5}{22} \times \frac{154}{68} = \frac{5 \times 7}{68} = \frac{35}{68}\]
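The same computation can be verified with exact fractions in Python:

```python
from fractions import Fraction

P_E1 = P_E2 = Fraction(1, 2)
P_A_E1 = Fraction(3, 7)    # red from Bag I: 3 red of 7 balls
P_A_E2 = Fraction(5, 11)   # red from Bag II: 5 red of 11 balls

posterior = (P_E2 * P_A_E2) / (P_E1 * P_A_E1 + P_E2 * P_A_E2)
print(posterior)  # 35/68
```

Using `Fraction` avoids any floating-point rounding, so the result matches the hand calculation exactly.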