
Bayes' rule for conditional probability
Bayes' rule is another essential tenet of probability theory in machine learning. It allows us to calculate the conditional probability of one event given another by inverting the order of the conditioning. Bayes' rule is formally written as:

P(A|B) = P(B|A) * P(A) / P(B)
Let's use Bayes' rule to look at a simple conditional probability problem. The following table gives the likelihood of a patient contracting a disease, along with the reliability of a test for it:

                 Disease (1%)    No disease (99%)
Test positive        0.80             0.096
Test negative        0.20             0.904
How do we interpret this table? The columns tell us whether a patient actually has the disease; only 1% of the population falls in the Disease column. Given that condition, the rows give the likelihood of testing positive or negative, depending on whether the patient actually has the disease or not.
Now, let's say that we have a positive test result; what is the chance that we actually have the disease? We can use Bayes' formula to solve for this:

P(disease|positive) = P(positive|disease) * P(disease) / P(positive)
                    = (0.80 * 0.01) / ((0.80 * 0.01) + (0.096 * 0.99))
                    ≈ 0.078

Our answer comes out to roughly 7.8%, the actual probability of having the disease given a positive test; it is far lower than the test's 80% hit rate suggests, because the disease itself is rare.
In the following code, we can see how Bayes' formula models these conditional events based on their likelihoods. In machine learning and AI in general, this comes in handy when modeling situations or classifying objects. Conditional probability problems also play into discriminative models, which we will discuss in our section on Generative adversarial networks:
p_diseasePos = 0.8 ## Chance of a positive result given the disease
p_diseaseNeg = 0.2 ## Chance of a negative result given the disease
p_noPos = 0.096 ## Chance of a positive result given no disease
p_noNeg = 0.904 ## Chance of a negative result given no disease
p_disease = 0.01 ## Prior probability of having the disease (1% of the population)

## Total probability of a positive result, by the law of total probability
p_pos = (p_diseasePos * p_disease) + (p_noPos * (1 - p_disease))

## Bayes' rule: P(disease|positive) = P(positive|disease) * P(disease) / P(positive)
p_disease_given_pos = (p_diseasePos * p_disease) / p_pos
print(p_disease_given_pos) ## ~0.0776, or about 7.8%
Remember: when conducting multiplication, the type of operation matters. We can use the Hadamard product to multiply two equally-sized vectors or matrices where the output will be another equally-sized vector or matrix. We use the dot product in situations where we need a single number as an output. The dot product is essential in machine learning and deep learning; with neural networks, inputs are passed to each layer as a matrix or vector, and these are then multiplied with another matrix of weights, which forms the core of basic network operations.
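To make the distinction concrete, here is a minimal NumPy sketch (NumPy and all of the example values here are assumptions for illustration, not from the original text):

import numpy as np

## Hadamard (element-wise) product: two equally-sized matrices in,
## another equally-sized matrix out
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A * B) ## [[ 5 12] [21 32]]

## Dot product: two vectors in, a single number out
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
print(np.dot(v, w)) ## 32

## A network layer in miniature: hypothetical inputs multiplied by a
## hypothetical weight matrix produce the layer's output vector
x = np.array([0.5, -0.2])
W = np.array([[0.1, 0.4], [0.3, 0.2]])
print(W @ x) ## [-0.03  0.11]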
Probability distributions and the computations based on them rely on Bayesian thinking in the machine learning realm. As we'll see in later chapters, some of the most innovative networks in AI directly rely on these distributions and the core concepts of Bayes' theorem. Recall that there are two primary forms of probability distribution: probability mass functions (PMFs) for discrete variables and probability density functions (PDFs) for continuous variables; cumulative distribution functions (CDFs), which apply to any random variable, also exist.
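As a quick illustration of these three functions (this sketch assumes SciPy; the specific distributions and parameters are examples, not from the original text):

from scipy.stats import binom, norm

## PMF: probability mass at a single value of a discrete variable,
## e.g., exactly 3 heads in 10 fair coin flips
print(binom.pmf(3, n=10, p=0.5)) ## ~0.117

## PDF: density (not a probability) at a point of a continuous variable,
## e.g., a standard normal evaluated at 0
print(norm.pdf(0.0)) ## ~0.399

## CDF: probability of a value at or below x; defined for discrete and
## continuous variables alike
print(binom.cdf(3, n=10, p=0.5)) ## ~0.172
print(norm.cdf(0.0)) ## 0.5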
Bayes' rule, in fact, has inspired an entire branch of statistics known as Bayesian statistics. Thus far, we have discussed frequentist statistics, which measures probability based on an observed space of repeatable events. Bayesian probability, on the other hand, measures degrees of belief: how likely is an event to happen based on the information that is currently available? This will become important as we delve into ANNs in the following chapters.
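As a small sketch of this idea of belief updating (the coin-flip setup and all numbers here are invented for illustration), a Beta-Binomial conjugate update shows how a prior degree of belief is revised as new evidence arrives:

## Uniform prior Beta(1, 1): no initial belief about the coin's bias
prior_a, prior_b = 1, 1

## Hypothetical new evidence: 7 heads and 3 tails in 10 flips
heads, tails = 7, 3

## Conjugate update: the posterior is Beta(prior_a + heads, prior_b + tails)
post_a, post_b = prior_a + heads, prior_b + tails

## Posterior mean: our updated belief in the probability of heads
print(post_a / (post_a + post_b)) ## ~0.667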