Bayes Exercises

Updated on Oct 10, 2024

These are attempts at finding more diverse problems that make use of Bayes’ theorem.

Simple Questions with Solutions

Exercise 1: Email Spam Filter

An email service uses a spam filter that flags emails containing the word “offer” as potential spam. The following statistics are known:

5% of all emails are spam.
70% of spam emails contain the word “offer”.
10% of legitimate emails contain the word “offer”.

Question: If an email contains the word “offer”, what is the probability that it is spam?

Solution:

Let:

( S ) = Email is spam.
( O ) = Email contains the word “offer”.

We need to find ( P(S|O) ).

Step 1: Write down the known probabilities.

( P(S) = 0.05 )
( P(\overline{S}) = 0.95 ) (Email is not spam)
( P(O|S) = 0.70 )
( P(O|\overline{S}) = 0.10 )

Step 2: Calculate ( P(O) ) using the Law of Total Probability.

[ P(O) = P(O|S)P(S) + P(O|\overline{S})P(\overline{S}) = (0.70)(0.05) + (0.10)(0.95) = 0.035 + 0.095 = 0.13 ]

Step 3: Apply Bayes’ Theorem.

[ P(S|O) = \frac{P(O|S)P(S)}{P(O)} = \frac{(0.70)(0.05)}{0.13} = \frac{0.035}{0.13} \approx 0.2692 ]

Answer: Approximately 26.92% chance the email is spam.
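Steps 2 and 3 are identical in every two-hypothesis exercise on this sheet, so they can be wrapped in a small helper. Below is a minimal Python sketch. The function name `bayes_posterior` is our own and not part of any library.

```python
def bayes_posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) for a binary hypothesis H and evidence E.

    The denominator comes from the Law of Total Probability:
    P(E) = P(E|H)P(H) + P(E|not H)P(not H).
    """
    evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / evidence


# Exercise 1: P(S) = 0.05, P(O|S) = 0.70, P(O|not S) = 0.10
p_spam_given_offer = bayes_posterior(0.05, 0.70, 0.10)
print(f"P(S|O) = {p_spam_given_offer:.4f}")  # prints P(S|O) = 0.2692
```

The same call, with the appropriate prior and rates, reproduces most of the solved exercises below.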

Exercise 2: Medical Diagnosis

A certain disease is present in 1% of the population. A test for the disease has:

True positive rate (sensitivity): 99%.
False positive rate: 2%.

Question: If a person tests positive, what is the probability they actually have the disease?

Solution:

Let:

( D ) = Person has the disease.
( T ) = Test is positive.

We need to find ( P(D|T) ).

Step 1: Write down the known probabilities.

( P(D) = 0.01 )
( P(\overline{D}) = 0.99 )
( P(T|D) = 0.99 )
( P(T|\overline{D}) = 0.02 )

Step 2: Calculate ( P(T) ).

[ P(T) = P(T|D)P(D) + P(T|\overline{D})P(\overline{D}) = (0.99)(0.01) + (0.02)(0.99) = 0.0099 + 0.0198 = 0.0297 ]

Step 3: Apply Bayes’ Theorem.

[ P(D|T) = \frac{P(T|D)P(D)}{P(T)} = \frac{(0.99)(0.01)}{0.0297} = \frac{0.0099}{0.0297} \approx 0.3333 ]

Answer: Approximately 33.33% chance the person has the disease.
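The one-in-three answer is easier to believe when it is expressed as counts instead of probabilities. This sketch imagines a hypothetical group of 10,000 people. The group size is arbitrary and cancels out.

```python
# Natural-frequency view of Exercise 2 for a hypothetical group of 10,000 people.
population = 10_000
prevalence = 0.01            # P(D)
sensitivity = 0.99           # P(T|D)
false_positive_rate = 0.02   # P(T|not D)

sick = population * prevalence                    # about 100 people have the disease
healthy = population - sick                       # about 9,900 do not
true_positives = sick * sensitivity               # about 99 sick people test positive
false_positives = healthy * false_positive_rate   # about 198 healthy people test positive

p_disease_given_positive = true_positives / (true_positives + false_positives)
print(f"{true_positives:.0f} of {true_positives + false_positives:.0f} positives are sick, "
      f"so P(D|T) = {p_disease_given_positive:.4f}")
```

Healthy people outnumber sick people 99 to 1. As a result, even a 2% false-positive rate produces twice as many false alarms as true detections.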

Exercise 3: Faulty Components in Manufacturing

A factory sources components from two suppliers:

Supplier A provides 60% of components with a defect rate of 1%.
Supplier B provides 40% of components with a defect rate of 3%.

Question: If a randomly selected component is defective, what is the probability it came from Supplier B?

Solution:

Let:

( B ) = Component is from Supplier B.
( D ) = Component is defective.

We need to find ( P(B|D) ).

Step 1: Write down the known probabilities.

( P(B) = 0.40 )
( P(A) = 0.60 )
( P(D|B) = 0.03 )
( P(D|A) = 0.01 )

Step 2: Calculate ( P(D) ).

[ P(D) = P(D|A)P(A) + P(D|B)P(B) = (0.01)(0.60) + (0.03)(0.40) = 0.006 + 0.012 = 0.018 ]

Step 3: Apply Bayes’ Theorem.

[ P(B|D) = \frac{P(D|B)P(B)}{P(D)} = \frac{(0.03)(0.40)}{0.018} = \frac{0.012}{0.018} \approx 0.6667 ]

Answer: Approximately 66.67% chance the defective component is from Supplier B.
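The two suppliers partition all components, so the same calculation extends to any number of sources: multiply each prior by its likelihood, then normalize by the sum. Here is a sketch with illustrative names. It works unchanged if a third supplier is added to both dictionaries.

```python
def posterior_over_sources(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    """Bayes' theorem over several mutually exclusive, exhaustive sources.

    priors[k] is P(k) and likelihoods[k] is P(D | k).
    """
    joint = {k: priors[k] * likelihoods[k] for k in priors}  # P(k and D)
    evidence = sum(joint.values())                            # P(D), Law of Total Probability
    return {k: p / evidence for k, p in joint.items()}


posterior = posterior_over_sources(
    priors={"A": 0.60, "B": 0.40},
    likelihoods={"A": 0.01, "B": 0.03},
)
print({k: round(v, 4) for k, v in posterior.items()})
```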

Exercise 4: Network Intrusion Detection

In a computer network:

The probability of an intrusion attempt at any time is 0.2%.
An intrusion detection system (IDS) correctly detects an intrusion 98% of the time.
The IDS has a false alarm rate of 1%.

Question: If the IDS signals an intrusion, what is the probability that an intrusion is actually occurring?

Solution:

Let:

( I ) = Intrusion occurring.
( A ) = IDS signals an intrusion.

We need to find ( P(I|A) ).

Step 1: Write down the known probabilities.

( P(I) = 0.002 )
( P(\overline{I}) = 0.998 )
( P(A|I) = 0.98 )
( P(A|\overline{I}) = 0.01 )

Step 2: Calculate ( P(A) ).

[ P(A) = P(A|I)P(I) + P(A|\overline{I})P(\overline{I}) = (0.98)(0.002) + (0.01)(0.998) = 0.00196 + 0.00998 = 0.01194 ]

Step 3: Apply Bayes’ Theorem.

[ P(I|A) = \frac{P(A|I)P(I)}{P(A)} = \frac{(0.98)(0.002)}{0.01194} = \frac{0.00196}{0.01194} \approx 0.1642 ]

Answer: Approximately 16.42% chance an intrusion is actually occurring.
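A Monte Carlo simulation is a useful sanity check on why the answer is only about 16%: most alarms come from the 99.8% of moments when no intrusion is happening. The sketch below treats each moment as an independent trial. That is our modeling assumption, not something stated in the exercise. The estimate should land close to the exact value, with some sampling noise.

```python
import random


def simulate_ids(n: int, p_intrusion: float, detect: float, false_alarm: float, seed: int = 0) -> float:
    """Estimate P(intrusion | alarm) by simulating n independent time slots."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    alarms = 0
    true_alarms = 0
    for _ in range(n):
        intrusion = rng.random() < p_intrusion
        # The alarm probability depends on whether an intrusion is really happening.
        alarm = rng.random() < (detect if intrusion else false_alarm)
        if alarm:
            alarms += 1
            true_alarms += int(intrusion)
    return true_alarms / alarms


estimate = simulate_ids(1_000_000, p_intrusion=0.002, detect=0.98, false_alarm=0.01)
print(f"simulated P(I|A) = {estimate:.3f}, exact = {0.00196 / 0.01194:.4f}")
```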

Exercise 5: Credit Card Fraud Detection

A bank uses an algorithm to detect fraudulent transactions:

0.1% of all transactions are fraudulent.
The algorithm correctly identifies fraudulent transactions 99% of the time.
It incorrectly flags legitimate transactions 0.5% of the time.

Question: If a transaction is flagged as fraudulent, what is the probability it is actually fraudulent?

Solution:

Let:

( F ) = Transaction is fraudulent.
( L ) = Transaction is flagged.

We need to find ( P(F|L) ).

Step 1: Write down the known probabilities.

( P(F) = 0.001 )
( P(\overline{F}) = 0.999 )
( P(L|F) = 0.99 )
( P(L|\overline{F}) = 0.005 )

Step 2: Calculate ( P(L) ).

[ P(L) = P(L|F)P(F) + P(L|\overline{F})P(\overline{F}) = (0.99)(0.001) + (0.005)(0.999) = 0.00099 + 0.004995 = 0.005985 ]

Step 3: Apply Bayes’ Theorem.

[ P(F|L) = \frac{P(L|F)P(F)}{P(L)} = \frac{(0.99)(0.001)}{0.005985} = \frac{0.00099}{0.005985} \approx 0.1654 ]

Answer: Approximately 16.54% chance the transaction is actually fraudulent.
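The odds form of Bayes' theorem reaches the same result and makes it clear why the answer stays low. A flag multiplies the odds of fraud by the likelihood ratio 0.99 / 0.005 = 198, but the prior odds are only 1 to 999. A sketch:

```python
def posterior_via_odds(prior: float, p_flag_given_fraud: float, p_flag_given_legit: float) -> float:
    """Bayes' theorem in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)                          # 0.001 / 0.999
    likelihood_ratio = p_flag_given_fraud / p_flag_given_legit  # 0.99 / 0.005 = 198
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)              # convert odds back to a probability


p = posterior_via_odds(0.001, 0.99, 0.005)
print(f"P(F|L) = {p:.4f}")  # prints P(F|L) = 0.1654
```

The posterior odds are 198/999, or roughly 1 to 5, which matches the 16.54% above.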

Hard Questions with Solutions

Exercise 6: Machine Learning Classifier Performance

A machine learning model classifies images as “cat” or “not cat”. The following data is known:

20% of all images are actually of cats.
The model correctly identifies cat images 95% of the time.
It incorrectly labels non-cat images as “cat” 10% of the time.

Question: If the model labels an image as “cat”, what is the probability that it actually is a cat?

Solution:

Let:

( C ) = Image is of a cat.
( L ) = Model labels image as “cat”.

We need to find ( P(C|L) ).

Step 1: Write down the known probabilities.

( P(C) = 0.20 )
( P(\overline{C}) = 0.80 )
( P(L|C) = 0.95 )
( P(L|\overline{C}) = 0.10 )

Step 2: Calculate ( P(L) ).

[ P(L) = P(L|C)P(C) + P(L|\overline{C})P(\overline{C}) = (0.95)(0.20) + (0.10)(0.80) = 0.19 + 0.08 = 0.27 ]

Step 3: Apply Bayes’ Theorem.

[ P(C|L) = \frac{P(L|C)P(C)}{P(L)} = \frac{(0.95)(0.20)}{0.27} = \frac{0.19}{0.27} \approx 0.7037 ]

Answer: Approximately 70.37% chance the image actually is a cat.
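In machine-learning terms, ( P(C|L) ) is the classifier's precision and ( P(L|C) ) is its recall. Writing out the expected confusion-matrix entries, as fractions of all images, shows the correspondence. A sketch:

```python
# Expected confusion-matrix entries per image, using the rates from Exercise 6.
p_cat = 0.20
tpr = 0.95   # P(L | C)
fpr = 0.10   # P(L | not C)

true_pos = tpr * p_cat               # cat, labelled "cat"
false_neg = (1 - tpr) * p_cat        # cat, labelled "not cat"
false_pos = fpr * (1 - p_cat)        # not cat, labelled "cat"
true_neg = (1 - fpr) * (1 - p_cat)   # not cat, labelled "not cat"

precision = true_pos / (true_pos + false_pos)  # equals P(C | L)
recall = true_pos / (true_pos + false_neg)     # equals P(L | C)
print(f"precision = {precision:.4f}, recall = {recall:.4f}")
```

The model has a high recall of 0.95, yet its precision is only about 0.70 because non-cat images are four times as common as cat images.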

Exercise 7: Software Bug Detection

In a software system:

5% of modules contain a bug.
Static code analysis detects bugs in buggy modules 90% of the time.
It incorrectly reports bugs in clean modules 2% of the time.

Question: If a module is flagged by the analysis, what is the probability it actually contains a bug?

Solution:

Let:

( B ) = Module contains a bug.
( F ) = Module is flagged.

We need to find ( P(B|F) ).

Step 1: Write down the known probabilities.

( P(B) = 0.05 )
( P(\overline{B}) = 0.95 )
( P(F|B) = 0.90 )
( P(F|\overline{B}) = 0.02 )

Step 2: Calculate ( P(F) ).

[ P(F) = P(F|B)P(B) + P(F|\overline{B})P(\overline{B}) = (0.90)(0.05) + (0.02)(0.95) = 0.045 + 0.019 = 0.064 ]

Step 3: Apply Bayes’ Theorem.

[ P(B|F) = \frac{P(F|B)P(B)}{P(F)} = \frac{(0.90)(0.05)}{0.064} = \frac{0.045}{0.064} \approx 0.7031 ]

Answer: Approximately 70.31% chance the module contains a bug.
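The 70% figure depends heavily on the 5% bug rate. A short sweep over a few hypothetical bug rates shows how trustworthy the same analyser's flags are in codebases of different quality. The 1% and 20% values are illustrative and not part of the exercise.

```python
def bayes_posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H | E) for a binary hypothesis, via the Law of Total Probability."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / evidence


# Same analyser (90% detection, 2% false reports), different hypothetical bug rates.
for bug_rate in (0.01, 0.05, 0.20):
    print(f"P(B) = {bug_rate:.2f}: P(B|F) = {bayes_posterior(bug_rate, 0.90, 0.02):.4f}")
```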

Exercise 8: Legal Evidence Evaluation

In a court case:

1 in 1,000 people match a certain DNA profile.
The DNA evidence shows a match with the suspect.
The lab has an error rate of 1% (false matches).

Question: What is the probability that the suspect is actually the source of the DNA?

Solution:

Let:

( G ) = Suspect is the source of the DNA.
( M ) = DNA test shows a match.

We need to find ( P(G|M) ).

Step 1: Write down the known probabilities.

( P(G) ) is not given directly. The profile frequency (1 in 1,000) describes how common the profile is, not how likely this particular suspect is to be the source, so a prior has to be assumed. Here we assume that, before the DNA evidence is considered, the suspect is one of about 1,000 equally plausible sources:
( P(G) = \frac{1}{1000} = 0.001 )
( P(\overline{G}) = 0.999 )
( P(M|G) = 0.99 ) (assuming the 1% lab error rate also applies to missed matches)
( P(M|\overline{G}) = 0.01 ) (the 1% false-match rate; coincidental matches by other people who share the profile are ignored for simplicity)

Step 2: Calculate ( P(M) ).

[ P(M) = P(M|G)P(G) + P(M|\overline{G})P(\overline{G}) = (0.99)(0.001) + (0.01)(0.999) = 0.00099 + 0.00999 = 0.01098 ]

Step 3: Apply Bayes’ Theorem.

[ P(G|M) = \frac{P(M|G)P(G)}{P(M)} = \frac{(0.99)(0.001)}{0.01098} = \frac{0.00099}{0.01098} \approx 0.09 ]

Answer: Approximately 9% chance the suspect is the source of the DNA.
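Rounding to "≈ 0.09" hides the exact value. Python's standard `fractions.Fraction` type carries the calculation through without any rounding. The sketch uses the same assumed inputs as Step 1.

```python
from fractions import Fraction

# Exact rational arithmetic: no rounding in the intermediate steps.
p_g = Fraction(1, 1000)             # P(G), the assumed prior
p_m_given_g = Fraction(99, 100)     # P(M | G)
p_m_given_not_g = Fraction(1, 100)  # P(M | not G)

p_m = p_m_given_g * p_g + p_m_given_not_g * (1 - p_g)
p_g_given_m = p_m_given_g * p_g / p_m
print(p_m, p_g_given_m, float(p_g_given_m))
```

The exact posterior is 11/122, or about 0.0902.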

Exercise 9: Financial Risk Assessment

A bank assesses loan applicants for default risk:

2% of applicants are high risk (will default).
The risk assessment tool correctly identifies high-risk applicants 95% of the time.
It incorrectly labels low-risk applicants as high risk 5% of the time.

Question: If an applicant is identified as high risk, what is the probability they will default?

Solution:

Let:

( D ) = Applicant will default.
( H ) = Applicant identified as high risk.

We need to find ( P(D|H) ).

Step 1: Write down the known probabilities.

( P(D) = 0.02 )
( P(\overline{D}) = 0.98 )
( P(H|D) = 0.95 )
( P(H|\overline{D}) = 0.05 )

Step 2: Calculate ( P(H) ).

[ P(H) = P(H|D)P(D) + P(H|\overline{D})P(\overline{D}) = (0.95)(0.02) + (0.05)(0.98) = 0.019 + 0.049 = 0.068 ]

Step 3: Apply Bayes’ Theorem.

[ P(D|H) = \frac{P(H|D)P(D)}{P(H)} = \frac{(0.95)(0.02)}{0.068} = \frac{0.019}{0.068} \approx 0.2794 ]

Answer: Approximately 27.94% chance the applicant will default.
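Steps 2 and 3 can also be viewed as conditioning by enumeration. First write out the full joint distribution of ( D ) and ( H ). Then keep only the entries consistent with the evidence and renormalize. This approach scales to problems with more variables, at the cost of listing every combination. A sketch:

```python
from itertools import product

P_DEFAULT = 0.02
P_FLAG_GIVEN_DEFAULT = 0.95
P_FLAG_GIVEN_NO_DEFAULT = 0.05

# Full joint distribution P(D = d, H = h) for every combination of the two events.
joint = {}
for d, h in product((True, False), repeat=2):
    p_d = P_DEFAULT if d else 1 - P_DEFAULT
    p_h_given_d = P_FLAG_GIVEN_DEFAULT if d else P_FLAG_GIVEN_NO_DEFAULT
    joint[(d, h)] = p_d * (p_h_given_d if h else 1 - p_h_given_d)

# Condition on H: keep only the entries where the applicant was flagged, then renormalize.
p_h = sum(p for (_, flagged), p in joint.items() if flagged)
p_d_given_h = joint[(True, True)] / p_h
print(f"P(H) = {p_h:.3f}, P(D|H) = {p_d_given_h:.4f}")
```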

Exercise 10: Bayesian Network in Diagnostics

A complex system can fail because of any of three components, A, B, and C, which fail independently. The failure probabilities are:

( P(A) = 0.02 )
( P(B) = 0.03 )
( P(C) = 0.05 )

Diagnostics are run on the system and report failures for each component:

If A fails, diagnostics detect it 90% of the time.
If B fails, diagnostics detect it 80% of the time.
If C fails, diagnostics detect it 70% of the time.
If no component fails, diagnostics falsely indicate a failure 1% of the time.

Question: If diagnostics indicate a failure in component B, what is the probability that B actually failed?

Solution:

Let:

( B_f ) = Component B failed.
( D_B ) = Diagnostics indicate failure in component B.

We need to find ( P(B_f|D_B) ).

Step 1: The prior for B is given directly: ( P(B_f) = 0.03 ), so ( P(\overline{B_f}) = 0.97 ).

Step 2: Write down detection probabilities.

( P(D_B|B_f) = 0.80 )
( P(D_B|\overline{B_f}) ): the problem states the 1% false-alarm rate only for the case where no component fails. We assume that whenever B has not failed, the diagnostics wrongly report a B failure 1% of the time, regardless of whether A or C failed. Under this assumption, A and C do not affect ( D_B ) and drop out of the calculation:

( P(D_B|\overline{B_f}) = 0.01 )

Step 3: Calculate ( P(D_B) ).

[ P(D_B) = P(D_B|B_f)P(B_f) + P(D_B|\overline{B_f})P(\overline{B_f}) = (0.80)(0.03) + (0.01)(0.97) = 0.024 + 0.0097 = 0.0337 ]

Step 4: Apply Bayes’ Theorem.

[ P(B_f|D_B) = \frac{P(D_B|B_f)P(B_f)}{P(D_B)} = \frac{(0.80)(0.03)}{0.0337} = \frac{0.024}{0.0337} \approx 0.7122 ]

Answer: Approximately 71.22% chance that component B actually failed.
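To check that components A and C really drop out, enumerate all eight failure states of (A, B, C). Weight each state by its probability, which is a product of per-component terms because failures are independent, and sum. The rule that a B alarm depends only on B's own state is the simplifying assumption from Step 2. It is encoded here in a hypothetical helper `p_alarm_b`.

```python
from itertools import product

P_FAIL = {"A": 0.02, "B": 0.03, "C": 0.05}  # independent failure probabilities


def p_alarm_b(b_failed: bool) -> float:
    """P(D_B | state), assumed to depend only on whether B failed."""
    return 0.80 if b_failed else 0.01


p_alarm = 0.0        # accumulates P(D_B)
p_alarm_and_b = 0.0  # accumulates P(D_B and B_f)
for a, b, c in product((True, False), repeat=3):
    # Independence: the probability of a joint state is a product of per-component terms.
    p_state = 1.0
    for name, failed in zip("ABC", (a, b, c)):
        p_state *= P_FAIL[name] if failed else 1 - P_FAIL[name]
    p_alarm += p_state * p_alarm_b(b)
    if b:
        p_alarm_and_b += p_state * p_alarm_b(b)

print(f"P(D_B) = {p_alarm:.4f}, P(B_f|D_B) = {p_alarm_and_b / p_alarm:.4f}")
```

Summing over A and C marginalizes them out, which is why the result matches the two-hypothesis calculation above.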

Additional Exercises Without Solutions

Exercise 11: Drug Use Testing

In a company:

4% of employees use a certain prohibited substance.
A test detects users 95% of the time (true positive).
It has a 10% false positive rate.

Question: What is the probability that an employee who tests positive actually uses the substance?

Exercise 12: Email Classification with Multiple Features

An email classifier uses two features:

Feature X is present in 70% of spam emails and 20% of non-spam emails.
Feature Y is present in 80% of spam emails and 10% of non-spam emails.
Overall, 5% of emails are spam.
Assume X and Y are conditionally independent given whether the email is spam.

Question: If an email contains both features X and Y, what is the probability it is spam?

Exercise 13: Sensor Reliability in Robotics

A robot uses a sensor that detects obstacles:

The sensor correctly detects obstacles 98% of the time.
It has a 2% false positive rate.
Obstacles are present 5% of the time.

Question: If the sensor indicates an obstacle, what is the probability that there is actually an obstacle?

Exercise 14: Market Research Survey

In a survey:

30% of people prefer Brand A.
An ad campaign increases the likelihood of choosing Brand A by 50%.
After the campaign, a person is selected and prefers Brand A.

Question: What is the probability the person was influenced by the ad?

Exercise 15: Reliability of Parallel Systems

A system uses two identical components in parallel:

Each component (call them A and B) fails with probability 1%.
The system fails only if both components fail.

Question: If the system fails, what is the probability that component A failed?

Exercise 16: Fraud Detection in Transactions

An online store detects fraudulent transactions:

1% of transactions are fraudulent.
A fraud detection algorithm catches frauds 99% of the time.
It has a 0.5% false positive rate.

Question: If a transaction is flagged, what is the probability it is fraudulent?

Exercise 17: Medical Screening for Multiple Diseases

A test screens for two diseases D1 and D2:

Prevalence of D1: 1%, D2: 0.5%.
The test detects D1 in 95% of patients who have it and D2 in 90% of patients who have it (sensitivity).
False-positive rates: 2% for D1, 1% for D2.

Question: If a patient tests positive for D1 but negative for D2, what is the probability they have D1?

Exercise 18: Investment Risk Assessment

An investment firm assesses clients:

10% are high-risk investors.
High-risk clients default on investments 20% of the time.
Low-risk clients default 1% of the time.

Question: If a client defaults, what is the probability they were a high-risk investor?

Exercise 19: Machine Failure Prediction

A machine has three sensors predicting failure:

Sensor A: 90% accurate.
Sensor B: 85% accurate.
Sensor C: 80% accurate.
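The machine's failure probability is 5%.
Assume each sensor's accuracy applies whether or not the machine fails, and that the sensors err independently given the machine's true state.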

Question: If all three sensors predict failure, what is the probability the machine will actually fail?

Exercise 20: Diagnostic Test with Multiple Stages

A disease requires two positive tests for diagnosis:

Test 1: Sensitivity 95%, specificity 90%.
Test 2: Sensitivity 98%, specificity 95%.
Disease prevalence: 0.5%.
Assume the two test results are conditionally independent given disease status.

Question: If a patient tests positive on both tests, what is the probability they have the disease?

These exercises are designed to illustrate the power and versatility of Bayes’ Theorem across various fields and scenarios. They provide practical applications that are relevant and engaging for students studying discrete mathematics and computer science.