Chapter 7: Advanced Probability and Statistical Modeling
7.1 Probability Density Functions and Cumulative Distributions
Probability Density Functions (PDFs) and Cumulative Distribution Functions (CDFs) are fundamental in probability theory and statistics. They describe the likelihood of a continuous random variable taking specific values or falling within a certain range.
Probability Density Function (PDF)
A PDF f(x) describes the relative likelihood of a continuous random variable taking values near x. Because P(X = x) = 0 for any single point of a continuous variable, probabilities are obtained by integrating the PDF over an interval.
Mathematically, for a continuous random variable X with PDF f(x):
P(a ≤ X ≤ b) = ∫[a,b] f(x) dx
Example: The normal (Gaussian) distribution with mean μ and standard deviation σ:
f(x) = (1 / (σ√(2π))) · e^(-(x-μ)² / (2σ²))
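As a concrete illustration, here is a minimal sketch (assuming NumPy and SciPy are available) that transcribes this PDF into code and recovers P(a ≤ X ≤ b) by numerical integration; the function name, μ, σ, and the interval are illustrative choices:

```python
import numpy as np
from scipy import integrate

mu, sigma = 0.0, 1.0  # standard normal; illustrative choice

def normal_pdf(x):
    """Normal PDF, transcribed from the formula above."""
    return (1.0 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# P(a <= X <= b) is the area under the PDF between a and b
a, b = -1.0, 1.0
prob, _ = integrate.quad(normal_pdf, a, b)
print(f"P({a} <= X <= {b}) ~ {prob:.4f}")  # ~0.6827, the familiar 68% rule
```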
Cumulative Distribution Function (CDF)
The CDF gives the probability that a random variable X is less than or equal to a given value x.
F(x) = P(X ≤ x) = ∫[-∞,x] f(t) dt
Relation Between PDF and CDF
- The PDF is the derivative of the CDF: f(x) = d/dx F(x)
- The CDF is the integral of the PDF: F(x) = ∫[-∞,x] f(t) dt
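This relationship is easy to check numerically. A minimal sketch using SciPy's standard normal compares a central finite-difference derivative of the CDF with the PDF (the evaluation point and step size are arbitrary):

```python
from scipy import stats

x, h = 0.5, 1e-5  # evaluation point and finite-difference step, both arbitrary

# Central finite difference approximates d/dx F(x)
dF_dx = (stats.norm.cdf(x + h) - stats.norm.cdf(x - h)) / (2 * h)

print(f"d/dx F(x) ~ {dF_dx:.6f}")              # ~0.352065
print(f"f(x)      = {stats.norm.pdf(x):.6f}")  # same value: the PDF is the CDF's derivative
```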
7.2 Markov Chains and Hidden Markov Models
Markov Chains
A Markov Chain is a stochastic process in which the next state depends only on the current state, not on the sequence of states that preceded it (the Markov property).
Example: Weather Prediction with states (Sunny, Rainy, Cloudy) and transition matrix:
         Sunny  Rainy  Cloudy
Sunny     0.7    0.2    0.1
Rainy     0.3    0.5    0.2
Cloudy    0.4    0.4    0.2

Each row gives the distribution of tomorrow's weather conditioned on today's state, so each row sums to 1.
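A minimal simulation of this chain (assuming NumPy, with the state order Sunny, Rainy, Cloudy as above):

```python
import numpy as np

rng = np.random.default_rng(0)
states = ["Sunny", "Rainy", "Cloudy"]

# Row i holds the transition probabilities out of state i (order as above)
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.4, 0.4, 0.2]])

# Simulate a week of weather: tomorrow's state depends only on today's
state = 0  # start Sunny
for day in range(7):
    print(f"day {day}: {states[state]}")
    state = rng.choice(3, p=P[state])

# Powers of P converge to the chain's stationary (long-run) distribution
print("long-run distribution:", np.linalg.matrix_power(P, 50)[0].round(3))
```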
Hidden Markov Models (HMMs)
HMMs extend Markov Chains by incorporating hidden states that generate observable outputs.
- Hidden states S: the underlying states, which are not directly observed.
- Observations O: the outputs emitted by the hidden states.
- Transition probabilities: P(S_t | S_{t-1}).
- Emission probabilities: P(O_t | S_t).
Example: Speech Recognition
- Hidden states = Phonemes (speech sounds).
- Observations = Audio signals.
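The forward algorithm is the standard way to compute the likelihood of an observation sequence under an HMM. Below is a minimal sketch for a toy two-state model; the states, symbols, and all probabilities are made up for illustration and are unrelated to any real speech system:

```python
import numpy as np

# Toy HMM: 2 hidden states, 2 observation symbols (all numbers illustrative)
A = np.array([[0.7, 0.3],   # transition probabilities P(S_t | S_{t-1})
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],   # emission probabilities P(O_t | S_t)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])   # initial state distribution

def forward(obs):
    """Forward algorithm: P(O_1..O_T), summing over all hidden state paths."""
    alpha = pi * B[:, obs[0]]          # initialize with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then weight by emission
    return alpha.sum()

print(forward([0, 1, 0]))  # likelihood of the observation sequence 0, 1, 0
```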
7.3 Bayesian Inference and Applications
Bayesian inference applies Bayes' theorem to update the probability of a hypothesis given new evidence.
P(H|E) = (P(E|H) P(H)) / P(E)
- P(H|E) = posterior probability (updated belief).
- P(E|H) = likelihood (evidence given the hypothesis).
- P(H) = prior probability (initial belief).
- P(E) = marginal likelihood (total probability of evidence).
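As a minimal sketch, Bayes' theorem translates directly into code; here P(E) is expanded over the hypothesis H and its complement, and the function and parameter names are my own:

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H|E) via Bayes' theorem for a binary hypothesis H.

    The marginal likelihood P(E) is expanded as
    P(E|H) P(H) + P(E|not H) P(not H).
    """
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e
```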
Example: Medical Diagnosis
Suppose a test for a disease has:
- Sensitivity = 0.99, i.e. P(Positive | Disease).
- False positive rate = 0.05, i.e. P(Positive | No Disease).
- Disease prevalence = 0.001, i.e. P(Disease).
If a patient tests positive:
P(Disease | Positive) = (0.99 × 0.001) / ((0.99 × 0.001) + (0.05 × 0.999)) ≈ 0.0194
Even with a highly sensitive test, a positive result implies only about a 2% chance of disease, because the disease itself is so rare.
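The same arithmetic in Python (variable names are illustrative) confirms the result, and matches what the posterior function sketched above would return:

```python
sensitivity = 0.99      # P(Positive | Disease)
false_positive = 0.05   # P(Positive | No Disease)
prevalence = 0.001      # P(Disease)

# Marginal probability of a positive test, then Bayes' theorem
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
p_disease = sensitivity * prevalence / p_positive
print(f"P(Disease | Positive) = {p_disease:.4f}")  # 0.0194
```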
7.4 Monte Carlo Simulations
Monte Carlo methods use randomness to approximate solutions to problems that may be deterministic in principle.
Steps in Monte Carlo Simulation
- Define the problem mathematically.
- Generate random samples.
- Evaluate the function for each sample.
- Estimate the expected value using averages.
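These steps can be made concrete with a small example. The sketch below estimates E[X²] for X ~ Uniform(0, 1), whose exact value is 1/3; the integrand, seed, and sample size are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: define the problem -- estimate E[f(X)] for f(x) = x**2, X ~ Uniform(0, 1)
f = lambda x: x ** 2

# Step 2: generate random samples
samples = rng.uniform(0.0, 1.0, size=100_000)

# Steps 3 and 4: evaluate f on each sample and average
estimate = f(samples).mean()
print(f"estimate: {estimate:.4f}  (exact: 1/3 ~ 0.3333)")
```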
Example: Estimating Pi
To estimate π, sample points uniformly in the unit square, which contains a quarter of the unit circle:
- Generate N random points in [0,1] × [0,1].
- Count the points inside the circle (x² + y² ≤ 1); they fall in a quarter circle of area π/4, while the square has area 1.
- The fraction of points inside approximates the ratio of these areas, so π ≈ 4 × (points inside circle / N).
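A minimal sketch of this estimator in Python (the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 1_000_000  # sample size; larger N tightens the estimate

# Random points in the unit square
x = rng.uniform(size=N)
y = rng.uniform(size=N)

# Fraction falling inside the quarter circle, scaled by 4
inside = np.count_nonzero(x**2 + y**2 <= 1.0)
print(f"pi ~ {4.0 * inside / N:.4f}")
```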
Applications of Monte Carlo Methods
- Finance: Portfolio risk analysis.
- Physics: Particle simulations.
- Machine Learning: Bayesian deep learning.
Summary
- PDFs and CDFs describe probability distributions.
- Markov Chains and HMMs model sequential dependencies.
- Bayesian Inference updates probabilities using evidence.
- Monte Carlo Simulations approximate solutions using randomness.