Chapter 5: Inferential Statistics
Sampling Methods and Sample Size Determination
Inferential statistics uses data from a sample to draw conclusions (estimates, predictions, or generalizations) about the population the sample came from. A well-chosen sample can provide meaningful insights while minimizing bias and sampling error.
Sampling Methods
- Simple Random Sampling (SRS): Every individual in the population has an equal chance of being selected, and every possible sample of the same size is equally likely.
  Example: Selecting 100 students at random from a university enrollment list.
- Stratified Sampling: The population is divided into subgroups (strata) that share a characteristic, and a random sample is drawn from each stratum, often in proportion to the stratum's size.
  Example: Sampling participants from each income bracket in proportion to that bracket's share of the population.
- Cluster Sampling: The population is divided into clusters (often natural groups such as schools or cities). A few clusters are selected at random, and all members, or a random subset of members, of the chosen clusters are surveyed.
  Example: A national education survey randomly selects schools and surveys students within them, instead of sampling individual students nationwide.
- Systematic Sampling: After a random starting point, every kth member of an ordered list is chosen.
  Example: Selecting every 10th person from a customer list.
- Convenience Sampling: Participants are chosen because they are easy to reach. Such samples are usually not representative and are prone to selection bias.
  Example: A mall survey of whoever happens to be present.
- Snowball Sampling: Used for hard-to-reach populations; existing participants recruit new participants from among their acquaintances.
  Example: A study of drug users in which each participant refers others.
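The four probability-based methods above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the population of 1,000 records, its `bracket` (income bracket) and `city` fields, and the helper function names are all hypothetical. Convenience and snowball sampling are omitted because they depend on who is reachable rather than on a random mechanism.

```python
import random
from collections import defaultdict

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical population of 1,000 people: income bracket (for strata) and city (for clusters)
population = [
    {"id": i, "bracket": random.choice(["low", "middle", "high"]), "city": f"city_{i % 20}"}
    for i in range(1000)
]


def group_by(pop, key):
    """Group records by the value of `key` (used for both strata and clusters)."""
    groups = defaultdict(list)
    for unit in pop:
        groups[unit[key]].append(unit)
    return groups


def simple_random_sample(pop, n):
    # Every subset of size n is equally likely to be drawn
    return random.sample(pop, n)


def systematic_sample(pop, n):
    k = len(pop) // n              # sampling interval
    start = random.randrange(k)    # random start within the first interval
    return pop[start::k][:n]       # then every kth record


def stratified_sample(pop, n, key):
    sample = []
    for members in group_by(pop, key).values():
        # Proportional allocation; rounding can make the total differ from n by one
        share = round(n * len(members) / len(pop))
        sample.extend(random.sample(members, share))
    return sample


def cluster_sample(pop, n_clusters, key):
    clusters = group_by(pop, key)
    chosen = random.sample(sorted(clusters), n_clusters)   # randomly pick whole clusters
    return [unit for c in chosen for unit in clusters[c]]  # keep every member of each


srs = simple_random_sample(population, 100)
systematic = systematic_sample(population, 100)
stratified = stratified_sample(population, 100, key="bracket")
clustered = cluster_sample(population, 4, key="city")
print(len(srs), len(systematic), len(stratified), len(clustered))
```

The cluster sample's size depends on how large the chosen clusters are (here each hypothetical city has 50 members), which is typical of cluster designs.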
Sample Size Determination
To estimate a population mean within a margin of error E, solve E = Z * σ / √n for n. This gives the required sample size:
n = (Z² * σ²) / E²
where:
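- Z = Critical value of the standard normal distribution for the chosen confidence level (1.96 for 95% confidence)
- σ = Population standard deviation (assumed known, e.g., from prior studies or a pilot sample)
- E = Desired margin of error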
Example: If Z = 1.96, σ = 10, and E = 2, then:
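n = (1.96² * 10²) / 2² = 384.16 / 4 = 96.04, which rounds up to 97. Sample sizes are always rounded up so that the margin of error is not exceeded.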
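The calculation above can be wrapped in a small helper. This is a sketch that assumes σ is known and uses SciPy's `norm.ppf` to derive Z from the confidence level instead of hard-coding 1.96. The function name `sample_size_for_mean` is hypothetical.

```python
import math
from scipy.stats import norm


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n such that Z * sigma / sqrt(n) <= margin (sigma assumed known)."""
    z = norm.ppf((1 + confidence) / 2)        # two-sided critical value, about 1.96 for 95%
    n = (z ** 2 * sigma ** 2) / margin ** 2   # n = Z^2 * sigma^2 / E^2
    return math.ceil(n)                       # always round up


print(sample_size_for_mean(sigma=10, margin=2))                    # the example above
print(sample_size_for_mean(sigma=10, margin=2, confidence=0.99))   # stricter confidence needs more
```

Using the exact critical value (about 1.95996) instead of 1.96 changes n slightly, but both round up to 97 for the worked example.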
Confidence Intervals and Hypothesis Testing
Confidence Intervals (CIs)
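A confidence interval (CI) gives a range of plausible values for a population parameter, computed from the sample. The confidence level describes the procedure, not any single interval: if the sampling were repeated many times, about 95% of the 95% CIs constructed this way would contain the true parameter. For a mean with known σ, the interval is x̄ ± Z * σ / √n; when σ is estimated by the sample standard deviation s, Z is replaced by a t critical value with n − 1 degrees of freedom.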
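Here is a minimal sketch of a t-based confidence interval for a mean. It assumes σ is unknown and estimated by s, and that the data are roughly normal or the sample is reasonably large. The helper name `mean_confidence_interval` is hypothetical, and the sample values are illustrative only.

```python
import math
from statistics import mean, stdev
from scipy import stats


def mean_confidence_interval(data, confidence=0.95):
    """t-based CI for a population mean when sigma is unknown."""
    n = len(data)
    x_bar = mean(data)
    se = stdev(data) / math.sqrt(n)                       # estimated standard error s / sqrt(n)
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)  # two-sided critical value
    return x_bar - t_crit * se, x_bar + t_crit * se


scores = [23, 25, 28, 22, 30]   # illustrative sample values
low, high = mean_confidence_interval(scores)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```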
Hypothesis Testing
- Null Hypothesis (H₀): No effect or no difference exists.
- Alternative Hypothesis (H₁): A difference or effect exists.
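- Test Statistic: Measures how far the sample statistic is from the value expected under H₀, usually in units of its standard error.
- P-value: The probability, assuming H₀ is true, of obtaining a test statistic at least as extreme as the one observed.
- Decision Rule: Reject H₀ if the p-value is less than the chosen significance level (α); otherwise, fail to reject H₀.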
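The steps above (test statistic, p-value, decision rule) can be computed by hand for a one-sample t-test and then cross-checked against SciPy. This is a sketch in which the data, the null value `mu0 = 24`, and α = 0.05 are assumptions chosen purely for illustration.

```python
import math
from statistics import mean, stdev
from scipy.stats import t as t_dist, ttest_1samp

sample = [23, 25, 28, 22, 30]   # illustrative data
mu0 = 24.0                      # hypothetical H0 value for the population mean
alpha = 0.05                    # significance level

n = len(sample)
se = stdev(sample) / math.sqrt(n)                # standard error of the mean
t_stat = (mean(sample) - mu0) / se               # distance from H0 in standard errors
p_value = 2 * t_dist.sf(abs(t_stat), df=n - 1)   # two-sided p-value

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, p = {p_value:.4f} -> {decision}")

# Cross-check with SciPy's built-in one-sample t-test
check = ttest_1samp(sample, popmean=mu0)
print(f"SciPy: t = {check.statistic:.3f}, p = {check.pvalue:.4f}")
```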
T-tests, Chi-Square Tests, and ANOVA
T-tests
- One-Sample T-Test: Tests if the sample mean differs from a known population mean.
- Independent Two-Sample T-Test: Compares means between two independent groups.
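- Paired T-Test: Compares two related measurements on the same subjects (e.g., before and after treatment) by testing whether the mean of the paired differences is zero.
Example: Comparing the test scores of two separate groups of students taught with different teaching methods calls for an independent two-sample t-test.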
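The three t-tests map directly onto SciPy functions: `ttest_1samp`, `ttest_ind`, and `ttest_rel`. The scores and the reference mean of 75 below are hypothetical values chosen only to show the calls.

```python
from scipy.stats import ttest_1samp, ttest_ind, ttest_rel

# Hypothetical scores, for illustration only
method_a = [72, 85, 78, 90, 66, 81]
method_b = [80, 88, 84, 93, 75, 86]
before = [140, 152, 138, 147, 160, 155]   # same subjects measured twice
after = [135, 148, 136, 140, 151, 150]

one = ttest_1samp(method_a, popmean=75)   # does the mean differ from a known value (75)?
two = ttest_ind(method_a, method_b)       # two independent groups
paired = ttest_rel(before, after)         # related measurements on the same subjects

for name, res in [("one-sample", one), ("independent", two), ("paired", paired)]:
    print(f"{name:>12}: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

A paired t-test is equivalent to a one-sample t-test of the differences (before − after) against 0, which is why it must not be used for independent groups.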
Chi-Square Test
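Tests goodness of fit (does a categorical variable follow a hypothesized distribution?) or independence (are two categorical variables associated?).
Example: Checking whether voting preference is independent of gender.
χ² = Σ [(O - E)² / E]
where O is an observed count and E is the count expected under H₀ (not the margin of error E used earlier).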
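The independence test can be run with SciPy's `chi2_contingency`, and the formula can be reproduced by hand to show where E comes from. The contingency table below is hypothetical. `correction=False` disables the Yates continuity correction so that the result matches the plain formula (that correction only applies to 2×2 tables anyway).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = gender, columns = party A, party B, undecided
observed = np.array([
    [40, 35, 25],
    [30, 45, 25],
])

chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)

# Manual version: E = (row total * column total) / grand total, then sum (O - E)^2 / E
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

print(f"chi2 = {chi2:.3f} (manual: {chi2_manual:.3f}), dof = {dof}, p = {p_value:.4f}")
```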
ANOVA (Analysis of Variance)
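One-way ANOVA tests whether the means of three or more groups are all equal by comparing the variation between group means with the variation within groups (the F statistic). A significant result indicates that at least one mean differs, not which one.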
Example: Testing if different diets lead to different weight losses.
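A sketch of the diet example using SciPy's `f_oneway`, alongside a manual F computation that shows the between-group versus within-group comparison. The weight-loss values (in kg) are invented for illustration.

```python
from statistics import mean
from scipy.stats import f_oneway

# Hypothetical weight loss (kg) under three diets
diet_a = [2.1, 3.4, 1.8, 2.9, 3.0]
diet_b = [4.2, 3.9, 5.1, 4.4, 3.7]
diet_c = [2.5, 2.8, 3.1, 2.2, 2.6]
groups = [diet_a, diet_b, diet_c]

f_stat, p_value = f_oneway(*groups)

# Manual F: mean square between groups divided by mean square within groups
all_values = [x for g in groups for x in g]
grand_mean = mean(all_values)
k, n = len(groups), len(all_values)
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))

print(f"F = {f_stat:.3f} (manual: {f_manual:.3f}), p = {p_value:.4f}")
```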
P-values, Significance Levels, and Practical Significance
P-values
- p < α (e.g., 0.05): Reject H₀; the result is statistically significant.
- p ≥ α: Fail to reject H₀; there is insufficient evidence against it, which does not prove that H₀ is true.
Significance Levels
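- The significance level α is the Type I error rate we are willing to accept; common values are 0.01, 0.05, and 0.10.
- For a fixed sample size, lowering α reduces Type I errors (false positives) but increases Type II errors (false negatives).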
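The trade-off between α and the two error types can be checked by simulation. This sketch assumes normally distributed data with n = 20 per sample and uses a one-sample t-test. Under H₀ (true mean 0), every rejection is a Type I error. Under a hypothetical alternative (true mean 0.5), every non-rejection is a Type II error. The seed and effect size are arbitrary choices.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(seed=42)   # fixed seed for reproducibility
n_sims, n = 4000, 20

# Under H0 the true mean is 0, so each rejection is a Type I error
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
p_null = ttest_1samp(null_samples, popmean=0.0, axis=1).pvalue

# Under an assumed alternative the true mean is 0.5, so each non-rejection is a Type II error
alt_samples = rng.normal(loc=0.5, scale=1.0, size=(n_sims, n))
p_alt = ttest_1samp(alt_samples, popmean=0.0, axis=1).pvalue

rates = {}
for alpha in (0.10, 0.05, 0.01):
    type1 = np.mean(p_null < alpha)    # close to alpha itself
    type2 = np.mean(p_alt >= alpha)    # grows as alpha shrinks
    rates[alpha] = (type1, type2)
    print(f"alpha = {alpha:.2f}: Type I rate = {type1:.3f}, Type II rate = {type2:.3f}")
```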
Practical Significance
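A result may be statistically significant but not practically meaningful, especially with large samples, where even tiny effects produce small p-values. Reporting an effect size alongside the p-value helps judge practical importance.
Example: A new drug reduces recovery time by 1 hour but costs substantially more.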
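To see how a large sample makes a tiny effect "significant", the sketch below simulates two huge groups whose true means differ by only 0.02 standard deviations. It then reports the p-value together with Cohen's d, one common effect-size measure. The data are simulated, and the helper name `cohens_d` is hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(seed=0)
# Simulated data: true difference of 0.3 on a scale with SD 15 (0.02 SD), but 200,000 per group
control = rng.normal(loc=100.0, scale=15.0, size=200_000)
treated = rng.normal(loc=100.3, scale=15.0, size=200_000)

result = ttest_ind(treated, control)


def cohens_d(x, y):
    """Standardized mean difference using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)


d = cohens_d(treated, control)
print(f"p = {result.pvalue:.2e}, Cohen's d = {d:.3f}")
```

The p-value is very small, yet the effect size is negligible; whether such a difference matters is a subject-matter judgment, not a statistical one.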
Coding Walkthrough: Python Implementation
The example below takes two samples as input, runs an independent two-sample t-test, and prints the test statistic and p-value.
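from scipy.stats import ttest_ind

# Two independent samples (e.g., scores under two teaching methods)
group1 = [23, 25, 28, 22, 30]
group2 = [32, 34, 29, 31, 28]

# Student's t-test (assumes equal variances); pass equal_var=False for Welch's t-test
stat, p_value = ttest_ind(group1, group2)
print(f"T-statistic: {stat:.3f}, P-value: {p_value:.4f}")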
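As a next step, the decision rule can be applied to the same two groups, comparing Student's t-test with Welch's version (`equal_var=False`), which does not assume equal variances. The choice of α = 0.05 is an assumption for illustration.

```python
from scipy.stats import ttest_ind

group1 = [23, 25, 28, 22, 30]
group2 = [32, 34, 29, 31, 28]
alpha = 0.05   # assumed significance level

results = {}
for label, equal_var in (("Student", True), ("Welch", False)):
    res = ttest_ind(group1, group2, equal_var=equal_var)
    results[label] = res
    decision = "reject H0" if res.pvalue < alpha else "fail to reject H0"
    print(f"{label:>7}: t = {res.statistic:.3f}, p = {res.pvalue:.4f} -> {decision}")
```

With equal group sizes the two t-statistics coincide. Welch's test uses fewer effective degrees of freedom, so its p-value is slightly larger. When group variances may differ, Welch's test is the safer default.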