Chapter 5: Inferential Statistics

Sampling Methods and Sample Size Determination

Inferential statistics involve making predictions or generalizations about a population based on a sample. A well-chosen sample can provide meaningful insights while minimizing biases and errors.

Sampling Methods

  • Simple Random Sampling (SRS): Every individual in the population has an equal chance of being selected.
    Example: Selecting 100 students randomly from a university enrollment list.
  • Stratified Sampling: The population is divided into subgroups (strata) based on shared characteristics, and samples are taken from each stratum.
    Example: Selecting participants from different income brackets proportionally.
  • Cluster Sampling: The population is divided into clusters, and a few clusters are randomly selected.
    Example: A national survey on education selects random schools instead of individuals.
  • Systematic Sampling: Every nth member of the population is chosen.
    Example: Selecting every 10th person from a customer list.
  • Convenience Sampling: Samples are taken based on ease of access, though they may introduce bias.
    Example: A mall survey where participants are chosen based on their presence.
  • Snowball Sampling: Used for hard-to-reach populations where existing subjects recruit new subjects.
    Example: Researching drug users where one participant refers another.
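
As a rough sketch of two of these methods in code, the snippet below draws a simple random sample and a systematic sample from a hypothetical list of customer IDs using Python's standard library; the population and sample sizes are made up for illustration.

import random

# Hypothetical population: 1,000 customer IDs
population = list(range(1, 1001))

# Simple Random Sampling: every ID has an equal chance of selection
srs_sample = random.sample(population, k=100)

# Systematic Sampling: every 10th ID after a random starting point
start = random.randrange(10)
systematic_sample = population[start::10]

print(len(srs_sample), len(systematic_sample))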

Sample Size Determination

The formula for determining sample size for estimating a population mean is:

n = (Z² * σ²) / E²

where:

  • Z = Z-score corresponding to the desired confidence level (e.g., 1.96 for 95% confidence)
  • σ = Population standard deviation
  • E = Margin of error

Example: If Z = 1.96, σ = 10, and E = 2, then:

n = (1.96² * 10²) / 2² = 96.04, which rounds up to n = 97
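
The same calculation can be scripted. The short sketch below simply encodes the formula above in a hypothetical helper function and rounds up, using the values from the worked example.

import math

def sample_size(z, sigma, e):
    # n = (Z² * σ²) / E², rounded up to the next whole observation
    return math.ceil((z ** 2) * (sigma ** 2) / (e ** 2))

print(sample_size(z=1.96, sigma=10, e=2))  # prints 97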

Confidence Intervals and Hypothesis Testing

Confidence Intervals (CIs)

A confidence interval provides a range of values within which the population parameter is expected to lie at a given confidence level. For a mean with known population standard deviation σ, the interval is x̄ ± Z * (σ / √n).
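
A minimal sketch of this interval for a sample mean, assuming the population standard deviation is known; the data and σ below are hypothetical.

import math
from scipy.stats import norm

data = [23, 25, 28, 22, 30]   # hypothetical sample
sigma = 10                    # assumed known population standard deviation
n = len(data)
x_bar = sum(data) / n

z = norm.ppf(0.975)           # Z-score for a 95% confidence level (about 1.96)
margin = z * sigma / math.sqrt(n)
print(f"95% CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")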

Hypothesis Testing

  • Null Hypothesis (H₀): No effect or no difference exists.
  • Alternative Hypothesis (H₁): A difference or effect exists.
  • Test Statistic: Measures how far the sample statistic falls from the value stated in the null hypothesis, usually in standard-error units.
  • P-value: The probability of obtaining results at least as extreme as those observed, assuming H₀ is true.
  • Decision Rule: Reject H₀ if the p-value is less than the chosen significance level (a worked sketch follows this list).
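
As a sketch of this workflow end to end, the snippet below runs a two-sided one-sample z-test with a known σ; the data, hypothesized mean, and significance level are hypothetical choices for illustration.

import math
from scipy.stats import norm

data = [52, 49, 55, 51, 53, 50, 54, 48]   # hypothetical sample
mu0 = 50                                  # hypothesized population mean (H₀)
sigma = 3                                 # assumed known population standard deviation
alpha = 0.05                              # chosen significance level

n = len(data)
x_bar = sum(data) / n
z_stat = (x_bar - mu0) / (sigma / math.sqrt(n))   # test statistic
p_value = 2 * norm.sf(abs(z_stat))                # two-sided p-value under H₀

print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
print("Reject H₀" if p_value < alpha else "Fail to reject H₀")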

T-tests, Chi-Square Tests, and ANOVA

T-tests

  • One-Sample T-Test: Tests if the sample mean differs from a known population mean.
  • Independent Two-Sample T-Test: Compares means between two independent groups.
  • Paired T-Test: Compares means before and after treatment on the same subjects.

Example: Comparing test scores of two different teaching methods.
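
The independent two-sample case appears in the coding walkthrough at the end of this chapter; the sketch below covers the one-sample and paired variants with scipy.stats, using made-up scores.

from scipy.stats import ttest_1samp, ttest_rel

scores = [68, 72, 75, 71, 69, 74]   # hypothetical test scores
before = [80, 85, 78, 90, 84]       # hypothetical pre-treatment scores
after  = [83, 88, 80, 93, 87]       # hypothetical post-treatment scores

# One-sample t-test: does the mean score differ from a known mean of 70?
stat1, p1 = ttest_1samp(scores, popmean=70)

# Paired t-test: did scores change for the same subjects after treatment?
stat2, p2 = ttest_rel(before, after)

print(f"One-sample: t = {stat1:.2f}, p = {p1:.4f}")
print(f"Paired:     t = {stat2:.2f}, p = {p2:.4f}")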

Chi-Square Test

The chi-square test checks for independence between two categorical variables or for goodness of fit of observed counts to an expected distribution.

Example: Checking if voting preference is independent of gender.

χ² = Σ [(O - E)² / E], where O is the observed count and E is the expected count in each cell.
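
A minimal sketch of a test of independence with scipy.stats.chi2_contingency, which computes the expected counts and this statistic from a table of observed counts; the gender-by-preference counts below are hypothetical.

from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = gender, columns = voting preference
observed = [[30, 20, 10],
            [25, 25, 15]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"Chi-square = {chi2:.2f}, p = {p_value:.4f}, df = {dof}")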

ANOVA (Analysis of Variance)

Compares means across three or more groups by partitioning total variability into between-group and within-group components.

Example: Testing if different diets lead to different weight losses.
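
A short sketch of a one-way ANOVA with scipy.stats.f_oneway; the weight-loss figures for three hypothetical diets are made up for illustration.

from scipy.stats import f_oneway

diet_a = [2.1, 3.4, 2.8, 3.0, 2.5]   # hypothetical weight loss (kg) per diet
diet_b = [1.2, 1.8, 1.5, 2.0, 1.6]
diet_c = [3.5, 4.0, 3.8, 3.2, 3.9]

f_stat, p_value = f_oneway(diet_a, diet_b, diet_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")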

P-values, Significance Levels, and Practical Significance

P-values

  • P < 0.05: Reject H₀ at the 5% significance level (statistically significant result).
  • P ≥ 0.05: Fail to reject H₀ (insufficient evidence against it).

Significance Levels

  • Common values: 0.01, 0.05, 0.10
  • Lower significance levels reduce the risk of Type I errors (false positives) but increase the risk of Type II errors (false negatives); see the simulation sketch after this list.
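
To illustrate that trade-off, here is a small simulation sketch. It assumes normally distributed data, repeated two-sample t-tests, and a hypothetical true effect of 0.5 for the Type II case; the group size and number of trials are arbitrary choices.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_trials, n = 2000, 20

for alpha in [0.10, 0.05, 0.01]:
    # Type I error rate: both groups drawn from the same distribution (H₀ true)
    type1 = sum(ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n))[1] < alpha
                for _ in range(n_trials)) / n_trials
    # Type II error rate: a true difference of 0.5 exists (H₀ false)
    type2 = sum(ttest_ind(rng.normal(0, 1, n), rng.normal(0.5, 1, n))[1] >= alpha
                for _ in range(n_trials)) / n_trials
    print(f"alpha = {alpha}: Type I ≈ {type1:.3f}, Type II ≈ {type2:.3f}")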

Practical Significance

A result may be statistically significant but not practically meaningful.

Example: A new drug reduces recovery time by 1 hour but costs substantially more, so the statistically significant improvement may not justify adopting it.

Sample Hypothesis Tests: Users can input their own sample data and compute test statistics and p-values with the pattern shown below.

Coding Walkthrough: Python Implementation

A step-by-step example of an independent two-sample t-test with scipy.stats:


from scipy.stats import ttest_ind

# Scores from two independent groups (e.g., two teaching methods)
group1 = [23, 25, 28, 22, 30]
group2 = [32, 34, 29, 31, 28]

# Independent two-sample t-test: do the group means differ?
stat, p_value = ttest_ind(group1, group2)
print(f"T-statistic: {stat}, P-value: {p_value}")