
Master Biostatistics & Epidemiology
for USMLE Step 1



HIGH YIELD NOTES

Core Concepts

Study Designs:
- Meta-analysis/Systematic Review: Highest evidence, combines multiple studies.
- Randomized Controlled Trial (RCT): Gold standard for intervention, minimizes confounding via randomization.
- Cohort Study: Observational, follows exposed/unexposed groups forward. Measures Incidence, RR.
- Case-Control Study: Observational, retrospectively compares exposure in cases vs. controls. Measures OR. Prone to recall bias.
- Cross-Sectional Study: Observational, measures exposure & outcome at a single point in time. Measures Prevalence.
Measures of Association:
- Relative Risk (RR): (Risk in exposed) / (Risk in unexposed). Cohort studies.
- Odds Ratio (OR): (Odds of exposure in cases) / (Odds of exposure in controls). Case-Control studies.
- Hazard Ratio (HR): Ratio of hazard rates, survival analysis.
- RR, OR, HR > 1: Increased risk; < 1: Decreased risk; = 1: No association.
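As a minimal sketch of the two formulas above, the following Python computes RR and OR from a 2x2 table. The cell labels (a, b, c, d) and the counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def relative_risk(a, b, c, d):
    """RR from a 2x2 table.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d), which simplifies to the cross-product ad / bc."""
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only.
a, b, c, d = 30, 70, 10, 90
rr = relative_risk(a, b, c, d)   # (30/100) / (10/100), about 3.0
or_ = odds_ratio(a, b, c, d)     # (30*90) / (70*10), about 3.86
print(f"RR = {rr:.2f}, OR = {or_:.2f}")
```

With these counts the outcome is fairly common, so the OR lies further from 1 than the RR; the two converge when the disease is rare.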
Measures of Disease Frequency:
- Incidence: New cases / population at risk over a period.
- Prevalence: All existing cases / total population at a point in time. (P ≈ I x average disease duration, assuming a steady state and low prevalence.)
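The two frequency measures can be sketched in a few lines of Python. The town, its counts, and the helper names below are hypothetical; the duration estimate relies on the steady-state approximation P = I x Duration, which is only a rough guide.

```python
def incidence(new_cases, population_at_risk):
    """New cases divided by the population at risk over the period."""
    return new_cases / population_at_risk


def point_prevalence(existing_cases, total_population):
    """All existing cases divided by the total population at one point in time."""
    return existing_cases / total_population


def per_100k(proportion):
    """Express a proportion as cases per 100,000 people."""
    return proportion * 100_000


# Hypothetical town: 50,000 residents, 1,000 already have the disease,
# and 98 new cases arise during one year among the 49,000 still at risk.
total_population = 50_000
existing_cases = 1_000
population_at_risk = total_population - existing_cases  # existing cases are not at risk
new_cases = 98

annual_incidence = incidence(new_cases, population_at_risk)
prevalence = point_prevalence(existing_cases, total_population)

# Steady-state approximation: Duration = Prevalence / Incidence.
approx_duration_years = prevalence / annual_incidence

print(f"Incidence:  {per_100k(annual_incidence):.0f} per 100,000 per year")
print(f"Prevalence: {per_100k(prevalence):.0f} per 100,000")
print(f"Approximate average duration: {approx_duration_years:.1f} years")
```

The key step is the denominator: people who already have the disease are removed from the population at risk before incidence is calculated.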
Measures of Effect/Impact:
- Absolute Risk Reduction (ARR): Risk_unexposed - Risk_exposed.
- Relative Risk Reduction (RRR): (Risk_unexposed - Risk_exposed) / Risk_unexposed = 1 - RR.
- Number Needed to Treat (NNT): 1 / ARR. Round UP.
- Number Needed to Harm (NNH): 1 / Attributable Risk, where Attributable Risk = Risk_exposed - Risk_unexposed. Round DOWN.
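The effect measures above can be worked through with a short Python sketch. The trial numbers are hypothetical, and exact fractions are used (an implementation choice, not something the notes prescribe) so that the rounding rules are not disturbed by floating-point error.

```python
import math
from fractions import Fraction


def absolute_risk_reduction(risk_control, risk_treated):
    """ARR = risk without treatment minus risk with treatment."""
    return risk_control - risk_treated


def relative_risk_reduction(risk_control, risk_treated):
    """RRR = ARR / risk without treatment, which equals 1 - RR."""
    return absolute_risk_reduction(risk_control, risk_treated) / risk_control


def number_needed_to_treat(risk_control, risk_treated):
    """NNT = 1 / ARR, rounded UP. Assumes ARR > 0."""
    return math.ceil(1 / absolute_risk_reduction(risk_control, risk_treated))


def number_needed_to_harm(risk_exposed, risk_unexposed):
    """NNH = 1 / (increase in absolute risk), rounded DOWN. Assumes an increase > 0."""
    return math.floor(1 / (risk_exposed - risk_unexposed))


# Hypothetical trial: 12/100 events on placebo vs. 9/100 on the drug;
# a side effect occurs in 8/100 on the drug vs. 5/100 on placebo.
risk_control = Fraction(12, 100)
risk_treated = Fraction(9, 100)

arr = absolute_risk_reduction(risk_control, risk_treated)          # 3/100
rrr = relative_risk_reduction(risk_control, risk_treated)          # 1/4
nnt = number_needed_to_treat(risk_control, risk_treated)           # 33.3 rounds up to 34
nnh = number_needed_to_harm(Fraction(8, 100), Fraction(5, 100))    # 33.3 rounds down to 33

print(f"ARR = {float(arr):.2f}, RRR = {float(rrr):.2f}, NNT = {nnt}, NNH = {nnh}")
```

The same raw value of 1 / 0.03 gives 34 for NNT and 33 for NNH, which is exactly the difference between the two rounding conventions.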
Screening Tests:
- Sensitivity: TP / (TP + FN). SnNout (high Sens, Neg rules OUT disease).
- Specificity: TN / (TN + FP). SpPin (high Spec, Pos rules IN disease).
- Positive Predictive Value (PPV): TP / (TP + FP). Probability of disease given positive test. Varies with prevalence.
- Negative Predictive Value (NPV): TN / (TN + FN). Probability of no disease given negative test. Varies with prevalence.
- Likelihood Ratios: +LR = Sens / (1-Spec); -LR = (1-Sens) / Spec. Less prevalence-dependent.
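The screening formulas above all come from the same 2x2 table, which the following Python sketch makes explicit. The function name and the counts are hypothetical, and the sketch assumes no cell combination produces a zero denominator.

```python
def screening_metrics(tp, fp, fn, tn):
    """Return the standard screening-test measures from 2x2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical results: 100 diseased and 900 healthy people tested.
metrics = screening_metrics(tp=90, fp=50, fn=10, tn=850)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are read down the disease columns, while PPV and NPV are read across the test-result rows, which is why only the latter pair shifts with prevalence.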
Bias: Systematic error.
- Selection Bias: Differences in comparison groups (e.g., attrition, healthy worker effect).
- Information Bias: Errors in measurement (e.g., recall bias, observer bias).
- Confounding Bias: Third variable associated with both exposure & outcome, not on causal pathway. Control via randomization, matching, stratification, multivariate analysis.
- Effect Modification: Exposure effect differs across subgroups. Not a bias.
Statistical Errors:
- Type I Error (Alpha, α): False Positive (rejecting true H0). α is the probability of a Type I error and is set before the study; the p-value is compared against it.
- Type II Error (Beta, β): False Negative (failing to reject false H0).
- Power (1-β): Probability of correctly rejecting false H0. Increases with larger sample size, larger effect size, and larger (less strict) α; decreases with greater variability.
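To make the idea of power concrete, here is a rough Python sketch for comparing two group means. It assumes a two-sided z-test with a known, equal standard deviation and equal group sizes, and it ignores the negligible probability of rejecting in the wrong direction; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect      = true difference in means
    sd          = common standard deviation (assumed known)
    n_per_group = participants in each arm
    alpha       = Type I error rate chosen in advance
    """
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n_per_group)
    return standard_normal.cdf(abs(effect) / standard_error - z_critical)


# Hypothetical study: true difference of 5 units, SD of 10.
for n in (20, 64, 200):
    print(f"n per group = {n:>3}: power = {approx_power(5, 10, n):.2f}")
```

With 64 participants per arm, a half-standard-deviation difference is detected roughly 80% of the time; calling the function with other sample sizes, effect sizes, or alpha values shows how each one trades off against the Type II error rate.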
Hypothesis Testing:
- Null Hypothesis (H0): No difference/association.
- P-value: Probability of observing data at least as extreme as the results obtained, if H0 is true. If p < α, reject H0.
- Confidence Interval (CI): Range of plausible values. If CI for RR/OR/HR includes 1, or for mean difference includes 0, NOT statistically significant.
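The rule about a CI containing the null value can be illustrated with a short Python sketch. The notes do not say how the interval is computed; this example assumes the common log-transformation (Katz) method for a relative risk, and the counts and function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def rr_confidence_interval(a, b, c, d, level=0.95):
    """Relative risk with a log-method confidence interval.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


def is_significant(lower, upper, null_value=1.0):
    """Significant only if the interval excludes the null value."""
    return not (lower <= null_value <= upper)


# Same point estimate, two hypothetical sample sizes.
large = rr_confidence_interval(30, 70, 10, 90)
small = rr_confidence_interval(3, 7, 1, 9)

for label, (rr, lower, upper) in (("n=200", large), ("n=20", small)):
    verdict = "significant" if is_significant(lower, upper) else "not significant"
    print(f"{label}: RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f} ({verdict})")
```

Both tables give an RR of about 3, but the small study produces an interval wide enough to include 1, so it is not statistically significant. For a mean difference the same check applies with `null_value=0.0`.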
Ethics: Beneficence, Non-maleficence, Autonomy (informed consent), Justice. Institutional Review Board (IRB).
Causality: Bradford Hill Criteria (Temporality essential).

Clinical Presentation

USMLE questions present a research scenario, study design, or statistical results.
Vignettes describe new drugs, screening tests, or risk factors in a population context.
Focus on identifying study design, key variables (exposure, outcome), and population.
Requires interpreting 2x2 tables, Kaplan-Meier curves, forest plots, and statistical outputs (p-values, CIs, RR, OR).

Diagnosis (Gold Standard)

Study Validity: RCTs with appropriate blinding and power for interventions. Prospective cohort for prognosis/risk factors.
Bias Identification: Analyze methodology for selection (e.g., differential attrition) or information bias (e.g., recall bias). Identify confounding variables.
Causality: Requires strong evidence, especially temporality, and ideally fulfillment of Bradford Hill Criteria.
Statistical Significance: P-value < 0.05 (or specified alpha) AND CI not crossing the null value (1 for RR/OR/HR, 0 for mean difference).
Clinical Significance: A statistically significant finding must also be clinically meaningful (e.g., NNT, magnitude of ARR).

Management (First Line)

Identify Study Design: Crucial first step to determine appropriate measures, common biases, and generalizability.
Identify Exposure and Outcome: Clearly define the variables of interest.
Evaluate Internal Validity: Check for bias and confounding. Was the study conducted correctly?
Evaluate External Validity: Can results be generalized to the broader population?
Interpret Statistical Results: Look at p-values AND Confidence Intervals for significance. Interpret RR/OR/HR values.
Screening Tests: Understand how sensitivity, specificity, PPV, NPV, and likelihood ratios behave with changing prevalence/thresholds.
Ethics Questions: Apply Beneficence, Non-maleficence, Autonomy, Justice.
Calculations: Be prepared for 2x2 table calculations (RR, OR, Sensitivity, Specificity, etc.).
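The point about predictive values changing with prevalence can be checked numerically. The sketch below applies Bayes' theorem with a fixed sensitivity and specificity; the function name and the test characteristics are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test with 90% sensitivity and 90% specificity.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.2f}, NPV = {npv:.3f}")
```

As prevalence rises, PPV rises and NPV falls, even though sensitivity and specificity never change.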

Exam Red Flags

Confounding vs. Effect Modification: Confounder *explains* association; effect modifier *changes magnitude/direction*.
P-value vs. Clinical Significance: A small p-value does not guarantee a clinically important effect. A large p-value does not prove there is no effect (the study may be underpowered).
Correlation ≠ Causation: Crucial trap. Observational studies rarely prove causation.
Bias identification: Differential loss-to-follow-up (selection bias), retrospective exposure assessment (recall bias), unequal measurement between groups (observer/interviewer bias).
Confidence Interval crossing the null: If CI for RR/OR/HR includes 1, or for mean difference includes 0, the result is NOT statistically significant.
Low Power Signs: Small sample size, wide CI, high p-value despite a plausible effect.
Screening Biases:
- Lead-time bias: Earlier detection makes survival from diagnosis appear longer without actually prolonging life.
- Length-time bias: Screening detects slower-progressing diseases, making outcomes look better.
- Volunteer bias: Screened population is often healthier.
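Lead-time bias in particular is easy to see with simple arithmetic. In this Python sketch the ages are hypothetical and the patient dies at the same age whether or not the disease is found by screening.

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Years lived after diagnosis, the quantity that screening can inflate."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient: death at 70 regardless of how the disease is found.
age_at_death = 70
survival_symptomatic = survival_from_diagnosis(67, age_at_death)  # found by symptoms
survival_screened = survival_from_diagnosis(64, age_at_death)     # found by screening
lead_time = survival_screened - survival_symptomatic

print(f"Survival after symptomatic diagnosis: {survival_symptomatic} years")
print(f"Survival after screening diagnosis:   {survival_screened} years")
print(f"Apparent gain (lead time):            {lead_time} years")
```

Measured survival doubles, yet the age at death is unchanged, which is why mortality rates rather than survival time are the fairer yardstick for a screening program.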
Violation of Ethical Principles: Lack of informed consent, coercion, exploitation, inadequate risk-benefit assessment, unjust distribution.

Sample Practice Questions

Question 1

A cohort study follows 10,000 participants for 10 years to assess the long-term risk of cardiovascular disease (CVD) associated with different dietary patterns. Over the study period, 800 participants are lost to follow-up. These individuals were more likely to be from lower socioeconomic backgrounds and have pre-existing risk factors for CVD compared to those who completed the study. The researchers used Kaplan-Meier survival analysis to estimate the cumulative incidence of CVD. What is the most significant methodological challenge introduced by the loss to follow-up in this study, and how might it bias the results?

A) Information bias, leading to an overestimation of CVD risk.

B) Confounding, leading to an underestimation of CVD risk.

C) Selection bias (attrition bias), leading to an underestimation of CVD risk.

D) Measurement bias, leading to an overestimation of CVD risk.


Question 2

In a city with a population of 800,000, 16,000 individuals are known to have a specific chronic neurological condition at the beginning of 2023. Throughout 2023, 400 new cases of this condition are diagnosed among the population that was initially free of the disease.

A) 20 per 100,000

B) 500 per 100,000

C) 50 per 100,000

D) 2,000 per 100,000


Question 3

A retrospective study investigated the association between a specific childhood vaccination and the development of an autoimmune disease in adulthood. Researchers found a statistically significant positive association. However, a critic pointed out that parents of children who developed the autoimmune disease were more likely to accurately recall and report all vaccinations their child received, compared to parents of healthy children. What type of bias is most likely affecting the results of this study?

A) Selection bias

B) Confounding bias

C) Lead-time bias

D) Recall bias
