
Master Biostatistics & Epidemiology for USMLE Step 1

Access 30+ high-yield questions tailored for the 2026 syllabus. Includes AI-powered explanations and performance tracking.

Medically reviewed by Dr. Kainat Bashir — MBBS, MCPS (Emergency Medicine), MRCP (UK)
GMC, AMC, Board Certified · Reviewed Jun 2026
HIGH YIELD NOTES Updated June 2026 · ~5 min read

What the USMLE Step 1 Tests in Biostatistics & Epidemiology

USMLE Step 1 Biostatistics & Epidemiology tests the application of statistical reasoning to clinical decision-making. Candidates must interpret study designs (RCT, cohort, case-control, cross-sectional) and their biases (selection, information, confounding). They calculate and apply sensitivity, specificity, predictive values, likelihood ratios, and number needed to treat/harm. They identify measures of disease frequency (incidence, prevalence) and association (relative risk, odds ratio, attributable risk). They understand statistical significance (p-value, confidence intervals), power, and Type I/II errors. Candidates also evaluate diagnostic test performance using ROC curves and Bayes' theorem. Clinical scenarios involve screening (e.g., mammography for breast cancer, PSA for prostate cancer), outbreak investigations, and therapeutic trials (e.g., statins for MI prevention). They must apply intention-to-treat analysis, survival curves, and Kaplan-Meier estimates. Knowledge of ethical principles (informed consent, IRB) and bias reduction (randomization, blinding, stratification) is essential.

High-Yield Concepts

  • Sensitivity and Specificity: Sensitivity = TP/(TP+FN) – the proportion of people with disease who test positive; a negative result on a highly sensitive test rules out disease (SnNOut). Specificity = TN/(TN+FP) – the proportion of people without disease who test negative; a positive result on a highly specific test rules in disease (SpPIn). Example: D-dimer for DVT – high sensitivity (a normal D-dimer in a low-risk patient rules out DVT), but poor specificity. Mammography for breast cancer: sensitivity ~85%, specificity ~90%.
  • Positive and Negative Predictive Values: PPV = TP/(TP+FP) – probability disease is present given a positive test; depends on disease prevalence. NPV = TN/(TN+FN) – probability disease is absent given a negative test. For a test with 95% sensitivity and 95% specificity in a population with 1% prevalence (e.g., HIV screening), PPV is only ~16%, meaning many positives are false. NPV remains high (~99.9%).
  • Likelihood Ratios: LR+ = sensitivity/(1-specificity) – how much a positive test increases the odds of disease. LR- = (1-sensitivity)/specificity – how much a negative test decreases odds. LR+ >10 or LR- <0.1 provide strong diagnostic evidence. Example: LR+ of 10 for elevated troponin in MI means post-test odds are 10 times pre-test odds. Use Fagan nomogram for quick calculation.
  • Number Needed to Treat (NNT) and Number Needed to Harm (NNH): NNT = 1/absolute risk reduction (ARR). Example: Statin therapy for primary prevention of MI: 5-year risk of MI 10% in placebo vs 7% in treatment → ARR = 3% → NNT = 1/0.03 ≈ 33.3, conventionally rounded up to 34 (treat 34 patients for 5 years to prevent 1 MI). NNH = 1/absolute risk increase, also called attributable risk (e.g., major bleeding with warfarin). A lower NNT indicates a more effective treatment; a higher NNH indicates a safer one.
  • Relative Risk and Odds Ratio: Relative risk (RR) = risk in exposed/risk in unexposed – used in cohort studies. RR = 1 (no association), >1 (risk factor), <1 (protective). Odds ratio (OR) = odds of exposure in cases/odds in controls – used in case-control studies. For rare diseases (<10%), OR approximates RR. Example: Smoking and lung cancer – OR ~20 in case-control studies, RR ~20 in cohort studies.
  • Confounding and Bias: Confounding: a third variable associated with both exposure and outcome (e.g., age confounding the link between grey hair and heart disease). Control via randomization, stratification, or multivariable analysis. Selection bias: e.g., Berkson's bias in hospital-based case-control studies. Information bias: recall bias (cases remember exposures better) or observer bias. Minimize with blinding and standardized data collection.
  • Study Designs and Hierarchy of Evidence: RCT (gold standard for efficacy) – double-blind, placebo-controlled, e.g., randomized trial of ACE inhibitors in heart failure. Cohort study – follow exposed and unexposed over time (e.g., Framingham Heart Study). Case-control – compare past exposures (e.g., thalidomide and phocomelia). Cross-sectional – prevalence survey (e.g., NHANES). Systematic reviews/meta-analyses (e.g., Cochrane reviews) sit atop the hierarchy.
  • Survival Analysis and Kaplan-Meier Curves: Kaplan-Meier estimates survival probability over time, accounting for censoring (e.g., patients lost to follow-up). Log-rank test compares survival curves between groups (e.g., chemotherapy vs placebo for ovarian cancer). Hazard ratio = ratio of hazard rates (instantaneous risk of event). Example: HR of 0.7 for a new drug means a 30% lower hazard of death at any given time during follow-up, compared with the control group.
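
The predictive-value and likelihood-ratio arithmetic in the bullets above can be checked with a short Python sketch. The function names are illustrative assumptions, not part of any library, and the inputs are the 95% sensitivity, 95% specificity, 1% prevalence example from the list.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    tp = sensitivity * prevalence              # true positives per person screened
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-)."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


ppv, npv = predictive_values(0.95, 0.95, 0.01)
lr_pos, lr_neg = likelihood_ratios(0.95, 0.95)
post = post_test_probability(0.01, lr_pos)

print(f"PPV = {ppv:.1%}, NPV = {npv:.2%}")        # PPV = 16.1%, NPV = 99.95%
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.3f}")  # LR+ = 19.0, LR- = 0.053
print(f"Post-test probability after a positive result = {post:.1%}")
```

The last line gives the same figure as the PPV: when the pre-test probability equals the prevalence, multiplying the pre-test odds by LR+ is Bayes' theorem in odds form.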

Common Traps in Biostatistics & Epidemiology Questions

  • Confusing prevalence with incidence: prevalence = existing cases/total population at a point in time; incidence = new cases/person-time at risk.
  • Assuming PPV is constant: PPV decreases as disease prevalence decreases, even if sensitivity and specificity are high.
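  • Mistaking odds ratio for relative risk: OR approximates RR only when the outcome is rare; when the outcome is common (>10%), OR lies farther from 1 than RR, exaggerating the apparent effect in either direction.
  • Thinking p-value <0.05 means the null hypothesis is false: the p-value is only the probability of obtaining data at least as extreme as those observed if the null hypothesis were true; it is not the probability that the null is true, and it does not measure clinical significance.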
  • Forgetting that intention-to-treat analysis includes all randomized patients regardless of protocol adherence; per-protocol analysis may overestimate treatment effect.
  • Believing a test with high sensitivity is always good for screening: it must also have reasonable specificity to avoid excessive false positives (e.g., low-specificity tests cause harm through unnecessary follow-up testing and treatment).
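
The odds ratio versus relative risk trap is easiest to see with numbers. The sketch below uses two hypothetical cohorts (all counts are invented for illustration) that share the same relative risk but differ in how common the outcome is.

```python
def risk_ratio(a, b, c, d):
    """RR from a 2x2 table.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR from the same 2x2 table (cross-product ratio)."""
    return (a * d) / (b * c)


# Hypothetical rare outcome: 2% of exposed vs 1% of unexposed develop disease
rare = (20, 980, 10, 990)
# Hypothetical common outcome: 60% of exposed vs 30% of unexposed
common = (600, 400, 300, 700)

for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
# rare:   RR = 2.00, OR = 2.02
# common: RR = 2.00, OR = 3.50
```

With a rare outcome the two measures nearly coincide; with a common outcome the OR of 3.5 would badly overstate a true doubling of risk if it were read as an RR.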

How to Revise Biostatistics & Epidemiology for the USMLE Step 1

Focus on applying formulas to clinical vignettes: calculate sensitivity, specificity, PPV, NPV, NNT, ARR, RR, OR from 2x2 tables. Practice interpreting confidence intervals (if CI includes 1 for RR/OR, not significant; if includes 0 for risk difference, not significant). Know the hierarchy of evidence and common biases (e.g., recall bias in case-control, lead-time bias in screening). Questions often present a study design and ask to identify the type or a flaw. Memorize key cut-offs: p<0.05, CI 95%, alpha=0.05, beta=0.20 (power=80%). Use the mnemonic 'Definitive Diagnosis' for SpPIn/SnNOut. Review screening criteria (Wilson & Jungner) and ethical principles (Belmont Report). Practice with USMLE-style vignettes that mix epidemiology with clinical decision-making (e.g., what test to order next based on LR). Prioritize understanding Bayes' theorem and how pre-test probability shifts with test results.
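
As a hedged illustration of the confidence-interval rule above, the following Python sketch computes RR, ARR, and NNT from a hypothetical two-arm trial and builds a 95% CI for the RR with the standard log-transform approximation. The counts and the function name are assumptions made for this example only.

```python
import math


def rr_with_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Relative risk with an approximate 95% CI (log-transform method)."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    rr = risk_treated / risk_control
    # Standard error of ln(RR)
    se = math.sqrt(1 / events_treated - 1 / n_treated
                   + 1 / events_control - 1 / n_control)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical trial: 60/1000 events on treatment vs 100/1000 on placebo
rr, lower, upper = rr_with_ci(60, 1000, 100, 1000)
arr = 100 / 1000 - 60 / 1000   # absolute risk reduction
nnt = 1 / arr                  # number needed to treat

print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")  # RR = 0.60 (95% CI 0.44 to 0.82)
print(f"ARR = {arr:.1%}, NNT = {nnt:.1f}")                   # ARR = 4.0%, NNT = 25.0
print("Significant at alpha = 0.05:", not (lower <= 1 <= upper))
```

Because the interval excludes 1, the result is statistically significant at the 5% level; the same check against 0 applies to a risk difference.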

Practise it: MedLumen has 30+ Biostatistics & Epidemiology questions for the USMLE Step 1, each with a full explanation and references.

Sample Practice Questions

Question 1 FULLY WORKED EXAMPLE

A 65-year-old male presents with chronic obstructive pulmonary disease (COPD). A local health department report indicates that 15% of individuals over 60 years old in the community currently have COPD. Which epidemiological measure is best described by this report?

A) Mortality rate
B) Incidence rate
C) Relative risk
D) Prevalence ✓ Correct
Explanation:
Correct Answer Analysis: Prevalence refers to the proportion of a population that has a specific disease or attribute at a given point in time or over a specified period. The statement '15% of individuals over 60 years old in the community currently have COPD' directly describes the point prevalence of COPD in that age group.

Incorrect Options:
  • A: Mortality rate measures the frequency of death in a defined population during a specified interval. The report discusses the presence of a disease, not deaths.
  • B: Incidence rate measures the rate at which new cases of a disease occur in a population at risk over a specified period. The report describes existing cases, not new ones.
  • C: Relative risk (or risk ratio) compares the risk of an event in an exposed group to the risk in an unexposed group. This report provides a single proportion for a population, not a comparison between groups.
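
To make the prevalence versus incidence distinction concrete, here is a minimal Python sketch. The community size and case counts are hypothetical and chosen only so that the point prevalence matches the 15% in the question.

```python
population = 2000          # hypothetical residents over 60
existing_cases = 300       # have COPD on the survey date
new_cases_next_year = 34   # hypothetical new diagnoses in the following year

# Prevalence counts everyone who has the disease at one point in time
prevalence = existing_cases / population

# Incidence counts only new cases among people who were still at risk
at_risk = population - existing_cases
cumulative_incidence = new_cases_next_year / at_risk

print(f"Point prevalence = {prevalence:.0%}")                        # 15%
print(f"1-year cumulative incidence = {cumulative_incidence:.0%}")   # 2%
```

People who already have the disease are removed from the incidence denominator because they can no longer become new cases.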
Question 2

A 45-year-old female presents to her physician with persistent fatigue and joint pain. To investigate a potential autoimmune condition, a new serological test is performed. The test has a sensitivity of 90% and a specificity of 80%. In her community, the prevalence of this autoimmune condition is 5%. Her test result returns positive. What is the most important value to consider when interpreting her positive test result in this clinical context?

A) Specificity
B) Positive Predictive Value
C) Sensitivity
D) Negative Predictive Value

Question 3

A study aims to investigate the association between caffeine consumption and the risk of developing hypertension in a cohort of young adults. Researchers find a significant positive association. However, they realize that many participants with high caffeine intake also tend to smoke more, and smoking is a known risk factor for hypertension. Smoking status was not accounted for in their initial analysis. This unmeasured factor (smoking) in the study described above is most likely acting as what?

A) Confounder
B) Selection bias
C) Effect modifier
D) Information bias

Question 4

A randomized controlled trial investigates a new oral hypoglycemic agent for type 2 diabetes. The study reports that the mean reduction in HbA1c in the treatment group was 0.8% (95% Confidence Interval: 0.5% to 1.1%) compared to placebo, with a p-value of 0.001. A clinically significant reduction in HbA1c is generally considered to be 0.7% or greater. Based on these results, which of the following is the most accurate conclusion?

A) The new agent is statistically and clinically effective.
B) The new agent is neither statistically nor clinically effective.
C) The p-value indicates no real difference in HbA1c reduction.
D) The new agent is statistically effective, but not clinically significant.

Question 5

A cohort study follows 10,000 participants for 10 years to assess the long-term risk of cardiovascular disease (CVD) associated with different dietary patterns. Over the study period, 800 participants are lost to follow-up. These individuals were more likely to be from lower socioeconomic backgrounds and have pre-existing risk factors for CVD compared to those who completed the study. The researchers used Kaplan-Meier survival analysis to estimate the cumulative incidence of CVD. What is the most significant methodological challenge introduced by the loss to follow-up in this study, and how might it bias the results?

A) Information bias, leading to an overestimation of CVD risk.
B) Confounding, leading to an underestimation of CVD risk.
C) Selection bias (attrition bias), leading to an underestimation of CVD risk.
D) Measurement bias, leading to an overestimation of CVD risk.


Biostatistics & Epidemiology Questions for USMLE Step 1 — FAQ

How many Biostatistics & Epidemiology questions does MedLumen have for USMLE Step 1?

MedLumen currently has 30+ Biostatistics & Epidemiology practice questions for USMLE Step 1, each with a detailed explanation so you understand the reasoning behind every answer.

Are the Biostatistics & Epidemiology questions updated for the 2026 USMLE Step 1 syllabus?

Yes. Our Biostatistics & Epidemiology questions are mapped to the latest USMLE Step 1 blueprint and reviewed regularly so they stay aligned with the current 2026 syllabus.

Can I practise Biostatistics & Epidemiology questions for free?

You can preview sample Biostatistics & Epidemiology questions for free. A MedLumen subscription unlocks all 30+ Biostatistics & Epidemiology questions, full answer explanations, and performance analytics for USMLE Step 1.

How should I revise Biostatistics & Epidemiology for USMLE Step 1?

Practise Biostatistics & Epidemiology questions in timed blocks, read the explanation for every answer (right or wrong), and use MedLumen's analytics to revisit your weak areas until your accuracy is consistently high.
