🧭 Overview
🧠 One-sentence thesis
Epidemiologists measure disease frequency through incidence (new cases) and prevalence (existing cases), using different study designs to calculate measures of association that quantify exposure-disease relationships while accounting for random error, bias, and confounding.
📌 Key points (3–5)
- Incidence vs. prevalence: Incidence counts new cases over time (requires at-risk population); prevalence counts all existing cases at one point (includes both new and old cases).
- Study designs differ by timing: Cohorts and RCTs follow people forward (incident cases → risk/rate ratios); case-control and cross-sectional look backward or at one time point (prevalent cases → odds ratios).
- Threats to validity: Random error (quantified by p-values/confidence intervals), bias (systematic measurement/selection errors), and confounding (third variables distorting associations) all affect results.
- Common confusion: Don't confuse incidence proportion (people at risk in denominator) with incidence rate (person-time at risk in denominator); both measure new cases but handle varying follow-up times differently.
- Causality requires caution: Statistical association doesn't prove causation—diseases have multiple causes, and determining causality requires considering temporality, biological plausibility, consistency across studies, and potential confounders.
📊 Measuring disease frequency
📈 Incidence: counting new cases
Incidence: The occurrence of new cases of disease during a specified time period in a population at risk.
- Requires knowing who is at risk (disease-free at baseline)
- Always includes a time component (per year, per month, etc.)
- Two main types: incidence proportion and incidence rate
Incidence proportion (also called cumulative incidence or risk):
- Numerator: new cases during follow-up period
- Denominator: number of people at risk at the start
- Formula: (new cases) / (people at risk at baseline)
- Range: 0 to 1 (it's a proportion)
- Example: If 10 students in a 100-student dorm develop flu over one month, incidence proportion = 10/100 = 0.10 or 10% per month
Incidence rate (also called incidence density):
- Numerator: new cases during follow-up
- Denominator: sum of person-time at risk contributed by all participants
- Accounts for people entering/leaving study at different times
- More realistic but more complex to calculate
- Example: If 3 people develop disease over 87 person-months of observation, rate = 3/87 = 0.0345 per person-month
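The two calculations, as a minimal Python sketch using the numbers from the examples above:

```python
# Incidence proportion: new cases / people at risk at baseline
new_cases = 10
at_risk_at_baseline = 100
incidence_proportion = new_cases / at_risk_at_baseline  # 0.10 = 10% per month

# Incidence rate: new cases / total person-time at risk
cases = 3
person_months = 87  # summed over all participants' time at risk
incidence_rate = cases / person_months  # ~0.0345 cases per person-month

print(f"Incidence proportion: {incidence_proportion:.0%} per month")
print(f"Incidence rate: {incidence_rate:.4f} per person-month")
```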
🔄 Person-time at risk calculations
- Each participant contributes time only while at risk
- Stop counting when: person develops disease, dies from competing cause, or is lost to follow-up
- Example from excerpt: Person 2 enrolled January 1, developed disease end of August → contributed 8 person-months at risk (then became a prevalent case)
Don't confuse: Person-time before study enrollment cannot be counted. That time would inflate the denominator while the cases that occurred during it would be missed (those people enter the study as prevalent cases), artificially lowering the incidence estimate.
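A sketch of the person-time bookkeeping with hypothetical enrollment and exit dates (the first record mirrors Person 2 from the excerpt; the month-counting helper assumes enrollment at the start of a month and exit at the end of one):

```python
from datetime import date

# (enrollment, exit, reason for exit); time at risk stops at disease onset,
# death from a competing cause, or loss to follow-up.
participants = [
    (date(2020, 1, 1), date(2020, 8, 31), "disease"),            # 8 person-months
    (date(2020, 1, 1), date(2020, 12, 31), "end of study"),      # 12 person-months
    (date(2020, 3, 1), date(2020, 6, 30), "lost to follow-up"),  # 4 person-months
]

def months_at_risk(start, end):
    """Whole months from the start of `start`'s month to the end of `end`'s month."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1

person_months = sum(months_at_risk(s, e) for s, e, _ in participants)
new_cases = sum(1 for _, _, reason in participants if reason == "disease")
print(f"{new_cases} case / {person_months} person-months "
      f"= {new_cases / person_months:.4f} per person-month")
```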
📸 Prevalence: counting existing cases
Prevalence: The proportion of a population that has a disease at a specific point in time.
- Includes both new and long-standing cases
- No time dimension (it's a snapshot)
- Formula: (all current cases) / (total population) at one time point
- Used for resource allocation, not for studying disease causes
- Example: If 80% of nursing home residents have dementia, this informs staffing needs
Relationship to incidence:
- Prevalence ≈ Incidence × Average disease duration
- If incidence rises but people die quickly, prevalence may not rise much
- If treatment improves (people live longer with disease), prevalence rises even if incidence stays flat
- HIV example from excerpt: After 1996, antiretroviral therapy allowed people to "live with HIV" → prevalence increased while incidence remained steady
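A quick numerical illustration of the relationship, with made-up incidence and duration values:

```python
# Prevalence ~= incidence rate x average disease duration (steady-state approximation)
incidence_rate = 0.002   # 2 new cases per 1,000 person-years (hypothetical)
short_duration = 2       # years lived with disease (e.g., rapid fatality)
long_duration = 20       # years lived with disease (e.g., effective treatment)

print(f"Short duration: prevalence ~ {incidence_rate * short_duration:.1%}")
print(f"Long duration:  prevalence ~ {incidence_rate * long_duration:.1%}")
# Same incidence, 10x the duration -> ~10x the prevalence
# (the post-1996 HIV pattern: steady incidence, rising prevalence).
```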
⚖️ When to use each measure
Use incidence when:
- Studying causes of disease (need to know exposure came before disease)
- Evaluating prevention programs
- Disease has short duration or high fatality
Use prevalence when:
- Assessing disease burden for resource planning
- Screening program planning
- Disease is chronic with long duration
Don't confuse: Prevalence studies cannot establish causality because we don't know if exposure or disease came first—the disease may have caused the exposure rather than vice versa.
🔬 Study designs for measuring associations
🏃 Cohort studies: following people forward
Procedure:
- Select non-diseased sample from target population (at-risk individuals only)
- Measure exposure status at baseline
- Follow over time, observing incident disease
- Calculate risk ratio or rate ratio
Strengths:
- Exposure measured before disease → clear temporality
- Can study rare exposures (deliberately sample exposed individuals)
- Can assess multiple outcomes in same cohort
- Less prone to recall bias than retrospective designs
Weaknesses:
- Expensive and time-consuming
- Cannot study rare diseases (would need huge sample)
- Cannot study diseases with very long latent periods (decades of follow-up impractical)
- Loss to follow-up creates selection bias
Example: Framingham Heart Study enrolled 5,000+ adults in 1948, measured numerous exposures (diet, smoking, exercise), followed for decades, documented heart disease and stroke incidence.
🎲 Randomized controlled trials: experimental cohorts
Procedure:
- Same as cohort, but investigator randomly assigns exposure instead of measuring existing exposure
- Half get intervention, half get control/placebo
- Follow forward, measure incident disease
Strengths:
- Randomization eliminates confounding (known, unknown, measured, unmeasured) because, on average, confounders cannot be associated with an exposure that is randomly assigned
- Strongest evidence for causality
- Required by FDA for drug approval
Weaknesses:
- Even more expensive than observational cohorts
- Ethical constraints (cannot randomize harmful exposures like smoking)
- Generalizability issues (people willing to be randomized may differ from general population)
- Must precisely specify the intervention (what if you choose wrong dose/duration?)
Example: Ridker trial (2005) randomized 20,000 women to aspirin vs. placebo and found that, unlike in earlier trials of men, aspirin did not prevent heart attacks in women (gender is an effect modifier).
🔍 Case-control studies: working backward
Procedure:
- Select cases (people with disease) from target population
- Select controls (people without disease) from same target population
- Look backward to assess past exposures
- Calculate odds ratio
Strengths:
- Fast and cheap (no waiting for disease to develop)
- Efficient for rare diseases
- Can study diseases with long latent periods
- Can assess multiple exposures
Weaknesses:
- Prone to recall bias (cases may remember past exposures differently than controls)
- Selecting appropriate controls is difficult—must come from same population as cases
- Cannot calculate incidence (only odds ratios)
- Temporality less clear (did exposure really precede disease?)
Example: Doll and Hill's 1950 smoking/lung cancer study—recruited lung cancer patients (cases) and comparable hospital patients without lung cancer (controls), asked about past smoking.
Don't confuse: Cases are people with disease (not "people with disease who are exposed"). Both cases and controls are recruited without regard to exposure status.
📷 Cross-sectional studies: single snapshot
Procedure:
- Draw sample from target population
- Measure exposure and disease status simultaneously (at one point in time)
- Calculate odds ratio
Strengths:
- Fastest, cheapest design
- Used for surveillance (NHANES, BRFSS, PRAMS)
- Good for hypothesis generation
Weaknesses:
- Cannot determine temporality (which came first?)
- Cannot study rare exposures or rare diseases (you "get what you get")
- Limited to hypothesis generation, not causal inference
- Uses prevalent cases (mixes disease onset with disease duration)
Example: Survey asking current physical activity levels and current dementia status—cannot tell if inactivity led to dementia or dementia led to inactivity.
📐 Measures of association
🔢 Risk ratio and rate ratio (RR)
- Used for cohort studies and RCTs (designs that measure incidence)
- Formula: (Incidence in exposed) / (Incidence in unexposed)
- Null value = 1.0 (no association)
- RR > 1 means exposure increases disease risk
- RR < 1 means exposure decreases disease risk (protective)
Interpretation template: "The risk [or rate] of [disease] was [RR] times as high in [exposed] compared to [unexposed] over [time period]."
Example: RR = 2.27 → "The risk of hypertension was 2.27 times as high in smokers compared to nonsmokers over 10 years."
Don't confuse: "2 times as high" ≠ "2 times higher" (the latter would be RR = 3.0, since null is 1.0 not 0).
🎲 Odds ratio (OR)
- Used for case-control and cross-sectional studies (designs using prevalent cases)
- Formula: (odds of disease in exposed) / (odds of disease in unexposed)
- For 2×2 table with cells A, B, C, D: OR = (A×D) / (B×C)
- Null value = 1.0
- Interpretation same as RR: "The odds of [disease] were [OR] times as high in [exposed] compared to [unexposed]."
OR vs. RR:
- OR always further from null than RR for same data
- When disease is rare (<5% prevalence), OR approximates RR
- When disease is common, OR exaggerates the association
- Some cohort studies incorrectly report OR instead of RR (due to statistical software defaults)
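A small demonstration, with hypothetical cohort counts, of how the OR tracks the RR when disease is rare but overshoots it when disease is common:

```python
def rr_and_or(a, b, c, d):
    """Risk ratio and odds ratio from a 2x2 table (a, b = exposed; c, d = unexposed)."""
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return rr, odds_ratio

# Rare disease (~1%): OR approximates RR
print(rr_and_or(20, 1980, 10, 1990))   # RR = 2.00, OR ~ 2.01

# Common disease (25-50%): OR is further from the null than RR
print(rr_and_or(500, 500, 250, 750))   # RR = 2.00, OR = 3.00
```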
➖ Risk difference (absolute measure)
- Formula: (Incidence in exposed) − (Incidence in unexposed)
- Keeps units (e.g., "42 per 100 per 10 years")
- Shows absolute impact, not relative
- Interpretation: "Over [time], the excess number of cases attributable to [exposure] is [RD]; the remaining [baseline incidence] would have occurred anyway."
Why it matters: An RR of 2.0 sounds impressive, but if the baseline risk is 1 in a million and the exposed risk is 2 in a million, the absolute difference is only 1 extra case per million people, probably not worth a public health intervention.
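The point in two lines of arithmetic (hypothetical risks):

```python
def rr_and_rd(risk_exposed, risk_unexposed):
    """Relative (ratio) and absolute (difference) measures from the same risks."""
    return risk_exposed / risk_unexposed, risk_exposed - risk_unexposed

print(rr_and_rd(2e-6, 1e-6))   # RR = 2.0, RD = 1 extra case per million
print(rr_and_rd(0.42, 0.21))   # RR = 2.0, RD = 21 extra cases per 100
# Identical relative effect; wildly different absolute (public health) impact.
```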
🎯 Threats to validity
🎲 Random error
Random error: Unpredictable measurement variability present in all data; quantified by p-values and confidence intervals.
- Not the same as bias (which is systematic)
- Comes from imperfect measurement tools, human error, natural variability
- Cannot be eliminated, only quantified
p-values:
- Probability of getting your data (or more extreme) if the null hypothesis is true
- p ≤ 0.05 = "statistically significant" (arbitrary cutoff)
- Does NOT tell you the probability that the null hypothesis is true
- Only meaningful when working with samples (not whole populations)
Confidence intervals (95% CI):
- If you repeated the study 100 times, about 95 of those CIs would contain the true population value
- If CI excludes null value (0 for differences, 1.0 for ratios) → p < 0.05
- Narrower CI = more precise estimate (usually from larger sample)
Type I error (α): False positive, i.e., rejecting the null when it is actually true (set at 5% by the p ≤ 0.05 cutoff)
Type II error (β): False negative—failing to reject null when it's actually false
Power = 1 − β: Probability of detecting an association if one truly exists (ideally ≥90%)
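A sketch of the standard log-scale 95% CI for a risk ratio. The formula (ln RR ± 1.96 × SE, with SE = √(1/a − 1/(a+b) + 1/c − 1/(c+d))) is the usual large-sample approximation; the counts are hypothetical:

```python
import math

def rr_with_ci(a, b, c, d, z=1.96):
    """Risk ratio and approximate 95% CI from a 2x2 table (a, b = exposed; c, d = unexposed)."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))  # SE of ln(RR)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)

rr, lo, hi = rr_with_ci(50, 950, 22, 978)
print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# If the interval excludes 1.0 (the null for ratios), then p < 0.05.
```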
🔀 Bias (systematic error)
Selection bias:
- Sample not representative of target population (affects external validity—who can you generalize to?)
- Exposed and unexposed come from different populations (affects internal validity—results are wrong)
- Different participation or loss-to-follow-up rates between groups
- Healthy worker bias: workers are healthier than general population
Misclassification bias:
- Measuring exposure or disease incorrectly → people in wrong boxes of 2×2 table
- Nondifferential: Same error rate in all groups (usually biases toward null, but not always)
- Differential: Error rate differs by group (fatal to internal validity)
- Includes recall bias, social desirability bias, interviewer bias
Assessing misclassification: Ask "Can people tell me this?" (if no, stop). Then ask "Will people tell me this?" (if no, expect bias).
Publication bias: Studies with exciting results more likely to be published → literature review shows biased picture.
Don't confuse: Bias cannot be fixed with statistics (unlike random error). Must prevent through good study design.
🔗 Confounding
Confounder: A third variable (not exposure, not outcome) that distorts the true exposure-disease association.
Three criteria for potential confounder:
- Statistically associated with exposure (disproportionately distributed between exposed/unexposed)
- Causes the outcome (or plausibly could)
- NOT on causal pathway (exposure doesn't cause confounder)
Example: In study of foot size and reading ability among elementary students, grade level is a confounder—higher grades have bigger feet (criterion 1) and better reading (criterion 2), but foot size doesn't cause grade level (criterion 3).
Controlling for confounding:
Design phase:
- Restriction (limit sample to one level of confounder—e.g., only 3rd graders)
- Matching (in case-control studies, match each case to control with same confounder value)
- Randomization (in RCTs, random assignment breaks confounder-exposure link)
Analysis phase:
- Stratification (make separate 2×2 tables for each confounder level, calculate stratum-specific measures)
- Regression (automated stratification accounting for all confounder levels)
Detecting confounding: If crude and adjusted measures differ by >10%, and crude doesn't fall between stratum-specific measures → confounding present → report adjusted measure.
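A stratification sketch with hypothetical counts constructed so both stratum-specific RRs equal 2.0 while the crude RR does not:

```python
def rr(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))

# (a, b, c, d) = exposed cases, exposed non-cases, unexposed cases, unexposed non-cases
stratum1 = (10, 90, 50, 950)    # RR = 2.0
stratum2 = (100, 400, 10, 90)   # RR = 2.0
crude = tuple(x + y for x, y in zip(stratum1, stratum2))  # collapse the strata

print("stratum RRs:", round(rr(*stratum1), 2), round(rr(*stratum2), 2))  # 2.0, 2.0
print("crude RR:", round(rr(*crude), 2))                                 # ~3.36
# Stratum RRs agree; the crude RR differs by far more than 10% and falls
# outside their range: confounding. Report an adjusted measure near 2.0.
```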
🔄 Effect modification
Effect modification (interaction): The exposure-disease association differs across levels of a third variable.
Detecting effect modification: Stratum-specific measures are different from each other, AND crude measure falls between them.
Example: Sleep and GPA study—among men, <8 hours sleep associated with higher GPA (RR=0.68); among women, <8 hours sleep associated with lower GPA (RR=1.7). Gender is an effect modifier.
Reporting: Present stratum-specific measures separately (don't calculate adjusted measure—the interesting finding IS that groups differ).
Confounding vs. effect modification:
| Aspect | Confounding | Effect modification |
|---|---|---|
| Stratum-specific measures | Similar to each other | Different from each other |
| Crude measure | Outside stratum-specific range | Between stratum-specific measures |
| What to report | Adjusted measure | Stratum-specific measures |
| Interpretation | Covariable distorts association | Association truly differs by covariable |
Don't confuse: Same variable can theoretically be both confounder and effect modifier (rare in practice).
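The decision heuristic in the table above, written out as a toy rule of thumb (the 10% tolerance is a convention, not a law, and real analyses involve judgment):

```python
def classify(crude, stratum_measures, tol=0.10):
    """Rough confounding-vs-effect-modification check from stratified measures."""
    lo, hi = min(stratum_measures), max(stratum_measures)
    strata_agree = (hi - lo) / lo <= tol
    if strata_agree and not (lo <= crude <= hi):
        return "confounding: report the adjusted measure"
    if not strata_agree and lo <= crude <= hi:
        return "effect modification: report stratum-specific measures"
    return "pattern unclear: examine the data further"

print(classify(3.4, [2.0, 2.0]))    # confounding
print(classify(1.1, [0.68, 1.7]))   # effect modification (sleep/GPA example)
```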
🧬 Causality
🥧 Multiple causes of disease
Three tenets:
- All diseases have multiple causes (no single "the" cause)
- Not all causes act at the same time
- Many different combinations of causes can produce the same disease
Jar analogy: Each person starts with a jar (size determined by genetics, early life). Harmful exposures add liquid; protective exposures drain liquid. Disease begins when jar fills to top. Different people have different-sized jars and encounter different exposures → many paths to same disease.
Implications:
- Don't need to identify all causes before taking action (eliminating one cause prevents some cases)
- "Strength" of causes is population-dependent (if we eliminate smoking, radon will look like stronger cause of lung cancer)
- Attributable fractions for a disease's different causes can sum to more than 100%, so adding them up is not meaningful
✅ Determining causality in epidemiology
Process:
- First establish association is real (not due to bias, confounding, or random error)
- Then assess causality using considerations like:
- Temporality (exposure preceded disease?)
- Biological plausibility (known mechanism?)
- Consistency (multiple studies reach same conclusion?)
- Dose-response (more exposure → more disease?)
- Experimental evidence (RCT results?)
Hill's considerations: Not a checklist, but things to think about. Some apply better to infectious diseases (specificity) than chronic diseases (smoking causes multiple diseases, not just one).
RCTs provide strongest causal evidence (if well-conducted) because randomization eliminates confounding. But many research questions cannot be studied with RCTs (ethical/practical constraints) → rely on converging evidence from multiple observational studies.
Don't confuse: Statistical association ≠ causation. Epidemiologists carefully use non-causal language ("associated with," "evidence suggests") until field reaches consensus.
🩺 Screening and diagnostic testing
🔍 Screening vs. diagnosis
Screening: Testing asymptomatic population to find early disease
- Requires test that detects pre-symptomatic disease
- Disease must be common enough or serious enough to justify cost
- Most useful when critical point (point beyond which treatment doesn't help) falls between screening detection and symptom onset
Diagnostic testing: Testing symptomatic patient to determine which condition they have
- Part of differential diagnosis process
- Order of tests depends on disease severity, test costs, disease prevalence
Same test, different context: Mammogram is screening test if no symptoms; diagnostic test if patient found a lump.
📊 Test characteristics (fixed)
Sensitivity (Sn): Probability test is positive given person has disease
- Formula: (True positives) / (True positives + False negatives)
- High sensitivity → few false negatives
- SnOUT: High sensitivity test, when negative, rules OUT disease
- Screening programs use high-sensitivity tests (don't want to miss cases)
Specificity (Sp): Probability test is negative given person doesn't have disease
- Formula: (True negatives) / (False positives + True negatives)
- High specificity → few false positives
- SpIN: High specificity test, when positive, rules IN disease
Don't confuse: Sensitivity and specificity are fixed (don't change with disease prevalence).
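Both characteristics from hypothetical validation-study counts:

```python
# Validation counts: test result vs. true disease status (hypothetical)
tp, fn = 90, 10    # of 100 diseased people, 90 test positive
fp, tn = 50, 850   # of 900 disease-free people, 850 test negative

sensitivity = tp / (tp + fn)   # P(test+ | disease)    = 0.90
specificity = tn / (fp + tn)   # P(test- | no disease) ~ 0.94
print(f"Sn = {sensitivity:.2f}, Sp = {specificity:.2f}")
# These numbers belong to the test; they do not shift with disease prevalence.
```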
🎯 Predictive values (vary with prevalence)
Positive predictive value (PPV): Probability person has disease given positive test
- Formula: (True positives) / (True positives + False positives)
- Used to interpret positive test result
- Decreases as prevalence decreases
Negative predictive value (NPV): Probability person doesn't have disease given negative test
- Formula: (True negatives) / (False negatives + True negatives)
- Used to interpret negative test result
- Increases as prevalence decreases
Example: Mammography in 40-year-old women—breast cancer prevalence is 0.98%, so PPV is <1% → >99% of women sent for biopsy are false positives. In high-risk women (BRCA carriers), prevalence is higher → PPV is higher → screening more useful.
Don't confuse: Must know disease prevalence in the relevant population to interpret PPV/NPV. Same test result means different things in different populations.
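A Bayes'-rule sketch showing how the same test yields different predictive values at different prevalences. The Sn = 0.90 and Sp = 0.94 values are illustrative assumptions, not real mammography figures:

```python
def ppv_npv(sn, sp, prev):
    """Predictive values from sensitivity, specificity, and disease prevalence."""
    ppv = sn * prev / (sn * prev + (1 - sp) * (1 - prev))
    npv = sp * (1 - prev) / (sp * (1 - prev) + (1 - sn) * prev)
    return ppv, npv

for prev in (0.0098, 0.10, 0.50):   # average-risk vs. progressively higher-risk groups
    ppv, npv = ppv_npv(0.90, 0.94, prev)
    print(f"prevalence {prev:6.2%}: PPV = {ppv:.2f}, NPV = {npv:.3f}")
# The rarer the disease, the lower the PPV: most positives are false positives.
```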
This document covers the foundational concepts of epidemiology including disease frequency measurement, study design selection, statistical inference, bias assessment, confounding control, and causal reasoning—all essential for critically reading and applying epidemiologic research to public health practice.