Foundations of Epidemiology

1. Distribution in Epidemiology

🧭 Overview

🧠 One-sentence thesis

Epidemiologists study the distribution of disease across person, place, and time because disease is not random, and understanding these patterns enables targeted prevention and diagnosis.

📌 Key points (3–5)

  • What distribution means: describing the pattern of illness in terms of person (who gets sick), place (where), and time (when/trends).
  • Why distribution matters: disease is not randomly distributed, so identifying patterns reveals who is at greatest risk and guides prevention efforts.
  • Person patterns: examining characteristics like age, sex, and shared traits among those who do or do not experience the health outcome.
  • Place and time patterns: geographic variation (e.g., proximity to pollution, climate) and temporal trends (e.g., rising cesarean rates) inform resource allocation and intervention priorities.
  • Common confusion: distribution is descriptive (the "who/where/when"), while determinants address causation (the "why")—both are essential but distinct parts of epidemiology.

👥 Person: Who Gets Sick?

👥 Identifying affected groups

  • Epidemiologists ask: Who is getting sick? What characteristics do affected individuals share?
  • Key demographics include sex, age, occupation, proximity to environmental factors, etc.
  • Equally important: What do people who do not experience the outcome have in common?

📊 Example: Circadian sleep disorder

The excerpt provides a WHO data example showing:

  • Much more common in males than females
  • Most common in adult males in their 20s and 30s
  • Among females, most common in adolescents and young adults

Why this matters:

  • Clinical use: A 5-year-old female with sleep troubles is unlikely to have circadian sleep disorder based on distribution patterns.
  • Public health use: Health education campaigns about sleep habits would be most cost-effective if targeted at young and middle-aged adult males.

🎯 Resource allocation principle

Knowing patterns of disease helps public health departments plan what to do with their limited resources.

  • Distribution data reveals where interventions will get the most "bang for the buck."

🗺️ Place: Geographic Patterns

🗺️ Why location matters

Geographic location has direct implications for health through:

  • Environmental exposures (e.g., families living near a polluting factory)
  • Climate and disease vectors (e.g., mosquito species that carry specific diseases)
  • Regional health behaviors and cultural practices

🦟 Infectious disease example

  • Dengue and Zika occur in the tropics where the Aedes aegypti mosquitoes that transmit them live; malaria likewise follows the range of its Anopheles mosquito vectors
  • These diseases do not occur in colder regions, whose mosquito species do not transmit them
  • A physician in Maine would not diagnose malaria unless the patient had recently traveled to the tropics

🚗 Health behavior example: Seat belt use

The CDC data on U.S. seat belt use shows wide geographic variation:

  • A health department in Minnesota (high seat belt use) probably does not need to spend resources encouraging seat belt use
  • In Montana (lower use), this might be an excellent use of resources

Don't confuse: Place patterns reflect both environmental factors (pollution, vectors) and behavioral/cultural factors (seat belt use)—both are valid aspects of geographic distribution.

⏰ Time: Trends and Changes

⏰ Temporal patterns

The final factor in descriptive epidemiology is time: How is the distribution of disease changing over time?

📈 Example: Cesarean section rates

Data from New South Wales, Australia (1994–2010):

  • Red line (predicted): Cesarean rate expected based on pregnancy risk factors (maternal height, blood pressure, multiple pregnancy) rose only slightly
  • Blue line (actual): Actual rate rose much more quickly than expected
  • This unexpected jump is observed worldwide, not just in Australia

Why this matters:

  • Cesarean surgery carries risks (major surgery always has risks)
  • The gap between predicted and actual rates is alarming
  • Reducing cesarean rates is now a top priority for obstetricians, midwives, and public health officials worldwide

🔍 Time trends inform action

Understanding temporal changes helps identify:

  • Emerging health problems requiring intervention
  • Whether current risk factors fully explain observed patterns
  • Priorities for clinical and public health efforts

🧬 Distribution vs. Determinants

🧬 The distinction

| Concept | What it addresses | Focus |
| --- | --- | --- |
| Distribution | Who, where, when | Descriptive patterns (person, place, time) |
| Determinants | Why | Causes or prevention factors |

🧬 What determinants mean

A cause (determinant) is anything that changes the likelihood that an individual will become diseased.

  • In epidemiology, "cause" means "cause or prevent"
  • A determinant increases risk (e.g., smoking) or decreases risk (e.g., exercise)
  • Both are considered "causes" because both alter disease risk

🧬 Types of determinants

Determinants can be:

  • Behaviors (smoking, exercise)
  • Demographics (age, sex)
  • Genetics
  • Environmental contaminants
  • Any factor that alters disease risk

Etiology: Collectively, all determinants of a disease.

Don't confuse: Distribution (descriptive epidemiology) is almost always a necessary first step before investigating determinants (why the patterns exist).

2. Foundations of Epidemiology: Incidence, Prevalence, and Study Design

🧭 Overview

🧠 One-sentence thesis

Epidemiologists measure disease frequency through incidence (new cases) and prevalence (existing cases), using different study designs to calculate measures of association that quantify exposure-disease relationships while accounting for random error, bias, and confounding.

📌 Key points (3–5)

  • Incidence vs. prevalence: Incidence counts new cases over time (requires at-risk population); prevalence counts all existing cases at one point (includes both new and old cases).
  • Study designs differ by timing: Cohorts and RCTs follow people forward (incident cases → risk/rate ratios); case-control and cross-sectional look backward or at one time point (prevalent cases → odds ratios).
  • Threats to validity: Random error (quantified by p-values/confidence intervals), bias (systematic measurement/selection errors), and confounding (third variables distorting associations) all affect results.
  • Common confusion: Don't confuse incidence proportion (people at risk in denominator) with incidence rate (person-time at risk in denominator); both measure new cases but handle varying follow-up times differently.
  • Causality requires caution: Statistical association doesn't prove causation—diseases have multiple causes, and determining causality requires considering temporality, biological plausibility, consistency across studies, and potential confounders.

📊 Measuring disease frequency

📈 Incidence: counting new cases

Incidence: The occurrence of new cases of disease during a specified time period in a population at risk.

  • Requires knowing who is at risk (disease-free at baseline)
  • Always includes a time component (per year, per month, etc.)
  • Two main types: incidence proportion and incidence rate

Incidence proportion (also called cumulative incidence or risk):

  • Numerator: new cases during follow-up period
  • Denominator: number of people at risk at the start
  • Formula: (new cases) / (people at risk at baseline)
  • Range: 0 to 1 (it's a proportion)
  • Example: If 10 students in a 100-student dorm develop flu over one month, incidence proportion = 10/100 = 0.10 or 10% per month

Incidence rate (also called incidence density):

  • Numerator: new cases during follow-up
  • Denominator: sum of person-time at risk contributed by all participants
  • Accounts for people entering/leaving study at different times
  • More realistic but more complex to calculate
  • Example: If 3 people develop disease over 87 person-months of observation, rate = 3/87 = 0.0345 per person-month
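
To make the arithmetic concrete, here is a minimal Python sketch of both measures, reusing the illustrative numbers from the bullets above (the function names are ours, not the excerpt's):

```python
# Minimal sketch: incidence proportion vs. incidence rate.

def incidence_proportion(new_cases: int, at_risk_at_baseline: int) -> float:
    """New cases / people at risk at the start of follow-up (range 0 to 1)."""
    return new_cases / at_risk_at_baseline

def incidence_rate(new_cases: int, person_time_at_risk: float) -> float:
    """New cases / total person-time at risk (e.g., per person-month)."""
    return new_cases / person_time_at_risk

# Dorm flu example: 10 new cases among 100 students over one month
print(incidence_proportion(10, 100))    # 0.10, i.e., 10% per month

# Person-time example: 3 new cases over 87 person-months of observation
print(round(incidence_rate(3, 87), 4))  # 0.0345 per person-month
```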

🔄 Person-time at risk calculations

  • Each participant contributes time only while at risk
  • Stop counting when: person develops disease, dies from competing cause, or is lost to follow-up
  • Example from excerpt: Person 2 enrolled January 1, developed disease end of August → contributed 8 person-months at risk (then became a prevalent case)

Don't confuse: Person-time before study enrollment cannot be counted—we'd miss all the prevalent cases that occurred before the study started, artificially lowering our incidence estimate.

📸 Prevalence: counting existing cases

Prevalence: The proportion of a population that has a disease at a specific point in time.

  • Includes both new and long-standing cases
  • No time dimension (it's a snapshot)
  • Formula: (all current cases) / (total population) at one time point
  • Used for resource allocation, not for studying disease causes
  • Example: If 80% of nursing home residents have dementia, this informs staffing needs

Relationship to incidence:

  • Prevalence ≈ Incidence × Average disease duration
  • If incidence rises but people die quickly, prevalence may not rise much
  • If treatment improves (people live longer with disease), prevalence rises even if incidence stays flat
  • HIV example from excerpt: After 1996, antiretroviral therapy allowed people to "live with HIV" → prevalence increased while incidence remained steady
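
The prevalence–incidence–duration relationship can be illustrated with a minimal sketch; the numbers are hypothetical, chosen only to mirror the "people live longer with disease" point above:

```python
# Minimal sketch: prevalence ≈ incidence × average disease duration
# (a steady-state approximation that holds when prevalence is low).

def approx_prevalence(incidence: float, avg_duration_years: float) -> float:
    return incidence * avg_duration_years

incidence = 0.002  # hypothetical: 2 new cases per 1,000 person-years

# Same incidence, but better treatment extends survival with the disease:
print(approx_prevalence(incidence, 5))   # 0.01 -> ~1% prevalence
print(approx_prevalence(incidence, 25))  # 0.05 -> ~5% prevalence, incidence unchanged
```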

⚖️ When to use each measure

Use incidence when:

  • Studying causes of disease (need to know exposure came before disease)
  • Evaluating prevention programs
  • Disease has short duration or high fatality

Use prevalence when:

  • Assessing disease burden for resource planning
  • Screening program planning
  • Disease is chronic with long duration

Don't confuse: Prevalence studies cannot establish causality because we don't know if exposure or disease came first—the disease may have caused the exposure rather than vice versa.

🔬 Study designs for measuring associations

🏃 Cohort studies: following people forward

Procedure:

  1. Select non-diseased sample from target population (at-risk individuals only)
  2. Measure exposure status at baseline
  3. Follow over time, observing incident disease
  4. Calculate risk ratio or rate ratio

Strengths:

  • Exposure measured before disease → clear temporality
  • Can study rare exposures (deliberately sample exposed individuals)
  • Can assess multiple outcomes in same cohort
  • Less prone to recall bias than retrospective designs

Weaknesses:

  • Expensive and time-consuming
  • Cannot study rare diseases (would need huge sample)
  • Cannot study diseases with very long latent periods (decades of follow-up impractical)
  • Loss to follow-up creates selection bias

Example: Framingham Heart Study enrolled 5,000+ adults in 1948, measured numerous exposures (diet, smoking, exercise), followed for decades, documented heart disease and stroke incidence.

🎲 Randomized controlled trials: experimental cohorts

Procedure:

  • Same as cohort, but investigator randomly assigns exposure instead of measuring existing exposure
  • Half get intervention, half get control/placebo
  • Follow forward, measure incident disease

Strengths:

  • Randomization eliminates all confounding (known, unknown, measured, unmeasured)—confounders cannot be associated with exposure if exposure is randomly assigned
  • Strongest evidence for causality
  • Required by FDA for drug approval

Weaknesses:

  • Even more expensive than observational cohorts
  • Ethical constraints (cannot randomize harmful exposures like smoking)
  • Generalizability issues (people willing to be randomized may differ from general population)
  • Must precisely specify the intervention (what if you choose wrong dose/duration?)

Example: Ridker trial (2005) randomized 20,000 women to aspirin vs. placebo—aspirin did not prevent first heart attacks in these women, even though earlier trials had shown that it does in men (gender is an effect modifier).

🔍 Case-control studies: working backward

Procedure:

  1. Select cases (people with disease) from target population
  2. Select controls (people without disease) from same target population
  3. Look backward to assess past exposures
  4. Calculate odds ratio

Strengths:

  • Fast and cheap (no waiting for disease to develop)
  • Efficient for rare diseases
  • Can study diseases with long latent periods
  • Can assess multiple exposures

Weaknesses:

  • Prone to recall bias (cases may remember past exposures differently than controls)
  • Selecting appropriate controls is difficult—must come from same population as cases
  • Cannot calculate incidence (only odds ratios)
  • Temporality less clear (did exposure really precede disease?)

Example: Doll and Hill's 1950 smoking/lung cancer study—recruited lung cancer patients (cases) and comparable hospital patients without lung cancer (controls), asked about past smoking.

Don't confuse: Cases are people with disease (not "people with disease who are exposed"). Both cases and controls are recruited without regard to exposure status.

📷 Cross-sectional studies: single snapshot

Procedure:

  1. Draw sample from target population
  2. Measure exposure and disease status simultaneously (at one point in time)
  3. Calculate odds ratio

Strengths:

  • Fastest, cheapest design
  • Used for surveillance (NHANES, BRFSS, PRAMS)
  • Good for hypothesis generation

Weaknesses:

  • Cannot determine temporality (which came first?)
  • Cannot study rare exposures or rare diseases (you "get what you get")
  • Limited to hypothesis generation, not causal inference
  • Uses prevalent cases (mixes disease onset with disease duration)

Example: Survey asking current physical activity levels and current dementia status—cannot tell if inactivity led to dementia or dementia led to inactivity.

📐 Measures of association

🔢 Risk ratio and rate ratio (RR)

  • Used for cohort studies and RCTs (designs that measure incidence)
  • Formula: (Incidence in exposed) / (Incidence in unexposed)
  • Null value = 1.0 (no association)
  • RR > 1 means exposure increases disease risk
  • RR < 1 means exposure decreases disease risk (protective)

Interpretation template: "The risk [or rate] of [disease] was [RR] times as high in [exposed] compared to [unexposed] over [time period]."

Example: RR = 2.27 → "The risk of hypertension was 2.27 times as high in smokers compared to nonsmokers over 10 years."

Don't confuse: "2 times as high" ≠ "2 times higher" (the latter would be RR = 3.0, since null is 1.0 not 0).

🎲 Odds ratio (OR)

  • Used for case-control and cross-sectional studies (designs using prevalent cases)
  • Formula: (odds of disease in exposed) / (odds of disease in unexposed)
  • For 2×2 table with cells A, B, C, D: OR = (A×D) / (B×C)
  • Null value = 1.0
  • Interpretation same as RR: "The odds of [disease] were [OR] times as high in [exposed] compared to [unexposed]."

OR vs. RR:

  • OR always further from null than RR for same data
  • When disease is rare (<5% prevalence), OR approximates RR
  • When disease is common, OR exaggerates the association
  • Some cohort studies incorrectly report OR instead of RR (due to statistical software defaults)
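
Both the OR formula and the rare-disease approximation can be checked with a short sketch (all counts hypothetical):

```python
# Minimal sketch: odds ratio from a 2x2 table, and the rare-disease check.

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (A x D) / (B x C) for the standard 2x2 layout."""
    return (a * d) / (b * c)

def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    return (a / (a + b)) / (c / (c + d))

# Common disease: the OR sits further from the null than the RR
print(odds_ratio(25, 75, 11, 89), risk_ratio(25, 75, 11, 89))  # ~2.70 vs ~2.27

# Rare disease (<5%): the OR closely approximates the RR
print(odds_ratio(2, 998, 1, 999), risk_ratio(2, 998, 1, 999))  # ~2.00 vs 2.00
```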

➖ Risk difference (absolute measure)

  • Formula: (Incidence in exposed) − (Incidence in unexposed)
  • Keeps units (e.g., "42 per 100 per 10 years")
  • Shows absolute impact, not relative
  • Interpretation: "Over [time], the excess number of cases attributable to [exposure] is [RD]; the remaining [baseline incidence] would have occurred anyway."

Why it matters: An RR of 2.0 sounds impressive, but if baseline risk is 1 in a million and exposed risk is 2 in a million, the excess is only 1 case per million—probably not worth a public health intervention (see the sketch below).
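
A minimal sketch contrasting relative and absolute measures (all risks hypothetical):

```python
# Minimal sketch: same relative effect, very different absolute impact.

def risk_difference(risk_exposed: float, risk_unexposed: float) -> float:
    return risk_exposed - risk_unexposed

# RR = 2.0 in both scenarios, but the risk differences are worlds apart:
print(risk_difference(2e-6, 1e-6))   # ~1 excess case per million -> tiny impact
print(risk_difference(0.50, 0.25))   # 25 excess cases per 100 -> major impact
```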

🎯 Threats to validity

🎲 Random error

Random error: Unpredictable measurement variability present in all data; quantified by p-values and confidence intervals.

  • Not the same as bias (which is systematic)
  • Comes from imperfect measurement tools, human error, natural variability
  • Cannot be eliminated, only quantified

p-values:

  • Probability of getting your data (or more extreme) if the null hypothesis is true
  • p ≤ 0.05 = "statistically significant" (arbitrary cutoff)
  • Does NOT tell you the probability that the null hypothesis is true
  • Only meaningful when working with samples (not whole populations)

Confidence intervals (95% CI):

  • If you repeated the study 100 times, 95 of those CIs would contain the true population value
  • If CI excludes null value (0 for differences, 1.0 for ratios) → p < 0.05
  • Narrower CI = more precise estimate (usually from larger sample)
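
One standard large-sample way to build a 95% CI for a risk ratio works on the log scale. The sketch below uses hypothetical counts and shows only one of several valid methods:

```python
# Minimal sketch: 95% CI for a risk ratio via the log scale.
import math

a, b = 25, 75  # exposed: diseased / not diseased (hypothetical)
c, d = 11, 89  # unexposed: diseased / not diseased (hypothetical)

rr = (a / (a + b)) / (c / (c + d))
se_log_rr = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))  # SE of ln(RR)
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)
print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# If the CI excludes the null value of 1.0, then p < 0.05.
```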

Type I error (α): False positive—rejecting null when it's actually true (set at 5% by using p < 0.05 cutoff)

Type II error (β): False negative—failing to reject null when it's actually false

Power = 1 − β: Probability of detecting an association if one truly exists (ideally ≥90%)

🔀 Bias (systematic error)

Selection bias:

  • Sample not representative of target population (affects external validity—who can you generalize to?)
  • Exposed and unexposed come from different populations (affects internal validity—results are wrong)
  • Different participation or loss-to-follow-up rates between groups
  • Healthy worker bias: workers are healthier than general population

Misclassification bias:

  • Measuring exposure or disease incorrectly → people in wrong boxes of 2×2 table
  • Nondifferential: Same error rate in all groups (usually biases toward null, but not always)
  • Differential: Error rate differs by group (fatal to internal validity)
  • Includes recall bias, social desirability bias, interviewer bias

Assessing misclassification: Ask "Can people tell me this?" (if no, stop). Then ask "Will people tell me this?" (if no, expect bias).

Publication bias: Studies with exciting results more likely to be published → literature review shows biased picture.

Don't confuse: Bias cannot be fixed with statistics (unlike random error). Must prevent through good study design.

🔗 Confounding

Confounder: A third variable (not exposure, not outcome) that distorts the true exposure-disease association.

Three criteria for potential confounder:

  1. Statistically associated with exposure (disproportionately distributed between exposed/unexposed)
  2. Causes the outcome (or plausibly could)
  3. NOT on causal pathway (exposure doesn't cause confounder)

Example: In study of foot size and reading ability among elementary students, grade level is a confounder—higher grades have bigger feet (criterion 1) and better reading (criterion 2), but foot size doesn't cause grade level (criterion 3).

Controlling for confounding:

Design phase:

  • Restriction (limit sample to one level of confounder—e.g., only 3rd graders)
  • Matching (in case-control studies, match each case to control with same confounder value)
  • Randomization (in RCTs, random assignment breaks confounder-exposure link)

Analysis phase:

  • Stratification (make separate 2×2 tables for each confounder level, calculate stratum-specific measures)
  • Regression (automated stratification accounting for all confounder levels)

Detecting confounding: If crude and adjusted measures differ by >10%, and crude doesn't fall between stratum-specific measures → confounding present → report adjusted measure.
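
The sketch below illustrates stratification with hypothetical counts. It pools strata with the Mantel–Haenszel estimator, one common choice of adjusted measure (the chapter itself does not name a specific estimator):

```python
# Minimal sketch: crude vs. stratum-specific vs. adjusted risk ratios.

def rr(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))

def mantel_haenszel_rr(strata):
    """Pool (a, b, c, d) 2x2 tables across strata into one adjusted RR."""
    num = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in strata)
    den = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Foot-size/reading style example: within each grade, exposure does nothing,
# but higher grades have both more "exposed" kids and more of the outcome.
grade3 = (2, 98, 10, 490)   # hypothetical counts
grade6 = (30, 120, 10, 40)  # hypothetical counts
crude = tuple(x + y for x, y in zip(grade3, grade6))

print(round(rr(*crude), 2))                            # 3.52 (crude)
print(round(rr(*grade3), 2), round(rr(*grade6), 2))    # 1.0 and 1.0
print(round(mantel_haenszel_rr([grade3, grade6]), 2))  # 1.0 (adjusted)
# Crude and adjusted differ by far more than 10% -> confounding -> report 1.0.
```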

🔄 Effect modification

Effect modification (interaction): The exposure-disease association differs across levels of a third variable.

Detecting effect modification: Stratum-specific measures are different from each other, AND crude measure falls between them.

Example: Sleep and GPA study—among men, <8 hours sleep associated with higher GPA (RR=0.68); among women, <8 hours sleep associated with lower GPA (RR=1.7). Gender is an effect modifier.

Reporting: Present stratum-specific measures separately (don't calculate adjusted measure—the interesting finding IS that groups differ).

Confounding vs. effect modification:

| Aspect | Confounding | Effect modification |
| --- | --- | --- |
| Stratum-specific measures | Similar to each other | Different from each other |
| Crude measure | Outside stratum-specific range | Between stratum-specific measures |
| What to report | Adjusted measure | Stratum-specific measures |
| Interpretation | Covariable distorts association | Association truly differs by covariable |

Don't confuse: Same variable can theoretically be both confounder and effect modifier (rare in practice).
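
As a rough illustration of the decision logic in the table above, here is a sketch applying the chapter's rules of thumb; the 10% cutoff and the "crude between strata" check are heuristics, not universal laws:

```python
# Minimal sketch: confounding vs. effect modification, using rules of thumb.

def assess(crude: float, strata: list) -> str:
    lo, hi = min(strata), max(strata)
    strata_differ = (hi - lo) / lo > 0.10  # stratum measures meaningfully apart?
    crude_between = lo <= crude <= hi
    if strata_differ and crude_between:
        return "effect modification -> report stratum-specific measures"
    if not strata_differ and not crude_between:
        return "confounding -> report adjusted measure"
    return "neither clearly indicated -> report crude measure"

# Sleep/GPA example: men RR = 0.68, women RR = 1.7, crude in between
print(assess(crude=1.1, strata=[0.68, 1.7]))
# Foot-size example: stratum RRs similar, crude far outside their range
print(assess(crude=3.5, strata=[1.0, 1.05]))
```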

🧬 Causality

🥧 Multiple causes of disease

Three tenets:

  1. All diseases have multiple causes (no single "the" cause)
  2. Not all causes act at the same time
  3. Many different combinations of causes can produce the same disease

Jar analogy: Each person starts with a jar (size determined by genetics, early life). Harmful exposures add liquid; protective exposures drain liquid. Disease begins when jar fills to top. Different people have different-sized jars and encounter different exposures → many paths to same disease.

Implications:

  • Don't need to identify all causes before taking action (eliminating one cause prevents some cases)
  • "Strength" of causes is population-dependent (if we eliminate smoking, radon will look like stronger cause of lung cancer)
  • Attributable fractions for a disease's many causes can sum to more than 100%, so adding them up is not meaningful

✅ Determining causality in epidemiology

Process:

  1. First establish association is real (not due to bias, confounding, or random error)
  2. Then assess causality using considerations like:
    • Temporality (exposure preceded disease?)
    • Biological plausibility (known mechanism?)
    • Consistency (multiple studies reach same conclusion?)
    • Dose-response (more exposure → more disease?)
    • Experimental evidence (RCT results?)

Hill's considerations: Not a checklist, but things to think about. Some apply better to infectious diseases (specificity) than chronic diseases (smoking causes multiple diseases, not just one).

RCTs provide strongest causal evidence (if well-conducted) because randomization eliminates confounding. But many research questions cannot be studied with RCTs (ethical/practical constraints) → rely on converging evidence from multiple observational studies.

Don't confuse: Statistical association ≠ causation. Epidemiologists carefully use non-causal language ("associated with," "evidence suggests") until field reaches consensus.

🩺 Screening and diagnostic testing

🔍 Screening vs. diagnosis

Screening: Testing asymptomatic population to find early disease

  • Requires test that detects pre-symptomatic disease
  • Disease must be common enough or serious enough to justify cost
  • Most useful when critical point (point beyond which treatment doesn't help) falls between screening detection and symptom onset

Diagnostic testing: Testing symptomatic patient to determine which condition they have

  • Part of differential diagnosis process
  • Order of tests depends on disease severity, test costs, disease prevalence

Same test, different context: Mammogram is screening test if no symptoms; diagnostic test if patient found a lump.

📊 Test characteristics (fixed)

Sensitivity (Sn): Probability test is positive given person has disease

  • Formula: (True positives) / (True positives + False negatives)
  • High sensitivity → few false negatives
  • SnOUT: High sensitivity test, when negative, rules OUT disease
  • Screening programs use high-sensitivity tests (don't want to miss cases)

Specificity (Sp): Probability test is negative given person doesn't have disease

  • Formula: (True negatives) / (False positives + True negatives)
  • High specificity → few false positives
  • SpIN: High specificity test, when positive, rules IN disease

Don't confuse: Sensitivity and specificity are fixed (don't change with disease prevalence).

🎯 Predictive values (vary with prevalence)

Positive predictive value (PPV): Probability person has disease given positive test

  • Formula: (True positives) / (True positives + False positives)
  • Used to interpret positive test result
  • Decreases as prevalence decreases

Negative predictive value (NPV): Probability person doesn't have disease given negative test

  • Formula: (True negatives) / (False negatives + True negatives)
  • Used to interpret negative test result
  • Increases as prevalence decreases

Example: Mammography in 40-year-old women—breast cancer prevalence is 0.98%, so PPV is <1% → >99% of women sent for biopsy are false positives. In high-risk women (BRCA carriers), prevalence is higher → PPV is higher → screening more useful.

Don't confuse: Must know disease prevalence in the relevant population to interpret PPV/NPV. Same test result means different things in different populations.
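
Because PPV follows from sensitivity, specificity, and prevalence via Bayes' theorem, a short sketch shows how the same test produces very different PPVs in different populations (the Sn/Sp values are hypothetical, so the numbers differ from the mammography example; the point is how PPV rises with prevalence):

```python
# Minimal sketch: PPV as a function of prevalence.

def ppv(sn: float, sp: float, prevalence: float) -> float:
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = sn * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

sn, sp = 0.90, 0.90  # hypothetical test characteristics (these stay fixed)

print(round(ppv(sn, sp, 0.0098), 3))  # ~0.08 at ~1% prevalence (average risk)
print(round(ppv(sn, sp, 0.20), 3))    # ~0.69 at 20% prevalence (high risk)
```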


This document covers the foundational concepts of epidemiology including disease frequency measurement, study design selection, statistical inference, bias assessment, confounding control, and causal reasoning—all essential for critically reading and applying epidemiologic research to public health practice.

3. Determinants

🧭 Overview

🧠 One-sentence thesis

Determinants are the factors that either cause or prevent disease.

📌 Key points (3–5)

  • What determinants are: things that cause or prevent disease.
  • Alternative terminology: determinants are also called "causes."
  • Role in epidemiology: determinants are central to understanding disease patterns and prevention.

🔍 Core definition

🔍 What determinants mean

Determinants: Things that cause or prevent disease. Also called "causes."

  • The term captures both causal factors (things that lead to disease) and protective factors (things that prevent disease).
  • This is a foundational concept in epidemiology—identifying what makes disease more or less likely.
  • The excerpt provides a simple, direct definition without elaboration.

🔄 Synonyms and usage

  • "Determinants" and "causes" are used interchangeably in epidemiological literature.
  • Both terms refer to the same underlying concept: factors that influence disease occurrence.
  • Example: A determinant might be a risk factor (increasing disease likelihood) or a preventive measure (decreasing disease likelihood).

📊 Context in epidemiology

📊 Relationship to other concepts

  • The excerpt places determinants within a glossary of epidemiological terms.
  • Determinants are distinct from measures of disease frequency (like incidence or prevalence) and study designs (like cohort or cross-sectional studies).
  • Understanding determinants is essential for descriptive epidemiology, which summarizes known risk factors for a condition.

🎯 Why determinants matter

  • Identifying determinants helps explain why disease occurs in certain populations or individuals.
  • Knowledge of determinants informs prevention strategies and intervention design.
  • Example: If a determinant is identified as protective, public health efforts can promote that factor; if causal, efforts can focus on reducing exposure.

4. Disease

🧭 Overview

🧠 One-sentence thesis

In epidemiology, "disease" is broadly defined to mean any health-related outcome—not just traditional illnesses—and epidemiologists study these outcomes at the population level rather than in individual patients.

📌 Key points (3–5)

  • Broad definition: "disease" in epidemiology means any health outcome, whether traditionally considered an illness or not.
  • Examples span wide range: includes traditional diseases (measles, diabetes) and non-illness outcomes (pregnancy, death, physical activity).
  • Determinants are causes: anything that changes disease likelihood—either increasing risk (smoking) or decreasing it (exercise)—counts as a cause.
  • Common confusion: health behaviors can be both determinants AND diseases depending on context (e.g., smoking causes lung cancer but is itself an outcome in cessation programs).
  • Population focus: epidemiologists study groups with common characteristics, not individual people, using samples to generalize to target populations.

🏥 What counts as "disease" in epidemiology

📖 The epidemiological definition

Disease: any health-related condition or outcome.

  • This is much broader than everyday usage of "disease."
  • The term encompasses outcomes regardless of whether they represent illness in the traditional sense.
  • It's an umbrella term for whatever health outcome epidemiologists are studying.

🔬 Traditional diseases vs. other outcomes

| Category | Examples from excerpt |
| --- | --- |
| Traditional diseases (illness) | Measles, HIV, diabetes, leukemia |
| Health outcomes (not illness per se) | Pregnancy, malnutrition, physical activity, death |

  • Both categories are equally valid as "diseases" in epidemiological research.
  • The key is that they are all health-related conditions that can be studied.

🎯 Determinants and causes

🧬 What determinants are

Determinant (cause): anything that changes the likelihood that an individual will become diseased.

  • In epidemiology, "cause" means "cause or prevent"—it's bidirectional.
  • A determinant can either increase risk or decrease risk.
  • Both directions count as "causing" the disease outcome.

⚖️ How determinants work in both directions

  • Increases risk: smoking increases the chance of various conditions.
  • Decreases risk: exercise generally reduces disease risks.
  • Both smoking and exercise are "causes" by the epidemiological definition—they alter disease probability.

🧩 Types of determinants

Determinants can be anything that meets the criterion of altering disease risk:

  • Behaviors
  • Demographics
  • Genetics
  • Environmental contaminants
  • And so on

Etiology: collectively, all determinants of a disease.

🔄 The dual nature of health behaviors

Don't confuse: Health behaviors occupy a unique position—they can be both determinants AND diseases depending on context.

Example from the excerpt:

  • As a determinant: Smoking causes lung cancer (it's the cause).
  • As a disease: In a smoking cessation program evaluation, smoking is the outcome being studied (it's the "disease").

The same dual role applies to physical activity, nutrition, and other health behaviors.

👥 The population perspective

🌍 What populations are

Population: a group of people with a common characteristic.

Examples of populations:

  • Residents of the United States
  • People with type 1 diabetes
  • People under age 25 who work full-time

🎯 Target population vs. sample

  • Target population: the group about whom researchers wish to say something.
  • Sample: a smaller group drawn from the target population to actually participate in the study.
  • Why sampling is necessary: researchers cannot possibly enroll all members of a target population.

Example from the excerpt:

  • Research question: relationship between sleep and GPA in college students.
  • Target population: "full-time undergraduates."
  • Sample: some smaller group of full-time undergraduates actually enrolled in the study.

🔍 Generalizability considerations

  • Ideally, the sample should be similar enough to the target population that results can be generalized back.
  • Researchers work to recruit diverse samples that resemble the population.
  • Example of poor generalizability: studying only first-year biology majors when trying to say something about all full-time undergraduates.
  • Note: generalizability doesn't always matter as much in epidemiology as in other fields.

📋 Defining populations with inclusion/exclusion criteria

Purpose: Lists that allow any person to decide whether they belong to the population.

  • Inclusion criteria: characteristics that qualify someone (e.g., "include kids").
  • Exclusion criteria: characteristics that disqualify someone (e.g., "exclude adults").
  • These are flip sides of the same coin; use whichever provides greater clarity.

Example from the excerpt (strength training study for osteoporotic fractures):

  • Must specify: lower (and potentially upper) age cutoff for "elderly."
  • Must specify: biological females, those who identify as women, or both.
  • Must specify: physical capability requirements (not all elderly women can do strength training).
  • Might exclude: women for whom exercise is contraindicated (e.g., heart failure patients) or those who already had a hip fracture.

Important note: Rarely is there one "correct" answer for criteria—scientific, clinical, or policy reasons guide choices, but many boundaries are somewhat arbitrary as long as they're clearly defined and consistently applied.

🏛️ Epidemiology vs. clinical medicine

👨‍⚕️ The fundamental difference

The excerpt emphasizes that epidemiologists concern themselves with populations, not individual people.

  • This is both a great asset and a source of great confusion.
  • Physicians, nurses, and other clinicians focus on individual patients.
  • Epidemiologists look at distributions and determinants of disease in populations.

This population-level focus distinguishes epidemiology from clinical practice.

5. Populations

🧭 Overview

🧠 One-sentence thesis

Epidemiologists study populations rather than individuals, which is a powerful tool for identifying group-level patterns but requires careful interpretation because population statistics do not predict individual risk.

📌 Key points (3–5)

  • What a population is: a group of people with a common characteristic; epidemiologists draw samples from target populations to study health outcomes.
  • How populations are defined: through inclusion/exclusion criteria that are clear enough for anyone to determine whether they belong to the population.
  • Population vs individual focus: epidemiology focuses on groups, whereas clinicians focus on individuals—this is both an asset and a source of confusion.
  • Common confusion: population-level statistics (e.g., average risk in a state) say nothing about any one individual's risk; aggregated data mask individual variation.
  • Why it matters: population-level data allow epidemiologists to identify why some groups are at higher risk and to plan public health actions.

🧩 Core concept: What a population is

🧩 Definition and purpose

A population is a group of people with a common characteristic.

  • Examples: residents of the United States, people with type 1 diabetes, people under age 25 who work full-time.
  • For epidemiologists, the population is the group about whom they wish to say something.
  • Example: if studying the relationship between sleep and GPA among college students, the population might be "full-time undergraduates."

🎯 Target population vs sample

  • Target population: the entire group the researcher wants to generalize about.
  • Sample: a smaller group drawn from the target population to actually conduct the study.
  • Why sampling is necessary: it is impossible to enroll all members of a large target population (e.g., all full-time undergraduates in the world).
  • The sample should be similar enough to the target population so that results can be generalized back to that population.
  • Example: a study done only among first-year biology majors would be hard-pressed to generalize to all full-time undergraduates.
  • Don't confuse: generalizability does not always matter as much in epidemiology as in other fields (the excerpt notes this is discussed further under "external validity").

🔧 Defining populations: Inclusion and exclusion criteria

🔧 What they are

  • Populations are defined via lists of inclusion and/or exclusion criteria.
  • These are flip sides of the same coin: you can either include kids or exclude adults.
  • The list must be sufficiently complete that any given person could look at it and decide whether they are in the population.

📋 Example: Strength training study

If planning a study of strength training to prevent osteoporotic fractures in elderly women, the criteria must specify:

  • The lower (and potentially upper) age cutoff (i.e., what is "elderly" for the study's purposes?)
  • Whether the study is interested in biological females, those who identify as women, or both
  • Whether there are exclusions in terms of physical capabilities (e.g., not all elderly women can do a strength training regimen)
  • Possible exclusions: women for whom exercise is contraindicated (e.g., heart failure patients) or those who have already had a hip fracture

🤔 No single "correct" answer

  • Only rarely is there a "correct" answer when creating inclusion/exclusion criteria.
  • Scientific or clinical considerations help narrow it down, but often the exact cutoff doesn't matter as long as one is set and followed consistently.
  • Example: it probably doesn't matter if the lower age bound is set at 60, 65, or 70, as long as one is chosen and stuck to.
  • Occasionally, policy reasons dictate the choice—for instance, Medicare in the US covers individuals ages 65 and older, so studies often use this age group.

🔍 Population-level vs individual-level data

🔍 The key distinction

  • Clinicians (physicians, nurses, etc.) are concerned mainly with diseases in individuals.
  • Epidemiologists focus instead on populations.
  • This difference makes interpreting epidemiologic results difficult, since epidemiologic results pertain to populations, not individuals.

📊 Example: Opioid epidemic data

The excerpt provides state-level opioid death rates:

| State | Opioid-related deaths per 100,000 people per year |
| --- | --- |
| West Virginia (riskiest) | 43.4 |
| Nebraska (least risky) | 2.4 |

  • These statistics say nothing about individual levels of risk.
  • They only say that, on average, people from West Virginia are much more likely to die from opioid-related causes than people from Nebraska.
  • For any one individual, much more must be considered than just where the person lives.
  • Example: a person addicted to painkillers in Nebraska surely has a higher risk of opioid-related death than a person in West Virginia who has never taken any pain medicine stronger than aspirin, even though the population-level risks might suggest otherwise.

📊 Example: Sleep and GPA study

Suppose a study of 4,000 students at Oregon State University (2,000 males and 2,000 females) finds:

  • Male students sleep an average of 7.2 hours per night

  • Female students sleep an average of 7.9 hours per night

  • This comparison allows commenting on differences between male and female students on a population level.

  • It says nothing about individual students.

  • Within the sample, it would be relatively easy to find a given pair of students wherein the male student averaged more sleep than the female.

  • Don't confuse: averages also mask personal variations—even if one person averages 7.4 hours per night, there are nights with less and nights with more.

💪 Why population-level statistics are powerful

💪 The work of epidemiology

  • Population-level statistics allow epidemiologists to figure out why some groups (populations) are at higher risk than others.
  • When looking at aggregated statistics, one must always keep in mind that any one individual's risk is lost within the group.
  • Always remember: epidemiologic data refer to groups of people—not to individuals.

🎯 Application to public health

The excerpt emphasizes that epidemiology uses information on distributions and determinants of diseases in populations to control health problems.

  • This final application step is controversial in the field (not all definitions include it), but the excerpt argues the rest does not matter without this step.
  • Epidemiology is the fundamental science of public health, which is concerned with preventing disease and improving general wellness in the public.
  • Example: merely knowing that male students get less sleep than female students does little good—the way to contribute to public health is by taking action based on this knowledge.
  • Example: imagine if the epidemiologists who first made the link between smoking and lung cancer had not acted on their findings.

🤝 Collaboration for action

  • Epidemiologic data are a key part of numerous possible public health actions: health education campaigns, policy or regulation changes, clinical practice changes, and many other initiatives.
  • Rarely do epidemiologists take this step by themselves—collaboration with professionals from other fields within and related to public health is a must.
  • The effectiveness of these actions should always be formally evaluated (a process that often involves epidemiologists) to make sure they worked as intended.

6. Controlling Health Problems

🧭 Overview

🧠 One-sentence thesis

Epidemiology must go beyond studying disease patterns to actively control health problems through data-driven public health interventions.

📌 Key points (3–5)

  • The action step: epidemiology uses distribution and determinant data to control health problems, not just describe them.
  • Why it matters: merely knowing patterns (e.g., sleep differences) is useless without taking action to improve public health.
  • How action happens: epidemiologic data inform health education, policy changes, clinical practice changes, and other initiatives.
  • Collaboration is essential: epidemiologists rarely act alone; they work with other public health professionals to plan and evaluate interventions.
  • Common confusion: "epidemiology" can mean both the methods used to study disease and the collected body of knowledge about a specific health outcome.

🎯 The action imperative

🎯 Why studying patterns is not enough

  • The excerpt emphasizes that knowing distributions and determinants "does us little good" without action.
  • Example: discovering that male students get less sleep than female students is meaningless unless we use that knowledge to improve health.
  • The smoking-lung cancer link illustrates this: if epidemiologists had not acted on their findings, the knowledge would have been wasted.

🚀 What "controlling health problems" means

Controlling health problems: using epidemiologic data on disease distributions and determinants to take action that prevents disease and improves public wellness.

  • This application step is controversial—not all epidemiology definitions include it.
  • The excerpt argues it is essential because epidemiology is the fundamental science of public health, and public health is concerned with prevention and wellness.
  • Don't confuse: this is not about treating individual patients; it's about population-level interventions.

🤝 How epidemiologists contribute to action

🤝 Types of public health actions

Epidemiologic data inform multiple intervention types:

  • Health education campaigns
  • Policy or regulation changes
  • Clinical practice changes
  • Many other initiatives

👥 The role of collaboration

  • Epidemiologists rarely take action by themselves.
  • Collaboration with other public health professionals is a must.
  • Epidemiologists provide the fundamental data for planning actions.
  • Epidemiologists often participate in formal evaluation to ensure interventions worked as intended.

| Phase | Epidemiologist's role | Collaboration need |
| --- | --- | --- |
| Data generation | Primary responsibility | Minimal |
| Action planning | Provide foundational data | Essential—work with other fields |
| Evaluation | Often involved | Essential—assess effectiveness |

📚 Two meanings of "epidemiology"

📚 Methods vs. knowledge

The word "epidemiology" has two distinct uses:

  1. The methods: the set of techniques used to study distribution and determinants of disease (how the term has been used in the chapter so far).
  2. The body of knowledge: everything known about a particular health outcome as a result of epidemiologic study.
  • Example: "the epidemiology of heart failure" refers to all collected knowledge about risk factors and prognoses for heart failure.
  • Don't confuse: when someone says "epidemiology," check whether they mean the process of studying or the accumulated findings.

🔑 Core definition recap

🔑 What epidemiology is for

Epidemiology is the fundamental science of public health.

  • Public health is concerned with preventing disease and improving general wellness in the public.
  • Epidemiologists study disease patterns within populations to determine risk profiles and potential health-improvement targets.
  • They collaborate with others to implement data-driven, population-level, health-related interventions.
  • The excerpt stresses that without the action step, "the rest does not matter."

7. Conclusions

🧭 Overview

🧠 One-sentence thesis

Surveillance activities enable epidemiologists and public health professionals to monitor usual disease levels in populations so that potential threats can be detected early and addressed before they become public health crises.

📌 Key points (3–5)

  • Core purpose of surveillance: to track "usual" disease levels and spot emerging threats early enough to mount a proper response.
  • How surveillance creates value: benefit develops as data are compared over time, allowing detection of changes and trends.
  • Multiple systems operating simultaneously: the US runs numerous surveillance systems at any one time, each contributing to the overall monitoring effort.
  • Common confusion: surveillance is not just about collecting data—it's about longitudinal comparison to identify deviations from normal patterns.

🎯 The ultimate goal of surveillance

🎯 Early detection and response

  • Surveillance aims to notice potential public health threats early so that a proper response can be mounted before a public health crisis ensues.
  • The emphasis is on prevention: catching problems while they are still manageable.
  • Example: if a surveillance system detects an unusual increase in a disease, public health officials can investigate and intervene before widespread harm occurs.

📊 Monitoring "usual" levels

Surveillance activities allow epidemiologists and other public health professionals to monitor the "usual" levels of disease in a population.

  • "Usual" means the baseline or expected level of disease under normal conditions.
  • Deviations from usual levels signal potential problems.
  • This monitoring is continuous, not one-time.

⏳ How surveillance creates value over time

⏳ Longitudinal comparison

  • Much of the benefit from surveillance systems develops as data are compared over time.
  • A single snapshot is less useful than tracking changes across months, years, or decades.
  • Example: comparing this year's disease rates to previous years reveals whether rates are rising, falling, or stable.

🔄 Multiple systems working together

  • The US has numerous surveillance systems operating at any one time.
  • Different systems capture different aspects of health (notifiable diseases, cancer, vital statistics, survey data).
  • The combined data from multiple systems provides a comprehensive picture of population health.

🔍 Don't confuse: data collection vs. surveillance

🔍 Surveillance is more than just gathering information

  • Surveillance is not simply collecting data—it's about monitoring trends and detecting changes.
  • The value comes from the ability to compare current data to historical patterns.
  • Without temporal comparison, surveillance loses much of its utility for early threat detection.

8. Counts (a.k.a. Frequencies)

🧭 Overview

🧠 One-sentence thesis

Counts—simple tallies of how many people have a disease—are sufficient for rare conditions and epidemic detection but cannot compare populations of different sizes.

📌 Key points (3–5)

  • What counts are: a raw number of cases with no fractions or denominators; units are always "people."
  • When counts are useful: for extremely rare conditions where knowing "how many" is enough for a public health response.
  • Epidemic detection: comparing the count to the expected number (e.g., zero) reveals whether an outbreak has occurred.
  • Common confusion: counts alone cannot compare two populations of different sizes—1,000 cases in a small town vs. 100,000 in a large city cannot be compared at a glance.
  • Why it matters: counts guide resource allocation (e.g., vaccine stockpiling) and trigger responses when they exceed expected levels.

📊 What counts measure

📊 Definition and structure

Count: a simple number of cases of disease or health behavior; no fractions, numerators, or denominators; units are always "people."

  • A count answers: "How many people are sick?"
  • It is the most basic measure of disease frequency.
  • Example: 6 students got meningococcal meningitis—this is a count.

🔢 No denominators

  • Counts do not incorporate population size or any reference group.
  • They are standalone numbers, not proportions or rates.
  • This simplicity makes them easy to communicate but limits their use for comparison.

🦠 When counts are sufficient

🦠 Rare conditions

  • For extremely rare diseases, knowing the raw number of cases is often enough.
  • The excerpt emphasizes that counts work well when the baseline expectation is very low (e.g., zero cases).
  • Example: meningococcal meningitis at Oregon State University—6 cases were reported during the 2017/2018 academic year.

🚨 Epidemic detection

  • Surveillance data establish the expected number of cases (often zero for rare diseases).
  • When the observed count exceeds the expected level, it signals an epidemic.
  • Example: since the expected number of meningococcal meningitis cases is zero, 6 cases constitute "a level quite above what is expected" and trigger a public health response (requiring vaccination for students 25 years old and younger).

🗺️ Geographic resource planning

  • Counts by location help health departments allocate resources.
  • The excerpt describes animal rabies cases in Oregon counties over 10 years.
  • Example: Josephine County had recorded cases, so the local health department might keep rabies vaccine doses on hand; Wallowa County had no cases in 10 years, so resources might be better spent elsewhere (assuming quick access to vaccine from neighboring counties if needed).
  • Don't confuse: this is about preparedness, not comparison of disease burden across counties.

⚠️ Limitations of counts

⚠️ Cannot compare populations of different sizes

  • The excerpt states: "Counts are less useful if we want to compare 2 populations."
  • Without denominator information (population size), raw counts are misleading.
  • Example: 1,000 flu cases in Ashland, New Hampshire, versus 100,000 flu cases in New York City—"we cannot compare these 2 figures at a glance, because the denominators (i.e., the number of people living in each city) are so different."
  • A small town with 1,000 cases might have a higher disease burden per capita than a large city with 100,000 cases.
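
A minimal sketch of the per-capita comparison the excerpt calls for; the population figures are hypothetical placeholders, not census data:

```python
# Minimal sketch: converting raw counts into per-capita rates.

def rate_per_1000(cases: int, population: int) -> float:
    return 1000 * cases / population

small_town = {"cases": 1_000, "population": 2_000}      # hypothetical
big_city = {"cases": 100_000, "population": 8_000_000}  # hypothetical

print(rate_per_1000(**small_town))  # 500.0 cases per 1,000 residents
print(rate_per_1000(**big_city))    # 12.5 cases per 1,000 residents
# The smaller raw count hides a far heavier per-capita burden.
```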

🔄 When to use other measures

  • For comparing populations or studying disease patterns, measures that incorporate denominators (incidence and prevalence) are needed.
  • The excerpt introduces these as alternatives: prevalence measures existing disease, and incidence measures new disease.
  • Prevalence is used for resource allocation; incidence is used to study causes of disease.

9. Incidence and Prevalence

🧭 Overview

🧠 One-sentence thesis

Incidence measures new disease cases and is best for studying disease causes, while prevalence measures existing disease burden and is better for resource allocation decisions.

📌 Key points (3–5)

  • Two ways to measure incidence: incidence proportion (risk over a period) and incidence rate (cases per person-time at risk).
  • Person-time approach: allows nuanced tracking of who is at risk and for how long, accounting for staggered enrollment, loss to follow-up, and competing risks.
  • Common confusion: prevalence cannot establish whether exposure or disease came first, so it cannot prove causality; incidence ensures everyone is disease-free at baseline, so exposure precedes disease.
  • Why incidence matters: used to study causes of disease because temporal order is clear.
  • Why prevalence matters: assesses disease burden in a community for resource allocation (e.g., prevention programs in high-prevalence regions).

📏 Two types of incidence

📊 Incidence proportion vs incidence rate

| Feature | Incidence Proportion | Incidence Rate |
| --- | --- | --- |
| Numerator | New cases over a period | New cases over a period |
| Denominator | Number of people at risk at the start | Sum of person-time at risk |
| Must specify | Time frame | Person-time units |
| Also called | Risk, cumulative incidence, absolute risk | Incidence density |
| Range | 0 to 1 (it's a proportion) | 0 to infinity |

  • Both count new cases, but the denominator differs.
  • Incidence proportion treats everyone as contributing equally (one person = one unit), regardless of how long they were followed.
  • Incidence rate uses person-time (e.g., person-months or person-years), so someone followed longer contributes more to the denominator.

🧮 What is person-time at risk

Person-time at risk: the sum of time each person spends at risk of the disease during the study.

  • Example: 100 person-months (PM) could mean 100 people followed for 1 month, 10 people for 10 months, or 1 person for 100 months—all yield the same denominator.
  • The excerpt notes this is a limitation: the rate does not distinguish between these scenarios.
  • Strength: more realistic than incidence proportion because not everyone enrolls on day one, some are lost to follow-up or experience competing risks, and some cases occur almost immediately (contributing very little person-time).

⚠️ Why prior person-time cannot be counted

  • If you enroll 50-year-olds and follow them, you cannot also count the 50 years before study entry.
  • Reason: you are missing all the prevalent cases—people who developed the disease before age 50 and are thus ineligible.
  • Without data on how much person-time those prevalent cases contributed before getting sick, adding 50 person-years per enrolled person would artificially lower incidence (inflated denominator without accounting for the full population).
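
A quick numeric illustration of the denominator-inflation problem (all numbers hypothetical):

```python
# Minimal sketch: counting pre-enrollment person-time deflates incidence.

new_cases = 5
observed_person_years = 500  # person-time actually observed at risk

print(new_cases / observed_person_years)  # 0.01 cases per person-year (correct)

# Wrongly crediting 50 pre-enrollment years to each of 100 enrollees:
inflated = observed_person_years + 100 * 50
print(round(new_cases / inflated, 5))  # 0.00091 -> incidence looks ~11x lower
```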

🔍 Incidence vs prevalence for studying causes

🕰️ Why incidence can establish causality

  • Incidence studies enroll only people who are disease-free at baseline (the population at risk).
  • Any exposures measured at the start came before disease onset by definition.
  • This temporal order is necessary for establishing causality.

🚫 Why prevalence cannot establish causality

  • Prevalence measures existing cases; the disease has already happened.
  • You cannot tell whether the exposure or the disease came first.
  • Example from the excerpt: obesity is associated with lower physical activity.
    • Scenario A: lower activity → energy imbalance → obesity.
    • Scenario B: obesity came first → joint pain → reduced activity.
  • Studying prevalent obesity cases does not allow you to distinguish between these scenarios.
  • Don't confuse: prevalence tells you "how much disease exists now," not "what caused it."

🏥 Uses of incidence and prevalence

🔬 Incidence: studying disease causes

  • Because everyone is disease-free at baseline, incidence is the preferred measure for etiologic research (identifying causes).
  • Temporal sequence is clear: exposure precedes disease.

📍 Prevalence: assessing disease burden and resource allocation

  • Prevalence is more useful for understanding the burden of disease in a community.
  • Example from the excerpt: state health departments in the Northeast and upper-Midwest spend budget on Lyme disease prevention education (e.g., billboards about tucking pants into socks) because Lyme disease is quite prevalent in those regions.
  • In contrast, prevalence of Lyme disease in Colorado is extremely low, so health departments there would allocate resources differently.
  • Why it matters: prevalence helps decide where to focus prevention programs and healthcare resources.

⚙️ Strengths and limitations of person-time

✅ Strengths

  • More nuanced view of the population at risk.
  • More realistic: accounts for staggered enrollment, loss to follow-up, competing risks, and cases that occur almost immediately (contributing very little person-time).
  • Example: if a case pops up almost immediately, that person contributes very little person-time to the denominator; with incidence proportion, they would add a full person to the denominator.

⚠️ Limitations

  • More complex to calculate and interpret.
  • Does not distinguish between different follow-up patterns that yield the same total person-time (100 people × 1 month = 10 people × 10 months = 1 person × 100 months = 100 PM).
  • Loss to follow-up is probably not random (people may drop out because they feel unwell, before they are ever recorded as incident cases); this problem also affects the incidence proportion.
  • Best practice: state the time period over which people were eligible to be followed (e.g., one year in the excerpt's Figure 2-3).

Prevalence

🧭 Overview

🧠 One-sentence thesis

Prevalence is more useful for assessing disease burden and allocating resources in a community than for studying disease causes, because it cannot distinguish whether exposure or disease came first.

📌 Key points (3–5)

  • Why prevalence is limited for causal research: the disease has already happened, so we cannot tell whether the exposure or the disease occurred first.
  • What prevalence is good for: assessing disease burden in a community and guiding resource allocation decisions.
  • Common confusion: prevalence vs incidence—prevalence includes existing cases and cannot establish temporal order; incidence follows disease-free people forward, so exposure clearly precedes disease onset.
  • Real-world application: health departments use prevalence data to decide where to invest in prevention education (e.g., Lyme disease prevention in high-prevalence regions).

🔍 Why prevalence cannot establish causality

🚫 The temporal order problem

Prevalence is less useful for studying causes of disease because the disease has already happened; we thus have no way of knowing whether the disease or the exposure happened first (necessary for establishing causality).

  • Causality requires knowing that the exposure came before the disease.
  • With prevalent cases, both the disease and the exposure are present at the same time, so the sequence is unclear.
  • Example: obesity is associated with lower physical activity—two equally possible scenarios exist:
    • Lower physical activity → energy imbalance → obesity.
    • Obesity → joint pain → reduced physical activity.
  • Studying prevalent obesity cases does not allow us to distinguish between these scenarios.

✅ How incidence solves this problem

  • Incidence studies follow people who are disease-free at baseline (the population at risk).
  • Any exposures assessed at the beginning came before disease onset by definition.
  • This clear temporal sequence makes incidence suitable for studying potential causes of disease.

Don't confuse: Prevalence = snapshot of existing cases (no time order); incidence = new cases arising over time (exposure precedes disease).

🏥 What prevalence is useful for

📊 Assessing disease burden

  • Prevalence measures how much disease exists in a particular community at a given time.
  • It reflects the total load of disease, not just new cases.
  • This makes it useful for understanding the scope of a health problem in a population.

💰 Resource allocation

  • Health departments use prevalence data to decide where to spend budgets.
  • Example: state health departments in the Northeast and upper-Midwest spend part of their budgets on Lyme disease prevention education (e.g., billboards about tucking pants into socks) because Lyme disease is quite prevalent in those regions.
  • In contrast, the prevalence of Lyme disease in Colorado is extremely low, so health departments there would allocate resources differently.

Why this matters: Prevalence tells decision-makers where the problem is concentrated, guiding targeted interventions.

🆚 Prevalence vs incidence comparison

| Aspect | Prevalence | Incidence |
| --- | --- | --- |
| What it measures | Existing cases (disease burden) | New cases arising over time |
| Temporal order | Cannot establish (disease already present) | Can establish (exposure assessed before disease onset) |
| Use for causality | Not useful (cannot tell which came first) | Useful (follows disease-free people forward) |
| Use for resource allocation | Very useful (shows disease burden in community) | Less directly useful for this purpose |
| Population studied | Includes people who already have disease | Only disease-free people at baseline (population at risk) |

🔑 Key distinction

  • Incidence studies know that everyone is disease-free at baseline, since they study only the population at risk.
  • Prevalence studies include people who already have the disease, so the timing of exposure relative to disease is unknown.

Incidence

🧭 Overview

🧠 One-sentence thesis

Incidence measures new disease cases over time and is essential for studying disease causes, whereas prevalence measures existing cases and is better suited for assessing disease burden and resource allocation.

📌 Key points (3–5)

  • Two types of incidence: incidence proportion (risk over a fixed period) and incidence rate (cases per person-time at risk).
  • Person-time approach: accounts for varying follow-up durations and competing risks, making it more realistic than counting whole persons.
  • Common confusion: incidence vs. prevalence—incidence captures new cases and establishes temporal order (exposure before disease), while prevalence includes existing cases and cannot determine which came first.
  • Why incidence matters for causality: studying incidence ensures everyone is disease-free at baseline, so exposures measured at the start definitively precede disease onset.
  • Why prevalence matters for planning: prevalence reflects total disease burden in a community, guiding resource allocation decisions.

📏 Two measures of incidence

📊 Incidence proportion vs. incidence rate

| Feature | Incidence Proportion | Incidence Rate |
| --- | --- | --- |
| Numerator | New cases over a period of time | New cases over a period of time |
| Denominator | Number of people at risk at the start | Sum of person-time at risk |
| Must specify | The time frame | The person-time units |
| Also called | Risk, cumulative incidence, absolute risk | Incidence density |
| Range | 0 to 1 (it's a proportion) | 0 to infinity |

🧮 Incidence proportion

Incidence proportion: the number of new cases divided by the number of people at risk at the start, over a defined time period.

  • Also known as risk, cumulative incidence, or absolute risk.
  • You must define the time frame (e.g., one year).
  • It is a proportion, so it ranges from 0 to 1.
  • Limitation: treats everyone as contributing equally, even if a case occurs almost immediately (that person still counts as one full person in the denominator).
  • Special case: if the disease can occur more than once, incidence proportion can exceed 1 (numerator counts multiple episodes, denominator counts people); usually only the first episode is counted to avoid this.

⏱️ Incidence rate (person-time approach)

Incidence rate: the number of new cases divided by the sum of person-time at risk.

  • Also called incidence density.
  • Denominator is measured in person-years (or person-months, etc.) at risk.
  • You must report the person-time units used.
  • Range is 0 to infinity (not bounded like a proportion).
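
To make the two denominators concrete, here is a small sketch with invented follow-up data (not from the excerpt); cases stop contributing person-time once they occur, which is why the rate's denominator is smaller than a naive head count suggests.

```python
# Hypothetical 5-person, 1-year cohort. Cases contribute person-time
# only until disease onset (0.1 and 0.5 years below).
follow_up_years = [1.0, 0.1, 1.0, 0.5, 1.0]   # time at risk per person
became_case     = [False, True, False, True, False]

new_cases = sum(became_case)                    # 2
at_risk_at_start = len(became_case)             # 5
person_years = sum(follow_up_years)             # 3.6

incidence_proportion = new_cases / at_risk_at_start  # 0.40 over 1 year
incidence_rate = new_cases / person_years            # ~0.56 per person-year

print(f"Incidence proportion: {incidence_proportion:.2f} over 1 year")
print(f"Incidence rate: {incidence_rate:.2f} cases per person-year")
```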

🔍 Strengths and limitations of person-time

✅ Strengths of the person-time approach

  • More nuanced view of the population at risk: not everyone enrolls on day one; some experience competing risks or are lost to follow-up.
  • More realistic: a case that pops up almost immediately contributes very little person-time to the denominator, reflecting the actual time at risk.
  • Example: with incidence proportion, a person who develops disease on day two still adds a full person to the denominator; with person-time, they contribute only a tiny amount of time.

⚠️ Limitations of the person-time approach

  • More complex than incidence proportion.
  • Does not distinguish between different follow-up patterns that yield the same total person-time:
    • 100 people followed for 1 month = 100 person-months
    • 10 people followed for 10 months = 100 person-months
    • 1 person followed for 100 months = 100 person-months
  • Loss to follow-up is probably not random: people may drop out because they feel unwell, before they are ever recorded as an incident case (this can also affect incidence proportion).
  • Best practice: state the time period over which people were eligible to be followed (e.g., one year).

🚫 Why prior person-time cannot be counted

  • The problem: if you enroll 50-year-olds at risk of heart disease, why not count the 50 years of person-time before study entry?
  • Why you can't: you are missing all the prevalent cases—some people developed heart disease before age 50 and are not eligible for the study.
  • Without data on how many person-years at risk those people had before developing disease, your incidence would be artificially low (you add 50 person-years per enrolled person to the denominator without accounting for the entire population, which includes cases that are prevalent by age 50).

🎯 Uses of incidence and prevalence

🔬 Incidence: studying causes of disease

  • Why incidence is useful for causality: when studying incidence, everyone is disease-free at baseline (you study only the population at risk).
  • Any exposures assessed at the beginning came before disease onset by definition.
  • This temporal order is necessary for establishing causality.

🧩 Prevalence: less useful for causality

  • The problem with prevalent cases: the disease has already happened, so you cannot know whether the disease or the exposure happened first.
  • Example: obesity is associated with lower levels of physical activity.
    • Scenario 1: lower physical activity leads to obesity (secondary to energy imbalance).
    • Scenario 2: obesity came first, and the person subsequently reduced physical activity (possibly secondary to joint pain).
  • Studying prevalent obesity cases does not allow you to distinguish between these scenarios.

🏥 Prevalence: assessing disease burden and resource allocation

  • Why prevalence is useful: it reflects the total disease burden in a particular community.
  • Application: resource allocation decisions.
  • Example: state health departments in the Northeast and upper-Midwest spend budget on Lyme disease prevention education (e.g., billboards about tucking pants into socks) because Lyme disease is quite prevalent in those regions.
  • In contrast, the prevalence of Lyme disease in Colorado is extremely low, so health departments there would allocate resources differently.

Uses of Incidence and Prevalence

🧭 Overview

🧠 One-sentence thesis

Incidence is used to study disease causes because it ensures exposures precede disease onset, while prevalence is used to assess disease burden for resource allocation and planning.

📌 Key points (3–5)

  • Why incidence studies causes: incidence measures only new cases in disease-free populations, so any baseline exposures definitively came before disease onset.
  • Why prevalence cannot establish causality: with prevalent cases, the disease has already happened, so we cannot determine whether the exposure or the disease came first.
  • Common confusion: distinguishing temporal order—obesity and low physical activity are associated, but prevalence data cannot tell us which came first.
  • What prevalence is good for: assessing disease burden in communities to guide resource allocation and health care planning.
  • Prevalence depends on two factors: both incidence (new cases) and disease duration (survival time), so changes in prevalence require investigating which component changed.

🔬 Why incidence reveals causality

🔬 Temporal sequence in incidence studies

  • Incidence studies follow only the population at risk—everyone is disease-free at baseline by definition.
  • Any exposures measured at the beginning of the study necessarily occurred before disease onset.
  • This temporal ordering is essential for establishing causality: the cause must precede the effect.

❌ Why prevalence cannot establish causality

Prevalence is less useful for studying causes because the disease has already happened; we have no way of knowing whether the disease or the exposure happened first (necessary for establishing causality).

  • The temporal problem: prevalent cases are existing cases, so both the disease and the exposure are already present when we observe them.
  • We cannot determine which came first.

🔄 The obesity and physical activity example

The excerpt illustrates the ambiguity with prevalent cases:

| Scenario | Temporal order | Interpretation |
| --- | --- | --- |
| Scenario A | Low physical activity → obesity | Inactivity causes energy imbalance leading to obesity |
| Scenario B | Obesity → low physical activity | Obesity causes joint pain, reducing activity |

  • Both scenarios are equally plausible when studying prevalent obesity cases.
  • Don't confuse: association with causation—prevalence data show that obesity and low physical activity are associated, but cannot distinguish which scenario is correct.
  • Example: If we measure both obesity and activity levels at the same time in existing cases, we see they occur together but cannot tell which happened first.

🏥 What prevalence is useful for

🏥 Assessing disease burden for resource allocation

Prevalence is more useful as a way of assessing the disease burden in a particular community, perhaps for purposes of resource allocation.

  • Prevalence tells us how much disease exists in a population right now.
  • This information helps decision-makers allocate resources where they are most needed.

🗺️ Geographic resource allocation example

The excerpt describes Lyme disease prevention spending:

  • Northeast and upper-Midwest states: Lyme disease is quite prevalent in these regions, so state health departments spend budget on prevention education (e.g., billboards about tucking pants into socks).
  • Colorado: Lyme disease prevalence is extremely low, so health departments should spend their money elsewhere.
  • The key: prevalence data guide where to invest prevention resources.

🏨 Health care planning example

If you know that 80% of your nursing home residents have dementia in some form, then this has implications for staffing, standard operating procedures, and potentially even for the layout and design of the space (pictorial signs on the walls to indicate the purposes of rooms, for instance).

  • Prevalence data inform:
    • Staffing levels: more dementia cases require more specialized staff.
    • Operating procedures: protocols must accommodate cognitive impairment.
    • Physical design: pictorial signs help residents with memory problems navigate.

📊 Relationship between incidence and prevalence

📊 The fundamental relationship

The excerpt states:

Prevalence is affected by both the incidence (how many new cases pop up) and the disease's duration.

  • If people live longer with a disease, they remain prevalent cases for longer.
  • Formula in words: Prevalence approximately equals incidence multiplied by average survival time after diagnosis.
  • Two components: any change in prevalence must be due to a change in incidence, survival time, or both.
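
A one-line sketch of the steady-state approximation, using the same numbers as the worked example later in these notes (100 new cases per year, 5-year average duration):

```python
# Prevalence ≈ incidence x average duration (steady-state approximation).
incidence_per_year = 100   # new cases per year
avg_duration_years = 5     # average survival / duration of illness

print(incidence_per_year * avg_duration_years)  # ≈ 500 prevalent cases at any time
```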

🦠 HIV epidemic example

The excerpt traces HIV prevalence and incidence over time:

📈 Early 1980s: rising incidence drives prevalence

  • What happened: Before we knew how to prevent HIV, incidence kept rising—more people got infected and then infected others.
  • Why prevalence rose: The early rise in prevalence is attributable solely to rising incidence.
  • Why prevalence didn't rise faster: People died of AIDS within a few years (short survival time), limiting how long they remained prevalent cases.

📉 Mid-1980s to 1990s: prevention lowers incidence

  • What happened: We discovered how to prevent new cases (condom use, screening blood donations, universal precautions).
  • Result: Incidence went down.
  • Prevalence response: Prevalence took a couple years to catch up, then eventually leveled off.

📈 Late 1990s onward: treatment extends survival

  • What happened: In 1996, highly active antiretroviral treatments (HAART) became common, allowing people to "live with HIV."
  • Result: The increasing prevalence starting in the late 1990s is due entirely to an increase in patient survival (average duration of illness).
  • Key observation: Incidence was steady at that time, so the prevalence increase was not due to more new cases.

🧠 Interpreting prevalence changes

When a change in prevalence is observed, the smart public health professional pauses to consider whether the change is due to a change in the number of new cases (incidence) or to a change in available treatments (and thus survival).

Why this matters: the public health response differs depending on which component changed.

| If prevalence rises because... | Public health response |
| --- | --- |
| Incidence increased | Focus on prevention: reduce new cases |
| Survival time increased | Focus on treatment access and quality of life for existing cases |

  • Don't confuse: a rising prevalence does not automatically mean more people are getting sick—it might mean people are living longer with the disease.
  • Example: If prevalence of a chronic disease rises but incidence is stable, the change is likely due to better treatments keeping people alive longer, not an epidemic of new cases.

Relationship between Incidence and Prevalence

🧭 Overview

🧠 One-sentence thesis

Prevalence is shaped by both how many new cases arise (incidence) and how long people live with the disease, so changes in prevalence can signal either a change in new infections or a change in survival.

📌 Key points (3–5)

  • The core relationship: Prevalence is approximately equal to incidence multiplied by average survival time after diagnosis.
  • Two drivers of prevalence: both the number of new cases (incidence) and disease duration affect how many total cases exist at any time.
  • Common confusion: when prevalence rises, it could mean either more new cases are occurring or patients are living longer—these require different public health responses.
  • Why it matters for resource planning: prevalence data help administrators allocate staffing and design services (e.g., nursing homes knowing 80% of residents have dementia).
  • Why it matters for prevention: incidence data reveal whether new cases are being prevented, independent of survival improvements.

🔗 How incidence and duration combine

🔗 The fundamental formula

Prevalence is approximately equal to incidence multiplied by the average survival time after diagnosis.

  • Prevalence counts all existing cases (new and old).
  • Incidence counts only new cases appearing in a time period.
  • If people live longer with a disease, they remain prevalent cases for longer, so prevalence grows even if incidence stays flat.
  • Example: A disease with 100 new cases per year and average survival of 5 years will have roughly 500 prevalent cases at any time.

⏱️ Duration extends prevalence

  • When treatments improve and patients survive longer, the average duration of illness increases.
  • Each new case contributes to prevalence for a longer period.
  • Don't confuse: rising prevalence does not always mean the disease is spreading faster—it may mean people are living longer with it.

📈 Reading prevalence and incidence together: the HIV example

📈 Early epidemic phase (early 1980s)

  • What happened: Incidence kept rising because the virus spread unchecked (no knowledge of prevention methods yet).
  • Why prevalence rose: More people got infected, and they infected others in turn.
  • Short survival: People died of AIDS within a few years, so prevalence rose but was limited by high mortality.
  • The excerpt states: "The early rise in prevalence is thus attributable solely to the rising incidence."

📉 Prevention phase (mid-1980s to mid-1990s)

  • What happened: Discovery of prevention methods (condom use, blood screening, universal precautions) caused incidence to drop.
  • Why prevalence leveled off: Fewer new cases meant fewer people entering the prevalent pool; prevalence took a couple of years to catch up but eventually flattened.
  • Example: Even though incidence fell, existing cases still contributed to prevalence until they died or the cohort aged out.

📈 Treatment phase (late 1990s onward)

  • What happened: In 1996, highly active antiretroviral treatments (HAART) became common, allowing people to "live with HIV."
  • Why prevalence rose again: Incidence remained steady (flat red line in the figure), but survival time increased dramatically.
  • The excerpt states: "The increasing prevalence, starting in the late 1990s, is thus due entirely to an increase in patient survival, or the average duration of illness."
  • Don't confuse: this prevalence rise does not mean the epidemic was worsening—it means treatment success.

🧠 Why distinguishing incidence from prevalence matters

🧠 Different public health responses

| Scenario | What it signals | Appropriate response |
| --- | --- | --- |
| Prevalence rises due to rising incidence | More new cases occurring | Strengthen prevention programs, investigate transmission sources |
| Prevalence rises due to longer survival | Patients living longer with disease | Expand treatment access, plan for chronic care resources |

  • The excerpt emphasizes: "when a change in prevalence is observed, the smart public health professional pauses to consider whether the change is due to a change in the number of new cases (incidence) or to a change in available treatments (and thus survival)."
  • Example: If a health department sees rising diabetes prevalence, they must ask: Are more people developing diabetes (incidence problem), or are diabetics living longer (treatment success)?
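
A sketch of that diagnostic question, with hypothetical numbers: the same prevalence rise can come from either component, and only examining incidence and duration separately reveals which.

```python
# Hypothetical decomposition: two ways to get the same prevalence rise.
def approx_prevalence(incidence_per_year, duration_years):
    # steady-state approximation: prevalence ≈ incidence x duration
    return incidence_per_year * duration_years

baseline       = approx_prevalence(100, 5)  # 500 prevalent cases
incidence_rose = approx_prevalence(160, 5)  # 800 -> prevention problem
survival_rose  = approx_prevalence(100, 8)  # 800 -> treatment success

print(baseline, incidence_rose, survival_rose)  # 500 800 800
```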

🏥 Resource allocation uses prevalence

  • Prevalence tells you how many people currently need services.
  • Example from the excerpt: If 80% of nursing home residents have dementia, this affects staffing levels, standard procedures, and even facility design (pictorial signs on walls to indicate room purposes).
  • Incidence alone would not capture the full burden of existing cases requiring care.

🔬 Studying disease causes uses incidence

  • The excerpt notes that incidence "is used to study disease etiology" (causes).
  • Incidence isolates new cases, making it easier to identify risk factors and transmission patterns.
  • Prevalence mixes old and new cases, which can obscure when and why disease occurs.

📊 Summary of measures

📊 Three measures of disease frequency

| Measure | What it counts | Best use case |
| --- | --- | --- |
| Counts | Raw number of cases | Extremely rare conditions |
| Prevalence | New and existing cases at a point in time | Resource allocation and service planning |
| Incidence | Only new cases in a time period | Studying disease causes and prevention effectiveness |

📊 Two types of incidence

  • Incidence proportion: uses the number of people at risk as the denominator.
  • Incidence rate: uses the sum of person-time at risk as the denominator.
  • Both measure new cases but differ in how they account for the population at risk.

Risk Difference and Absolute Measures of Association

🧭 Overview

🧠 One-sentence thesis

Risk difference and related absolute measures reveal the actual magnitude of excess disease attributable to an exposure, which relative measures like risk ratio and odds ratio can mask when absolute risks are small.

📌 Key points (3–5)

  • Why absolute measures matter: Relative measures (RR, OR) can be misleading when absolute risks are tiny—a 50% reduction sounds impressive but may mean only 1 in a million fewer cases.
  • What risk difference shows: The actual excess number of cases in the exposed group compared to the unexposed, keeping the same units as incidence.
  • Derived measures: Attributable risk (AR) and number needed to treat/harm (NNT/NNH) both come from risk difference and make the causal interpretation more explicit.
  • Common confusion: Relative vs absolute—a large relative risk does not always mean a large public health impact if the baseline incidence is very low.
  • Why they are rare in literature: Absolute measures imply causation more explicitly and are harder to adjust for confounding, so ratio measures dominate published studies.

⚖️ Relative measures can mislead

🔍 The small-absolute-risk problem

  • The excerpt gives an example: incidence in exposed = 1 per 1,000,000 in 20 years; incidence in unexposed = 2 per 1,000,000 in 20 years.
  • The risk ratio is 0.5, which sounds like a 50% reduction—seemingly a major public health win.
  • But: the absolute difference is only 1 in a million, a trivial real-world impact.
  • Don't confuse: a large percentage change (relative) with a large number of cases prevented (absolute).

📊 Why ratio measures dominate

| Measure type | Examples | Advantage | Limitation |
| --- | --- | --- | --- |
| Relative/ratio | RR, OR | Easier to control for confounding; less causal language | Can exaggerate importance when baseline risk is tiny |
| Absolute/difference | RD, AR, NNT/NNH | Shows actual case counts; clearer public health impact | Implies causation more explicitly; harder to adjust for confounding |

📏 Risk difference (RD)

📐 Definition and calculation

Risk difference (RD): the incidence in the exposed minus the incidence in the unexposed.

  • Formula: RD = I(E+) − I(E−)
  • Units do not cancel (unlike ratio measures), so RD has the same units as incidence.
  • Example from the excerpt (smoking and hypertension):
    • I(E+) = 75 per 100 in 10 years
    • I(E−) = 33 per 100 in 10 years
    • RD = 75 − 33 = 42 per 100 in 10 years

🧠 Interpretation

  • The excerpt states: "Over 10 years, the excess number of cases of HTN attributable to smoking is 42; the remaining 33 would have occurred anyway."
  • This wording assigns a causal role to the exposure—it says how many cases are "because of" smoking.
  • That is why absolute measures are less common: they imply causation more explicitly, which requires stronger study design.

🧮 Derived absolute measures

🎯 Attributable risk (AR)

  • Formula: AR = RD / I(E+)
  • In the smoking example: AR = 42 per 100 / 75 per 100 = 56%
  • Interpretation: "56% of cases can be attributed to smoking, and the rest would have happened anyway."
  • Limitation: Diseases have multiple causes, so if you calculate AR for each cause, the sum will exceed 100%—making this measure less useful in practice.

🔢 Number needed to treat / number needed to harm (NNT/NNH)

  • Formula: NNT or NNH = 1 / RD
  • In the smoking example: NNH = 1 / 0.42 per 10 years = 2.4
  • Interpretation: over 10 years, for every 2.4 smokers, 1 excess case of hypertension occurs because of smoking.
  • NNT is used for protective exposures (how many you need to treat to prevent one bad outcome).
  • NNH is used for harmful exposures (how many need to be exposed to cause one bad outcome).
  • The excerpt notes that for many commonly used drugs, NNTs are in the hundreds or even thousands.

🔄 Don't confuse NNT and NNH

  • Both use the same formula, but the context differs:
    • NNT: preventive intervention → number treated to prevent one case.
    • NNH: harmful exposure → number exposed to cause one case.
  • Example: If a drug has NNT = 500, you must treat 500 people to prevent one bad outcome—this helps weigh benefit against cost or side effects.
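
The excerpt's smoking/hypertension arithmetic, collected into one short sketch (incidences expressed per person over 10 years):

```python
# RD, AR, and NNH from the excerpt's smoking/hypertension example.
I_exposed = 75 / 100    # incidence in smokers, per 10 years
I_unexposed = 33 / 100  # incidence in nonsmokers, per 10 years

RD = I_exposed - I_unexposed   # 0.42 per 10 years (42 per 100)
AR = RD / I_exposed            # 0.56 -> 56% of smokers' cases attributable
NNH = 1 / RD                   # ~2.4 smokers per excess case over 10 years

print(f"RD  = {RD:.2f} per 10 years")
print(f"AR  = {AR:.0%}")
print(f"NNH = {NNH:.1f}")
```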

📚 Summary table from the excerpt

The excerpt concludes with a table linking study design to the type of cases and preferred measure:

| Study Design | Methods Summary | Incident or Prevalent Cases? | Preferred Measure of Association |
| --- | --- | --- | --- |
| Cohort | Start with nondiseased sample, determine exposure, follow over time | Incident | Risk ratio or rate ratio |
| RCT | Start with nondiseased sample, assign exposure, follow over time | Incident | Risk ratio or rate ratio |
| Case-control | Start with diseased (cases), recruit comparable nondiseased (controls), look at previous exposures | Prevalent | Odds ratio |
| Cross-sectional | (excerpt cuts off) | Prevalent | (not stated in excerpt) |

  • Key takeaway: Cohort and RCT designs collect incident cases, so they use risk/rate ratios; case-control and cross-sectional designs use prevalent cases, so they use odds ratios.
  • Absolute measures (RD, AR, NNT/NNH) are not the "preferred" measure for any design in this table, reinforcing that they are less common in the literature despite their interpretive value.

Notifiable Conditions

🧭 Overview

🧠 One-sentence thesis

Notifiable condition reporting creates a legally mandated passive surveillance system that allows public health officials to detect epidemics by comparing reported cases against expected baseline levels and triggering interventions when cases exceed zero for rare or eradicated diseases.

📌 Key points (3–5)

  • What notifiable conditions are: a list of mostly infectious diseases (plus some chronic diseases and injuries) that clinicians and health departments must report to the CDC whenever encountered.
  • How the reporting chain works: clinic → local health department → state health department → CDC, ideally within days or hours for major threats.
  • Why rare/eradicated diseases stay on the list: when the expected (endemic) level is 0, even one case triggers immediate public health intervention.
  • Common confusion: most conditions require reporting of new cases (for incidence calculation), but exceptions like hepatitis C require reporting of newly diagnosed cases regardless of when infection occurred.
  • Privacy exception: notifiable condition reporting is exempt from HIPAA Privacy Rule, allowing disclosure of protected health information without patient authorization for public health purposes.

📋 The notifiable conditions system

📋 What gets reported

Notifiable conditions: a list of conditions—mostly infectious diseases, but a few chronic diseases and injuries—that must be reported to the CDC whenever encountered by clinicians or health department officials.

  • The list is reviewed and revised every year or so based on current public health threats and priorities.
  • Example: Zika virus and its associated congenital conditions were added in 2016.
  • The 2020 list is maintained and published by the CDC.

🔗 The reporting chain

The excerpt describes a three-tier reporting structure:

| Level | Who reports | To whom |
| --- | --- | --- |
| 1. Clinical | Clinic/provider | Local health department |
| 2. Local | Local health department | State health department |
| 3. State | State health department | CDC |

  • Speed matters: reporting should happen quickly—within days, or within hours for potentially major threats.
  • Example: A nurse practitioner diagnoses measles in a patient with high fever, cough, watery eyes, and full-body rash; the clinic must report to the local health department, who reports up the chain.

📊 How cases are confirmed

  • Each condition has an associated set of case criteria.
  • Available evidence (laboratory data, symptoms, relevant exposures, physician diagnoses) is compared against these criteria to confirm or rule out a case report.
  • Why criteria exist: to ensure all epidemiologists evaluate case reports consistently.
  • Example: Lyme disease has specific case criteria last revised in 2017.

🚨 Why rare diseases stay on the list

🚨 The zero-tolerance principle

Some conditions on the notifiable list are extremely rare (human rabies, plague) or have been eradicated (smallpox), yet they remain on the list.

The logic:

  • The expected level (endemic level) for these conditions is 0.
  • These conditions are dangerous enough that even one suspected case warrants immediate public health intervention.
  • Don't confuse "rare" with "not worth tracking"—rarity makes detection more critical, not less.

📈 Detecting epidemics through comparison

An epidemic is "an increase, often sudden, in the number of cases of disease above what is normally expected in that population in that area."

  • The excerpt explains that surveillance tells us "how much is expected."
  • Example: The meningococcal meningitis outbreak at Oregon State University in 2017/2018—the expected number of cases was 0, so after only 6 cases over several weeks, the university (after consulting health departments) required students age 25 and younger to be vaccinated before registering for classes.

🔍 New cases vs. newly diagnosed cases

🔍 The standard: reporting new cases

For most notifiable conditions, reporting criteria specify new cases so that incidence can be calculated from the data.

  • Incidence requires knowing when disease onset occurs.
  • This is the default assumption for most conditions on the list.

🔍 The exception: newly diagnosed cases

Some conditions are reported as newly diagnosed cases, regardless of whether they are also new onset.

Why the exception exists:

  • Hepatitis C is challenging to identify in its initial stage because few patients exhibit symptoms.
  • Many hepatitis C infections are identified during laboratory testing for something unrelated, or once symptoms of liver damage occur—long after initial infection.
  • For such conditions, the CDC requests notification of any newly diagnosed cases, even if the infection is not recent.

Don't confuse: "new case" (recent infection/onset) vs. "newly diagnosed case" (recent identification, possibly old infection).

🔐 Privacy and legal framework

🔐 HIPAA exemption for public health

Protected health information: details about your health or the care you received.

  • The Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule normally prohibits health care providers from disclosing protected health information without patient permission.
  • Exception: public health functions such as notifiable condition reporting are exempt under 45 CFR 164.512(b).
  • The US Department of Health and Human Services states: "The HIPAA Privacy Rule recognizes the legitimate need for public health authorities and others responsible for ensuring public health and safety to have access to protected health information to carry out their public health mission."

What this means in practice:

  • Clinics can and must report notifiable conditions to health departments without asking patients for permission.
  • Many people are unaware of this exemption, even though they acknowledge HIPAA privacy policies annually at clinics.

⚠️ Underreporting caveat

  • The excerpt notes that underreporting is inherent in all passive surveillance systems, including notifiable condition data.
  • Notifiable condition reporting is a form of passive surveillance (the health department passively receives reports, rather than actively seeking them out).
  • This means the published case counts are likely lower than the true number of cases.

📊 How the data are used

📊 Weekly publication

  • The CDC publishes weekly data tabulating all reported cases of notifiable conditions.
  • The excerpt provides an example: a screenshot of notifiable conditions tables for meningitis in Oregon during the last few months of 2017.
  • Complete data tables for notifiable conditions are available on the CDC website.

📊 Calculating incidence

  • Because most conditions require reporting of new cases, the data can be used to calculate incidence (new cases over time in a population).
  • Exception: conditions like hepatitis C that report newly diagnosed cases may not accurately reflect incidence, since diagnosis can occur long after infection.

Cancer Registries

🧭 Overview

🧠 One-sentence thesis

Cancer registries are unique surveillance systems that track extensive diagnostic and outcome information over time, providing both incidence data and longitudinal follow-up that distinguishes them from other notifiable condition reporting.

📌 Key points (3–5)

  • What cancer registries collect: extensive information including tumor type, stage at diagnosis, histology, treatments, and eventual outcomes (death, recurrence, etc.).
  • How they differ from other notifiable conditions: cancer cases are followed over time, not just reported once at diagnosis.
  • When reporting occurs: upon diagnosis (except non-melanoma skin cancers), making registries a potential source of incidence data.
  • Common confusion: like hepatitis C, diagnosis timing matters—sometimes diagnosis occurs late in the disease process, affecting what the data represent.
  • Why they matter: multi-state data contribute to the SEER database, used for both surveillance and research purposes.

📋 What cancer registries track

📋 Reporting requirements

Cancer is a notifiable condition with more extensive reporting requirements than other conditions.

  • Physicians who diagnose cancer (other than non-melanoma skin cancers) must report to the health department.
  • The reporting is more detailed than for other notifiable conditions.

🔬 Types of information collected

Depending on the state, cancer registries may include:

  • Type of tumor
  • Stage at diagnosis
  • Histology information
  • Treatments given
  • Eventual outcome (death, recurrence, etc.)

Don't confuse: This is not just a one-time notification—the extensive detail and follow-up distinguish cancer registries from simpler notifiable condition reports.

⏱️ Timing and data sources

⏱️ When cases are reported

  • Cancer cases are reported upon diagnosis.
  • This makes cancer registries a potential source of incidence data (new cases).

⚠️ Limitations of diagnosis timing

  • Sometimes a diagnosis occurs quite late in the disease process.
  • This is the same caveat discussed for hepatitis C in the excerpt.
  • Implication: What looks like a "new case" may actually be a disease that has been present for some time.

🏥 Additional data sources

  • Cancer registries also include cases identified through autopsies or death certificates that were not diagnosed while the person was alive.
  • Underreporting issue: Not everyone has an autopsy when they die, so undiagnosed cancers among deceased people are underreported.

🔄 Unique longitudinal feature

🔄 Following patients over time

Cancer registries are somewhat unique compared to other notifiable conditions data because patients are followed over time.

  • Most notifiable condition systems report a case once.
  • Cancer registries track the same patient through treatment, recurrence, and outcome.
  • This longitudinal tracking provides richer data for understanding disease progression and treatment effectiveness.

Example: An organization diagnoses a patient with cancer in 2020, reports the initial diagnosis, then continues to report treatment outcomes and whether the cancer recurs over subsequent years.

🗄️ The SEER database

🗄️ Multi-state collaboration

Several states contribute their cancer registry data to the Surveillance, Epidemiology, and End Results (SEER) database.

  • SEER aggregates cancer registry data from multiple participating states.
  • This creates a larger, more comprehensive dataset than any single state could provide.

🔬 Uses of SEER data

The SEER database is available for:

  • Surveillance: monitoring cancer trends and patterns across populations
  • Research purposes: supporting epidemiological studies and investigations

Why it matters: The combination of detailed information, longitudinal follow-up, and multi-state scope makes SEER a powerful tool for understanding cancer patterns and outcomes at a population level.

Vital Statistics

🧭 Overview

🧠 One-sentence thesis

Surveillance systems collect health data from populations to monitor disease trends and detect public health threats early, enabling timely responses before crises develop.

📌 Key points (3–5)

  • Purpose of surveillance: monitor "usual" disease levels and detect potential threats early so proper responses can be mounted before crises occur.
  • How surveillance works: data are compared over time to identify trends and anomalies.
  • Types of surveillance data: includes both prevalent cases (current status) and various collection methods (telephone surveys, paper surveys, examination data).
  • Common confusion: surveillance data from surveys like BRFSS and PRAMS contain only prevalent cases, not incident cases—they capture current status, not new occurrences.
  • Practical value: surveillance data are freely available for research and widely used to monitor public health indicators like seat belt use.

📊 Major U.S. surveillance systems

📞 BRFSS (Behavioral Risk Factor Surveillance System)

A telephone-based survey of adults who self-report their health and health behaviors.

  • Adults provide information about their own health status and behaviors.
  • Data collection method: telephone interviews.
  • Example use: the seat belt use map mentioned in chapter 1 was created using BRFSS data.

🤰 PRAMS (Pregnancy Risk Assessment Monitoring System)

A paper-based survey of women who have recently given birth, reporting on health and health care utilization for themselves and their newborn(s).

  • Targets recent mothers to capture maternal and newborn health information.
  • Data collection method: paper surveys.
  • Focuses on health care use patterns as well as health status.

🔬 NHANES (National Health and Nutrition Examination Survey)

  • Mentioned as another surveillance system (referenced but not detailed in this excerpt).
  • Part of the broader network of U.S. surveillance activities.

🔍 Characteristics of surveillance data

📈 Prevalent cases only

  • The survey systems described (BRFSS and PRAMS) contain only data from prevalent cases.
  • This means they capture existing conditions at the time of survey, not new cases arising.
  • Don't confuse: prevalent cases (current status) vs. incident cases (new occurrences).

🆓 Open access and research use

  • Survey data are freely available to students and researchers.
  • Numerous articles are published each year using these datasets.
  • This accessibility supports widespread public health research and education.

🎯 How surveillance achieves its goals

⏱️ Monitoring trends over time

  • Much of the benefit from surveillance systems develops as data are compared over time.
  • Comparing data across time periods allows identification of changes and trends.
  • Example: tracking whether seat belt use is increasing or decreasing in different regions.

🚨 Early threat detection

  • The ultimate goal is to notice potential public health threats early.
  • Early detection enables mounting a proper response before a public health crisis ensues.
  • Multiple surveillance systems operate simultaneously in the U.S. to provide comprehensive coverage.

👥 Who uses surveillance data

  • Epidemiologists use surveillance data to monitor disease levels.
  • Other public health professionals rely on these systems for population health monitoring.
  • The data inform decisions about resource allocation and intervention priorities.

Survey-Based Surveillance Systems

🧭 Overview

🧠 One-sentence thesis

Survey-based surveillance systems collect health data directly from individuals through questionnaires and exams to monitor population health trends over time, though they capture only prevalent cases rather than new diagnoses.

📌 Key points (3–5)

  • What survey-based surveillance is: direct data collection from residents using questionnaires, and sometimes physical exams and lab tests.
  • Main examples: NHANES (includes physical/lab data), BRFSS (telephone survey on health behaviors), and PRAMS (paper survey for recent mothers).
  • Key limitation: these surveys contain only data from prevalent cases, not incident cases.
  • Common confusion: survey data show who currently has a condition (prevalence), not when people first developed it (incidence)—unlike cancer registries or notifiable disease reports that capture diagnoses.
  • Why it matters: freely available data allow monitoring of trends (e.g., seat belt use maps), support research, and help identify emerging public health issues.

📋 What survey-based surveillance collects

📋 Direct data collection methods

  • The US conducts numerous surveillance activities that gather information directly from individual residents.
  • Most use questionnaires to collect self-reported information.
  • Some systems go beyond surveys: NHANES includes physical exam and laboratory data in addition to questionnaires.

🔍 Three major survey systems

| System | Method | Population | What it collects |
| --- | --- | --- | --- |
| NHANES | Physical exam + lab + questionnaire | General population | Health status, nutrition, clinical measurements |
| BRFSS | Telephone survey | Adults | Self-reported health and health behaviors |
| PRAMS | Paper survey | Women who recently gave birth | Health and health care use for mother and newborn |

  • BRFSS = Behavioral Risk Factor Surveillance System
  • PRAMS = Pregnancy Risk Assessment Monitoring System
  • Example: the seat belt use map shown in chapter 1 was created using BRFSS data.

⚠️ Key limitation: prevalent cases only

⚠️ What "prevalent cases only" means

Prevalent cases: people who currently have a condition at the time of the survey.

  • Survey-based systems capture a snapshot of who has a condition now, not when they first got it.
  • This contrasts with systems that track new diagnoses (incidence).

🔄 Don't confuse with incidence data

  • Incidence data (from notifiable disease reports, cancer registries): capture when someone is first diagnosed.
  • Prevalence data (from surveys): capture who has the condition at survey time, regardless of when it started.
  • Example: A cancer registry records the moment of diagnosis; a survey might ask "Do you currently have cancer?" but won't tell you when it began.
  • The excerpt mentions this limitation explicitly: "they contain only data from prevalent cases."

💡 How survey data are used

💡 Monitoring trends over time

  • Public health professionals use survey data to monitor trends over time.
  • Example: tracking changes in seat belt use across years using repeated BRFSS surveys.
  • The excerpt emphasizes that "much of the benefit from these systems develops as data are compared over time."

📖 Research and accessibility

  • Survey data are freely available to students and researchers.
  • Numerous articles are published each year using these datasets.
  • This makes survey-based surveillance valuable for both operational public health work and academic research.

🎯 Broader surveillance goals

  • The ultimate goal of all surveillance (including surveys) is to notice potential public health threats early.
  • Early detection allows a proper response before a public health crisis develops.
  • Surveillance helps epidemiologists monitor the "usual" levels of disease in a population, so deviations can be spotted.

Conclusions

🧭 Overview

🧠 One-sentence thesis

Epidemiologic data are commonly summarized in 2×2 tables using relative measures like risk ratio and odds ratio, but absolute measures like risk difference are equally important for interpreting the real-world impact of associations.

📌 Key points (3–5)

  • Two main measure types: relative/ratio measures (RR, OR) vs. absolute/difference measures (RD, AR, NNT/NNH).
  • Why relative measures can mislead: a large relative risk can mask a tiny absolute risk difference (e.g., 50% reduction sounds big, but 1 in a million is small).
  • Which measure for which design: cohorts and RCTs use risk/rate ratio (incident cases); case-control and cross-sectional studies use odds ratio (prevalent cases).
  • Common confusion: relative risk tells you the ratio of risks, but risk difference tells you the excess number of cases—both are needed for full interpretation.
  • Why absolute measures matter less in literature: they imply causation more explicitly and are harder to adjust for confounding, so they appear less often despite their importance.

📏 Relative vs. absolute measures

📊 Relative measures: RR and OR

Risk ratio (RR) and odds ratio (OR) are ratio measures of association that compare the relative frequency of disease between exposed and unexposed groups.

  • Risk ratio (RR): used when you have incidence data (cohorts, RCTs).
  • Odds ratio (OR): used when you have prevalent cases (case-control, cross-sectional studies).
  • Both express how many times more (or less) likely disease is in one group vs. another.

⚠️ Why ratios can mislead

  • A large ratio can hide a small absolute difference.
  • Example from the excerpt: incidence in exposed = 1 per 1,000,000; incidence in unexposed = 2 per 1,000,000.
    • RR = 0.5 → "50% reduction in disease" sounds impressive.
    • But the absolute difference is only 1 in a million—tiny real-world impact.
  • Don't confuse: a big percentage change (relative) vs. a big number of cases (absolute).
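
The same example in code: the ratio looks dramatic while the difference is negligible (incidences over 20 years, from the excerpt):

```python
# Relative vs absolute: a "50% reduction" with trivial real-world impact.
I_exposed = 1 / 1_000_000     # 1 case per million over 20 years
I_unexposed = 2 / 1_000_000   # 2 cases per million over 20 years

RR = I_exposed / I_unexposed  # 0.5 -> sounds like a big win
RD = I_exposed - I_unexposed  # -1 per million over 20 years

print(f"RR = {RR}")                                            # 0.5
print(f"RD = {RD * 1_000_000:.0f} per million over 20 years")  # -1
```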

📐 Risk difference (RD)

Risk difference (RD) = incidence in exposed minus incidence in unexposed.

  • Formula: RD = I(E+) − I(E−)
  • Units: same as incidence (e.g., per 100 in 10 years), because you subtract, not divide.
  • Interpretation: the excess number of cases attributable to the exposure.
  • Example from the excerpt (smoking/HTN):
    • I(E+) = 75 per 100 in 10 years; I(E−) = 33 per 100 in 10 years.
    • RD = 75 − 33 = 42 per 100 in 10 years.
    • Meaning: over 10 years, 42 extra cases of HTN are due to smoking; the remaining 33 would have occurred anyway.

🧮 Measures derived from risk difference

🎯 Attributable risk (AR)

Attributable risk (AR) = risk difference divided by incidence in the exposed.

  • Formula: AR = RD / I(E+)
  • Interpretation: the proportion of cases in the exposed group that can be attributed to the exposure.
  • Example from the excerpt:
    • AR = 42 per 100 / 75 per 100 = 56%.
    • Meaning: 56% of HTN cases in smokers are due to smoking; the rest would have happened anyway.
  • Limitation: diseases have multiple causes, so ARs for all causes sum to well over 100%, making this measure less useful.

💊 Number needed to treat/harm (NNT/NNH)

NNT/NNH = 1 divided by the risk difference.

  • Formula: NNT = 1 / RD
  • NNT (number needed to treat): for protective exposures—how many you need to treat to prevent one bad outcome.
  • NNH (number needed to harm): for harmful exposures—how many need to be exposed to cause one bad outcome.
  • Example from the excerpt (smoking/HTN):
    • NNH = 1 / (42 per 100 per 10 years) = 1 / 0.42 = 2.4.
    • Meaning: over 10 years, for every 2.4 smokers, 1 excess case of hypertension occurs because of smoking.
  • Context: for many common drugs, NNTs are in the hundreds or thousands.

🗂️ Study design and measure pairing

📋 Which measure for which design

| Study Design | Case Type | Preferred Measure | Why |
| --- | --- | --- | --- |
| Cohort | Incident | Risk ratio or rate ratio | Follows nondiseased over time; can calculate incidence |
| RCT | Incident | Risk ratio or rate ratio | Assigns exposure, follows over time; can calculate incidence |
| Case-control | Prevalent | Odds ratio | Starts with diseased cases; cannot calculate incidence directly |
| Cross-sectional | Prevalent | Odds ratio | Collects data at one time point; uses prevalent cases |

🔍 Why the distinction matters

  • Incident cases: you observe new cases over time → you can calculate true incidence → use RR.
  • Prevalent cases: you start with existing cases → you cannot calculate incidence directly → use OR.
  • Don't confuse: cohort/RCT designs collect incident data; case-control/cross-sectional designs use prevalent data.

🧠 Why absolute measures appear less often

📚 Rarity in the literature

  • The excerpt notes that absolute measures (RD, AR, NNT/NNH) are "not often seen in the literature."
  • Two reasons given:
    1. Causation: interpretation of absolute measures implies causation more explicitly (e.g., "excess cases attributable to exposure").
    2. Confounding: it is more difficult to control for confounding variables when calculating difference measures.

⚖️ Why both types are needed

  • Relative measures (RR, OR) are easier to adjust for confounding and are standard in epidemiologic reporting.
  • Absolute measures (RD, NNT/NNH) show the real-world impact and help avoid misleading interpretations.
  • Key takeaway from the excerpt: "it is nonetheless always important to keep the absolute risks (incidences) in mind when interpreting results."

Necessary First Step: 2 × 2 Notation

🧭 Overview

🧠 One-sentence thesis

The 2 × 2 table is epidemiology's compact notation for organizing exposure and disease data from a study, enabling quick calculation of measures of association.

📌 Key points (3–5)

  • What a 2 × 2 table is: a compact summary of data for 2 variables—exposure and health outcome—from a study.
  • Standard notation: rows represent exposed (E+) vs unexposed (E−); columns represent diseased (D+) vs nondiseased (D−); cells are labeled A, B, C, D.
  • Convention matters: epidemiologists place exposure on the left (rows) and disease across the top (columns) by convention, though either arrangement works.
  • Margin totals: row and column totals (e.g., A+B, C+D, A+C, B+D) help verify data and calculate measures of association.
  • Common confusion: continuous variables (age, height) vs categorical/dichotomous variables—epidemiologists prefer continuous when possible, but 2 × 2 tables require dichotomization (exposed/not, diseased/not) for simpler math.

📊 What a 2 × 2 table represents

📊 The basic structure

A 2 × 2 table (or two-by-two table): a compact summary of data for 2 variables from a study—namely, the exposure and the health outcome.

  • It condenses individual-level data into four cells.
  • Each cell counts how many individuals fall into that combination of exposure and disease status.
  • Example: A 10-person study on smoking and hypertension records each participant's smoking status (Y/N) and hypertension status (Y/N). The 2 × 2 table summarizes: 3 people smoked and had hypertension, 1 smoked but did not have hypertension, 2 did not smoke but had hypertension, and 4 neither smoked nor had hypertension.

🔤 Exposure and disease terminology

  • Exposed (E+): individuals with the exposure (e.g., smokers).
  • Unexposed (E−): individuals without the exposure (e.g., nonsmokers).
  • Diseased (D+): individuals with the health outcome (e.g., hypertension).
  • Nondiseased (D−): individuals without the health outcome.

The excerpt emphasizes: "smoking is the exposure and hypertension is the health outcome."

🗂️ Standard notation and convention

🗂️ Cell labels: A, B, C, D

Epidemiologists use shorthand letters to refer to specific cells:

|  | D+ | D− |
| --- | --- | --- |
| E+ | A | B |
| E− | C | D |

  • A: exposed and diseased.
  • B: exposed but not diseased.
  • C: unexposed but diseased.
  • D: unexposed and not diseased.

This shorthand simplifies formulas for measures of association.

📐 Convention: exposure on the left, disease on top

  • The excerpt states: "it does not really matter whether exposure or disease is placed on the left or across the top of a 2 × 2 table."
  • However, the convention in epidemiology is to have exposure on the left and disease across the top.
  • Following convention ensures consistency and reduces confusion when reading or comparing studies.

🧮 Margin totals and their uses

🧮 What margin totals are

Margin totals are the sums of rows and columns:

|  | D+ | D− | Total |
| --- | --- | --- | --- |
| E+ | A | B | A+B |
| E− | C | D | C+D |
| Total | A+C | B+D | A+B+C+D |

  • A+B: total number of exposed individuals.
  • C+D: total number of unexposed individuals.
  • A+C: total number of diseased individuals.
  • B+D: total number of nondiseased individuals.
  • A+B+C+D: total sample size.

🔍 Why margin totals matter

  • They help verify that the table matches the original data (e.g., total should equal the number of participants).
  • They are "sometimes helpful when calculating various measures of association," according to the excerpt.

Example: In the 10-person smoking/hypertension study, margin totals confirm 4 smokers, 6 nonsmokers, 5 with hypertension, 5 without, and 10 total participants.
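
A sketch of that 10-person table in code, using the A/B/C/D cell convention; the asserted margin totals match the counts quoted above.

```python
# The 10-person smoking/hypertension example as 2x2 cells
# (exposure = smoking on rows, disease = hypertension on columns).
A = 3  # smoked, had hypertension       (E+, D+)
B = 1  # smoked, no hypertension        (E+, D-)
C = 2  # did not smoke, hypertension    (E-, D+)
D = 4  # did not smoke, no hypertension (E-, D-)

assert A + B == 4           # smokers
assert C + D == 6           # nonsmokers
assert A + C == 5           # with hypertension
assert B + D == 5           # without hypertension
assert A + B + C + D == 10  # total participants
```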

🔢 Variable types and dichotomization

🔢 Continuous vs categorical variables

| Variable type | Definition | Examples |
| --- | --- | --- |
| Continuous | Possible values are infinite or close to it | Age, height |
| Categorical | Discrete list of possible answers | Religion, favorite color |
| Dichotomous | Special case of categorical with only 2 possible answers | Yes/No, Exposed/Unexposed |

⚠️ Dichotomizing continuous variables

  • It is possible to split a continuous variable into two categories (e.g., age → "old" and "young").
  • However, the excerpt warns: "it is not always advisable to do this because a lot of information is lost."
  • Additional problem: "how does one decide where to dichotomize? Does 'old' start at 40, or 65?"
  • Epidemiologists' preference: "leave continuous variables continuous to avoid having to make these judgment calls."
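
A tiny sketch of the cutoff problem, with invented ages: the same continuous data give different dichotomies depending on an arbitrary threshold.

```python
# Hypothetical ages; where "old" starts is a judgment call.
ages = [23, 35, 41, 52, 64, 67, 71]

old_at_40 = sum(a >= 40 for a in ages)  # 5 people counted as "old"
old_at_65 = sum(a >= 65 for a in ages)  # 2 people counted as "old"

print(old_at_40, old_at_65)  # same data, different categories
```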

🧮 Why this book uses dichotomous variables

  • The excerpt states: "having dichotomous variables (a person is either exposed or not, either diseased or not) makes the math much easier to understand."
  • For teaching purposes, the book assumes "all exposure and disease data can be meaningfully dichotomized and placed into 2×2 tables."
  • Don't confuse: this is a pedagogical simplification, not a recommendation to always dichotomize in practice.

📚 Context: study designs and incidence data

📚 Where 2 × 2 tables fit

  • The excerpt mentions that the book will discuss "study designs interwoven with a discussion of the appropriate measure(s) of association for each."
  • The 2 × 2 table is introduced as a "necessary first step" before covering study designs and measures of association.
  • Two study types that collect incidence data are named: prospective cohort studies and randomized controlled trials.

📚 What incidence data implies

The excerpt states that for studies using incidence data, "we instantly know 3 things":

  1. "We are looking for new cases of disease."
  2. "There is thus some longitudinal follow-up that must occur to allow" (sentence incomplete in excerpt).

(The excerpt ends mid-sentence; no further details are provided.)

21

Studies That Use Incidence Data

Studies That Use Incidence Data

🧭 Overview

🧠 One-sentence thesis

Prospective cohort studies and randomized controlled trials both use incidence data to measure new disease cases over time, allowing researchers to calculate risk ratios or rate ratios that compare disease occurrence between exposed and unexposed groups.

📌 Key points (3–5)

  • What incidence studies require: looking for new cases, longitudinal follow-up, and starting with people at risk (without the disease at baseline).
  • Two study types that use incidence: prospective cohort studies (observational) and randomized controlled trials (experimental).
  • Key difference between cohort and RCT: in cohorts, participants self-select exposure; in RCTs, researchers randomly assign exposure.
  • Common confusion: the 2×2 table at the start of a cohort study has zero cases in the D+ column; epidemiologists mean the end-of-study table when discussing cohort results.
  • What these studies calculate: risk ratio (from incidence proportions) or rate ratio (from incidence rates), both abbreviated RR and interpreted as "times as high."

📊 2×2 tables and notation

📊 Standard cell notation

The excerpt uses shorthand letters for 2×2 table cells:

|                | D+ (diseased) | D− (not diseased) | Total   |
|----------------|---------------|-------------------|---------|
| E+ (exposed)   | A             | B                 | A+B     |
| E− (unexposed) | C             | D                 | C+D     |
| Total          | A+C           | B+D               | A+B+C+D |
  • Margin totals help calculate measures of association and check against original data.

📊 Variables: continuous vs categorical

Continuous variables: things like age or height, where possible values are infinite or close to it.

Categorical variables: things like religion or favorite color, where there is a discrete list of possible answers.

Dichotomous variables: a special case with only 2 possible answers.

  • You can dichotomize a continuous variable (e.g., split "age" into "old" and "young"), but this loses information and requires arbitrary cutoffs.
  • Epidemiologists prefer to leave continuous variables continuous to avoid judgment calls.
  • For this book: the excerpt assumes all exposure and disease data can be meaningfully dichotomized into 2×2 tables (to make the math easier to understand).

🔬 Prospective cohort studies

🔬 What defines a cohort study

The excerpt covers four epidemiologic study types; two collect incidence data: prospective cohort studies and randomized controlled trials.

Three things we instantly know about incidence studies:

  1. Looking for new cases of disease.
  2. Longitudinal follow-up must occur to allow new cases to develop.
  3. Must start with those at risk (without the disease) as the baseline.

🎯 Procedure: starting point

Target population: contains both diseased and non-diseased individuals.

  • Researchers rarely study entire populations (too big, not logistically feasible).
  • Instead, draw a sample from the target population.
  • For a cohort study, draw a non-diseased sample (people at risk of the outcome).

🎯 Procedure: assess exposure

After drawing the sample:

  1. Assess exposure status of individuals.
  2. Determine whether each person is exposed or not.

At the beginning of the study, the 2×2 table looks like this (using a 10-person smoking/hypertension example):

|       | D+ | D− | Total |
|-------|----|----|-------|
| E+    | 0  | 4  | 4     |
| E−    | 0  | 6  | 6     |
| Total | 0  | 10 | 10    |
  • By definition: everyone is still at risk at the start, so there are no individuals in the D+ column.
  • In this example, 5 cases of incident hypertension will occur as the study progresses—but none have occurred yet.

🎯 Procedure: follow-up

  • Follow participants for some length of time and observe incident cases as they arise.
  • Length of follow-up varies by disease process:
    • Childhood exposure and late-onset cancer: decades.
    • Infectious disease outbreak: days or hours, depending on the incubation period.

At the end of the study, the 2×2 table looks like this:

|       | D+ | D− | Total |
|-------|----|----|-------|
| E+    | 3  | 1  | 4     |
| E−    | 2  | 4  | 6     |
| Total | 5  | 5  | 10    |
  • Important: when epidemiologists talk about a 2×2 table from a cohort study, they mean the table at the end of the study—the beginning table was much less interesting (D+ column was empty).

🔄 Retrospective cohort studies

The excerpt mentions retrospective cohorts briefly:

  • Conducted exactly like prospective cohorts in theory: start with non-diseased sample, determine exposure, "follow" for incident cases.
  • Difference: all this has already happened; reconstruct information using existing records (employment records, medical records, administrative datasets).
  • Analyzed the same way (risk ratios or rate ratios).
  • Common confusion: retrospective cohorts are often confused with case-control studies, so the book focuses exclusively on prospective cohorts.

🧪 Randomized controlled trials (RCTs)

🧪 What defines an RCT

Randomized controlled trial (RCT): procedure is exactly the same as a prospective cohort, with one exception—the investigator randomly assigns participants to "exposed" and "unexposed" groups instead of allowing self-selection.

  • Exposure status is determined entirely by chance.
  • Typically half assigned to new drug, half to old drug or placebo.
  • Required by the Food and Drug Administration for approval of new drugs.

🧪 The only difference from cohort studies

  • Cohort study: measure existing exposures (people self-select based on personal preferences and life circumstances).
  • RCT: tell people whether they will be exposed or not (researcher assigns exposure).
  • Both still measure incident disease.
  • Both calculate risk ratio or rate ratio.

🧪 Observational vs experimental

Observational studies: the researcher is merely observing what happens in real life—people self-select into being exposed or not.

  • Cohort studies, cross-sectional studies, and case-control studies are observational.

Experimental studies: the researcher is conducting an experiment that involves telling people whether they will be exposed to a condition or not.

  • Randomized controlled trials are experimental.

📐 Calculating risk ratios

📐 Overall incidence proportion

Using the smoking/hypertension example (10 years of follow-up):

  • Overall incidence proportion = (number of new cases) / (number at risk) = 5/10 = 50 cases per 100 people in 10 years.
  • Using ABCD notation: (A+C) / (A+B+C+D).

📐 Incidence by exposure group

  • Incidence among exposed (I_E+): 3/4 = 75 per 100 in 10 years.
  • Incidence among unexposed (I_E−): 2/6 = 33 per 100 in 10 years.

📐 Risk ratio formula and interpretation

Risk Ratio (RR): the ratio of incidence in the exposed to incidence in the unexposed.

  • Formula: RR = I_E+ / I_E− = (75/100) / (33/100) = 2.27.
  • Using ABCD notation: RR = [A/(A+B)] / [C/(C+D)].
  • No units: time-dependent units cancel out.
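
As a quick check of the formula above, a minimal Python sketch using the end-of-study cell counts. Note that computing from the raw counts gives 2.25; the excerpt's 2.27 comes from rounding the unexposed incidence to 33 per 100 before dividing.

```python
# Risk ratio from the end-of-study 2x2 counts.
A, B, C, D = 3, 1, 2, 4

incidence_exposed = A / (A + B)    # 3/4 = 0.75
incidence_unexposed = C / (C + D)  # 2/6 ≈ 0.333

rr = incidence_exposed / incidence_unexposed
print(f"RR = {rr:.2f}")  # RR = 2.25 (2.27 in the text, due to rounding)
```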

Interpreting RR values:

  • RR > 1: more disease in exposed group (exposure may be causing disease if we assume causality).
  • RR < 1: less disease in exposed group (exposure may be preventing disease if we assume causality).
  • Null value: RR = 1.0 means no observed association (incidence identical in both groups).

📐 Correct interpretation template

Template sentence: "The risk of [disease] was [RR] times as high in [exposed] compared to [unexposed] over [x] days/months/years."

Example: "The risk of hypertension was 2.27 times as high in smokers compared to nonsmokers over 10 years."

Why "times as high" matters:

  • Be careful with words "higher" or "lower."
  • RR of 2.0 means twice as high, not 2 times more (which would be RR = 3.0, since null is 1, not 0).
  • For RR = 0.5, saying "0.5 times as high" correctly means multiplying unexposed risk by 0.5 to get exposed risk—yielding lower incidence in exposed, as expected with RR < 1.

📐 Calculating rate ratios

📐 Person-time approach

If the cohort study uses a person-time approach, the 2×2 table includes a column for sum of person-time at risk (PTAR):

|       | D+ | D− | Total | Σ PTAR (person-years) |
|-------|----|----|-------|-----------------------|
| E+    | 3  | 1  | 4     | 27.3                  |
| E−    | 2  | 4  | 6     | 52.9                  |
| Total | 5  | 5  | 10    | 80.2                  |

📐 Incidence rates

Incidence rate: uses person-time denominator.

  • Overall incidence rate: 5 / 80.2 = 6.2 per 100 person-years.
  • Incidence rate among exposed: 3 / 27.3 = 11.0 per 100 person-years.
  • Incidence rate among unexposed: 2 / 52.9 = 3.8 per 100 person-years.

📐 Rate ratio formula and interpretation

Rate ratio (also abbreviated RR): the ratio of incidence rates in the exposed to incidence rates in the unexposed.

  • Formula: RR = 11.0 / 3.8 = 2.9.
  • Units cancel out, leaving just a number.
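
A minimal sketch of the same calculation, using the person-time values from the table above (variable names are ours):

```python
# Incidence rates and rate ratio using person-time at risk (PTAR).
cases_exposed, ptar_exposed = 3, 27.3      # person-years
cases_unexposed, ptar_unexposed = 2, 52.9  # person-years

rate_exposed = cases_exposed / ptar_exposed        # ≈ 0.110 per person-year
rate_unexposed = cases_unexposed / ptar_unexposed  # ≈ 0.038 per person-year

rate_ratio = rate_exposed / rate_unexposed
print(f"{rate_exposed * 100:.1f} vs {rate_unexposed * 100:.1f} per 100 PY; "
      f"rate ratio = {rate_ratio:.1f}")
# 11.0 vs 3.8 per 100 PY; rate ratio = 2.9
```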

Interpretation: same as risk ratio, but substitute "rate" for "risk": "The rate of hypertension was 2.9 times as high in smokers compared to non-smokers, over 10 years."

  • Still include study duration in the interpretation, even though some individuals were censored (stopped being followed before the end).
  • Knowing follow-up duration is important for applying findings in practice (100 person-years can be accumulated in many ways; knowing it was 10 years vs. 1 year or 50 years matters).

📐 "Relative risk" terminology

  • Both risk ratio and rate ratio are abbreviated RR.
  • Often referred to collectively as relative risk by epidemiologists.
  • The excerpt notes that this terminology is used inconsistently across the field.
  • The book uses "risk ratio" and "rate ratio" separately to distinguish between population-at-risk vs. person-time-at-risk approaches.
  • Regardless, a measure called RR is always calculated as incidence in exposed divided by incidence in unexposed.

⏱️ Why incidence studies are resource-intensive

⏱️ Cost and time

  • Following participants while waiting for incident cases is expensive and time-consuming.
  • Epidemiologists often need faster and cheaper answers.
  • Alternative: take advantage of prevalent cases (already occurred, require no wait).
  • The excerpt transitions here to discussing studies that use prevalence data (not covered in this section).
22

Studies That Use Prevalence Data

Studies That Use Prevalence Data

🧭 Overview

🧠 One-sentence thesis

When epidemiologists need faster and cheaper answers than longitudinal studies allow, they use cross-sectional and case-control designs that rely on prevalent (already-occurred) disease cases and calculate odds ratios instead of risk ratios.

📌 Key points (3–5)

  • Why prevalence designs exist: Following participants for incident disease is expensive and time-consuming; prevalence-based studies are faster and cheaper.
  • Two main prevalence designs: cross-sectional studies (snapshot at one point in time) and case-control studies (start with diseased cases, look backward at exposures).
  • Key measure difference: Prevalence studies cannot calculate risk ratio (RR) because they lack incidence data; they calculate odds ratio (OR) instead.
  • Common confusion—OR vs RR: The OR is always further from the null value (1.0) than the RR; the more common the disease, the greater the divergence; only when disease prevalence is ≤5% does OR approximate RR.
  • What you cannot do: In case-control studies, you cannot calculate overall sample prevalence because the researcher artificially sets the proportion of diseased individuals (usually 50%).

📸 Cross-sectional studies

📸 What a cross-sectional study is

Cross-sectional studies: often called "snapshot" or "prevalence" studies; the researcher takes a snapshot at one point in time, determining who is exposed and who is diseased simultaneously.

  • The sample now includes both diseased and non-diseased people at baseline (unlike cohort studies, which start with only non-diseased).
  • Because prevalent cases are used, some proportion of the sample is already diseased when the study begins.
  • No follow-up over time is involved.

🧮 Calculating the odds ratio (OR) in cross-sectional studies

  • Formula in words: OR equals (odds of disease in exposed) divided by (odds of disease in unexposed).
  • What "odds" means statistically: the number of people who experienced an event divided by the number who did not experience it.
  • Using 2×2 table notation: OR = (A × D) / (B × C), where A, B, C, D are the cell counts.
  • Example: In a hypothetical smoking/hypertension cross-sectional study, OR = 6.0 means the odds of hypertension were 6.0 times as high in smokers compared to nonsmokers.
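
A minimal sketch of the odds-ratio arithmetic; the cell counts below are hypothetical values chosen to reproduce the excerpt's OR of 6.0.

```python
# Odds ratio from a 2x2 table (hypothetical counts giving OR = 6.0).
A, B, C, D = 3, 1, 2, 4

odds_exposed = A / B    # odds of disease among exposed: 3/1 = 3.0
odds_unexposed = C / D  # odds of disease among unexposed: 2/4 = 0.5

odds_ratio = odds_exposed / odds_unexposed  # 6.0
assert odds_ratio == (A * D) / (B * C)      # the cross-product shortcut
print(f"OR = {odds_ratio:.1f}")

# A cross-sectional table also yields overall prevalence:
prevalence = (A + C) / (A + B + C + D)      # 5/10 = 0.5
```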

📊 Interpreting OR from cross-sectional studies

  • Interpretation mirrors RR interpretation, but substitute "odds" for "risk" and omit time references (since cross-sectional studies do not involve time).
  • OR > 1 means exposure is more common among diseased.
  • OR < 1 means exposure is less common among diseased.
  • The null value is 1.0 (no association).

📈 Additional measures from cross-sectional studies

  • You can calculate overall prevalence of disease: Prevalence = (A + C) / (A + B + C + D) using 2×2 table notation.
  • Some authors call the OR from a cross-sectional study the "prevalence odds ratio" as a reminder that it uses prevalent cases; the calculation is identical.

🔍 Case-control studies

🔍 What a case-control study is

  • The researcher first draws a sample of diseased individuals (cases).
  • Then draws a sample of non-diseased individuals (controls) from the same underlying population.
  • Critical requirement: Cases and controls must come from the same underlying population, or the study will be biased.
  • After sampling, the researcher measures exposures at some point in the past (could be yesterday for foodborne illness or decades ago for osteoporosis).

🧮 Calculating OR in case-control studies

  • Again, incidence cannot be calculated because prevalent cases are used, so OR is calculated the same way as in cross-sectional studies.
  • Interpretation is identical, but now must refer to the time period because past exposure data are explicitly examined.
  • Example: "The odds of hypertension are 6.0 times as high in people who were smokers 10 years ago, compared to people who were nonsmokers 10 years ago."

⚠️ What you cannot calculate

  • You cannot calculate overall sample prevalence from a case-control 2×2 table.
  • Reason: The researcher artificially sets the prevalence in the sample (usually at 50%) by deliberately choosing diseased individuals for the cases.

🔄 Exposure OR vs disease OR (technical note)

  • Technically, case-control studies calculate the exposure OR: odds of being exposed among diseased compared to odds of being exposed among non-diseased.
  • Cross-sectional studies calculate the disease OR: odds of being diseased among exposed compared to odds of being diseased among unexposed.
  • Don't confuse: Both formulas simplify to the same cross-product, (A × D) / (B × C), so the interpretation is the same: OR > 1 means disease is more common in the exposed group (or exposure is more common in the diseased group; these are the same thing).

⚖️ OR versus RR: understanding the difference

⚖️ How OR and RR diverge

  • The OR is always further from the null value (1.0) than the RR.
  • The more common the disease, the more this divergence increases.
  • When OR approximates RR: If disease prevalence is about 5% or less, the OR provides a close approximation of the RR.
  • Example: In the hypothetical smoking/hypertension example (hypertension prevalence 40%), the OR deviates substantially from the RR.

🚨 Common misuse: reporting OR from cohort studies or RCTs

  • Occasionally, cohort studies or RCTs report OR instead of RR.
  • Why this is problematic:
    • Cohorts and RCTs use incident cases, so RR is the best measure of association.
    • One common statistical technique (logistic regression) automatically calculates ORs; investigators often do not back-calculate the RR and just report the OR.
  • Two reasons this is troublesome:
    1. Human brains interpret risks more easily than odds, so risks should be used when possible.
    2. Cohort studies and RCTs almost always have relatively common outcomes; reporting OR makes the exposure seem like a bigger problem (or better solution if OR < 1) than it really is.

📏 Absolute measures of association

📏 Risk difference (RD)

  • Formula: RD = (incidence in exposed) – (incidence in unexposed).
  • Addresses a limitation of ratio measures: ratio measures can be misleading if absolute risks are small.
  • Example: If incidence in exposed is 1 per 1,000,000 in 20 years and incidence in unexposed is 2 per 1,000,000 in 20 years, RR = 0.5 (a 50% reduction), but the absolute difference is only 1 in a million.
  • Units: RD has the same units as incidence (units do not cancel when subtracting).
  • Example interpretation: "Over 10 years, the excess number of cases of HTN attributable to smoking is 42 [per 100 smokers]; the remaining 33 [per 100] would have occurred anyway."
  • Why it's less common: Interpretation implies causation more explicitly; it is more difficult to control for confounding variables when calculating difference measures.

📏 Attributable risk (AR)

  • Formula: AR = RD / (incidence in exposed).
  • Example: AR = 42 per 100 in 10 years / 75 per 100 in 10 years = 56%.
  • Interpretation: "56% of cases can be attributed to smoking, and the rest would have happened anyway."
  • Limitation: Implies causality; because diseases have more than one cause, the ARs for each possible cause will sum to well over 100%, making this measure less useful.

📏 Number needed to treat / number needed to harm (NNT/NNH)

  • Formula: NNT or NNH = 1 / RD.
  • NNT is for preventive exposures; NNH is for harmful exposures.
  • Example: NNH = 1 / 0.42 per 10 years = 2.4.
  • Interpretation: "Over 10 years, for every 2.4 smokers, 1 will develop hypertension."
  • For protective exposures, NNT is interpreted as the number you need to treat to prevent one case of a bad outcome.
  • For harmful exposures, it is the number needed to be exposed to cause one bad outcome.
  • Many drugs in common use have NNTs in the hundreds or even thousands.
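
The three absolute measures above fit in a few lines; this sketch uses the chapter's smoking/hypertension incidences (per 100 people over 10 years):

```python
# Absolute measures of association from the smoking/hypertension example.
incidence_exposed = 0.75    # 75 per 100 smokers in 10 years
incidence_unexposed = 0.33  # 33 per 100 nonsmokers in 10 years

rd = incidence_exposed - incidence_unexposed  # risk difference: 0.42 per 10 y
ar = rd / incidence_exposed                   # attributable risk: 56%
nnh = 1 / rd                                  # number needed to harm: ≈ 2.4

print(f"RD = {rd:.2f} per 10 years; AR = {ar:.0%}; NNH = {nnh:.1f}")
```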

📋 Summary comparison

| Study Design | Methods | Incident or Prevalent Cases? | Preferred Measure of Association |
|--------------|---------|------------------------------|----------------------------------|
| Cohort | Start with non-diseased sample, determine exposure, follow over time | Incident | Risk ratio or rate ratio |
| RCT | Start with non-diseased sample, assign exposure, follow over time | Incident | Risk ratio or rate ratio |
| Cross-sectional | Snapshot at one point in time; determine exposure and disease simultaneously | Prevalent | Odds ratio |
| Case-control | Start with diseased (cases), recruit comparable non-diseased (controls), look at previous exposures | Prevalent | Odds ratio |

🔑 Key takeaway

  • Prevalence-based designs (cross-sectional and case-control) are faster and cheaper than longitudinal designs but cannot calculate incidence-based measures (RR).
  • Always keep absolute risks (incidences) in mind when interpreting results, not just ratio measures.
23

Conclusions

Conclusions

🧭 Overview

🧠 One-sentence thesis

Different epidemiologic study designs vary widely in cost and internal validity, and while better-quality studies are generally more expensive and time-consuming, each design has specific strengths and appropriate use cases beyond simple cost or validity considerations.

📌 Key points (3–5)

  • Hierarchy of evidence: study designs range from single case reports (lowest validity) to meta-analyses (highest validity), with cross-sectional, case-control, cohort, and RCT designs in between.
  • Cost-validity trade-off: higher internal validity generally requires more time and money, except for review papers which depend on existing studies.
  • Context matters for design choice: sometimes a particular design is preferred for reasons independent of cost or validity (e.g., case-control studies are better for rare diseases).
  • Common confusion: "better" does not always mean "more appropriate"—the right study design depends on the research question, disease characteristics, and practical constraints.
  • Relying on systematic reviews: for non-researchers, well-done systematic reviews and meta-analyses provide a better overall picture with less bias than individual studies.

📊 Study design hierarchy

📊 The validity-cost spectrum

The excerpt presents a hierarchy of epidemiologic study types ordered by internal validity and cost:

| Study type | Relative validity | Relative cost | Position in hierarchy |
|------------|-------------------|---------------|-----------------------|
| Case reports | Lowest | Lowest | Bottom |
| Cross-sectional | Low-moderate | Moderate | Lower-middle |
| Case-control | Moderate | Moderate | Middle |
| Cohort | Moderate-high | High | Upper-middle |
| RCT | High | High | Near top |
| Meta-analyses | Highest | High (indirect) | Top |

💰 The cost paradox of reviews

  • Review papers themselves are not particularly expensive to conduct.
  • However, they cannot exist until numerous other studies have been published first.
  • When you include those prerequisite studies as indirect costs, systematic reviews and meta-analyses represent substantial cumulative time and money investment.

🔍 Choosing the right design

🎯 Beyond cost and validity

The excerpt emphasizes: "There are occasions, independent of cost or validity considerations, when one design or another is preferred."

  • The "best" study design is not always the one highest on the validity hierarchy.
  • Practical and scientific factors influence design choice.

🦠 Example: rare diseases

  • Case-control studies are preferred for rare diseases, regardless of where they fall in the validity hierarchy.
  • Why: when a disease is rare, it would be impractical or impossible to assemble a large enough cohort to observe sufficient cases.
  • Don't confuse: "preferred" here means "most appropriate for the question," not "highest validity."

⚖️ Four main study types

The excerpt identifies cross-sectional, case-control, cohort, and RCT as the 4 main study types:

  • Each has specific strengths and weaknesses.
  • Readers of epidemiologic literature should be aware of these trade-offs.
  • Design selection should match the research question and practical constraints.

📚 Using systematic reviews effectively

📚 Why reviews matter for practitioners

  • Time constraint reality: no one can keep up with the literature beyond a very narrow topic area.
  • Who benefits from individual studies: mainly researchers actively working in that specific field.
  • For public health professionals and clinicians not routinely engaging in research: systematic reviews and meta-analyses provide a much better overall picture.

✅ Identifying well-done reviews

What to look for:

  • The title should explicitly include "systematic review" or "meta-analysis."
  • The methods should mirror established systematic review protocols (comprehensive search, explicit inclusion/exclusion criteria, quality assessment, synthesis).

What to avoid:

  • Review papers not explicitly labeled as systematic reviews.
  • Papers called "integrative review," "literature review," or just "review" (anything that is not "systematic review").
  • These non-systematic reviews are extremely prone to author biases and probably should be ignored.

🛡️ Bias protection

  • Systematic reviews and meta-analyses are potentially less prone to the biases found in individual studies.
  • However, care must be taken to read well-done reviews—poorly conducted reviews can still be biased.
  • Exception noted: metasynthesis is a legitimate technique for systematic reviewing of qualitative literature.

🔬 Example from the excerpt

The excerpt references a systematic review on risk-reducing mastectomy (RRM):

  • Reviewed 21 studies on breast cancer incidence and mortality after bilateral RRM in high-risk women (BRCA1/2 carriers).
  • Found reductions in both incidence and death, particularly for BRCA1/2 mutation carriers.
  • Psychosocial findings: high satisfaction with the decision, reduced cancer worry, but diminished body image and sexual satisfaction.
  • Authors' conclusion: RRM effective but should be considered only for high-risk groups; more rigorous prospective studies needed.
  • No pooled estimate was provided, yet the authors conveyed the overall state of the literature and identified gaps.
24

Random Error

What Is Random Error?

🧭 Overview

🧠 One-sentence thesis

Random error—the unavoidable, non-systematic variation in measurements—exists in all data and differs fundamentally from bias because it fluctuates randomly rather than pushing measurements consistently in one direction.

📌 Key points (3–5)

  • What random error is: random, non-systematic errors in data that occur because no measurement system is perfect.
  • Common confusion: random error is not bias—bias is systematic (always over- or underestimating), while random error fluctuates unpredictably around the true value.
  • Where it comes from: measurement tools (scales, lab assays), human measurers (reading instruments), and participant self-reporting (memory, question clarity).
  • Magnitude varies: depends on the measurement scale (nanometers vs. centimeters) and the quality of tools (lab-grade scales vs. bathroom scales).
  • Self-report challenge: random error increases when people cannot accurately recall or report information (e.g., sleep one year ago vs. last night).

🔍 What random error is and is not

🔍 Definition and core nature

Random error: random errors in the data that occur because no measurement system is perfect.

  • It is "just what it sounds like"—unpredictable variation in measurements.
  • All data contain random errors; the question is only how much.
  • The magnitude depends on:
    • The measurement scale (molecular measurements have nanometer-level errors; human height has centimeter-level errors).
    • Tool quality (physics lab scales measure to the nearest nanogram; bathroom scales are accurate within half a pound).

⚖️ Random error vs. bias

Key distinction: direction of error.

| Aspect | Random error | Bias |
|--------|--------------|------|
| Pattern | Fluctuates unpredictably (sometimes over, sometimes under) | Systematic (always pushes in one direction) |
| Nature | Non-systematic | Systematic |
| Example | Eyeballing butter: sometimes 3.1 oz, sometimes 2.9 oz | Always underestimating or always overestimating butter |
  • Don't confuse: even biased measurements contain random error within themselves.
  • Example: if you always underestimate butter (bias), each underestimate still varies slightly (random error).

📏 Not the same as inherent variability

  • Random error occurs when we measure things, not because people differ.
  • Inherent variability: people naturally have different heights, heart rates, GPA—this is not random error.
  • Epidemiology relies on this natural variability to identify risk patterns.
  • Random error is the measurement mistake on top of true differences.

🧈 Understanding through examples

🧈 The butter example

The excerpt uses measuring 6 tablespoons of butter to illustrate random error:

  • Method 1: Use marks on waxed paper (assuming they're lined up correctly).
  • Method 2: Unwrap, mark half the stick, then eyeball half of that half to get three-quarters.
  • Method 3: Eyeball the three-quarter mark from the start and slice.

All methods give "roughly 6 tablespoons" (3 ounces), good enough for baking, but not exactly 3 ounces.

  • Sometimes you're slightly over 3 ounces, sometimes slightly under—this fluctuation is random error.
  • If you always underestimated or always overestimated, that would be bias (but would still contain random error within it).

🔬 Sources of random error in epidemiology

🔬 Instrument-based error

Some measurements rely on tools that introduce random error:

  • Weight: scales with half-pound fluctuation.
  • Serum cholesterol: laboratory assays with margins of error of a few milligrams per deciliter.
  • The instrument itself is the source of random variation.

👤 Human measurer error

For some measurements, the person doing the measuring introduces random error:

  • Height: the measurer reads the scale.
  • Blood pressure: the measurer interprets the reading.
  • Example: like eyeballing butter, human judgment varies slightly each time.

📝 Self-report error

Many epidemiologic measurements rely on participant self-reporting, which introduces random error through questionnaires.

Variability by question type:

| Variable type | Random error level | Example |
|---------------|--------------------|---------|
| Stable, simple facts | Less random error | Self-reported race (people rarely check the wrong box accidentally) |
| Imprecise recall | More random error | "In the last year, how many times per month did you eat rice?" |

🤔 The "Can people tell me this?" test

A useful question to assess potential random error in self-reported data:

  • Can people theoretically answer this?
    • Most people could tell you how much sleep they got last night → less random error.
    • People would be hard-pressed to tell you how much sleep they got on the same night one year ago → more random error.
  • As the likelihood that people could accurately answer decreases, random error increases.
  • Don't confuse: whether people will tell you (honesty, social desirability) is a bias issue, not random error.
25

Quantifying Random Error

Quantifying Random Error

🧭 Overview

🧠 One-sentence thesis

Statistics—specifically p-values and confidence intervals—allow researchers to quantify the random error that inevitably exists in all epidemiologic measurements, enabling accurate interpretation of study results despite measurement imperfections.

📌 Key points (3–5)

  • Random error is unavoidable: appears in all measurements (instruments, human measurers, self-reports) but varies by variable type—some are more prone to error than others.
  • Statistics quantify random error: p-values and confidence intervals (CIs) are the primary tools for expressing how much random error may affect study conclusions.
  • Statistical significance threshold: p ≤ 0.05 is the conventional cutoff, meaning researchers accept a 5% chance of false positives (Type I error).
  • Common confusion—what p-values mean: a p-value describes the probability of your data assuming the null hypothesis is true, NOT the probability that the null hypothesis is true given your data.
  • CIs provide more information than p-values: confidence intervals show both statistical significance and the plausible range for the true population value.

📏 Sources of random error in epidemiology

📏 Instrument-based measurements

  • When researchers measure participants directly (height, weight, blood pressure, serum cholesterol), random error comes from:
    • Instrument variability: scales fluctuate by half a pound; lab assays have margins of error of a few milligrams per deciliter.
    • Human measurer variability: for height and blood pressure, the person taking the measurement introduces error (like the butter-measuring example referenced).

📝 Self-reported data

  • Many epidemiologic variables rely on participant questionnaires, which introduce their own random error patterns.
  • Less error: variables like self-reported race are quite accurate (though accidental wrong-box checks still occur).
  • More error: imprecise answers to questions like "In the last year, how many times per month did you eat rice?"
  • Key question to assess error: "Can people tell me this?"
    • People could theoretically report last night's sleep but would struggle to recall sleep from the same night one year ago.
    • Random error increases as the likelihood that people could answer decreases.

Note: Whether people will tell you (versus can tell you) relates to bias, not random error.

🧮 Understanding p-values

🎯 The null hypothesis framework

  • Research begins with a hypothesis (H₁), but statistical testing requires rephrasing as a null hypothesis (H₀).
  • Example study: comparing average height of male vs. female undergraduate students.
    • H₁: Male students are, on average, taller than female students.
    • H₀: There is no difference in mean height between male and female undergraduate students.

📊 What a p-value actually means

P-value: the probability that if you repeated the study, you would find a result at least as extreme, assuming the null hypothesis is true.

  • Example: mean height difference of 4 inches with p = 0.04 means:
    • If there really is no difference (null is true) and you repeat the study (drawing a new sample), there's a 4% chance you'll find a difference of 4 inches or more.

⚠️ Critical interpretation points

Two-tailed p-values in epidemiology

  • The 4% chance says nothing about which group is taller—just that one group (either males or females) will be taller by at least 4 inches.

P-values only apply to samples

  • If you enroll the entire population (e.g., all students in one specific class), p-values are meaningless.
  • Random measurement error still exists, but there is no sampling variability: you have already measured everyone, so there is no "repeat the study with a new sample" scenario for a p-value to describe.

The most common misinterpretation

  • ❌ Wrong: "The p-value tells us how likely the null hypothesis is to be true."
  • ✅ Correct: "The p-value tells us the likelihood of getting our data if the null hypothesis happened to be true."
  • This is a subtle but very important distinction.

🎲 Statistical significance and decision-making

🎲 The 0.05 threshold

  • Statistical significance: p ≤ 0.05 is the standard cutoff in public health and clinical research.
  • Interpretation:
    • p ≤ 0.05 → "reject the null hypothesis" → conclude there is a difference.
    • p > 0.05 → "fail to reject the null hypothesis" → conclude the data provided no evidence of a difference.

Important: "fail to reject" ≠ "accept"

  • We never "accept" the null hypothesis because proving the absence of something is very difficult.
  • Failing to reject merely means we didn't find evidence against the null—not that opposing evidence doesn't exist.
  • Possible reasons for p > 0.05: weird sample, too-small sample, etc.

🎯 Is 0.05 arbitrary?

  • Yes, absolutely. This is worth remembering, especially for p-values near the cutoff.
  • Is 0.049 really different from 0.051? Likely not, but they fall on opposite sides of the arbitrary line.

📐 What determines p-value size?

Three factors influence p-values:

| Factor | How it affects the p-value |
|--------|----------------------------|
| Sample size | Larger sample → smaller p-value (easier to reject null) |
| Effect size | Larger true difference (e.g., 6 inches vs. 2 inches) → smaller p-value |
| Data consistency | Smaller standard deviations around group means → smaller p-value |
  • A p-value of 0.51 could almost certainly be made smaller by enrolling more people (relates to power).
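
The sample-size effect is easy to see by simulation. This sketch holds the chapter's 4-inch true difference constant and varies n; the means, standard deviation, and seed are assumed for illustration, not from the excerpt.

```python
# How sample size alone shrinks the p-value, holding the true effect fixed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    males = rng.normal(70, 3, n)    # assumed: mean 70 in, SD 3 in
    females = rng.normal(66, 3, n)  # assumed: 4 inches shorter on average
    t_stat, p = stats.ttest_ind(males, females)
    print(f"n = {n:4d} per group -> p = {p:.2g}")
```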

⚖️ Type I and Type II errors

⚖️ Type I error (α, alpha)

Type I error: the probability that you incorrectly reject the null hypothesis—you "find" something that's not really there (false positive).

  • By choosing 0.05 as the cutoff, researchers accept that 5% of findings will be Type I errors.

⚖️ Type II error (β, beta)

Type II error: the probability that you incorrectly fail to reject the null hypothesis—you miss something that really is there.

💪 Statistical power

Power = 1 – β: the likelihood that you'll find things if they are there.

  • Ideally ≥ 90% (i.e., a Type II error rate of ≤ 10%), but often much lower in practice.
  • Power increases with sample size, but with diminishing returns at higher power levels:
    • Going from 40% to 45% power requires less additional sample than going from 90% to 95%.
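
This diminishing-returns pattern can be checked with a standard power calculation; here is a sketch using statsmodels, where the effect size of 0.3 and alpha of 0.05 are assumptions for illustration.

```python
# Sample size per group needed to reach a target power, other inputs fixed.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for power in (0.40, 0.45, 0.90, 0.95):
    n = analysis.solve_power(effect_size=0.3, alpha=0.05, power=power)
    print(f"power {power:.0%}: ~{n:.0f} per group")
# The jump from 90% to 95% costs far more participants than 40% to 45%.
```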

Underpowered studies

  • If a study fails to reject the null but data suggest a large difference might exist, the issue is often insufficient power.
  • With a larger sample, the p-value would probably fall below 0.05.
  • However, small samples might also be non-representative by chance, and adding participants wouldn't necessarily drive results toward significance.
  • Example: comparing heights using only men's basketball team vs. women's gymnastics team would show an 18-inch difference that wouldn't hold up when other teams are added.

📊 Confidence intervals (CIs)

📊 Using CIs for significance testing

  • In epidemiology, 95% confidence intervals are most common (corresponding to α = 5%).
  • Significance rule: if the 95% CI does not include the null value, then p < 0.05 (statistically significant).
    • Null values: 0 for risk difference; 1.0 for odds ratios, risk ratios, and rate ratios.

📊 What a 95% CI actually means

95% CI interpretation: If you repeated the study 100 times (back to drawing your sample from the population), and the study is free of all bias, then 95 of those 100 times the CI you calculate would include the "real" answer you would get if you enrolled everyone in the population.

Visual concept

  • The population parameter (μ) represents the "real" answer from measuring everyone in the population.
  • Each study produces a CI (vertical line); most CIs (95 out of 100) contain μ.
  • The population parameter is almost always unobservable—it only becomes observable if you define your population narrowly enough to enroll everyone.
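
The repeated-sampling definition can be demonstrated by simulation, since a simulation (unlike a real study) lets us know μ. All values below are assumed for illustration.

```python
# Roughly 95 of 100 correctly computed 95% CIs should contain mu.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n = 66.0, 3.0, 50  # "true" population mean and SD (assumed)

covered = 0
for _ in range(100):
    sample = rng.normal(mu, sigma, n)  # draw a fresh sample each time
    lo, hi = stats.t.interval(0.95, df=n - 1,
                              loc=sample.mean(),
                              scale=stats.sem(sample))
    covered += lo <= mu <= hi
print(f"{covered} of 100 CIs contained mu")  # ≈ 95
```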

📊 Information CIs provide

Example: mean difference of 4 inches (95% CI: 1.5–7.0)

  1. Significance: p < 0.05, since the CI excludes 0 (the null value for a difference measure).
  2. Plausible range: the real difference almost certainly lies somewhere within 1.5–7.0 inches.
    • Could be as small as 1.5 inches or as large as 7 inches.

CI width and sample size

  • Larger samples yield narrower CIs.
  • Narrower CIs are better because they provide a more precise estimate of the true answer.

🔄 Summary implications

🔄 Key takeaways

  • Random error exists in all measurements, though some variables are more prone to it.
  • P-values and CIs quantify random error and guide interpretation.
  • p ≤ 0.05 typically indicates "statistical significance"; the corresponding CI would exclude the null value.
  • CIs express the potential range of the real population-level value being estimated—more informative than p-values alone.

🔄 Practical considerations

  • While random error can be minimized (high-quality instruments, staff training, good questionnaire design), it can never be eliminated entirely.
  • Statistics provide the tools to work with this unavoidable uncertainty.
  • Understanding what p-values and CIs actually mean (versus common misinterpretations) is essential for accurate study interpretation.
26

Confounding and Adjusted Measures of Association

Summary

🧭 Overview

🧠 One-sentence thesis

Confounders distort the true measure of association between exposure and outcome, so epidemiologists must control for them through study design or analysis to report adjusted estimates that reflect the real relationship.

📌 Key points (3–5)

  • What confounders do: they cause the calculated measure of association to be wrong in unpredictable ways, leading to inaccurate conclusions.
  • Three criteria for a confounder: must be statistically associated with the exposure, must cause the outcome, and must not be on the causal pathway between exposure and outcome.
  • How to control confounders: via study design (restriction, matching, randomization) or during analysis (stratification, regression).
  • Common confusion: controlling for a variable on the causal pathway is incorrect—the exposure must not be causing the confounder.
  • The 10% rule: if crude and adjusted estimates differ by more than 10%, the variable is considered a confounder and the adjusted estimate should be reported.

🔧 What confounders are and why they matter

🧩 Definition and impact

Confounders are variables—not the exposure and not the outcome—that affect the data in undesirable and unpredictable ways.

  • In confounded data, you calculate the wrong measure of association.
  • The direction of the error is unknown: you cannot tell whether your estimate is too high or too low.
  • This leads to inaccurate conclusions unless you control for the confounder.

✅ The three criteria

A variable is a potential confounder only if all three conditions hold:

  1. Associated with the exposure: the variable and the exposure must be statistically linked.
  2. Causes the outcome: the variable must be a cause of the disease or outcome.
  3. Not on the causal pathway: the exposure must not cause the confounder (i.e., the confounder is not a mediator).

Don't confuse: A variable on the causal pathway (where exposure → confounder → outcome) should not be controlled as a confounder. In uncertain cases, the excerpt recommends doing the analysis both ways.

🛠️ Methods to control confounding

🎨 Study design approaches

Confounders can be controlled before data analysis through three design strategies:

MethodWhat it does
RestrictionLimit the study population to exclude variation in the confounder
MatchingPair participants so that confounder levels are balanced between exposed and unexposed
RandomizationRandomly assign exposure to balance all confounders (known and unknown) across groups

📐 Analysis approaches

When confounders are not controlled in the design, they can be handled during data analysis:

  • Stratification: create separate 2×2 tables for each level (stratum) of the confounder, then calculate a weighted average (e.g., Mantel-Haenszel method).
  • Regression: the model calculates an adjusted measure of association by accounting for multiple confounders simultaneously.

Both methods produce an adjusted odds ratio (for case-control studies) or adjusted risk ratio / rate ratio (for cohort studies or RCTs).

Example: If you suspect smoking confounds the relationship between oral contraceptive use and ovarian cancer, you can stratify by smoking status (never smoked, smoked 1 month, smoked 2 months, etc.) and compute a weighted average across all strata.
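
A minimal sketch of the stratified approach; the per-stratum counts are made up for illustration, but the pooling formula is the standard Mantel-Haenszel weighted average.

```python
# Mantel-Haenszel pooled odds ratio across strata of a confounder.
# Each stratum is (a, b, c, d) = (E+D+, E+D-, E-D+, E-D-).
strata = [
    (10, 40, 20, 30),  # stratum 1, e.g., never-smokers (hypothetical)
    (25, 25, 30, 20),  # stratum 2, e.g., ever-smokers (hypothetical)
]

numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
or_mh = numerator / denominator  # weighted average across the strata
print(f"adjusted (Mantel-Haenszel) OR = {or_mh:.2f}")
```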

📊 Interpreting and reporting adjusted estimates

📝 How to state adjusted results

The excerpt gives an example: the adjusted odds ratio for oral contraceptive use and ovarian cancer is 0.44.

You can report this in multiple equivalent ways:

  • "Women who have ovarian cancer are 0.44 times as likely to report a history of OCP use compared to women without ovarian cancer, controlling for smoking."
  • "…adjusting for smoking."
  • "…holding smoking constant."

All three phrases signal that smoking was treated as a confounder. The key is to make it clear that the measure of association accounts for the confounding variable.

🔍 The 10% change rule

If the crude and adjusted estimates of association are more than 10% different, the variable should be considered a confounder, and one would report the adjusted estimate.

  • Compare the crude (unadjusted) measure to the adjusted measure.
  • A difference greater than 10% indicates meaningful confounding.
  • In that case, report the adjusted estimate because it controls for the confounder and is more accurate.

Don't confuse: A small difference (<10%) suggests the variable is not an important confounder for that particular relationship, even if it meets the three criteria.
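
The rule is simple enough to express as a one-line check; the crude value of 0.60 below is hypothetical (the excerpt gives only the adjusted 0.44), and the function name is ours.

```python
# The 10% change rule for deciding whether to report the adjusted estimate.
def differs_by_more_than_10pct(crude: float, adjusted: float) -> bool:
    """True if the adjusted estimate differs from the crude by > 10%."""
    return abs(crude - adjusted) / crude > 0.10

print(differs_by_more_than_10pct(crude=0.60, adjusted=0.44))
# True -> treat the variable as a confounder; report the adjusted OR of 0.44
```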

🧭 Choosing which confounders to control

📋 Building a list of potential confounders

The excerpt outlines a systematic process:

  1. List all variables that might cause the outcome.
  2. Check that each variable is associated with the exposure.
  3. Ensure the variable is not on the causal pathway (i.e., the exposure does not cause the confounder).

Variables that meet all three criteria are potential confounders.

⚖️ Deciding which to include in the analysis

  • Regression allows you to control for many confounders at once.
  • One approach: drop confounders that do not meet the 10% change criterion.
  • The excerpt notes that there are additional nuances beyond the scope of the text, and experts differ in their opinions.

🔎 Evaluating published studies

When reading the literature, ask:

  • Did the authors consider all potential confounders you identified?
  • If an obvious confounder is missing, did they explain why?
  • If not, the study may be less valid.

Example: If you are reading a study on exposure A and disease B, and you know that variable C causes B and is associated with A, but the authors did not control for C, the results may be confounded and the conclusions unreliable.

27

Internal versus External Validity

Internal versus External Validity

🧭 Overview

🧠 One-sentence thesis

Internal validity determines whether a study's results are trustworthy at all, while external validity determines how broadly those results can be applied—and a study must have internal validity before external validity even matters.

📌 Key points (3–5)

  • Internal validity concerns whether the study was conducted correctly (design, measurement, analysis); without it, results should not be believed.
  • External validity (generalizability) concerns whether results from the sample can be applied to the broader target population.
  • Hierarchy rule: if a study lacks internal validity, stop—there is rarely a need to assess external validity; only internally valid studies warrant generalizability assessment.
  • Common confusion: representativeness affects external validity, but the importance depends on the research question—biological questions are less sensitive to sample demographics than behavioral questions.
  • Selection bias can harm either internal or external validity, depending on how it operates.

🔍 Understanding the two validities

🔍 Internal validity: the inner workings

Internal validity refers to the inner workings of a study: Was the best design used? Were variables measured in a reasonable way? Did the authors conduct the correct set of analyses?

  • It asks: "Was this study done correctly?"
  • It cannot be measured or quantified numerically, but epidemiologic and biostatistical knowledge allows a qualitative appraisal.
  • If present: we can believe the results.
  • If absent: the study has major methodologic issues; we probably should not accept the results.

🌍 External validity: generalizability

External validity refers to how well the results of this particular study could be applied to the larger population.

  • It asks: "Can we apply these findings beyond this specific sample?"
  • Recall: the target population is the group about whom we wish to say something, using data collected from our sample.
  • A study can be internally valid (conducted correctly) but still have a sample that is not sufficiently representative of the target population.
  • If present: results generalize to the broader target population.
  • If absent: results generalize only to a narrower subpopulation similar to the sample.

⚖️ The hierarchy: internal first, external second

  • Stop rule: If a study lacks internal validity, stop. There is rarely a need to assess it further.
  • Proceed rule: If a study does seem to have internal validity, we then assess external validity.
  • This hierarchy reflects that flawed execution makes results meaningless, regardless of how representative the sample might be.

📚 Concrete example: physical activity in pregnancy

📚 Study 1: Broad cohort (good external validity)

  • Design: Large pregnancy cohort with lenient inclusion criteria—all women pregnant with a singleton fetus planning to deliver at a certain hospital were eligible.
  • Sample characteristics: Hundreds of exposures and dozens of outcomes; pregnant people were mostly sedentary, much like the general population.
  • Implication: Because the sample mirrors the general population's activity levels, results can be generalized to all pregnant women.

📚 Study 2: Active-only cohort (limited external validity)

  • Design: Advertisement mentioned studying exercise in pregnancy (rather than pregnancy in general).
  • Sample characteristics: Very few sedentary people; some reported running half marathons while pregnant—not normal for the general population.
  • Internal validity: Reasonable—the study was conducted correctly.
  • External validity: Limited—results cannot be generalized to all pregnant women, only to the subpopulation of highly active pregnant women.
  • Key point: Because it has good internal validity, results are still valid for the narrower group (highly active pregnant women), just not for everyone.

🧬 When does representativeness matter?

🧬 Biological vs behavioral research questions

The extent to which non-representative samples affect external validity depends on the research question.

| Research question type | Representativeness concern | Reason |
|------------------------|----------------------------|--------|
| Biological (e.g., "Do statins lower serum cholesterol levels?") | Low | Physiology does not usually vary greatly between people with different demographic characteristics (exception: sex differences); one person's body likely processes drugs nearly identically to most others'. |
| Behavioral (e.g., "If you prescribe statins for people with high cholesterol, will they live longer?") | High | Behavior varies greatly by demographics and social context; requires behavior on both clinician's part (providing prescription) and patient's part (filling and taking medication as directed). |
  • Don't confuse: A biological mechanism question can tolerate less representative samples, but a question involving adherence, access, or decision-making cannot.

🎯 Selection bias and validity

🎯 How selection bias affects validity

Selection bias can affect either the internal or the external validity of a study.

  • Affecting external validity: The exercise-in-pregnancy example (non-representative sample) is selection bias affecting external validity—results are generalizable only to the subset from whom the sample was actually drawn, not the entire population.
  • Not ideal, but recoverable: One can easily recover by simply narrowing the stated target population (e.g., from "all pregnant women" to "highly active pregnant women").
  • The excerpt notes that selection bias can also affect internal validity, but does not elaborate further in this section.

📊 Visual summary

📊 Internal vs external validity diagram

The excerpt references Figure 6-1, which uses a cohort diagram to illustrate the difference between internal and external validity. The same principle applies to all study designs.

  • Internal validity: concerns the correctness of what happens inside the study (design, measurement, analysis).
  • External validity: concerns the bridge from the study sample to the broader target population.
28

Selection Bias

Selection Bias

🧭 Overview

🧠 One-sentence thesis

Selection bias can undermine either external validity (limiting generalizability) or internal validity (making results fundamentally flawed), depending on whether it affects representativeness or creates incomparable study groups.

📌 Key points (3–5)

  • Two types of selection bias: one affects external validity (non-representative samples), the other affects internal validity (groups drawn from different populations).
  • External validity selection bias is recoverable—you simply narrow the target population to match who was actually studied.
  • Internal validity selection bias is much worse—the study results cannot be applied to anyone because the exposed/unexposed or diseased/nondiseased groups are not comparable.
  • Common confusion: representativeness vs. comparability—a non-representative sample may still have good internal validity if both groups come from the same underlying population.
  • Hidden sources: differential participation rates, loss to follow-up, and recruiting one group from workers and another from the general population can all introduce selection bias affecting internal validity.

🔍 Two faces of selection bias

🌍 Selection bias affecting external validity

Selection bias affecting external validity: when a sample is not representative of the underlying population, limiting generalizability.

  • What it means: Your sample doesn't reflect the full population you want to study.
  • Why it happens: You recruit from a subset (e.g., only highly active pregnant women instead of all pregnant women).
  • Impact: Results apply only to the subset you actually recruited, not the broader population.
  • Recovery: Narrow your target population to match who you studied.
  • Example: A pregnancy exercise study recruits only highly active women → results generalize only to highly active pregnant women, not all pregnant women.

Key diagnostic questions:

  • "Who did the researchers get?"
  • "Who did they miss?"

⚠️ Selection bias affecting internal validity

Selection bias affecting internal validity: when exposed and unexposed groups (cohort study) or diseased and nondiseased groups (case-control study) are not drawn from the same population.

  • What it means: Your comparison groups are fundamentally different populations, not just different exposure/disease statuses.
  • Why it's worse: The study has fundamental flaws; results cannot be applied to anyone.
  • Impact: You cannot tell whether differences in outcomes are due to the exposure or due to the groups being drawn from different populations.
  • Example: In a maternal physical activity study, the "active" group was recruited from a prenatal exercise class (people who voluntarily pay for exercise classes), but the "sedentary" group was recruited from prenatal care clinics (general pregnant population) → the groups differ in ways beyond just activity level.

Key diagnostic questions:

  • "Who did they get? Who did they miss?"
  • "Was this different between the two groups?"

🔄 Don't confuse: representativeness vs. comparability

| Concept | What it affects | Can you recover? | Example |
|---------|-----------------|------------------|---------|
| Non-representative sample | External validity | Yes (narrow your target population) | Study only highly active women → results apply only to highly active women |
| Non-comparable groups | Internal validity | No (fundamental flaw) | Active group from exercise class, sedentary group from clinics → groups differ beyond activity |
  • A study can have good internal validity even with a non-representative sample, as long as both groups come from the same underlying population.
  • The excerpt emphasizes: "This sort of selection bias [affecting external validity] is not ideal, but one can easily recover."

🕳️ Hidden sources of selection bias

📉 Differential participation and dropout

  • Participation rate differences: If one group has higher or lower participation rates, the groups may not reflect the same underlying population.
    • People who agree to participate are different from those who don't.
  • Loss to follow-up: If one group has more dropouts than the other, this creates selection bias.
    • Example: In studies of older adults, the sickest patients tend to drop out because they become too sick to attend clinic visits. If this happens more in one group, it leads to selection bias.
  • Why it matters: These differences can make groups incomparable, affecting internal validity.

💼 Healthy worker bias

Healthy worker bias: people who can work are generally healthier than the overall population because the overall population includes people who are too sick to work.

Two ways it can affect studies:

  1. External validity: Studies recruiting from workers may lack generalizability to the overall population.
    • This is acceptable as long as you're careful when applying results.
  2. Internal validity: If one group is recruited from workers and the other from the general population, the groups are not comparable.
    • Example: If Factory A has a suspected environmental toxin, the exposed group (Factory A workers) must be compared to workers from somewhere else, not spouses or neighbors (who may or may not work).
    • The key: both groups must be drawn from working populations.

🧭 When representativeness matters

The excerpt distinguishes two types of research questions:

| Question type | Representativeness concern | Why |
|---------------|----------------------------|-----|
| Biological/physiological (e.g., "Do statins lower cholesterol?") | Lower | Physiology doesn't vary much between people with different demographics (except sex) |
| Behavioral (e.g., "If you prescribe statins, will people live longer?") | Higher | Behavior varies greatly by demographics and social context; requires both clinician behavior (prescribing) and patient behavior (filling and taking medication) |
  • The excerpt notes: "my body likely processes statin drugs in a nearly-identical manner to that of most other women's."
  • But behavior-based questions require careful attention to representativeness.

🔀 Misclassification bias (brief mention)

📦 What misclassification means

Misclassification: measuring things incorrectly, such that study participants get put into the wrong box in the 2×2 table.

  • Calling someone "diseased" when they're not (or vice versa).
  • Calling someone "exposed" when they're not (or vice versa).

🏃 Example: over-reporting physical activity

  • In a pregnancy exercise study, women are classified as "exposed" if they meet activity guidelines (30 minutes of moderate activity, most days).
  • Problem: People generally over-report their physical activity levels.
  • Result: Women who actually got just under the recommended amount will over-report and be incorrectly bumped into the "exposed" group.
  • Impact: The 2×2 table will show incorrect numbers—some truly unexposed women will appear in the exposed rows.

Illustration from the excerpt:

  • What the data should look like: 200 exposed with disease, 100 exposed without disease, 300 unexposed with disease, 400 unexposed without disease.
  • What it might actually look like after misclassification: 230 exposed with disease, 140 exposed without disease, 270 unexposed with disease, 360 unexposed without disease.
  • The misclassification shifts people between rows, distorting the true relationship.
29

Misclassification Bias

Misclassification Bias

🧭 Overview

🧠 One-sentence thesis

Misclassification bias—putting study participants into the wrong exposure or disease categories—distorts estimates of association, with differential misclassification posing a fatal threat to internal validity while nondifferential misclassification is more manageable.

📌 Key points (3–5)

  • What misclassification is: measuring things incorrectly so participants are placed in the wrong boxes of the 2×2 table (wrong exposure or disease status).
  • Nondifferential vs differential: nondifferential occurs at the same rate across all groups and may still preserve correct ranking; differential occurs unequally across groups and is a fatal threat to validity.
  • Common confusion: older textbooks claim nondifferential misclassification always biases toward the null, but it can bias in either direction—you cannot know which way.
  • Sources of misclassification: self-report errors (over-reporting, social desirability), interviewer/clinician bias, recall bias, and missing data patterns.
  • Why it matters: all studies have some bias; differential misclassification renders a study useless or requires extreme caution, while nondifferential misclassification is manageable if understood.

🔍 What misclassification means

🔍 The core definition

Misclassification refers simply to measuring things incorrectly, such that study participants get put into the wrong box in the 2×2 table: we call them "diseased" when really they're not (or vice versa); we call them "exposed" when really they're not (or vice versa).

  • It is not random noise—it is systematic error (bias).
  • It affects the estimate of association by giving the wrong answer.
  • Example: In a study of exercise during pregnancy, if women over-report their physical activity, some who actually fell short of guidelines will be incorrectly classified as "exposed" (meeting guidelines).

📊 How it distorts the 2×2 table

The excerpt provides a concrete scenario:

  • Correct data (unobservable in real life):
    • E+ D+: 200, E+ D−: 100
    • E− D+: 300, E− D−: 400
    • Odds ratio = 2.67
  • Biased data (what we actually collect):
    • E+ D+: 230, E+ D−: 140
    • E− D+: 270, E− D−: 360
    • Odds ratio = 2.19
  • In this example the biased estimate happens to be closer to the null (1.0), but in general the direction of bias is unpredictable (see the sketch below).
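A minimal sketch of this arithmetic in Python; the 2×2 counts are the excerpt's, and the helper function and 10% shift are our reconstruction of the scenario:

```python
# Odds ratios before and after the nondifferential misclassification
# described above: 10% of the unexposed are misreported as exposed,
# in both the diseased and nondiseased groups.

def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a = E+D+, b = E+D-, c = E-D+, d = E-D-."""
    return (a * d) / (b * c)

# Correct (unobservable) data
a, b, c, d = 200, 100, 300, 400
print(round(odds_ratio(a, b, c, d), 2))   # 2.67

# Shift 10% of each unexposed cell into the corresponding exposed cell
shift_dis, shift_nodis = 0.10 * c, 0.10 * d   # 30 and 40 people
print(round(odds_ratio(a + shift_dis, b + shift_nodis,
                       c - shift_dis, d - shift_nodis), 2))  # 2.19
```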

🔀 Nondifferential misclassification

🔀 What makes it nondifferential

Nondifferential misclassification occurs at the same rate in both the diseased and nondiseased groups.

  • In the exercise example, 10% of unexposed were incorrectly classified as exposed in both the diseased and nondiseased groups.
  • It is a systematic error (bias), not random error—random error would go in both directions, but here over-reporting only goes one way.

📉 Does it always bias toward the null?

  • Common confusion: Some older textbooks claim nondifferential misclassification always biases toward the null (closer to 1.0 for odds ratios).
  • Reality: This is not true—it can bias in either direction.
  • Implication: You cannot know which way the bias is going or by how much.

✅ Why it's manageable

  • Even with misclassification, if everyone overestimates their exposure by the same amount, you can still rank people correctly.
  • Example: If everyone adds 30–60 minutes to their weekly exercise totals, you can still tell the least active from the most active (see the sketch below).
  • The estimate of association is almost certainly not "correct," but you can often still make valid qualitative statements (e.g., "more exercise is associated with lower risk").
  • This statement will likely remain true even if you could correct for the overestimate.
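A tiny sketch of this point, using invented weekly exercise totals:

```python
# If everyone over-reports by the same amount, individual totals are wrong
# but the ordering from least to most active is unchanged.
true_minutes = [30, 90, 150, 240]            # invented weekly totals
reported = [m + 45 for m in true_minutes]    # uniform over-report

def rank(xs):
    """Indices of xs sorted from smallest to largest value."""
    return sorted(range(len(xs)), key=lambda i: xs[i])

print(rank(true_minutes) == rank(reported))  # True: ranking preserved
```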

⚠️ Differential misclassification

⚠️ What makes it differential

With differential misclassification, some people are put into the wrong boxes in the 2×2 table, but this time it is not equally distributed across all study groups.

  • Diseased people may misreport more than nondiseased people.
  • Investigators may subconsciously classify someone as "diseased" more often if they know the person is exposed.
  • Example: If only diseased participants over-report their exposure, the misclassification rate differs between diseased and nondiseased groups.

💀 Why it's a fatal threat

  • Differential misclassification is considered a fatal threat to a study's internal validity.
  • Study authors often acknowledge measurement errors but claim they are nondifferential (and therefore less serious).
  • Don't trust blindly: Think it through yourself before citing such work.
  • The excerpt warns that differential misclassification is more common than many admit.

🗂️ Other names and sources

🗂️ Alternative names for misclassification

Misclassification goes by many names, all describing the same underlying problem:

  • Social desirability bias
  • Interviewer bias
  • Clinician bias
  • Recall bias

Regardless of the name, it boils down to people being called exposed when they're not, not exposed when they are, diseased when they're not, or not diseased when they are.

🤔 Self-reported data and misclassification

When considering self-reported data, ask two questions in order:

  1. "Can people tell me this?" If not, stop.
  2. "Will people tell me this?" If not, the data may have misclassification bias.

Examples: people generally over-report physical activity levels; in the US, people do not like to talk about money, so income questions often go unanswered.

📉 Missing data as misclassification

📉 When missing data creates bias

  • Missing data on individual variables leads to misclassification.
  • Example: If income questions go unanswered, and the kind of person who leaves it blank is different from those who answer, the data are not missing at random.
  • If data are truly missing at random (e.g., due to a quirk in page layout), the only result is a slightly smaller sample size and less power—no adverse effects.
  • Reality: Data are almost never missing at random; they are missing according to some pattern, creating bias.

🚩 Red flags for missing data

  • If an important variable is missing for more than 5% of the sample and the authors don't discuss it, be wary of the results—they are probably biased.
  • Study authors often don't mention missing data at all (the excerpt notes this is "strange but true").

🔬 Sensitivity analyses

🔬 What sensitivity analysis is

Sometimes called bias analysis, a sensitivity analysis is a set of extra analyses conducted after the main results of a study are known, with the goal of quantifying how much bias there might have been and in which direction it shifted the results.

  • Not all research questions and datasets are amenable to sensitivity analysis.
  • There is no set way of conducting it; instead, one examines all assumptions made during analysis and tests whether those assumptions drove the results.

🧪 How it works

Example from the exercise study:

  • Main analysis: Anyone meeting guidelines is "active"; all others are "sedentary."
  • Sensitivity analysis: Change the cutoff—now anyone accumulating 2+ hours per week is "active" (even though this is less than guidelines).
  • Interpretation: If the new estimate is close to the original, the choice of cutoff did not extensively affect results.
  • This does not prove the original results are correct, but it lessens the possibility that slightly different methods would yield vastly different answers (sketched below).
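A minimal sketch of this kind of cutoff check; the 2×2 counts are invented, since the excerpt does not give the study's actual numbers:

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a = E+D+, b = E+D-, c = E-D+, d = E-D-."""
    return (a * d) / (b * c)

# Main analysis: "active" = meets guidelines
print(round(odds_ratio(40, 60, 80, 70), 2))   # 0.58

# Sensitivity analysis: "active" = 2+ hours/week, so a few borderline
# women move from the unexposed rows into the exposed rows
print(round(odds_ratio(48, 71, 72, 59), 2))   # 0.55
# The two estimates are close, so the cutoff choice did not drive the results.
```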

🌐 Publication bias

🌐 What publication bias is

Publication bias arises because papers with more exciting results are more likely to get published.

  • A paper whose main finding is "there is no association between x and y" is difficult to publish.
  • There is an entire legitimate, peer-reviewed journal dedicated solely to publishing these "negative results."

🌐 How it affects the literature

  • This bias does not apply to individual studies, but to areas of the literature as a whole.
  • If papers with larger estimates of association and/or smaller p-values are more likely to be published, the entire body of literature on a topic is biased.
  • Example: When looking at whether elderly people should take prophylactic aspirin to prevent heart attacks, you only see the exciting papers—all the papers showing no effect were not published.
  • Implication: Keep this in mind whenever doing literature searches.

🎯 Practical takeaways

🎯 All studies have bias

  • All epidemiologic studies include bias.
  • Investigators can minimize biases through good design and measurement methods, but some will always remain.

🎯 Which biases are manageable vs fatal

| Type of bias | Impact | Manageability |
| --- | --- | --- |
| Differential misclassification | Fatal threat to internal validity | Renders study useless or requires extreme caution |
| Selection bias affecting one group more than another | Threatens internal validity | Renders study useless or requires extreme caution |
| Nondifferential misclassification | Distorts estimate but may preserve ranking | Manageable if limitations are understood |
| Selection bias operating on entire sample | Affects external validity only | Manageable if limitations are understood |

🎯 What to watch for

  • Missing data and the extent to which non-participation or non-compliance might have affected results should always be considered carefully.
  • When authors claim measurement errors are nondifferential and therefore irrelevant, think it through yourself before agreeing.
  • If important variables are missing for >5% of the sample without discussion, be wary.
30

Publication Bias

Publication Bias

🧭 Overview

🧠 One-sentence thesis

Publication bias distorts the overall body of literature on a topic because studies with exciting, positive results are more likely to be published than those showing no association.

📌 Key points (3–5)

  • What publication bias is: papers with more exciting results (larger associations, smaller p-values) are more likely to get published than papers showing no effect.
  • When it applies: this bias affects entire areas of literature, not individual studies—it distorts the overall picture when reviewing all research on a topic.
  • The "negative results" problem: papers finding "no association between x and y" are so difficult to publish that a dedicated peer-reviewed journal exists solely for them.
  • Common confusion: unlike selection bias or misclassification, publication bias doesn't invalidate a single study's internal validity; instead, it skews what you see when surveying all published work.
  • Why it matters: when doing literature searches, the published papers give a biased picture because studies showing no effect were never published.

📚 What publication bias means

📰 The core mechanism

Publication bias arises because papers with more exciting results are more likely to get published.

  • A study finding "there is no association between x and y" is difficult to publish.
  • Studies with larger estimates of association and/or smaller p-values have better publication chances.
  • This creates a systematic filter: exciting findings pass through, null findings do not.

🚫 The fate of negative results

  • Papers showing no effect are so hard to publish that an entire legitimate, peer-reviewed journal is dedicated solely to publishing these "negative results."
  • This illustrates how strong the bias is: the publishing system itself resists null findings.

🔍 How publication bias differs from other biases

🌐 Literature-level vs study-level

| Aspect | Publication bias | Selection bias / Misclassification |
| --- | --- | --- |
| Scope | Affects entire bodies of literature | Affects individual studies |
| What it distorts | The overall picture when reviewing a topic | A single study's internal validity |
| When you encounter it | During literature searches | During study design or analysis |
  • Publication bias does not apply to individual studies.
  • It operates at the meta level: when you try to look at the entire body of literature on a given topic.

🧩 The distorted picture

  • Example scenario: "Should elderly people take prophylactic aspirin to prevent heart attacks?"
  • When you search the literature, you only see papers that found an effect of aspirin.
  • All the papers that showed no effect of aspirin on heart attack were not published.
  • The picture you get is biased because only the exciting papers were published.

🔎 Implications for literature searches

📖 What to keep in mind

  • Whenever you are doing literature searches, remember that publication bias is operating.
  • The published literature is not a representative sample of all research conducted.
  • Studies showing no association are systematically missing from what you can find.
  • This topic is discussed further in chapter 9 (according to the excerpt).

⚠️ Practical caution

  • Don't assume that because multiple published studies show an association, the association is real.
  • The absence of published null findings doesn't mean null findings don't exist—they may simply be unpublished.
  • Be aware that your evidence base is filtered by what was exciting enough to publish.
31

Conclusion

Conclusion

🧭 Overview

🧠 One-sentence thesis

Effect modification is an interesting finding that should be reported by presenting stratum-specific measures of association, unlike confounding which we control for by reporting adjusted measures.

📌 Key points (3–5)

  • What effect modification is: when stratum-specific measures of association differ from each other and the crude measure lies between them.
  • How it differs from confounding: confounding is something to eliminate through adjustment, while effect modification is a meaningful finding to report.
  • How to detect it: conduct a stratified analysis and compare stratum-specific measures to each other and to the crude measure.
  • Common confusion: distinguishing effect modification from confounding—if stratum-specific measures are similar to each other but differ ≥10% from the crude (which does not fall between them), it's a confounder; if stratum-specific measures differ from each other and the crude lies between them, it's an effect modifier.
  • What to report: for effect modification, report the separate stratum-specific measures; for confounding, report an adjusted measure that controls for the confounder.

🔬 Three-step analysis workflow

📊 Step 1: Calculate crude measure

  • Calculate the crude measure of association while ignoring the covariable.
  • This serves as the baseline comparison for later steps.
  • Example: In a cross-sectional study of physical activity and dementia, the unadjusted odds ratio is 2.0.

🗂️ Step 2: Calculate stratum-specific measures

  • Create separate 2×2 tables for each level of the covariable.
  • Calculate a measure of association for each stratum.
  • Example: When stratifying by marital status, calculate one OR for "currently married" and another for "not currently married."

🔍 Step 3: Compare measures to classify the covariable

The comparison determines whether the covariable is a confounder, effect modifier, or neither:

| Pattern | Classification | Action |
| --- | --- | --- |
| Stratum-specific measures similar to each other AND ≥10% different from crude (crude does NOT fall between them) | Confounder | Report adjusted measure |
| Stratum-specific measures different from each other AND crude lies between them | Effect modifier | Report stratum-specific measures |
| Stratum-specific measures similar to each other AND crude lies between them AND <10% difference | Neither | Report crude measure |
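A minimal sketch of this decision rule in code. The 10% difference-from-crude threshold is the excerpt's rule of thumb; the 10% threshold for whether strata differ from each other is our illustrative choice (the excerpt notes there is no firm consensus):

```python
def classify(crude, strata):
    """Classify a covariable from crude and stratum-specific measures."""
    lo, hi = min(strata), max(strata)
    crude_between = lo <= crude <= hi
    changed_10pct = any(abs(m - crude) / crude >= 0.10 for m in strata)
    strata_differ = (hi - lo) / lo > 0.10        # illustrative threshold
    if crude_between and strata_differ:
        return "effect modifier -> report stratum-specific measures"
    if not crude_between and changed_10pct:
        return "confounder -> report adjusted measure"
    return "neither -> report crude measure"

print(classify(2.0, [3.1, 3.24]))    # confounder (dementia example)
print(classify(0.90, [0.60, 1.15]))  # effect modifier (diet trial example)
print(classify(3.5, [3.45, 3.56]))   # neither (melanoma example)
```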

🧩 Distinguishing confounding from effect modification

🎯 When it's a confounder

A covariable is a confounder when stratum-specific measures are similar to each other and at least 10% different than the crude measure, which does not fall between them.

  • The key is that the stratum-specific estimates are similar to each other but different from the crude.
  • The crude measure does not fall between the two stratum-specific measures.
  • Example: Physical activity and dementia study—crude OR = 2.0; among currently married OR = 3.1; among not currently married OR = 3.24. The stratum-specific measures (3.1 and 3.24) are similar to each other but both differ substantially from the crude (2.0), so marital status is a confounder.
  • Don't confuse: The crude falling between stratum-specific measures points toward effect modification, not confounding.

⚡ When it's an effect modifier

A covariable is an effect modifier when stratum-specific measures are different from each other and the crude measure lies between them.

  • The key is that the stratum-specific estimates are different from each other.
  • The crude measure falls between the two stratum-specific measures.
  • Example: Mediterranean diet trial for preterm birth—crude RR = 0.90; among nulliparas RR = 0.60; among multiparas RR = 1.15. The stratum-specific measures (0.60 and 1.15) differ substantially from each other, and the crude (0.90) lies between them, so parity is an effect modifier.
  • Why it matters: Effect modification reveals that the exposure has different effects in different subgroups, which is scientifically interesting and clinically relevant.

❌ When it's neither

  • Stratum-specific measures are similar to each other.
  • The crude lies between them.
  • The difference from crude is less than 10%.
  • Example: Melanoma and tanning bed study—crude OR = 3.5; among men OR = 3.45; among women OR = 3.56. The stratum-specific measures are very similar (both approximately 3.5), so gender is neither a confounder nor an effect modifier.

📝 Reporting requirements

📋 For confounding

  • Report an adjusted measure of association that controls for the confounder.
  • Do not report the crude measure as the main finding.
  • Example: In the physical activity and dementia study, report the adjusted OR of approximately 3.18, not the crude OR of 2.0.

📋 For effect modification

  • Report the stratum-specific measures of association separately.
  • Do not combine them into a single adjusted measure.
  • Example: In the Mediterranean diet trial, report RR = 0.60 for nulliparas and RR = 1.15 for multiparas separately, showing the exposure has opposite effects in the two groups.

📋 For neither

  • Report the crude estimate of association.
  • No adjustment or stratification is needed.
  • Example: In the melanoma study, report the crude OR of 3.5.

🔄 Special case: Same variable as both

🔄 Can a variable be both a confounder and effect modifier?

  • Yes, theoretically, though rarely seen in practice.
  • Usually occurs when a continuous covariable is dichotomized for stratified analysis.
  • Example: Age dichotomized as "older than 50" versus "50 or younger" may miss important nuances—51-year-olds are not like 70-year-olds.
  • This can result in:
    • Further effect modification with more categories (which would reduce statistical power).
    • "Residual" confounding that persists after adjustment.
  • Don't confuse: This is an edge case; in most analyses, a covariable acts as either a confounder or an effect modifier, not both.

🎯 Philosophical difference

🎯 How to think about confounding vs effect modification

  • Confounding: An unwanted distortion whose effects we want to eliminate through adjustment.
    • Goal: Remove the bias to get a single, correct measure of association.
  • Effect modification: An interesting finding in and of itself that we want to report.
    • Goal: Reveal how the exposure's effect varies across subgroups.
  • This fundamental difference drives all downstream decisions about analysis and reporting.
32

Criteria for Confounders

Criteria for Confounders

🧭 Overview

🧠 One-sentence thesis

A variable can be a potential confounder only if it meets three specific criteria: it must be statistically associated with the exposure, must cause the outcome, and must not lie on the causal pathway between exposure and outcome.

📌 Key points (3–5)

  • Three mandatory criteria: A potential confounder must (1) be associated with the exposure, (2) cause the outcome, and (3) not be on the causal pathway.
  • Association vs. causation for exposure: The confounder only needs to be statistically associated with (disproportionately distributed across) the exposure—it does not need to cause it.
  • Causation required for outcome: The confounder must have a causal link to the outcome, not merely be associated with it.
  • Common confusion: Variables on the causal pathway (mediators) are not confounders—if the exposure causes the variable which then causes the outcome, it fails criterion #3.
  • Why it matters: Reporting a confounded measure of association produces incorrect results; controlling for confounding is essential for valid study conclusions.

🔍 The three criteria explained

🔍 Criterion #1: Associated with the exposure

Association is a statistical term that does not necessarily imply a causal relationship; it means the confounding variable is more common in the exposed group than the unexposed group (or vice versa).

  • What "associated" means: The potential confounder is disproportionately distributed between exposed and unexposed groups.
  • No causation required: The confounder does not need to cause or prevent the exposure—just be unevenly distributed.
  • Example from the excerpt: In the foot size/reading ability study, grade level is disproportionately distributed across foot sizes—higher grades are more likely to have bigger feet.
  • Can there be causation?: Yes, the confounder can cause the exposure (but not vice versa—see criterion #3), but this is not necessary.

🎯 Criterion #2: Causes the outcome

  • Causation is required: There must be a causal link between the confounder and the outcome.
  • Standard of proof: It does not have to be proven causation, just "reasonably possible that this exposure causes (or prevents) that outcome."
  • Direction matters: The confounder must cause the outcome, not the other way around.
  • Example from the excerpt: Grade level (confounder) causes faster reading speed (outcome).

🔄 When causal direction is unclear

  • The problem: Sometimes we cannot determine whether the disease causes the confounder or the confounder causes the disease.
  • Example given: Excessive weight loss and illness—rapid weight loss can cause illness, but illness can also cause weight loss.
  • Practical solution:
    • Assume the arrow goes one direction and analyze accordingly (include or exclude the potential confounder).
    • Then assume the arrow goes the other direction and analyze again.
    • If both analyses produce similar results, arrow direction does not matter.
    • If results differ substantially, report both and let readers decide which applies to their situation.

🚫 Criterion #3: Not on the causal pathway

  • What this means: The variable must not be caused by the exposure and then cause the outcome.
  • Why it matters: Variables on the causal pathway are mediators, not confounders.
  • Example from the excerpt:
    • Exposure: amount of sleep
    • Potential confounder: alertness in class
    • Outcome: test scores
    • "Alertness in class" is not a confounder because it is caused by the amount of sleep—it is on the causal pathway.

Don't confuse: Mediators (on the pathway) vs. confounders (not on the pathway).

📊 Understanding confounding through examples

📊 The foot size and reading ability example

The excerpt uses a cross-sectional study of elementary school children measuring foot size (inches) and reading speed (words per minute).

Initial finding (confounded):

  • Children with feet ≥8.25″ are 28.8 times as likely to read ≥100 words per minute compared to children with shorter feet.
  • This appears to be a huge effect.

The confounder: Grade level

  • Kids in higher grades have bigger feet (because they are older).
  • Kids in higher grades are also faster readers.
  • Grade level meets all three criteria:
    1. Associated with exposure (foot size): higher grades → bigger feet
    2. Causes outcome (reading speed): higher grades → faster reading
    3. Not on causal pathway: foot size does not cause grade level

🧮 What happens when you control for the confounder

When the analysis is restricted to third graders only:

  • The odds ratio becomes 1.0 (no association).
  • This is the correct measure—foot size has nothing to do with reading speed.
  • The original OR of 28.8 was wrong; it was confounded by grade level.

Key insight: Removing the confounder's influence reveals the true (lack of) association between exposure and outcome.
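A minimal numeric sketch of the same phenomenon, with invented counts chosen so that each grade's OR is exactly 1.0 while the pooled OR is large:

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a = E+D+, b = E+D-, c = E-D+, d = E-D-."""
    return (a * d) / (b * c)

# Invented counts: within each grade, foot size and reading speed are unrelated
grade1 = (2, 18, 10, 90)   # big feet rare, fast readers rare   -> OR = 1.0
grade5 = (90, 10, 18, 2)   # big feet common, fast readers common -> OR = 1.0
pooled = tuple(x + y for x, y in zip(grade1, grade5))

print(odds_ratio(*grade1), odds_ratio(*grade5))  # 1.0 1.0
print(round(odds_ratio(*pooled), 1))             # 10.8: a spurious pooled OR
```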

🛠️ Methods of confounder control

🛠️ Design-phase control methods

The excerpt describes three approaches during study design:

| Method | How it works | Effect on criteria |
| --- | --- | --- |
| Restriction | Limit sample to one level of confounder (e.g., third graders only) | Fails criterion #1: confounder no longer disproportionately distributed |
| Matching | Recruit controls with same confounder level as cases (e.g., match maternal age) | Negates criterion #2: forces equal confounder distribution between groups |
| Randomization | Randomly assign participants to exposure | Fails criterion #1: ensures equal confounder distribution between exposed/unexposed |

🔬 Restriction in detail

  • How it works: Limit the study to only one level of the confounder.
  • Example: Study only third graders, so grade level cannot vary between exposed and unexposed.
  • Result: The potentially confounding variable no longer meets criterion #1.
  • Limitation: Often not realistic because it limits the study too much and reduces generalizability.
  • Example of limitation: "What are predictors of breast cancer death among 62-year-old women?" would be less useful than studying all postmenopausal women.

🎲 Matching in detail

  • Common use: Often used in case-control studies.
  • How it works: For each case, recruit a control with the same level of the confounder.
  • Example from excerpt: Studying birth defects (outcome) and maternal smoking (exposure) with maternal age as potential confounder—match each 30-year-old case with a 30-year-old control.
  • Effect: Forces equal confounder distribution between cases and controls, negating the confounder's effect on the exposure/outcome association.

📈 Analysis-phase control methods

Two main options mentioned:

  1. Stratifying: Create a separate 2×2 table for each level of the potential confounder.
  2. Regression: A special case of stratifying.

Note: The excerpt begins to discuss stratification with an oral contraceptive pill/ovarian cancer example but the text cuts off.

⚠️ Why confounding matters

⚠️ The danger of confounded results

A confounder is a third variable—not the exposure, and not the outcome—that biases the measure of association calculated for the particular exposure/outcome pair.

  • Research principle: Never report a measure of association that is confounded.
  • Consequence of ignoring confounding: Reporting an association that is not really true.
  • Example: Reporting the OR of 28.8 for foot size and reading ability without accounting for grade level would be incorrect—the association is confounded, not real.

🎯 Inherent variability note

The excerpt includes an important clarification about group-level patterns:

  • Not all individuals in a group (e.g., third graders) are identical.
  • However, groups selected on some characteristic are more similar to each other than to people in other groups.
  • Epidemiology works because of both individual variation and group-level similarities.
  • Example: Third graders in general have bigger feet and read better than first graders, and have smaller feet and read more slowly than fifth graders.
33

Confounding: Definition and Control

Confounding: Definition

🧭 Overview

🧠 One-sentence thesis

Confounding produces biased measures of association between exposure and outcome, but can be controlled through study design (restriction, matching, randomization) or analysis (stratification, regression) to yield accurate adjusted estimates.

📌 Key points (3–5)

  • Three criteria for a confounder: must be associated with the exposure, must cause the outcome, and must not be on the causal pathway between exposure and outcome.
  • Design-phase control methods: restriction limits the study population, matching pairs participants by confounder level, and randomization breaks the confounder–exposure link.
  • Analysis-phase control methods: stratification creates separate 2×2 tables for each confounder level; regression is a special case that accounts for all possible strata.
  • Common confusion: crude vs. adjusted measures—crude measures ignore confounders and may be wrong; adjusted measures control for confounders and reflect the true association.
  • 10% rule: if crude and adjusted measures differ by more than 10%, the variable is acting as a confounder and the adjusted measure should be reported.

🎯 What makes a variable a confounder

🎯 The three criteria

A variable qualifies as a potential confounder only if it meets all three conditions:

  1. Associated with the exposure – the confounder's distribution differs between exposed and unexposed groups
  2. Causes the outcome – the confounder independently increases or decreases risk of the outcome
  3. Not on the causal pathway – the exposure does not cause the confounder

Variables that meet the confounder criteria are potential confounders. They may or may not actually produce a biased estimate of association; we figure this out during the analysis.

🚬 Example: Smoking as a confounder in OCP/ovarian cancer study

Criterion 1 (associated with exposure):

  • Smoking and oral contraceptives both increase risk of deep venous thrombosis
  • Smoking is a contraindication to OCP use
  • Clinicians prescribe other birth control to smokers
  • Result: disproportionate distribution of smokers between OCP users and non-users

Criterion 2 (causes outcome):

  • Smoking causes lung cancer and has been associated with other cancers
  • Reasonable to suspect it might cause ovarian cancer

Criterion 3 (not on causal pathway):

  • Taking birth control pills would not cause a woman to start smoking
  • The causal direction is clear

⚠️ Don't confuse: Causal pathway vs. confounding

When it's difficult to know which variable causes which, analyze the data both ways (treating each variable as the confounder in turn).

🛠️ Controlling confounding in study design

🔒 Restriction

Restriction: limiting the study population to people with the same level of the confounder.

  • How it works: If everyone has the same confounder value, the confounder cannot create bias
  • Example: Studying only 62-year-old women eliminates age as a confounder
  • Limitation: Often not realistic because it limits generalizability too much
  • Trade-off: "What are predictors of breast cancer death among 62-year-old women?" is less useful than studying all postmenopausal women

🤝 Matching

Matching: recruiting controls with the same confounder level as each case (used in case-control studies).

  • How it works: Forces the confounder distribution to be the same between cases and controls
  • Effect: Negates criterion #2 (the confounder still causes the outcome, but its distribution is now equal in both groups)
  • Example: For a 30-year-old case with a birth defect, recruit a 30-year-old control; maternal age can no longer confound the smoking/birth defect association
  • Result: The confounder cannot affect the exposure/outcome measure of association

🎲 Randomization

Randomization: randomly assigning participants to exposure groups.

  • How it works: Forces the confounder to fail criterion #1
  • Effect: Ensures equal distribution of the confounder between exposed and unexposed groups
  • Result: The link between confounder and exposure is broken, just as with restriction

📊 Controlling confounding in analysis

📊 Stratification: Creating separate tables

Stratification: making a different 2×2 table for each level of the potential confounder.

Process:

  1. Create one 2×2 table (exposure × outcome) for each confounder level
  2. Calculate the measure of association (OR, RR) for each stratum
  3. Compare stratum-specific measures to the crude measure

Example structure:

  • All participants appear in exactly one table, depending on their confounder status
  • Smokers get their own OCP × ovarian cancer table
  • Nonsmokers get their own OCP × ovarian cancer table

🔍 Interpreting stratified results

| Measure | Value | Interpretation |
| --- | --- | --- |
| Crude OR | 1.0 | OCP use not associated with ovarian cancer (unadjusted, ignores smoking) |
| OR for smokers | 0.44 | Among smokers, OCP users 0.44 times as likely to have ovarian cancer |
| OR for nonsmokers | 0.44 | Among nonsmokers, OCP users 0.44 times as likely to have ovarian cancer |

Key insight:

  • Stratum-specific ORs (0.44, 0.44) are similar to each other but different from crude OR (1.0)
  • This pattern indicates smoking is acting as a confounder
  • The crude OR was wrong; it was confounded by smoking

📏 The "similar" and "different" rules

For "different" (crude vs. adjusted):

  • Standard criterion: more than 10% different
  • If crude and adjusted ORs differ by >10%, most epidemiologists consider this evidence of confounding

For "similar" (between strata):

  • No firm consensus
  • Perhaps within 2–3% of each other
  • Importantly: the crude value does not fall between the stratum-specific values

🧮 Calculating adjusted measures

Mantel-Haenszel odds ratio:

  • A weighted average of the stratum-specific odds ratios
  • Each stratum contributes according to its size
  • Result is called the adjusted OR because it controls for confounding (a worked sketch follows).
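A worked sketch of the Mantel-Haenszel calculation, using invented stratum counts (the excerpt reports only the stratum-specific ORs, not the underlying tables):

```python
# OR_MH = sum(a_i * d_i / n_i) / sum(b_i * c_i / n_i), summed over strata i
def mantel_haenszel_or(strata):
    """strata: list of (a, b, c, d) 2x2 tables, one per confounder level."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

smokers    = (20, 50, 40, 45)   # invented: OR = (20*45)/(50*40) = 0.45
nonsmokers = (30, 80, 60, 70)   # invented: OR = (30*70)/(80*60) = 0.44
print(round(mantel_haenszel_or([smokers, nonsmokers]), 2))  # ~0.44
```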

Regression:

Regression: a special case of stratified analysis that accounts for all possible strata.

  • Creates a 2×2 table for every possible value of the confounder
  • Example: for continuous smoking data (months smoked), makes separate tables for 0 months, 1 month, 2 months, etc.
  • Calculates a weighted average across all tables (like a mega–Mantel-Haenszel)
  • Result: adjusted odds ratio (case-control), adjusted risk ratio, or adjusted rate ratio (cohort/RCT)

📝 Interpreting adjusted measures

All three phrasings are acceptable:

  1. "Women who have ovarian cancer are 0.44 times as likely to report a history of OCP use compared to women without ovarian cancer, controlling for smoking."
  2. "...compared to women without ovarian cancer, adjusting for smoking."
  3. "...compared to women without ovarian cancer, holding smoking constant."

The key: make it clear that the measure of association has already dealt with the confounding.

⚠️ Practical considerations

⚠️ Categorizing continuous variables

The problem:

  • Stratified analysis by hand requires categories (strata)
  • Continuous variables (age, height) must be divided into 2–3 categories
  • Example: height categories might be <5'2", 5'2"–6'0", >6'0"

Residual confounding:

  • Within each category, considerable variability remains (5'2" is 10" shorter than 6'0")
  • Some confounding is removed, but some is left over
  • This is why epidemiologists usually jump straight to regression, which can keep continuous variables continuous

🎯 Choosing which confounders to control

Step 1: Make a list of potential confounders

  1. List all variables that might cause your outcome
  2. Check which are associated with the exposure
  3. Verify they are not on the causal pathway (exposure does not cause the confounder)

Step 2: Decide which to include in analysis

  • Drop confounders that do not meet the 10% change criterion
  • Additional nuances exist beyond this book's scope
  • Prominent epidemiologists differ in their opinions

When reading the literature:

  • Did the authors consider all potential confounders you thought of?
  • If an obvious potential confounder is missing, did they explain why?
  • If not, the article may not be the most valid

🔍 Don't confuse: Crude vs. adjusted measures

Crude (unadjusted) measure of association: takes into account only the exposure and the outcome; has not yet accounted, adjusted, or controlled for any confounders.

Adjusted measure of association: controls for one or more confounders via stratification or regression.

  • Crude measures may be wrong in unpredictable ways if confounding is present
  • Adjusted measures reflect the true association after removing confounding bias
  • Always report adjusted measures when confounding is detected (>10% difference)
34

Methods of Confounder Control

Methods of Confounder Control

🧭 Overview

🧠 One-sentence thesis

Epidemiologists control confounding through design methods (restriction, matching, randomization) and analysis methods (stratification, regression), with stratified and regression analyses revealing whether a crude measure of association is biased by a confounder.

📌 Key points (3–5)

  • Design-phase control: restriction, matching, and randomization each break one of the confounder criteria to prevent confounding before data analysis.
  • Analysis-phase control: stratification and regression adjust for confounders by calculating stratum-specific measures and weighted averages.
  • How to detect confounding: when stratum-specific measures are similar to each other but different from the crude measure (typically >10% different), confounding is present.
  • Common confusion: crude vs. adjusted measures—crude measures ignore confounders; adjusted measures control for them and represent the "real" association.
  • Choosing confounders: list variables that cause the outcome, check if they're associated with the exposure, and confirm they're not on the causal pathway.

🛠️ Design-phase control methods

🚧 Restriction

Restriction: limiting the study to a narrow range of a confounder variable.

  • Works by making the confounder distribution identical across exposure groups.
  • Example: studying only 62-year-old women eliminates age as a confounder because everyone has the same age.
  • Limitation: often not realistic because it limits generalizability too much.
  • The excerpt notes that restricting by age would make a breast cancer study "much less useful because we wouldn't necessarily be able to generalize the results to women of other ages."

🔗 Matching

  • Often used in case-control studies.
  • Has "much the same effect as restriction" but works differently.
  • How it works: for each case, recruit a control with the same value of the confounder.
  • Example: if studying birth defects (outcome) and maternal smoking (exposure) with maternal age as a confounder, match each 30-year-old case with a 30-year-old control.
  • Mechanism: forces the confounder distribution to be the same between cases and controls, which negates criterion #2 (the confounder still causes the outcome, but its equal distribution can no longer bias the exposure/outcome association).
  • The confounder still causes the outcome, but the forced equal distribution "negated the possible effect of the confounder on the exposure/outcome measure of association."

🎲 Randomization

  • How it works: randomly assigns participants to exposure groups.
  • Mechanism: forces the confounder to fail criterion #1—ensures equal distribution of the confounder between exposed and unexposed groups.
  • The link between the confounder and the exposure is now missing, just as with restriction.
  • The excerpt refers to chapter 9 for more details.

📊 Analysis-phase control methods

📐 Stratification basics

To stratify: make a different 2×2 table for each level of the potential confounder.

  • Instead of one overall table, create separate tables for each stratum (level) of the confounder.
  • Example: if smoking is the potential confounder, make one 2×2 table for smokers and another for nonsmokers.
  • All participants remain in the analysis—they're just divided into separate tables based on the confounder.

🔢 Calculating stratum-specific measures

  • After creating stratified tables, calculate the measure of association (OR, RR, etc.) for each stratum.
  • These are called stratum-specific measures.
  • Example from the excerpt: OR for smokers = 0.44; OR for nonsmokers = 0.44.
  • Interpretation requirement: must specify which stratum the measure applies to, either at the beginning or end of the interpretation.
    • "Among smokers, women who have ovarian cancer are 0.44 times as likely to report OCP use..."
    • "Women who have ovarian cancer are 0.44 times as likely to report OCP use... among smokers only."

🎯 Detecting confounding through stratification

When confounding is present:

  • Stratum-specific measures are similar to each other.
  • Stratum-specific measures are different from the crude (unadjusted) measure.

| Measure type | OCP/ovarian cancer example | Interpretation |
| --- | --- | --- |
| Crude OR | 1.0 | No association (but wrong due to confounding) |
| OR for smokers | 0.44 | Protective association |
| OR for nonsmokers | 0.44 | Protective association |
| Conclusion | Smoking is a confounder | The crude OR was biased; the "real" OR is 0.44 |

Quantitative thresholds:

  • "Different" from crude: most epidemiologists use >10% difference as evidence of confounding.
  • "Similar" to each other: perhaps within 2–3% of each other (no firm consensus).
  • Importantly, the crude value should not fall between the stratum-specific values.

📏 Adjusted measures

  • The most common way to report results when confounding is present.
  • Represents the association after controlling for the confounder.
  • Don't confuse: crude/unadjusted measures (exposure and outcome only) vs. adjusted measures (account for confounders).

Two main calculation methods:

  1. Mantel-Haenszel method: a weighted average of the stratum-specific odds ratios, with each stratum being weighted appropriately.
  2. Regression: a special case of stratified analysis that accounts for all possible strata.

🔄 Regression as extended stratification

  • Regression is "just a special case of a stratified analysis—specifically, it accounts for all possible strata."
  • Example: if smoking is measured as total months over a lifetime, regression creates a 2×2 table for nonsmokers, then for 1 month of smoking, then for 2 months, and so on.
  • The model calculates a weighted average of all these tables (like a "mega–Mantel-Haenszel").
  • Result is also called the adjusted odds ratio (or adjusted risk ratio/rate ratio for cohort studies).
  • Most studies in the literature use regression rather than manual stratification.

⚠️ Residual confounding

  • Occurs when categorizing continuous variables creates strata that are still too heterogeneous.
  • Example: if height is the confounder and you create categories like "5′2″–6′0″", there's still 10″ of variability within that stratum.
  • This means "we have removed some of the confounding by height but there is still some confounding left."
  • Regression handles continuous variables better by keeping them continuous, avoiding this problem.

🗣️ Interpreting adjusted measures

📝 Standard interpretation phrases

When presenting an adjusted measure, you must indicate that confounding has been controlled. The excerpt provides three equivalent phrasings:

  1. "...controlling for smoking."
  2. "...adjusting for smoking."
  3. "...holding smoking constant."

Example interpretation (adjusted OR = 0.44 for OCP/ovarian cancer):

  • "Women who have ovarian cancer are 0.44 times as likely to report a history of OCP use compared to women without ovarian cancer, controlling for smoking."

Key point: It doesn't matter which phrase you choose—the important thing is making it clear that the confounding has been dealt with.

🎯 Choosing which confounders to control

📋 Three-step process

The excerpt describes a systematic approach to identifying potential confounders:

  1. List variables that might cause the outcome: start with all possible causes of the outcome.
  2. Check association with exposure: from that list, keep only variables that are associated with the exposure.
  3. Verify not on causal pathway: make sure the exposure is not causing the confounder.

🔍 Working through the criteria

The excerpt illustrates this with smoking as a potential confounder in the OCP/ovarian cancer study:

Criterion #1: Associated with exposure?

  • Yes. Both OCPs and smoking increase risk of deep venous thrombosis.
  • Smoking is a contraindication to OCP use, so clinicians prescribe other birth control to smokers.
  • This creates "a disproportionate distribution of smokers (the confounder) between women who do and do not use oral contraceptives (the exposure)."

Criterion #2: Causes the outcome?

  • Possibly. While smoking clearly causes lung cancer, it "has also been associated with other cancers often enough that it is reasonable to suspect that it might cause ovarian cancer too."

Criterion #3: Not on causal pathway?

  • Yes. "It seems highly unlikely that taking birth control pills would in turn cause a woman to take up smoking."

⚠️ Potential vs. actual confounders

  • Variables that meet the confounder criteria are potential confounders.
  • They may or may not actually produce a biased estimate of association.
  • "We figure this out during the analysis" by comparing crude and stratum-specific measures.

🤔 Ambiguous cases

  • The excerpt notes "there are many instances where it is difficult to know which is causing which."
  • The excerpt cuts off at this point, but the implication is that judgment calls are sometimes necessary.
35

Choosing Confounders

Choosing Confounders

🧭 Overview

🧠 One-sentence thesis

When analyzing exposure–disease relationships, you must systematically identify all potential confounders by checking three criteria—association with exposure, causation of outcome, and not being on the causal pathway—then control for those that meaningfully change your measure of association.

📌 Key points (3–5)

  • Three-step identification process: list variables that cause the outcome, verify they associate with the exposure, and confirm they are not on the causal pathway between exposure and outcome.
  • The 10% change rule: a potential confounder should be controlled if the crude and adjusted measures of association differ by more than 10%.
  • Common confusion: when it's unclear which variable causes which (exposure causing confounder vs. confounder causing exposure), conduct the analysis both ways.
  • Critical reading skill: when reviewing published studies, check whether authors considered all plausible confounders and explicitly justified any omissions.
  • Expert disagreement exists: prominent epidemiologists differ on the final selection of confounders to control for, though the basic criteria remain standard.

🔍 Identifying potential confounders

🔍 The three mandatory criteria

To qualify as a potential confounder, a variable must meet all three conditions:

  1. Associated with the exposure (statistical relationship)
  2. Causes the outcome (not just correlated—must be causal)
  3. Not on the causal pathway (the exposure does not cause the confounder)

A confounder is a variable—not the exposure and not the outcome—that affects the data in undesirable and unpredictable ways.

📝 Step-by-step screening process

The excerpt recommends this workflow:

  1. First pass: List all variables that might cause your outcome.
  2. Second pass: From that list, keep only those associated with the exposure.
  3. Third pass: Remove any that are on the causal pathway (i.e., the exposure causes them).

Why this order matters: Starting with outcome causes ensures you don't waste time on variables that can't confound; then checking exposure association narrows the list; finally, pathway checking prevents controlling for mediators.

⚠️ Ambiguous causation

  • Sometimes it's difficult to know which variable causes which.
  • Don't confuse: A variable on the causal pathway (mediator) vs. a true confounder—controlling for a mediator can block the effect you're trying to measure.
  • Practical solution: When causation direction is unclear, the excerpt advises conducting the analysis both ways and comparing results.

Example: If you're unsure whether the exposure causes Variable X or Variable X causes the exposure, run one analysis treating X as a confounder and another excluding it.

🎯 Deciding which confounders to control

🎯 The 10% change criterion

After identifying potential confounders, you must decide which to actually control for in your analysis.

  • Rule of thumb: Drop confounders that do not produce at least a 10% change between the crude and adjusted measures of association.
  • How to apply: Calculate both the crude measure (unadjusted) and the adjusted measure (controlling for the confounder); if they differ by ≥10%, that variable is a true confounder and you should report the adjusted estimate.

| Scenario | Crude OR | Adjusted OR | % Change | Decision |
| --- | --- | --- | --- | --- |
| Meaningful confounding | 0.50 | 0.44 | 12% | Control for it; report adjusted |
| Negligible confounding | 0.50 | 0.49 | 2% | May drop from model |
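The percent-change check is one line of arithmetic; a minimal sketch reproducing the table above:

```python
def pct_change(crude, adjusted):
    """Relative change between crude and adjusted measures of association."""
    return abs(crude - adjusted) / crude

for crude, adjusted in [(0.50, 0.44), (0.50, 0.49)]:
    decision = ("confounder: report adjusted"
                if pct_change(crude, adjusted) > 0.10
                else "negligible: may drop")
    print(f"{crude} -> {adjusted}: {pct_change(crude, adjusted):.0%} ({decision})")
# 0.50 -> 0.44: 12% (confounder); 0.50 -> 0.49: 2% (negligible)
```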

🧩 Multiple confounders at once

  • Regression methods allow you to control for many confounders simultaneously.
  • The excerpt notes that after applying the 10% rule, there are "additional nuances" beyond the scope of the text.
  • Expert disagreement: Prominent epidemiologists differ in their opinions on the final confounder selection strategy, so there is no single "correct" algorithm.

🎓 Practical skill for students

  • Beginning students won't conduct complex analyses themselves.
  • Key competency: Being able to think through an exposure/disease relationship and generate a complete list of potential confounders.
  • Critical appraisal: When reading published studies, ask:
    • Did the authors consider all variables that meet the three criteria?
    • If an obvious potential confounder is missing, did they explain why?
    • If a plausible confounder is absent without justification, the study's validity is questionable.

Example: Reading a study on oral contraceptive use and ovarian cancer—you should check whether the authors controlled for smoking, age, family history, and other plausible confounders, and whether they justified any exclusions.

📊 Interpreting adjusted results

📊 Reporting language

When you present an adjusted measure of association, you must clearly indicate that confounding has been addressed. The excerpt gives three equivalent phrasings (using the OCP/ovarian cancer example with adjusted odds ratio = 0.44):

  • "...controlling for smoking"
  • "...adjusting for smoking"
  • "...holding smoking constant"

Why this matters: Readers need to know the measure accounts for confounding; without this phrase, they might assume it's the crude (potentially biased) estimate.

📊 What "adjusted" means

  • The adjusted odds ratio (or adjusted risk ratio/rate ratio for cohort studies and RCTs) is calculated after controlling for confounders.
  • Methods: Either stratification (e.g., Mantel-Haenszel weighted average across strata) or regression modeling.
  • Both approaches produce a measure that removes the distorting effect of the confounder.

Example: If you stratify by smoking status, you create separate 2×2 tables for each smoking category (never smoked, smoked 1 month, smoked 2 months, etc.), then calculate a weighted average—this is the adjusted measure.

⚠️ Why confounding matters

⚠️ The problem with confounded data

In data that are confounded, one will calculate the wrong measure of association (and it is impossible to know in which direction one is wrong).

  • Confounders distort your results in unpredictable ways—you can't tell if your estimate is too high or too low.
  • This leads to inaccurate conclusions unless you control for the confounder.
  • Don't confuse: A confounder with an effect modifier—confounders bias the overall association; effect modifiers change the association's strength across subgroups (stratified analysis may be appropriate for the latter, but for different reasons).

⚠️ Control strategies

The excerpt mentions that confounders can be addressed at two stages:

| Stage | Methods |
| --- | --- |
| Study design | Restriction, matching, or randomization |
| Data analysis | Stratification or regression → produces adjusted measure |
  • During analysis: If crude and adjusted estimates differ by >10%, report the adjusted estimate because it controls for confounding.
  • The excerpt focuses on the analysis stage (choosing which confounders to adjust for), assuming design-stage decisions have already been made.
36

Screening Test Accuracy and Predictive Values

Summary

🧭 Overview

🧠 One-sentence thesis

A screening test's ability to correctly identify disease depends not only on its fixed sensitivity and specificity but also on the prevalence of the disease in the population being tested, which directly determines how many positive results are true versus false.

📌 Key points (3–5)

  • Four test characteristics: sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) quantify test accuracy.
  • Fixed vs. variable metrics: sensitivity and specificity are fixed test properties; PPV and NPV change with disease prevalence in the tested population.
  • Prevalence drives predictive values: in low-prevalence populations, even tests with high specificity produce many false positives, so most positive results are wrong.
  • Common confusion: PPV from one study cannot be applied to a different population with different prevalence—you must recalculate using the same sensitivity/specificity but new prevalence.
  • Clinical implication: screening decisions (like mammography for 40-year-olds) depend on whether the target group has high or low underlying disease prevalence.

🧮 The four test accuracy measures

🎯 Sensitivity

Sensitivity: the proportion of people who truly have the disease who test positive.

  • Formula (in words): true positives divided by total diseased (true positives plus false negatives).
  • From the anemia example: 101 divided by 119 equals 84.9%.
  • What it tells you: how good the test is at catching disease when disease is present.
  • A sensitivity of 84.9% means the test misses about 15% of anemia cases.

🎯 Specificity

Specificity: the proportion of people who truly do not have the disease who test negative.

  • Formula: true negatives divided by total non-diseased (true negatives plus false positives).
  • From the anemia example: 866 divided by 881 equals 98.3%.
  • What it tells you: how good the test is at ruling out disease when disease is absent.
  • A specificity of 98.3% means the test incorrectly flags about 1.7% of healthy people.

🔮 Positive Predictive Value (PPV)

Positive Predictive Value: the proportion of people who test positive who truly have the disease.

  • Formula: true positives divided by total test positives (true positives plus false positives).
  • From the anemia example (11.9% prevalence): 101 divided by 116 equals 87.0%.
  • What it tells you: if someone tests positive, what is the chance they actually have the disease?
  • Key point: PPV changes with prevalence—the same test has different PPV in different populations.

🔮 Negative Predictive Value (NPV)

Negative Predictive Value: the proportion of people who test negative who truly do not have the disease.

  • Formula: true negatives divided by total test negatives (true negatives plus false negatives).
  • From the anemia example: 866 divided by 884 equals 98.0%.
  • What it tells you: if someone tests negative, what is the chance they are truly disease-free?
  • NPV also changes with prevalence, though usually less dramatically than PPV (all four measures are computed in the sketch below).
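A minimal sketch computing all four measures from the anemia example's 2×2 counts; TP = 101 and TN = 866 are given above, and FN = 18 and FP = 15 are inferred from the totals of 119 diseased and 881 non-diseased:

```python
TP, FN, FP, TN = 101, 18, 15, 866

print(round(TP / (TP + FN), 3))  # sensitivity ~ 0.849
print(round(TN / (TN + FP), 3))  # specificity ~ 0.983
print(round(TP / (TP + FP), 3))  # PPV ~ 0.871 (87.0% in the excerpt's rounding)
print(round(TN / (TN + FN), 3))  # NPV ~ 0.98
```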

🔄 How prevalence changes predictive values

📉 Why low prevalence collapses PPV

  • When disease is rare, most of the population is healthy.
  • Even a highly specific test (98.3%) will produce some false positives among the large healthy group.
  • Because true positives are rare (few diseased people), false positives can outnumber them.
  • Result: most positive tests are false alarms.

🧪 Worked example: anemia test in two populations

The excerpt walks through recalculating PPV when prevalence drops from 11.9% to 1%:

| Population | Prevalence | Sensitivity | Specificity | PPV | Interpretation |
| --- | --- | --- | --- | --- | --- |
| Original study | 11.9% | 84.9% | 98.3% | 87.0% | Most positives are true |
| Adolescent males | 1% | 84.9% (same) | 98.3% (same) | 33.5% | Most positives are false |

Step-by-step recalculation (for 1,000 people at 1% prevalence):

  1. 10 people have disease (1% of 1,000).
  2. True positives: 84.9% of 10 = 8.49.
  3. False negatives: 10 minus 8.49 = 1.51.
  4. True negatives: 98.3% of 990 = 973.17.
  5. False positives: 990 minus 973.17 = 16.83.
  6. New PPV: 8.49 divided by (8.49 plus 16.83) = 33.5% (reproduced in the sketch below).
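A minimal sketch of the same recalculation, holding sensitivity and specificity fixed and varying only prevalence:

```python
def predictive_values(sens, spec, prevalence, n=1000):
    """Return (PPV, NPV) for a test applied to a population of size n."""
    diseased = n * prevalence
    healthy = n - diseased
    tp, fn = sens * diseased, (1 - sens) * diseased
    tn, fp = spec * healthy, (1 - spec) * healthy
    return tp / (tp + fp), tn / (tn + fn)

for prev in (0.119, 0.01):
    ppv, npv = predictive_values(0.849, 0.983, prev)
    print(f"prevalence {prev:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
# prevalence 11.9%: PPV ~ 87%, NPV ~ 98%
# prevalence 1.0%:  PPV ~ 33.5%, NPV ~ 99.8%
```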

Don't confuse: The test itself did not change (sensitivity and specificity are fixed); only the population changed, so the predictive values changed.
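
The same recalculation can be scripted so it works for any prevalence. A small sketch (the function name and the 1,000-person convenience total are mine; the sensitivity and specificity are the fixed values from the anemia example):

```python
def predictive_values(sensitivity, specificity, prevalence, n=1000):
    """Recompute PPV and NPV for a population with a given prevalence."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = sensitivity * diseased   # true positives
    fn = diseased - tp            # false negatives
    tn = specificity * healthy    # true negatives
    fp = healthy - tn             # false positives
    return tp / (tp + fp), tn / (tn + fn)

# Original study population (11.9% prevalence) vs. adolescent males (1%)
for prev in (0.119, 0.01):
    ppv, npv = predictive_values(0.849, 0.983, prev)
    print(f"prevalence {prev:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
# prevalence 11.9%: PPV ~87%, NPV ~98%
# prevalence 1.0%:  PPV ~33.5%, NPV ~99.8%
```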

🩺 Clinical consequence

  • In the low-prevalence group, only 33.5% of positive results are real anemia.
  • The excerpt notes you would follow up with a blood draw to confirm.
  • For a negative result, the recalculated NPV is about 99.8% (973.17 divided by 974.68), so no follow-up is needed.

🏥 Real-world application: mammography screening

🎗️ Mammography in 40-year-old women

  • The excerpt states that breast cancer prevalence in women in their 40s is very low (0.98%).
  • PPV in this population is "well under 1%."
  • Implication: greater than 99% of women sent for follow-up (breast biopsy) are false positives.
  • These women undergo expensive, invasive, emotionally stressful procedures unnecessarily.

👨‍👩‍👧 When screening is still warranted

  • Women with strong family history or known BRCA-1/BRCA-2 carriers have much higher underlying prevalence.
  • In this higher-prevalence subgroup, PPV is much higher, so screening is justified.
  • Key insight: the same test (mammography) is appropriate or inappropriate depending on the population's baseline risk.

🔑 Using test characteristics in practice

📋 When to use each metric

| Metric | When to use | Purpose |
| --- | --- | --- |
| Sensitivity & specificity | Before choosing a test | Compare tests; pick one with the right balance for your clinical need |
| PPV & NPV | After test results are known | Interpret what a positive or negative result means for this patient |

⚠️ Critical rule for PPV and NPV

  • You must know the prevalence of disease in the target population to use or calculate PPV and NPV.
  • Published PPV/NPV from a study applies only to populations with the same prevalence.
  • If your patient comes from a different population, recalculate using the fixed sensitivity/specificity and the new prevalence.

🧩 Screening vs. diagnostic testing

  • The excerpt notes these are "similar procedures" but differ by context.
  • Screening: testing asymptomatic people (lower prevalence, lower PPV).
  • Diagnostic: testing symptomatic people (higher prevalence, higher PPV).
  • Don't confuse: the same test can be a screening tool in one setting and a diagnostic tool in another.
37

Differences between Confounding and Effect Modification

Differences between Confounding and Effect Modification

🧭 Overview

🧠 One-sentence thesis

Confounding distorts the true association and must be adjusted away, whereas effect modification reveals genuinely different associations across subgroups and should be reported separately.

📌 Key points (3–5)

  • Confounding: you get the wrong answer because the confounder is unevenly distributed between groups, distorting the measure of association.
  • Effect modification: you get the wrong answer because your sample contains subgroups in which the exposure/disease association is genuinely different.
  • Common confusion: both require stratified analysis, but confounding requires reporting an adjusted measure, while effect modification requires reporting stratum-specific measures separately.
  • How to distinguish: if stratum-specific measures are similar to each other but differ from the crude (which does not fall between them), it's confounding; if stratum-specific measures differ from each other and the crude lies between them, it's effect modification.
  • Why it matters: policy implications differ drastically—knowing which subgroups are affected differently (effect modification) leads to targeted interventions.

🔍 What each concept means

🔍 Confounding

With confounding, you're initially getting the wrong answer because the confounder is not distributed evenly between your groups.

  • The confounder distorts the measure of association you calculate.
  • The excerpt gives the example: having bigger feet is associated with reading speed only because of confounding by grade level.
  • You need to recalculate the measure of association, this time adjusting for the confounder.
  • Example: if you study physical activity and dementia and find an unadjusted odds ratio of 2.0, but after stratifying by marital status you find odds ratios of 3.1 and 3.24 in the two strata, marital status is a confounder—you would report the adjusted OR (around 3.18).

🔍 Effect modification

With effect modification, you're also initially getting the wrong answer, but this time it's because your sample contains at least 2 subgroups in which the exposure/disease association is different.

  • The association is genuinely different across subgroups.
  • You need to permanently separate those subgroups and report results separately for each stratum.
  • The excerpt gives the example: men who sleep less have higher GPAs than men who sleep more, but women who sleep more have higher GPAs than women who sleep less.
  • Example: in a trial of Mediterranean diet to prevent preterm birth, the crude RR is 0.90, but among nulliparas the RR is 0.60 and among multiparas the RR is 1.15—parity is an effect modifier, so you report the two stratum-specific RRs separately.

🧪 How to detect each one

🧪 The stratified analysis process

Both confounding and effect modification require the same initial steps:

  1. Calculate the crude measure of association (ignoring the covariable).
  2. Calculate stratum-specific measures of association, so each level of the covariable has its own 2×2 table.

The difference appears in step 3.

🧪 Decision rules

| Criterion | Confounding | Effect modification |
| --- | --- | --- |
| Stratum-specific measures | Similar to each other | Different from each other |
| Position of the crude measure | Does NOT fall between the stratum-specific measures | DOES fall between the stratum-specific measures |
| Difference from the crude | Stratum-specific measures are at least 10% different from the crude | Not the deciding criterion; what matters is that the stratum-specific measures differ from each other |

  • Don't confuse: both involve stratification, but the pattern of results tells you which one you're dealing with.
  • Example of neither: in a case-control study of melanoma and tanning bed use, the crude OR is 3.5, and stratifying by gender yields 3.45 (men) and 3.56 (women). Gender is neither a confounder nor an effect modifier: the stratum-specific estimates are similar to each other (ruling out effect modification), and they are within 10% of the crude, which lies between them (ruling out confounding).
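
The decision rules above are mechanical enough to express in code. A hedged sketch (the 10% threshold is the excerpt's rule of thumb, not a formal statistical test, and the function name is mine):

```python
def classify_covariable(crude, stratum_measures, threshold=0.10):
    """Apply the stratified-analysis decision rules described above."""
    lo, hi = min(stratum_measures), max(stratum_measures)
    crude_between = lo <= crude <= hi
    strata_differ = (hi - lo) / lo > threshold          # strata unlike each other?
    far_from_crude = all(abs(m - crude) / crude >= threshold
                         for m in stratum_measures)     # strata unlike the crude?
    if strata_differ and crude_between:
        return "effect modifier: report stratum-specific measures"
    if not strata_differ and far_from_crude and not crude_between:
        return "confounder: report an adjusted measure"
    return "neither: report the crude measure"

print(classify_covariable(2.0, [3.1, 3.24]))    # confounder (dementia example)
print(classify_covariable(0.90, [0.60, 1.15]))  # effect modifier (diet example)
print(classify_covariable(3.5, [3.45, 3.56]))   # neither (melanoma example)
```

(For ratio measures such as ORs and RRs, comparisons are often done on the log scale; the simple percentage version above mirrors the rule of thumb as stated.)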

📊 How to report results

📊 Reporting confounding

  • Report an adjusted measure of association that controls for the confounder.
  • The goal is to remove the distortion and present the true association.
  • Example: report the adjusted OR of 3.18 (not the crude 2.0 or the individual stratum-specific measures).
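
One standard way to produce such an adjusted estimate is the Mantel-Haenszel pooled OR. A hedged sketch (the stratum 2×2 counts below are invented to reproduce stratum ORs of 3.10 and 3.24; they are not the study's actual data, and the excerpt does not specify which adjustment method was used):

```python
# Each stratum: (exposed cases, exposed controls, unexposed cases, unexposed controls)
strata = [
    (62, 20, 50, 50),  # hypothetical "married" stratum: OR = 3.10
    (81, 25, 50, 50),  # hypothetical "unmarried" stratum: OR = 3.24
]

# Mantel-Haenszel pooled OR: sum(a*d/n) / sum(b*c/n) across strata
num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
print(f"Mantel-Haenszel adjusted OR: {num / den:.2f}")  # ~3.17
```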

📊 Reporting effect modification

  • Report the stratum-specific measures of association.
  • Do NOT combine them into a single adjusted measure—the whole point is that the association is different across subgroups.
  • Example: report RR = 0.60 for nulliparas and RR = 1.15 for multiparas separately.

📊 Why reporting matters

  • Effect modification is an interesting finding in and of itself, and we report it.
  • Unlike confounding, whose effects we want to get rid of in our analysis, effect modification reveals important heterogeneity.
  • The excerpt emphasizes policy implications: if you only had crude data without age breakdowns from the recession job-loss example, potential policy solutions would be very different than if you had access to the stratified-by-age analysis.

🔄 Planning and conducting studies

🔄 Before the study

Both confounding and effect modification require the same preparation:

  • Think about what variables might act as confounders or effect modifiers based on what you know about the exposure/disease process under study.

🔄 During the study

Both require the same data collection:

  • Collect data about any potential covariables.
  • Stratified/adjusted analyses cannot be conducted without data on the covariable.

🔄 Can the same variable be both?

  • Yes, the same variable can theoretically act as both a confounder and an effect modifier.
  • This usually happens when the covariable is a continuous variable, dichotomized for checking effect modification.
  • Example: if you divide age into "old" and "young" (e.g., older than 50 vs. 50 or younger), you might miss nuances because 51-year-olds are not like 70-year-olds—there might be further effect modification with more categories or residual confounding.
  • However, one rarely sees this in practice.

📋 Summary comparison table

| Stage | Confounding | Effect modification |
| --- | --- | --- |
| Before the study | Think about potential confounders based on the exposure/disease process | Think about potential effect modifiers based on the exposure/disease process |
| During the study | Collect data on potential covariables | Collect data on potential covariables |
| Analysis step 1 | Calculate the crude measure of association | Calculate the crude measure of association |
| Analysis step 2 | Calculate stratum-specific measures | Calculate stratum-specific measures |
| Analysis step 3 | If the stratum-specific measures are similar to each other and at least 10% different from the crude (which does not fall between them), it's a confounder | If the stratum-specific measures are different from each other and the crude lies between them, it's an effect modifier |
| Writing results | Report an adjusted measure of association that controls for the confounder | Report the stratum-specific measures of association |
38

Conclusion

Conclusion

🧭 Overview

🧠 One-sentence thesis

Different epidemiologic study designs vary widely in cost and internal validity, and systematic reviews/meta-analyses provide the best overall evidence by synthesizing multiple studies while avoiding the biases of individual papers.

📌 Key points (3–5)

  • Hierarchy of evidence: Study designs range from single case reports (low validity, low cost) to meta-analyses (high validity, high indirect cost), with cross-sectional, case-control, cohort, and RCT studies in between.
  • Systematic reviews as learning tools: They synthesize existing literature and are essential for professionals who cannot keep up with every individual study, but only well-done systematic reviews should be trusted.
  • Common confusion: Not all review papers are systematic—papers labeled "integrative review," "literature review," or just "review" are prone to author bias and should be avoided; only those explicitly labeled "systematic review" or "meta-analysis" follow rigorous methods.
  • Trade-offs in study design: Better studies (higher validity) are more expensive and time-consuming; however, design choice also depends on the research question (e.g., case-control is preferred for rare diseases regardless of cost).
  • Why it matters: Understanding study design strengths and weaknesses helps readers critically evaluate epidemiologic literature and shape evidence-based policy.

📊 Hierarchy of epidemiologic study designs

📊 Cost vs. validity trade-off

The excerpt presents a spectrum of study types ranked by internal validity and cost:

| Study type | Relative validity | Relative cost | Notes |
| --- | --- | --- | --- |
| Case reports | Lowest | Lowest | Single unusual patient |
| Cross-sectional | Low-moderate | Low-moderate | One of 4 main study types |
| Case-control | Moderate | Moderate | One of 4 main study types; preferred for rare diseases |
| Cohort | Moderate-high | Moderate-high | One of 4 main study types |
| RCT | High | High | One of 4 main study types |
| Meta-analysis | Highest | Highest (indirect) | Synthesizes dozens of other studies |

  • General pattern: "Better" studies (higher validity) are more expensive and time-consuming.
  • Exception for reviews: Review papers themselves are not particularly expensive to conduct, but they require numerous other studies to be published first, so indirect costs (time and money invested across all underlying studies) are very high.

🔍 When design choice overrides cost/validity

  • The excerpt notes that sometimes one design is preferred independent of cost or validity considerations.
  • Example given: case-control studies are preferred for rare diseases, regardless of whether they are the most valid or cheapest option.
  • Implication: the research question and practical constraints shape design choice, not just the validity hierarchy.

🔬 Systematic reviews and meta-analyses

🔬 What makes them valuable

Systematic reviews and meta-analyses are excellent resources for learning about a topic.

  • Why they matter: No one has time to keep up with the literature beyond a very narrow topic; even researchers benefit most from noting new individual studies only in their specific field.
  • For non-researchers: Public health professionals and clinicians not routinely engaging in research get a much better overall picture from systematic reviews/meta-analyses.
  • Reduced bias: Potentially less prone to the biases found in individual studies because they synthesize multiple sources.

✅ How to identify a well-done review

The excerpt provides clear criteria:

  • Title requirement: The paper should include either "systematic review" or "meta-analysis" in the title.
  • Methods requirement: The methods should mirror rigorous standards (the excerpt references methods outlined earlier in the chapter, such as comprehensive literature search, explicit inclusion/exclusion criteria, and quality assessment).
  • What to avoid: Review papers not explicitly labeled "systematic" are extremely prone to author biases and probably should be ignored.

⚠️ Common confusion: not all reviews are systematic

  • Papers labeled "integrative review," "literature review," or just "review" do not follow systematic methods.
  • Exception: "Metasynthesis" is a legitimate technique for systematic reviewing of qualitative literature.
  • Don't confuse: A paper called "review" is not the same as a "systematic review"—only the latter uses rigorous, transparent methods to minimize bias.

🧩 The four main study types

🧩 Core designs and their trade-offs

The excerpt identifies four main epidemiologic study types:

  1. Cross-sectional
  2. Case-control
  3. Cohort
  4. RCT (Randomized Controlled Trial)
  • Each has strengths and weaknesses.
  • Readers of epidemiologic literature should be aware of these differences to critically evaluate evidence.

📌 Design selection considerations

  • Validity vs. cost: Higher validity generally requires more resources.
  • Research question: Some questions are better suited to specific designs (e.g., rare diseases → case-control).
  • Practical constraints: Time, funding, and feasibility influence which design is used.

🎯 Implications for evidence-based practice

🎯 Using evidence to shape policy

  • The relative validity of study designs varies widely, affecting how their evidence should be used in policy decisions.
  • The excerpt emphasizes that understanding design strengths and weaknesses is essential for applying research findings appropriately.

🎯 Practical guidance for readers

  • For learning: Rely on systematic reviews and meta-analyses to get a comprehensive, less-biased picture of a topic.
  • For critical reading: Check that reviews are explicitly systematic; verify that methods are rigorous.
  • For interpretation: Recognize that individual studies have limitations; synthesized evidence from multiple studies is more reliable.
39

Cohorts

Cohorts

🧭 Overview

🧠 One-sentence thesis

Cohort studies are a strong epidemiologic design that follows nondiseased people over time to watch for incident disease, offering clear temporality and the ability to study rare exposures or multiple outcomes, though they cannot efficiently study rare diseases or conditions with very long latent periods.

📌 Key points (3–5)

  • What cohort studies do: start with a nondiseased sample, assess exposure, then follow over time for incident disease.
  • Key strength—temporality: exposure is measured before disease occurs, so we know exposure came first and avoid certain biases.
  • Rare exposures vs rare diseases: cohorts can deliberately sample exposed individuals for rare exposures, but cannot practically study rare diseases (would need impractically large samples).
  • Multiple outcomes advantage: the same cohort can be followed for many different diseases simultaneously, adding efficiency.
  • Common confusion—latent periods: for diseases with long latent periods, some "nondiseased" participants at baseline may actually be diseased but undiagnosed; epidemiologists often exclude early diagnoses to address this.

💪 Core strengths of cohort studies

🕐 Clear temporality

  • Because exposure is assessed in a nondiseased sample at the start, we know exposure came before disease.
  • This timing prevents misclassification of exposure differentially by disease status (since disease status is not yet known when exposure is measured).
  • Note: misclassification of disease status differentially by exposure is still possible.

📈 Incident disease measurement

  • Cohort studies look for new-onset (incident) disease, not existing (prevalent) disease.
  • This avoids conflating "having the disease" with "how long they have had it."
  • Recall: incidence, prevalence, and duration of disease are mathematically related; using incidence keeps these concepts separate.
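  • For reference, the steady-state relationship (a standard result, not spelled out in the excerpt) is prevalence ≈ incidence rate × average disease duration for reasonably rare diseases; this is why measuring incidence keeps risk separate from duration.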

🔬 Studying rare exposures

Cohort studies are the only study design that can be used to assess rare exposures.

  • If an exposure is uncommon (10% or fewer people exposed), cohorts can deliberately sample exposed individuals.
  • This ensures sufficient statistical power without needing an unreasonably large sample.
  • The smallest cell in the 2×2 table drives statistical power.

Example: To study chemical exposures in a particular factory, enroll exposed workers from that factory plus unexposed workers from a different (exposure-free) factory, then follow both groups for incident disease.

🎯 Multiple outcomes in one study

  • The same cohort can be followed for any reasonably common disease.
  • All outcomes of interest must be measured at baseline.
  • For each specific outcome analysis, exclude people who already had that disease at baseline (they were not at risk).

Example: In the factory study, if Person A has hypertension but not melanoma at baseline, exclude them from hypertension analyses but include them in melanoma analyses.

  • This ability adds efficiency—essentially conducting numerous studies at once.

🔄 Multiple exposures (with conditions)

  • Cohorts can study multiple exposures if all are common enough that deliberate sampling on exposure is not needed.
  • Simply sample from the target population and assess many exposures.
  • If also studying multiple outcomes, measure all disease states at baseline so analyses can be restricted to at-risk populations.

📚 Real-world example: Framingham Heart Study

  • Began in 1948 with over 5,000 adults in Framingham, Massachusetts.
  • Measured numerous exposures and outcomes, repeated every few years.
  • Later enrolled spouses, children, and grandchildren as the cohort aged.
  • Responsible for much knowledge about heart disease, stroke, and intergenerational effects of lifestyle habits.
  • Over 3,500 studies have been published using Framingham data.

⚠️ Limitations of cohort studies

❌ Cannot study rare diseases

  • Would require impractically large cohorts.

Example: Phenylketonuria affects about 1 in 10,000 infants in the US. To get even 100 affected individuals, you would need to enroll one million pregnant women—neither practical nor feasible.
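
The arithmetic behind this example is worth making explicit. A tiny sketch (the function name is mine):

```python
def cohort_size_needed(target_cases: int, risk: float) -> int:
    """People to enroll so the expected number of cases hits the target."""
    return round(target_cases / risk)

# PKU: about 1 in 10,000 infants; we want roughly 100 affected individuals
print(cohort_size_needed(target_cases=100, risk=1 / 10_000))  # 1,000,000
```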

💰 High cost for prospective cohorts

  • Following people over time requires significant effort and high personnel costs.
  • Cannot be used for diseases with decades-long induction or latent periods.

Example: Studying whether adolescent dairy consumption is associated with osteoporosis in 80-year-old women would require following teenagers for 60+ years—extremely difficult.

📉 Selection bias from loss to follow-up

  • The longer the follow-up, the more likely people move, change contact information, or stop participating.
  • More troubling: people who start feeling ill may quit responding.
  • If these people were about to be diagnosed with the outcome, this creates serious bias.
  • Despite this difficulty, a few long cohort studies like Framingham exist and have yielded rich knowledge.

🧪 Special considerations

⏳ Latent periods and baseline disease status

Latent period: the biological onset of disease occurs long before the disease is detected and diagnosed.

  • For diseases with long latent periods, some "nondiseased" participants at baseline may actually be diseased but undiagnosed.

Example: A cancer patient whose tumor is still too small to detect.

How epidemiologists address this:

  • Exclude participants diagnosed during the first several months of follow-up.
  • Theory: these individuals were not truly disease-free at baseline.

🔬 Randomized Controlled Trials (RCTs)

🎲 Conceptual relationship to cohorts

An RCT is conceptually just like a cohort, with one difference: the investigator determines exposure status.

  • All strengths and weaknesses of cohort studies apply to RCTs.
  • One exception for multiple exposures: to study multiple exposures, you would need to re-randomize for each exposure.
  • A few studies have successfully done this (e.g., the Women's Health Initiative randomized women to both hormone replacement and other exposures).
40

Randomized Controlled Trials

Randomized Controlled Trials

🧭 Overview

🧠 One-sentence thesis

Randomized controlled trials (RCTs) are the gold standard for minimizing confounding by randomly assigning exposure, but they face substantial limitations including high cost, ethical constraints, and generalizability issues.

📌 Key points (3–5)

  • Core design feature: The investigator determines exposure status through randomization, unlike cohort studies where exposure occurs naturally.
  • Key strength: Randomization eliminates all confounding (known, unknown, measured, and unmeasured) if the study is large enough and truly random.
  • Major limitations: RCTs are extremely expensive, often ethically impossible (e.g., cannot randomize smoking), and frequently have generalizability problems.
  • Common confusion: RCTs are called the "gold standard" for internal validity, but well-conducted observational studies should not be automatically discounted—RCTs have substantial drawbacks.
  • Practical constraint: RCTs require precisely specifying the exposure in advance; if the details are wrong, a real association might be missed.

🔬 What makes RCTs different from cohorts

🎲 The defining feature

An RCT is conceptually just like a cohort, with one difference: the investigator determines exposure status.

  • In cohort studies, people choose their own exposures (smoking, diet, where they live).
  • In RCTs, researchers assign exposure through randomization.
  • Example: Instead of observing who exercises naturally, an RCT would randomly assign some participants to a physical activity program and others to no intervention.

🔄 Shared characteristics with cohorts

  • All strengths and weaknesses of cohort studies apply to RCTs.
  • Both follow people forward in time to observe outcomes.
  • One exception: To study multiple exposures in an RCT, you would need to re-randomize for each exposure (rarely practical).
  • Example: The Women's Health Initiative successfully randomized women to both hormone replacement therapy and calcium supplements separately, but this is uncommon.

🛡️ The confounding advantage

🎯 How randomization eliminates confounding

Recall that a confounder must:

  1. Cause the outcome
  2. Be statistically associated with the exposure
  3. Not be on the causal pathway

By randomly assigning exposure, no variables are more common in the exposed group than the unexposed group.

  • This breaks the second condition: nothing is associated with exposure anymore.
  • Randomization accounts for all confounders: known, unknown, measured, and unmeasured.
  • In cohort studies, you can only control for measured confounders statistically—but what about unknown ones?

✅ Requirements for this to work

  • The study must be large enough (at least several hundred participants).
  • Exposure allocation must be truly random (not "every other person" or another predictable scheme).
  • When these conditions are met, there is no confounding.
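
A quick simulation makes the balancing effect visible. This sketch is entirely illustrative (the 30% smoking share and the sample size are arbitrary choices of mine): it randomizes people to two arms and then checks how a covariate is distributed afterward.

```python
import random

random.seed(1)
n = 10_000  # size matters: rerun with n = 50 to see chance imbalance
people = [{"smoker": random.random() < 0.30} for _ in range(n)]

# Truly random 1:1 allocation, independent of every participant trait
for p in people:
    p["arm"] = random.choice(["treatment", "control"])

for arm in ("treatment", "control"):
    group = [p for p in people if p["arm"] == arm]
    rate = sum(p["smoker"] for p in group) / len(group)
    print(f"{arm}: {len(group)} people, {rate:.1%} smokers")
# Both arms come out near 30% smokers, so smoking (measured or not)
# cannot be associated with exposure and cannot confound the result.
```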

🏆 Internal validity benefit

The benefits of this in terms of internal study validity cannot be overstated.

  • Internal validity means the study accurately measures the true relationship within the study population.
  • This is why RCTs are called the "gold standard" of epidemiologic and clinical research.

⚠️ Major limitations of RCTs

💰 Cost

  • RCTs are even more expensive than cohort studies.
  • They require active intervention and close monitoring.
  • Following people over time with assigned exposures takes substantial resources.

🚫 Ethical constraints

Many exposures cannot be randomized for ethical reasons:

| Exposure | Why it cannot be randomized |
| --- | --- |
| Smoking | Harms are so well documented that we cannot ask people to begin smoking for a study |
| Where people live | Cannot randomize residence, yet location profoundly affects health |

  • Observational studies remain ethically viable for these exposures because people have already chosen them; the epidemiologist merely measures existing exposures.

🌍 Generalizability issues

Problem 1: Self-selection bias

  • People willing to participate in a study where they don't choose their group are not a random subset of the population.
  • Example: If only retired people have time to participate in a physical activity intervention, can results generalize to younger working populations? Perhaps not.

Problem 2: Overly restrictive inclusion criteria

  • Investigators sometimes exclude too many people, limiting applicability.
  • Example: A well-known blood pressure trial in older adults excluded those with diabetes, cancer, and other comorbidities—but most older people have at least one chronic disease, so to whom can the results really apply?

🎯 Exposure specification challenge

In an RCT, you must precisely define the exposure in advance:

  • For a physical activity intervention: Walk? Yoga? Strength training? How much? How often? What intensity? For how many weeks?
  • In a cohort study, you assess the huge variety of physical activity people do naturally and categorize it later.
  • Risk: If you specify the exposure wrongly or apply it at the wrong time in the disease process, it could appear there's no association when one really exists.

🏥 Role in medicine and practice

💊 FDA requirements

  • The Food and Drug Administration requires multiple RCTs before approving new drugs and medical devices.
  • Because of strict FDA requirements, RCT protocols must be registered at clinicaltrials.gov before data collection begins.

📚 Changing clinical practice

RCTs can change practice when new, large, well-designed studies emerge.

Example: Aspirin and heart disease prevention in women (2005)

  • Before: Physicians assumed older women should take baby aspirin daily to prevent heart attacks, like men.
  • Dr. Paul Ridker's trial enrolled 20,000 women in each group.
  • Finding: Aspirin acts differently in women (gender is an effect modifier), and the aspirin-a-day regimen does not work for most women.
  • This absolutely changed how physicians thought about heart disease prophylaxis in women.

⚖️ RCTs vs. observational studies

  • RCTs are the "gold standard" because of their ability to minimize confounding.
  • However, their drawbacks are substantial.
  • Don't confuse: Well-conducted observational studies should not be discounted merely because they are not RCTs.
  • Outside pharmaceutical research, both designs play important roles in epidemiology.
41

Case-Control Studies

Case-Control Studies

🧭 Overview

🧠 One-sentence thesis

Case-control studies efficiently identify disease-exposure associations by comparing past exposures between people who already have a disease (cases) and comparable people who do not (controls), making them especially valuable for rare diseases despite challenges in control selection and recall bias.

📌 Key points (3–5)

  • Retrospective design: starts with cases (people with disease) and controls (people without disease), then looks backward to assess past exposures.
  • Common confusion: cases are NOT "people with disease who are exposed"—both cases and controls are recruited without regard for exposure status, then exposure is assessed afterward.
  • Key strengths: cheaper than cohorts/RCTs, efficient for rare diseases and diseases with long induction/latent periods, can assess multiple exposures.
  • Major challenges: recall bias (especially differential recall by case status), proper control selection is paramount but often difficult, and temporality issues for chronic diseases.
  • How to distinguish from cross-sectional: case-control deliberately samples by disease status and looks backward at exposure; cross-sectional samples the population once and measures both exposure and disease at the same time.

🏗️ Design structure and recruitment

🏗️ How case-control studies work

A case-control study is a retrospective design wherein we begin by finding a group of cases (people who have the disease under study) and a comparable group of controls (people who do not have the disease).

  • The study starts by identifying disease status first, then works backward to determine exposure.
  • Sequence: Find cases → Find controls → Assess past exposure in both groups.
  • This is fundamentally different from cohort studies, which start with exposure and follow forward to disease.

⚠️ The critical recruitment rule

  • Both cases and controls must be recruited without regard for exposure status.
  • To avoid differential misclassification, you cannot select cases based on whether they were exposed.
  • Only after identifying all cases and controls do you assess which people were exposed.
  • Example: If studying lung disease, you recruit all people with lung disease (cases) and all comparable people without lung disease (controls), then ask both groups about smoking history—you don't recruit "smokers with lung disease" as cases.

🎯 Strengths and appropriate uses

💰 Efficiency advantages

  • Much cheaper to conduct than cohorts or randomized trials because they do not require following people over time.
  • Fast to complete since you're looking at existing disease and past exposure, not waiting for disease to develop.

🔬 When case-control studies excel

| Situation | Why case-control works well |
| --- | --- |
| Rare diseases | Can deliberately find enough cases; no need for huge population samples |
| Long induction/latent periods | No need to wait decades for disease to develop; look backward instead |
| Multiple exposures | Can assess many different past exposures for the same disease outcome |

📏 Scope limitations

  • Limited to one outcome by definition (the disease that defines case status).
  • Cannot be used for rare exposures (you "get what you get" in terms of exposure distribution among your cases and controls).

🚧 Major methodological challenges

🧠 Recall bias

Case-control studies assess exposure in the past.

  • Past exposure data occasionally come from existing records (e.g., medical records for blood pressure history).
  • Usually rely on questionnaires, making them subject to recall bias more than prospective designs.

⚖️ Differential recall by case status

  • The key concern: people with a disease may recall past exposures differently than people without the disease.
  • It is plausible that people with a condition will have spent time thinking about what might have caused it.
  • Cases may report past exposures with greater detail than controls simply because they've thought more about potential causes.
  • This creates systematic measurement error that can bias results.

🎯 What questions can work

  • Cannot ask: "What exactly did you eat on June 15th, 2013?" (impossible to recall with certainty).
  • Can ask: "What kinds of foods did you usually eat on most days a decade ago?" (bigger-picture patterns).
  • Details are sacrificed in favor of bigger-picture accuracy (though validity still depends on memory quality).
  • Always ask: "Can people tell me this? Will people tell me this?"

🎲 Control selection challenges

🎯 The fundamental criterion

To avoid selection biases, cases and controls must come from the same target population—that is, if controls had been sick with the disease in question, they too would have been cases.

  • Proper control selection is paramount in case-control studies.
  • Who constitutes a "proper" control is not always immediately obvious.
  • The controls must represent the population that gave rise to the cases.

🏥 Example: Pediatric traumatic brain injury study

Scenario: Studying traumatic brain injury (TBI) in children in Oregon; cases recruited from Doernbecher Children's Hospital in Portland (a referral hospital receiving severe TBI cases from throughout the Pacific Northwest).

Option 1 - Other hospital patients as controls:

  • Use children at Doernbecher for conditions other than TBI.
  • ✓ Satisfies criterion that controls would get care at this hospital (they are getting care there).
  • ✗ Kids with other conditions might also have unusual exposure histories, leading to biased estimates.

Option 2 - Neighborhood children as controls:

  • Sample children who are not sick from Portland neighborhoods.
  • ✗ Creates selection bias because Doernbecher is a referral hospital with a several-hundred-mile radius, not just Portland.
  • If rural kids differ from city kids, estimates will be biased.

🛡️ Strategies to reduce control-selection bias

  • No perfect way to recruit controls exists.
  • Epidemiologists routinely critique each other's control groups at conferences (considered "good sport").
  • One solution: Recruit multiple control groups (e.g., one hospital-based and one community-based).
    • If results are not substantially different across control groups, selection biases may not be overly influencing results.
    • Provides a sensitivity check on findings.

⏰ Handling temporality and disease duration

📅 The temporality problem for chronic diseases

  • For long-lasting chronic diseases, disease duration complicates interpretation.
  • Must ensure exposures happened before disease onset, not after.
  • At minimum, need the date of diagnosis and must assess exposures that happened well before that date.

🔄 Strategy: Incident case recruitment

  • When induction and latent periods are unknown, recruit incident cases (newly diagnosed) over several months.
  • As soon as cases are recruited, ask about past exposures with confidence that diagnosis occurred after those exposures.
  • Don't confuse: Incident cases = newly diagnosed; prevalent cases = existing cases of any duration.

🕰️ Multiple time-window approach

  • Long latent periods might still be an issue even with incident cases.
  • Solution: Ask about exposures over multiple time periods (e.g., 0–5 years ago, 6–10 years ago, 11–15 years ago).
  • Compare results across these windows to identify relevant exposure periods.
  • Example: If association appears only for 10–15 years ago, suggests that's the relevant etiologic window.

🏆 Historical impact and contributions

📚 Major contributions to public health knowledge

Despite methodological difficulties, case-control studies have made substantial contributions to health knowledge over the years.

Example: Smoking and health

  • The surgeon general's 1964 report Smoking and Health was based on a literature stemming from a case-control study conducted by Richard Doll and Austin Bradford Hill.
  • This foundational work established the link between smoking and disease using the case-control design.
42

Cross-Sectional Studies

Cross-Sectional Studies

🧭 Overview

🧠 One-sentence thesis

Cross-sectional studies are the fastest and cheapest epidemiologic design, but their inability to establish temporality limits them to hypothesis generation rather than causal inference.

📌 Key points (3–5)

  • What cross-sectional studies do: draw a single sample from the target population and assess current exposure and disease status on everyone at the same point in time.
  • Main strength: fastest and cheapest to conduct, making them ideal for surveillance activities and situations with limited resources.
  • Key limitation—temporality: cannot determine whether exposure or disease came first because both are measured simultaneously.
  • Common confusion: cross-sectional studies vs. case-control studies—cross-sectional studies sample neither for exposure nor disease ("get what we get"), so they cannot study rare exposures or rare diseases.
  • Appropriate use: limited to hypothesis generation and surveillance; cannot support public health or clinical decisions on their own.

🔬 Study design and mechanics

🔬 How cross-sectional studies work

Cross-sectional study: a study in which a single sample is drawn from the target population and current exposure and disease status are assessed on everyone.

  • The design is simple: one sample, one measurement time point.
  • Researchers measure both exposure and disease prevalence at the same moment.
  • Example: A survey asks participants about their current smoking status and whether they currently have asthma—both measured today.

🎯 What cross-sectional studies cannot target

  • Cannot sample for exposure or disease: researchers "get what we get" when drawing the sample.
  • This means:
    • Cannot be used for rare exposures (not enough exposed people will appear in a random sample).
    • Cannot be used for rare diseases (not enough diseased people will appear in a random sample).
  • Don't confuse: case-control studies deliberately sample for disease status; cross-sectional studies do not.

💪 Strengths and applications

💪 Speed and cost advantages

  • Fastest and cheapest studies to conduct: only one sample, one data collection wave.
  • This makes them practical when:
    • Resources are limited.
    • Immediate answers are required.

📊 Surveillance use cases

Cross-sectional designs are used for many ongoing surveillance activities that repeat with a new sample each year:

| Surveillance system | What it does |
| --- | --- |
| NHANES (National Health and Nutrition Examination Survey) | Repeated cross-sectional study with a new sample annually |
| PRAMS (Pregnancy Risk Assessment Monitoring System) | Repeated cross-sectional study with a new sample annually |
| BRFSS (Behavioral Risk Factor Surveillance System) | Repeated cross-sectional study with a new sample annually |

  • These systems track population health trends over time by comparing results across years.

⚠️ Critical limitations

⏳ The temporality problem

  • No data on temporality: we do not know whether the exposure or the disease came first.
  • Both exposure and disease prevalence are measured at the same point in time.
  • Example: If a cross-sectional study finds that people with depression are more likely to be unemployed, we cannot tell whether:
    • Unemployment led to depression, or
    • Depression led to unemployment, or
    • Some third factor caused both.
  • This is the fundamental reason cross-sectional studies cannot establish causation.

🚫 Restricted to hypothesis generation

  • Cross-sectional studies (along with surveillance that only measures disease frequency without exposure/disease relationships) are limited to hypothesis generation activities.
  • Cannot make public health or clinical decisions based on evidence only from these studies (except for surveillance purposes).
  • They can suggest associations worth investigating with stronger designs, but cannot confirm causal relationships.

🔍 Related study types

📝 Case reports and case series

Case report: a short description of an interesting and unusual patient seen by a particular doctor or clinic. Case series: the same thing but describes more than one patient—usually only a few, but sometimes several hundred.

  • Little value for epidemiologists because they have no comparison groups.
  • Example: If a case series reports that 45% of patients with disease Y also have disease Z, this is not useful without knowing how many patients without disease Y also have disease Z.
  • However, valuable for public health professionals as sentinel surveillance—they can draw attention to new, emerging public health threats.

🚨 Historical examples of case series as early warning

  • 1941 rubella and birth defects: An Australian physician noticed an increase in eye birth defects and published a case series hypothesizing maternal rubella infection as the cause; other physicians confirmed similar observations, leading to current rubella screening in pregnancy.
  • Early 1980s HIV/AIDS: CDC case series reported unusual cancers and opportunistic infections in young, healthy populations—the first indication of the HIV/AIDS epidemic.
  • 2003 SARS: Case reports of unusual, deadly respiratory infection in travelers to Hong Kong led to immediate quarantine of affected individuals returning to other cities, preventing SARS from becoming a global pandemic.

🌍 Ecologic studies

Ecologic studies: studies in which group-level data (usually geographic) are used to compare rates of disease and/or disease behaviors.

  • Data are aggregated at the group level (e.g., by state, country, region) rather than individual level.
  • Example: Comparing seat belt use rates across different U.S. states—each state is one data point, not individual people.
  • Don't confuse: ecologic studies compare groups; cross-sectional studies measure individuals within a single sample (though both can be done at one time point).
43

Case Reports/Case Series

Case Reports/Case Series

🧭 Overview

🧠 One-sentence thesis

Case reports and case series lack comparison groups and thus have limited epidemiological value, but they serve as crucial early-warning systems for emerging public health threats.

📌 Key points

  • What they are: short descriptions of unusual patients seen by doctors—one patient (case report) or a few to several hundred (case series).
  • Why epidemiologists find them limited: they have no comparison groups, so percentages or frequencies cannot be interpreted meaningfully.
  • Why public health professionals value them: they act as sentinel surveillance, drawing attention to new or emerging health threats.
  • Common confusion: a case series showing "45% of patients with disease Y also have disease Z" sounds informative, but without knowing how many people without disease Y have disease Z, the 45% is not useful for epidemiological analysis.
  • Historical impact: case reports/series have identified major public health issues including rubella-related birth defects, HIV/AIDS, and SARS.

📋 What case reports and case series are

📝 Definitions and scope

Case report: a short description of an interesting and unusual patient seen by a particular doctor or clinic.

Case series: the same as a case report but describes more than one patient—usually only a few, but sometimes several hundred.

  • These are found frequently in clinical literature.
  • They focus on unusual or noteworthy clinical presentations.
  • By definition, they present data from unusual patients.

⚠️ Epidemiological limitations

🚫 The comparison group problem

  • Case reports and case series are not studies per se because they have no comparison groups.
  • Without a comparison group, frequencies and percentages cannot be interpreted.

Example: If a case series reports that 45% of patients with disease Y also have disease Z, an epidemiologist cannot use this information. The critical missing piece: How many patients who do not have disease Y also have disease Z? Without data on a comparable group without disease Y, the 45% figure is meaningless for analysis.

🔍 Don't confuse with other study designs

  • Unlike cross-sectional studies, cohort studies, or case-control studies, case reports/series provide no basis for comparison.
  • They cannot establish associations or test hypotheses about exposure-disease relationships.

🚨 Public health value

🔔 Sentinel surveillance function

Despite their epidemiological limitations, case reports and case series can be extremely useful for public health professionals because:

  • They present data from unusual patients.
  • They can act as a kind of sentinel surveillance.
  • They draw attention to new, emerging public health threats.

📚 Historical examples of impact

| Year/period | Case report/series | What it identified | Public health response |
| --- | --- | --- | --- |
| 1941 | Australian physician noticed an increase in a birth defect affecting infant eyes | Maternal rubella infection as the cause | Current practice of checking rubella antibodies in all pregnant women and vaccinating those without immunity |
| Early 1980s | CDC case series in Morbidity and Mortality Weekly Report | Unusual cancers and opportunistic infections in young, healthy populations | First recognition of the HIV/AIDS epidemic |
| 2003 | Case reports of an unusual, deadly respiratory infection in Hong Kong travelers | SARS outbreak | Immediate quarantine of affected individuals in Toronto prevented a global pandemic |

🎯 The early-warning mechanism

  • Case reports/series highlight patterns that individual clinicians notice.
  • When multiple physicians from different locations report similar unusual cases, it signals an emerging threat.
  • The rubella example shows this pattern: one physician published a case series with a hypothesis, then other physicians "chimed in" that they had seen similar increases, building the evidence base.
44

Ecologic Studies

Ecologic Studies

🧭 Overview

🧠 One-sentence thesis

Ecologic studies compare group-level data (usually geographic) to examine disease rates and behaviors, but they are prone to the ecologic fallacy and are best used only for quick, cheap hypothesis generation.

📌 Key points (3–5)

  • What ecologic studies are: studies that use group-level data (e.g., by state or country) to compare rates of disease or health behaviors, not individual-level data.
  • The ecologic fallacy: the logical error of assuming group-level statistics apply to any one individual within that group.
  • Common confusion: just because a group has a higher average does not mean every individual in that group has a higher value than every individual in a lower-average group.
  • Why conduct them despite problems: they are quick, cheap, and use preexisting data (census, product consumption, disease prevalence records).
  • Appropriate use: limited to hypothesis generation only, not for establishing causation or changing policy.

🔍 What ecologic studies measure

🔍 Group-level comparisons

Ecologic studies: studies in which group-level data (usually geographic) are used to compare rates of disease and/or disease behaviors.

  • The unit of analysis is a group (e.g., state, country), not individuals.
  • Example: comparing seat belt use rates across different U.S. states—each state has one aggregate percentage.
  • The excerpt emphasizes that this is not data from individuals; it is averaged or aggregated data for entire populations.

🗺️ Geographic variation

  • The excerpt provides a seat belt use map showing variation by state.
  • By comparing rates across states, researchers can spot patterns (e.g., Oregon has higher seat belt use than Idaho on average).
  • Example: a graph showing per-capita rice consumption and maternal mortality by country—each country is one data point.

⚠️ The ecologic fallacy

⚠️ What the fallacy is

The ecologic fallacy: ascribing group-level numbers to any one individual.

  • Just because a group has a certain average does not mean every member of that group matches that average.
  • The excerpt warns: "it assumes that everyone in a given state is exactly the same—obviously this is not true."

🚫 Don't confuse group averages with individual values

  • Common mistake: thinking "Oregon has higher seat belt use than Idaho" means "everyone in Oregon wears their seat belt more than everyone in Idaho."
  • The excerpt clarifies: "We could easily find someone in Oregon who never wears their seat belt and someone in Idaho who always does."
  • Example: if Country A consumes more rice and has higher maternal mortality, the fallacy is assuming that the rice consumers are the ones dying from pregnancy complications—we cannot know this from group-level data alone.

🔗 Exposure and disease patterns

  • The ecologic fallacy is especially problematic when looking at both exposure and disease using group-level data.
  • The rice consumption and maternal mortality graph shows a correlation, but we cannot tell whether the individuals eating rice are the same individuals experiencing maternal mortality.
  • Without individual-level data, we cannot link exposure to outcome within the same person.
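
A contrived numeric sketch (invented numbers, not from the excerpt) shows how the fallacy arises: the group-level pattern says "more rice, more mortality," while at the individual level none of the rice eaters die.

```python
countries = {
    # population, share eating rice, deaths among eaters / non-eaters
    "A": dict(pop=1000, rice=0.8, deaths_eaters=0, deaths_non=40),
    "B": dict(pop=1000, rice=0.2, deaths_eaters=0, deaths_non=10),
}

for name, c in countries.items():
    mortality = (c["deaths_eaters"] + c["deaths_non"]) / c["pop"]
    print(f"Country {name}: rice {c['rice']:.0%}, mortality {mortality:.1%}")
# Country A: rice 80%, mortality 4.0%   <- higher rice, higher mortality
# Country B: rice 20%, mortality 1.0%
# Yet every death occurred among NON-rice-eaters: the group-level
# correlation tells us nothing about which individuals are at risk.
```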

🛠️ Why conduct ecologic studies

🛠️ Practical advantages

| Advantage | What the excerpt says |
| --- | --- |
| Speed and cost | Quick and cheap—"even more so than cross-sectional studies" |
| Data availability | Always use preexisting data (no need to collect new data) |
| Data sources | Census estimates, product consumption tracked by sellers, disease prevalence from health ministries or WHO |

💡 Appropriate use: hypothesis generation only

  • The excerpt is clear: "The use of ecologic studies is limited only to hypothesis generation."
  • They are "a good first step for a totally new research question" because they are so easy to conduct.
  • They should not be used to establish causation or to change public health policy.
  • Don't confuse: ecologic studies can suggest patterns worth investigating, but they cannot prove that those patterns apply to individuals.

📚 Systematic reviews and meta-analyses (context)

📚 Why multiple studies matter

  • The excerpt transitions to systematic reviews and meta-analyses by noting that epidemiology rarely changes policy based on one study.
  • Instead, researchers conduct many studies using different populations, designs, and exposure measurements.
  • If all studies show the same general results, the association may be considered causal (e.g., early smoking and lung cancer studies).

📚 What systematic reviews and meta-analyses do

Meta-analysis (or systematic review): a formal way of synthesizing results across all existing studies on a topic to arrive at "the" answer.

  • The excerpt outlines the procedure:
    1. Define the topic precisely (e.g., physical activity in all children vs. only grade-school kids).
    2. Systematically search the literature using documented search terms and limits (language, publication date) to make the search replicable.
    3. Narrow down results to only those directly addressing the topic (Step 3 is cut off in the excerpt).
  • Key principle: the search must be replicable and unbiased—not just papers the authors already know about.
45

Systematic Reviews and Meta-Analyses

Systematic Reviews and Meta-Analyses

🧭 Overview

🧠 One-sentence thesis

Systematic reviews and meta-analyses synthesize results from multiple epidemiologic studies to arrive at an overall conclusion that is less prone to individual study biases and provides a better evidence base for policy decisions than any single study.

📌 Key points (3–5)

  • Why we need them: Epidemiology relies on humans and is prone to bias and confounding, so we rarely change policy based on one study alone; instead, we build a body of evidence across multiple studies with different populations and designs.
  • What they do: Systematic reviews formally synthesize results across all existing studies on a topic using a documented, replicable search process; meta-analyses go further by statistically combining data to generate an overall measure of association.
  • Key distinction: Meta-analysis requires studies to be similar enough to pool statistically; if not, a systematic review synthesizes findings qualitatively without calculating an overall pooled estimate.
  • Common confusion: Not all review papers are systematic—only those explicitly labeled "systematic review" or "meta-analysis" with documented search methods should be trusted; other "literature reviews" or "integrative reviews" are prone to author bias.
  • Why they matter: They provide the best overall picture for public health professionals and clinicians who don't have time to track individual studies, and they are less prone to biases found in single studies.

📚 The rationale for synthesis

📚 Why single studies are not enough

  • Epidemiology is more prone to bias and confounding than other sciences because it relies on humans.
  • This does not render epidemiology useless, but requires a robust appreciation for assumptions and limitations.
  • Barring exceptionally well-done randomized controlled trials, we rarely change public health or clinical policy based on just one study.
  • Instead, we conduct multiple studies using better and better designs until there is a body of evidence from different populations, study designs, and exposure measurements.
  • When all studies show the same general results (like early studies on smoking and lung cancer), we start to think the association might be causal and implement changes.

🔀 When results are mixed

  • When existing studies on a topic show more mixed results, there is a formal way of synthesizing their results across all of them to arrive at "the" answer.
  • This is where meta-analysis or systematic review comes in.
  • The goal is to combine evidence systematically rather than relying on informal impressions or cherry-picked studies.

🔬 The systematic review process

🔬 Seven-step procedure

The procedure for systematic reviews and meta-analyses follows the same steps:

  1. Determine the topic precisely: Decide exactly what you care about (e.g., physical activity in all children vs. only grade-school kids? In PE class vs. at home vs. everywhere?). This must be decided ahead of time, like defining a target population.

  2. Systematically search the literature: Use and document specific search terms and place documented limits (language, publication date, etc.) on search results. The key is to make the search replicable by others. It is not acceptable to just include papers authors are aware of without searching—doing so results in a biased sample.

  3. Narrow down search results: Include only those studies directly addressing the topic from Step 1.

  4. Abstract key data: For each included study, extract the exposure definition and measurement methods, outcome definition and measurement methods, how the sample was drawn, the target population, the main results, and so on.

  5. Determine similarity for meta-analysis: Use formal statistical procedures to test whether papers are similar enough to pool.

    • If similar enough: Combine all data from all included studies and generate an overall measure of association and 95% confidence interval (this is meta-analysis).
    • If not similar enough: Synthesize studies in other meaningful ways, comparing and contrasting results, strengths, and weaknesses, and arrive at an overall conclusion. An overall measure of association is not calculated, but usually authors can conclude whether some exposure is or is not associated with some outcome (this is systematic review without meta-analysis).
  6. Assess publication bias: Use formal statistical methods to evaluate the likelihood of publication bias and how it may have affected results.

  7. Publish the results.

🛡️ Safeguards against bias

  • Ideally, at least 2 different investigators conduct steps 2–4 completely independently, checking in after each step and resolving discrepancies by consensus.
  • This provides a check against unconscious or subconscious bias on the part of the authors (remember: we're all human and therefore all biased).
  • For reviews conducted after 2015 or so, the protocol (search strategy, exact topic, etc.) should be registered prior to step 2 with a central registry such as PROSPERO.
  • Authors who deviate from preregistered protocols should provide very good reasons; such studies should be interpreted with extreme caution.

📊 Interpreting meta-analysis results

📊 Forest plots

  • Results from meta-analyses are often presented as forest plots.
  • Each included study's main result is plotted (with the size of the square corresponding to sample size).
  • An overall estimate of association is indicated as a diamond at the bottom.

🍫 Example: chocolate and blood pressure

The excerpt provides an example from a meta-analysis of chocolate consumption and systolic blood pressure (SBP):

  • The majority of studies showed a decrease in SBP for people who ate more chocolate, though not all studies found this.
  • Some point estimates are quite close to 0.0 (the "null" value here, because we're looking at change in a single number, not a ratio).
  • 10 of the confidence intervals cross 0.0, indicating they are not statistically significant.
  • However, 6 studies—the largest studies with the narrowest confidence intervals—are statistically significant, all in the direction of chocolate being beneficial.
  • The overall (or "pooled") change in SBP and 95% CI shown at the bottom (the black diamond) indicates a small (approximately 3 mm Hg) reduction in SBP for chocolate consumers.
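
How is the diamond computed? Typically by inverse-variance weighting: each study contributes in proportion to its precision, which is why the largest studies dominate the pooled result. Here is a minimal fixed-effect sketch; the numbers are invented stand-ins, not the actual chocolate trials.

```python
import numpy as np

# Invented per-study changes in SBP (mm Hg) and standard errors;
# stand-ins for real trials, for illustration only.
effects = np.array([-4.1, -2.8, -0.5, -3.5, -2.2, -3.0])
ses     = np.array([ 1.0,  0.8,  2.5,  0.7,  1.8,  0.9])

# Fixed-effect meta-analysis: weight each study by 1 / variance,
# so precise (large) studies count for more.
weights   = 1.0 / ses**2
pooled    = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled change in SBP: {pooled:.1f} mm Hg (95% CI {lo:.1f} to {hi:.1f})")
```

Published meta-analyses usually also fit a random-effects model (e.g., DerSimonian-Laird) to allow for between-study heterogeneity, but the weighting logic is the same.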

⚠️ Statistical vs. clinical significance

  • Does a 3 mm Hg drop mean we should all start eating lots of chocolate? Not necessarily.
  • A 3 mm Hg ("millimeters of mercury"—still the units for blood pressure despite mercury not being involved for several decades) drop in SBP is not clinically significant.
  • A normal SBP is between 90 and 120, so a 3 mm Hg drop puts you at 87–117—likely not even a noticeable physiologic change.
  • Don't confuse: A result can be statistically significant (the confidence interval doesn't cross the null) but still not clinically meaningful in real-world terms.

🔍 Systematic review without meta-analysis

🔍 When pooling is not possible

  • Meta-analysis requires a certain similarity among studies that will be pooled (e.g., they need to control for similar, if not identical, confounders).
  • Often, this is not the case for a given body of literature.
  • In such cases, authors will systematically examine all the evidence and do their best to come up with "an" answer, taking into consideration the quality of individual studies, the overall pattern of results, and so on.

🏥 Example: risk-reducing mastectomy

The excerpt provides an example of a systematic review of risk-reducing mastectomy (RRM)—the prophylactic surgical removal of breasts in women who do not yet have breast cancer but who have the BRCA-1 or BRCA-2 genes and are at very high risk.

Overall results described:

  • Twenty-one studies looking at breast cancer incidence or disease-specific mortality (or both) reported reductions after bilateral RRM (both breasts removed), particularly for women with BRCA1/2 mutations.
  • Twenty studies assessed psychosocial measures; most reported high levels of satisfaction with the decision to have RRM but greater variation in satisfaction with cosmetic results.
  • Worry over breast cancer was significantly reduced after RRM compared both to baseline worry levels and to groups who opted for surveillance rather than RRM.
  • However, there was diminished satisfaction with body image and sexual feelings.

Conclusion:

  • While published observational studies demonstrated that RRM was effective in reducing both the incidence of and death from breast cancer, more rigorous prospective studies are suggested.
  • Because of risks associated with this surgery, RRM should be considered only among those at high risk of disease, for example, BRCA1/2 carriers.

Key point: No overall "pooled" estimate of the protective effect is provided, but the authors are nonetheless able to convey the overall state of the literature, including where the body of literature is lacking.

✅ Using reviews in practice

✅ Why they are excellent resources

  • Systematic reviews and meta-analyses are excellent resources for learning about a topic.
  • Realistically, no one has the time to keep up with the literature in anything other than a very narrow topic area.
  • Even in narrow areas, keeping up with each new individual study is really only worthwhile for researchers working in that field.
  • For public health professionals and clinicians not routinely engaging in research, relying on systematic reviews and meta-analyses provides a much better overall picture that is potentially less prone to the biases found in individual studies.

⚠️ How to identify well-done reviews

Care must be taken to read only well-done reviews:

| Feature | What to look for | What to avoid |
|---|---|---|
| Title | Should include "systematic review" or "meta-analysis" | Papers called "integrative review," "literature review," or just "review" |
| Methods | Should mirror the seven-step procedure outlined above | Reviews that are not explicitly systematic |
| Reason | Documented, replicable search process reduces author bias | Non-systematic reviews are extremely prone to biases on the part of the authors and probably should be ignored |

Exception: "Metasynthesis" is a legitimate technique for systematically reviewing qualitative literature. The papers to watch out for are the ones called "integrative review," "literature review," or just "review"—anything that is not "systematic review."

💰 Cost and validity trade-offs

The excerpt notes that systematic reviews and meta-analyses sit at the top of the validity hierarchy:

  • Review papers in and of themselves are not particularly expensive.
  • However, they cannot be done until numerous other studies have been published.
  • If you count those prerequisite studies as indirect costs, reviews take a lot of time and money.
  • The "better" studies (in terms of validity for shaping policy) are generally the more expensive and time-consuming ones.
  • The 4 main study types (cross-sectional, case-control, cohort, and RCT) each have strengths and weaknesses, and there are occasions when one design is preferred independent of cost or validity (e.g., case-control for rare diseases).
46

Conclusions on Epidemiologic Study Designs

Conclusions

🧭 Overview

🧠 One-sentence thesis

Epidemiologic study designs vary widely in cost and internal validity, with more rigorous designs generally requiring greater resources, and each design has specific strengths that make it preferable for different research questions.

📌 Key points (3–5)

  • Hierarchy of evidence: Study designs range from single case reports (lowest validity) to meta-analyses (highest validity), with increasing cost and complexity.
  • Four main study types: Cross-sectional, case-control, cohort, and randomized controlled trials (RCTs) each have distinct strengths and weaknesses.
  • Context matters: The "best" design depends on the research question—for example, case-control studies are preferred for rare diseases regardless of cost or validity rankings.
  • Common confusion: Higher validity does not always mean "always use this design"—practical considerations like disease rarity, ethics, and feasibility determine the appropriate choice.
  • Systematic reviews are valuable but costly: While individual reviews are not expensive to conduct, they require numerous prior studies, making their indirect costs substantial.

📊 Study design hierarchy

📊 The validity-cost relationship

The excerpt presents a figure showing the relationship between study types, their relative cost, and internal validity:

  • Lowest tier: Case reports (single unusual patient)
  • Middle tiers: Cross-sectional, case-control, cohort studies, and RCTs
  • Highest tier: Meta-analyses and systematic reviews

Key pattern: "Better" studies in terms of validity are generally more expensive and time-consuming, with one exception noted below.

💰 The systematic review exception

  • Systematic reviews themselves are not particularly expensive to conduct
  • However, they cannot exist until numerous other studies have been published first
  • When indirect costs (all the prerequisite studies) are included, they represent substantial time and money investment
  • Example: A meta-analysis requires dozens of completed studies before it can be performed

🔬 The four main study designs

🔬 Core study types

The excerpt identifies four primary epidemiologic study designs:

| Study Type | Key Characteristic |
|---|---|
| Cross-sectional | Snapshot at one point in time |
| Case-control | Compares cases to controls, looks backward |
| Cohort | Follows groups forward in time |
| RCT | Randomized controlled trial with intervention |

⚖️ Strengths and weaknesses

  • Each design has specific strengths and weaknesses
  • Readers of epidemiologic literature should be aware of these trade-offs
  • The excerpt emphasizes that understanding these differences is essential for interpreting research

🎯 When to choose each design

The excerpt notes there are occasions when one design is preferred independent of cost or validity considerations:

  • Case-control studies: Specifically preferred for rare diseases
  • The choice depends on the research question, not just on which design ranks "highest" in validity
  • Don't confuse: "highest validity" does not mean "always the right choice"—practical factors matter

📚 Implications for evidence use

📚 Using evidence to shape policy

  • The relative validity of different study types varies widely
  • This variation affects how their evidence should be used in policy decisions
  • With one exception (review papers that are not systematic), better studies require more resources
  • Policymakers and practitioners must understand these differences when evaluating research
47

Causes of Human Disease

Causes of Human Disease

🧭 Overview

🧠 One-sentence thesis

Any given case of human disease arises from multiple factors working together rather than from a single specific cause, and understanding this multicausality changes how we interpret disease prevention and the strength of individual risk factors.

📌 Key points (3–5)

  • Multicausality principle: Every case of disease has multiple causes acting in concert, not one specific cause.
  • Timing matters: Not all causes act at the same time; they accumulate throughout a person's life.
  • Many pathways to disease: Different collections of exposures can fill up a person's "jar" and cause the same disease in different people.
  • Common confusion: "Strong" vs "weak" causes—the apparent strength of a cause depends on the prevalence of other causes in the population, not just the measure of association.
  • Action implication: We don't need to identify all possible causes before taking preventive action; eliminating even one contributing cause can prevent some cases.

🫙 The jar model of disease causation

🫙 How the model works

The jar model: Think of disease risk as a jar that fills with liquid from adverse exposures and drains from protective exposures; disease begins when the jar overflows.

  • Each person has one jar for each potential disease.
  • The size of the jar is determined by nonmodifiable characteristics: genetics, family socioeconomic status during childhood, intrauterine environment.
  • As life progresses, adverse exposures add liquid to the jar, while protective exposures drain liquid out through a bottom spigot.
  • Disease onset occurs when the jar fills to the top.

🧬 Starting jar size (nonmodifiable factors)

  • Genetics determine baseline risk—someone with high genetic risk starts with a smaller jar.
  • Early-life factors also set jar size: intrauterine environment, family situation, laws and regulations during childhood.
  • Example: For breast cancer, jar size is determined by genetics, prenatal exposures, childhood environment, and genetically determined age at menarche and menopause.

⚖️ Filling and draining throughout life

  • Adverse exposures (add liquid): alcoholic drinks, hormonal birth control (both associated with increased breast cancer risk).
  • Protective exposures (drain liquid): physical activity, pregnancy (both associated with reduced breast cancer risk).
  • The balance of these exposures over time determines when—or if—the jar overflows.

🔄 Why different people have different outcomes

  • Someone with a smaller jar (strong family history) can withstand fewer adverse exposures before disease starts.
  • Each person has a slightly different set of exposures raising or lowering their jar level.
  • Example: Lung cancer can arise in nonsmokers (other exposures filled their jars), and some lifelong smokers never develop lung cancer (their jars were big enough that even thousands of cigarettes didn't fill them).
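
A toy simulation makes the metaphor concrete (all numbers invented; the model, not the numbers, is the point): each person gets a jar size from nonmodifiable factors, exposures fill or drain the jar year by year, and disease onset is the year the jar overflows.

```python
import random

def years_to_onset(jar_size, fill, drain, max_years=80):
    """Toy jar model: return age at disease onset, or None if the jar never fills."""
    level = 0.0
    for year in range(1, max_years + 1):
        level += fill(year)      # adverse exposures add liquid
        level -= drain(year)     # protective exposures drain some back out
        level = max(level, 0.0)  # the jar can't be emptier than empty
        if level >= jar_size:
            return year          # disease begins when the jar overflows
    return None

# Two people with similar exposure patterns but different jar sizes
# (e.g., different genetic risk):
adverse    = lambda _year: random.uniform(0.5, 1.5)  # mean fill ~1.0 per year
protective = lambda _year: random.uniform(0.0, 1.0)  # mean drain ~0.5 per year

random.seed(1)
print("small jar (high risk):", years_to_onset(30, adverse, protective))
print("large jar (low risk): ", years_to_onset(80, adverse, protective))
```

With the same exposure distribution, the smaller jar overflows decades earlier, and the larger one may never fill at all, like the lifelong smoker who never develops lung cancer.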

🧩 Three core tenets of human disease causality

🧩 Tenet #1: Multicausality

All cases of disease have multiple causes.

  • There is no single specific cause per se; rather, a multitude of factors work in concert.
  • Various theoretical models describe this: sufficient component cause model ("causal pies"), social-ecologic model, web of causation.
  • All models share the idea of multicausality despite differing in details.

⏰ Tenet #2: Causes act at different times

Not all causes act at the same time.

  • Causes accumulate throughout a person's life, not all at once.
  • Early-life factors (jar size) interact with later exposures (filling/draining).
  • The timing and sequence of exposures matter for disease onset.

🛤️ Tenet #3: Multiple pathways to the same disease

There are many different ways a jar could get filled; many different collections of exposures can cause a case of disease.

  • Different people can develop the same disease through different combinations of exposures.
  • No single "necessary" exposure exists for most diseases.
  • Example: Smoking causes lung cancer, but lung cancer also occurs in nonsmokers through other exposure combinations.

📊 Practical implications for public health

🎯 We don't need complete causal knowledge to act

  • We don't need to identify all possible causes before taking preventive action.
  • If we're reasonably sure an exposure contributes to disease in some people, eliminating it will prevent some cases.
  • Example: Knowing smoking contributes to lung cancer in some people is enough justification to eliminate smoking exposure, even though not all lung cancer cases include smoking in their causal pathway.

⏱️ Prevention means delaying, not eliminating disease

A preventive factor is one that either prevents disease altogether or delays disease onset for some length of time.

  • Disease is "caused" once it begins (when the jar overflows).
  • We often talk about "preventing death" from various causes, but we cannot prevent death—only delay it.
  • Prevention works by keeping jars from filling or slowing the rate of filling.

💪 Rethinking "strong" vs "weak" causes

  • People often call causes "strong" or "weak" based on population-level measures of association (e.g., odds ratios).
  • Don't confuse: This classification only works if the prevalence of all other causes stays constant.
  • Example: In the US, smoking has odds ratios around 40.0 for lung cancer; radon has odds ratios around 1.5–3.0. If we eliminate smoking, radon will suddenly look like a much stronger cause of lung cancer—not because radon changed, but because the comparison changed.

📉 The attributable fraction problem

Attributable fractions supposedly quantify the proportion of cases caused by—or "attributed to"—a particular exposure.

  • Because each disease case has multiple causes filling its jar, attributable fractions for all possible causes will sum to well over 100%.
  • This makes attributable fractions "rather less than useful" as a measure of association.
  • The multicausal nature of disease means you cannot cleanly partition cases into single-cause buckets.
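
The excerpt gives no formula, but the classic one (Levin's population attributable fraction, standard background rather than anything from the text) shows where the over-counting comes from: it is computed one exposure at a time, so a case with several contributing causes is "attributed" separately to each of them. Here p_e is the prevalence of the exposure and RR the relative risk associated with it:

```latex
% Levin's population attributable fraction (standard background, not from the excerpt);
% p_e = prevalence of the exposure, RR = relative risk associated with it.
\mathrm{PAF} = \frac{p_e \, (RR - 1)}{1 + p_e \, (RR - 1)}
```

Compute this separately for every known cause of a disease and the results routinely sum past 100%, which is exactly the excerpt's complaint.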

🔬 Connection to epidemiologic research

🔬 Difficulty proving causation

  • The excerpt mentions it is difficult to use epidemiologic studies to "prove" an exposure/disease association is causal.
  • Randomized trials are occasionally an exception (discussed further in the source material).
  • Epidemiologic research looks for evidence that exposures and outcomes are associated, but association does not automatically mean causation given the multicausal framework.
48

Determining When Associations Are Causal in Epidemiologic Studies

Determining When Associations Are Causal in Epidemiologic Studies

🧭 Overview

🧠 One-sentence thesis

Epidemiologists must carefully assess whether statistical associations are truly causal by ruling out bias and confounding, then collectively weighing evidence through frameworks like Hill's considerations, with randomized controlled trials providing the strongest—but not always feasible—evidence.

📌 Key points (3–5)

  • The assessment sequence: first confirm the association is real (not due to bias, confounding, or chance), then evaluate whether it might be causal.
  • Collective consensus: determining causality is not done by one person or one study; the field moves toward consensus through published research, conferences, and cross-disciplinary consultation.
  • Hill's considerations are not rigid criteria: they are "things to think about" rather than a definitive checklist; some (like specificity) work well for infectious diseases but poorly for chronic diseases.
  • Common confusion—strength vs. context: a "strong" or "weak" cause depends on the prevalence of other causes in the population; eliminating one cause can make another appear stronger.
  • RCTs provide the best causal evidence: randomization eliminates confounding, but RCTs are not always feasible or ethical, and they remain vulnerable to bias and random error.

🔍 The causal assessment process

🔍 First step: ruling out artifacts

  • Before asking "is this causal?" epidemiologists must ask "is this association real?"
  • The excerpt provides a flow chart logic:
    1. Observe a statistical association between exposure and outcome.
    2. Rule out bias, confounding, and random chance.
    3. Only then begin to assess causality.
  • Example: if a study shows smoking is associated with lung cancer, first confirm the association isn't an artifact of measurement error or selection bias.

🤝 Collective, iterative consensus

  • Causality assessment requires:
    • Thorough understanding of the research question.
    • Knowledge of underlying biology and physiology.
    • Review of previous work on the topic.
  • The excerpt emphasizes this is not a solo activity:
    • Researchers publish studies, read others' work, discuss at conferences, and consult colleagues from related disciplines.
    • "Slowly, collectively, the broad public health field moves toward a consensus for a given exposure/disease causal relationship."
  • Don't confuse: one study cannot "prove" causality; consensus emerges over time across many studies and experts.

📋 Hill's considerations and their limits

📋 What Hill's considerations are

Hill's "causal considerations": a famous checklist-style list of items to think about when assessing whether an epidemiologic association is causal.

  • Hill himself was careful not to call them criteria—they are "just things to think about rather than a method for conclusively obtaining 'the' answer."
  • They are a useful exercise but "far from definitive in either direction."

⚠️ The specificity problem

  • Specificity means one cause leads to one effect.
  • Works well for infectious diseases:
    • Example: HIV causes AIDS, not also other unrelated conditions (though progression to AIDS requires additional causes like lack of antiretroviral drugs).
  • Fails for chronic diseases:
    • Smoking causes lung cancer, but also heart disease, oropharyngeal cancer, and other outcomes.
    • If we insist on specificity, smoking "cannot be a cause of lung cancer"—but we know this conclusion is untrue.
  • Don't confuse: specificity is helpful in some contexts but should not be applied rigidly across all disease types.

✅ Other Hill considerations that do apply

The excerpt gives smoking and lung cancer as an example where many considerations hold:

| Consideration | How it applies to smoking and lung cancer |
|---|---|
| Dose-response | More smoking correlates to higher risk of lung cancer |
| Biologic plausibility | Cigarettes contain known carcinogens |
| Consistency | All studies on the topic reach the same conclusion |

🧪 Randomized controlled trials and alternatives

🧪 Why RCTs provide the best causal evidence

  • RCTs offer the strongest evidence for causality, assuming correct conduct and an observed association.
  • Randomization eliminates confounding:
    • If everything is the same between the two groups except the intervention, then the intervention is almost certainly responsible for any difference in outcomes.
    • Hill's considerations include whether experimental (RCT) evidence exists on the topic.
  • But RCTs are not perfect:
    • They are not inherently free of bias or random error.
    • Readers must carefully evaluate methods and results before drawing firm conclusions.

🚫 When RCTs are not feasible

  • Numerous situations make RCTs not feasible or not ethical.
  • For these research topics, alternatives include:
    • Study design options (matching or enrolling a narrowly limited sample).
    • Statistical methods that simulate a randomized trial using observational data.

🔧 Statistical methods to simulate RCTs

The excerpt mentions two examples:

| Method | What it does |
|---|---|
| Propensity score matching | Allows matching on dozens of variables at once (not possible with conventional matching) |
| Inverse probability weighting | Each "type" of participant contributes to the final analysis according to how common they are |
  • These methods aim to mimic the balance achieved by randomization when randomization is not possible.

🧩 Understanding "strength" of causes

🧩 Why "strong" and "weak" are context-dependent

  • People often call causes "strong" or "weak," usually referring to the population-level measure of association (e.g., odds ratio).
  • This idea only works if the prevalence of all causes in the population does not change.
  • Example from the excerpt:
    • In the US, smoking has odds ratios around 40.0 for lung cancer.
    • Radon has odds ratios around 1.5 or 3.0 for lung cancer.
    • If we eliminate smoking, radon will suddenly "look like a much stronger cause of lung cancer."
  • Don't confuse: the biological mechanism of radon hasn't changed; its apparent strength changes because the context (prevalence of other causes) has changed.

🧩 The attributable fraction problem

Attributable fractions: supposedly quantify the proportion of cases that were caused by—or can be "attributed to"—a particular exposure.

  • The problem: each case of disease has multiple causes (the "jar" metaphor from earlier in the text).
  • Because of this, the attributable fractions for all possible causes will sum to well over 100%.
  • The excerpt concludes this measure is "rather less than useful."

🕰️ Disease onset and prevention

🕰️ When disease is "caused"

  • A person accumulates causes until their "jar is full," then disease begins.
  • We have no way of knowing how many causes are "enough"; it may differ for each person.
  • A disease is considered "caused" once it begins.

🕰️ What prevention means

  • A preventive factor is one that either:
    • Prevents disease altogether, or
    • Delays disease onset for some length of time.
  • Don't confuse: we often talk about "preventing death" from various causes, but we cannot prevent death—we can only delay it.
49

Methods & Considerations

Methods & Considerations

🧭 Overview

🧠 One-sentence thesis

Determining causality in epidemiology requires moving beyond statistical associations through careful assessment of study design, biological plausibility, and collective scientific consensus, with randomized controlled trials providing the strongest evidence but observational methods offering alternatives when experiments are not feasible.

📌 Key points (3–5)

  • From association to causation: epidemiologists use careful, non-definitive language ("associated with") until determining whether an association is truly causal, not an artifact of bias, confounding, or chance.
  • Hill's causal considerations: checklists like Hill's provide useful guidance (dose-response, biological plausibility, consistency) but are not definitive criteria—some considerations (e.g., specificity) fail for chronic diseases like smoking-related illnesses.
  • RCTs as gold standard: randomized controlled trials provide the best causal evidence because randomization eliminates confounding, though bias and random error remain possible.
  • Common confusion: specificity (one cause → one effect) works for infectious diseases but not chronic diseases; smoking causes multiple outcomes (lung cancer, heart disease) yet is still causal for each.
  • When RCTs aren't possible: statistical methods like propensity score matching and inverse probability weighting simulate randomized trials using observational data.

🔬 The causal assessment process

🔬 Moving from association to causation

The excerpt presents a flow chart protocol:

  1. First, determine if the association is real (not due to bias, confounding, or random chance).
  2. Only then assess whether it might be causal.
  • Epidemiologists avoid causal language initially, using phrases like "associated with," "evidence in favor of," "possible."
  • Why this matters: public health and clinical policy cannot be based on associations alone—they require causal understanding.
  • The assessment is collective, not individual: researchers publish studies, read others' work, discuss at conferences, consult across disciplines, and slowly build consensus.

🧠 What the assessment requires

Three key elements for causal assessment:

  • Thorough understanding of the research question
  • Knowledge of underlying biology and physiology
  • Familiarity with previous work on the topic

Don't confuse: this is not a solo determination but a field-wide, gradual convergence toward consensus.

📋 Hill's causal considerations

📋 What they are (and aren't)

Hill's "causal considerations": things to think about rather than a method for conclusively obtaining "the" answer.

  • The excerpt emphasizes Hill was "very careful not to call them criteria."
  • They are checklist-style lists for determining whether an epidemiologic association is causal.
  • The most famous such list in the field.
  • Useful exercise but "far from definitive in either direction."

⚠️ The specificity problem

Hill's "specificity" consideration:

One cause leads to one effect.

| Disease type | Does specificity work? | Example from excerpt |
|---|---|---|
| Infectious diseases | Yes | HIV causes AIDS (not also other things), though AIDS progression requires additional causes like lack of antiretroviral drugs |
| Chronic diseases | No | Smoking causes lung cancer AND heart disease AND oropharyngeal cancer—yet smoking certainly causes lung cancer |
  • If we insist on specificity, smoking cannot be a cause of lung cancer because it also causes other outcomes.
  • We know this conclusion is untrue: smoking certainly causes lung cancer (and likely all the other conditions too).

✅ Considerations that do work for smoking

Other items from Hill's list that apply to the smoking/lung cancer relationship:

  • Dose-response association: more smoking correlates to higher risk of lung cancer
  • Biologic plausibility: cigarettes contain compounds known to be carcinogens
  • Consistency: all studies on the topic reach the same conclusion

Example: These three considerations together build a strong causal case even though specificity fails.

🎲 Randomized controlled trials (RCTs)

🎲 Why RCTs provide the best causal evidence

Of all non-review study designs, RCTs provide the best evidence in favor of causality (assuming correct conduct and an observed association).

How randomization works:

  • The randomization process renders confounding moot.
  • If everything is the same between the two groups except for the intervention, then that intervention almost certainly is responsible for any difference in outcomes.
  • Hill's considerations include whether experimental (RCT) evidence exists on the topic.

⚠️ RCT limitations

RCTs are not inherently free of problems:

  • Bias can still occur
  • Random error is still possible

Don't confuse: "best evidence" does not mean "perfect evidence." Readers must use the same caution as with other study types and carefully evaluate methods and results before drawing firm conclusions.

🚫 When RCTs aren't feasible

Numerous situations exist where RCTs are:

  • Not feasible
  • Not ethical

For these research topics, alternatives include:

  • Study design options from chapter 7: matching or enrolling a narrowly limited sample
  • Statistical methods that simulate randomized trials using observational data

🔢 Statistical methods for causal inference

🔢 Simulating randomization with observational data

When RCTs aren't possible, statistical methods aim to simulate a randomized trial using observational data.

🎯 Propensity score matching

Allows matching on dozens of variables at once, a feat that is not possible with conventional matching protocols.

  • Conventional matching is limited in how many variables can be matched simultaneously.
  • Propensity scores overcome this limitation.

⚖️ Inverse probability weighting

Each "type" of observed participant contributes to the final analysis according to how common that type of person is in the dataset and the target population.

Example: An underweight, 80-year-old Black woman with hypertension would contribute to the analysis based on how common that specific combination of characteristics is.
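
A rough sketch of the weighting idea (a simulated toy example with a single covariate rather than dozens; an illustration of the general technique, not the book's method): model each participant's probability of exposure from their covariates, then weight them by the inverse of the probability of the group they actually ended up in.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Simulated data (illustrative only): age confounds the exposure-outcome link.
age = rng.normal(50, 10, n)
p_exposed = 1 / (1 + np.exp(-(age - 50) / 10))  # older people exposed more often
exposed = rng.binomial(1, p_exposed)
outcome = 0.05 * age + 2.0 * exposed + rng.normal(0, 1, n)  # true effect = 2.0

# Model P(exposed | covariates), i.e., the propensity score.
X = age.reshape(-1, 1)
ps = LogisticRegression().fit(X, exposed).predict_proba(X)[:, 1]

# Weight each person by 1 / P(their observed exposure group):
# exposed people by 1/ps, unexposed by 1/(1 - ps).
ipw = np.mean(exposed * outcome / ps) - np.mean((1 - exposed) * outcome / (1 - ps))
naive = outcome[exposed == 1].mean() - outcome[exposed == 0].mean()

print(f"naive difference: {naive:.2f}   IPW estimate: {ipw:.2f}   (truth: 2.00)")
```

The naive comparison is inflated by confounding (exposed people are older), while the weighted comparison recovers something close to the true effect, approximating the balance randomization would have provided.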

📚 Scope note

  • Doctoral students in epidemiology take entire courses on such causal inference methods.
  • Additional details are beyond the scope of this book.
  • Interested students can consult recent, introductory-level article series on this topic.

🎯 Practical implications

🎯 Why causality matters for public health

Public health and clinical professionals rely on knowing whether a particular exposure causes a particular disease because intervention and policy changes depend on this knowledge.

  • Associations alone are insufficient for policy decisions.
  • Causal knowledge is required to justify interventions.

🧩 The complexity of causation

Key realities about disease causation:

  • All cases of disease have multiple causes, and these do not act simultaneously.
  • Each case of disease likely has a slightly different mix of contributing causes.
  • We do not need to know all possible causes before taking action.
  • We can prevent some cases by intervening on even just a single known cause.

Example metaphor (implied): Think of disease like jars filling—we can "stop some jars from filling" by intervening on one known cause, even without knowing every contributing factor.

50

Conclusion

Conclusion

🧭 Overview

🧠 One-sentence thesis

Public health and clinical professionals need to determine causality from epidemiologic research to guide interventions and policy, but this requires integrating biological knowledge, existing studies, and careful evaluation of evidence—even though we can act on single known causes without knowing all contributing factors.

📌 Key points (3–5)

  • Why causality matters: intervention and policy changes depend on knowing whether an exposure causes a disease.
  • What causality determination requires: knowledge of biology/physiology/toxicology, awareness of in vitro/animal studies, and careful reading of epidemiologic literature.
  • Multiple causes reality: all disease cases have multiple causes that don't act simultaneously, and each case likely has a different mix of contributing causes.
  • Common confusion: we don't need to know all possible causes before taking action—intervening on even a single known cause can prevent some cases.
  • Best evidence hierarchy: RCTs provide the strongest causal evidence (when properly conducted), but statistical methods can simulate randomized trials when RCTs aren't feasible or ethical.

🔬 Evidence requirements for causality

🔬 What professionals must integrate

Determining causality is described as "a tricky proposition" that relies on three types of knowledge:

  • Biological foundation: underlying biology, physiology, and/or toxicology
  • Laboratory evidence: awareness of any existing in vitro or animal studies
  • Epidemiologic literature: careful readings of existing epidemiologic research on the topic

The excerpt emphasizes that all three domains must be considered together, not in isolation.

🎯 The smoking example revisited

The excerpt uses smoking and lung cancer to illustrate how multiple Hill criteria apply:

| Hill criterion | How smoking meets it |
|---|---|
| Dose-response | More smoking correlates to higher lung cancer risk |
| Biologic plausibility | Cigarettes contain known carcinogens |
| Consistency | All studies reach the same conclusion |

Don't confuse: The excerpt notes that smoking does not meet Hill's specificity criterion (because it causes multiple diseases, not just lung cancer), yet we know smoking certainly causes lung cancer—showing that not all criteria must be met.

🧪 Study design and causal inference

🧪 Why RCTs are strongest

RCTs provide the best evidence in favor of causality, assuming these studies were correctly conducted and showed an association.

How randomization helps:

  • The randomization process renders confounding moot
  • If everything is the same between the two groups except the intervention, then that intervention is almost certainly responsible for any difference in outcomes
  • Hill's considerations include whether experimental (RCT) evidence exists on the topic

Important caveat: RCTs are not inherently free of bias or random error. Readers must use the same caution as with other study types and carefully evaluate methods and results before drawing firm conclusions.

🔧 When RCTs aren't possible

The excerpt acknowledges "numerous situations in which RCTs are not feasible or not ethical."

Alternative approaches mentioned:

  • Matching or enrolling a narrowly limited sample (from chapter 7)
  • Statistical methods that simulate a randomized trial using observational data

Examples of causal inference methods:

| Method | What it does |
|---|---|
| Propensity score matching | Allows matching on dozens of variables at once (not possible with conventional matching) |
| Inverse probability weighting | Each "type" of participant contributes to analysis according to how common that type is in the dataset and target population |

Example: An 80-year-old Black woman with hypertension who is underweight would contribute to the final analysis based on how common that specific combination of characteristics is.

Scope note: Doctoral students in epidemiology take entire courses on causal inference methods; details are beyond this book's scope.

🧩 The nature of disease causation

🧩 Multiple causes reality

All cases of disease have multiple causes, and these do not act simultaneously; each case of disease likely has a slightly different mix of contributing causes.

What this means:

  • No single cause operates alone
  • Causes don't all act at the same time
  • Individual variation exists—different people with the same disease may have reached that outcome through different causal pathways

🎯 The practical implication

We do not need to know all possible causes before taking action, as we can prevent some cases (stop some jars from filling) by intervening on even just a single known cause.

Key insight: Incomplete causal knowledge is not a barrier to action.

  • The "jars filling" metaphor suggests disease develops through accumulation of causal factors
  • Blocking even one contributing cause can prevent some disease cases
  • Public health interventions don't require complete understanding of all causal pathways

Don't confuse: "We can prevent some cases" ≠ "we can prevent all cases"—the excerpt is clear that single-cause interventions have partial, not complete, preventive effects.

51

How to Read an Epidemiologic Study

Introduction

🧭 Overview

🧠 One-sentence thesis

Epidemiologic studies follow a standard four-part structure—Introduction, Methods, Results, Discussion—that guides readers from the research gap through the study design to findings and interpretation.

📌 Key points (3–5)

  • Standard structure: Almost all epidemiology papers organize content in the same order (Introduction, Methods, Results, Discussion), whether or not these sections are explicitly labeled.
  • Introduction purpose: Establishes what is known, identifies the gap in knowledge, and states the specific study objective.
  • Methods content: Describes how the sample was obtained, inclusion/exclusion criteria, and the study design with relevant details.
  • Common confusion: Introductions are selective summaries chosen by authors, not exhaustive literature reviews—they can sometimes be biased or incomplete.

📖 The four-part structure

📖 Universal organization

  • Epidemiology studies consistently follow the same sequence: Introduction → Methods → Results → Discussion.
  • Section labels may vary (e.g., "Background" instead of "Introduction") or may be absent entirely.
  • Even unlabeled papers maintain this order.
  • This structure does not include the abstract, which is separate.

🔍 Introduction section

🔍 Three core components

The Introduction always contains three elements in sequence:

  1. What we already know about the topic
  2. What we don't know (the gap in the literature)
  3. What this study will do to address that gap

📚 Selective summary caveat

Introductions are NOT exhaustive literature reviews.

  • Authors choose which prior work to include (with some input from peer reviewers and editors).
  • This discretion means you may occasionally encounter biased or incomplete introductions.
  • Don't confuse: a selective introduction ≠ a complete review of all existing knowledge.

🎯 Finding the research question

  • The Introduction typically concludes with explicit statements like "our study question was…" or "our objective here was…"
  • This section answers: "What is the public health or clinical problem this study addresses?" and "What was their research question?"

🔬 Methods section

🔬 What Methods describe

The Methods section explains the procedures used to conduct the study.

🧪 Key components covered

Ideally, the Methods will describe:

| Component | What it includes |
|---|---|
| Sample selection | How participants were obtained from the target population or what dataset was used |
| Inclusion/exclusion criteria | Who was eligible and who was excluded, with rationales when appropriate |
| Study design | The type of study and design-specific details (e.g., how randomization was performed in a randomized controlled trial) |
  • The level of detail should allow readers to understand how the study was conducted.
  • Design-specific information varies depending on the study type.
52

Screening versus Diagnostic Testing

Screening versus Diagnostic Testing

🧭 Overview

🧠 One-sentence thesis

Screening tests identify disease in asymptomatic populations for early treatment, while diagnostic tests determine which condition a symptomatic patient has so they can be treated correctly.

📌 Key points

  • What screening is: testing people without symptoms to find disease early so it can be treated sooner.
  • What diagnostic testing is: testing symptomatic patients to figure out which condition they have from a list of possibilities (differential diagnosis).
  • Common confusion: the same test can be screening or diagnostic depending on context—mammogram without symptoms = screening; mammogram after finding a lump = diagnostic.
  • How differential diagnosis works: clinicians rule in or rule out conditions based on severity, test costs, and prevalence, prioritizing life-threatening conditions that need immediate treatment.
  • Key distinction from prevention: screening finds early disease, not prevention—it detects disease that has already started biologically but hasn't caused symptoms yet.

🔍 Screening: finding disease before symptoms

🩺 What screening means

Screening: testing an asymptomatic population for a particular condition in order to identify those who have the condition so that they can be treated early.

  • The target population has no symptoms—they are normal, everyday people.
  • The goal is to catch disease at an earlier stage than waiting for symptoms to appear.
  • Public health professionals often run or support screening programs.

📋 Common screening examples

The excerpt lists several routine screening programs in the US:

  • Cancer screenings: mammograms, pap smears, skin checks for high melanoma risk
  • Hypertension screening at doctor visits (routine blood pressure checks)
  • Hearing, vision, and dental screening in elementary schools
  • Annual tuberculosis and HIV screening for healthcare workers

⏰ When screening happens in disease progression

The excerpt describes the natural course of disease in stages:

  1. Biological onset: the disease starts (e.g., first cancerous mutation, virus begins replicating)—this is not observable
  2. Symptoms severe enough to seek treatment: the person goes to a clinic, emergency room, or pharmacy
  3. Outcome: they get better or they don't

Screening happens between biological onset and symptom-seeking.

  • It finds disease that has already started biologically but hasn't caused noticeable symptoms yet.
  • Don't confuse: screening is not primary prevention—it finds early disease, not prevents disease from starting.

🩹 Diagnostic testing: identifying the condition in symptomatic patients

🔬 What diagnostic testing means

Diagnostic testing: performed on a patient who is symptomatic in order to determine what condition they have.

  • The patient already has complaints or symptoms.
  • The purpose is to figure out which condition is causing the symptoms.
  • Clinicians use a process called differential diagnosis.

🧩 How differential diagnosis works

Differential diagnosis: the doctor, nurse practitioner, or other healthcare provider takes all known information from the patient's history and physical exam and decides what could be wrong.

The process:

  • List possible conditions that could explain the symptoms
  • Administer diagnostic tests designed to rule in or rule out each condition
  • Note: questions can be diagnostic tests (e.g., asking about recent head trauma)

📝 Example: 24-year-old with visual disturbances and severe headache

The excerpt walks through a detailed scenario:

Possible conditions (differential diagnosis list):

  • Concussion
  • Migraine with aura
  • Hemorrhagic stroke
  • Meningitis
  • Brain tumor

Testing sequence:

  1. Rule out concussion: Ask about recent head/neck trauma—if patient denies trauma (no contact sports, falls, accidents in last 24 hours), concussion is ruled out
  2. Rule out hemorrhagic stroke: Spinal tap to check for blood in cerebrospinal fluid—if clear, stroke is ruled out
  3. Rule out meningitis: Test cerebrospinal fluid—if clear, meningitis is ruled out
  4. Assume migraine: If all serious conditions ruled out, treat for migraine
  5. Return to list if needed: If patient doesn't improve in 24–48 hours, test for brain tumor

⚖️ What determines testing order

The excerpt identifies three factors:

  • Relative severity: life-threatening conditions tested first
  • Costs associated with tests: more expensive tests may be delayed if condition is less likely
  • Prevalence: rarer conditions may be tested later

Example from the scenario:

  • Stroke tested early even though rare in young people, because delaying treatment causes harm
  • Migraine can wait an hour without causing long-term disability (unpleasant but not dangerous)
  • Brain tumor very rare in 20s and delaying treatment by 24 hours doesn't matter, so tested later

🔄 Context determines screening vs. diagnostic

🎯 Same test, different purposes

The excerpt emphasizes that the distinction depends on whether the patient has symptoms:

| Scenario | Test type | Why |
|---|---|---|
| No symptoms of breast cancer, get mammogram | Screening test | Testing asymptomatic population |
| Find lump in breast, doctor orders mammogram | Diagnostic test | Patient is symptomatic |
  • The physical test (mammogram) is identical.
  • The classification changes based on the clinical context.
  • Don't confuse: it's not about the technology—it's about whether symptoms are present.
53

Disease Critical Points and Other Things to Understand about Screening

Disease Critical Points and Other Things to Understand about Screening

🧭 Overview

🧠 One-sentence thesis

Screening is only useful when a disease has a critical point that falls between when screening can detect it and when patients seek treatment for symptoms, because only then does early detection improve outcomes.

📌 Key points (3–5)

  • Screening vs diagnosis: the same test can be screening (used on asymptomatic people) or diagnostic (used on symptomatic people), depending on context.
  • Screening is not primary prevention: screening finds early disease but does not prevent the disease from occurring; it may prevent poor outcomes (secondary prevention).
  • Critical point concept: every disease has a critical point—treat before it and you can change the outcome; treat after it and treatment has no effect.
  • Common confusion: screening is only beneficial when the critical point lies between screening detection and symptom-driven treatment seeking; if the critical point is too early or too late, screening wastes resources or causes harm.
  • Criteria for successful screening: a test must exist that detects pre-symptom disease, and the condition must be prevalent enough or serious enough to justify population-level screening.

🔬 Screening vs diagnostic testing

🔬 Context determines the label

Screening test: a test done in asymptomatic populations.

Diagnostic test: a test done in symptomatic populations.

  • The same physical test (e.g., a mammogram) can be either screening or diagnostic.
  • What matters is whether the person has symptoms when the test is ordered.
  • Example: A mammogram with no symptoms → screening. A mammogram after finding a lump → diagnostic.

🚫 Screening is not primary prevention

  • Screening finds disease that has already started (biological onset has occurred).
  • It does not prevent the disease from happening in the first place.
  • Screening may prevent poor outcomes (secondary prevention) by catching disease early, but it does not stop biological onset.
  • Don't confuse: primary prevention (stopping disease from occurring) vs secondary prevention (catching disease early to improve outcomes).

⏳ Natural history of disease and critical points

⏳ Three stages of disease progression

The natural course of a medical condition has three stages:

  1. Biological onset: the first mutation, the first viral replication, etc.; not observable.
  2. Treatment seeking: symptoms become severe enough that the person seeks help (clinic, emergency room, pharmacy).
  3. Outcome: the person either gets better or does not.

🔴 The critical point

Critical point: the moment in a disease's progression after which treatment no longer changes the outcome.

  • Treat before the critical point → you can cure the patient, extend life, or improve quality of life.
  • Treat after the critical point → treatment has no effect; further aggressive treatment may even be harmful (this is why hospice exists).
  • The timing of the critical point determines whether screening is useful.

🎯 When screening is useful (and when it is not)

❌ Critical point too late (Figure 11-3)

  • If the critical point occurs well after people seek treatment for symptoms, screening does not help.
  • By the time symptoms appear, there is still plenty of time to treat successfully.
  • Screening in this scenario wastes resources without causing physical harm.
  • Example: Prostate cancer and breast cancer probably fall into this category for most people (except high-risk groups); routine screening for most men and women may not be beneficial.

❌ Critical point too early (Figure 11-4)

  • If the critical point occurs before screening can detect the disease, screening is also not useful.
  • By the time screening detects the disease, it is already too late to treat.
  • Screening in this scenario causes emotional harm: people know they have an untreatable disease for a longer time.
  • Exception: For highly contagious conditions, we might still screen—not to treat the person, but so they can take precautions and not spread the disease.
  • Example: Before antiretroviral drugs existed, high-risk populations were screened for HIV to prevent transmission, not to cure.

✅ Critical point between screening and symptom-driven treatment (Figure 11-5)

  • Screening is most useful when the critical point lies between when screening can detect the disease and when patients seek treatment.
  • In this scenario, some patients will present with symptoms too late to treat, but screening can catch the disease early enough to help.
  • Even in this scenario, we might not implement population-wide screening if the disease is rare or the test is very expensive.
  • Key takeaway: Only consider screening when early detection makes a difference in outcomes.

📋 Criteria for a successful screening program

📋 Two main criteria

  1. A test must exist that can detect early, pre-symptom disease.

    • This is not always the case.
    • Example: We don't screen for ovarian cancer partly because no such test currently exists.
  2. The condition must be prevalent enough and/or the costs of not treating it must be high enough to justify population-level screening.

    • If prevalence is very low, a screening program may not be worth the resources.
    • Example: Ovarian cancer prevalence is very low, so screening would arguably not be cost-effective.

🧪 Test characteristics (introduction)

The excerpt introduces four test characteristics used to quantify test accuracy:

| Test characteristic | Type | Behavior with prevalence change |
|---|---|---|
| Sensitivity | Fixed | Does not change |
| Specificity | Fixed | Does not change |
| Positive predictive value (PPV) | Variable | Changes with prevalence |
| Negative predictive value (NPV) | Variable | Changes with prevalence |
  • Sensitivity and specificity are "fixed test characteristics" because they do not change regardless of disease prevalence.
  • PPV and NPV change when disease prevalence in the underlying population changes.
  • Calculation requires arranging data in a 2×2 table with test result (positive/negative) on one axis and disease status (present/absent) on the other.
54

Accuracy of Screening and Diagnostic Tests

Accuracy of Screening and Diagnostic Tests

🧭 Overview

🧠 One-sentence thesis

Four test characteristics—sensitivity, specificity, positive predictive value, and negative predictive value—quantify how accurately a screening or diagnostic test identifies disease, with the first two remaining constant regardless of prevalence while the latter two change as disease prevalence changes.

📌 Key points (3–5)

  • Fixed vs. variable characteristics: Sensitivity and specificity do not change with disease prevalence, but PPV and NPV do change when prevalence changes.
  • What each characteristic measures: Sensitivity measures the probability of testing positive given disease presence; specificity measures the probability of testing negative given disease absence; PPV measures the probability of having disease given a positive test; NPV measures the probability of not having disease given a negative test.
  • Common confusion: Sensitivity (probability of T+ given D+) is different from PPV (probability of D+ given T+)—the order of conditioning matters.
  • Clinical application rules: SpIN (high specificity rules in when positive) and SnOUT (high sensitivity rules out when negative) guide test selection for ruling in or ruling out disease.
  • Prevalence impact: As disease becomes rarer, PPV decreases and NPV increases, making positive results less reliable in low-prevalence populations.

📊 The 2×2 screening table

📊 Table structure and cell meanings

The screening 2×2 table arranges data with test results (T+ or T−) on the left and disease status (D+ or D−) across the top:

|  | D+ | D− | Total |
|---|---|---|---|
| T+ | TP (true positives) | FP (false positives) | TP+FP |
| T− | FN (false negatives) | TN (true negatives) | FN+TN |
| Total | TP+FN | FP+TN | TP+FP+TN+FN |
  • True positives (TP): individuals who have the disease and test positive.
  • False positives (FP): individuals who test positive but do not actually have the disease.
  • False negatives (FN): individuals who have the disease but test negative.
  • True negatives (TN): individuals who do not have the disease and test negative.

🔬 How to obtain the data

Gold-standard diagnostic method: the definitive, most accurate way to determine whether someone truly has the disease.

  • Administer both the test being evaluated and a gold-standard diagnostic method to a large group of people.
  • Example: For Alzheimer's disease, the gold standard is the presence of certain brain plaques observable upon autopsy; researchers developed the Mini Mental State (MMS) test to screen living patients, then later confirmed disease status through autopsy data.
  • Example: For depression, the Beck Depression Inventory (BDI) is compared against a series of visits with a mental health professional qualified to definitively diagnose depression.
  • The reason for developing a test: the test is quicker, cheaper, or more feasible than the gold standard (e.g., the BDI is self-administered and suitable for large cohort studies, whereas clinician-mediated diagnosis is untenable for large or geographically disparate populations).

📈 Calculating prevalence from the table

Prevalence in the sample = (everyone with disease) / (everyone in the sample) = (TP+FN) / (total sample size).

🔍 Fixed test characteristics: Sensitivity and Specificity

🎯 Sensitivity (Sn)

Sensitivity: the probability that a patient tests positive given that they have the disease.

  • In probability notation: Sn = P(T+|D+) = TP / (TP+FN).
  • The denominator (TP+FN) is all individuals with the disease; the numerator (TP) is those who tested positive.
  • Expressed as a percentage.
  • Why it's fixed: sensitivity does not change if the prevalence of the condition in the sample changes.

🎯 Specificity (Sp)

Specificity: the probability that a patient tests negative given that they do not have the disease.

  • In probability notation: Sp = P(T−|D−) = TN / (FP+TN).
  • The denominator (FP+TN) is all individuals without the disease; the numerator (TN) is those who tested negative.
  • Expressed as a percentage.
  • Why it's fixed: specificity does not change if the prevalence of the condition in the sample changes.

📚 Published values and clinical use

  • Sensitivity and specificity values are published when new tests become available.
  • Clinicians use these values to decide what tests to order.

🩺 Clinical decision rules: SpIN and SnOUT

🔐 SpIN: High specificity rules in

SpIN: a test with high specificity, when positive, rules IN.

  • Why it works: The denominator for specificity is FP+TN (all individuals without disease), and the numerator is just TN. If specificity is high (close to 100%), then FP must be very low.
  • Implication: A patient with a positive result on a highly specific test is probably a true positive.
  • When to use: Clinicians choose a test with high specificity when they want to rule in a disease.

🔓 SnOUT: High sensitivity rules out

SnOUT: a test with high sensitivity, when negative, rules OUT.

  • Why it works: The denominator for sensitivity is TP+FN (all individuals with disease), and the numerator is just TP. If sensitivity is near 100%, then by definition there are few false negatives.
  • Implication: A negative result on a highly sensitive test is almost certainly a true negative.
  • When to use: Clinicians choose a test with high sensitivity when they want to rule out a disease (e.g., concussion or stroke).

🧪 Screening programs

  • For screening purposes, we test an asymptomatic population—we want to minimize false negatives.
  • Why: We wouldn't want to tell someone that they are disease-free if they're really not.
  • Result: Screening programs utilize tests with high sensitivities.

🔮 Variable test characteristics: PPV and NPV

✅ Positive Predictive Value (PPV)

Positive predictive value: the probability that you actually have the disease given that you tested positive.

  • In probability notation: PPV = P(D+|T+) = TP / (TP+FP).
  • Expressed as a percentage.
  • Don't confuse with sensitivity: Sensitivity is P(T+|D+), while PPV is P(D+|T+)—the order of conditioning is reversed.

❌ Negative Predictive Value (NPV)

Negative predictive value: the probability that you do not have the disease given that you tested negative.

  • In probability notation: NPV = P(D−|T−) = TN / (FN+TN).
  • Expressed as a percentage.

🔄 How prevalence affects PPV and NPV

  • Key difference from sensitivity/specificity: PPV and NPV do change as the prevalence of disease in the sample changes.
  • General pattern: As prevalence decreases, PPV will decrease and NPV will increase.
  • Intuitive explanation: As a condition becomes more rare, guessing that the patient does not have the disease becomes more and more likely to be correct.

🩺 Clinical interpretation

  • PPV and NPV are used to interpret test results once those results are known.
  • Important requirement: You must know something about the prevalence of disease in the target population to which an individual belongs before you can interpret their test results.
  • Example: If a patient tests positive for tuberculosis (TB) and the prevalence of TB in the patient's population is 10%, and the PPV given a 10% prevalence is 52.6%, then the interpretation is: there is a 52.6% chance that this patient has TB (and a 47.4% chance that the result was a false positive).

🚨 Real-world impact: Mammography recommendations

  • 2009 USPSTF recommendation: Women in their 40s should stop being screened for breast cancer unless they are extremely high-risk.
  • Rationale: The prevalence of disease among women in their 40s is very low (0.98%), so the PPV in this population is well under 1%.
  • Consequence: Greater than 99% of women who are sent for follow-up testing (breast biopsy, usually) are false positives and thus undergo this expensive, invasive follow-up (with its corresponding emotional stress) unnecessarily.
  • Exception: For women with a strong family history and/or who are known to be BRCA-1 or BRCA-2 carriers, mammography for 40-year-olds is still warranted—because these women come from an underlying population in which the prevalence (and thus the PPV) is much higher.

🧮 Worked example: Anemia test

📋 The scenario

A new test for anemia is developed that does not require a finger stick to obtain blood (no one likes needles!)—perhaps using a scanner that can detect hemoglobin levels through the thin skin on the underside of a wrist. The following 2×2 table is published:

|       | D+  | D−  | Total |
|-------|-----|-----|-------|
| T+    | 101 | 15  | 116   |
| T−    | 18  | 866 | 884   |
| Total | 119 | 881 | 1000  |

  • Test results (T+ or T−) come from the wrist scanner.
  • Disease results (D+ or D−) come from the usual method of diagnosing anemia, which requires a blood draw.

🔢 Sample calculations

Sensitivity = TP / (TP+FN) = 101 / 119 = 84.9%

Specificity = TN / (FP+TN) = 866 / 881 = 98.3%

Positive Predictive Value = TP / (TP+FP) = 101 / 116 = 87.1%

Negative Predictive Value = TN / (FN+TN) = 866 / 884 = 98.0%

Prevalence = (TP+FN) / total = 119 / 1000 = 11.9%
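
These calculations can all be reproduced from the four cells of the published table; a quick check in Python:

```python
# Cells of the published 2x2 table for the wrist-scanner anemia test.
tp, fp = 101, 15    # tested positive: truly anemic / not anemic
fn, tn = 18, 866    # tested negative: truly anemic / not anemic
total = tp + fp + fn + tn   # 1000

print(f"Sensitivity = {tp / (tp + fn):.1%}")     # 84.9%
print(f"Specificity = {tn / (fp + tn):.1%}")     # 98.3%
print(f"PPV         = {tp / (tp + fp):.1%}")     # 87.1%
print(f"NPV         = {tn / (fn + tn):.1%}")     # 98.0%
print(f"Prevalence  = {(tp + fn) / total:.1%}")  # 11.9%
```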

🔄 Adjusting for different prevalence

  • Scenario: A patient from a population with lower prevalence of anemia than 11.9%—adolescent males, for example, in whom the prevalence is around 1%—takes the new test and tests positive.
  • Problem: The above PPV (87.1%) no longer applies because it was calculated for a population with 11.9% prevalence.
  • Solution: Since we know the sensitivity (84.9%) and specificity (98.3%), we can create a new 2×2 table for a population with 1% prevalence, from which we can calculate a new PPV for this lower prevalence population.
  • Method: Begin by deciding (arbitrarily) that we will again have 1,000 people in the table, then use the known sensitivity and specificity to fill in the cells based on the new prevalence.
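
A sketch of that method in Python (the rebuilt cell counts come out fractional, which is fine for computing rates):

```python
# Rebuild the 2x2 table for a population with 1% prevalence,
# holding sensitivity and specificity fixed.
n = 1000                 # arbitrary table total
prev = 0.01              # new prevalence (e.g., adolescent males)
se, sp = 0.849, 0.983    # fixed test characteristics

diseased = n * prev      # 10 people with anemia
healthy = n - diseased   # 990 without

tp = diseased * se       # 8.49 true positives
fn = diseased - tp       # 1.51 false negatives
tn = healthy * sp        # 973.17 true negatives
fp = healthy - tn        # 16.83 false positives

print(f"New PPV = {tp / (tp + fp):.1%}")  # New PPV = 33.5%
```
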
55

Evaluating Research Quality and Epidemiological Study Designs

Example

🧭 Overview

🧠 One-sentence thesis

Research quality depends on appropriate authorship expertise, transparent conflict-of-interest disclosure, and genuine peer review, while epidemiological study designs differ fundamentally in how they select participants and measure disease-exposure relationships.

📌 Key points (3–5)

  • Author expertise matters: the author list should include specialists relevant to the topic (e.g., cardiologists for heart studies, statisticians for complex methods).
  • Funding and conflicts illuminate bias: funding sources (e.g., formula companies funding breastfeeding studies) reveal potential conflicts of interest.
  • Peer review takes time: submission-to-acceptance intervals shorter than six weeks suggest the peer review process may have been bypassed.
  • Common confusion—case-control vs cohort: case-control studies select by disease status first, then look back at exposure; cohort studies select by exposure (or representative sample) and follow forward for disease.
  • Bias types differ in mechanism: selection bias is about who you get/miss; misclassification bias is about putting people in the wrong category.

🔍 Assessing research credibility

👥 Author expertise

  • Check whether the author list matches the study's needs.
  • Mismatch signals:
    • A congestive heart failure treatment study without a cardiologist author.
    • A study using advanced statistical methods (beyond basic logistic or linear regression) but no authors with specialized statistical training.
  • Example: If a study applies complex modeling but only clinicians are listed, the statistical validity may be questionable.

💰 Funding and conflicts of interest

  • Usually found on the first page, last page, or between the conclusion and references.
  • Why it matters: funding sources can reveal incentives to reach particular conclusions.
  • Example: A study questioning breastfeeding benefits funded by the International Formula Council—the funder has a commercial interest in formula sales.

⏱️ Peer review timeline

  • Journals often list:
    • Submission date
    • Revised form received date (after initial review and author revisions)
    • Acceptance date
  • Red flag: If the submission and acceptance dates are less than six weeks apart (a thorough review cycle realistically takes at least six months), the peer review process may have been circumvented.
  • Don't confuse: a short timeline doesn't always mean fraud, but it raises questions about thoroughness.

📊 Core epidemiological concepts

📋 2x2 table

A convenient way for epidemiologists to organize data, from which one calculates either measures of association or test characteristics.

  • It is a fundamental organizing tool, not a specific measure.
  • Used to cross-classify exposure and outcome status.

📏 Measures of association

| Type     | How calculated                                                         | Example         |
|----------|------------------------------------------------------------------------|-----------------|
| Absolute | Fundamentally by subtraction                                           | Risk difference |
| Relative | Fundamentally by division (standard, though not detailed in the excerpt) | Risk ratio      |

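A short sketch of the two arithmetic styles (the risks are hypothetical; the relative row is standard epidemiology, not detailed in the excerpt):

```python
# Absolute measure: subtraction. Relative measure: division.
risk_exposed = 0.20      # hypothetical risk among the exposed
risk_unexposed = 0.05    # hypothetical risk among the unexposed

risk_difference = risk_exposed - risk_unexposed   # absolute measure
risk_ratio = risk_exposed / risk_unexposed        # relative measure

print(f"{risk_difference:.2f} {risk_ratio:.1f}")  # 0.15 4.0
```
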
  • Attributable fraction is misleading:
    • Claims to quantify the proportion of disease cases "attributed" to one exposure.
    • Problem: every disease has multiple causes, so attributable fractions for all relevant exposures sum to well over 100%.
    • Result: uninterpretable.

🧪 Baseline

The start of a cohort study or randomized controlled trial.

  • Marks the point from which follow-up begins.

🧩 Understanding bias

🎯 Selection bias

Systematic error stemming from poor sampling, poor response rate, differential treatment of groups, and/or unequal loss to follow-up.

  • Root causes:
    • Your sample is not representative of the target population.
    • Poor response rate from those invited.
    • Treating cases and controls (or exposed/unexposed) differently.
    • Unequal loss to follow-up between groups.
  • How to assess: Ask "Who did they get, and who did they miss?" Then ask "Does it matter?"
  • Sometimes the missing group doesn't affect conclusions; other times it does.

📦 Misclassification bias

Something (exposure, outcome, or confounder) was measured improperly, putting people into the wrong box in a 2x2 table.

  • Causes:
    • People unable to tell you something (e.g., don't remember).
    • People unwilling to tell you something (social desirability).
    • Objective measure systematically wrong (e.g., blood pressure cuff not zeroed correctly—always off in the same direction).
  • Subtypes mentioned:
    • Recall bias
    • Social desirability bias
    • Interviewer bias
  • All are examples of misclassification bias.

🔀 Differential vs non-differential misclassification

| Type             | Definition                                                              | Impact                                           |
|------------------|-------------------------------------------------------------------------|--------------------------------------------------|
| Non-differential | Both exposed and unexposed have an equal chance of being put in the wrong box | Misclassification is equally distributed across groups |
| Differential     | The chance of misclassification differs between groups                  | Misclassification varies by group                |

🔬 Study design: Case-control studies

🏗️ How case-control studies work

An observational study that begins by selecting cases (people with the disease) from the target population, then selects controls (people without the disease) from the same target population, without regard to exposure status; after selection, previous exposure(s) are determined.

Step-by-step:

  1. Select cases (people with the disease) from the target population.
  2. Select controls (people without the disease) from the same target population.
    • Important: if a control suddenly developed the disease, they would qualify as a case.
  3. Selection of both cases and controls is done without regard to exposure status.
  4. After selecting both groups, determine their previous exposure(s).

⏪ Retrospective nature

  • This is a retrospective study design (looks backward in time).
  • Consequence: more prone to recall bias than prospective designs.

🎯 When to use case-control studies

  • Necessary when:
    • The disease is rare.
    • The disease has a long induction period.
  • Example: For a rare cancer with a 20-year induction period, waiting prospectively would be impractical; instead, select current cases and matched controls, then look back at past exposures.

📊 Measure of association

  • The excerpt's sentence ("the only appropriate measure of association is...") is cut off, but the standard completion is the odds ratio: because a case-control study fixes the number of cases and controls by design, risks cannot be calculated, so the odds ratio is the only appropriate measure of association.
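
A minimal sketch of the odds-ratio calculation (the cell counts are hypothetical):

```python
# Odds ratio from a case-control 2x2 table:
# OR = (a * d) / (b * c), where
#   a = exposed cases,    b = exposed controls,
#   c = unexposed cases,  d = unexposed controls.
a, b = 40, 20    # hypothetical counts
c, d = 60, 80

odds_ratio = (a * d) / (b * c)
print(f"OR = {odds_ratio:.2f}")  # OR = 2.67
```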

⚠️ Common confusions

🔄 Selection vs classification

  • Selection bias: about sampling—who is in your study and who is missing.
  • Misclassification bias: about measurement—people are in the study but placed in the wrong category.
  • Don't confuse: both are systematic errors (bias), but they arise at different stages (recruitment vs measurement).

🕰️ Retrospective vs prospective

  • Case-control studies are retrospective: select by disease status, then look back at exposure.
  • Cohort studies and randomized controlled trials start at baseline and follow forward.
  • Don't confuse: the direction of time matters for susceptibility to recall bias.
56

Screening and Diagnostic Testing in Epidemiology

Summary

🧭 Overview

🧠 One-sentence thesis

Screening and diagnostic tests are evaluated using sensitivity and specificity before use, and positive/negative predictive values after results are known, with predictive values depending critically on disease prevalence in the target population.

📌 Key points (3–5)

  • Context distinguishes screening from diagnostic testing: the difference is whether the tested person shows symptoms, not the test procedure itself.
  • Two sets of accuracy measures: sensitivity/specificity (test characteristics used to select tests) vs. PPV/NPV (used to interpret results after testing).
  • Prevalence is essential for interpretation: you must know disease prevalence in the target population to calculate and use positive and negative predictive values.
  • Common confusion: sensitivity/specificity are fixed test characteristics, but PPV/NPV change with prevalence—the same test can have very different predictive values in different populations.
  • Low prevalence dramatically lowers PPV: even with high sensitivity and specificity, a positive test in a low-prevalence population may have poor positive predictive value.

🔬 Test characteristics vs. predictive values

🔬 Sensitivity and specificity

Sensitivity and specificity: fixed test characteristics used ahead of time to pick the correct test.

  • These measures describe how the test performs regardless of the population.
  • Sensitivity = TP / (TP+FN): true positives divided by all diseased individuals.
  • Specificity = TN / (FP+TN): true negatives divided by all non-diseased individuals.
  • The excerpt shows sensitivity of 84.9% and specificity of 98.3% remain constant across populations.

📊 Positive and negative predictive values

PPV and NPV: measures used after test results are known to interpret them.

  • PPV (Positive Predictive Value) = probability of actually having the disease given a positive test.
  • NPV (Negative Predictive Value) = probability of not having the disease given a negative test.
  • Unlike sensitivity/specificity, these depend on disease prevalence.
  • Example from excerpt: at 1% prevalence the NPV is about 99.8%, meaning a negative test virtually rules out disease; no follow-up blood draw is needed.

🔄 When to use which measure

| Measure                 | When to use                               | What it tells you                                          |
|-------------------------|-------------------------------------------|------------------------------------------------------------|
| Sensitivity/Specificity | Before testing, to select the appropriate test | How well the test performs (fixed characteristics)     |
| PPV/NPV                 | After testing, to interpret results       | Probability of disease given the test result (varies with prevalence) |

📉 The prevalence effect

📉 How prevalence changes predictive values

  • The excerpt demonstrates this with a worked example using 1% prevalence.
  • Starting with 1,000 people and 1% prevalence → 10 diseased, 990 non-diseased.
  • With 84.9% sensitivity: 8.49 true positives, 1.51 false negatives.
  • With 98.3% specificity: 973.17 true negatives, 16.83 false positives.
  • Final PPV = TP / (TP+FP) = 8.49 / (8.49 + 16.83) = 8.49/25.32 = 33.5%.

⚠️ Low prevalence, low PPV

  • Even with high sensitivity (84.9%) and high specificity (98.3%), PPV is only 33.5% at 1% prevalence.
  • This means only a 33.5% chance the tested person actually has the condition despite a positive test.
  • Don't confuse: a "good" test (high sensitivity/specificity) can still produce many false positives in low-prevalence populations.
  • Example from excerpt: the male adolescent with positive skin scan for anemia needs confirmatory blood draw because PPV is so low.

🧮 Why prevalence matters mathematically

  • Prevalence determines how many diseased vs. non-diseased people enter the calculation.
  • In low-prevalence populations, even a small false-positive rate (1.7% in the example) applied to a large non-diseased group (990 people) produces many false positives (16.83).
  • These false positives can outnumber or approach the true positives, lowering PPV.

🩺 Screening vs. diagnostic testing

🩺 The key distinction

Screening and diagnostic testing are similar procedures; the difference depends on context (whether the tested person is symptomatic or not).

  • Not about the test itself: the same procedure can be screening or diagnostic.
  • Screening: applied to asymptomatic individuals.
  • Diagnostic: applied to symptomatic individuals.
  • The excerpt emphasizes this is a contextual distinction, not a procedural one.

🎯 Practical implications

  • Prevalence differs between symptomatic and asymptomatic populations.
  • The same test will have different PPV/NPV when used for screening vs. diagnosis.
  • Example: screening a general population (low prevalence) vs. testing someone with symptoms (higher pre-test probability).