Research Methods in Psychology

1. Understanding Science

🧭 Overview

🧠 One-sentence thesis

Psychology qualifies as a science because it shares the same general approach as astronomy, biology, and chemistry—systematic empiricism, empirical questions, and public knowledge—rather than being defined by subject matter or equipment.

📌 Key points (3–5)

  • What makes something a science: not the subject matter or tools, but three fundamental features—systematic empiricism, empirical questions, and public knowledge.
  • Why psychology is a science: it applies the same general scientific approach to understanding human behavior that other sciences apply to their domains.
  • Pseudoscience vs. science: pseudoscience claims to be scientific but lacks one or more of the three features; it may ignore research, avoid publication, or make unfalsifiable claims.
  • Common confusion: falsifiable vs. unfalsifiable claims—scientific claims must be testable in a way that observations could count as evidence against them; claims that can explain any outcome are not scientific.
  • Why it matters: distinguishing science from pseudoscience protects people from harmful "treatments" and helps psychology students separate their field from pseudopsychology.

🔬 The three fundamental features of science

🔬 Systematic empiricism

Empiricism refers to learning based on observation, and scientists learn about the natural world systematically, by carefully planning, making, recording, and analyzing observations of it.

  • Not just casual observation: scientists don't rely on stereotypes or informal impressions; they plan and record observations carefully.
  • Checking ideas against reality: logical reasoning and creativity play roles, but scientists insist on testing their ideas against systematic observations.
  • Example: Mehl and colleagues didn't trust stereotypes about women talking more; they systematically recorded, counted, and compared words spoken by a large sample of women and men.
  • When observations conflict with beliefs: scientists trust their systematic observations over stereotypes or preconceptions.

❓ Empirical questions

Empirical questions are questions about the way the world actually is and, therefore, can be answered by systematically observing it.

  • What counts as empirical: questions that can be answered by observing the world—either something is true or it isn't, and observation can determine which.
  • Example: "Do women talk more than men?" is empirical because systematic observation can reveal whether they do or don't.
  • What science cannot answer: questions about values—whether things are good, bad, just, unjust, beautiful, or ugly, and how the world ought to be.
    • Whether a stereotype is accurate = empirical (science can answer).
    • Whether holding inaccurate stereotypes is wrong = value judgment (science cannot answer).
    • Whether criminal behavior has a genetic basis = empirical.
    • What actions ought to be illegal = not empirical.
  • Why this matters for psychology: researchers must be especially mindful of this distinction between empirical questions and value judgments.

📢 Public knowledge

After asking their empirical questions, making their systematic observations, and drawing their conclusions, scientists publish their work.

  • How publication works: scientists write articles for professional journals, putting their question in context, describing methods in detail, and clearly presenting results and conclusions.
  • Open access trend: increasingly, scientists publish in open access journals whose articles are freely available to everyone, allowing publicly funded research to become truly public knowledge.

Why publication is essential (two reasons):

  • Science is social: a large-scale collaboration across time and space; current knowledge is based on many studies by many researchers who have shared their work publicly over years.
  • Science is self-correcting: individual scientists know their methods can be flawed and their conclusions incorrect; publication allows others to detect and correct errors, so knowledge increasingly reflects reality.

🔄 Self-correction example: the Many Labs Replication Project

  • What it was: a large, coordinated effort by prominent psychological scientists worldwide to replicate findings from 13 classic and contemporary studies.
  • The handwashing study: Schnall and colleagues originally found that washing hands leads people to view moral transgressions as less wrong.
    • If reliable, this might explain why religious traditions associate physical cleanliness with moral purity.
  • The replication attempt: using the same materials and nearly identical procedures with a much larger sample, the Many Labs researchers could not replicate the original finding.
  • What this suggests: the original finding may have stemmed from a small sample size, which can produce unreliable results (see the sketch after this list).
  • Current status: we still cannot definitively conclude the effect doesn't exist, but the effort demonstrates the collaborative and cautious nature of scientific progress.
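
A quick way to see why small samples can mislead, as referenced in the list above: simulate studies in which the true effect is zero and watch how much the estimates bounce around at small versus large sample sizes. This is a minimal illustrative sketch (all numbers invented), not the Many Labs analysis itself.

```python
import random
import statistics

random.seed(1)

def observed_effect(n):
    """Draw two groups of size n from the SAME population (true effect = 0)
    and return the observed difference in group means."""
    group_a = [random.gauss(0, 1) for _ in range(n)]
    group_b = [random.gauss(0, 1) for _ in range(n)]
    return statistics.mean(group_a) - statistics.mean(group_b)

for n in (10, 500):
    effects = [observed_effect(n) for _ in range(1000)]
    print(f"n = {n:3d}: chance 'effects' vary with SD ~ {statistics.stdev(effects):.2f}")

# Small samples produce far more variable estimates, so a small study can
# easily show a sizable "effect" that a larger replication fails to find.
```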

🎭 Science versus pseudoscience

🎭 What pseudoscience is

Pseudoscience refers to activities and beliefs that are claimed to be scientific by their proponents—and may appear to be scientific at first glance—but are not.

Definition criteria: A set of beliefs or activities is pseudoscientific if:

  • (a) its adherents claim or imply it is scientific, but
  • (b) it lacks one or more of the three features of science.

🔮 Example: biorhythms

  • The claim: people's physical, intellectual, and emotional abilities run in cycles from birth until death.
    • Physical cycle: 23-day period.
    • Intellectual cycle: 33-day period.
    • Emotional cycle: 28-day period.
  • Practical application: if scheduling an exam, you'd want to pick a time when your intellectual cycle is at a high point.
  • Why it seems scientific: the theory has been around for over 100 years; popular books and websites use impressive, scientific-sounding terms like "sinusoidal wave" and "bioelectricity."
  • The problem: there is simply no good reason to think biorhythms exist—relevant scientific research is ignored.

🚫 How pseudoscience fails the three features

  • Systematic empiricism: no relevant scientific research exists, or existing research is ignored (as with biorhythms).
  • Public knowledge: proponents claim to have conducted research but never publish it in a way that allows others to evaluate it.
  • Empirical questions: claims are not falsifiable (see below).

⚖️ Falsifiability: Popper's criterion

Scientific claims must be falsifiable: expressed in such a way that there are observations that would—if they were made—count as evidence against the claim.

  • Falsifiable example: "Women talk more than men" is falsifiable because systematic observations could reveal either that they do or that they don't.
  • Unfalsifiable example: ESP and psychic powers.
    • Claim: psychic powers can disappear when observed too closely.
    • If a test shows better-than-chance predictions → consistent with psychic powers.
    • If a test shows no better-than-chance predictions → also consistent (powers disappeared under observation).
    • The problem: no possible observation would count as evidence against ESP.

⚠️ Why pseudoscience matters (three reasons)

  1. Clarifies science: learning about pseudoscience brings the fundamental features of science—and their importance—into sharper focus.
  2. Widespread and harmful: biorhythms, psychic powers, astrology, and other pseudoscientific beliefs are widely promoted on the Internet, TV, books, and magazines.
    • Far from harmless: believers often choose pseudoscientific "treatments" like homeopathy for serious medical conditions instead of empirically supported ones, often at great personal cost.
    • Learning what makes them pseudoscientific helps us identify and evaluate such beliefs when we encounter them.
  3. Pseudopsychology: many pseudosciences claim to explain human behavior and mental processes (biorhythms, astrology, graphology, magnet therapy for pain).
    • Psychology students must distinguish their field clearly from pseudopsychology.

📚 Examples of pseudoscience

  • Cryptozoology: the study of "hidden" creatures like Bigfoot, the Loch Ness monster, and the chupacabra.
  • Pseudoscientific psychotherapies: past-life regression, re-birthing therapy, bioscream therapy.
  • Homeopathy: treatment using natural substances diluted, sometimes to the point of no longer being present.
  • Pyramidology: odd theories about the Egyptian pyramids (e.g., that they were built by extraterrestrials or have healing and other special powers).

🧬 What sciences have in common

🧬 Not subject matter or tools

  • Different subjects: astronomers study celestial bodies, biologists study living organisms, chemists study matter and its properties.
  • Different equipment: few biologists would know what to do with a radio telescope; few chemists would know how to track a moose population in the wild.
  • The real commonality: philosophers and scientists who have thought deeply about this concluded that sciences share a general approach to understanding the natural world.

🧠 Psychology as a science

  • How it fits: psychology takes the same general approach to understanding one aspect of the natural world—human behavior.
  • Don't confuse: being a science is not about having labs or equipment; it's about the approach (systematic empiricism, empirical questions, public knowledge).

2. Scientific Research in Psychology

🧭 Overview

🧠 One-sentence thesis

Scientific research in psychology follows a cyclical model where research questions lead to empirical studies that produce published results, which in turn generate new questions, and this process is necessary because common sense and intuition often lead to incorrect beliefs about human behavior.

📌 Key points (3–5)

  • The research cycle: Questions from literature or observations → empirical study → data analysis → publication → new questions, forming a continuous loop.
  • Who does the research: Mostly doctoral-level researchers (PhDs) in universities, government agencies, and organizations, often collaborating with students.
  • Basic vs applied research: Basic research seeks understanding for its own sake; applied research addresses practical problems—though the distinction is not always clear-cut.
  • Common confusion: Folk psychology (common sense) feels accurate but is often wrong; scientific research reveals that widely-held beliefs (e.g., "venting anger relieves it") are frequently incorrect.
  • Why science is needed: Humans rely on mental shortcuts and confirmation bias, noticing only evidence that supports existing beliefs while ignoring contradictory cases.

🔄 The research cycle model

🔄 How the cycle works

The excerpt describes research as flowing through connected stages:

  • Formulate a research question
  • Conduct an empirical study
  • Analyze the data
  • Draw conclusions
  • Publish results (which become part of research literature)
  • Research literature generates new questions → cycle repeats

Where questions come from:

  • The research literature itself (most common source)
  • Informal observations
  • Practical problems needing solutions

Even when questions originate outside the cycle, researchers start by checking existing literature to see if the question has been answered and to refine it based on previous findings.

📱 Example: Cell phone use and driving

The excerpt illustrates the cycle with this case:

  • Question origin: As cell phones became widespread in the 1990s, people wondered about effects on driving
  • Literature check: Previous research showed verbal tasks impair simultaneous perceptual/motor tasks, but no specific studies on cell phones and driving existed
  • Study design: Researchers compared driving performance with and without cell phone use, both in lab and on road, under controlled conditions
  • Results: Cell phone use impaired hazard detection, reaction time, and vehicle control
  • Publication: Each new study was published and became part of growing literature on the topic
  • New questions: This work likely generated further research questions for other scientists

💬 Example: Gender and talkativeness

The excerpt references research by Mehl and colleagues:

  • Question: Are women more talkative than men? (suggested by stereotypes and published claims)
  • Literature check: Found the question had not been adequately addressed scientifically
  • Study: Conducted careful empirical study
  • Results: Found very little difference between women and men
  • Impact: Publication suggests new questions about reliability, cultural differences, etc.

👥 Who conducts psychological research

🎓 Credentials and roles

Typical researchers:

  • People with doctoral degrees (usually PhD) or master's degrees in psychology and related fields
  • Supported by research assistants with bachelor's degrees or relevant training

Where they work:

  • Majority: College and university faculty (often collaborating with graduate and undergraduate students)
  • Government agencies (e.g., Mental Health Commission of Canada)
  • National associations (e.g., Canadian Psychological Association)
  • Nonprofit organizations (e.g., Canadian Mental Health Association)
  • Private sector (e.g., product development)

🔬 Expertise areas

  • Most researchers are not trained/licensed clinicians
  • Instead, they have expertise in subfields: behavioral neuroscience, cognitive psychology, developmental psychology, personality psychology, social psychology, etc.
  • Doctoral-level researchers may conduct research full-time or combine it with teaching and service

💡 Why they do it

Motivations mentioned:

  • Professional reasons
  • Personal reasons
  • Intellectual and technical challenges
  • Satisfaction of contributing to scientific knowledge about human behavior
  • Natural curiosity about the world and behavior

For students: The excerpt notes you might enjoy the research process and find opportunities to participate as research assistants or study participants.

🎯 Types of research purposes

🔬 Basic research

Basic research in psychology is conducted primarily for the sake of achieving a more detailed and accurate understanding of human behavior, without necessarily trying to address any particular practical problem.

  • Goal: Understanding for its own sake
  • Example from excerpt: The Mehl study on gender and talkativeness falls into this category
  • Potential indirect applications: Basic research on sex differences in talkativeness could eventually affect marriage therapy practice

🛠️ Applied research

Applied research is conducted primarily to address some practical problem.

  • Goal: Solve specific practical problems
  • Example from excerpt: Research on cell phone use and driving was prompted by safety concerns and led to laws limiting the practice
  • Potential indirect contributions: Applied research on cell phones and driving could produce new insights into basic processes of perception, attention, and action

🔀 The blurry boundary

  • Primary goal: a detailed understanding (basic) vs. solving a practical problem (applied).
  • Example: gender differences in talking (basic) vs. cell phone use while driving (applied).
  • Indirect benefit: may inform practice later (basic) vs. may reveal basic processes (applied).

Don't confuse: The distinction is convenient but not always clear-cut—basic research can have practical applications, and applied research can advance theoretical understanding.

🧠 Why common sense fails

🤔 Folk psychology limitations

Folk psychology: Intuitive beliefs we all have about people's behavior, thoughts, and feelings.

The problem: Although much folk psychology is probably reasonably accurate, much of it is not.

Examples of incorrect common beliefs from the excerpt:

  • Anger can be relieved by "letting it out" (punching or screaming) → research shows this approach leaves people feeling more angry, not less.
  • No one would confess to a crime they didn't commit (unless tortured) → false confessions are surprisingly common and occur for various reasons.
  • People use only 10% of their brain power → incorrect.
  • Most people experience a midlife crisis in their 40s or 50s → incorrect.
  • Students learn best when teaching styles match learning styles → incorrect.
  • Low self-esteem is a major cause of psychological problems → incorrect.
  • Psychiatric admissions and crimes increase during full moons → incorrect.

🧩 Why we get it wrong

The excerpt identifies several contributing factors (based on psychological research):

Cognitive limitations:

  • Forming detailed and accurate beliefs requires powers of observation, memory, and analysis beyond what we naturally possess
  • Example: We cannot mentally count words spoken by women and men we encounter, estimate daily totals, average them, and compare—all in our heads

Mental shortcuts (heuristics):

  • We rely on shortcuts because detailed tracking is impossible
  • If a belief is widely shared, endorsed by "experts," and makes intuitive sense, we assume it's true

⚠️ Confirmation bias

Confirmation bias: The tendency to focus on cases that confirm our intuitive beliefs and ignore cases that disconfirm them.

How it works:

  • Once we believe women are more talkative than men, we notice and remember talkative women and silent men
  • We ignore or forget silent women and talkative men
  • This selective attention reinforces the incorrect belief

Wishful thinking:

  • We hold incorrect beliefs partly because it would be nice if they were true
  • Example: Many believe calorie-reducing diets are effective long-term treatments for obesity, but thorough scientific review shows they are not
  • People may continue believing in dieting effectiveness because it gives them hope (if obese) or makes them feel better about their own situation

🔍 The role of skepticism

The excerpt emphasizes that scientific psychology requires skepticism—questioning intuitive beliefs and testing them empirically rather than accepting them based on common sense, expert endorsement, or wishful thinking.

Why scientific method matters:

  • Natural curiosity led to science, which became the best way to achieve detailed and accurate knowledge
  • Most phenomena and theories in psychology textbooks are products of scientific research
  • Examples: Specific cortical areas for language/perception, principles of conditioning, biases in reasoning/judgment, tendency to obey authority
  • What we know now only scratches the surface of what we can know

3. Science and Common Sense

🧭 Overview

🧠 One-sentence thesis

Psychology must rely on scientific research rather than common sense because many widely held intuitive beliefs about human behavior turn out to be incorrect, and people naturally lack the cognitive abilities to form accurate beliefs without systematic observation.

📌 Key points (3–5)

  • Folk psychology limitations: Intuitive beliefs about behavior (folk psychology) are often inaccurate, as shown by research disproving common assumptions about anger release, false confessions, and brain usage.
  • Why we're wrong: Humans lack natural powers of detailed observation and memory, rely on mental shortcuts (heuristics), and fall prey to confirmation bias—noticing evidence that supports existing beliefs while ignoring contradictory cases.
  • Skepticism vs cynicism: Scientific skepticism means pausing to consider alternatives and search for evidence, not distrusting everything; it involves evaluating claims when stakes are high enough.
  • Common confusion: Skepticism ≠ cynicism or questioning every claim; it's selective critical thinking applied when warranted.
  • Tolerance for uncertainty: Scientists accept not knowing answers and view gaps in knowledge as research opportunities rather than problems.

🧩 Why common sense fails

🧩 Folk psychology errors

Folk psychology: intuitive beliefs people hold about behavior, thoughts, and feelings.

The excerpt shows folk psychology is "probably reasonably accurate" in some areas but clearly wrong in many others.

Examples of disproven beliefs:

  • Venting anger by punching or screaming reduces anger → research shows it leaves people feeling more angry
  • People only confess to crimes they committed (unless tortured) → research shows false confessions are surprisingly common
  • People use only 10% of brain power → myth
  • Most people have midlife crises in their 40s-50s → myth
  • Low self-esteem causes psychological problems → myth
  • Full moons increase psychiatric admissions and crimes → myth

🔍 Cognitive limitations

The excerpt identifies why detailed, accurate beliefs are hard to form naturally:

  • Forming accurate beliefs requires observation, memory, and analysis powers "to an extent that we do not naturally possess"
  • Example scenario: It would be "nearly impossible" to count words spoken by women and men you encounter, estimate daily averages, and compare them mentally
  • This is why people rely on mental shortcuts (heuristics) instead of systematic observation

🎯 How errors persist

🎯 Confirmation bias mechanism

Confirmation bias: the tendency to focus on cases that confirm intuitive beliefs while ignoring cases that disconfirm them.

How it works:

  • Once you believe something (e.g., "women are more talkative than men"), you notice and remember confirming cases (talkative women, silent men)
  • You ignore or forget disconfirming cases (silent women, talkative men)
  • This creates a self-reinforcing cycle that maintains incorrect beliefs

🎯 Other maintenance factors

Why we hold onto wrong beliefs:

  • Social validation: if a belief is widely shared or endorsed by "experts," we assume it's true.
  • Intuitive appeal: if it "makes intuitive sense," we accept it without evidence.
  • Wishful thinking: we believe things because "it would be nice if they were true."

Example from excerpt: People continue believing calorie-reducing diets work long-term for obesity despite scientific evidence showing they don't—partly because it gives hope or makes them feel good about their self-control.

🔬 The scientific alternative

🔬 Skepticism as attitude

Skepticism: pausing to consider alternatives and search for evidence—especially systematically collected empirical evidence—when there is enough at stake to justify doing so.

What skepticism is NOT:

  • Not cynicism or distrustfulness
  • Not questioning every belief or claim (which would be impossible)

What skepticism IS:

  • Selective critical evaluation when stakes warrant it
  • Asking whether alternative explanations exist
  • Checking what evidence supports a claim
  • Evaluating source credibility (is the author a scientific researcher? is evidence cited?)

📖 Skepticism in practice

Example from excerpt: If you read that weekly allowances help children develop financial responsibility:

  1. Pause to consider alternatives: Maybe allowances teach children to spend money or become materialistic instead
  2. Ask about evidence: What supports the claim? Is scientific evidence cited?
  3. Search the literature: If important enough, look for research studies on the topic

🤷 Tolerance for uncertainty

Tolerance for uncertainty: accepting that there are many things scientists simply do not know.

  • Scientists accept not having complete answers
  • They withhold judgment when insufficient evidence exists
  • Example: No scientific evidence exists that allowances make children financially responsible or materialistic
  • From a practical perspective, uncertainty is problematic (hard to decide what to do)
  • From a scientific perspective, uncertainty is exciting (creates research opportunities)

Don't confuse: Tolerance for uncertainty ≠ giving up on finding answers; it means recognizing current knowledge gaps while remaining open to future evidence.

🧪 Scientists' self-awareness

🧪 Vulnerability to bias

The excerpt emphasizes that "scientists—especially psychologists—understand that they are just as susceptible as anyone else to intuitive but incorrect beliefs."

  • This recognition is why they cultivate skepticism
  • They don't assume their training makes them immune to cognitive biases
  • They build systematic safeguards (empirical evidence collection) into their methods

4. Science and Clinical Practice

🧭 Overview

🧠 One-sentence thesis

Clinical psychology must rely on scientific research rather than intuition to diagnose and treat psychological problems effectively, using empirically supported treatments that have been systematically tested.

📌 Key points (3–5)

  • Clinical practice as applied science: Clinical psychology applies scientific research to help people with psychological disorders, not just intuition or "art."
  • Psychological problems are empirically testable: Questions about disorders, their causes, and treatment effectiveness can be answered through systematic observation and scientific study.
  • Empirically supported treatments: Treatments proven effective through research (e.g., cognitive behavioral therapy, exposure therapy) outperform no treatment, placebos, or alternatives.
  • Common confusion: Folk psychology vs. evidence—plausible-sounding claims (e.g., adult children of alcoholics have distinct personality profiles) are often contradicted by research.
  • Why scientific literacy matters for clinicians: Even clinicians who don't conduct research must read and evaluate studies to base treatment decisions on the best available evidence.

🔬 The scientific foundation of clinical practice

🧪 Psychological problems as natural phenomena

Clinical practice of psychology: the diagnosis and treatment of psychological disorders and related problems.

  • Psychological disorders are part of the natural world, making questions about their nature, causes, and consequences empirically testable.
  • We cannot rely on intuition or common sense for accurate answers about behavioral problems.
  • The same scientific approach used for other human behavior questions applies to clinical issues.

🚫 When intuition misleads

The excerpt provides a concrete example of how plausible beliefs fail under scientific scrutiny:

  • Popular claim: Adult children of alcoholics have a distinct personality profile (low self-esteem, powerlessness, difficulty with intimacy).
  • Research finding: Scientific studies show adult children of alcoholics are no more likely to have these problems than anyone else.
  • Lesson: Even claims that "sound plausible" and appear in dozens of books and thousands of websites can be wrong.

Example: An organization might design support programs based on the assumption that adult children of alcoholics need special interventions for intimacy issues, but research shows this targeting would be misguided.

🧪 Testing treatment effectiveness

Questions about psychotherapy effectiveness are empirically testable through systematic observation:

  • The test: Does a group receiving a particular psychotherapy improve more than a similar group that does not receive it (or receives an alternative)?
  • The standard: Improvement must be measured systematically, not based on clinician impressions or client testimonials.
  • If systematic observation shows greater improvement, the treatment earns support; if not, it doesn't.

💊 Empirically supported treatments

📋 What makes a treatment empirically supported

Empirically supported treatment: one that has been studied scientifically and shown to result in greater improvement than no treatment, a placebo, or some alternative treatment.

  • The treatment must be compared against a control condition (no treatment, placebo, or alternative).
  • Many forms of psychotherapy have strong empirical support and can be as effective as standard drug therapies.
  • This is not about theory or clinical experience alone—it's about demonstrated outcomes in systematic studies.

🗂️ Examples of supported treatments

The excerpt lists specific therapies with strong evidence for particular disorders:

  • Cognitive behavioral therapy: depression, panic disorder, bulimia nervosa, post-traumatic stress disorder.
  • Exposure therapy: post-traumatic stress disorder.
  • Behavioral therapy: depression.
  • Behavioral couples therapy: alcoholism and substance abuse.
  • Exposure therapy with response prevention: obsessive-compulsive disorder.
  • Family therapy: schizophrenia.

Don't confuse: "Empirically supported" does not mean "the only valid approach" or "works for everyone," but it does mean the treatment has passed systematic testing showing it works better than alternatives on average.

🏥 The debate in clinical psychology

⚖️ Competing views on scientific emphasis

The clinical psychology community has internal disagreement about the role of science:

One side argues:

  • The field has not paid enough attention to scientific research.
  • Clinicians fail to use empirically supported treatments often enough.
  • Changes are needed in how clinicians are trained and how treatments are evaluated and implemented.

The other side argues:

  • These claims are exaggerated.
  • The suggested changes are unnecessary.

🤝 Common ground

Despite disagreement about the extent of the problem, both sides agree on a fundamental point:

  • A scientific approach to clinical psychology is essential for diagnosing and treating psychological problems based on detailed and accurate knowledge.
  • Scientific research in clinical psychology must continue.
  • Clinicians who never conduct research themselves must be scientifically literate.

📚 Why scientific literacy matters for all clinicians

Even clinicians who focus purely on practice need scientific skills:

  • To read new research: Stay current with emerging findings about disorders and treatments.
  • To evaluate research quality: Distinguish strong studies from weak ones.
  • To make evidence-based decisions: Base treatment choices on the best available evidence, not just tradition or intuition.

Example: A clinician who is not scientifically literate might continue using a treatment that feels effective based on personal experience, even when research shows a different treatment produces better outcomes.

Don't confuse: Being scientifically literate does not mean clinicians must conduct studies themselves; it means they must be able to understand and apply research findings to their practice.

5. Basic Concepts in Psychological Research

🧭 Overview

🧠 One-sentence thesis

Research questions in psychology examine statistical relationships between variables, but correlation alone cannot prove causation because of directionality and third-variable problems that only experiments can resolve.

📌 Key points (3–5)

  • Variables are the foundation: research questions ask about quantities or qualities (quantitative or categorical) that vary across people or situations.
  • Two basic relationship forms: differences between group means and correlations between quantitative variables.
  • Correlation ≠ causation: a statistical relationship between X and Y does not prove X causes Y—Y might cause X, or a third variable Z might cause both.
  • Common confusion: journalists often misinterpret correlations as causal relationships; only experiments can establish causation by manipulating the independent variable.
  • Sampling matters: researchers study samples to draw conclusions about populations, ideally using random sampling (though convenience sampling is more common in psychology).

🔬 What variables are and how they're measured

🔬 Defining variables

A variable is a quantity or quality that varies across people or situations.

  • Not a fixed trait—it must differ from person to person or situation to situation.
  • Example: student height varies; chosen major varies (as long as not everyone has the same major).

📊 Two types of variables

  • Quantitative: a quantity, measured by assigning numbers (e.g., height, talkativeness level, number of siblings).
  • Categorical: a quality, measured by assigning category labels (e.g., major such as Psychology, English, or Nursing; nationality; occupation).

📏 Operational definitions

An operational definition defines a variable in terms of precisely how it is to be measured.

  • Most variables can be operationally defined in many different ways.
  • Example: depression can be measured as scores on the Beck Depression Inventory, number of symptoms, or whether diagnosed with major depressive disorder.
  • When a variable is measured for an individual, the result is a score; a set of scores is data (plural). See the sketch below.
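
To make this concrete, here is a minimal sketch (all names and values hypothetical) of three different operational definitions of the same construct, depression, each turning observations into scores:

```python
# Three hypothetical operational definitions of "depression".

def bdi_score(item_ratings):
    """Quantitative: sum of Beck Depression Inventory item ratings (each 0-3)."""
    return sum(item_ratings)

def symptom_count(symptoms_endorsed):
    """Quantitative: number of symptoms endorsed on a checklist."""
    return len(symptoms_endorsed)

def diagnosis(meets_criteria):
    """Categorical: diagnosed with major depressive disorder or not."""
    return "MDD" if meets_criteria else "no MDD"

# Each measurement yields a score; a collection of such scores is data.
print(bdi_score([1, 0, 2, 3]))                  # 6
print(symptom_count(["low mood", "insomnia"]))  # 2
print(diagnosis(True))                          # MDD
```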

👥 Populations, samples, and measurement

🌍 Population vs sample

The population is the very large group researchers want to draw conclusions about; the sample is the small subset actually studied.

  • Researchers study samples but want to generalize to populations.
  • Example: measuring talkativeness in a few hundred university students to draw conclusions about men and women in general.
  • Key requirement: the sample should be representative—similar to the population in important respects.

🎲 Sampling methods

Random sampling:

  • Every population member has an equal chance of selection.
  • Example: selecting 100 registered voters randomly from a city's voter list.
  • Difficult or impossible in most psychological research because populations are less clearly defined.

Convenience sampling:

  • The sample consists of individuals who happen to be nearby and willing to participate.
  • Example: introductory psychology students.
  • Problem: the sample might not be representative of the population (the sketch below contrasts the two approaches).
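
The contrast can be sketched in a few lines of code (hypothetical population list; illustration only): random sampling gives every member an equal chance of selection, while convenience sampling takes whoever happens to be at hand.

```python
import random

random.seed(0)
population = [f"voter_{i}" for i in range(10_000)]  # hypothetical voter list

# Random sampling: every member has an equal chance of being chosen.
random_sample = random.sample(population, 100)

# Convenience sampling: whoever is nearby and willing, e.g., the first
# 100 people on an intro-psych participant pool sign-up sheet.
convenience_sample = population[:100]

print(random_sample[:3])       # a scattered, unpredictable subset
print(convenience_sample[:3])  # always the start of the list; may not be representative
```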

📈 Statistical relationships between variables

📈 What a statistical relationship means

A statistical relationship exists between two variables when the average score on one differs systematically across the levels of the other.

  • Not about behaviors in isolation—tells us about potential causes, consequences, development, and organization.
  • Example: average exam score is higher among students who took notes longhand versus on laptop.

🔀 Two basic forms

Form 1: Differences between groups

  • Comparing mean scores of two (or more) groups on a variable of interest.
  • Example questions: Are women more talkative than men? Do people on cell phones have poorer driving abilities?
  • Usually described by giving mean score and standard deviation for each group.
  • Presented in bar graphs where bar heights represent group means (a numeric sketch follows this list).

Form 2: Correlations between quantitative variables

  • Average score on one variable differs systematically across levels of another.
  • Example questions: Is happiness associated with talkativeness? Does psychotherapy effectiveness depend on how much the patient likes the therapist?
  • Presented using scatterplots where each point represents one person's scores on both variables.
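
A minimal numeric sketch of Form 1, using made-up exam scores, shows how a group difference is typically summarized with a mean and standard deviation per group (Form 2 is sketched after the Pearson's r section below):

```python
import statistics

# Hypothetical exam scores (out of 100) for two note-taking groups.
longhand = [82, 75, 90, 68, 88, 79, 85]
laptop   = [70, 66, 81, 59, 74, 72, 77]

for name, scores in (("longhand", longhand), ("laptop", laptop)):
    print(f"{name:8s}: mean = {statistics.mean(scores):.1f}, "
          f"SD = {statistics.stdev(scores):.1f}")

# A systematically higher mean for one group is the first form of
# statistical relationship described above.
```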

📉 Positive vs negative relationships

Positive relationship:

  • Higher scores on one variable tend to be associated with higher scores on the other.
  • Example: people under more stress tend to have more physical symptoms.

Negative relationship:

  • Higher scores on one variable tend to be associated with lower scores on the other.
  • Example: higher stress is associated with lower immune system functioning.

📐 Pearson's r statistic

  • Measures the strength of correlation between quantitative variables.
  • Ranges from −1.00 (strongest possible negative relationship) to +1.00 (strongest possible positive relationship).
  • A value of 0 means no relationship.
  • As the value moves toward −1.00 or +1.00, points on a scatterplot come closer to falling on a single straight line.
  • Limitation: only appropriate for linear relationships; not appropriate for nonlinear relationships (where points are better approximated by a curved line). A computational sketch follows below.
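
Pearson's r can be computed directly from its definition as the average product of z-scores. A minimal sketch with made-up stress and symptom scores (in practice you would use a statistics library):

```python
import statistics

def pearson_r(xs, ys):
    """Pearson's r = sum of z-score products / (n - 1), using sample SDs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    n = len(xs)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical scores: stress level and number of physical symptoms.
stress   = [1, 3, 4, 6, 7, 9]
symptoms = [0, 2, 2, 4, 6, 7]
print(round(pearson_r(stress, symptoms), 2))  # near +1.00: strong positive
```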

⚠️ Why correlation does not imply causation

🧩 Independent and dependent variables

The independent variable (X) is thought to be the cause; the dependent variable (Y) is thought to be the effect.

  • Understanding causal relationships allows us to change behavior in predictable ways.
  • Example: psychotherapy (independent variable) causes reduction in depressive symptoms (dependent variable).
  • But not all statistical relationships reflect causal relationships.

🔄 The directionality problem

  • Two variables X and Y can be statistically related because X causes Y or because Y causes X.
  • Example: exercise and happiness are statistically related—people who exercise are happier on average.
    • Does exercising cause happiness?
    • Or does happiness cause exercise (perhaps by giving people more energy or leading them to seek social opportunities at the gym)?
  • The correlation alone cannot tell us which direction the causation flows.

🎯 The third-variable problem

  • Two variables X and Y can be statistically related because some third variable Z causes both X and Y.
  • Example: nations with more Nobel prizes have higher chocolate consumption (Pearson's r = 0.79).
    • This does not mean eating chocolate causes Nobel prizes.
    • Geography (third variable) may explain both: European countries have higher chocolate consumption and invest more in education and technology.
  • Example: exercise and happiness could both be caused by physical health (a third variable); the simulation below shows how this pattern arises.
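
A small simulation makes the third-variable problem vivid (illustration only; the variable names are stand-ins): Z causes both X and Y, X and Y never influence each other, yet they end up correlated.

```python
import random
import statistics

random.seed(42)

def pearson_r(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy
               for x, y in zip(xs, ys)) / (len(xs) - 1)

z = [random.gauss(0, 1) for _ in range(5000)]  # e.g., physical health
x = [zi + random.gauss(0, 1) for zi in z]      # e.g., exercise, caused by Z
y = [zi + random.gauss(0, 1) for zi in z]      # e.g., happiness, caused by Z

print(round(pearson_r(x, y), 2))  # about 0.50, even though X never affects Y
```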

🔬 How experiments solve these problems

An experiment is a study in which the researcher manipulates the independent variable.

  • Instead of measuring how much people exercise, a researcher randomly assigns half to run on a treadmill for 15 minutes and half to sit on a couch.
  • Why this matters:
    • Eliminates directionality problem: moods cannot affect how much they exercised (the researcher determined that).
    • Eliminates third-variable problem: a third variable cannot affect both exercise and mood (again, the researcher controlled exercise).
  • Experiments allow researchers to draw firm conclusions about causal relationships (see the sketch below).
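
Continuing the simulation idea, here is a minimal sketch of why random assignment works: letting the researcher, not a lurking third variable, decide who exercises leaves the two groups matched, on average, on everything else.

```python
import random
import statistics

random.seed(7)

# Hypothetical participants who vary on a potential third variable.
health = [random.gauss(0, 1) for _ in range(5000)]

random.shuffle(health)  # the researcher assigns conditions at random
treadmill, couch = health[:2500], health[2500:]

print(round(statistics.mean(treadmill), 2))  # ~0.0
print(round(statistics.mean(couch), 2))      # ~0.0
# The groups start out equivalent on health, so a later mood difference
# cannot be explained by pre-existing health differences.
```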

🚨 Don't confuse correlation with causation

  • Many journalists misinterpret correlations as showing causation.
  • Example headline: "Lots of Candy Could Lead to Violence" (based on a study showing children who ate candy daily were more likely to be arrested for violent offenses later).
  • Alternative explanations might exist (third variables, reverse causation).
  • Always ask: Could the headline variable really "lead to" the outcome, or are there other explanations?

6. Generating Good Research Questions

🧭 Overview

🧠 One-sentence thesis

Good research questions arise from systematic thinking strategies—drawing on observations, practical problems, and prior research—and must be evaluated for both interestingness (whether the answer is in doubt, fills a gap, and has practical implications) and feasibility before pursuing.

📌 Key points (3–5)

  • Where ideas come from: informal observations, practical problems, and previous research are the three most common sources of inspiration.
  • How to generate testable questions: turn general ideas into empirical questions by asking about frequency, causes, effects, types of people, and types of situations.
  • What makes questions interesting: the answer must be in doubt, fill a gap in research literature, and have important practical implications.
  • Common confusion: a question that hasn't been studied isn't automatically interesting—there must be reasonable doubt about the answer, not just an obvious "yes."
  • Feasibility matters: time, money, equipment, technical skill, and access to participants all constrain which questions researchers can actually answer.

💡 Sources of research inspiration

💡 Informal observations

  • Direct observations of your own and others' behavior, plus secondhand observations from newspapers, books, and blogs.
  • Example: noticing you always seem to be in the slowest grocery store line, or reading about donations to a family whose house burned down.
  • These everyday observations can spark famous research—Stanley Milgram's obedience studies were inspired by journalistic reports of Nazi war criminal trials.

🔧 Practical problems

  • Real-world issues in law, health, education, and sports lead directly to applied research.
  • Example questions: Does taking notes by hand improve exam performance? How effective is psychotherapy versus drug therapy for depression? Do cell phones impair driving ability?

📚 Previous research

The most common inspiration for new research ideas is previous research.

  • Science is large-scale collaboration where researchers read each other's work and build on it.
  • Novice researchers can consult experienced researchers or browse professional journals.
  • Reading titles and abstracts in journals like Psychological Science exposes you to diverse topics.
  • Don't confuse: this isn't copying—it's finding gaps and unanswered questions in existing work.

🔬 Turning ideas into testable questions

🔬 What makes a question empirically testable

  • Must be expressed in terms of a single variable or a relationship between variables.
  • One strategy: look at the discussion section of recent articles, where researchers suggest directions for future research.

🎯 Starting with a single variable

If you have a behavior or characteristic in mind, first ask about frequency or intensity:

  • How many words do people speak per day on average?
  • How accurate are memories of traumatic events?
  • What percentage of people have sought help for depression?

🔗 Expanding to relationships between variables

Ask yourself these general questions and write down all possible answers:

  • What are possible causes of the behavior?
  • What are possible effects of the behavior?
  • What types of people might show more or less of it?
  • What types of situations might elicit more or less of it?

Each answer becomes a second variable, suggesting a statistical relationship question.

Example: If interested in talkativeness, you might ask whether family size causes it, or whether same-sex groups elicit more talkativeness than mixed-sex groups.

🔄 Refining questions already studied

If a question has been studied, don't give up—refine it by asking:

  • Are there other ways to operationally define the variables?
  • Are there types of people for whom the relationship might be stronger or weaker?
  • Are there situations (including practically important ones) where the relationship differs?

Example: Research shows women and men speak similar numbers of words per day among university students in the U.S. and Mexico—but you could study elderly people, other cultures, or different measures of talkativeness (number of different people spoken to).

⭐ Evaluating interestingness

⭐ Why some questions aren't interesting

Questions like "Do people feel pain when punched in the jaw?" or "Are women more likely to wear makeup than men?" are not interesting even though they're easy to study.

Interestingness means interesting to people generally and especially to the scientific community, not just personally.

❓ The answer must be in doubt

  • Questions already answered by research are no longer interesting.
  • But unanswered questions must have a reasonable chance of surprising us.
  • Strategy: try to think of reasons to expect different answers, especially ones conflicting with common sense.
  • If you can think of reasons for at least two different answers, the question might be interesting.
  • If only one answer seems possible, it probably isn't interesting.

Example: "Are women more talkative than men?" is interesting because the stereotype suggests yes, but similar verbal abilities suggest no.

📖 Must fill a gap in research literature

  • The question should be natural for people familiar with existing research.
  • It's not just "hasn't been answered"—it should logically follow from what's already known.
  • Example: whether taking notes by hand improves exam performance naturally follows from research on notetaking and shallow processing.

🌍 Must have practical implications

  • The answer should matter for real-world decisions or policies.
  • Example: whether cell phone use impairs driving affects personal safety and legal debates about restrictions.
  • Example: notetaking research has implications for classroom technology policies.

🛠️ Evaluating feasibility

🛠️ Factors affecting feasibility

Researchers must consider whether they can actually complete the study:

  • Time: how long the study will take.
  • Money: funding for materials, participants, and equipment.
  • Equipment and materials: access to the necessary tools.
  • Technical knowledge: the skills needed to conduct the study.
  • Research participants: the ability to recruit the right people.

🎓 Complexity isn't required

  • Professional journals show both complicated studies (longitudinal designs, neuroimaging, complex statistics) and simple studies.
  • Complex research is often done by teams with government or private grants.
  • Simple studies with convenience samples (e.g., university students with paper-and-pencil tasks) can produce important results.
  • Don't confuse: difficulty doesn't equal importance.

♻️ Use tried-and-true methods

  • Generally good practice to use methods already used successfully by other researchers.
  • Example: to make people happy, use approaches proven by others (like paying a compliment).
  • Benefits: ensures feasibility (the approach works) and provides continuity with previous research, making results easier to compare and interpret.

7. Reviewing the Research Literature

🧭 Overview

🧠 One-sentence thesis

Reviewing the research literature early in the research process helps you refine your question, evaluate its interestingness, learn appropriate methods, and understand how your study fits into existing knowledge.

📌 Key points (3–5)

  • What the research literature is: published research in professional journals and scholarly books, not pop psychology, websites, or Wikipedia.
  • Why review early: it refines your question, tells you if it's already answered, evaluates interestingness, suggests methods, and shows how your work fits in.
  • How to search: use databases like PsycINFO, follow reference lists, search cited-by links, and consult experts.
  • Common confusion: not all published sources count—self-help books and encyclopedia entries are unreliable because they lack peer review and formal expertise.
  • What to prioritize: recent work (past 5 years), review articles for overview, empirical reports for methods, and classic articles cited repeatedly.

📚 What counts as research literature

📖 Definition and scope

The research literature in any field is all the published research in that field.

  • In psychology, this means millions of scholarly articles and books dating back to the field's beginning.
  • The boundaries are somewhat fuzzy, but the literature consists almost entirely of two types: articles in professional journals and scholarly books.

❌ What does NOT count

The excerpt explicitly excludes:

  • Self-help and pop psychology books
  • Dictionary and encyclopedia entries
  • Websites intended for the general public
  • Wikipedia

Why these are excluded:

  • They are not reviewed by other researchers.
  • They are often based on little more than common sense or personal experience.
  • Wikipedia's authors are anonymous, may lack formal training, and the content continually changes.

Don't confuse: "published" does not mean "part of the research literature"—only peer-reviewed scholarly sources count.

📰 Types of sources in the literature

📰 Professional journals

Professional journals are periodicals that publish original research articles.

  • Thousands exist in psychology and related fields.
  • Published monthly or quarterly in issues; issues are organized into volumes (usually one calendar year).
  • Available in hard copy only, both hard copy and electronic, or electronic only.

Two basic article types:

  • Empirical research reports: describe one or more new studies; they introduce a research question, explain why it is interesting, review previous research, describe the method and results, and draw conclusions.
  • Review articles: summarize previously published research on a topic and usually present new ways to organize or explain the results; a review article devoted primarily to presenting a new theory is called a theoretical article.

🔍 Double-blind peer review

Most professional journals in psychology use this process:

  1. Researchers submit a manuscript to the editor (an established researcher).
  2. The editor sends it to two or three experts on the topic.
  3. Each reviewer reads the manuscript, writes a critical but constructive review, and sends it back with recommendations.
  4. The editor decides: accept, ask for revisions and resubmission, or reject outright.
  5. The editor forwards reviewers' comments to the researchers so they can revise.

Why "double-blind":

  • Reviewers do not know the identity of the researcher(s).
  • Researchers do not know the identity of the reviewers.

Purpose: ensures the work meets basic standards of the field before entering the research literature.

Recent variation: some newer open-access journals (e.g., Frontiers in Psychology) use open peer review—reviewers' identities remain concealed during review but are published alongside the article to increase transparency and accountability.

📕 Scholarly books

Scholarly books are books written by researchers and practitioners mainly for use by other researchers and practitioners.

Two main types:

  • Monograph: written by a single author or a small group; gives a coherent presentation of a topic, much like an extended review article.
  • Edited volume: has an editor or small group of editors who recruit many authors to write separate chapters on different aspects of the same topic; the result can be coherent, but it is not unusual for chapters to take different perspectives or for authors to openly disagree.
  • Scholarly books undergo a peer review process similar to that of professional journals.

🔎 How to search the literature

🗄️ Using PsycINFO and other databases

Primary method: use electronic databases.

Examples of databases:

  • Academic Search Premier, JSTOR, ProQuest (all academic disciplines)
  • ERIC (education)
  • PubMed (medicine and related fields)
  • PsycINFO (produced by the APA)—the most important for psychology

PsycINFO is so comprehensive—covering thousands of professional journals and scholarly books going back more than 100 years—that for most purposes its content is synonymous with the research literature in psychology.

🧩 How PsycINFO works

Structure:

  • Individual records for each article, book chapter, or book.
  • Each record includes: basic publication information, an abstract or summary, and a list of other works cited.
  • Each record also contains lists of keywords and index terms.

Index terms are especially helpful because they are standardized:

  • Research on differences between women and men is always indexed under "Human Sex Differences."
  • Research on notetaking is always indexed under "Learning Strategies."
  • If you don't know the appropriate index terms, PsycINFO includes a thesaurus to help you find them.

🎯 Search strategy tips

The challenge: nearly four million records in PsycINFO require careful search term selection.

Example scenario: You want to know if women and men differ in ability to recall experiences from when they were very young.

  • "memory for early experiences": 6 records (too few, and most not relevant).
  • "memory": 149,777 records (far too many to look through).
  • "early memories" (from the thesaurus): 1,446 records (still too many).
  • "early memories" + "human sex differences": 37 articles (manageable, and many highly relevant).

Strategy: try a variety of search terms in different combinations and at different levels of specificity.
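
The narrowing effect of combining index terms behaves like an AND filter over records. A toy sketch with invented records (not the actual PsycINFO interface):

```python
# Invented records tagged with standardized index terms (illustration only).
records = [
    {"title": "Recall of childhood events", "terms": {"early memories"}},
    {"title": "Sex differences in autobiographical memory",
     "terms": {"early memories", "human sex differences"}},
    {"title": "Working memory capacity", "terms": {"memory"}},
]

def search(*required_terms):
    """Return titles of records indexed under ALL given terms (AND narrowing)."""
    return [r["title"] for r in records if set(required_terms) <= r["terms"]]

print(len(search("early memories")))                      # 2 records
print(search("early memories", "human sex differences"))  # 1 record, most relevant
```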

🔗 Other search techniques

Beyond entering search terms into databases:

  1. Follow reference lists: If you have one good article (especially a recent review article), look through its reference list for other relevant works. Do this with any relevant article or book chapter you find.

  2. Use cited-by links: Start with a classic article, find its record in PsycINFO, and link to a list of other works that cite that classic article. This works because other researchers working on your topic are likely aware of the classic article and cite it.

  3. General Internet search: Use search terms related to your topic or the name of a researcher. This might lead directly to works in the research literature (e.g., articles in open-access journals or posted on researchers' websites). Google Scholar is especially useful for this purpose. A general Internet search might also lead to websites that provide references to works that are part of the research literature.

  4. Talk to people: Your instructor or other faculty members in psychology who know something about your topic can suggest relevant articles and book chapters.

💾 Accessing full text

Depending on the vendor that provides the interface to PsycINFO:

  • You may be able to save, print, or e-mail the relevant records.
  • Records might contain links to full-text copies of the works.
  • PsycARTICLES is a database that provides full-text access to articles in all journals published by the APA.
  • If not available electronically, find out if your library carries the journal or book in hard copy. Ask a librarian if you need help.

🎯 What to search for

🎯 Be selective

Core principle: Not every article, book chapter, and book that relates to your research idea will be worth obtaining, reading, and integrating.

Focus on sources that help you do four basic things:

  1. Refine your research question
  2. Identify appropriate research methods
  3. Place your research in the context of previous research
  4. Write an effective research report

📅 Focus on recent research

What counts as "recent" depends on the topic:

  • For newer topics actively being studied: published in the past year or two.
  • For older topics receiving less attention: within the past 10 years.
  • Good general rule: start with sources published in the past five years.

Main exception: classic articles that turn up in the reference list of nearly every other source. If other researchers think this work is important, even though it is old, include it in your review.

📊 Types of sources to prioritize

  • Review articles: provide a useful overview, often discussing important definitions, results, theories, trends, and controversies, and give you a good sense of where your own research fits.
  • Empirical research reports: address your question or similar questions and give you ideas about how to operationally define your variables and collect your data; as a general rule, it is good to use methods that others have already used successfully unless you have good reasons not to.
  • Sources that argue for interestingness: provide information that can help you argue for the interestingness of your research question (e.g., for a study on the effects of cell phone use on driving ability, look for information about how widespread cell phone use is and how frequent and costly motor vehicle crashes are).

🔢 How many sources are enough?

Difficult to answer—depends on:

  • How extensively your topic has been studied
  • Your own goals

Benchmark: One study found that across a variety of professional journals in psychology, the average number of sources cited per article was about 50.

  • This gives a rough idea of what professional researchers consider adequate.
  • As a student, you might be assigned a much lower minimum number of references, but the principles for selecting the most useful ones remain the same.

🎓 Why review the literature early

🎓 Five key reasons

The excerpt emphasizes reviewing the research literature early in the research process:

  1. Refine your research question: It can help you turn a research idea into an interesting research question.

  2. Avoid duplication: It can tell you if a research question has already been answered.

  3. Evaluate interestingness: It can help you evaluate the interestingness of a research question.

  4. Learn methods: It can give you ideas for how to conduct your own study.

  5. Contextualize your work: It can tell you how your study fits into the research literature.

📝 Connection to writing

An empirical research report written in American Psychological Association (APA) style always includes a written literature review.

  • Reviewing early prepares you to write an effective research report.
  • The review is not just a formality—it is integral to the research process from the beginning.
8

Moral Foundations of Ethical Research

Moral Foundations of Ethical Research

🧭 Overview

🧠 One-sentence thesis

Ethical research in psychology requires balancing three core moral principles—respect for persons, concern for welfare, and justice—across three affected groups: research participants, the scientific community, and society at large.

📌 Key points (3–5)

  • Three core principles: Respect for persons (autonomy and informed consent), concern for welfare (weighing risks vs. benefits), and justice (fair treatment and equitable distribution).
  • Three affected groups: Every ethical decision must consider impacts on research participants, the scientific community, and society more generally.
  • Ethical conflict is unavoidable: Research always involves tradeoffs between risks and benefits, and what helps one group may harm another.
  • Common confusion: Informed consent vs. mere agreement—participants must be told everything that might reasonably affect their decision, not just asked to sign a form.
  • Responsibility framework: Researchers must minimize risks, explain their ethical decisions, seek feedback, and ultimately take responsibility for their choices.

🧩 The three core principles

🧩 Respect for persons

Respect for persons includes respecting the autonomy of research participants by ensuring free, informed, and ongoing consent, as well as protecting those incapable of exercising autonomy.

What autonomy means:

  • The right to make one's own choices and take one's own actions free from coercion.
  • Participants must be able to decide for themselves whether to participate, without pressure or manipulation.

Informed consent requirement:

  • Researchers must obtain and document agreement to participate after informing participants of everything that might reasonably affect their decision.
  • Example: In the Tuskegee study, men were told they were being treated for "bad blood" but were not told they had syphilis or that treatment would be withheld—this was not true informed consent.
  • Consent must be voluntary and ongoing; researchers must inform participants of any changes that might affect their willingness to continue.

🛡️ Vulnerable populations

  • Some groups cannot grant consent due to age or capacity: children, adults with cognitive impairments, coma patients.
  • For children: If they can understand language, researchers ask for their assent after receiving consent from parents or guardians.
  • For adults with diminished capacity: Extra measures are needed to protect their interests and well-being.
  • Don't confuse: Research with vulnerable populations can be valuable, but requires additional safeguards, not blanket prohibition.

⚖️ Concern for welfare

Concern for welfare includes ensuring participants are not exposed to unnecessary risks, considering privacy, maintaining confidentiality, and providing enough information to assess risks and benefits.

The risk-benefit calculation:

  • Research is ethical only if risks are outweighed by benefits.
  • Risks to participants: treatment failure, physical or psychological harm, privacy violations.
  • Benefits to participants: helpful treatment, learning, satisfaction of contributing, compensation.
  • The challenge: Risks and benefits may not be directly comparable—risks often fall on participants while benefits accrue to science or society.

Example from Milgram's obedience study:

  • Participants believed they were administering real electric shocks to another person.
  • Many showed extreme stress: sweating, trembling, stuttering, nervous laughter, even convulsive seizures.
  • One participant was "reduced to a twitching, stuttering wreck" within 20 minutes.
  • The scientific finding was important for understanding obedience to authority, but came at the cost of severe psychological stress to participants.

🔒 Privacy and confidentiality

  • Privacy: Participants' right to decide what information about them is shared with others.
  • Confidentiality: An agreement not to disclose participants' personal information without their consent or appropriate legal authorization.
  • Researchers must maintain confidentiality as part of respecting welfare.

⚖️ Justice

Justice refers to the obligation to treat people fairly and equitably, including considering vulnerability and ensuring historically marginalized groups are not unjustly excluded from research opportunities.

Fair treatment of participants:

  • Adequate compensation for participation.
  • Benefits and risks distributed fairly across all participants.
  • Example: If a new psychotherapy proves effective, it would be fair to offer it to control group participants after the study ends.

Historical injustices:

  • Some groups have historically faced more than their fair share of research risks: institutionalized people, disabled people, racial or ethnic minorities.
  • The Tuskegee study targeted poor African American men who were particularly vulnerable due to their status in society.
  • Researchers must now consider justice and fairness at the societal level.

🔍 Acting with integrity

🔍 What integrity requires

  • Researchers must act responsibly and with integrity: carry out research thoroughly and competently, meet professional obligations, and be truthful.
  • Why integrity matters: It promotes trust, which is essential for all effective human relationships.
  • Participants must trust that researchers are honest, will keep promises (like maintaining confidentiality), and will maximize benefits while minimizing risks.

🎭 The deception dilemma

  • Some research questions are difficult or impossible to answer without deceiving participants.
  • Example: Milgram's study required participants to believe they were administering real shocks; telling them the truth beforehand would have made the study impossible.
  • The conflict: Acting with integrity can conflict with doing research that advances scientific knowledge and benefits society.
  • This creates unavoidable ethical tension that researchers must navigate carefully.

🔬 Trust in the scientific community

  • The scientific community and society must trust that researchers conducted their research thoroughly and reported honestly.
  • When trust is violated: The fraudulent MMR-autism study led other researchers to waste resources on unnecessary follow-up research and caused people to avoid vaccines, putting children at risk.
  • Other examples of fraud mentioned: fabricated data in studies on gay marriage attitudes, with false claims about grants, awards, and ethical approval.

🤝 The framework in practice

🤝 Three groups, three principles

The excerpt presents a framework using a table structure:

| Core Principle | Research Participants | Scientific Community | Society |
| --- | --- | --- | --- |
| Respect for persons | (considerations) | (considerations) | (considerations) |
| Concern for welfare | (considerations) | (considerations) | (considerations) |
| Justice | (considerations) | (considerations) | (considerations) |

How to use the framework:

  • A thorough ethical consideration must examine how each of the three core principles applies to each of the three groups of people.
  • This ensures no important ethical dimension is overlooked.

⚠️ Unavoidable ethical conflict

Why conflict is inevitable:

  • Almost no psychological research is completely risk-free, so there will always be conflict between risks and benefits.
  • Research beneficial to one group (e.g., scientific community) can be harmful to another (e.g., participants).
  • Being completely truthful with participants can make it difficult or impossible to conduct scientifically valid studies on important questions.

Example of disagreement:

  • A study on "personal space" secretly observed men in a public restroom to see if urination took longer when another man was nearby.
  • Some critics found this an unjustified assault on human dignity.
  • The researchers had carefully considered the ethical conflicts and concluded benefits outweighed risks (they had interviewed preliminary participants who were not bothered by being observed).
  • The point: Competent and well-meaning researchers can disagree about how to resolve ethical conflicts.

🛠️ How to deal with ethical conflict responsibly

Even though ethical conflict cannot be eliminated completely, researchers can deal with it constructively:

  1. Think thoroughly: Carefully consider all ethical issues raised by the research.
  2. Minimize risks: Reduce risks as much as possible.
  3. Weigh risks against benefits: Make a reasoned judgment about whether benefits justify risks.
  4. Explain decisions: Be able to articulate ethical decisions to others.
  5. Seek feedback: Get input from others on ethical choices.
  6. Take responsibility: Ultimately accept responsibility for ethical decisions made.

📜 Historical context and accountability

The Tuskegee apology (1997):

  • 65 years after the study began and 25 years after it ended, US President Bill Clinton formally apologized.
  • Key excerpt: "Men who were poor and African American, without resources and with few alternatives, they believed they had found hope when they were offered free medical care by the United States Public Health Service. They were betrayed."
  • This illustrates the long-term harm of unethical research and the importance of accountability.

Milgram's debriefing efforts:

  • To his credit, Milgram went to great lengths to debrief participants and return their mental states to normal.
  • He showed that most participants thought the research was valuable and were glad to have participated.
  • Still, this research would be considered unethical by today's standards.
  • Don't confuse: Good debriefing does not automatically make harmful research ethical.

🎯 Practical implications

🎯 What researchers must do

  • Consider all three principles across all three groups before beginning research.
  • Document informed consent properly—not just signatures, but evidence that participants truly understood what they were agreeing to.
  • Protect vulnerable populations with extra safeguards.
  • Maintain confidentiality and respect privacy.
  • Ensure fair distribution of risks and benefits.
  • Be prepared to justify ethical decisions and accept feedback.

🎯 What this means for research design

  • Some research questions may be too risky to pursue, even if scientifically interesting.
  • Deception may sometimes be necessary, but requires extra justification and thorough debriefing.
  • Researchers must balance scientific validity with ethical constraints—sometimes this means modifying or abandoning a study design.
  • The goal is not to eliminate all risk, but to ensure risks are justified by benefits and minimized as much as possible.
9

From Moral Principles to Ethics Codes

From Moral Principles to Ethics Codes

🧭 Overview

🧠 One-sentence thesis

Ethics codes translate broad moral principles into detailed, enforceable guidelines that help researchers navigate specific ethical dilemmas in psychological research.

📌 Key points (3–5)

  • Historical evolution: Ethics codes developed from the Nuremberg Code (1947) through the Declaration of Helsinki (1964) and the Belmont Report (1978) to today's Tri-Council Policy Statement in Canada.
  • Core structure: The TCPS 2 and APA Ethics Code provide specific standards on informed consent, deception, debriefing, and scholarly integrity, all grounded in the three core principles (respect for persons, concern for welfare, and justice).
  • Institutional oversight: Research ethics boards (REBs) review research protocols to ensure compliance, with different levels of review depending on risk.
  • Common confusion: Informed consent is not just signing a form—it requires genuine understanding and ongoing communication with participants.
  • Practical application: Deception is sometimes permitted when benefits outweigh risks, alternative methods don't exist, and participants are debriefed promptly.

📜 Historical development of ethics codes

📜 Early codes: Nuremberg and Helsinki

  • Nuremberg Code (1947): Created during trials of Nazi physicians who conducted cruel experiments on concentration camp prisoners.

    • Emphasized carefully weighing risks against benefits.
    • Established the fundamental importance of informed consent.
    • Many defendants were convicted and imprisoned or sentenced to death based on violations of these principles.
  • Declaration of Helsinki (1964): Created by the World Medical Council; added new requirements.

    • Introduced the concept of a written protocol (detailed research description).
    • Required independent committee review of research proposals.
    • Has been revised multiple times, most recently in 2013.

🇨🇦 Development in North America

  • Belmont Report (1978): U.S. federal guidelines created in response to concerns about the Tuskegee study and similar research.

    • Explicitly recognized the principle of seeking justice.
    • Emphasized fair distribution of research risks and benefits across different societal groups.
    • Influenced ethical guidelines in both the U.S. and Canada.
  • Tri-Council Policy Statement (TCPS): Canada's formal ethics code for research involving humans.

    • First edition published in 1998, replacing all previous individual agency guidelines.
    • Second edition (TCPS 2) published in 2010 with consolidated principles, clarified guidelines, and updated terminology.
    • "Tri-Council" refers to three federal research granting agencies: SSHRC, CIHR, and NSERC.

🏛️ Institutional oversight and review

🏛️ Research Ethics Boards (REBs)

Research ethics board (REB): a committee responsible for reviewing research protocols for potential ethical problems.

Composition requirements:

  • At least five members with varying backgrounds.
  • At least two members with expertise in relevant research disciplines.
  • At least one member knowledgeable in ethics.
  • At least one community member with no institutional affiliation.

REB responsibilities:

  • Ensure risks are minimized.
  • Verify benefits outweigh risks.
  • Confirm research is conducted fairly.
  • Evaluate adequacy of informed consent procedures.

📋 Levels of review

| Review type | When it applies | Process |
| --- | --- | --- |
| Full REB review | Default requirement for all research involving humans | Full committee reviews the protocol |
| Minimal risk review | When the probability and magnitude of harms are no greater than those of everyday life | REB may delegate review to one or more members |
| Course-based research | Student research conducted as part of coursework | REB may delegate review to the relevant department or faculty |

Don't confuse: Minimal risk doesn't mean "no risk"—it means risks comparable to those in everyday aspects of life related to the research topic.

🎓 TCPS 2 tutorial requirement

  • Detailed online tutorial covers specific TCPS 2 guidelines.
  • Takes up to 3 hours to complete.
  • Provides a certificate upon completion.
  • Many universities and research institutions now require this certificate before evaluating research proposals.

🤝 Informed consent

🤝 What informed consent means

Informed consent: obtaining and documenting people's agreement to participate in a study, having informed them of everything that might reasonably be expected to affect their decision.

Required information includes:

  • Details of the procedure.
  • Risks and benefits of the research.
  • Right to decline participation or withdraw from the study.
  • Consequences of declining or withdrawing.
  • Any legal limits to confidentiality (e.g., mandatory reporting of child abuse or crimes).

📝 Beyond the consent form

Common misconception: Many people think informed consent = signing a form.

Reality: The written consent form is not sufficient by itself because:

  • Many participants don't actually read consent forms.
  • Some read them but don't understand them.
  • Participants often mistake consent forms for legal documents.
  • Some mistakenly believe signing means giving up their right to sue the researcher.

Best practices for competent adults:

  • Tell participants about risks and benefits verbally.
  • Demonstrate the procedure.
  • Ask if they have questions.
  • Remind them of their right to withdraw at any time.
  • In addition to having them read and sign a consent form.

🚫 When informed consent is not necessary

Informed consent may be waived when:

  • Research is not expected to cause any harm AND the procedure is straightforward.
  • Study is conducted in the context of people's ordinary activities.

Example: Observing whether people hold doors open for others outside a public building does not require informed consent.

Example: A professor comparing two legitimate teaching methods across course sections would not need consent unless planning to publish results in a scientific journal.

🎭 Deception in research

🎭 Forms of deception

Deception can include:

  • Misinforming participants about the study's purpose.
  • Using confederates (people pretending to be participants).
  • Using phony equipment (e.g., Milgram's fake shock generator).
  • Presenting false feedback about performance.
  • Not informing participants of the full design or true purpose, even without active misinformation.

Example: An incidental learning study might tell participants to prepare for a "memory test" of words, but the actual test asks about the room's contents or the research assistant's appearance.

⚖️ The debate over deception

Arguments against deception:

  • Prevents truly informed consent.
  • Fails to respect participants' dignity.
  • Has potential to upset participants.
  • Makes participants distrustful and less honest in responding.
  • Damages the reputation of researchers in the field.

Moderate approach (TCPS 2 and APA Ethics Code): Deception is allowed when:

  1. Benefits of the study outweigh the risks.
  2. Participants cannot reasonably be expected to be harmed.
  3. The research question cannot be answered without deception.
  4. Participants are informed about the deception as soon as possible.

🔍 Degrees of deception

Don't confuse: Not all deception is equally problematic.

| Type | Severity | Example |
| --- | --- | --- |
| Severe deception | High psychological stress, multiple significant deceptions | Milgram's obedience study |
| Minor deception | Slight difference from expectations, minimal stress | Incidental learning study with an unexpected memory test format |

Justification: Some scientifically and socially important research questions are difficult or impossible to answer without deception, because knowing the true purpose would change participants' behavior and eliminate generalizability to real-world situations.

💬 Debriefing

💬 What debriefing involves

Debriefing: the process of informing research participants as soon as possible of the purpose of the study, revealing any deception, and correcting any other misconceptions they might have as a result of participating.

Key components:

  • Explain the true purpose of the study.
  • Reveal any deception used.
  • Correct misconceptions.
  • Minimize any harm that might have occurred.

🩹 Minimizing harm through debriefing

Example: A study on the effects of sad mood on memory might induce sadness by having participants think sad thoughts, watch a sad video, or listen to sad music. Debriefing would involve returning participants' moods to normal by having them think happy thoughts, watch a happy video, or listen to happy music.

Timing: Debriefing should occur as soon as possible after participation, ideally immediately following the study procedures.

📚 Scholarly integrity

📚 Core integrity standards

Obvious prohibitions:

  • Researchers must not fabricate data.
  • Researchers must not plagiarize.

Plagiarism: using others' words or ideas without proper acknowledgement.

Proper acknowledgement requires:

  • Indicating direct quotations with quotation marks.
  • Providing citations to the source of any quotation or idea used.

🔄 Publication and data sharing standards

Additional requirements:

  • Do not publish the same data a second time as though it were new.
  • Share data with other researchers who request it for verification purposes.
  • As peer reviewers, keep unpublished research confidential.

✍️ Authorship ethics

Authorship order must reflect contribution:

  • Authors' names and their order should reflect the importance of each person's contribution.
  • It is unethical to include someone who made only minor contributions (e.g., analyzing some data).
  • It is unethical for a faculty member to make themselves first author on research largely conducted by a student.

Don't confuse: Institutional position (e.g., department chair) does not justify authorship credit—only actual contribution to the research does.

10

Putting Ethics Into Practice

Putting Ethics Into Practice

🧭 Overview

🧠 One-sentence thesis

Conducting ethical psychological research requires researchers to actively identify and minimize risks and deception throughout the entire research process, from design through publication, while maintaining informed consent and debriefing procedures.

📌 Key points (3–5)

  • Core responsibility: Researchers must know and accept their ethical responsibilities, including understanding TCPS 2, institutional policies, and when to seek clarification on ethical issues.
  • Risk management: Identify all risks (physical, psychological, confidentiality violations) and minimize them through design modifications, prescreening, and confidentiality safeguards.
  • Deception principles: Deception is only acceptable when necessary to answer the research question, and even mild forms (like withholding information) should be minimized and disclosed during debriefing.
  • Risk-benefit weighing: Minimal-risk research requires only small benefits to justify it, while higher-risk research demands stronger scientific or practical benefits; harm that is more than minor or long-lasting is rarely justified.
  • Ongoing vigilance: Ethical responsibilities continue after REB approval—monitor participants, protect confidentiality, maintain scholarly integrity, and address unanticipated reactions promptly.

🛡️ Identifying and minimizing risks

🔍 What counts as risk

Researchers must list all potential risks, including:

  • Physical harm
  • Psychological harm (e.g., distress, boredom, frustration)
  • Violations of confidentiality

Common pitfall: Researchers often underestimate risks or overlook them completely because they view the study from their own perspective rather than participants' perspectives.

Example: An emergency medical technician researcher wanted to show gruesome crime scene photos to test sensitivity to violence, but greatly underestimated how disturbing these images would be to most people who lack her professional exposure.

🎯 Three strategies to reduce risk

| Strategy | How it works | Example from excerpt |
| --- | --- | --- |
| Modify research design | Shorten procedures, replace upsetting materials with milder versions | Burger's 2009 Milgram replication stopped shocks at 150 V instead of 450 V, avoiding severe stress while still allowing comparison with the original results |
| Prescreening | Identify and eliminate high-risk participants through informed consent warnings or data collection | Burger used questionnaires and clinical psychologist interviews to exclude participants with physical or psychological problems |
| Confidentiality safeguards | Keep consent forms separate from data, collect only necessary personal information, prevent unintentional sharing | Administer surveys individually in private rather than in public settings where responses might be overheard |

👥 Seeking multiple perspectives

  • Consult research collaborators, experienced researchers, and even non-researchers who can better take the participant's perspective
  • Some risks apply only to certain participants (e.g., crime surveys might upset crime victims even if most people have no problem)
  • Input from diverse sources helps identify overlooked risks

🎭 Managing deception

🎭 Forms of deception

Deception includes not only actively misleading participants, but also allowing them to make incorrect assumptions or withholding information about the full design or purpose of the study.

All forms of deception should be identified and minimized, not just active lies.

✅ When deception is acceptable

According to TCPS 2 and APA Ethics Code:

  • Deception is ethically acceptable only if there is no way to answer the research question without it
  • Always consider whether deception is truly necessary

Example: To study whether professor age affects teaching expectations, you could avoid deception by asking participants to imagine photos are of professors and rate them as if they were, rather than falsely claiming the photos are actually of professors.

🕐 Withholding the research question

  • Generally acceptable to wait until debriefing to reveal the specific research question
  • Must still describe the procedure, risks, and benefits during informed consent
  • Revealing the research question early can invalidate results (participants might respond based on what they think you want)

Minimizing even mild deception: Inform participants (orally or in writing) that although you've accurately described procedure, risks, and benefits, you will reveal the research question afterward—participants essentially consent to having information withheld temporarily.

⚖️ Weighing risks against benefits

📊 Identifying all benefits

Consider benefits to:

  • Research participants themselves
  • Science (advancing knowledge)
  • Society (practical applications)
  • Student researchers (learning how to conduct research, career advancement)

⚖️ The balancing standard

| Risk level | Benefit requirement | Rationale |
| --- | --- | --- |
| Minimal risk (no more than daily life or routine exams) | Even small benefits justify the research | Low threshold because risks are negligible |
| More than minimal risk | Requires greater benefits; the study should be well designed and answer scientifically interesting questions or have clear practical implications | Cannot justify subjecting people to pain, fear, or embarrassment merely to satisfy personal curiosity |
| More than minor harm, or long-lasting harm | Rarely justified by any benefits | Milgram's study, though interesting and important, would be considered unethical by today's standards |

📋 Creating consent and debriefing procedures

📝 Informed consent process

Three-part approach:

  1. Recruitment stage: Provide as much information as possible (word of mouth, advertisements, participant pool) so those who might find the study objectionable can avoid it

  2. Oral explanation: Prepare a script or talking points to explain the study in simple everyday language, covering:

    • Description of the procedure
    • Risks and benefits
    • Right to withdraw at any time
    • (If appropriate) The fact that some information is being withheld until debriefing
  3. Written consent form: Participants read and sign after the oral explanation; should cover all participant rights and what to expect

Important: Decide first whether informed consent is even necessary (e.g., observations in public places may not require it).

💬 Debriefing procedures

Debriefing cannot rely solely on written forms—participants may not read or understand them.

What to include:

  • Reveal the research question and full study design
  • Explain what happened in other conditions if participants were tested under only one
  • If deception was used: reveal it as soon as possible, apologize, explain why it was necessary, and correct any misconceptions
  • Provide additional benefits: practical information, referrals to counseling or other resources

Example: In a domestic abuse attitudes study, provide pamphlets about domestic abuse and referral information to university counseling.

Timing: Schedule plenty of time—informed consent and debriefing cannot be effective if rushed.

🔄 Ongoing ethical responsibilities

📜 Institutional approval

  • Submit a protocol describing: purpose, research design and procedure, risks and benefits, steps to minimize risks, informed consent and debriefing procedures
  • View the REB process as an opportunity to consult with experienced others, not merely an obstacle
  • Address questions or concerns promptly and in good faith, even if it means further modifications

👀 During the research

Monitor and respond:

  • Watch for unanticipated participant reactions
  • Seek feedback during debriefing
  • Make adjustments if problems arise

Criticism of Milgram: Although he didn't know participants would have severe negative reactions initially, he certainly knew after testing the first several participants and should have made adjustments then.

Stick to the protocol: Follow the approved protocol or seek additional approval for anything beyond minor changes.

🔒 Protecting confidentiality

  • Keep consent forms and data safe and separate from each other
  • Ensure no one (intentionally or unintentionally) has access to any participant's personal information
  • Remain alert for potential violations throughout the study

📚 Publication and beyond

Maintain scholarly integrity:

  • Address publication credit (authorship and author order) with collaborators early
  • Avoid plagiarism in writing
  • Never fabricate data or alter results—your scientific duty is to report honestly and accurately
  • Remember: unexpected results are often as interesting or more so than expected ones

Don't confuse: Your personal hopes for results vs. your scientific goal to learn how the world actually is.

11

Phenomena and Theories

Phenomena and Theories

🧭 Overview

🧠 One-sentence thesis

Scientific theories organize observed phenomena into coherent explanations that go beyond mere description by including unobserved variables and processes, serving to organize knowledge, predict outcomes, and generate new research.

📌 Key points (3–5)

  • Phenomena vs theories: phenomena are reliably observed empirical results; theories are explanations that include concepts not directly observed.
  • Replication matters: phenomena become established through repeated studies; failures to replicate may reveal important boundary conditions.
  • Psychology has effects, not laws: unlike physics or chemistry, psychological phenomena often have exceptions and cultural dependencies.
  • Common confusion: "theory" in science does not mean "untested guess"—theories can be extensively tested and well-supported (e.g., evolution, germ theory).
  • Theories serve multiple purposes: organizing phenomena efficiently, predicting new situations, and generating research questions.

🔬 Understanding phenomena

🔬 What counts as a phenomenon

Phenomenon: a general result that has been observed reliably in systematic empirical research; an established answer to a research question.

  • It is not a one-time finding but a pattern confirmed through systematic observation.
  • Examples from the excerpt: expressive writing improves health, cell phone usage impairs driving, people recall list items at beginning and end better than middle items.
  • Phenomena often receive names that become widely known (e.g., bystander effect, placebo effect, mere exposure effect).

🔁 The role of replication

Replication: conducting a study again—either exactly as originally conducted or with modifications—to ensure it produces the same results.

  • Researchers replicate their own studies before publishing.
  • Other researchers conduct independent replications of interesting results.
  • A single empirical result may be called a phenomenon, but the term is more likely used for replicated findings.
  • Example: expressive writing's positive health effects and cell phone usage's negative driving effects have been replicated many times by different researchers.

⚠️ When replications differ

The excerpt describes how replication failures can be informative rather than just problematic:

  • Different results might mean the original or replication was a fluke (occurred by chance).
  • Differences might reveal important conditions not initially recognized.
  • Example: early studies showed people performed better when watched; later replications showed worse performance when watched. Researcher Zajonc identified the key difference: highly practiced tasks vs unpracticed tasks, leading to two distinct phenomena (social facilitation and social inhibition).

🌍 Psychology has effects, not laws

| Science | Type of generalizations | Universality |
| --- | --- | --- |
| Physics/Chemistry | Laws (e.g., laws of motion, conservation of mass) | Universally true |
| Psychology | Effects | Often have exceptions; culturally dependent |
  • Laws imply universal truth; psychology rarely finds phenomena without exceptions.
  • Example: the fundamental attribution error is committed more frequently in North America than in East Asia.
  • Don't confuse: an "effect" in psychology is still a real, established phenomenon—it just isn't claimed to be universal across all contexts.

🧩 Understanding theories

🧩 What theories are

Theory: a coherent explanation or interpretation of one or more phenomena that goes beyond the phenomena by including variables, structures, processes, functions, or organizing principles not observed directly.

  • Theories explain why or how phenomena occur, not just that they occur.
  • They include concepts not directly observed (e.g., "arousal," "dominant response").
  • Example: Zajonc's drive theory explains both social facilitation and inhibition by proposing that being watched creates physiological arousal, which increases the likelihood of the dominant response—correct for practiced tasks, incorrect for unpracticed tasks.

🎯 Theory vs common usage

Critical distinction: In everyday language, "theory" often means an untested guess; in science, it means an explanation that may be extensively tested and well-supported.

  • The theory of evolution by natural selection is a theory because it explains life's diversity, not because it's untested—evidence is overwhelmingly positive.
  • The "germ theory" of disease is a theory because it explains disease origins, not because there's doubt about microorganisms causing disease.
  • Don't confuse: scientific theories can range from untested to universally accepted; the term itself says nothing about the level of support.

📚 Related terminology

The excerpt notes researchers use several related terms, often interchangeably:

| Term | Scope | Description |
| --- | --- | --- |
| Perspective | Broad | General approach to explaining phenomena (e.g., biological perspective, behavioral perspective) |
| Model | Specific | Precise explanation often expressed in equations, programs, or biological processes |
| Hypothesis | Narrow | Explanation relying on a few key concepts; more commonly a prediction based on a theory |
| Theoretical framework | Variable | Can be as broad as a perspective or as specific as a model; the context for understanding a phenomenon |
  • Example: drive theory could also be called drive model or drive hypothesis without being wrong.
  • The biopsychosocial model of health is really more like a perspective (health determined by biological, psychological, and social factors interacting).
  • Key distinction: observations vs interpretations remains most important.

🎯 What theories accomplish

📊 Organization of phenomena

Theories organize phenomena to help people think clearly and efficiently:

  • Drive theory organizes seemingly contradictory results about performance when watched.
  • The multistore model of memory summarizes: limited capacity of attended information, importance of rehearsal, serial-position effect, and more.
  • Intelligence theory (general mental ability g plus specific abilities) summarizes statistical relationships: all mental ability tests somewhat positively correlated, certain subsets more correlated than others.
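
A minimal simulation can make this statistical pattern concrete. The Python sketch below is not from the excerpt, and the factor loadings and noise level are hypothetical, chosen only for illustration. It generates test scores from a single shared ability factor plus test-specific noise; every pairwise correlation among the simulated tests comes out positive, which is exactly the pattern g is proposed to organize.

```python
import numpy as np

# Minimal sketch of a one-factor ("g") model of mental ability test scores.
# Loadings and noise scale are hypothetical, chosen only for illustration.
rng = np.random.default_rng(seed=0)
n_people = 5000
g = rng.normal(size=n_people)              # each person's general ability
loadings = np.array([0.8, 0.7, 0.6, 0.5])  # how strongly each test reflects g

# Each test score = loading * g + independent test-specific noise.
scores = np.outer(g, loadings) + rng.normal(scale=0.6, size=(n_people, 4))

# All pairwise correlations come out positive, and tests with higher
# loadings correlate more strongly with each other.
print(np.corrcoef(scores, rowvar=False).round(2))
```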

Parsimony principle (Occam's razor): a theory should include only as many concepts as necessary to explain the phenomena.

  • Simpler theories organize phenomena more efficiently than complex ones.
  • Theories are useful to the extent they organize more phenomena with greater clarity and efficiency.
  • Don't confuse: a theory can be accurate without being useful (e.g., "expressive writing helps people deal with emotions" is too vague), or useful without being entirely accurate (the multistore memory model is still cited despite known inaccuracies).

🔮 Prediction in new situations

Theories allow predictions about what will happen in situations not yet studied:

  • Example from excerpt: a gymnastics coach wonders whether a student will perform better or worse in competition than when practicing alone.
  • Even if this specific question has never been studied, Zajonc's drive theory suggests an answer.
  • The excerpt indicates the theory would predict based on whether the task is highly practiced (the text cuts off before completing the example).

🔬 Generation of new research

Though the excerpt mentions this as a third purpose, it does not elaborate on how theories generate new research before the text ends.

12

The Variety of Theories in Psychology

The Variety of Theories in Psychology

🧭 Overview

🧠 One-sentence thesis

Psychological theories vary along three key dimensions—formality, scope, and theoretical approach—and researchers use different types of theories depending on the stage of research and the level of detail needed.

📌 Key points (3–5)

  • Three dimensions of variation: formality (how precisely specified), scope (how many phenomena explained), and theoretical approach (functional vs. mechanistic vs. organizational).
  • Formality trade-off: informal theories are easier to create and understand but less precise; formal theories (equations, programs) are harder to grasp but make more testable predictions.
  • Scope trade-off: broad theories explain many phenomena but are vague and hard to test; narrow theories are more precise but cover fewer phenomena.
  • Common confusion: functional vs. mechanistic theories—functional theories explain why (purpose/function), mechanistic theories explain how (variables, structures, processes).
  • Context matters: different theory types suit different research stages—informal and broad theories work well early on; formal and narrow theories fit later stages when phenomena are well-described.

📏 Formality: How precisely is the theory specified?

📝 Informal theories

Formality: the extent to which the components of the theory and the relationships among them are specified clearly and in detail.

  • What informal theories are: simple verbal descriptions of a few important components and relationships.
  • The excerpt gives examples: habituation theory of expressive-writing effects on health, drive theory of social facilitation and inhibition.
  • Strengths: easier to create and understand.
  • Weaknesses: less precise predictions, harder to test.
  • When to use: especially appropriate in early research stages when phenomena have not yet been described in detail.

🔢 Formal theories

  • What formal theories are: expressed in mathematical equations or computer programs.
  • The excerpt lists well-known examples:
    • ACT-R: a comprehensive theory of human cognition akin to a programming language.
    • Prospect theory: a formal theory of decision making under uncertainty (Daniel Kahneman won the Nobel Prize in economics partly for this work with Amos Tversky).
    • Rescorla-Wagner model: a theory of classical conditioning with an equation describing how association strength changes when unconditioned and conditioned stimuli are paired (see the sketch after this list).
  • Strengths: more precise predictions, easier to test.
  • Weaknesses: more difficult to create and understand; may require mathematical or programming background.
  • When to use: especially appropriate in later research stages when phenomena have been described in detail.
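
To make the Rescorla-Wagner example concrete, its learning rule can be stated in a single line. The rendering below is a common textbook form of the model, written out here for illustration rather than quoted from the excerpt:

```latex
% Rescorla–Wagner learning rule (common textbook form).
% Change in associative strength of conditioned stimulus X on one trial:
\Delta V_X = \alpha_X \, \beta \, (\lambda - V_{\text{total}})
% alpha_X : salience of the conditioned stimulus X
% beta    : learning-rate parameter tied to the unconditioned stimulus
% lambda  : maximum associative strength the US can support
% V_total : combined associative strength of all stimuli present on the trial
```

Because the predicted change in learning is proportional to the prediction error (λ − V_total), the model yields exact numerical predictions, which is the testability advantage that formal theories trade for accessibility.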

⚖️ Both have their place

  • The excerpt emphasizes that both informal and formal theories are valuable in psychological research.
  • The choice depends on the research stage and the level of detail available.

🌍 Scope: How many phenomena does the theory explain?

🌐 Broad-scope theories

Scope: the number and diversity of the phenomena a theory explains or interprets.

  • Historical context: many early psychological theories attempted to interpret essentially all human behavior (e.g., Freud applied his theory to psychological disorders, slips of the tongue, dreaming, sexuality, art, politics, and civilization).
  • Why they fell out of favor: they tend to be imprecise, difficult to test, and not particularly successful at organizing or predicting human behavior at the level of detail researchers seek; theories that attempt to explain everything often end up vague and seldom make specific predictions.
  • Contemporary broad theories: still exist but are more limited in scope than the grand theories of the past.
  • Example from the excerpt: cognitive dissonance theory (Leon Festinger, 1956)—assumes that when people hold inconsistent beliefs, the discomfort motivates them to reduce it by changing one or both beliefs; applied to diverse phenomena like persistence of irrational beliefs (e.g., smoking), persuasion techniques (asking for a small favor before a big one), and placebo effects.

🔬 Narrow-scope theories

  • What narrow theories are: apply to a small number of closely related phenomena.
  • Example from the excerpt: subitizing—people's ability to quickly and accurately perceive the number of objects in a scene without counting, as long as the number is four or fewer.
  • One theory of subitizing: small numbers of objects are associated with easily recognizable patterns (e.g., three objects form a "triangle" that is quickly perceived).
  • Strengths: tend to be more formal and more precise in their predictions.
  • Weaknesses: organize fewer phenomena.

⚖️ Trade-offs in scope

| Scope | Phenomena organized | Formality | Precision |
| --- | --- | --- | --- |
| Broad | Many diverse phenomena | Tend to be less formal | Less precise predictions |
| Narrow | Few closely related phenomena | Tend to be more formal | More precise predictions |
  • Both broad and narrow theories have their place in psychological research.

🔧 Theoretical approach: What kind of explanation does the theory provide?

🎯 Functional theories

Functional theories: explain psychological phenomena in terms of their function or purpose.

  • Focus: why the phenomenon occurs—what purpose it serves.
  • Does not focus on: how the phenomenon happens (the mechanism).
  • Example from the excerpt: repeated self-injury theory—people engage in self-injury (e.g., cutting) because it produces a short-term reduction in the intensity of negative emotions they are feeling.
  • Evolutionary psychology: theories from this perspective tend to be functional, assuming human behavior evolved to solve specific adaptive problems faced by our distant ancestors.

🧬 Evolutionary example: sex differences in mating strategies

  • Phenomenon: men are somewhat more likely than women to seek short-term partners and value physical attractiveness over material resources; women are somewhat more likely to seek long-term partners and value material resources over physical attractiveness.
  • Functional explanation:
    • Male investment in becoming a parent is relatively small → men reproduce more successfully by seeking several short-term partners who are young and healthy (signaled by physical attractiveness).
    • Female investment in becoming a parent is quite large → women reproduce more successfully by seeking a long-term partner who has resources to contribute to raising the child.
  • This explains why the differences exist in terms of reproductive success.

⚙️ Mechanistic theories

Mechanistic theories: focus on specific variables, structures, and processes, and how they interact to produce the phenomena.

  • Focus: how the phenomenon occurs—identifying a mechanism and specifying the conditions under which the phenomenon happens and how intense it will be.
  • Examples from the excerpt: drive theory of social facilitation and inhibition, multistore model of human memory.
  • Detailed example: hypochondriasis theory—an extreme form of health anxiety where people misinterpret ordinary bodily symptoms (e.g., headaches) as signs of serious illness (e.g., brain tumor).
    • Key variables and relationships:
      1. People high in neuroticism (negative emotionality) start to pay excessive attention to negative health information.
      2. This is especially true if they had a significant illness experience as a child (e.g., a seriously ill parent).
      3. This attention to negative health information leads to health anxiety and hypochondriasis.
      4. The effect is especially strong among people low in effortful control (the ability to shift attention away from negative thoughts and feelings).
    • This theory specifies several key variables and the relationships among them.

🧠 Biological mechanistic theories

  • With advances in genetics and neuroscience, mechanistic theories increasingly focus on biological structures and processes.
  • Research is often criticized when it does not specify a mechanism.
  • Example: researchers are constructing and testing theories that specify the brain structures associated with storage and rehearsal of information in short-term memory, transfer to long-term memory, etc.
  • Schizophrenia: explained in terms of several biological theories focusing on genetics, neurotransmitters, brain structures, and even prenatal exposure to infections.

🔄 Functional vs. mechanistic: the key distinction

  • Don't confuse: functional theories provide the "why" (purpose/function); mechanistic theories provide the "how" (variables, structures, processes, interactions).
  • Both are valuable and address different aspects of understanding psychological phenomena.

🗂️ Organizational theories

  • The excerpt mentions that there are also theoretical approaches that provide organization without necessarily providing a functional or mechanistic explanation.
  • These theories help structure and categorize phenomena without explaining why or how they occur.

🔍 Case study: Dissociative Identity Disorder (DID)

🤼 Two competing theories

The excerpt provides an example of competing theories for the same phenomenon:

| Theory | Explanation | Evidence |
| --- | --- | --- |
| Sociocognitive theory | DID comes about because patients are aware of the disorder, know its characteristic features, and are encouraged to take on multiple personalities by their therapists | DID diagnoses greatly increased after the book and film Sybil in the 1970s; DID is extremely rare outside North America; a very small percentage of therapists diagnose the vast majority of DID cases; the treatment literature includes practices that encourage patients to act out multiple personalities (e.g., bulletin boards for personalities to leave messages); normal people can easily re-create DID symptoms with minimal suggestion in simulated clinical interviews |
| Post-traumatic theory | Multiple personalities develop as a way of coping with sexual abuse or some other trauma | (The excerpt does not list supporting evidence for this theory) |

📊 Theory testing in action

  • The excerpt states that "there are now several lines of evidence that support the sociocognitive model over the post-traumatic model."
  • This illustrates the scientific process: multiple theories are considered, and evidence determines which theory is retained and which is abandoned or revised.
  • Theories that fare well are assumed to be more accurate and are retained; those that fare poorly are assumed to be less accurate and are abandoned.
  • Scientists generally do not believe theories provide perfectly accurate descriptions of the world, but assume this process produces theories that come closer and closer to that ideal.
13

Using Theories in Psychological Research

Using Theories in Psychological Research

🧭 Overview

🧠 One-sentence thesis

Researchers use theories through a cyclical process of deriving testable hypotheses, conducting empirical studies, and revising theories based on results—a practice that transforms theories from optional additions into essential ingredients of psychological research.

📌 Key points (3–5)

  • The hypothetico-deductive method: researchers start with a theory, derive a hypothesis (prediction), test it empirically, then reevaluate and revise the theory based on results.
  • Confirming vs. disconfirming: a confirmed hypothesis strengthens a theory but never "proves" it; a disconfirmed hypothesis weakens a theory but doesn't automatically disprove it.
  • Common confusion: disconfirming a hypothesis doesn't mean researchers immediately abandon the theory—it could be a fluke, a design flaw, or a minor unstated assumption that wasn't met.
  • Competing theories: the best hypotheses distinguish between rival theories by making opposite predictions so only one can be confirmed.
  • Practical incorporation: distinguish phenomena from theories, identify multiple plausible explanations, and use theories to generate interesting research questions.

🔄 The hypothetico-deductive cycle

🔄 How the cycle works

The hypothetico-deductive method: a cyclical process where researchers construct or choose a theory, derive a hypothesis, test it empirically, and revise the theory based on results.

  • Step 1: Start with a set of phenomena and either construct a new theory or choose an existing one.
  • Step 2: Make a prediction (hypothesis) about a new phenomenon that should occur if the theory is correct.
  • Step 3: Conduct an empirical study to test the hypothesis.
  • Step 4: Reevaluate the theory in light of results and revise if necessary.
  • Step 5: Derive a new hypothesis from the revised theory and repeat.

This approach integrates with the general model of scientific research to create "theoretically motivated" or "theory-driven" research.

🪳 Example: Zajonc's social facilitation research

Zajonc started with contradictory results from the literature and constructed drive theory:

  • Theory: Being watched while performing a task causes physiological arousal, which increases the tendency to make the dominant response.
  • Prediction: Social facilitation for well-learned tasks; social inhibition for poorly learned tasks.
  • Hypothesis: Presence of others should improve performance on simple tasks but inhibit performance on difficult versions of the same task.
  • Test: Cockroaches ran down either a straight runway (easy) or a cross-shaped maze (difficult) while alone or with other cockroaches in "audience boxes."
  • Result: Cockroaches reached the goal faster in the straight runway with an audience but slower in the maze with an audience—confirming the hypothesis and supporting drive theory.

Don't confuse: The theory organized previous results meaningfully, but Zajonc still needed to test it with new predictions.

🏗️ Constructing and choosing theories

🏗️ Building a new theory

Constructing theories is creative but requires preparation:

  • Know the phenomena of interest in detail through thorough literature review.
  • Know existing theories thoroughly.
  • The new theory must provide coherent explanation and have some advantage: more formal/precise, broader scope, more parsimonious, or a new perspective.
  • If no existing theory exists, almost any theory is a step forward.

Formality, scope, and approach depend partly on the phenomena and partly on the researcher's interests and abilities (e.g., neural theories require neuroscience training).

🔍 Working with existing theories

More commonly, researchers start with someone else's theory (giving proper credit):

  • This practice exemplifies collective advancement of scientific knowledge.
  • Researchers may derive and test a hypothesis from the existing theory.
  • Or they may modify the theory to account for new phenomena and test the modified version.

Many theories in psychology are informal, narrow, and accessible even to beginning researchers.

🎯 Deriving and testing hypotheses

🎯 What hypotheses are

Hypothesis: a prediction about a new phenomenon that should be observed if a particular theory is accurate.

  • Theories and hypotheses always have an if-then relationship: "If drive theory is correct, then cockroaches should run faster in a straight runway when others are present."
  • Hypotheses are usually statements but can be rephrased as questions: "Do cockroaches run faster when others are present?"
  • Deriving hypotheses from theories is an excellent way to generate interesting research questions.

🛠️ Three ways to derive hypotheses

| Method | Description | Example |
| --- | --- | --- |
| Question-first | Generate a research question, then ask if any theory implies an answer | Does expressive writing about positive experiences improve health? Habituation theory implies no, because it wouldn't cause habituation to negative thoughts |
| Component-focus | Focus on a theory component not yet directly observed | Test the habituation process itself—people should show fewer signs of distress with each writing session |
| Theory-competition | Distinguish between competing theories | Number-of-examples vs. ease-of-retrieval theories make opposite predictions about assertiveness judgments |

🏆 Best hypotheses: distinguishing competing theories

Example from Schwarz and colleagues:

  • Two theories of self-judgment: based on number of examples vs. ease of retrieval.
  • Test: Asked people to recall 6 (easy) or 12 (difficult) times they were assertive, then judge their assertiveness.
  • Opposite predictions: Number theory predicts 12-example group judges themselves more assertive; ease theory predicts 6-example group judges themselves more assertive.
  • Result: 6-example group judged themselves more assertive—particularly convincing evidence for ease-of-retrieval theory.

Don't confuse: When theories make opposite predictions, only one can be confirmed, providing stronger evidence.

⚖️ Evaluating and revising theories

⚖️ What confirmation means

  • Confirmed hypothesis: strengthens the theory—it made an accurate prediction and now accounts for a new phenomenon.
  • Important limitation: confirmation can never "prove" a theory.
  • Scientists avoid the word "prove" when discussing theories.

Two reasons confirmation isn't proof:

  1. Other plausible theories might imply the same hypothesis, so confirmation strengthens all equally.
  2. Future tests might disconfirm the hypothesis or new hypotheses from the theory—the "problem of induction" (observing white swans doesn't prove all swans are white; a black swan might appear).

Scientists view even highly successful theories as subject to revision based on new observations.

⚖️ What disconfirmation means

  • Disconfirmed hypothesis: weakens the theory—it made an inaccurate prediction and there's a new phenomenon it doesn't account for.
  • Formal logic: "If A then B" and "not B" necessarily leads to "not A" (theory is incorrect).
  • Practice: scientists don't abandon theories so easily.

Why researchers don't immediately give up:

  • Could be a fluke or faulty research design (failed manipulation or measurement).
  • Could mean an unstated minor assumption wasn't met.
  • Example: If Zajonc hadn't found social facilitation in cockroaches, he could have concluded drive theory still applies but only to animals with complex nervous systems.

Important: This flexibility doesn't mean ignoring disconfirmations. If researchers can't improve designs or modify theories to account for repeated disconfirmations, they eventually abandon and replace theories.

🧰 Incorporating theory into your research

🧰 Distinguish phenomena from theories

Critical first step: separate what you observe from explanations of it.

  • Beware "fusion": conflating a phenomenon with a commonsense theory.
  • Bad example 1: "Cell phone usage distracts people from driving" (fuses phenomenon with vague explanation).
  • Bad example 2: "Dealing with emotions through writing makes you healthier" (fuses phenomenon with commonsense explanation).
  • Problem: This conflation gives the impression the phenomenon is already explained and closes off further inquiry into precisely why or how.

Better approach (Burger and colleagues' example):

  • Phenomenon: People are more willing to comply with requests from familiar vs. unfamiliar people.
  • Multiple theories: (1) complying creates positive feelings, (2) we anticipate needing something in return, (3) we like them more and follow an automatic rule to help people we like.

📚 Identify existing theories

  • Turn to the research literature to find existing theories of your phenomena.
  • There will usually be more than one plausible theory (complementary or competing).
  • If no existing theories, generate two or three of your own—even if informal and limited.

Habit to develop: Describe phenomena followed by two or three best theories, whether speaking or writing.

Example script: "It's about the fact that we're more likely to comply with requests from people we know [phenomenon]. This is interesting because it could be that complying makes us feel good [Theory 1], that we think we might get something in return [Theory 2], or that we like them more and have an automatic tendency to comply with people we like [Theory 3]."

🔬 Use theories to generate hypotheses

  • For each research question, ask what each plausible theory implies about the answer.
  • If one theory implies a particular answer, you have an interesting hypothesis to test.
  • Example: Burger and colleagues tested requests from strangers whom participants had briefly sat next to, with no interaction and no expectation of future interaction—if familiarity creates liking and liking increases compliance (Theory 3), even this minimal familiarity should increase compliance (it did).
  • If no theory implies an answer, this gap might suggest constructing a new theory or modifying existing ones—excellent discussion points for research reports.

✍️ Two formats for including theory in reports

| Format | When to use | Structure |
| --- | --- | --- |
| Question-first | Applied research or questions existing theories don't address | Raise research question → conduct new study → offer theories to explain/interpret results |
| Theory-first | Testing or extending existing theories | Describe existing theories → derive hypothesis from one → test in new study → reevaluate theory |

Both formats are valid; choice depends on your research question and available theories.

14

Understanding Psychological Measurement

Understanding Psychological Measurement

🧭 Overview

🧠 One-sentence thesis

Psychological measurement assigns scores to represent constructs that cannot be directly observed, and researchers must systematically demonstrate that their measures are both reliable and valid rather than simply assuming they work.

📌 Key points (3–5)

  • What measurement is: assigning scores to individuals so the scores represent some characteristic, using systematic procedures rather than specific instruments.
  • Psychological constructs: variables like personality traits, emotions, and abilities that cannot be directly observed because they represent tendencies or internal processes.
  • Operational definitions: multiple ways to measure the same construct (self-report, behavioral, physiological), with converging operations strengthening evidence.
  • Common confusion: reliability vs. validity—a measure can be highly consistent (reliable) yet measure the wrong thing entirely (invalid).
  • Levels of measurement: nominal, ordinal, interval, and ratio scales communicate different amounts of quantitative information and determine appropriate statistical analyses.

🔍 What measurement means in psychology

🔍 The core definition

Measurement: the assignment of scores to individuals so that the scores represent some characteristic of the individuals.

  • This definition applies equally to physical measurement (bathroom scales, thermometers) and psychological measurement (psychometrics).
  • Key insight: Measurement does not require particular instruments—it requires a systematic procedure for assigning scores.
  • Example: A cognitive psychologist measures working memory capacity using a backward digit span task (repeating digits in reverse order); the longest correct list length is the score representing that person's capacity.
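
To make "systematic procedure" concrete, here is a minimal sketch of the scoring rule just described; the trial format and function name are illustrative assumptions, not part of any standard test software.

```python
# Minimal sketch: scoring a backward digit span task.
# Each trial records the presented list length and whether the
# participant repeated the digits correctly in reverse order.

def backward_digit_span_score(trials):
    """Return the longest list length repeated correctly (the score)."""
    correct_lengths = [length for (length, correct) in trials if correct]
    return max(correct_lengths) if correct_lengths else 0

# Example: correct through length 6, failed at length 7 -> score of 6.
trials = [(3, True), (4, True), (5, True), (6, True), (7, False)]
print(backward_digit_span_score(trials))  # 6
```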

🔍 Why systematic procedures matter

  • The procedure must be consistent and rule-based, not arbitrary.
  • Example: A clinical psychologist uses the Beck Depression Inventory (21 self-report items rated over the past 2 weeks); the sum represents current depression level.
  • Don't confuse: You don't need fancy equipment—you need a clear, replicable method for turning observations into scores.

🧩 Psychological constructs

🧩 What constructs are

Psychological constructs: variables that cannot be directly observed, including personality traits (extraversion), emotional states (fear), attitudes (toward taxes), and abilities (athleticism).

  • Why they can't be observed directly:
    • They represent tendencies to think, feel, or act in certain ways across situations.
    • They involve internal processes (thoughts, feelings, physiological responses).
  • Example: Saying someone is "highly extraverted" doesn't mean they're talking right now—they might be quietly reading—it means they have a general tendency toward extraverted behavior.

🧩 Constructs as summaries

  • Neither extraversion nor fear "reduces to" any single thought, feeling, or behavior.
  • Each construct is a summary of a complex set of behaviors and internal processes.
  • Example: Fear involves nervous system activation, certain thoughts and feelings, and behaviors—none necessarily obvious to an outside observer.

🧩 The Big Five example

The excerpt presents five broad personality dimensions, each defined by six more specific facets:

| Dimension | Sample Facets |
| --- | --- |
| Openness to experience | Fantasy, Aesthetics, Feelings, Actions, Ideas, Values |
| Conscientiousness | Competence, Order, Dutifulness, Achievement/Striving, Self-discipline, Deliberation |
| Extraversion | Warmth, Gregariousness, Assertiveness, Activity, Excitement seeking, Positive emotions |
| Agreeableness | Trust, Straightforwardness, Altruism, Compliance, Modesty, Tender-mindedness |
| Neuroticism | Worry, Anger, Discouragement, Self-consciousness, Impulsivity, Vulnerability |

📐 Conceptual vs. operational definitions

📐 Conceptual definitions

Conceptual definition: describes the behaviors and internal processes that make up a construct, along with how it relates to other variables.

  • Example: Neuroticism is people's tendency to experience negative emotions (anxiety, anger, sadness) across situations; it has a strong genetic component, remains fairly stable over time, and correlates positively with experiencing pain and physical symptoms.
  • Why not just use the dictionary?: Many scientific constructs lack everyday counterparts, and researchers develop more detailed, precise definitions through empirical testing.
  • Research literature often contains different conceptual definitions of the same construct—researchers test and revise definitions, sometimes replacing older ones with better-fitting versions.

📐 Operational definitions

Operational definition: a definition of a variable in terms of precisely how it is to be measured.

Three broad categories of operational definitions:

  1. Self-report measures: Participants report their own thoughts, feelings, actions (e.g., Rosenberg Self-Esteem Scale).
  2. Behavioral measures: Observing and recording participants' behavior in structured tasks or natural settings (e.g., counting acts of physical aggression toward a Bobo doll during 20 minutes of play).
  3. Physiological measures: Recording processes like heart rate, blood pressure, hormone levels, or brain activity.

📐 Multiple definitions and converging operations

  • For any construct, there will be multiple operational definitions.
  • Example: Stress can be measured via the Social Readjustment Rating Scale (stressful life events with severity points), the Daily Hassles and Uplifts Scale (everyday stressors), the Perceived Stress Scale (feelings of stress), or physiological variables (blood pressure, cortisol levels).
  • Converging operations: Using multiple operational definitions that "converge" on the same construct.
  • When scores from different operational definitions correlate with each other and produce similar patterns, this provides good evidence the construct is being measured effectively.
  • Example: Various stress measures all correlate with each other and with immune system functioning, allowing the general conclusion "stress is negatively correlated with immune system functioning."

📊 Levels of measurement

📊 Why levels matter

Psychologist S. S. Stevens identified four levels of measurement that communicate different amounts of quantitative information. The level affects which statistical procedures are appropriate and what conclusions can be drawn.

📊 Nominal level

Nominal level: used for categorical variables; assigns scores that are category labels.

  • Communicates whether two individuals are the same or different on the variable.
  • Does not imply any ordering among responses.
  • Example: Recording whether each participant is male or female, or which ethnicity they identify with.
  • Lowest level of measurement—only the mode can be used as a measure of central tendency.

📊 Ordinal level

Ordinal level: assigns scores representing rank order of individuals.

  • Communicates whether individuals are the same/different AND whether one is higher or lower.
  • Example: Microwave satisfaction ratings ("very dissatisfied," "somewhat dissatisfied," "somewhat satisfied," "very satisfied").
  • Key limitation: The difference between adjacent levels cannot be assumed equal.
  • Example: The gap between "very dissatisfied" and "somewhat dissatisfied" may not equal the gap between "somewhat dissatisfied" and "somewhat satisfied."
  • Don't confuse: You know the order but not whether intervals are equal—just like knowing runners' finishing positions (1st, 2nd, 3rd) without knowing the time gaps between them.

📊 Interval level

Interval level: assigns scores using numerical scales in which intervals have the same interpretation throughout.

  • Equal intervals represent equal differences in the underlying quantity.
  • Example: Fahrenheit or Celsius temperature scales—the difference between 30° and 40° represents the same temperature difference as between 80° and 90°.
  • Key limitation: No true zero point (zero is arbitrary).
  • Because there's no true zero, ratios don't make sense—you can't say 80° is "twice as hot" as 40° because the claim depends on where you arbitrarily start the scale.
  • Example in psychology: Intelligence quotient (IQ) is often considered interval-level.

📊 Ratio level

Ratio level: assigns scores with a true zero point representing complete absence of the quantity.

  • Has all properties of earlier scales plus meaningful ratios.
  • Examples: Height in meters, weight in kilograms, counts of discrete objects (number of siblings, correct exam answers).
  • True zero allows meaningful ratio statements.
  • Example: Someone with 50 cents has twice as much money as someone with 25 cents (money has a true zero point—zero money means absence of money).
  • Temperature example: The Kelvin scale has absolute zero, making it a ratio scale; if one temperature is twice another on the Kelvin scale, it has twice the kinetic energy.
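
A few lines of arithmetic make the interval/ratio distinction concrete: converting the Fahrenheit readings from the earlier example to Kelvin shows why "twice as hot" fails on an interval scale (only the standard conversion formula is used; the numbers are the ones from above).

```python
# Why ratios fail on interval scales: 80 F is not "twice" 40 F.
def fahrenheit_to_kelvin(f):
    return (f - 32) * 5 / 9 + 273.15

low, high = fahrenheit_to_kelvin(40), fahrenheit_to_kelvin(80)
print(high / low)  # ~1.08, not 2: the Fahrenheit zero point is arbitrary
print(80 / 40)     # 2.0 only because of where the scale happens to start
```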

📊 Summary table

| Level | Category labels | Rank order | Equal intervals | True zero |
| --- | --- | --- | --- | --- |
| Nominal | X | | | |
| Ordinal | X | X | | |
| Interval | X | X | X | |
| Ratio | X | X | X | X |

🔬 Reliability: consistency of measurement

🔬 What reliability means

Reliability: the consistency of a measure.

  • Critical principle: Psychologists do not simply assume their measures work—they collect data to demonstrate reliability and validity. If research doesn't demonstrate a measure works, they stop using it.
  • Example: If your bathroom scale says you gained 10 pounds after a month of dieting (when clothes fit loosely and friends notice weight loss), you'd conclude it's broken—the same logic applies to psychological measures.

🔬 Test-retest reliability

Test-retest reliability: the extent to which a measure produces consistent scores across time for constructs assumed to be stable.

  • Assessed by measuring the same group twice and examining the correlation between the two sets of scores.
  • Example: Intelligence is assumed consistent over time, so a good intelligence measure should produce similar scores for the same person next week.
  • How to assess: Graph data in a scatterplot and compute Pearson's r (see the sketch after this list); a correlation of +.80 or greater indicates good reliability.
  • Example: Rosenberg Self-Esteem Scale administered twice, one week apart, produced r = +.95.
  • Don't confuse: Low test-retest correlation is only a problem for constructs assumed to be stable—mood changes by nature, so a mood measure with low test-retest correlation over a month is not concerning.
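
As a minimal sketch of that assessment, the scores below are made-up time-1 and time-2 scores for the same eight participants; scipy's pearsonr does the computation.

```python
# Test-retest reliability: correlate time-1 and time-2 scores
# for the same participants (scores are made up for illustration).
from scipy.stats import pearsonr

time1 = [22, 25, 18, 30, 27, 15, 24, 29]
time2 = [21, 26, 17, 31, 25, 16, 25, 28]

r, p = pearsonr(time1, time2)
print(f"test-retest r = {r:+.2f}")  # +.80 or greater indicates good reliability
```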

🔬 Internal consistency

Internal consistency: the consistency of people's responses across items on a multiple-item measure.

  • All items should reflect the same underlying construct, so responses should correlate with each other.
  • Example: On the Rosenberg Self-Esteem Scale, people who agree they are "a person of worth" should tend to agree they have "a number of good qualities."
  • Applies to all measure types: If someone makes a series of bets in a simulated roulette game (measuring risk-seeking), their bets should be consistently high or low across trials.

Assessment methods:

  • Split-half correlation: Split items into two sets (e.g., first/second halves or even/odd items), compute a score for each set, examine the relationship. A correlation of +.80 or greater indicates good internal consistency.
  • Cronbach's α (alpha): The mean of all possible split-half correlations for a set of items. A value of +.80 or greater indicates good internal consistency.
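
Both assessment methods can be sketched in a few lines; the participants × items matrix below is made up, and alpha is computed directly from the standard formula rather than a library routine.

```python
# Internal consistency from a participants x items matrix (made-up data).
import numpy as np

scores = np.array([
    [4, 5, 4, 4],  # each row = one participant's responses to 4 items
    [2, 1, 2, 1],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
])

# Split-half: correlate summed even-item scores with summed odd-item scores.
even_half = scores[:, ::2].sum(axis=1)
odd_half = scores[:, 1::2].sum(axis=1)
split_half_r = np.corrcoef(even_half, odd_half)[0, 1]

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
k = scores.shape[1]
alpha = k / (k - 1) * (1 - scores.var(axis=0, ddof=1).sum()
                       / scores.sum(axis=1).var(ddof=1))

print(f"split-half r = {split_half_r:.2f}, Cronbach's alpha = {alpha:.2f}")
```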

🔬 Interrater reliability

Interrater reliability: the extent to which different observers are consistent in their judgments.

  • Important when behavioral measures involve observer judgment.
  • Example: To measure social skills, video-record students meeting someone new, then have multiple observers rate each student's social skills—ratings should be highly correlated.
  • Example: In the Bobo doll study, different observers' counts of aggressive acts by the same child should be highly positively correlated.
  • Assessment: Use Cronbach's α for quantitative judgments or Cohen's κ (kappa) for categorical judgments.
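
For categorical judgments, Cohen's κ can be computed with scikit-learn (one common implementation); the two observers' codings below are made up.

```python
# Interrater reliability for categorical judgments (made-up codings).
from sklearn.metrics import cohen_kappa_score

# Two observers independently code the same 10 acts as aggressive (1) or not (0).
observer_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
observer_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

print(cohen_kappa_score(observer_a, observer_b))  # 0.80: agreement beyond chance
```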

✅ Validity: measuring what you intend

✅ What validity means

Validity: the extent to which scores from a measure represent the variable they are intended to measure.

  • Reliability is necessary but not sufficient for validity.
  • Example (absurd): Measuring self-esteem by index finger length would have excellent test-retest reliability but absolutely no validity—finger length indicates nothing about self-esteem.
  • Validity is judged by considering multiple types of evidence beyond reliability.

✅ Face validity

Face validity: the extent to which a measurement method appears "on its face" to measure the construct of interest.

  • Usually assessed informally, based on whether the measure seems related to the construct.
  • Example: A self-esteem questionnaire with items about being "a person of worth" and having "good qualities" has good face validity; the finger-length method has poor face validity.
  • Weakest form of evidence: Based on intuitions about human behavior, which are frequently wrong.
  • Don't confuse: Many established measures work well despite lacking face validity.
  • Example: The MMPI-2 measures personality characteristics using 567 true/false statements, many with no obvious relationship to what they measure (e.g., "I enjoy detective stories" and "Blood doesn't frighten me" both measure suppression of aggression).

✅ Content validity

Content validity: the extent to which a measure "covers" the construct of interest.

  • The measure should reflect all aspects of the conceptual definition.
  • Example: If test anxiety is defined as involving both sympathetic nervous system activation (nervous feelings) AND negative thoughts, the measure should include items about both.
  • Example: Attitudes involve thoughts, feelings, and actions—so a measure of attitudes toward exercise must reflect all three aspects to have good content validity.
  • Usually assessed by carefully checking the measurement method against the conceptual definition, not quantitatively.

✅ Criterion validity

Criterion validity: the extent to which people's scores on a measure correlate with other variables (criteria) one would expect them to correlate with.

  • Concurrent validity: Criterion measured at the same time as the construct.
  • Predictive validity: Criterion measured in the future (scores "predict" a future outcome).

Examples of expected correlations:

  • Test anxiety scores should be negatively correlated with exam performance and course grades, positively correlated with general anxiety and blood pressure during exams.
  • Physical risk-taking scores should correlate with participation in extreme activities (snowboarding, rock climbing), speeding tickets, and number of broken bones.

✅ Convergent validity

Convergent validity: the extent to which scores on a measure correlate with other measures of the same construct.

  • New measures should be positively correlated with existing measures of the same construct.
  • Example: The Need for Cognition Scale (measuring how much people value and engage in thinking) was shown to be positively correlated with academic achievement test scores and negatively correlated with dogmatism (rigid, closed-minded thinking).
  • Over the years, it has been correlated with many other variables (advertisement effectiveness, interest in politics, juror decisions), building evidence of validity.

✅ Discriminant validity

Discriminant validity: the extent to which scores on a measure are NOT correlated with measures of conceptually distinct variables.

  • Demonstrates the measure is not confusing the target construct with something else.
  • Example: Self-esteem (a stable general attitude toward the self) should not be highly correlated with mood (how one feels right now). If a new self-esteem measure highly correlates with mood, it may be measuring mood instead of self-esteem.
  • Example: Need for Cognition Scale showed only weak correlation with cognitive style (analytic vs. holistic thinking) and no correlation with test anxiety or social desirability—evidence it reflects a conceptually distinct construct.
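
The logic of both checks is a simple correlation comparison. Here is a sketch with made-up scores: a new self-esteem measure should correlate strongly with an established self-esteem measure (convergent) and only weakly with a mood measure (discriminant).

```python
# Convergent vs. discriminant validity as a correlation check (made-up scores).
import numpy as np

new_self_esteem = np.array([30, 22, 27, 18, 25, 33, 20, 28])
established_self_esteem = np.array([29, 20, 28, 19, 23, 34, 22, 27])  # same construct
current_mood = np.array([5, 7, 4, 6, 3, 6, 5, 4])  # conceptually distinct

convergent_r = np.corrcoef(new_self_esteem, established_self_esteem)[0, 1]
discriminant_r = np.corrcoef(new_self_esteem, current_mood)[0, 1]
print(f"convergent r = {convergent_r:+.2f} (should be high)")
print(f"discriminant r = {discriminant_r:+.2f} (should be low)")
```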

🔄 The ongoing process of validation

🔄 Validation is cumulative

  • Reliability and validity are not established by a single study but by the pattern of results across multiple studies.
  • Assessment of reliability and validity is an ongoing process.
  • Researchers continuously collect evidence to demonstrate their measures work; if they cannot, they stop using them.

🔄 Four broad steps in measurement

The excerpt introduces (but does not fully develop) a four-step measurement process:

  1. Conceptually defining the construct
  2. Operationally defining the construct
  3. Implementing the measure
  4. Evaluating the measure

15

Reliability and Validity of Measurement

Reliability and Validity of Measurement

🧭 Overview

🧠 One-sentence thesis

Establishing the reliability and validity of a psychological measure is an ongoing process that requires multiple studies and careful attention to conceptual clarity, operational decisions, implementation conditions, and continuous evaluation.

📌 Key points (3–5)

  • Reliability and validity are never "done": No single study establishes a measure's quality; the pattern of results across multiple studies builds the evidence over time.
  • Four-step measurement process: conceptually defining the construct, operationally defining it, implementing the measure, and evaluating it.
  • Use existing vs. create new: Existing measures save time and allow comparison with prior research, but new measures may be needed when none exist or when testing convergent validity.
  • Common confusion: Reliability/validity are not properties of the measure itself in isolation—they depend on the sample, testing conditions, and context of use.
  • Reactivity threatens validity: Participants may respond in socially desirable ways, pick up on demand characteristics, or be influenced by researcher expectations, reducing score validity.

📐 The four-step measurement process

📐 Step 1: Conceptually defining the construct

A clear and complete conceptual definition of a construct is a prerequisite for good measurement.

  • Why it matters: Without clarity, you cannot decide how to measure the construct.
  • Example: If you only vaguely want to measure "memory," you won't know whether to test vocabulary recall, photograph recognition, skill execution, or long-ago experiences.
  • How to do it: Read the research literature carefully and pay attention to how others have defined the construct.
  • Modern psychology often breaks broad constructs into semi-independent systems (e.g., memory is not one thing but a set of systems like long-term semantic memory for facts).
  • Example: If you are interested in long-term semantic memory, having participants remember a word list makes sense; having them execute a newly learned skill does not.

📐 Step 2: Operationally defining the construct

This step involves deciding whether to use an existing measure or create a new one (covered in detail in the next section).

📐 Step 3: Implementing the measure

  • Test everyone under similar, quiet, distraction-free conditions when possible.
  • Be aware that group testing is efficient but can create distractions.
  • Use previous research as a guide for testing conditions.

📐 Step 4: Evaluating the measure

  • Collect data and assess reliability and validity based on your sample and conditions.
  • Even well-established measures should be re-evaluated in your specific context.
  • Add your evidence to the research literature.

🔄 Deciding: existing measure vs. new measure

🔄 Advantages of using an existing measure

| Advantage | Explanation |
| --- | --- |
| Saves time and effort | You don't have to create your own from scratch. |
| Existing validity evidence | If it has been used successfully before, there is already some evidence it works. |
| Comparability | Your results can be compared and combined with previous results. |
| Expectation | Other researchers may expect you to use a reliable and valid existing measure unless you have a good reason not to. |

🔄 Choosing among existing measures

When multiple measures exist, you might choose based on:

  • Commonality: the most widely used one.
  • Evidence quality: the one with the best reliability and validity evidence.
  • Aspect of interest: the one that best measures a particular facet (e.g., a physiological measure of stress if you care about underlying physiology).
  • Practicality: the easiest to use.

Example: The Ten-Item Personality Inventory (TIPI) measures all Big Five personality dimensions with just 10 items. It is less reliable and valid than longer measures, but a researcher might choose it when testing time is severely limited.

🔄 Where to find existing measures

  • Research articles: Measures created for scientific research are usually described in detail in published articles and are free to use with proper citation.
  • Later articles: May describe the measure briefly and reference the original article; you must get details from the original.
  • Directories: The Directory of Unpublished Experimental Measures catalogs measures used in previous research.
  • Proprietary measures: Some (especially clinical measures) are owned by publishers and must be purchased (e.g., standard intelligence tests, Beck Depression Inventory, MMPI). Details can be found in Tests in Print and the Mental Measurements Yearbook, often available in university libraries.

🔄 When to create your own measure

Reasons to create a new measure:

  • No existing measure of the construct exists.
  • Existing measures are too difficult or time-consuming.
  • You want to test convergent validity by seeing if a new measure works the same way as existing ones.

Don't confuse: Creating a "new" measure usually means adapting or varying an existing one, not inventing from scratch.

🛠️ Guidelines for creating new measures

🛠️ Start with the literature

  • Most new measures are variations of existing ones.
  • Look for ideas: modify an existing questionnaire, convert a computerized measure to paper-and-pencil (or vice versa), or adapt a measure used for another purpose.
  • Example: The Stroop task (quickly naming colors that color words are printed in) has been adapted for social anxiety research—socially anxious people are slower at color naming when words have negative social connotations like "stupid."

🛠️ Strive for simplicity

  • Participants are not as interested in your research as you are.
  • They vary widely in ability to understand and carry out tasks.
  • Clear instructions: Use simple language, present in writing or read aloud (or both).
  • Practice items: Include one or more so participants become familiar with the task.
  • Opportunity for questions: Build in a chance for participants to ask before continuing.
  • Brevity: Keep the measure brief to avoid boredom or frustration that reduces reliability and validity.

🛠️ Multiple items vs. single items

Why multiple items are better:

  1. Content validity: Multiple items are often required to cover a construct adequately.
  2. Reliability: Single-item responses can be influenced by irrelevant factors (misunderstanding, distraction, simple errors). When several responses are summed or averaged, these irrelevant factors cancel out, producing more reliable scores.

Important: Multiple items must be structured so they can be combined into a single overall score by summing or averaging.

Don't confuse: Asking about annual income, credit score, and "thriftiness" rating does not create a true multiple-item measure of "financial responsibility" because there is no obvious way to combine these into one score. Instead, ask people to rate 10 statements about financial responsibility on the same five-point scale.
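
A minimal sketch of the "good" version: because all ten statements share the same five-point scale, each participant's responses can be summed (or averaged) into one overall score (the ratings are illustrative).

```python
# Combining multiple items into one overall score (made-up ratings).
import numpy as np

# Each row = one participant's ratings of 10 financial-responsibility
# statements, all on the same 1-5 scale, so combining is meaningful.
ratings = np.array([
    [4, 5, 4, 3, 4, 5, 4, 4, 3, 4],
    [2, 1, 2, 2, 3, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 4, 3, 3, 4, 3],
])

total_scores = ratings.sum(axis=1)  # or ratings.mean(axis=1)
print(total_scores)  # one overall score per participant
```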

🛠️ Pilot testing

  • Test several people (family and friends often serve this purpose).
  • Observe them as they complete the task, time them, and ask afterward about ease, clarity of instructions, and anything else.
  • Better to discover problems before large-scale data collection.

⚠️ Threats to reliability and validity during implementation

⚠️ Testing conditions

  • Maximize reliability and validity by testing everyone under similar conditions—ideally quiet and free of distractions.
  • Group testing is efficient but can create distractions.
  • Use previous research as a guide.

⚠️ Participant reactivity

Participant reactivity: people react in various ways to being measured that reduce the reliability and validity of scores.

Forms of reactivity:

  • Socially desirable responding: Agreeable participants respond in ways they believe are expected or socially appropriate, not how they truly feel.
    • Example: People with low self-esteem agree "I feel I am a person of worth" not because they feel this way, but because they believe it is the socially appropriate response and don't want to look bad to the researcher.
  • Demand characteristics: Subtle cues that reveal how the researcher expects participants to behave.
    • Example: A participant whose attitude toward exercise is measured immediately after reading a passage about heart disease dangers might reasonably conclude the passage was meant to improve her attitude, so she responds more favorably because she believes she is expected to.
  • Researcher expectation bias: Your own expectations can bias participants' behaviors in unintended ways.

⚠️ Precautions to minimize reactivity

| Precaution | How it helps |
| --- | --- |
| Clear and brief procedure | Participants are not tempted to vent frustrations on your results. |
| Guarantee anonymity | Make it clear to participants; seat them far apart in groups so they cannot see each other's responses; give all the same writing implement; allow sealing questionnaires into envelopes or a drop box. |
| Avoid revealing hypothesis | Informed consent requires telling participants what they will do, not your hypothesis or information suggesting how you expect them to respond (e.g., title a questionnaire "Money Questionnaire" instead of "Are You Financially Responsible?"). |
| Blind administration | Have a helper who is unaware of the measure's intent or hypothesis administer it. |
| Standardize interactions | Always read the same instructions word for word. |

🔍 Evaluating the measure with your data

🔍 Why re-evaluate even established measures

  • Even if a measure has been used extensively and shown evidence of reliability and validity, do not assume it worked as expected for your particular sample and testing conditions.
  • You now have additional evidence to add to the research literature.

🔍 Assessing test-retest reliability

  • In most research designs, participants are tested only once, so test-retest reliability cannot be assessed.
  • For a new measure, you might design a study specifically to test the same participants at two separate times.
  • Sometimes a study designed for another question still allows test-retest assessment.
    • Example: A psychology instructor measures students' attitude toward critical thinking at the beginning and end of the semester to see if there is change. Even if there is no change, he can look at the correlation between scores at the two times to assess test-retest reliability.

🔍 Assessing internal consistency

  • Customary for any multiple-item measure.
  • Usually done by looking at split-half correlation or Cronbach's alpha.

🔍 Assessing convergent and discriminant validity

  • If your study included more than one measure of the same construct or measures of conceptually distinct constructs, look at correlations among these measures to ensure they fit your expectations.
  • Experimental manipulation as criterion validity evidence: A successful manipulation provides evidence of criterion validity.
    • Example: MacDonald and Martineau manipulated participants' moods by having them think positive or negative thoughts. After the manipulation, their mood measure showed a distinct difference between the two groups. This simultaneously provided evidence that the mood manipulation worked and that the mood measure was valid.

🔍 What if data cast doubt on reliability or validity?

Ask why:

  • Something wrong with your measure or how you administered it.
  • Something wrong with your conceptual definition.
  • Your experimental manipulation failed.
    • Example: If a mood measure showed no difference between people instructed to think positive vs. negative thoughts, maybe participants did not actually think the thoughts they were supposed to, or the thoughts did not actually affect their moods.
  • Implication: "Back to the drawing board" to revise the measure, revise the conceptual definition, or try a new manipulation.

🔁 The ongoing nature of assessment

🔁 No single study is enough

The reliability and validity of a measure is not established by any single study but by the pattern of results across multiple studies.

  • Assessment of reliability and validity is an ongoing process.
  • Each new study adds evidence.
  • Scores must be correlated with variables they are expected to be correlated with and not correlated with conceptually distinct variables.

16

Practical Strategies for Psychological Measurement

Practical Strategies for Psychological Measurement

🧭 Overview

🧠 One-sentence thesis

Measuring psychological constructs effectively requires a four-step process—conceptual definition, operational definition, implementation, and evaluation—with careful attention to reliability and validity at every stage.

📌 Key points (3–5)

  • The four-step measurement process: conceptually define the construct, operationally define it (choose or create a measure), implement the measure, and evaluate its reliability and validity.
  • Use existing measures when possible: they save time, already have validity evidence, and allow comparison with prior research—unless you have a clear reason to create something new.
  • Multiple items beat single items: multiple items improve content validity (cover the construct better) and reliability (irrelevant factors cancel out when responses are combined).
  • Common confusion: reliability and validity are not proven by one study—they are assessed through an ongoing process across multiple studies with accumulating evidence.
  • Reactivity threatens validity: participants may respond in socially desirable ways or according to demand characteristics, so anonymity, brevity, and blinding help minimize bias.

🎯 The four-step measurement process

🎯 Step 1: Conceptually defining the construct

A clear and complete conceptual definition of a construct is a prerequisite for good measurement.

  • Without a precise definition, you cannot make sound decisions about how to measure.
  • Example: if you only vaguely want to measure "memory," you won't know whether to test vocabulary recall, photo recognition, skill execution, or long-ago experiences.
  • Why it matters: psychologists now view memory as semi-independent systems (e.g., long-term semantic memory for facts vs. procedural memory for skills), so you must specify which system you mean.
  • How to do it: read the research literature on the construct and pay close attention to how others have defined it—there is no substitute for this step.

🎯 Step 2: Operationally defining the construct

This step means deciding whether to use an existing measure or create your own, then selecting or designing the specific procedure.

🎯 Step 3: Implementing the measure

Carry out the measurement under conditions that maximize reliability and validity (covered in detail below).

🎯 Step 4: Evaluating the measure

After collecting data, assess the measure's reliability and validity based on your new evidence—even if it has been used successfully before.

📚 Using an existing measure

📚 Why use an existing measure

Advantages:

  • Saves time and effort creating your own.
  • Already has some validity evidence if it has been used successfully.
  • Your results can be compared and combined with previous findings.

Expectation: if a reliable and valid measure already exists, other researchers will expect you to use it unless you have a good and clearly stated reason not to.

📚 Choosing among existing measures

When multiple measures exist, you might choose based on:

  • Most common: widely used in the field.
  • Best evidence: strongest reliability and validity data.
  • Best fit: measures the particular aspect you care about (e.g., a physiological measure of stress if you are interested in underlying physiology).
  • Easiest to use: practical constraints matter.

Example: The Ten-Item Personality Inventory (TIPI) measures all Big Five personality dimensions with just 10 items. It is less reliable and valid than longer measures, but a researcher might choose it when testing time is severely limited.

📚 Where to find existing measures

| Source type | Details |
| --- | --- |
| Published research articles | Measures created for scientific research are usually described in detail and free to use with proper citation; later articles may describe them briefly and reference the original. |
| Directory of Unpublished Experimental Measures | Published by the American Psychological Association; extensive catalog of measures used in previous research. |
| Proprietary measures | Owned by publishers; must be purchased (e.g., standard intelligence tests, Beck Depression Inventory, MMPI). |
| Reference books | Tests in Print and the Mental Measurements Yearbook provide details about proprietary measures and how to obtain them; often available in university libraries. |

🛠️ Creating your own measure

🛠️ When to create a new measure

  • No existing measure of the construct exists.
  • Existing measures are too difficult or time-consuming.
  • You want to evaluate convergent validity by seeing if a new measure works the same way as existing ones.

🛠️ Start with existing measures as a foundation

Most new measures are variations of existing ones, so look to the research literature for ideas:

  • Modify an existing questionnaire.
  • Create a paper-and-pencil version of a computerized measure (or vice versa).
  • Adapt a measure traditionally used for another purpose.

Example: The famous Stroop task (people quickly name the colors that color words are printed in) has been adapted for social anxiety research—socially anxious people are slower at color naming when words have negative social connotations like "stupid."

🛠️ Strive for simplicity

  • Participants are not as interested in your research as you are.
  • They vary widely in ability to understand and carry out tasks.
  • Clear instructions: use simple language, present in writing or read aloud (or both).
  • Practice items: include one or more so participants become familiar with the task.
  • Opportunity for questions: build this in before continuing.
  • Keep it brief: avoid boring or frustrating participants to the point that responses become less reliable and valid.

🛠️ Multiple items vs. single items

It is nearly always better to include multiple items rather than a single item.

Two reasons:

  1. Content validity: multiple items are often required to adequately cover a construct.
  2. Reliability: single-item responses can be influenced by irrelevant factors (misunderstanding, momentary distraction, checking the wrong option), but when several responses are summed or averaged, these irrelevant factors cancel each other out.

Important constraint: multiple items must be structured so they can be combined into a single overall score by summing or averaging.

  • Bad example: to measure "financial responsibility," asking about annual income, credit score, and self-rated "thriftiness"—no obvious way to combine these into one score.
  • Good example: asking people to rate 10 statements about financial responsibility on the same five-point scale—responses can be summed or averaged.

🛠️ Pilot test your measure

The very best way to ensure your measure works: test it on several people (family and friends often serve this purpose).

  • Observe them as they complete the task.
  • Time them.
  • Ask afterward: Was it easy or difficult? Were instructions clear? Any other feedback?
  • Why: better to discover problems before large-scale data collection begins.

⚙️ Implementing the measure

⚙️ Maximize reliability and validity during testing

  • Similar conditions: test everyone under conditions that are ideally quiet and free of distractions.
  • Group testing: efficient but can create distractions that reduce reliability and validity; use previous research as a guide—if others have successfully tested in groups with your measure, consider doing so too.

⚙️ Participant reactivity

Participant reactivity: people react in various ways to being measured that reduce the reliability and validity of scores.

Forms of reactivity:

  • Socially desirable responding: agreeable participants respond in ways they believe are socially appropriate rather than truthfully.
    • Example: people with low self-esteem agree "I feel I am a person of worth" not because they really feel this way, but because they believe it is the socially appropriate response and don't want to look bad.
  • Demand characteristics: subtle cues that reveal how the researcher expects participants to behave.
    • Example: a participant whose attitude toward exercise is measured immediately after reading a passage about heart disease dangers might reasonably conclude the passage was meant to improve her attitude, so she responds more favorably because she believes the researcher expects it.
  • Researcher expectation bias: your own expectations can bias participants' behaviors in unintended ways.

Don't confuse: most reactivity is not intentional disruption—disagreeable participants might intentionally disrupt, but reactivity more often takes the opposite form (being too agreeable).

⚙️ Precautions to minimize reactivity

| Precaution | How it helps |
| --- | --- |
| Clear and brief procedure | Participants are not tempted to vent frustrations on your results. |
| Guarantee anonymity | Make it clear you are doing so; seat participants far apart in groups so they cannot see each other's responses; give everyone the same writing implement; allow sealing questionnaires in envelopes or using a drop box. |
| Avoid revealing hypothesis | Informed consent requires telling participants what they will do, but not your hypothesis or information suggesting how you expect them to respond (e.g., title a questionnaire "Money Questionnaire" instead of "Are You Financially Responsible?"). |
| Blind administration | Have a helper who is unaware of the measure's intent or any hypothesis administer it; if not possible, standardize all interactions (e.g., always read the same instructions word for word). |

🔍 Evaluating the measure

🔍 Why evaluate even established measures

  • Even if a measure has been used extensively and shown reliability and validity evidence, do not assume it worked as expected for your particular sample and testing conditions.
  • You now have additional evidence to add to the research literature.

🔍 Assessing test-retest reliability

  • Challenge: most research designs test participants only once, so test-retest reliability cannot be assessed.
  • Solution for new measures: design a study specifically to test the same participants at two separate times.
  • Opportunistic assessment: sometimes a study designed for another question still allows test-retest assessment.
    • Example: a psychology instructor measures students' attitude toward critical thinking at the beginning and end of the semester to see if there is change; even if there is no change, he can look at the correlation between scores at the two times.

🔍 Assessing internal consistency

Customary for any multiple-item measure: look at split-half correlation or Cronbach's alpha.

🔍 Assessing convergent and discriminant validity

  • If your study included more than one measure of the same construct, look at correlations to ensure they fit expectations (high correlation = convergent validity).
  • If your study included measures of conceptually distinct constructs, check that correlations are low (discriminant validity).
  • Experimental manipulation as criterion validity evidence: a successful manipulation provides evidence of criterion validity.
    • Example: MacDonald and Martineau manipulated moods by having participants think positive or negative thoughts; their mood measure showed a distinct difference between groups, simultaneously providing evidence that the manipulation worked and that the mood measure was valid.

🔍 What if data cast doubt on reliability or validity?

Ask why:

  • Something wrong with your measure or how you administered it?
  • Something wrong with your conceptual definition?
  • Your experimental manipulation failed?
    • Example: if a mood measure showed no difference between people instructed to think positive vs. negative thoughts, maybe participants did not actually think the thoughts or the thoughts did not affect their moods.

Next step: "back to the drawing board"—revise the measure, revise the conceptual definition, or try a new manipulation.

🔍 Reliability and validity as an ongoing process

Key principle from the excerpt: The reliability and validity of a measure is not established by any single study but by the pattern of results across multiple studies. Assessment is an ongoing process.

17

Experiment Basics

Experiment Basics

🧭 Overview

🧠 One-sentence thesis

Experiments are designed to establish causal relationships by manipulating an independent variable and controlling extraneous variables, making them high in internal validity even though they must balance trade-offs with external validity and other forms of validity.

📌 Key points (3–5)

  • What defines an experiment: manipulation of the independent variable (creating conditions) and control of extraneous variables to support causal conclusions.
  • Internal validity: experiments are high in internal validity because their design supports the conclusion that the independent variable caused observed differences in the dependent variable.
  • Four validities to assess: internal, external, construct, and statistical validity—researchers must prioritize because high validity in all four is often not possible.
  • Common confusion: correlation vs. causation—two variables being statistically related does not mean one causes the other; experiments address this by creating highly similar conditions that differ only in the independent variable.
  • Confounding variables: extraneous variables that differ across levels of the independent variable provide alternative explanations and threaten internal validity.

🔬 What makes a study an experiment

🔬 Two fundamental features

An experiment is a type of study designed specifically to answer the question of whether there is a causal relationship between two variables—whether changes in an independent variable cause changes in a dependent variable.

Feature 1: Manipulation of the independent variable

  • Researchers systematically vary the level of the independent variable.
  • The different levels are called conditions.
  • Example: Darley and Latané told participants there were either one, two, or five other students in a discussion—this is one independent variable (number of witnesses) with three conditions (not three independent variables).
  • Don't confuse: the number of conditions with the number of independent variables.

Feature 2: Control of extraneous variables

  • Researchers hold constant or minimize variability in variables other than the independent and dependent variables.
  • Example: Darley and Latané tested all participants in the same room, exposed them to the same emergency situation, and randomly assigned participants to conditions.

🔄 Manipulation vs. control

  • In everyday language, these words are similar, but researchers distinguish them clearly:
    • Manipulate the independent variable = systematically change its levels.
    • Control other variables = hold them constant.

⚠️ Active intervention required

  • Comparing groups that already differ before the study begins is not manipulation.
  • Example: comparing people who already keep a journal vs. those who don't is not an experiment—those groups likely differ in other ways (conscientiousness, introversion, stress levels).
  • Without active manipulation, the third-variable problem remains.

🎯 The four big validities

🔒 Internal validity

An empirical study is high in internal validity if the way it was conducted supports the conclusion that the independent variable caused any observed differences in the dependent variable.

Why experiments are high in internal validity:

  • The logic: if the researcher creates highly similar conditions and manipulates only the independent variable to produce one difference, then any later difference between conditions must have been caused by that variable.
  • Example: because the only difference in Darley and Latané's conditions was the number of students participants believed were involved, that difference in belief must have caused differences in helping behavior.

The assumption:

  • If two or more conditions are highly similar except for the manipulated variable, then that variable is responsible for observed differences.

🌍 External validity

An empirical study is high in external validity if the way it was conducted supports generalizing the results to people and situations beyond those actually studied.

Two types of realism:

  • Mundane realism: participants and situations are similar to those the researchers want to generalize to and that people encounter every day.
    • Example: studying shoppers' cereal choices in a real grocery store = high mundane realism and high external validity.
  • Psychological realism: the same mental process is used in both the laboratory and the real world, even if the situation seems artificial.
    • Example: students judging color appeal on a computer screen may have low mundane realism but high psychological realism if the visual processing is the same.

Common critique and response:

  • Critique: experiments are often conducted under artificial conditions (undergraduates in labs filling out questionnaires).
  • Response 1: experiments need not be artificial—field experiments conducted entirely outside the lab can have high external validity (e.g., the hotel towel reuse study).
  • Response 2: experiments are often conducted to learn about psychological processes that operate across many people and situations, not just the specific scenario tested.
    • Example: Fredrickson's swimsuit study found women performed worse on math tests when wearing swimsuits due to self-objectification—a process likely to operate in many situations, even if no one else takes a math test in a swimsuit.

🏗️ Construct validity

Construct validity refers to the quality of the experiment's manipulations—how well the research question is operationalized in the study design.

Operationalization:

  • Converting the research question into an experiment design.
  • Example: Darley and Latané's research question was "does helping behavior become diffused?" They operationalized diffusion of responsibility by increasing the number of potential helpers.
  • Their construct validity was very high because the manipulations clearly spoke to the research question: there was a crisis, a way to help, and a way to test diffusion.

How the number of conditions affects construct validity:

  • Too few conditions may not clearly demonstrate the phenomenon.
    • Example: only two conditions (one or two students) might show the presence of others but not clearly demonstrate diffusion of responsibility—it could be social inhibition instead.
  • More conditions don't always increase construct validity—they may not reveal more about the phenomenon or may change it into something different.

📊 Statistical validity

Statistical validity speaks to whether the statistics conducted in the study support the conclusions that are made.

Common critique:

  • "The study didn't have enough participants."
  • This is actually about statistical validity, not external validity—small samples make it difficult to detect predicted differences or relationships even when they are real.

Power analysis:

  • Statistical power depends on the number of participants per condition and the size of the effect under study.
  • A power analysis estimates how many participants are needed to have a good chance of detecting a real difference of a given size.
  • Best practice: conduct a power analysis when designing a study to recruit the appropriate number of participants.
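
A sketch of such a power analysis using statsmodels (one common tool); the effect size here is an assumption you would justify from prior research, not a given.

```python
# How many participants per group for a two-condition between-subjects design?
from statsmodels.stats.power import TTestIndPower

# Assumptions: medium effect size (Cohen's d = 0.5), alpha = .05, desired power = .80.
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))  # roughly 64 participants per condition
```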

⚖️ Prioritizing validities

  • It is often not possible to have high validity in all four areas—researchers must prioritize.
  • Example: Cialdini's hotel towel study had high external validity but more modest statistical validity.
  • General pattern: most psychology studies have high internal and construct validity but sometimes sacrifice external validity.

🎛️ Manipulation and control in practice

🎛️ How to manipulate the independent variable

  • Manipulation = changing the level of the independent variable systematically so different groups are exposed to different levels (or the same group is exposed to different levels at different times).
  • Example: to study whether expressive writing affects health, instruct some participants to write about traumatic experiences and others to write about neutral experiences.
  • Researchers give conditions short descriptive names: "traumatic condition" and "neutral condition."

✅ Manipulation checks

  • When the independent variable is a construct that can only be manipulated indirectly, researchers include a manipulation check.
  • A manipulation check is a separate measure of the construct the researcher is trying to manipulate.
  • Example: if trying to manipulate stress levels by telling participants they must give a speech, give them a stress questionnaire or measure blood pressure afterward to verify the manipulation worked.

🔧 Controlling extraneous variables

An extraneous variable is anything that varies in the context of a study other than the independent and dependent variables.

Examples:

  • Participant variables (individual differences): writing ability, diet, shoe size, IQ.
  • Situational or task variables: time of day, whether writing by hand or on computer, weather.

Why control them:

  • Many extraneous variables are likely to affect the dependent variable.
  • Example: participants' health is affected by many things other than expressive writing.
  • Controlling extraneous variables by holding them constant makes it easier to separate the effect of the independent variable from the effects of extraneous variables.

🔊 Extraneous variables as noise

🔊 How noise obscures effects

  • Extraneous variables add variability or "noise" to the data, making it harder to detect the effect of the independent variable.
  • Example: in a mood and memory experiment, ideally every participant in the happy mood condition would recall exactly four events and every sad mood participant exactly three—the effect would be obvious.
  • In reality, participants vary: some in the happy condition recall fewer events (fewer memories to draw on, less effective strategies, less motivated); some in the sad condition recall more.
  • Although the mean difference is the same, it is much less obvious in the context of greater variability.
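
A small simulation illustrates the point: both pairs of conditions below have the same one-event mean difference, but recall varies far more in the second pair, so the difference is much harder to detect (all numbers are made up).

```python
# Same mean difference, different amounts of noise (simulated recall data).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n = 20  # participants per condition

happy_low_noise = rng.normal(4, 0.5, n)   # mean 4 events, little variability
sad_low_noise = rng.normal(3, 0.5, n)     # mean 3 events
happy_high_noise = rng.normal(4, 2.5, n)  # same means, much more variability
sad_high_noise = rng.normal(3, 2.5, n)

print(ttest_ind(happy_low_noise, sad_low_noise).pvalue)    # typically tiny
print(ttest_ind(happy_high_noise, sad_high_noise).pvalue)  # typically far larger
```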

🛡️ Holding variables constant

Situational/task variables:

  • Test all participants in the same location, give identical instructions, treat them the same way.

Participant variables:

  • Limit participants to one specific category.
  • Example: many language studies limit participants to right-handed people, who generally have language areas in the left hemisphere; left-handed people may have language areas in the right hemisphere or distributed across both, which adds noise.

Trade-off:

  • Limiting participants to a very specific category (e.g., 20-year-old, heterosexual, female, right-handed psychology majors) reduces noise but lowers external validity.
  • It may be unclear whether results from younger heterosexual women apply to older homosexual men.
  • In many situations, the advantages of a diverse sample outweigh the reduction in noise.

⚠️ Confounding variables

⚠️ What makes a variable confounding

A confounding variable is an extraneous variable that differs on average across levels of the independent variable.

Acceptable extraneous variation:

  • If participants with lower and higher IQs are present at each level of the independent variable so that average IQ is roughly equal, the variation is acceptable (even desirable).

Unacceptable confounding:

  • If participants at one level have substantially lower IQs on average and participants at another level have substantially higher IQs, IQ becomes a confounding variable.

🤔 Why confounding is a problem

  • "Confound" means to confuse.
  • Confounding variables differ across conditions—just like the independent variable—so they provide an alternative explanation for any observed difference in the dependent variable.
  • Example: if participants in a positive mood condition scored higher on a memory task than those in a negative mood condition, but the positive mood group also had higher IQs on average, it is unclear whether the higher scores were caused by positive moods or higher IQs.

🛡️ Avoiding confounding variables

Method 1: Hold extraneous variables constant

  • Example: limit participants to those with IQs of exactly 100.
  • Downside: not always desirable (reduces external validity).

Method 2: Random assignment to conditions

  • A more general approach (discussed elsewhere in the source material).

📚 Key distinctions

📚 Correlation vs. causation

  • Two variables being statistically related does not necessarily mean one causes the other.
  • "Correlation does not imply causation."
  • Example: if people who exercise regularly are happier, it could mean:
    • Exercising increases happiness, OR
    • Greater happiness causes people to exercise (directionality problem), OR
    • Better physical health causes both exercise and happiness (third-variable problem).
  • Experiments address this by showing two variables are statistically related in a way that supports a causal conclusion.

📚 Manipulation vs. comparison

  • Manipulation (experiment): active intervention by the researcher to change the independent variable.
  • Comparison (not an experiment): comparing groups that already differ before the study begins.
  • Example: comparing the health of people who already keep a journal vs. those who don't is not manipulation—those groups likely differ in other ways (conscientiousness, introversion, stress).
  • Without manipulation, the third-variable problem remains, so no causal conclusion is possible.

18

Experimental Design

Experimental Design

🧭 Overview

🧠 One-sentence thesis

Researchers must choose between between-subjects designs (each participant experiences one condition) and within-subjects designs (each participant experiences all conditions), using random assignment and appropriate control conditions to isolate the effect of the independent variable while managing confounding variables and carryover effects.

📌 Key points (3–5)

  • Between-subjects vs within-subjects: the core choice is whether each participant sees one condition or all conditions; each approach has distinct trade-offs.
  • Random assignment is essential: it controls extraneous variables by distributing participant characteristics evenly across conditions, preventing confounds.
  • Control conditions matter: no-treatment, placebo, and waitlist controls help distinguish real treatment effects from expectation-driven placebo effects.
  • Common confusion: random assignment ≠ random sampling; assignment distributes participants to conditions, sampling selects participants from a population.
  • Carryover effects in within-subjects designs: practice, fatigue, and context effects can confound results unless counterbalancing is used.

🔀 Between-subjects vs within-subjects designs

🔀 Between-subjects experiments

Between-subjects experiment: each participant is tested in only one condition.

  • Each person experiences a single level of the independent variable.
  • Example: 100 students split so 50 write about a traumatic event and 50 write about a neutral event.
  • Key requirement: groups must be highly similar on average (same proportion of men/women, similar IQs, motivation, health, etc.) to avoid confounding.
  • Advantage: conceptually simpler, less testing time per participant, no carryover effects.
  • Disadvantage: extraneous participant variables (IQ, personality, etc.) can differ between groups and add noise.

🔁 Within-subjects experiments

Within-subjects experiment: each participant is tested under all conditions.

  • The same people experience every level of the independent variable.
  • Example: the same group judges both an attractive defendant and an unattractive defendant.
  • Advantage: maximum control of extraneous participant variables (same people = same IQ, background, etc.); statistical procedures can remove participant variability, making effects easier to detect.
  • Disadvantage: vulnerable to carryover effects (practice, fatigue, context effects); participants may guess the hypothesis more easily.

⚖️ Choosing between the two

| Factor | Between-subjects | Within-subjects |
| --- | --- | --- |
| Participant variables | Less control (different people) | Maximum control (same people) |
| Testing time | Shorter per person | Longer per person |
| Carryover effects | None | Possible (needs counterbalancing) |
| Conceptual simplicity | Simpler | More complex |

  • Rule of thumb: if you can conduct a within-subjects experiment with proper counterbalancing in the available time and carryover effects are not a serious concern, prefer within-subjects.
  • Use between-subjects if time is limited (e.g., testing shoppers in a store) or if the treatment produces long-term change that makes control-condition testing impossible afterward.
  • Don't confuse: using one design in one study doesn't prevent using the other design in a follow-up study on the same question.

🎲 Random assignment

🎲 What random assignment is

Random assignment: using a random process to decide which participants are tested in which conditions.

  • Not the same as random sampling: random sampling selects people from a population; random assignment distributes a sample into conditions.
  • Purpose: control extraneous variables across conditions so they don't become confounding variables.
  • Example: flipping a coin for each participant—heads = Condition A, tails = Condition B.

🎲 How it works in practice

Strict criteria:

  1. Each participant has an equal chance of being assigned to each condition.
  2. Each assignment is independent of others.

Practical implementation:

  • For two conditions: flip a coin per participant.
  • For three conditions: generate a random integer (1, 2, or 3) per participant.
  • Usually a full sequence is created ahead of time; each new participant gets the next condition in the sequence.
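
A minimal sketch of generating such a sequence (the function name is illustrative, not from the excerpt):

```python
# Strict random assignment: each participant has an equal chance of each
# condition, and each assignment is independent of the others.
import random

def random_assignment(n_participants, conditions):
    return [random.choice(conditions) for _ in range(n_participants)]

print(random_assignment(10, ["A", "B", "C"]))
# e.g., ['B', 'A', 'A', 'C', ...]; group sizes can end up unequal by chance,
# which is one reason block randomization (below) is often used instead.
```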

🧱 Block randomization

Block randomization: all conditions occur once in the sequence before any repeat; within each block, conditions appear in random order.

  • Keeps group sizes as equal as possible (statistically most efficient for a fixed number of participants).
  • Example table (9 participants, 3 conditions):
| Participant | Condition |
| --- | --- |
| 1 | A |
| 2 | C |
| 3 | B |
| 4 | B |
| 5 | C |
| 6 | A |
| 7 | C |
| 8 | B |
| 9 | A |

  • Tools like Research Randomizer can generate these sequences automatically.
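
A minimal sketch of the same block logic, assuming a shuffle within each block (the helper name is mine):

```python
# Block randomization: every block contains each condition exactly once in a
# random order, keeping group sizes as equal as possible.
import random

def block_randomize(n_participants, conditions):
    sequence = []
    while len(sequence) < n_participants:
        block = list(conditions)
        random.shuffle(block)  # random order within this block
        sequence.extend(block)
    return sequence[:n_participants]

print(block_randomize(9, ["A", "B", "C"]))
# e.g., ['A', 'C', 'B', 'B', 'C', 'A', 'C', 'B', 'A'], as in the table above
```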

🎲 Limitations and strengths

  • Not perfect: by chance, one condition might end up with older, more tired, or more motivated participants on average.
  • Why it's still strong:
    • Works better than expected, especially with large samples.
    • Inferential statistics account for random assignment's fallibility.
    • Any resulting confound is likely to be detected when the experiment is replicated.
  • Always considered a strength of experimental design.

🧪 Treatment and control conditions

🧪 Why control conditions are needed

Treatment: any intervention meant to change people's behavior for the better (psychotherapies, medical treatments, learning interventions, etc.).

  • To determine if a treatment works, compare a treatment condition (receives the treatment) with a control condition (does not receive it).
  • If the treatment group ends up better off, the researcher can conclude the treatment works.
  • This design is often called a randomized clinical trial in medical/psychotherapy research.

🚫 No-treatment control condition

No-treatment control condition: participants receive no treatment whatsoever.

  • Problem: placebo effects.
  • Placebo: a simulated treatment lacking any active ingredient.
  • Placebo effect: a positive effect from an inert treatment, probably driven by expectations of improvement.
  • Example: chicken soup for a cold, soap under bedsheets for leg cramps—likely just placebos.
  • Expectations can reduce stress/anxiety/depression, alter perceptions, even improve immune function.
  • Why it's a problem: if the treatment group improves more than the no-treatment group, you can't tell if the treatment itself worked or if expectations caused the improvement.

💊 Placebo control condition

Placebo control condition: participants receive a placebo that looks like the treatment but lacks the active ingredient.

  • Example: treatment group takes a pill with active ingredient; placebo group takes an identical-looking sugar pill.
  • In psychotherapy research: placebo might be unstructured talk sessions with a therapist.
  • Logic: if both groups expect to improve, any extra improvement in the treatment group must be due to the treatment itself, not expectations.
  • Informed consent: participants must be told they'll be assigned to treatment or placebo (but not which one until the study ends).
  • Often, control participants are offered the real treatment afterward.

⏳ Waitlist control condition

Waitlist control condition: participants are told they will receive the treatment but must wait until the treatment group finishes.

  • Allows comparison of those currently receiving treatment with those expecting it later.
  • Both groups expect improvement, so expectations are controlled.

🆚 Best-available-alternative control

  • Compare a new treatment with the best existing treatment (not just no treatment).
  • Both groups receive a treatment, so expectations are similar.
  • Better research question: not "Does it work?" but "Does it work better than what's already available?"

💡 The powerful placebo

  • Placebos work not just for "psychological" disorders (depression, anxiety, insomnia) but also for "physiological" ones (asthma, ulcers, warts).
  • Even sham surgery can be as effective as real surgery.
  • Example study: arthroscopic knee surgery vs sham surgery (small incisions, tranquilizer, but no actual procedure)—both groups improved equally in pain and function.
  • Don't confuse: placebo effects are real improvements, not imaginary; they just aren't caused by the treatment's active ingredient.

🔄 Carryover effects and counterbalancing

🔄 What carryover effects are

Carryover effect: an effect of being tested in one condition on participants' behavior in later conditions.

Three main types:

  1. Practice effect: participants perform better in later conditions because they've practiced the task.
  2. Fatigue effect: participants perform worse in later conditions because they're tired or bored.
  3. Context effect: being tested in one condition changes how participants perceive stimuli or interpret the task in later conditions.

Example of context effect: an average-looking defendant might be judged more harshly after participants judge an attractive defendant than after judging an unattractive defendant.

🔄 Why carryover effects are problematic

  • They make it easier for participants to guess the hypothesis.
    • Example: judging an attractive then unattractive defendant → participant guesses attractiveness affects guilt judgments → might judge the unattractive defendant more harshly (or similarly, to be "fair").
  • Order becomes a confounding variable: if attractive is always first and unattractive always second, any difference could be due to order (boredom, fatigue) rather than attractiveness itself.
  • Don't confuse: carryover effects can be interesting in their own right (e.g., does one person's attractiveness depend on others we've seen recently?), but when not the focus, they're a problem.

🔀 Counterbalancing as the solution

Counterbalancing: testing different participants in different orders.

  • Example (2 conditions): some participants do attractive-then-unattractive; others do unattractive-then-attractive.
  • Example (3 conditions): six possible orders (ABC, ACB, BAC, BCA, CAB, CBA); some participants tested in each order.
  • Participants are randomly assigned to orders (random assignment still plays a key role in within-subjects designs).

Two ways counterbalancing helps:

  1. Controls order: order is no longer a confounding variable because each condition appears first for some participants and second for others.
  2. Detects carryover effects: data can be analyzed separately for each order to see if order had an effect.
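
A minimal sketch of complete counterbalancing along these lines (participant labels and the cycling scheme are illustrative choices, not from the excerpt):

```python
# Enumerate every possible order of the conditions, then cycle through the
# orders so each one is used for about the same number of participants.
import itertools
import random

conditions = ["Attractive", "Unattractive", "Neutral"]
orders = list(itertools.permutations(conditions))  # 6 orders for 3 conditions
random.shuffle(orders)  # random starting point across the set of orders

participants = [f"P{i}" for i in range(1, 13)]
assignment = {p: orders[i % len(orders)] for i, p in enumerate(participants)}
for p, order in assignment.items():
    print(p, order)
```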

🔢 Latin square design

  • An efficient counterbalancing method: a square with as many rows and columns as there are conditions, in which no treatment repeats within any row or column (like Sudoku).
  • Example for four treatments:
ABCD
BCDA
CDAB
DABC
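
A minimal sketch that generates a Latin square of this cyclic form (the excerpt does not prescribe a construction method; rotation is one simple option):

```python
# Row i is the condition list rotated by i positions, so no condition repeats
# within any row or any column.
def latin_square(conditions):
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]

for row in latin_square(["A", "B", "C", "D"]):
    print("".join(row))  # prints ABCD, BCDA, CDAB, DABC
```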

🔀 Simultaneous within-subjects designs

  • Instead of testing one condition at a time, mix stimuli from all conditions in a single sequence.
  • Example: instead of judging 10 attractive defendants then 10 unattractive ones, present all 20 in a mixed sequence.
  • Or: study a single list with both negative and positive adjectives, then recall as many as possible.
  • Order of stimuli is often randomized differently for each participant.

🧠 Context and design trade-offs

🧠 When lack of context is a problem

  • Between-subjects designs can create misleading results because participants lack context.
  • Example study: one group rated the number 9, another rated 221 on a 1-to-10 scale (1 = very very small, 10 = very very large).
    • Result: 9 received a mean rating of 5.13; 221 received 3.10—participants rated 9 as larger than 221!
    • Why: participants spontaneously compared 9 with other one-digit numbers (relatively large) and 221 with other three-digit numbers (relatively small).
  • Don't confuse: sometimes the context effects created by within-subjects designs are a smaller problem than the lack of context in between-subjects designs.

🧠 Mixed methods approach

  • Researchers can use both between-subjects and within-subjects designs to answer the same research question in different studies.
  • Using one type doesn't preclude using the other; professional researchers often take this mixed approach.

Conducting Experiments

🧭 Overview

🧠 One-sentence thesis

Conducting an experiment requires careful attention to participant recruitment, standardized procedures, and pilot testing to minimize extraneous variables and ensure the study works as planned.

📌 Key points (3–5)

  • Recruitment strategies: use formal subject pools, advertisements, or personal appeals; field experiments require well-defined selection rules to avoid bias.
  • Standardization is critical: procedures must be identical across all conditions to prevent extraneous variables from becoming confounding variables.
  • Experimenter expectancy effects: experimenters' expectations can unintentionally influence participant behavior, so blinding and standardization are essential.
  • Common confusion: volunteers differ systematically from non-volunteers (more educated, higher IQ, more sociable), which can affect external validity.
  • Pilot testing catches problems: small-scale trials reveal whether instructions are clear, procedures work correctly, and the study takes the expected time.

👥 Recruiting participants

👥 Where to find participants

  • Formal subject pools: established groups (e.g., introductory psychology students) who have agreed to participate, usually via online sign-up systems.
  • Advertisements and personal appeals: post notices or speak directly to groups representing the target population.
  • Example: a researcher studying older adults might present at a retirement community meeting to ask for volunteers.

⚠️ The volunteer problem

Volunteers: people who agree to participate in research, even if they receive compensation like course credit or small payments.

Research shows volunteers differ predictably from non-volunteers:

| Characteristic | Volunteers vs. non-volunteers |
| --- | --- |
| Interest in topic | More interested |
| Education | More educated |
| Need for approval | Greater need |
| IQ | Higher |
| Sociability | More sociable |
| Social class | Higher |

  • Why it matters: if volunteers behave differently than the general population, external validity suffers.
  • Example: a rational argument might work better on volunteers (who tend to be more educated) than on the general population.
  • Don't confuse: this is about who volunteers, not about how many people you recruit.

🎯 Selection in field experiments

  • In field experiments, researchers often select rather than recruit participants.
  • Example from the excerpt: a confederate on a stairway gazed at shoppers and either smiled or didn't; later another confederate dropped diskettes to see if the shopper would help.
  • Critical rule: selection must follow well-defined rules established before data collection begins.
  • In the example: the confederate gazed at the first person aged 20–50 he encountered; only if they gazed back did they become a participant.
  • Why strict rules matter: prevents bias—e.g., choosing friendly-looking shoppers when set to smile and unfriendly-looking ones when not smiling.

🔧 Standardizing procedures

🔧 Why standardization matters

  • Extraneous variables are surprisingly easy to introduce: one experimenter gives clear instructions, another gives vague ones; one is warm, another is cold.
  • If these variables affect behavior, they add noise and make the independent variable harder to detect.
  • If they vary across conditions, they become confounding variables and offer alternative explanations.
  • Example: if a treatment group is tested by a warm experimenter and a control group by a cold one, apparent treatment effects might actually be experimenter demeanor effects.

🧪 Experimenter expectancy effects

Experimenter expectancy effect: when experimenters' expectations about how participants "should" behave unintentionally influence the results.

  • Experimenters might unconsciously give clearer instructions, more encouragement, or more time to participants they expect to perform better.
  • Classic example: students trained rats to run mazes; some were told their rats were "maze-bright" (bred to learn well), others told "maze-dull" (bred to learn poorly). The "maze-bright" rats performed better over five days, even though all rats were genetically similar.
  • How it happened: students who expected better performance felt more positively about their rats and handled them in a friendlier way.
  • Sex of experimenter also matters: male and female experimenters interact differently with participants, and participants respond differently to each. Example: participants tolerated icy water pain longer when the experimenter was the opposite sex.

📋 How to standardize

The excerpt lists specific strategies:

  • Written protocol: specifies everything experimenters do and say from greeting to dismissal.
  • Standard instructions: participants read themselves or experimenters read word-for-word.
  • Automate: use software or simple slide shows for as much of the procedure as possible.
  • Anticipate questions: raise and answer them in instructions or develop standard answers.
  • Train together: multiple experimenters practice the protocol on each other.
  • Counterbalance: each experimenter tests participants in all conditions.

🙈 Blinding techniques

Double-blind study: neither participants nor experimenters know which condition each participant is assigned to.

Single-blind study: participants don't know their condition, but experimenters do.

  • Purpose: minimize experimenter expectancy effects by minimizing expectations.
  • Example: in a drug study, neither participant nor experimenter knows who receives the drug vs. placebo.
  • When blinding isn't possible: if you're both investigator and only experimenter, or if the experimenter must carry out different procedures in different conditions.
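
One possible blinding workflow, sketched under the assumption that a third party (not the experimenter) holds the condition key; this procedure is illustrative, not from the excerpt:

```python
# Map each condition to an opaque code so the experimenter administers
# "participant P1 gets code 4821" without knowing drug vs. placebo.
import random

assignments = {"P1": "drug", "P2": "placebo", "P3": "drug", "P4": "placebo"}
codes = random.sample(range(1000, 10000), len(assignments))  # unique codes

sealed_key = {}  # held by the third party until the study ends
for code, (participant, condition) in zip(codes, assignments.items()):
    sealed_key[code] = (participant, condition)

experimenter_view = {p: code for code, (p, _) in sealed_key.items()}
print(experimenter_view)  # participant -> code only; conditions stay hidden
```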

📝 Record keeping and preparation

📝 What to record

Essential records to keep:

  • Written sequence of conditions generated before the study.
  • For each participant: basic demographics, date/time/place of testing, experimenter name.
  • Space for comments about unusual occurrences (confused participant, equipment problems) or questions that arise.
  • Identification numbers assigned consecutively (starting with 1) to each participant.
  • These numbers should also appear on response sheets and questionnaires to keep materials together.

Why it matters: useful later for analyzing sex differences, experimenter effects, or investigating questions about particular participants or sessions.
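
A minimal sketch of one such record as a CSV row (the file name and field order are arbitrary choices):

```python
# Append one session record per participant; fields follow the list above.
import csv
from datetime import datetime

with open("session_log.csv", "a", newline="") as f:
    csv.writer(f).writerow([
        1,                               # consecutive ID, starting with 1
        "condition B",                   # from the pre-generated sequence
        "female, 20",                    # basic demographic information
        datetime.now().isoformat(),      # date and time of testing
        "Lab room 2",                    # place of testing
        "J. Smith",                      # experimenter name
        "asked to repeat instructions",  # comments on unusual occurrences
    ])
```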

🧪 Pilot testing

🧪 What pilot testing is

Pilot test: a small-scale study conducted to make sure a new procedure works as planned.

  • Participants can be recruited formally (from a subject pool) or informally (family, friends, classmates).
  • Number can be small but should be enough to give confidence the procedure works.

✅ Questions pilot testing answers

The excerpt lists specific questions:

| Question category | Examples |
| --- | --- |
| Comprehension | Do participants understand instructions? What misunderstandings, mistakes, or questions arise? |
| Experience | Do participants become bored or frustrated? |
| Manipulation | Is an indirect manipulation effective? (Requires a manipulation check.) Can participants guess the research question? |
| Logistics | How long does the procedure take? Are computer programs working properly? Are data being recorded correctly? |

🗣️ Getting honest feedback

  • Observe participants carefully during the procedure and talk with them afterward.
  • Participants often hesitate to criticize in front of the researcher.
  • Make sure they understand their participation is part of a pilot test and you genuinely want feedback to improve the procedure.
  • Iterative process: if problems exist, solve them, pilot test again, and continue until ready to proceed with the actual study.

Overview of Nonexperimental Research

🧭 Overview

🧠 One-sentence thesis

Nonexperimental research—which lacks manipulation of independent variables or random assignment—is appropriate and necessary when experimental methods are impossible, unethical, or unsuited to the research question, though it generally provides weaker causal evidence than experiments.

📌 Key points (3–5)

  • What defines nonexperimental research: lacks manipulation of an independent variable, random assignment to conditions, or both.
  • When to use it: when the research question concerns a single variable, noncausal relationships, unmanipulable variables, or exploratory/experiential questions.
  • Three broad types: single-variable research, correlational and quasi-experimental research, and qualitative research.
  • Common confusion—internal validity: experimental research is highest in internal validity, correlational lowest, and quasi-experimental in between; but a poorly designed experiment can be worse than a well-designed quasi-experiment.
  • Key limitation: nonexperimental research generally cannot provide strong evidence that changes in one variable cause differences in another.

🔬 What nonexperimental research is and isn't

🔬 Core definition

Nonexperimental research: research that lacks the manipulation of an independent variable, random assignment of participants to conditions or orders of conditions, or both.

  • The excerpt acknowledges it is "unfair" to define this diverse set of approaches by what they are not, but the distinction from experimental research is considered extremely important in psychology.
  • The key difference: experimental research can provide strong causal evidence; nonexperimental research generally cannot.
  • This does not mean nonexperimental research is less important or inferior—it serves different purposes.

⚖️ Relationship to experimental research

  • The choice between experimental and nonexperimental approaches is generally dictated by the nature of the research question.
  • If a question is about a causal relationship and involves a manipulable independent variable, experimental is typically preferred.
  • Otherwise, nonexperimental is preferred.
  • The two approaches can also be used in complementary ways to address the same question from different angles.
  • Example: nonexperimental studies established a relationship between violent television and aggressive behavior; experimental studies then confirmed the relationship is causal.

🎯 When to choose nonexperimental research

🎯 Four key scenarios

The excerpt lists situations where nonexperimental research is appropriate or necessary:

| Scenario | Example from excerpt |
| --- | --- |
| Single variable focus | "How accurate are people's first impressions?" |
| Noncausal relationship | "Is there a correlation between verbal intelligence and mathematical intelligence?" |
| Unmanipulable variable / no random assignment possible | "Does damage to a person's hippocampus impair the formation of long-term memory traces?" |
| Broad/exploratory/experiential questions | "What is it like to be a working mother diagnosed with depression?" |

🚫 When manipulation or random assignment is impossible

  • The independent variable cannot be manipulated (e.g., you cannot ethically damage someone's hippocampus).
  • Participants cannot be randomly assigned to conditions or orders of conditions.
  • These constraints make experimental research impossible, even if the question is about causality.

📊 Three types of nonexperimental research

📊 Single-variable research

Single-variable research: focuses on a single variable rather than a statistical relationship between two variables.

  • The excerpt notes there is no widely shared term for this type.
  • Example: Milgram's original obedience study observed all participants performing the same task under the same conditions, measuring the extent to which they obeyed.
  • Example: Loftus and Pickrell's study on false memories—whether participants "remembered" mildly traumatic childhood events that didn't actually happen (nearly a third did).
  • What it can do: answer interesting and important questions about a single variable.
  • What it cannot do: answer questions about statistical relationships between variables.

⚠️ Common mistake with single-variable research

The excerpt warns that beginning researchers sometimes design single-variable studies when they actually want to study relationships:

  • Flawed approach: Students interested in bullying and self-esteem measure only the self-esteem of bullied children.
  • Why it's flawed: This tells you something about bullied children's self-esteem, but not what you really want to know—how it compares to non-bullied children's self-esteem.
  • What's needed: The sample must include both bullied and non-bullied students, introducing a second variable.

🔗 Correlational research

Correlational research: the researcher measures two variables of interest with little or no attempt to control extraneous variables and then assesses the relationship between them.

  • Example: Finding out whether middle-school students have been bullied and then measuring each student's self-esteem.
  • No manipulation of variables.
  • No random assignment.
  • Focuses on statistical relationships.

🔀 Quasi-experimental research

Quasi-experimental research: the researcher manipulates an independent variable but does not randomly assign participants to conditions or orders of conditions.

  • Example: Starting an antibullying program at one school and comparing bullying incidence with a similar school that has no program.
  • Key difference from true experiments: lacks random assignment.
  • Key difference from correlational: includes manipulation of an independent variable.

📝 Qualitative research

Qualitative research: the data are usually nonnumerical and therefore cannot be analyzed using statistical techniques.

  • Contrasts with quantitative research, where data consist of numbers analyzed using statistical techniques.
  • Example: Rosenhan's study of people in a psychiatric ward—data were notes taken by "pseudopatients" (people pretending to have heard voices) plus hospital records.
  • Analysis consists mainly of written descriptions supported by concrete examples.
  • Sample quotation from the pseudopatients' notes: "Upon being admitted, I and other pseudopatients took the initial physical examinations in a semipublic room, where staff members went about their own business as if we were not there."
  • Uses separate analysis tools depending on the research question (e.g., thematic analysis for emerging themes, conversation analysis for how words were said).

🎯 Internal validity across research types

🎯 What internal validity means here

Internal validity: the extent to which the design of a study supports the conclusion that changes in the independent variable caused any observed differences in the dependent variable.

📊 Ranking by internal validity

| Research type | Internal validity | Why |
| --- | --- | --- |
| Experimental | Highest | Manipulation addresses the directionality and third-variable problems; random assignment controls extraneous variables |
| Quasi-experimental | Middle | Manipulation addresses some problems, but the lack of random assignment fails to address others |
| Correlational | Lowest | Addresses neither the directionality problem nor the third-variable problem |

🔍 Why correlational is lowest

  • If the average score on the dependent variable differs across levels of the independent variable, the independent variable could be responsible.
  • But there are other interpretations:
    • Direction of causality could be reversed.
    • A third variable could be causing differences in both the independent and dependent variables.

🔍 Why quasi-experimental is in the middle

Example: Researcher finds two similar schools, starts an antibullying program in one, finds fewer bullying incidents in the "treatment school."

  • No directionality problem: The number of bullying incidents clearly did not determine which school got the program.
  • Still a problem: Lack of random assignment means students in the treatment school could differ from control school students in some other way that explains the difference in bullying.

⚠️ Important nuance

  • There is overlap in internal validity across types.
  • A poorly designed experiment with many confounding variables can be lower in internal validity than a well-designed quasi-experiment with no obvious confounding variables.
  • Internal validity is only one of several validities to consider (as noted in Chapter 5).

Correlational Research

🧭 Overview

🧠 One-sentence thesis

Correlational research measures two variables and assesses their statistical relationship without manipulating either variable, making it useful when causal manipulation is impossible, impractical, or unethical, but leaving open questions about causality.

📌 Key points (3–5)

  • What defines correlational research: measuring two variables and assessing their relationship without manipulating either one—the key is no manipulation, not the type of variables or where data are collected.
  • Why researchers choose it: either because they don't believe the relationship is causal (e.g., validating two tests) or because they cannot manipulate the independent variable for practical or ethical reasons.
  • Common confusion: correlational research is not limited to quantitative variables—it can involve categorical variables (e.g., nationality, occupation); what matters is that neither variable is manipulated.
  • The causality problem: even strong correlations cannot prove causation due to the directionality problem (which causes which?) and the third-variable problem (something else causes both).
  • Data collection methods: naturalistic observation and archival data are strongly associated with correlational research, though the research type is defined by lack of manipulation, not by data collection location.

🔍 What correlational research is

🔍 Core definition

Correlational research: a type of nonexperimental research in which the researcher measures two variables and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables.

  • The defining feature is that neither variable is manipulated.
  • Both variables are measured as they naturally occur.
  • The terms "independent variable" and "dependent variable" often do not apply because there is no manipulation.

🎯 Two reasons researchers choose correlational designs

Reason 1: The relationship is not believed to be causal

  • Example: A researcher validates a brief extraversion test by comparing scores with a longer, validated extraversion test.
  • Neither test score causes the other—the researcher simply wants to see if they correlate.
  • No independent variable exists to manipulate.

Reason 2: The relationship may be causal, but manipulation is impossible, impractical, or unethical

  • Example: Kanner and colleagues studied whether daily hassles (rude salespeople, heavy traffic) affect physical and psychological symptoms.
  • They could not manipulate the number of hassles participants experienced.
  • They had to measure both hassles and symptoms using self-report questionnaires.
  • The strong positive relationship they found is consistent with hassles causing symptoms, but also consistent with symptoms causing hassles, or a third variable (e.g., neuroticism) causing both.

⚠️ Common misconception about variable types

The misconception: Correlational research must involve two quantitative variables (e.g., test scores, counts).

The reality: Correlational research is defined by how the study is conducted (no manipulation), not by variable type.

  • Variables can be quantitative or categorical.
  • Example: Administering the Rosenberg Self-Esteem Scale to 50 American and 50 Japanese university students "feels" like a between-subjects experiment, but it is correlational because the researcher did not manipulate nationality.
  • Example: Comparing professors and factory workers on need for cognition is correlational because occupations were not manipulated.

🧪 Distinguishing experiments from correlational studies

The excerpt provides a key example with to-do lists and stress:

  • If the researcher randomly assigned some participants to make daily to-do lists and others not to → experiment → can conclude that making lists reduced stress.
  • If the researcher simply asked participants whether they made daily to-do lists → correlational study → can only conclude that the variables are related.

Don't confuse: What defines a study as experimental or correlational is not:

  • The variables being studied
  • Whether variables are quantitative or categorical
  • The type of graph or statistics used

It is how the study is conducted—whether variables are manipulated or only measured.

🌍 Data collection methods in correlational research

🌍 Overview of methods

  • The defining feature remains: neither variable is manipulated.
  • It does not matter how or where variables are measured.
  • However, certain data collection approaches are strongly associated with correlational research.

Examples of settings:

  • Laboratory: participants complete computerized tasks; researcher assesses relationship between scores.
  • Field: researcher asks shoppers about attitudes and habits at a mall; assesses relationship between these variables.
  • Both are correlational because no independent variable is manipulated.

👀 Naturalistic observation

Naturalistic observation: an approach to data collection that involves observing people's behavior in the environment in which it typically occurs.

Key characteristics:

  • A type of field research (not laboratory research).
  • Observations made as unobtrusively as possible—participants often unaware they are being studied.
  • Examples: observing shoppers in a grocery store, children on a playground, psychiatric inpatients in their wards.

Ethical considerations:

  • Acceptable if participants remain anonymous and behavior occurs in a public setting where people would not normally expect privacy.
  • Example: Grocery shoppers putting items in carts are engaged in public behavior easily observable by employees and other shoppers → ethically acceptable.
  • Counterexample: Observing "bathroom behavior" violates reasonable expectation of privacy even in a public restroom.

🚶 Example: Pace of life study

Levine and Norenzayan used naturalistic observation to study differences in "pace of life" across countries:

  • Measure: observing pedestrians in large cities to see how long it took them to walk 60 feet.
  • Finding: People in Canada and Sweden covered 60 feet in just under 13 seconds on average; people in Brazil and Romania took close to 17 seconds.

📋 Sampling in naturalistic observation

The issue: When, where, and under what conditions will observations be made? Who exactly will be observed?

Levine and Norenzayan's sampling process:

  • Male and female walking speed measured in at least two locations in main downtown areas in each city.
  • Measurements taken during main business hours on clear summer days.
  • All locations were flat, unobstructed, had broad sidewalks, and were sufficiently uncrowded.
  • To control for socializing effects, only pedestrians walking alone were timed.
  • Children, individuals with obvious physical handicaps, and window-shoppers were not timed.
  • 35 men and 35 women timed in most cities.

Why precise specification matters:

  • Makes data collection manageable for observers.
  • Provides control over important extraneous variables (e.g., controlling for weather effects by observing on clear summer days in all countries).

📏 Measurement in naturalistic observation

The issue: What specific behaviors will be observed?

Simple measurement:

  • Levine and Norenzayan's study: measured out a 60-foot distance, used a stopwatch to time participants.

Complex measurement (coding):

  • Example: Kraut and Johnston studied bowlers' reactions to their shots, both when facing the pins and when turning toward companions.
  • Created a list of reactions: "closed smile," "open smile," "laugh," "neutral face," "look down," "look away," "face cover."
  • Observers memorized the list, practiced by coding videotaped bowlers.
  • During the study, observers spoke into an audio recorder describing reactions.
  • Key finding: Bowlers rarely smiled while facing the pins; much more likely to smile after turning toward companions → smiling is not purely an expression of happiness but also a form of social communication.

🔢 Coding process

Coding: a process in which observations require a judgment on the part of the observers.

Steps:

  1. Clearly define a set of target behaviors.
  2. Observers categorize participants individually in terms of which behavior they engaged in and the number of times.
  3. Observers might record the duration of each behavior.

Interrater reliability requirement:

  • Target behaviors must be defined so that different observers code them in the same way.
  • Researchers must demonstrate interrater reliability: multiple raters code the same behaviors independently, showing close agreement.
  • Example: Kraut and Johnston video recorded a subset of participants' reactions; two observers independently coded them and agreed 97% of the time → good interrater reliability.
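
A minimal sketch of checking agreement between two coders. Percent agreement mirrors the 97% figure above; Cohen's kappa, a common chance-corrected statistic not mentioned in the excerpt, is included for comparison:

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_chance = sum(ca[c] * cb[c] for c in ca) / (n * n)  # expected by chance
    return (p_observed - p_chance) / (1 - p_chance)

a = ["closed smile", "laugh", "neutral face", "closed smile", "look away"]
b = ["closed smile", "laugh", "neutral face", "open smile", "look away"]
print(percent_agreement(a, b))       # 0.8
print(round(cohens_kappa(a, b), 2))  # 0.75
```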

📂 Archival data

Archival data: data that have already been collected for some other purpose.

Example: Implicit egotism study:

  • Pelham and colleagues studied "implicit egotism"—the tendency for people to prefer people, places, and things similar to themselves.
  • Examined Social Security records to show that women named Virginia, Georgia, Louise, and Florence were especially likely to have moved to the states of Virginia, Georgia, Louisiana, and Florida, respectively.

📊 Measurement with archival data

Simple measurement:

  • Counting the number of people named Virginia who live in various states based on Social Security records is relatively straightforward.

Complex measurement (content analysis):

  • Example: Peterson and colleagues studied the relationship between optimism and health using data collected many years earlier for a study on adult development.
  • In the 1940s, healthy male college students completed an open-ended questionnaire about difficult wartime experiences.
  • In the late 1980s, researchers reviewed questionnaire responses to obtain a measure of explanatory style—habitual ways of explaining bad events.
    • More pessimistic people: blame themselves, expect long-term negative consequences affecting many life aspects.
    • More optimistic people: blame outside forces, expect limited negative consequences.

Procedure:

  1. All negative events and causal explanations mentioned in questionnaire responses were identified and written on index cards.
  2. Cards given to a separate group of raters who rated each explanation on three dimensions of optimism-pessimism.
  3. Ratings averaged to produce an explanatory style score for each participant.
  4. Researchers assessed the statistical relationship between men's explanatory style as undergraduates and archival measures of their health at approximately 60 years of age.
  5. Result: The more optimistic the men were as undergraduates, the healthier they were as older men (Pearson's r = +.25).
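
For reference, Pearson's r can be computed directly from its definition; the sketch below uses made-up scores, not the study's data:

```python
import statistics

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

optimism = [3.1, 4.5, 2.2, 5.0, 3.8]  # hypothetical explanatory-style scores
health = [55, 70, 48, 66, 61]         # hypothetical health ratings near age 60
print(round(pearson_r(optimism, health), 2))  # strong positive r in this toy data
```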

🗂️ Content analysis

Content analysis: a family of systematic approaches to measurement using complex archival data.

Process:

  • Just as naturalistic observation requires specifying behaviors of interest and noting them as they occur, content analysis requires specifying keywords, phrases, or ideas and finding all occurrences in the data.
  • These occurrences can be counted, timed (e.g., amount of time devoted to entertainment topics on nightly news), or analyzed in various other ways.
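
A minimal sketch of the counting step, with hypothetical keywords and documents:

```python
# Tally how often specified keywords occur across a set of archival texts.
import re
from collections import Counter

keywords = {"blame", "fault", "unlucky"}
documents = [
    "It was my fault; I always blame myself.",
    "We were just unlucky that week.",
]

counts = Counter()
for doc in documents:
    for word in re.findall(r"[a-z']+", doc.lower()):
        if word in keywords:
            counts[word] += 1
print(counts)  # Counter({'fault': 1, 'blame': 1, 'unlucky': 1})
```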

⚠️ The causality problem

⚠️ Why correlation does not prove causation

Even when a strong positive relationship is found between two variables, three explanations are possible:

| Explanation | Description | Example (daily hassles and symptoms) |
| --- | --- | --- |
| Variable A causes Variable B | The presumed cause actually causes the effect | Daily hassles cause physical/psychological symptoms |
| Variable B causes Variable A (directionality problem) | The presumed effect actually causes the presumed cause | Symptoms cause people to experience more daily hassles |
| Variable C causes both A and B (third-variable problem) | An unmeasured third variable causes both measured variables | Neuroticism causes both more hassles and more symptoms |

Don't confuse: A strong correlation is consistent with a causal relationship, but it does not prove causation—alternative explanations remain possible in correlational research.

Quasi-Experimental Research

🧭 Overview

🧠 One-sentence thesis

Quasi-experimental research manipulates an independent variable like true experiments but lacks random assignment, placing it between correlational studies and true experiments in internal validity.

📌 Key points (3–5)

  • What makes it "quasi": the independent variable is manipulated, but participants are not randomly assigned to conditions.
  • Internal validity trade-off: eliminates directionality problems but does not eliminate confounding variables, so it sits between correlational studies and true experiments.
  • Common confusion: quasi-experiments vs. true experiments—both manipulate variables, but only true experiments use random assignment; quasi-experiments vs. correlational studies—quasi-experiments manipulate the IV, correlational studies only measure.
  • Three main designs: nonequivalent groups (different groups without random assignment), pretest-posttest (measure before and after treatment), and interrupted time-series (multiple measurements before and after).
  • Why used: most common in field settings where random assignment is difficult or impossible, often for evaluating real-world interventions.

🔬 What quasi-experimental research is

🔬 Definition and core features

Quasi-experimental research: research that resembles experimental research but is not true experimental research; the independent variable is manipulated, but participants are not randomly assigned to conditions or orders of conditions.

  • The prefix "quasi" means "resembling"—it looks like an experiment but lacks a key feature.
  • What it has: manipulation of the independent variable before measuring the dependent variable.
  • What it lacks: random assignment to conditions.

⚖️ Where it fits in internal validity

| Research type | Manipulates IV? | Random assignment? | Internal validity |
| --- | --- | --- | --- |
| Correlational | No | No | Lower |
| Quasi-experimental | Yes | No | Medium |
| True experimental | Yes | Yes | Higher |

  • Eliminates directionality problem: because the IV is manipulated before the DV is measured, we know which came first.
  • Does not eliminate confounding variables: without random assignment, groups may differ in other important ways.
  • Don't confuse: quasi-experiments are better than correlational studies (because of IV manipulation) but weaker than true experiments (because of no random assignment).

🌍 Typical context

  • Most likely conducted in field settings where random assignment is difficult or impossible.
  • Often used to evaluate treatment effectiveness (e.g., psychotherapy, educational interventions).
  • Example: evaluating a new teaching method in existing classrooms where students are already assigned to teachers.

🧑‍🤝‍🧑 Nonequivalent groups design

🧑‍🤝‍🧑 What it is

Nonequivalent groups design: a between-subjects design in which participants have not been randomly assigned to conditions.

  • When random assignment creates equivalent groups, lack of random assignment creates nonequivalent groups.
  • The groups are likely to be dissimilar in some ways before the treatment even begins.

📚 Classroom example from the excerpt

  • A researcher wants to evaluate a new method of teaching fractions to third graders.
  • Treatment group: one class of third-grade students.
  • Control group: another class of third-grade students.
  • The problem: students are not randomly assigned to classes by the researcher.

⚠️ Potential confounding variables

The excerpt lists several ways the classes might differ:

  • Parent selection: parents of higher-achieving or more motivated students might request a specific teacher (e.g., Ms. Williams's class).
  • Principal assignment: the principal might assign "troublemakers" to one teacher (e.g., Mr. Jones) because he is a stronger disciplinarian.
  • Teacher differences: teaching styles and classroom environments might differ and affect achievement or motivation.
  • Result: if the classes differ in fractions knowledge at the end, it might be the teaching method—or any of these confounding variables.

🛠️ How to improve the design

Researchers can take steps to make groups more similar:

  • Select classes at the same school.
  • Choose classes with similar standardized math test scores.
  • Match teachers on sex, age, and teaching style.
  • Limitation: even with these steps, other important confounding variables may remain that the researcher cannot control.

📊 Pretest-posttest design

📊 What it is

Pretest-posttest design: the dependent variable is measured once before the treatment is implemented and once after it is implemented.

  • Similar to a within-subjects experiment where each participant is tested under control and then treatment.
  • Key difference: the order of conditions is not counterbalanced because participants typically cannot be "untreated" after receiving treatment.

💊 Antidrug program example

  • Week 1: measure elementary students' attitudes toward illegal drugs (pretest).
  • Week 2: implement the antidrug program (treatment).
  • Week 3: measure attitudes again (posttest).
  • If posttest scores are better than pretest scores, the treatment might be responsible—but there are alternative explanations.

🚨 Alternative explanations (threats to validity)

🚨 History

  • Other things might have happened between pretest and posttest.
  • Example: an antidrug program aired on television, or a celebrity died of a drug overdose.
  • These external events could cause the change instead of the treatment.

🚨 Maturation

  • Participants might have changed naturally because they are growing and learning.
  • Example: in a yearlong program, participants might become less impulsive or better reasoners on their own.

🚨 Regression to the mean

Regression to the mean: the statistical fact that an individual who scores extremely on a variable on one occasion will tend to score less extremely on the next occasion.

  • Example: a bowler with a long-term average of 150 who bowls a 220 will almost certainly score lower next game—her score will "regress" toward her mean of 150.
  • Problem in research: if participants are selected for their extreme scores (e.g., students who scored especially low on a fractions test), their scores will likely improve on retest even without treatment.
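
A small simulation makes the effect concrete. For simplicity it assumes every bowler has the same underlying average of 150, so any extreme first score is pure luck:

```python
# Each score = stable ability (150) + random luck. Bowlers selected for an
# extreme first score tend to score near the mean on retest, with no
# treatment involved at all.
import random

random.seed(1)
first = [150 + random.gauss(0, 30) for _ in range(10_000)]
second = [150 + random.gauss(0, 30) for _ in range(10_000)]

extreme = [s for f, s in zip(first, second) if f > 200]
print(round(sum(extreme) / len(extreme)))  # close to 150, far below 200
```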

🚨 Spontaneous remission

Spontaneous remission: the tendency for many medical and psychological problems to improve over time without any form of treatment.

  • Example: common cold sufferers will improve in a week even without chicken soup.
  • Example: severely depressed people are likely to be less depressed in 6 months without treatment—one study found waitlist control participants improved 10–15% before receiving any treatment.
  • Caution: one must be very cautious about inferring causality from pretest-posttest designs.

🧠 Psychotherapy effectiveness case study

The excerpt describes how early psychotherapy research illustrates these problems:

  • 1952: Hans Eysenck summarized 24 pretest-posttest studies showing about two-thirds of patients improved.
  • The problem: Eysenck compared these results with archival data showing similar patients recovered at the same rate without psychotherapy.
  • Conclusion: the improvement might be no more than spontaneous remission; this did not show that psychotherapy was ineffective, but it provided no evidence that it was effective either.
  • Solution: Eysenck called for properly planned experimental studies with random assignment.
  • By 1980: hundreds of true experiments had been conducted, showing psychotherapy was quite effective (about 80% of treatment participants improved more than the average control participant).

⏱️ Interrupted time-series design

⏱️ What it is

Interrupted time-series design: a time series (a set of measurements taken at intervals over a period of time) is "interrupted" by a treatment.

  • Similar to pretest-posttest design but includes multiple pretest and posttest measurements.
  • Example: a manufacturing company measures workers' productivity each week for a year, then reduces work shifts from 10 to 8 hours and continues measuring.

📈 How to interpret the data

The excerpt provides a hypothetical example with student absences:

  • Dependent variable: number of student absences per week in a research methods course.
  • Treatment: instructor begins publicly taking attendance so students know the instructor is aware of who is present/absent.

If the treatment worked:

  • Consistently high absences before treatment.
  • Immediate and sustained drop in absences after treatment.

If the treatment did not work:

  • Average number of absences after treatment is about the same as before.
  • Week-to-week variation is normal fluctuation, not treatment effect.

✅ Advantage over simple pretest-posttest

  • If there had been only one measurement before (Week 7) and one after (Week 8), it might look like the treatment worked.
  • Multiple measurements reveal that the reduction might be just normal week-to-week variation, not a treatment effect.
  • Don't confuse: a single drop with natural fluctuation—multiple measurements help distinguish real effects from noise.
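
A minimal sketch of that comparison, using hypothetical weekly absence counts:

```python
# Compare the mean of the whole pre-treatment series with the mean of the
# whole post-treatment series, rather than two single measurements.
absences = [8, 7, 9, 8, 7, 9, 8,  # weeks 1-7, before public attendance-taking
            4, 3, 4, 3, 4, 3, 4]  # weeks 8-14, after

pre, post = absences[:7], absences[7:]
print(sum(pre) / len(pre), sum(post) / len(post))  # 8.0 vs. ~3.6: sustained drop
```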

🔀 Combination designs

🔀 Combining nonequivalent groups and pretest-posttest

A stronger quasi-experimental design combines elements of both:

  • Treatment group: given a pretest, receives treatment, then given a posttest.
  • Control group: given a pretest, does not receive treatment, then given a posttest.
  • Key question: not just whether the treatment group improves, but whether they improve more than the control group.

🏫 School antidrug program example

  • School 1 (treatment): pretest on attitudes toward drugs → antidrug program → posttest.
  • School 2 (control): pretest → no program → posttest.

Interpretation:

  • If treatment students become more negative toward drugs, it could be the treatment—or history/maturation.
  • If it's really the treatment effect, treatment students should become more negative than control students.
  • If it's history (e.g., celebrity drug overdose) or maturation (e.g., improved reasoning), both groups should show similar change.
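
A minimal sketch of that key comparison (a difference-of-gains check), with hypothetical attitude scores where higher means more negative toward drugs:

```python
def mean(xs):
    return sum(xs) / len(xs)

treatment_pre, treatment_post = [4.1, 3.9, 4.0], [5.2, 5.0, 5.3]
control_pre, control_post = [4.0, 4.2, 3.9], [4.3, 4.4, 4.1]

treatment_gain = mean(treatment_post) - mean(treatment_pre)
control_gain = mean(control_post) - mean(control_pre)
# A clearly larger gain in the treatment school suggests the program itself,
# not history or maturation, produced the change.
print(round(treatment_gain - control_gain, 2))  # 0.93 in this toy example
```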

⚠️ Remaining limitations

  • Does not completely eliminate confounding variables.
  • Example: something could occur at one school but not the other (e.g., a student drug overdose at the treatment school).
  • Important note: if participants in this design are randomly assigned to conditions, it becomes a true experiment rather than a quasi-experiment.

🎯 Key takeaways from the excerpt

🎯 Summary of quasi-experimental research

  • Involves manipulation of an independent variable without random assignment.
  • Important types: nonequivalent groups, pretest-posttest, and interrupted time-series designs.
  • Eliminates the directionality problem (because IV is manipulated).
  • Does not eliminate confounding variables (because no random assignment).
  • Generally higher in internal validity than correlational studies but lower than true experiments.

Qualitative Research

🧭 Overview

🧠 One-sentence thesis

This textbook excerpt covers qualitative research methods in psychology, explaining how they differ from quantitative approaches through in-depth study of small samples using non-statistical analysis techniques to understand participants' subjective experiences.

📌 Key points (3–5)

  • What qualitative research is: Focuses on understanding subjective experience through relatively unstructured data collection from small samples, analyzed using narrative rather than statistical techniques
  • Key differences from quantitative: Less focused research questions, large amounts of "unfiltered" data from fewer individuals, non-statistical analysis, emphasis on detailed understanding over generalization
  • Main purposes: Generating new research questions, providing rich descriptions of behavior in context, conveying "lived experience" of participants
  • Data collection methods: Primarily interviews (unstructured, semi-structured, or structured), focus groups, and participant observation
  • Common confusion: Qualitative research vs. case studies - case studies are detailed descriptions that may include both qualitative and quantitative elements, while qualitative research is a broader methodological approach

🎯 Core characteristics

🔍 Defining features

Qualitative research: an approach in which the data are usually nonnumerical and are analyzed using narrative rather than statistical techniques, with the goal of understanding participants' subjective experience.

Qualitative research differs from quantitative research in several fundamental ways:

  • Research questions: Begins with less focused questions rather than specific hypotheses
  • Data type: Collects large amounts of relatively unstructured data
  • Sample size: Studies relatively small numbers of individuals
  • Analysis: Uses narrative/non-statistical techniques rather than statistics
  • Goal: Understanding detailed experience rather than drawing general conclusions about behavior

📊 Comparison with quantitative research

| Dimension | Quantitative | Qualitative |
| --- | --- | --- |
| Research question | Focused hypothesis | Broader, exploratory question |
| Sample size | Large number of participants | Small number of participants |
| Data per participant | Small amount | Large amount |
| Data structure | Highly structured | Relatively unstructured |
| Analysis | Statistical techniques | Narrative techniques |
| Focus | Group means, general conclusions | Individual experience, detailed understanding |

⚠️ Don't confuse with case studies

Case studies are detailed descriptions of individuals that can include both qualitative and quantitative analyses. Key differences:

  • Case studies are a specific format (detailed individual description)
  • Qualitative research is a broader methodological approach
  • Case studies have serious internal and external validity problems
  • Case studies cannot determine causal relationships or representativeness
  • Example from text: Freud's "Anna O." case illustrated psychoanalytic theory but provided no way to test whether the interpretation was correct

🎤 Data collection approaches

💬 Interviews

Interviews are the most common data collection method in qualitative psychological research.

Three main types:

  1. Unstructured: Small number of general questions/prompts allowing participants to discuss what interests them
  2. Semi-structured: Few consistent questions with flexibility to follow up on topics that emerge (most common)
  3. Structured: Strict script with no deviation

Example: Lindqvist and colleagues studying families of teenage suicide victims used unstructured interviews, beginning with a general request to talk about the victim and ending with an invitation to discuss anything else. This approach let families control disclosure about the sensitive topic.

Why interviews work: They allow researchers to explore topics where participant control is important, especially for sensitive subjects where the amount disclosed should be led by participants rather than researchers.

👥 Focus groups

Focus groups: Small groups of people who participate together in interviews focused on a particular topic or issue.

  • Interaction among participants can bring out more information than one-on-one interviews
  • Standard technique in business/industry for understanding consumer preferences
  • Content usually recorded and transcribed for later analysis
  • Important consideration: Group dynamics can influence responses (a factor to be aware of from social psychology)

🔬 Participant observation

Participant observation: Researchers become active participants in the group or situation they are studying.

What researchers collect:

  • Interviews (usually unstructured)
  • Their own notes from observations and interactions
  • Documents
  • Photographs
  • Other artifacts

Rationale: Some important information may only be accessible to or interpretable by someone who is an active participant in the group/situation.

Example: Sociologist Amy Wilkins spent 12 months attending and participating in a university religious organization's meetings and social events while interviewing members. She identified several ways the group "enforced" happiness, such as continually talking about happiness and discouraging negative emotions.

📈 Data analysis methods

🏗️ Grounded theory approach

Grounded theory: An approach where researchers start with the data and develop a theory or interpretation that is "grounded in" those data.

How it differs from quantitative research:

  • Quantitative: Start with theory → derive hypothesis → collect data to test hypothesis
  • Qualitative (grounded theory): Start with data → develop theory grounded in the data

Analysis stages:

  1. Identify ideas repeated throughout the data
  2. Organize these ideas into a smaller number of broader themes
  3. Write a theoretical narrative interpreting the data in terms of identified themes

Key features of the narrative:

  • Focuses on subjective experience of participants
  • Usually supported by many direct quotations from participants
  • Interprets data through the lens of identified themes

📋 Example: Postpartum depression study

Abrams and Curran studied postpartum depression symptoms among low-income mothers through unstructured interviews with 19 participants.

Five broad themes identified:

  1. Ambivalence ("I wasn't prepared for this baby")
  2. Care-giving overload ("Please stop crying," "I need a break")
  3. Juggling ("No time to breathe," "Everyone depends on me")
  4. Mothering alone ("I really don't have any help")
  5. Real-life worry ("I don't have any money," "Will my baby be OK?")

Their theoretical narrative: Focused on how participants experienced symptoms not as an abstract "affective disorder" but as closely tied to the daily struggle of raising children alone under difficult circumstances.

Example quote from participant "Destiny": "Well, just recently my apartment was broken into and the fact that his Medicaid for some reason was cancelled so a lot of things was happening within the last two weeks all at one time. So that in itself I don't want to say almost drove me mad but it put me in a funk.…Like I really was depressed."

🤝 The quantitative-qualitative relationship

⚔️ Historical tensions

Quantitative researchers' criticisms of qualitative methods:

  • Lack objectivity
  • Difficult to evaluate for reliability and validity
  • Don't allow generalization beyond those studied
  • Overlook richness of human behavior and experience

Qualitative researchers' criticisms of quantitative methods:

  • Answer only simple questions about easily quantifiable variables
  • Miss the complexity of human experience

🔧 Addressing concerns

Qualitative researchers have developed frameworks for addressing objectivity, reliability, validity, and generalizability issues (though details are beyond the excerpt's scope).

Quantitative researchers acknowledge they use simplification as a strategy for uncovering general principles, not because they believe all behavior can be reduced to simple variables.

🤝 Mixed-methods research

Mixed-methods research: The combination of quantitative and qualitative approaches.

Two main approaches to combining methods:

  1. Sequential approach: Use qualitative research for hypothesis generation, then quantitative research for hypothesis testing

    • Example: Qualitative study suggests families experiencing unexpected suicide have more difficulty resolving "why" questions
    • Follow-up: Well-designed quantitative study tests this hypothesis with large sample measuring specific variables
  2. Triangulation: Use both methods simultaneously to study the same questions and compare results

    • If results converge: They reinforce and enrich each other
    • If results diverge: They suggest interesting new questions about why and how to reconcile differences

🔍 Triangulation example

Trenor and colleagues investigated female engineering students' experiences:

  • Phase 1 (quantitative): Survey where students rated perceptions including sense of belonging

  • Result: No statistical differences in belonging ratings across ethnic groups

  • Possible conclusion: Ethnicity doesn't affect sense of belonging

  • Phase 2 (qualitative): Interviews with students

  • Result: Many minority students reported how cultural diversity enhanced their sense of belonging

  • Actual conclusion: Without the qualitative component, researchers would have drawn the wrong conclusion from the quantitative results alone

🎯 Complementary strengths

Some researchers characterize the approaches as:

  • Quantitative: Best for identifying behaviors or phenomena (the "what")
  • Qualitative: Best for understanding meaning or mechanisms (the "why" and "how")

However, Bryman (2012) argues for breaking down the divide between these "arbitrarily different ways of investigating the same questions."

💡 Key purposes of qualitative research

🌱 Generating research questions

Qualitative research excels at generating novel and interesting research questions that quantitative research can later test.

Example: Lindqvist's research on suicide victims' families suggested a relationship between how unexpected a suicide is and how consumed families are with understanding why. This relationship can now be explored quantitatively, but the question might not have arisen without researchers listening to what families wanted to say about their experience.

📖 Providing rich description

"Thick description": The depth of detail qualitative research provides about human behavior in real-world contexts (term from Geertz, 1973).

Qualitative research describes behavior in the contexts where it actually occurs, with a level of detail quantitative research typically cannot achieve.

🎭 Conveying lived experience

"Lived experience": What qualitative researchers call the sense of what it's actually like to be a member of a particular group or in a particular situation.

Example: Lindqvist and colleagues describe how all families spontaneously offered to show the interviewer the victim's bedroom or suicide location, revealing the importance of these physical spaces to the families. A quantitative study would be unlikely to discover this detail.

⚠️ Important distinctions

📚 Qualitative research vs. case studies

Case study characteristics:

  • Detailed description of an individual
  • Can include both qualitative and quantitative analyses
  • Useful for: suggesting new research questions, illustrating general principles, understanding rare phenomena
  • Major limitations: Cannot substitute for controlled studies due to low internal and external validity

Why case studies have validity problems:

  1. Internal validity: Cannot determine if specific events are causally related or even related at all

    • Example: If a case study describes someone sexually abused as a child who developed an eating disorder as a teen, there's no way to determine if these events are connected
  2. External validity: An individual case can always be unusual and unrepresentative of people generally

    • Cannot determine if the case represents typical patterns

Historical importance: Psychology's history includes influential case studies (Freud's "Anna O.", Watson and Rayner's "Little Albert"), but these cannot substitute for rigorous research designs.

Don't confuse: Case studies are a specific type of detailed individual description, while qualitative research is a broader methodological approach with systematic data collection and analysis procedures.


24

Multiple Dependent Variables

Multiple Dependent Variables

🧭 Overview

🧠 One-sentence thesis

Researchers often measure multiple dependent variables in a single study to answer more research questions efficiently and to confirm that their manipulations worked as intended.

📌 Key points (3–5)

  • Why use multiple DVs: Once a study is designed and participants recruited, measuring several dependent variables allows researchers to answer multiple questions with minimal extra effort.
  • Two main types: Researchers measure either different constructs (e.g., mood and health) or the same construct in different ways (e.g., multiple stress measures).
  • Manipulation checks: When an independent variable is manipulated indirectly (like emotions), researchers include a measure to confirm the manipulation actually worked.
  • Common confusion: Multiple measures of the same construct can be analyzed separately OR combined into one score—but only if they correlate well with each other (good internal consistency).
  • Order matters: Measuring one dependent variable might affect responses to another (carryover effects), so researchers must carefully order measurements or counterbalance them.

🎯 Why measure multiple dependent variables

🎯 Efficiency and breadth

  • After investing effort in designing a study, recruiting participants, and obtaining ethics approval, measuring only one variable seems wasteful.
  • Even when primarily interested in one relationship, researchers can easily answer additional questions by including more dependent variables.
  • Example: Schnall and colleagues studied how disgust affects moral judgments (primary interest) but also measured willingness to eat at a restaurant.

🔬 Real research example

The excerpt describes an odor experiment:

  • Independent variable: room scent (no odor, lemon, lavender, or cabbage-like smell)
  • Primary dependent variable: creativity
  • Additional dependent variables: mood and perceived health
  • Results: Creativity was unaffected, but mood was lower with cabbage smell and perceived health was higher with lemon scent

📊 Two approaches to multiple dependent variables

📊 Measuring different constructs

Multiple dependent variables: several distinct outcome measures in one study.

  • Each dependent variable represents a different construct or question.
  • Researchers measure separate phenomena they're curious about.
  • Example: disgust affecting both moral judgments AND restaurant willingness—two distinct constructs.

🔗 Measuring the same construct differently

  • Researchers operationally define one construct in multiple ways to strengthen conclusions.
  • Called "converging operations"—if different measures show the same pattern, confidence increases.
  • Example: measuring stress through both a questionnaire (Perceived Stress Scale) and a biological measure (cortisol hormone levels).

Don't confuse: Different constructs vs. different measures of the same construct—the first should be analyzed separately; the second can potentially be combined.

⚖️ Managing measurement order

⚖️ Carryover effects problem

  • Measuring one dependent variable first might affect responses to later ones.
  • Example: Measuring mood before perceived health could influence health ratings, or vice versa.

🔄 Two solutions

| Approach | How it works | When to use |
| --- | --- | --- |
| Fixed order | Measure in the same order for all participants, most important first | When one variable is clearly primary |
| Counterbalancing | Systematically vary the order across participants | When multiple variables are equally important |
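
A minimal sketch in Python of what counterbalancing might look like in practice; the measure names and participant labels are hypothetical:

```python
# Counterbalance the order of two dependent measures across participants
# by alternating the two possible orders (hypothetical measure names).
from itertools import cycle

measures = ["mood", "perceived_health"]
orders = cycle([measures, measures[::-1]])  # the two possible orders

participants = [f"P{i}" for i in range(1, 7)]
for participant, order in zip(participants, orders):
    print(participant, "->", order)
# P1 -> ['mood', 'perceived_health']
# P2 -> ['perceived_health', 'mood']
# P3 -> ['mood', 'perceived_health'] ...
```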

✅ Manipulation checks

✅ What they are and why they matter

Manipulation check: an additional measure of the independent variable itself, included to confirm the manipulation was successful.

  • Used when the independent variable is a construct manipulated indirectly (emotions, internal states).
  • Usually done at the end of the procedure to avoid drawing attention to the manipulation.
  • Example: Schnall's team had participants rate their disgust level to confirm the messy room actually made people feel more disgusted than the clean room.

🔍 When manipulation checks are critical

  • Most important when the manipulation has no effect on the dependent variable.
  • Two possible explanations for null results:
    1. The independent variable truly doesn't affect the dependent variable
    2. The manipulation failed to change the independent variable
  • Example scenario: Playing happy/sad music to affect mood, then measuring childhood memory recall. If no effect is found, a mood manipulation check reveals whether the music actually changed participants' moods or the manipulation simply failed.
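
A minimal sketch of how such a manipulation check might be analyzed, using made-up mood ratings and a simple independent-samples t-test:

```python
# Compare mood ratings (1-7) between the happy- and sad-music conditions
# to verify the manipulation worked (the ratings are made up).
from scipy import stats

happy_music_mood = [6, 5, 7, 6, 5, 6, 7, 5]
sad_music_mood = [3, 2, 4, 3, 2, 3, 4, 3]

result = stats.ttest_ind(happy_music_mood, sad_music_mood)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
# A reliable difference suggests the music actually changed mood;
# no difference would mean the manipulation itself failed.
```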

🔢 Combining multiple measures

🔢 Creating composite scores

  • When multiple dependent variables measure the same construct on the same scale, researchers can combine them into a single score.
  • Method: Compute the mean of the individual measures.
  • Example: Schnall presented seven scenarios of morally questionable behaviors and combined the seven ratings into one overall "harshness of moral judgment" score.

📏 Internal consistency requirement

Multiple-response measure: treating several dependent variables collectively as one measure of a single construct.

Advantage: Multiple-response measures are generally more reliable than single-response measures.

Critical requirement: Individual measures must correlate with each other.

  • Check using internal consistency measures like Cronbach's α.
  • If measures don't correlate well (poor internal consistency), they should be treated as separate dependent variables, not combined.
  • Don't confuse: Just because measures seem conceptually related doesn't mean they can be combined—statistical correlation is required.
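
A minimal sketch of the combine-or-separate decision, computing Cronbach's alpha from its standard formula on made-up ratings:

```python
# Check internal consistency (Cronbach's alpha) before averaging several
# same-scale measures into one composite score (the ratings are made up).
import numpy as np

ratings = np.array([   # 5 participants x 3 measures of one construct
    [5, 4, 5],
    [2, 3, 2],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
])

k = ratings.shape[1]                          # number of measures
item_vars = ratings.var(axis=0, ddof=1)       # variance of each measure
total_var = ratings.sum(axis=1).var(ddof=1)   # variance of summed scores
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")      # ~0.94 for these data

if alpha >= 0.70:   # a common rough threshold, not a strict rule
    composite = ratings.mean(axis=1)          # one score per participant
    print("Composite scores:", composite)
else:
    print("Poor internal consistency: analyze the measures separately.")
```
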
25

Multiple Independent Variables

Multiple Independent Variables

🧭 Overview

🧠 One-sentence thesis

Including multiple independent variables in a single experiment allows researchers to answer more research questions and discover interactions—when the effect of one variable depends on the level of another.

📌 Key points (3–5)

  • Why use multiple independent variables: researchers can address multiple research questions in one study and examine whether effects depend on other variables (interactions).
  • Factorial design structure: combines every level of one independent variable with every level of others to create all possible conditions.
  • Two types of results: main effects (overall effect of one variable averaged across others) and interactions (when one variable's effect depends on another's level).
  • Common confusion: distinguishing main effects from interactions—a main effect is the average impact of one variable alone; an interaction means that impact changes depending on another variable.
  • Nonmanipulated variables: studies can include measured (not manipulated) independent variables, but causal conclusions can only be drawn about manipulated variables.

🔬 What factorial designs are

🧩 Definition and structure

Factorial design: each level of one independent variable (also called a factor) is combined with each level of the others to produce all possible combinations.

  • Each combination becomes a separate condition in the experiment.
  • The design is described by multiplying the number of levels: 2 × 2 (four conditions), 3 × 2 (six conditions), 4 × 5 (twenty conditions).
  • Example: cell phone use (yes vs. no) and time of day (day vs. night) creates a 2 × 2 design with four conditions: phone during day, no phone during day, phone at night, no phone at night.
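
A minimal sketch of enumerating the conditions of a factorial design by crossing factor levels (the 2 × 2 cell-phone example above):

```python
# Every level of each factor is combined with every level of the others;
# each resulting combination is one condition of the experiment.
from itertools import product

factors = {
    "cell_phone": ["yes", "no"],
    "time_of_day": ["day", "night"],
}

conditions = list(product(*factors.values()))
print(len(conditions), "conditions:", conditions)
# 4 conditions: [('yes', 'day'), ('yes', 'night'), ('no', 'day'), ('no', 'night')]
```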

📊 Factorial design table

  • Columns represent one independent variable, rows represent another.
  • Each cell in the table represents one experimental condition.
  • The table makes it easy to visualize all possible combinations.

🚧 Practical limits

  • In practice, designs rarely exceed three independent variables with two or three levels each.
  • Reasons for limits:
    • Number of conditions grows quickly (e.g., 2 × 2 × 2 × 3 = 24 conditions).
    • Recruiting enough participants to maintain adequate statistical power becomes impractical.

👥 Assigning participants to conditions

🔀 Between-subjects factorial design

  • All independent variables are manipulated between subjects.
  • Each participant is tested in only one condition.
  • Example: each person tested either with or without a cell phone and either during day or during night—one condition total per participant.
  • Advantages: conceptually simpler, avoids carryover effects, minimizes participant time and effort.

🔄 Within-subjects factorial design

  • All independent variables are manipulated within subjects.
  • Each participant is tested in all conditions.
  • Example: each person tested both with and without a cell phone and both during day and during night—all four conditions per participant.
  • Advantages: more efficient for researchers, controls extraneous participant variables.

🔀🔄 Mixed factorial design

  • One independent variable manipulated between subjects, another within subjects.
  • Example: cell phone use tested within subjects (same person in both conditions), but time of day between subjects (each person tested only once, either day or night).
  • Combines advantages of both approaches.

🎲 Random assignment

  • Regardless of design type, assignment to conditions or orders is typically done randomly.

📏 Nonmanipulated independent variables

🧬 Participant variables

  • One independent variable is often measured rather than manipulated.
  • These are usually participant variables: characteristics like private body consciousness, hypochondriasis, self-esteem.
  • Example study: disgust was manipulated (clean vs. messy room), but private body consciousness was measured.
  • Another example: word type was manipulated (health-related vs. non-health-related), but hypochondriasis level was measured.

⚠️ Causal inference limits

  • Studies with at least one manipulated variable are still considered experiments.
  • Critical distinction: causal conclusions can only be drawn about manipulated variables, not measured ones.
  • Example: researchers can conclude disgust affected moral judgments (manipulated variable), but cannot conclude private body consciousness caused harsher judgments (measured variable).
  • Why: a third variable might cause both high private body consciousness and strict moral codes (e.g., neuroticism).
  • Don't confuse: including a nonmanipulated variable doesn't make the entire study non-experimental if at least one variable is manipulated.

🔍 Between-subjects by definition

  • Nonmanipulated participant variables are always between-subjects factors.
  • People cannot be tested in both "high hypochondriasis" and "low hypochondriasis" conditions—they are one or the other.

📊 Graphing factorial results

📊 Bar graphs for categorical variables

  • One independent variable represented on the x-axis.
  • The other represented by different-colored bars.
  • The y-axis always shows the dependent variable.
  • Choice of which variable goes where depends on what communicates results most clearly.
  • Example: time of day on x-axis, cell phone use shown by bar color.

📈 Line graphs for quantitative variables

  • Used when the x-axis variable is quantitative with distinct levels.
  • Also appropriate for time series data (measurements over time intervals).
  • Different lines represent levels of the second independent variable.
  • Example: psychotherapy length on x-axis, psychotherapy type shown by different line formats.

🎯 Main effects and interactions

🎯 Main effects defined

Main effect: the statistical relationship between one independent variable and a dependent variable—averaging across the levels of the other independent variable.

  • There is one main effect for each independent variable in the study.
  • Main effects are independent of each other—the presence or absence of one main effect says nothing about the other.
  • Example: a main effect of cell phone use means driving performance differs on average between phone and no-phone conditions, regardless of time of day.
  • Example: a main effect of time of day means driving performance differs on average between day and night, regardless of phone use.

🔗 Interactions defined

Interaction effect (or interaction): when the effect of one independent variable depends on the level of another.

  • The effect of one variable changes depending on the level of the other variable.
  • Example: psychotherapy effect is stronger among highly motivated people than unmotivated people—the effect of therapy depends on motivation level.
  • Example: room cleanliness affected moral judgments only for people high in private body consciousness, not for those low—the effect of cleanliness depends on body consciousness level.

🔀 Three types of interactions

| Interaction type | Description | Visual pattern |
| --- | --- | --- |
| Effect at one level only | Variable B has an effect at level 1 of variable A but no effect at level 2 | One bar/line changes, the other stays flat |
| Stronger effect at one level | Variable B has a stronger effect at level 1 than at level 2 of variable A | Both change, but one changes more |
| Crossover interaction | Variable B has opposite effects at different levels of variable A | Lines literally cross over each other |

🔄 Crossover interaction example

  • Study on caffeine and verbal test scores in introverts vs. extraverts.
  • Without caffeine: introverts perform better than extraverts.
  • With 4 mg caffeine per kg body weight: extraverts perform better than introverts.
  • The effect of caffeine reverses depending on personality type.
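
A minimal sketch of how a crossover interaction shows up in cell means; the numbers are made up to match the pattern described above:

```python
# Cell means for a 2 x 2 design (made-up scores shaped like the
# caffeine x personality crossover described above).
cell_means = {
    ("introvert", "no_caffeine"): 80,
    ("introvert", "caffeine"): 70,
    ("extravert", "no_caffeine"): 70,
    ("extravert", "caffeine"): 80,
}

# Simple effect of caffeine at each personality level
intro_effect = cell_means[("introvert", "caffeine")] - cell_means[("introvert", "no_caffeine")]
extra_effect = cell_means[("extravert", "caffeine")] - cell_means[("extravert", "no_caffeine")]
print("Caffeine effect for introverts:", intro_effect)   # -10
print("Caffeine effect for extraverts:", extra_effect)   # +10

# If the two simple effects differ, an interaction is present;
# opposite signs (as here) indicate a crossover interaction.
print("Difference of differences:", extra_effect - intro_effect)
```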

🎯 Research focus on interactions

  • Many studies have interactions as the primary research question.
  • Example: hypothesis that people with hypochondriasis are especially attentive to negative health information.
  • Prediction: high-hypochondriasis people recall health-related words better, but recall non-health-related words the same as low-hypochondriasis people.
  • This prediction is specifically about an interaction, not just a main effect.
26

Complex Correlational Designs

Complex Correlational Designs

🧭 Overview

🧠 One-sentence thesis

Complex correlational research allows researchers to explore relationships and possible causal patterns among multiple variables when manipulation is impractical or unethical, though it cannot definitively establish causation.

📌 Key points (3–5)

  • When to use: Researchers choose correlational designs over experiments when interested in noncausal relationships or when independent variables cannot be manipulated for practical or ethical reasons.
  • Multiple variables: Complex correlational studies measure several variables (both categorical and quantitative) and assess statistical relationships among them.
  • Exploring causation with limits: Techniques like statistical control and multiple regression can rule out some alternative explanations, but correlation still cannot unambiguously prove causation due to directionality and third-variable problems.
  • Common confusion: Factorial designs can be correlational (all nonmanipulated variables) or experimental (manipulated variables)—the key distinction is whether variables are manipulated or merely measured.
  • Factor analysis reveals structure: This technique organizes many variables into clusters (factors) that operate independently, not as categories—people can score high or low on multiple factors simultaneously.

🔬 Correlational Studies With Factorial Designs

🔬 What makes a factorial design correlational

A factorial design becomes a correlational study (not an experiment) when it includes only nonmanipulated independent variables.

  • The researcher measures variables rather than manipulating them.
  • Example: A hypothetical study measures participants' moods (positive vs. negative) and self-esteem (high vs. low), then assesses willingness to have unprotected sex—this is a 2 × 2 factorial correlational design.
  • The design can still be represented in factorial tables and bar graphs, with main effects and interactions analyzed.

⚠️ Causation limitations

  • Because no variables are manipulated, causality cannot be inferred confidently.
  • Directionality problem: Does mood affect behavior, or does behavior affect mood?
  • Third-variable problem: Any other variable correlated with mood (or self-esteem) could be the true cause.
  • Example: A main effect of mood on risky behavior might actually be caused by another variable that happens to correlate with mood.
  • Don't confuse: A similar study by MacDonald and Martineau (2002) was an experiment because they manipulated participants' moods—manipulation is the key difference.

📊 Assessing Relationships Among Multiple Variables

📊 Measuring many variables at once

  • Most complex correlational research does not fit neatly into factorial designs.
  • Instead, researchers measure several variables and assess statistical relationships among them.
  • Example: Radcliffe and Klein studied middle-aged adults' optimism levels alongside health behaviors, knowledge of heart attack risk factors, and beliefs about personal risk—finding that more optimistic participants were healthier, more knowledgeable, and correctly assessed their lower risk.

📋 Correlation matrices

A correlation matrix shows the correlation (Pearson's r) between every possible pair of variables in a study.

  • Only half the matrix is filled because the other half would duplicate information.
  • Diagonal values (correlation of a variable with itself) are always +1.00, so they are replaced with dashes.
  • Example use: Cacioppo and Petty validated their Need for Cognition Scale by examining its correlations with intelligence (+.39), socially desirable responding (+.08), and dogmatism (−.27).

| Variable | Need for cognition | Intelligence | Social desirability | Dogmatism |
| --- | --- | --- | --- | --- |
| Need for cognition | — | | | |
| Intelligence | +.39 | — | | |
| Social desirability | +.08 | +.02 | — | |
| Dogmatism | −.27 | −.23 | +.03 | — |
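
A minimal sketch of producing such a matrix with pandas; the scores are made up and the variable names follow the table above:

```python
# Build a correlation matrix for made-up scores on four variables.
import pandas as pd

df = pd.DataFrame({
    "need_for_cognition": [72, 55, 68, 40, 61, 75, 50, 66],
    "intelligence": [110, 98, 115, 95, 105, 120, 100, 108],
    "social_desirability": [14, 17, 13, 16, 15, 14, 18, 13],
    "dogmatism": [22, 35, 25, 40, 28, 20, 33, 26],
})

# .corr() computes Pearson's r between every pair of columns; the
# diagonal is always +1.00 (each variable correlated with itself).
print(df.corr().round(2))
```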

🧩 Factor analysis

Factor analysis organizes many conceptually similar variables into smaller clusters (factors) based on strong within-cluster correlations and weak between-cluster correlations.

  • Each cluster represents an underlying construct or "factor."
  • Example: Mental tasks typically organize into two factors—mathematical intelligence (arithmetic, spatial reasoning) and verbal intelligence (grammar, reading comprehension, vocabulary).
  • Example: Rentfrow and Gosling asked students to rate 14 music genres; factor analysis identified four factors: Reflective and Complex (blues, jazz, classical, folk), Intense and Rebellious (rock, alternative, heavy metal), Upbeat and Conventional (country, pop, religious), and Energetic and Rhythmic (rap, soul, electronica).

🔑 Two critical points about factors

Factors are not categories

  • People are not "either/or" on factors; they can score high or low on multiple factors independently.
  • Example: Someone high in extroversion can be high or low in conscientiousness; someone who likes reflective music might also like intense music.

Researchers must interpret factor structure

  • Factor analysis reveals only the underlying statistical structure.
  • Researchers must label factors and explain why that structure exists.
  • Example: The Big Five personality factors operate separately partly because they appear to be controlled by different genes.
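
A minimal sketch of exploratory factor analysis on simulated data, assuming two latent abilities drive six observed tasks (scikit-learn's FactorAnalysis is one of several tools for this):

```python
# Simulate scores on six tasks driven by two latent abilities, then let
# factor analysis recover the two-factor structure (all data simulated).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 300
math_ability = rng.normal(size=n)
verbal_ability = rng.normal(size=n)

tasks = np.column_stack([
    math_ability + 0.3 * rng.normal(size=n),    # arithmetic
    math_ability + 0.3 * rng.normal(size=n),    # spatial reasoning
    math_ability + 0.3 * rng.normal(size=n),    # number series
    verbal_ability + 0.3 * rng.normal(size=n),  # vocabulary
    verbal_ability + 0.3 * rng.normal(size=n),  # reading comprehension
    verbal_ability + 0.3 * rng.normal(size=n),  # grammar
])

fa = FactorAnalysis(n_components=2).fit(tasks)
# Each row of components_ is a factor; large loadings show which tasks
# cluster together. Labeling the factors ("mathematical" vs. "verbal")
# remains the researcher's interpretive job.
print(fa.components_.round(2))
```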

🔍 Exploring Causal Relationships

🔍 Statistical control of third variables

Statistical control involves measuring potential third variables and including them in statistical analysis, rather than controlling them through random assignment or holding them constant.

  • This technique can rule out some plausible alternative explanations.
  • Example: Piff and colleagues hypothesized that lower socioeconomic status (SES) causes greater generosity. They measured SES and had participants play the "dictator game" (splitting 10 points with a supposed partner).
  • Lower-SES participants gave away more points, consistent with the hypothesis.
  • Potential third variables: Perhaps lower-SES people are more religious, or come from ethnic groups emphasizing generosity.
  • How they addressed it: Researchers measured religiosity and ethnicity, found neither correlated with generosity, and ruled them out as third variables.
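
A minimal sketch of statistical control via partial correlation: regress the measured third variable out of both focal variables and correlate the residuals (all data below are made up):

```python
# Statistical control by residualizing: remove the linear effect of a
# measured third variable z from both x and y, then correlate the
# residuals (a partial correlation).
import numpy as np

rng = np.random.default_rng(1)
n = 200
z = rng.normal(size=n)                        # e.g., religiosity
x = 0.5 * z + rng.normal(size=n)              # e.g., socioeconomic status
y = -0.4 * x + 0.3 * z + rng.normal(size=n)   # e.g., generosity

def residualize(v, z):
    """Subtract the simple-regression prediction of v based on z."""
    b = np.cov(v, z, ddof=1)[0, 1] / np.var(z, ddof=1)
    return (v - v.mean()) - b * (z - z.mean())

raw_r = np.corrcoef(x, y)[0, 1]
partial_r = np.corrcoef(residualize(x, z), residualize(y, z))[0, 1]
print(f"r(x, y) = {raw_r:.2f}; controlling for z: r = {partial_r:.2f}")
# If the x-y relationship survives the control, z is ruled out as the
# explanation for it (other unmeasured variables could still remain).
```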

⚠️ Limits of statistical control

  • Ruling out some third variables does not prove causation.
  • Other unmeasured third variables could still explain the relationship.
  • Statistical control makes a stronger case for causation but cannot provide definitive proof.

🧮 Multiple regression

Multiple regression measures several independent variables (X₁, X₂, X₃, …) as possible causes of a single dependent variable (Y), producing an equation expressing Y as an additive combination of the independent variables.

General form: b₁X₁ + b₂X₂ + b₃X₃ + … + bᵢXᵢ = Y

  • Regression weights (b₁, b₂, etc.) indicate how much each independent variable contributes to the dependent variable.
  • Specifically, they show how much Y changes for each one-unit change in an independent variable.

🎯 Advantage of multiple regression

  • Shows whether an independent variable contributes to a dependent variable over and above contributions from other independent variables.
  • Example: A researcher wants to know how income and health relate to happiness. Income and health are themselves related, so:
    • If higher-income people are happier, is it only because they're healthier?
    • If healthier people are happier, is it only because they earn more?
  • Multiple regression including both income and health shows whether each contributes to happiness when the other is accounted for.
  • Research finding: Both income and health make extremely small contributions to happiness except in cases of severe poverty or illness.
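
A minimal sketch of estimating regression weights by least squares on made-up income, health, and happiness scores (an intercept column is added, as statistical software typically does by default):

```python
# Estimate multiple-regression weights with ordinary least squares;
# income and health are correlated by construction, as in the example.
import numpy as np

rng = np.random.default_rng(2)
n = 500
income = rng.normal(50, 10, size=n)
health = 0.3 * income + rng.normal(0, 5, size=n)
happiness = 0.02 * income + 0.05 * health + rng.normal(0, 1, size=n)

X = np.column_stack([np.ones(n), income, health])   # intercept, X1, X2
weights, *_ = np.linalg.lstsq(X, happiness, rcond=None)
print("intercept, b_income, b_health =", weights.round(3))
# Each b shows how much happiness changes per one-unit change in that
# variable, over and above the other variable's contribution.
```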

🚧 Final caution

  • Complex correlational techniques (statistical control, multiple regression) can show patterns consistent with some causal interpretations and inconsistent with others.
  • They cannot unambiguously establish that one variable causes another.
  • The best they can do is strengthen or weaken the case for particular causal interpretations by ruling out specific alternatives.
27

Overview of Survey Research

Overview of Survey Research

🧭 Overview

🧠 One-sentence thesis

Survey research is a flexible quantitative and qualitative method that uses self-reports and careful sampling to study everything from mental health prevalence to how emotions shape risk perception, serving both basic research and applied policy needs.

📌 Key points (3–5)

  • Two defining characteristics: survey research measures variables through self-reports and pays strong attention to sampling (especially large random samples for population accuracy).
  • Can be experimental or nonexperimental: surveys describe single variables and relationships between them, but can also manipulate independent variables to test causal hypotheses.
  • Historical roots: emerged from early 20th-century social reform documentation, advanced through election polling (notably the 1936 Roosevelt-Landon election), and spread into academic psychology.
  • Common confusion: survey research is not only descriptive—it can test causal relationships when researchers manipulate variables (e.g., priming anger vs. fear to measure risk judgments).
  • Wide applications: used to estimate mental disorder prevalence, study attitudes and stereotypes, conduct market research, and supplement laboratory experiments with diverse samples.

📊 What survey research is

📊 Core definition

Survey research: a quantitative and qualitative method with two important characteristics—variables are measured using self-reports, and considerable attention is paid to sampling.

  • Self-reports: researchers ask participants (called respondents) to report directly on their own thoughts, feelings, and behaviors.
  • Sampling emphasis: survey researchers strongly prefer large random samples because they provide the most accurate estimates of population characteristics.
  • Random sampling is routinely used in survey research, unlike most other psychology approaches.

🔄 Flexibility of the method

Beyond the two core characteristics, survey research is highly flexible:

  • Length: can be long or short
  • Delivery mode: conducted in person, by telephone, mail, or Internet
  • Topics: voting intentions, consumer preferences, social attitudes, health, or any question that yields meaningful answers
  • Analysis: often uses statistics, but many questions suit qualitative analysis

Example: The Lerner study used an Internet-based survey of nearly 2,000 Americans (ages 13–88) after the September 11, 2001, attacks, measuring their reactions and risk judgments.

🧪 Experimental vs. nonexperimental surveys

| Type | Purpose | Example from excerpt |
| --- | --- | --- |
| Nonexperimental | Describe single variables or assess relationships | Percentage of voters preferring a candidate; relationship between income and health; prevalence of schizophrenia |
| Experimental | Manipulate independent variables to test causal effects | Lerner study: primed some participants to feel anger (via questions and media clips) and others to feel fear, then measured risk perceptions |

  • Don't confuse: the Lerner study is both survey research (self-reports + large national sample) and experimental (manipulation of emotion to assess effect on judgments).
  • Key finding: anger-primed participants perceived less risk than fear-primed participants, showing risk perceptions are tied to specific emotions.

🕰️ Historical development

🕰️ Early roots (turn of 20th century)

  • Survey research originated in English and American "social surveys" conducted by researchers and reformers.
  • Goal: document the extent of social problems such as poverty.
  • By the 1930s, the US government conducted surveys to document economic and social conditions nationwide.
  • The need to draw conclusions about entire populations spurred advances in sampling procedures.

🗳️ Election polling breakthrough (1936)

A watershed event was the 1936 presidential election between Alf Landon and Franklin Roosevelt:

  • Literary Digest approach: sent ballots (along with subscription requests) to millions of Americans and, based on this "straw poll," predicted Landon would win in a landslide.
  • New scientific pollsters: used scientific methods with much smaller samples; predicted Roosevelt would win in a landslide.
  • George Gallup publicly criticized Literary Digest's methods before the election and guaranteed his prediction would be correct—and it was.
  • This demonstrated that scientific sampling with smaller samples outperforms large but poorly sampled polls.

📚 Spread into academic fields

  • From market research and election polling, survey research entered political science, sociology, and public health.
  • 1930s psychology advances: psychologists developed questionnaire design techniques still used today, such as the Likert scale.
  • Strong historical association with social psychological study of attitudes, stereotypes, and prejudice.
  • Early attitude researchers sought larger and more diverse samples than the convenience samples of university students routinely used in psychology.

🇨🇦 Long-term projects

  • Canadian Election Studies: have measured the opinions of Canadian voters around federal elections since 1965; anyone can access the data and read the results.

🧠 Applications in psychology

🧠 Mental health research

Survey research is instrumental in estimating prevalence of mental disorders and identifying statistical relationships.

National Comorbidity Survey example:

  • Large-scale mental health survey conducted in the United States
  • Nearly 10,000 adults given structured mental health interviews in their homes (2002–2003)
  • Measured lifetime prevalence (percentage of population that develops a problem sometime in their lifetime)

Sample findings on lifetime prevalence:

| Disorder | Average | Female | Male |
| --- | --- | --- | --- |
| Generalized anxiety disorder | 5.7% | 7.1% | 4.2% |
| Obsessive-compulsive disorder | 2.3% | 3.1% | 1.6% |
| Major depressive disorder | 16.9% | 20.2% | 13.2% |
| Bipolar disorder | 4.4% | 4.5% | 4.3% |
| Alcohol abuse | 13.2% | 7.5% | 19.6% |
| Drug abuse | 8.0% | 4.8% | 11.6% |

💡 Value for multiple audiences

This information is useful to:

  • Basic researchers: seeking to understand causes and correlates of mental disorders
  • Clinicians and policymakers: need to understand exactly how common these disorders are

🔬 Supplementing laboratory research

  • Survey experiments conducted on large and diverse samples can supplement laboratory studies on university students.
  • Although not a typical use of survey research, it illustrates the method's flexibility.

🧩 Survey responding as a cognitive process

🧩 Five-stage model

The excerpt introduces a cognitive model of how people respond to survey items, involving five processes:

  1. Interpret the question: understand what is being asked
  2. Retrieve relevant information from memory: recall pertinent thoughts, feelings, or experiences
  3. Form a tentative judgment: create an initial answer based on retrieved information
  4. Convert judgment into response option: translate the judgment into the provided format (e.g., rating on a 1-to-7 scale)
  5. Edit response as necessary: adjust the answer before finalizing
  • Understanding this process is essential before constructing good survey questionnaires.
  • Each stage can introduce unintended influences on answers, adding noise or systematic biases.

⚠️ Potential problems

The excerpt notes that answers can be influenced in unintended ways by:

  • Wording of items
  • Order of items
  • Response options provided
  • Many other factors

At best, these influences add noise to the data; at worst, they result in systematic biases and misleading results.

28

Constructing Survey Questionnaires

Constructing Survey Questionnaires

🧭 Overview

🧠 One-sentence thesis

Constructing effective survey questionnaires requires understanding that survey responding is a complex cognitive process vulnerable to unintended context effects, which can be minimized through careful item writing, appropriate response formatting, and thoughtful questionnaire organization.

📌 Key points (3–5)

  • Survey responding is cognitive work: respondents must interpret questions, retrieve information from memory, form judgments, format responses, and edit their answers—each step introduces potential error.
  • Context effects distort responses: item order, response options, and question wording can systematically bias answers in ways unrelated to the actual content being measured.
  • Open vs closed items trade off depth for efficiency: open-ended questions yield richer, unbiased data but are harder to analyze; closed-ended questions are faster and easier to code but may constrain or influence responses.
  • Common confusion—typical vs past behavior: asking about "typical" behavior may be more valid than "past" behavior, but respondents may interpret "typical day" differently (weekday vs weekend vs both).
  • BRUSO principles guide effective writing: items should be Brief, Relevant, Unambiguous, Specific, and Objective to maximize reliability and validity.

🧠 Survey responding as a cognitive process

🧠 The five-stage model

The excerpt presents a cognitive model with five sequential stages:

  1. Interpret the question – decide what the question is really asking
  2. Retrieve information – search memory for relevant facts or beliefs
  3. Form a tentative judgment – combine retrieved information into an answer
  4. Format the response – translate the judgment into one of the provided response options
  5. Edit the response – decide whether to report the answer as-is or modify it (e.g., for social desirability)

Each stage introduces opportunities for error or unintended influence.

🍺 Example: the "alcoholic drinks" question

The excerpt walks through a seemingly simple question: "How many alcoholic drinks do you consume in a typical day?"

Interpretation challenges:

  • Does "alcoholic drinks" include beer and wine, or only hard liquor?
  • Does "typical day" mean weekday, weekend, or an average of both?

Retrieval strategies vary:

  • Vague recall of recent drinking occasions
  • Careful counting of last week's drinks
  • Retrieving a self-belief ("I'm not much of a drinker")

Judgment formation:

  • Might involve mental calculation (e.g., dividing last week's total by seven)

Formatting problems:

  • Response options like "average" or "somewhat more than average" are themselves ambiguous

Editing for social desirability:

  • Respondents who drink heavily may underreport to avoid looking bad

Don't confuse: What appears to be a straightforward factual question actually requires complex cognitive work at every stage.

⚠️ Context effects on responses

📋 What context effects are

Context effects: influences on respondents' answers that are not related to the content of the item but to the context in which the item appears.

These effects add noise at best and systematic bias at worst.

🔄 Item-order effects

Mechanism: One item can change how participants interpret a later item or what information they retrieve to answer it.

Example from the excerpt:

  • Researchers asked college students about general life satisfaction and dating frequency
  • When life satisfaction came first: correlation = −0.12 (weak relationship)
  • When dating frequency came first: correlation = +0.66 (strong relationship)
  • Why: Reporting dating frequency first made that information more accessible in memory, so respondents used it as a basis for rating life satisfaction

Mitigation strategy: Rotate or counterbalance question order when there is no natural order (the excerpt notes that candidates listed first on a ballot receive about a 2.5% boost from undecided voters simply because of their position).

📊 Response-option effects

How response ranges shape answers:

  • When asked how often they are "really irritated" with options from "less than once a year" to "more than once a month," people think of major irritations and report low frequency
  • With options from "less than once a day" to "several times a month," people think of minor irritations and report high frequency

Middle-option bias:

  • People assume middle options represent what is normal or typical
  • If they see themselves as normal, they gravitate toward middle options
  • Example: People report watching more TV when the response options center on 4 hours than when they center on 2 hours

Don't confuse: The response scale is not a neutral measurement tool—it actively shapes what respondents think about and how they answer.

✍️ Types of questionnaire items

📖 Open-ended items

Open-ended items: simply ask a question and allow participants to answer in whatever way they choose.

Examples from the excerpt:

  • "What is the most important thing to teach children to prepare them for life?"
  • "Please describe a time when you were discriminated against because of your age."
  • "Is there anything else you would like to tell us about?"

When to use:

  • Researchers don't know how participants might respond
  • Want to avoid influencing responses
  • Have vaguely defined research questions (early stages of research)
  • Asking about quantities that can be converted to categories later

Advantages:

  • Unbiased—no expectations provided
  • More valid and reliable
  • Easy to write (no response options needed)

Disadvantages:

  • Take more time and effort for participants
  • Respondents more likely to skip them
  • Harder to analyze (must transcribe, code, and use qualitative methods like content analysis)

🔘 Closed-ended items

Closed-ended items: ask a question and provide a set of response options for participants to choose from.

When to use:

  • Researchers have a good idea of possible responses
  • Interested in well-defined variables (e.g., agreement level, risk perception, behavior frequency)

Advantages:

  • Quick and easy for participants
  • Much easier to analyze (responses convert to numbers for spreadsheets)
  • Higher completion rates

Disadvantages:

  • More difficult to write (must create appropriate response options)
  • May constrain or influence responses

Common formats:

  • Categorical variables: list categories, participants choose one or more
  • Quantitative variables: provide a rating scale

📏 Rating scales

Rating scale: an ordered set of responses that participants must choose from.

Number of options:

  • Typical range: 3 to 11 options
  • Most common: 5 or 7 options

Five-point scales:

  • Best for unipolar scales (a single construct measured from low to high)
  • Example: frequency (Never, Rarely, Sometimes, Often, Always)

Seven-point scales:

  • Best for bipolar scales (a spectrum running between two opposite poles)
  • Example: liking (Like very much, Like somewhat, Like slightly, Neither like nor dislike, Dislike slightly, Dislike somewhat, Dislike very much)

Branching technique:

  • For bipolar questions, first ask which side of the spectrum ("Do you generally like or dislike ice cream?")
  • Then refine with the seven-point scale
  • Improves both reliability and validity

Labeling best practices:

  • Present verbal labels to respondents
  • Convert to numerical values only in analysis
  • Avoid partial labels or overly specific labels
  • Can supplement with meaningful graphics
  • Visual-analog scales: participants mark along a horizontal line

📊 What is a Likert scale?

Precise definition (not just any rating scale):

In the 1930s, Rensis Likert created a specific approach:

  1. Present several statements (both favorable and unfavorable) about a person, group, or idea
  2. Respondents rate agreement on a 5-point scale: Strongly Agree, Agree, Neither Agree nor Disagree, Disagree, Strongly Disagree
  3. Assign numbers to responses (with reverse coding as needed)
  4. Sum across all items to produce an attitude score

Don't confuse: A Likert scale is specifically for measuring attitudes through agreement with multiple statements—not just any rating scale.
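
A minimal sketch of Likert scoring with reverse coding, assuming five hypothetical statements of which items 2 and 4 are unfavorable:

```python
# Score a Likert scale: assign numbers to agreement responses, reverse
# code unfavorable items, and sum across items (responses are made up).
AGREEMENT = {"Strongly Agree": 5, "Agree": 4, "Neither Agree nor Disagree": 3,
             "Disagree": 2, "Strongly Disagree": 1}
REVERSE_CODED = {1, 3}  # zero-based indices of the unfavorable items

responses = ["Agree", "Disagree", "Strongly Agree", "Strongly Disagree", "Agree"]

score = 0
for i, answer in enumerate(responses):
    value = AGREEMENT[answer]
    if i in REVERSE_CODED:
        value = 6 - value  # reverse code: 5 -> 1, 4 -> 2, 2 -> 4, 1 -> 5
    score += value

print("Attitude score:", score)  # higher = more favorable attitude
```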

📝 Writing effective items: the BRUSO model

🎯 BRUSO acronym

BRUSO: Brief, Relevant, Unambiguous, Specific, Objective

📏 B – Brief

  • To the point
  • Avoid long, overly technical, or unnecessary words
  • Makes items easier to understand and faster to complete

| Poor | Effective |
| --- | --- |
| "Are you now or have you ever been the possessor of a firearm?" | "Have you ever owned a gun?" |

🎯 R – Relevant

  • Items should relate to the research question
  • Don't ask about sexual orientation, marital status, or income unless relevant
  • Avoids annoying respondents with "nosy" questions
  • Makes questionnaire faster to complete

| Poor | Effective |
| --- | --- |
| "What is your sexual orientation?" (when not relevant) | Do not include unless clearly relevant to research |

🔍 U – Unambiguous

  • Can be interpreted in only one way
  • Different respondents should understand the question the same way

| Poor | Effective |
| --- | --- |
| "Are you a gun person?" | "Do you currently own a gun?" |

Best practice: Conduct pre-tests and ask people to explain how they interpreted the question.

🎲 S – Specific

  • Clear what the response should be about
  • Clear to researchers what it is about

Double-barreled items (avoid):

  • Ask about two conceptually separate issues but allow only one response
  • Example: "Please rate the extent to which you have been feeling anxious and depressed"
  • Solution: Split into two items—one about anxiety, one about depression

| Poor | Effective |
| --- | --- |
| "How much have you read about the new gun control measure and sales tax?" | "How much have you read about the new sales tax?" |

🎭 O – Objective

  • Don't reveal researcher's opinions
  • Don't lead participants to answer in a particular way

| Poor | Effective |
| --- | --- |
| "How much do you support the new gun control measure?" | "What is your view of the new gun control measure?" |

🏗️ Creating appropriate response scales

✅ Mutually exclusive and exhaustive categories

For categorical variables:

Mutually exclusive:

  • Categories do not overlap
  • Example: "Christian" and "Catholic" are NOT mutually exclusive
  • Example: "Protestant" and "Catholic" ARE mutually exclusive

Exhaustive:

  • Cover all possible responses
  • "Protestant" and "Catholic" are mutually exclusive but NOT exhaustive (missing Jewish, Hindu, Buddhist, etc.)
  • Solution: Include an "Other" category with space for specific response

Multiple categories:

  • If respondents could belong to more than one (e.g., race), instruct them to choose all that apply

⚖️ Balanced rating scales

Most extreme options should be balanced around a neutral or modal midpoint.

Unbalanced (avoid): Unlikely | Somewhat Likely | Likely | Very Likely | Extremely Likely

Balanced (better): Extremely Unlikely | Somewhat Unlikely | As Likely as Not | Somewhat Likely | Extremely Likely

🎚️ Middle/neutral options

Including a middle option:

  • Allows genuine neutral responses on bipolar dimensions
  • Useful when "neither" is a valid answer

Omitting a middle option:

  • Some researchers leave it out to encourage deeper thinking
  • Prevents default selection of middle option without consideration

Numerical scales:

  • For dimensions like attractiveness, pain, likelihood, a 0-to-10 scale is familiar and easy to use
  • Generally, five or seven options allow as much precision as respondents can provide

📋 Formatting the questionnaire

👋 Introduction functions

Every survey questionnaire needs a written or spoken introduction with two functions:

1. Encourage participation:

  • Briefly explain the survey's purpose and importance
  • Provide information about the sponsor (university-based surveys generate higher response rates)
  • Acknowledge the importance of respondent's participation
  • Describe any incentives

2. Establish informed consent:

  • Topics covered by the survey
  • Amount of time required
  • Option to withdraw at any time
  • Confidentiality issues
  • Must be well-documented and presented clearly and completely to every respondent

Note: Written consent forms are not typically used in survey research, making the oral/written introduction especially important.

📑 Questionnaire organization

After introduction, present:

  1. Clear instructions for completing the questionnaire, including examples of unusual response scales

Item ordering principles:

Start with most important items:

  • Respondents are most interested and least fatigued at the beginning
  • Place most important items for research purposes first
  • Proceed to less important items

Group related items:

  • By topic or by type
  • Items using the same rating scale should be grouped together (faster and easier)

Demographic items last:

  • Least interesting to participants
  • Easy to answer if respondents are tired or bored

End with appreciation:

  • Express thanks to the respondent

Don't confuse: Item order is not arbitrary—strategic placement maximizes data quality by matching item importance to respondent attention and energy levels.

29

Conducting Surveys

Conducting Surveys

🧭 Overview

🧠 One-sentence thesis

Survey research relies on probability sampling methods to produce representative samples that allow researchers to make accurate estimates about populations, while minimizing sampling bias and maximizing response rates.

📌 Key points (3–5)

  • Probability vs nonprobability sampling: probability sampling allows researchers to specify selection probabilities for each population member, enabling more accurate population estimates.
  • Main probability sampling methods: simple random sampling (equal probability for all), stratified random sampling (proportional subgroups), and cluster sampling (groups then individuals).
  • Sampling bias and nonresponse bias: samples can be unrepresentative if selection is flawed or if nonresponders differ systematically from responders.
  • Common confusion: sample size depends on desired confidence level and budget, not population size—a sample of 1,000 can be adequate even for populations of millions.
  • Survey methods trade-offs: in-person interviews have highest response rates but highest cost; Internet surveys have lowest cost but require careful sampling frame construction.

📊 Sampling fundamentals

📊 What sampling means in survey research

Sampling: selecting a sample to study from the population of interest.

  • All psychological research involves sampling, but survey research specifically emphasizes probability sampling.
  • The goal is to make accurate estimates about what is true in a particular population.
  • Example: election outcome predictions require probability samples of likely registered voters because margins are often only a few percentage points.

🎲 Probability vs nonprobability sampling

| Type | Definition | Use in research |
| --- | --- | --- |
| Probability sampling | Researcher can specify the probability that each population member will be selected | Common in survey research for accurate population estimates |
| Nonprobability sampling | Researcher cannot specify these probabilities | Most psychological research; includes convenience sampling |

  • Convenience sampling (studying nearby, willing individuals) is a common nonprobability method.
  • Survey researchers prefer probability sampling because estimates are most accurate when based on probability samples.

🗂️ Sampling frame requirement

Sampling frame: essentially a list of all the members of the population from which to select respondents.

  • Probability sampling requires very clear specification of the population first.
  • Population examples: all registered voters in British Columbia, Canadian consumers who purchased a car in the past year, women in Quebec over 40 who received a mammogram in the past decade.
  • Sampling frames can come from telephone directories, voter registration lists, hospital records, insurance records, or even maps (for selecting cities, streets, households).

🎯 Probability sampling methods

🎲 Simple random sampling

Simple random sampling: done so that each individual in the population has an equal probability of being selected for the sample.

  • Traditional method: putting all names in a hat, mixing, and drawing the needed number.
  • Modern method: computerized sorting or selection from computer files.
  • Random-digit dialing: a computer randomly generates phone numbers from possible numbers within a geographic area (common in telephone surveys).
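
A minimal sketch of drawing a simple random sample from a sampling frame (the frame here is a made-up list of voter IDs):

```python
# Draw a simple random sample: every member of the frame has an equal
# probability of being selected (sampling is without replacement).
import random

sampling_frame = [f"voter_{i}" for i in range(25_000)]  # hypothetical frame
sample = random.sample(sampling_frame, k=1_000)
print(len(sample), sample[:3])
```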

📊 Stratified random sampling

Stratified random sampling: the population is divided into different subgroups or "strata" (usually based on demographic characteristics) and then a random sample is taken from each stratum.

Two main uses:

  1. Matching population proportions: ensure sample subgroup proportions match population proportions.

    • Example: Because about 15.3% of the Canadian population is Asian, stratified sampling can ensure a survey of 1,000 Canadian adults includes about 153 Asian Canadian respondents.
  2. Oversampling small subgroups: sample extra respondents from particularly small subgroups to draw valid conclusions about them.

    • Example: Black Canadians make up about 2.9% of the Canadian population, so a simple random sample of 1,000 might include too few to draw conclusions. Stratified sampling can ensure enough Black Canadian respondents are included.
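
A minimal sketch of proportionate stratified sampling, with made-up sampling frames whose sizes mirror the percentages above:

```python
# Sample each stratum separately so sample proportions match
# population proportions (hypothetical frames, 100,000 people total).
import random

strata = {
    "asian": [f"A{i}" for i in range(15_300)],
    "black": [f"B{i}" for i in range(2_900)],
    "other": [f"O{i}" for i in range(81_800)],
}
population_size = sum(len(frame) for frame in strata.values())
target_n = 1_000

sample = []
for name, frame in strata.items():
    n_stratum = round(target_n * len(frame) / population_size)
    sample.extend(random.sample(frame, n_stratum))
    print(name, "->", n_stratum, "respondents")  # 153, 29, 818
# To oversample a small subgroup instead, set its n_stratum higher directly.
```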

🗺️ Cluster sampling

Cluster sampling: larger clusters of individuals are randomly sampled and then individuals within each cluster are randomly sampled.

  • Example: To select small-town residents in Saskatchewan, randomly select several small towns, then randomly select several individuals within each town.
  • Why useful: minimizes travel for face-to-face interviewing.
  • Instead of traveling to 200 small towns to interview 200 residents, travel to 10 towns and interview 20 residents of each.
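
A minimal sketch of two-stage cluster sampling with hypothetical town rosters:

```python
# Cluster sampling: randomly pick towns first, then randomly pick
# residents within each chosen town (hypothetical rosters).
import random

towns = {f"town_{i}": [f"town_{i}_resident_{j}" for j in range(500)]
         for i in range(200)}

chosen_towns = random.sample(list(towns), k=10)          # stage 1: clusters
sample = [resident
          for town in chosen_towns
          for resident in random.sample(towns[town], k=20)]  # stage 2
print(len(sample), "residents from", len(chosen_towns), "towns")
# 200 residents from 10 towns
```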

📏 Sample size considerations

📏 How large a sample needs to be

Two factors determine sample size:

  1. Level of confidence desired: larger samples produce statistics closer to the true population value.
  2. Budget: larger samples take more time, effort, and money.
  • Most survey research uses sample sizes ranging from about 100 to about 1,000.

🔍 Why 1,000 is often adequate

The surprising answer about sample size:

  • A sample of 1,000 registered voters is considered good for a population of roughly 25 million registered voters, only about 0.004% of the population.
  • Confidence intervals shrink with larger samples but at a slower rate:
    • 100 voters: 95% confidence interval is 40% to 60% (if 50% support incumbent)
    • 1,000 voters: 95% confidence interval is 47% to 53%
    • 2,000 voters: 95% confidence interval is 48% to 52%
  • Beyond 1,000, the small increase in confidence is often not worth the additional resources.

The more surprising part:

  • Confidence intervals depend only on sample size, not population size.
  • A sample of 1,000 produces a 95% confidence interval of 47% to 53% regardless of whether the population is one hundred thousand, one million, or one hundred million.
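
A minimal sketch verifying these intervals with the normal approximation for a proportion; note that the population size never enters the formula:

```python
# 95% confidence interval for a proportion depends only on the sample
# size n, not on the population size (normal approximation).
import math

p = 0.50  # observed support for the incumbent
for n in (100, 1_000, 2_000):
    margin = 1.96 * math.sqrt(p * (1 - p) / n)
    print(f"n = {n:>5}: {100*(p - margin):.0f}% to {100*(p + margin):.0f}%")
# n =   100: 40% to 60%
# n =  1000: 47% to 53%
# n =  2000: 48% to 52%
```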

⚠️ Sampling bias

⚠️ What sampling bias is

Sampling bias: occurs when a sample is selected in such a way that it is not representative of the entire population and therefore produces inaccurate results.

  • Probability sampling was developed largely to address sampling bias.
  • Historical example: The Literary Digest straw poll was far off in predicting the 1936 presidential election because mailing lists came from telephone directories and automobile owner lists, over-representing wealthier people who were more likely to vote for one candidate.

📉 Nonresponse bias

Nonresponse bias: occurs when survey nonresponders differ from survey responders in systematic ways.

Why it happens:

  • Almost never does everyone selected for the sample actually respond.
  • Some may have died or moved away.
  • Others decline to participate because they are too busy, not interested in the topic, or don't participate in surveys on principle.

Example from research:

  • Researcher Vivienne Lahaut and colleagues found only about half the sample responded to a mail survey on alcohol consumption after initial contact and two follow-up reminders.
  • The danger: the half who responded might have different alcohol consumption patterns than the half who did not.
  • Testing for nonresponse bias: researchers made unannounced visits to nonresponders' homes (up to five times). They found nonresponders included an especially high proportion of abstainers (nondrinkers), meaning estimates based only on original responders were too high.

🛡️ Minimizing nonresponse bias

Statistical correction methods exist but:

  • They are based on assumptions about nonresponders (e.g., they are more similar to late responders than early responders).
  • These assumptions may not be correct.

Best approach: maximize the response rate

Factors that increase response rates:

| Method | Effect |
| --- | --- |
| Survey type | In-person interviews highest, then telephone, then mail and Internet |
| Prenotification | Send a short message informing potential respondents that they will be asked to participate soon |
| Follow-up reminders | Send simple reminders to nonresponders after a few weeks |
| Survey length/complexity | Keep questionnaires short, simple, and on topic |
| Incentives | Offering incentives (especially cash) reliably increases response rates |

Ethical note: there are limits, however; an incentive so large that potential participants cannot reasonably refuse it may be considered coercive.

📞 Survey methods

📞 Four main ways to conduct surveys

| Method | Response rate | Personal contact | Cost | Notes |
| --- | --- | --- | --- | --- |
| In-person interviews | Highest | Closest | Most expensive | Important when the interviewer must see and judge respondents |
| Telephone surveys | Lower | Some | Less costly | Telephone directories are less comprehensive today (more cell-only users) |
| Mail surveys | Even lower | None | Less costly still | Most susceptible to nonresponse bias |
| Internet surveys | Varies | None | Lowest | Becoming more common; in rapid development |

💻 Internet surveys

Advantages:

  • Increasingly easy to construct and use.
  • Low cost.
  • More people are online than ever before.
  • Likely to become the dominant approach in the near future.

Contact methods:

  1. Initial contact by mail with a link to the survey—does not necessarily produce higher response rates than an ordinary mail survey.
  2. Initial contact by e-mail with a direct link—works well when the population has known e-mail addresses and regularly uses them (e.g., a university community).
  3. Posting on websites—difficult to get anything approaching a random sample because website visitors are likely different from the population as a whole.

Challenge: For many populations, it is difficult or impossible to find a comprehensive list of e-mail addresses to serve as a sampling frame.

🔍 Common concerns about online data

Three preconceptions addressed by research:

| Preconception | Finding |
| --- | --- |
| Internet samples are not demographically diverse | Internet samples are more diverse than traditional samples in many domains, though not completely representative |
| Internet samples are maladjusted, socially isolated, or depressed | Internet users do not differ from nonusers on markers of adjustment and depression |
| Internet-based findings differ from those obtained with other methods | Evidence suggests Internet-based findings are consistent with findings based on traditional methods (e.g., on self-esteem, personality), but more data are needed |

🛠️ Online survey tools

Online questionnaire creation tools exist (the excerpt mentions several but notes a caution):

  • Free accounts typically limit the number of questionnaire items and respondents.
  • Useful for small-scale surveys and practicing good questionnaire construction.

Data sovereignty caution:

  • Even when Canadian researchers study Canadian residents, if data is held on US servers, it may be subject to seizure under the Patriot Act.
  • To avoid infringing on respondents' rights, use survey sites hosted in Canada or in other countries outside the United States.

Mechanical Turk (MTurk):

  • Created by Amazon.com, originally for usability testing.
  • Database of over 500,000 workers from over 190 countries.
  • Can deploy simple tasks (e.g., testing different question wording) at very low cost (e.g., a few cents for less than 5 minutes).
  • Lauded as an inexpensive way to gather high-quality data.
30

Overview of Single-Subject Research

Overview of Single-Subject Research

🧭 Overview

🧠 One-sentence thesis

Single-subject research is a quantitative experimental approach that studies a small number of participants intensively to establish causal relationships through repeated measurement and careful control, offering an important alternative to traditional group research in psychology.

📌 Key points (3–5)

  • What it is: studying in detail the behavior of each of a small number of participants (typically 2–10), measuring repeatedly over time under different conditions.
  • How it differs from group research: focuses on individual patterns rather than group averages; analyzes data through visual inspection rather than primarily through inferential statistics.
  • Common confusion: single-subject research is not the same as case studies—it uses experimental manipulation, highly structured data, and quantitative analysis, whereas case studies are descriptive and lack the controls needed for causal inference.
  • Key assumptions: individual differences matter and can be hidden by group averages; causal relationships can be discovered through experimental control; effects should be strong, consistent, and socially important.
  • Who uses it: originally used by psychology's founders (Wundt, Ebbinghaus, Pavlov); refined by B.F. Skinner for experimental analysis of behavior; widely used today in applied behavior analysis, developmental disabilities, education, and clinical settings.

🔬 What single-subject research is and isn't

🔬 Definition and scope

Single-subject research: a type of quantitative research that involves studying in detail the behavior of each of a small number of participants.

  • The term "single-subject" does not mean only one participant; typically 2–10 participants are studied.
  • Also called "small-n designs" (where n is sample size).
  • Contrasts with group research, which studies large numbers of participants and examines behavior primarily through group means and standard deviations.

🚫 Not qualitative research

Single-subject research is quantitative and experimental, not qualitative:

| Aspect | Single-subject research | Qualitative research |
| --- | --- | --- |
| Focus | Understanding objective behavior | Understanding subjective experience |
| Data collection | Highly structured | Relatively unstructured (e.g., detailed interviews) |
| Analysis | Quantitative techniques | Narrative techniques |
| Method | Experimental manipulation and control | Interpretive approaches |

🚫 Not case studies

Case study: a detailed description of an individual, which can include both qualitative and quantitative analyses.

Why case studies cannot substitute for single-subject research:

  • Internal validity problem: case studies usually cannot determine whether specific events are causally related or even related at all.
    • Example: if a patient was sexually abused as a child and later developed an eating disorder as a teenager, there is no way to determine from the case study whether these events had anything to do with each other.
  • External validity problem: an individual case can always be unusual in some way and therefore unrepresentative of people more generally.

When case studies are useful:

  • Suggesting new research questions
  • Illustrating general principles
  • Understanding rare phenomena (e.g., effects of damage to a specific brain region)

Don't confuse: The excerpt describes Freud's "Anna O." case—she could not drink fluids, and under hypnosis recalled a repressed memory of her companion's dog drinking from a glass. Freud interpreted this as evidence for his theory, but the excerpt notes this is "essentially worthless" as evidence because there is no way to know whether the repression caused the symptom or whether recalling the trauma relieved it.

🎯 Core assumptions of single-subject research

🎯 Focus intensively on individuals

Why this matters:

  • Group research can hide individual differences: a treatment with a positive effect for half the people and a negative effect for the other half would appear to have no effect on average—but single-subject research would reveal these differences.
  • Sometimes one individual is the focus of interest: a school psychologist might want to change the behavior of a particular disruptive student; studying that specific student is more direct and probably more effective than relying only on published group research.

🎯 Discover causal relationships experimentally

Single-subject research is considered a type of experimental research with good internal validity because it involves:

  • Manipulation of an independent variable
  • Careful measurement of a dependent variable
  • Control of extraneous variables

Example: Hall and colleagues measured studying many times—first under no-treatment control, then under treatment (positive teacher attention), then again under control. Because studying clearly increased when treatment was introduced, decreased when removed, and increased when reintroduced, there is little doubt the treatment caused the improvement.

🎯 Study strong, consistent, and important effects

Social validity: treatments should have substantial effects on important behaviors and can be implemented reliably in real-world contexts.

  • Applied researchers are especially interested in effects that matter in practice.
  • Example: Hall's study had good social validity because it showed strong, consistent effects of positive teacher attention on a behavior obviously important to teachers, parents, and students, and the treatment was easy to implement in chaotic elementary school classrooms.

📜 History and current use

📜 Early foundations (late 1800s–early 1900s)

Single-subject research has existed since psychology's founding:

  • Wilhelm Wundt (late 1800s): studied sensation and consciousness by focusing intensively on each of a small number of participants.
  • Hermann Ebbinghaus: research on memory.
  • Ivan Pavlov: research on classical conditioning.

📜 B.F. Skinner and experimental analysis of behavior (mid-1900s)

  • Skinner clarified many assumptions and refined techniques of single-subject research.
  • He and others used it to describe how rewards, punishments, and other external factors affect behavior over time.
  • Work carried out primarily with nonhuman subjects (rats and pigeons).
  • This approach, called experimental analysis of behavior, remains an important subfield and relies almost exclusively on single-subject research.
  • See the Journal of the Experimental Analysis of Behavior for examples.

📜 Applied behavior analysis (1960s–present)

Applied behavior analysis: using the single-subject approach to conduct applied research primarily with humans.

  • Plays an especially important role in research on:
    • Developmental disabilities
    • Education
    • Organizational behavior
    • Health
  • See the Journal of Applied Behavior Analysis for examples (including Hall's study).

📜 Beyond the behavioral perspective

  • Although most contemporary single-subject research is conducted from the behavioral perspective, it can in principle address questions from any theoretical perspective (cognitive, psychodynamic, humanistic).
  • Example: a studying technique based on cognitive principles could be evaluated by testing it on individual high school students.
  • Clinicians of any theoretical orientation can use the single-subject approach to study therapeutic change with individual clients and document improvement.

🏗️ General design features

🏗️ Repeated measurement over time

  • The dependent variable (y-axis) is measured repeatedly over time (x-axis) at regular intervals.
  • The study is divided into distinct phases, and the participant is tested under one condition per phase.
  • Conditions are often designated by capital letters: A, B, C, etc.

🏗️ Steady state strategy

Steady state strategy: the change from one condition to the next does not occur after a fixed amount of time or number of observations; instead, it depends on the participant's behavior.

How it works:

  • The researcher waits until the participant's behavior in one condition becomes fairly consistent from observation to observation before changing conditions.
  • When the dependent variable has reached a steady state, any change across conditions will be relatively easy to detect.
  • This is the same principle as minimizing "noise" in experimental research—effects are easier to detect when variability is low.

🔄 Reversal designs

🔄 Basic ABA reversal design

Reversal design (ABA design): the most basic single-subject research design, in which a baseline is established, treatment is introduced, and then treatment is removed.

Phases:

  1. Phase A (baseline): establish the level of responding before any treatment; this is a control condition.
  2. Phase B (treatment): introduce the treatment; wait for steady state.
  3. Phase A (return to baseline): remove the treatment; wait for steady state.

Can be extended: ABAB (reintroduce treatment), ABABA (another return to baseline), etc.

Example: Hall's study was an ABAB design. The percentage of time Robbie spent studying was low during first baseline, increased during first treatment, decreased during second baseline, and increased again during second treatment.

🔄 Why the reversal is necessary

Why not just use AB (baseline → treatment)?

  • An AB design is essentially an interrupted time-series design applied to an individual.
  • Problem: if the dependent variable changes after treatment is introduced, it's not always clear the treatment was responsible—something else might have changed at the same time.
  • The reversal increases internal validity: if the dependent variable changes with introduction of treatment and then changes back with removal of treatment, it is much clearer that the treatment (and its removal) is the cause.

Don't confuse: This assumes the treatment does not create a permanent effect.

🔄 Multiple-treatment reversal design

  • A baseline phase is followed by separate phases in which different treatments are introduced.
  • Example: establish baseline of studying (A) → introduce positive attention (B) → switch to mild punishment for not studying (C) → return to baseline → reintroduce treatments (perhaps in reverse order to control for carryover effects).
  • This could be called an ABCACB design.

🔄 Alternating treatments design

  • Two or more treatments are alternated relatively quickly on a regular schedule.
  • Example: positive attention one day, mild punishment the next; or one treatment in the morning, another in the afternoon.
  • A quick and effective way of comparing treatments, but only when treatments are fast acting.

🔀 Multiple-baseline designs

🔀 Why use multiple-baseline designs

Two problems with reversal designs:

  1. Ethical concern: if a treatment is working, it may be unethical to remove it (e.g., if treatment reduces self-injury in a developmentally disabled child).
  2. Dependent variable may not return to baseline: the student might continue to study even after positive attention is removed—this could mean the treatment had a lasting effect (good), or it could mean something else caused the change (confound).

🔀 How multiple-baseline designs work

Multiple-baseline design: a baseline is established for each of several participants (or variables or settings), and the treatment is introduced at a different time for each one.

Key principle: if the dependent variable changes when treatment is introduced for one participant, it might be coincidence. But if it changes when treatment is introduced for multiple participants—especially at different times—it is extremely unlikely to be coincidence.

Example: Ross and Horner studied bullying prevention at three schools. They observed two problem students at each school during baseline, then implemented the program at one school after 2 weeks, at the second school after 2 more weeks, and at the third school after 2 more weeks. Aggressive behaviors dropped shortly after the program was implemented at each school. If they had studied only one school or introduced treatment at all three schools simultaneously, it would be unclear whether the reduction was due to the program or something else (holiday, TV program, weather change). But the same coincidence would have to happen three separate times—very unlikely.

🔀 Variations

| Variation | What varies | Example |
| --- | --- | --- |
| Across participants | Different participants, same dependent variable | Multiple students, all measured on studying behavior |
| Across dependent variables | Same participant, different behaviors | One office worker measured on sales calls and report writing |
| Across settings | Same participant, same behavior, different locations | One child's reading time measured at school and at home |

Logic is the same for all: if the dependent variable changes after treatment is introduced in each case—especially at different times—the researcher can be confident the treatment is responsible.

📊 Data analysis through visual inspection

📊 Visual inspection vs. inferential statistics

Visual inspection: plotting individual participants' data, looking carefully at those data, and making judgments about whether and to what extent the independent variable had an effect.

How single-subject research differs from group research in analysis:

  • Group research: combine data across participants, use means/standard deviations/Pearson's r, use inferential statistics to decide whether results generalize to the population.
  • Single-subject research: rely heavily on visual inspection; inferential statistics typically not used.

📊 Three factors in visual inspection

  1. Level: changes in the height of the dependent variable from condition to condition.

    • If the dependent variable is much higher or much lower in one condition than another, this suggests the treatment had an effect.
  2. Trend: gradual increases or decreases in the dependent variable across observations.

    • If the dependent variable begins increasing or decreasing with a change in conditions, this suggests the treatment had an effect.
    • Especially telling when a trend changes directions (e.g., unwanted behavior increasing during baseline but decreasing with treatment).
  3. Latency: the time it takes for the dependent variable to begin changing after a change in conditions.

    • In general, if change begins shortly after a change in conditions, this suggests the treatment was responsible.
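
Visual inspection itself is a judgment made from the graph, but the three factors can be made concrete. A minimal sketch with hypothetical data for one baseline phase and one treatment phase:

```python
from statistics import mean

baseline = [12, 10, 11, 13, 11]         # hypothetical % of time studying
treatment = [14, 30, 55, 70, 72, 71]

# Level: change in the height of the dependent variable across conditions.
level_change = mean(treatment) - mean(baseline)

# Trend: slope across observations (simple least-squares fit).
def slope(ys):
    n, y_bar, x_bar = len(ys), mean(ys), (len(ys) - 1) / 2
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx

baseline_trend, treatment_trend = slope(baseline), slope(treatment)

# Latency: observations before the treatment data move past the most
# extreme baseline observation (0 means the change was immediate).
latency = next(i for i, y in enumerate(treatment) if y > max(baseline))
```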

📊 Interpreting visual patterns

Strong evidence of treatment effect:

  • Fairly obvious changes in level and trend from condition to condition
  • Short latencies (change happens immediately)

Weak or no evidence of treatment effect:

  • Small changes in level
  • Trend in treatment condition appears to be a continuation of a trend that already began during baseline

📊 Statistical approaches (supplementary)

Although visual inspection is primary, statistical procedures are becoming more common:

  • Parallel to group research: compute mean and standard deviation of each participant's responses under each condition; apply t tests or ANOVA. (Note: averaging across participants is less common.)
  • Percentage of nonoverlapping data (PND): the percentage of responses in the treatment condition that are more extreme than the most extreme response in a relevant control condition.
    • Example: in Hall's study, all measures of Robbie's study time in the first treatment condition were greater than the highest measure in the first baseline → PND of 100%.
    • Greater PND = stronger treatment effect.

Don't confuse: Formal statistical approaches are generally considered a supplement to visual inspection, not a replacement.
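
A minimal sketch of the PND calculation described above, using hypothetical numbers of the kind reported for Hall's study:

```python
def pnd(treatment, baseline):
    """Percentage of treatment responses that exceed the most extreme
    (here, the highest) response in the control condition."""
    cutoff = max(baseline)
    return 100 * sum(y > cutoff for y in treatment) / len(treatment)

baseline = [12, 10, 11, 13, 11]     # hypothetical % of time studying
treatment = [25, 30, 55, 70, 72]
print(pnd(treatment, baseline))     # 100.0, i.e., a PND of 100%
```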

⚖️ Single-subject vs. group research debate

⚖️ Disagreements about data analysis

Concerns from group research advocates about visual inspection:

  1. Not sensitive enough to detect weak effects.
  2. Unreliable: different researchers may reach different conclusions about the same data.
  3. Results cannot be clearly summarized or compared across studies (unlike measures of relationship strength used in group research).

Single-subject researchers' response:

  • They share these concerns in general.
  • However, they argue that their use of the steady state strategy helps address these issues (the excerpt cuts off here, so the full argument is not provided).

⚖️ Complementary approaches

The excerpt suggests that single-subject research and group research are probably best conceptualized as complementary approaches rather than competing methods—each has appropriate situations for use.

31

Single-Subject Research Designs

Single-Subject Research Designs

🧭 Overview

🧠 One-sentence thesis

Single-subject research designs establish causal relationships by measuring behavior repeatedly over time in individual participants, using systematic phase changes and visual inspection to determine treatment effects with high internal validity.

📌 Key points (3–5)

  • Core approach: Measure the dependent variable repeatedly at regular intervals, divided into distinct phases (conditions), waiting for steady state before changing conditions.
  • Two main design families: Reversal designs (ABA/ABAB) reintroduce and remove treatment to show causation; multiple-baseline designs introduce treatment at different times across participants/variables/settings.
  • Data analysis method: Primarily visual inspection of graphed data, examining level, trend, and latency rather than inferential statistics.
  • Common confusion: Single-subject designs vs. case studies—single-subject research is controlled and systematic (high internal validity), whereas case studies are descriptive and low in both internal and external validity.
  • When reversal fails: If treatment cannot ethically be removed or effects don't reverse, multiple-baseline designs solve the problem by staggering treatment introduction.

🔬 General design features

📊 Repeated measurement structure

  • The dependent variable (y-axis) is measured repeatedly over time (x-axis) at regular intervals.
  • The study divides into distinct phases, each testing one condition.
  • Conditions are designated by capital letters: A, B, C, etc.
  • Example: A participant might be tested in condition A (baseline), then B (treatment), then A again (return to baseline).

⏱️ Steady state strategy

Steady state strategy: Waiting until the participant's behavior becomes fairly consistent from observation to observation before changing conditions.

  • The change from one condition to the next does not occur after a fixed time or number of observations.
  • Instead, it depends on the participant's behavior reaching consistency.
  • Why it matters: When the dependent variable has reached steady state, any change across conditions is relatively easy to detect—minimizing "noise" in the data.
  • Don't confuse with fixed schedules of phase changes: the timing is flexible and behavior-driven, not predetermined.

🔄 Reversal designs

🔄 Basic ABA structure

Reversal design (ABA design): A design in which a baseline phase (A) is followed by a treatment phase (B), then a return to baseline (A).

  • Phase A (first): Baseline is established—the level of responding before any treatment, serving as a control condition.
  • Phase B: Treatment is introduced; behavior may become variable initially, then reaches a new steady state.
  • Phase A (second): Treatment is removed; researcher waits for steady state again.
  • Can be extended: ABAB (reintroduce treatment), ABABA (another baseline), and so on.

🧪 Why reversal is necessary

  • An AB design alone is essentially an interrupted time-series design—if the dependent variable changes after treatment, something else might have changed at the same time (confound).
  • The reversal logic: If the dependent variable changes with treatment introduction and then changes back with treatment removal, the treatment is much more clearly the cause.
  • This greatly increases internal validity.
  • Limitation: Assumes treatment does not create a permanent effect.

🔀 Multiple-treatment and alternating designs

| Design type | Structure | Use case |
| --- | --- | --- |
| Multiple-treatment reversal | Baseline (A) → Treatment 1 (B) → Treatment 2 (C) → return to baseline → reintroduce treatments (e.g., ABCACB) | Evaluating more than one treatment; controlling for carryover effects by reversing order |
| Alternating treatments | Two or more treatments alternated quickly on a regular schedule (e.g., one treatment per day, or morning vs. afternoon) | Quick comparison of treatments; only works when treatments are fast-acting |

⚠️ Problems with reversal designs

  1. Ethical issue: If a treatment is working (e.g., reducing self-injury in a child), removing it may be unethical.
  2. Practical issue: The dependent variable may not return to baseline when treatment is removed.
    • Could mean the treatment had a lasting positive effect.
    • Or could mean the treatment wasn't the real cause—something else happened at the same time (e.g., parents started rewarding good grades).

📈 Multiple-baseline designs

📈 Core logic and structure

Multiple-baseline design: A design in which baselines are established for multiple participants, dependent variables, or settings, and treatment is introduced at a different time for each.

  • Each participant (or variable, or setting) is essentially tested in an AB design.
  • Key feature: Treatment is introduced at a different time for each baseline.
  • Why it works: If the dependent variable changes when treatment is introduced for one participant, it might be coincidence; but if it changes for multiple participants at different introduction times, coincidence is extremely unlikely.

🧑‍🤝‍🧑 Three versions of multiple-baseline

| Version | What varies | Example |
| --- | --- | --- |
| Across participants | Different individuals | Study bullying behavior in two students at each of three schools; implement the program at School 1 after 2 weeks, School 2 after 4 weeks, School 3 after 6 weeks |
| Across dependent variables | Different behaviors in the same person | An office worker has two tasks (sales calls, report writing); introduce goal-setting for one task first, then later for the other |
| Across settings | Different locations for the same person | Measure a child's reading time at school and at home; introduce positive attention at school first, then at home later |

🎯 Example: Bullying prevention study

  • Researchers studied two students at each of three schools who regularly engaged in bullying.
  • Observed students for 10 minutes daily during lunch recess, counting aggressive behaviors.
  • After 2 weeks: implemented program at School 1.
  • After 2 more weeks: implemented at School 2.
  • After 2 more weeks: implemented at School 3.
  • Result: Aggressive behaviors dropped shortly after program implementation at each school.
  • Why convincing: For results to be coincidence, something else would have to happen three separate times at exactly the right moments—very unlikely.

📊 Data analysis through visual inspection

👁️ Visual inspection approach

Visual inspection: Plotting individual participants' data, looking carefully at those data, and making judgments about whether and to what extent the independent variable had an effect on the dependent variable.

  • Single-subject research relies heavily on this method, not inferential statistics.
  • Contrasts with group research: group research combines data across participants, uses means/standard deviations, and applies inferential statistics.

📐 Three factors examined

| Factor | Definition | What it suggests |
| --- | --- | --- |
| Level | How high or low the dependent variable is in one condition vs. another | Much higher or lower in one condition → treatment had an effect |
| Trend | Gradual increases or decreases across observations | Dependent variable begins increasing/decreasing with a condition change → treatment effect; especially telling when a trend changes direction (e.g., unwanted behavior increasing at baseline, then decreasing with treatment) |
| Latency | Time it takes for the dependent variable to begin changing after a condition change | Short latency (change happens immediately) → treatment was responsible |

🔍 Interpreting patterns

  • Strong treatment effect: Fairly obvious changes in level and trend from condition to condition, with short latencies.
  • Weak or no effect: Small changes in level; trends that appear to be continuations of baseline trends; unclear latency.
  • Example: If an increasing trend in treatment looks like a continuation of a baseline trend, the treatment probably wasn't responsible.

📈 Statistical supplements

  • Formal statistical procedures are becoming more common but are considered a supplement to visual inspection, not a replacement.
  • Approaches:
    • Compute mean and standard deviation for each participant under each condition; apply t-tests or ANOVA.
    • Percentage of nonoverlapping data (PND): the percentage of responses in treatment that are more extreme than the most extreme response in control; greater PND → stronger treatment effect.
  • Single-subject researchers continue to debate which statistical methods are most useful.

🤝 Single-subject vs. group research

⚖️ Points of disagreement

Concerns from group researchers about visual inspection:

  1. Sensitivity: Visual inspection may not be sensitive enough to detect weak effects.
  2. Reliability: Different researchers may reach different conclusions about the same data (unreliable).
  3. Summarization: Results (overall judgment of effectiveness) cannot be clearly and efficiently summarized or compared across studies, unlike measures of relationship strength in group research.

Single-subject researchers' response:

  • They share these concerns.
  • However, they argue that the steady state strategy combined with systematic design features addresses many of these issues.
  • (Note: The excerpt ends mid-sentence here, so the full response is not provided.)

🔗 Complementary approaches

  • Single-subject research and group research are best conceptualized as complementary approaches.
  • Both are quantitative methods that establish causal relationships by manipulating an independent variable, measuring a dependent variable, and controlling extraneous variables.
  • Key similarities: Both try to establish causation through controlled manipulation.
  • Key differences: Focus on individuals vs. groups; visual inspection vs. inferential statistics; intensive repeated measurement vs. between-group comparisons.
32

The Single-Subject Versus Group "Debate"

The Single-Subject Versus Group "Debate"

🧭 Overview

🧠 One-sentence thesis

Single-subject and group research are complementary quantitative approaches with different strengths that are best suited to answering different types of research questions, rather than competing methods where one is superior to the other.

📌 Key points (3–5)

  • Both are quantitative causal methods: Both manipulate independent variables, measure dependent variables, and control extraneous variables to establish causality.
  • Key disagreements center on two issues: data analysis (visual inspection vs. statistics, group means vs. individual effects) and external validity (generalizing from few vs. many participants).
  • Common confusion about generalization: Studying large groups does not automatically solve generalization problems—a treatment with a small average effect in a group may have large positive, small, or negative effects on specific individuals.
  • Each approach has distinct strengths: Single-subject excels at detecting strong, consistent effects in individuals; group research excels at detecting weak effects and interactions at the population level.
  • Research traditions shape method choice: Researchers typically learn to conceptualize questions in ways that fit their training tradition, though integration across traditions is possible and productive.

🔬 Data analysis disagreements

📊 Group researchers' concerns about visual inspection

Group research advocates raise three specific worries about the visual inspection method used in single-subject research:

  • Sensitivity: Visual inspection may not detect weak treatment effects.
  • Reliability: Different researchers examining the same data may reach different conclusions.
  • Comparability: Overall judgments (effective vs. not effective) cannot be easily summarized or compared across studies, unlike statistical measures of relationship strength.

🛡️ Single-subject researchers' response

Single-subject researchers acknowledge these concerns but argue their methodology addresses them:

  • They use the steady state strategy combined with focus on strong and consistent effects to minimize detection problems.
  • If an effect is too weak or noisy to detect visually, they work to increase effect strength or reduce noise by controlling extraneous variables.
  • If the effect remains difficult to detect, they consider it neither strong enough nor consistent enough to warrant further interest.
  • Statistical analysis is increasingly used as a supplement to visual inspection, especially for cross-study comparisons.

Example: If administering a treatment inconsistently makes the effect hard to see, researchers would standardize administration rather than rely solely on statistics to find a weak effect.

📉 Single-subject researchers' concerns about group means

Single-subject researchers point out a critical limitation of focusing on group averages:

Group mean misleading scenario: A treatment might have a strong positive effect on half the participants and an equally strong negative effect on the other half—the group mean would show no effect even though every participant experienced a strong effect.

Group researchers' response:

  • They emphasize examining distributions of individual scores, not just means.
  • A bimodal distribution (from positive and negative effects) would be visible in a histogram.
  • Within-subjects designs allow observation of individual-level effects and specification of what percentage show strong, medium, weak, or negative effects.

🌍 External validity disagreements

🤔 Group researchers' concerns about generalization from few participants

The core worry: How can results from just a few participants generalize to the broader population?

Example scenario from the excerpt: A treatment reduces self-injury in two developmentally disabled children—but will it work for other developmentally disabled children?

🔁 Single-subject researchers' defense

Single-subject researchers share the concern but offer several counterarguments:

  • Strong, consistent effects observed even in small samples are likely to generalize.
  • Replication is emphasized: Researchers repeat studies with different small samples, perhaps with slightly different participant types or conditions—each similar result increases confidence in generality.
  • Historical evidence: Principles of classical and operant conditioning were discovered using single-subject approaches and have successfully generalized across an incredibly wide range of species and situations.

🔄 Single-subject researchers' concerns about group generalization

Single-subject researchers turn the tables with two important points:

Point 1: Large groups don't solve individual generalization

| Scenario | Group-level prediction | Individual-level prediction |
| --- | --- | --- |
| Treatment shows a small positive effect on average in a large study | Can predict a small average effect in another large group | Cannot predict whether an individual will show a small, large, or negative effect |

Point 2: Situation generalization problem

  • Group researchers also face generalization challenges when studying a single situation and applying results to other situations.
  • Example from excerpt: Researchers studying cell phone use on a closed oval track want to generalize to real-world driving situations—this requires generalizing from one situation to a population of situations.
  • Generalization depends on careful consideration of similarity between studied and target populations of both participants and situations, not just the number of participants studied.

Don't confuse: More participants ≠ automatic generalization; similarity of participants and contexts matters more than sheer numbers.

🤝 Complementary strengths and appropriate uses

💪 When single-subject research is ideal

Single-subject research is particularly appropriate for:

  • Testing treatment effectiveness on individuals with focus on strong, consistent, and biologically or socially important effects.
  • Situations where specific individuals' behavior is of interest.
  • Clinicians working with one individual at a time—may be the only option for systematic quantitative research.

💪 When group research is ideal

Group research is particularly appropriate for:

  • Testing treatment effectiveness at the group level.
  • Detecting weak effects that may lead to treatment refinements producing larger, more meaningful effects.
  • Studying interactions between treatments and participant characteristics (e.g., treatment effective for high-motivation participants but not low-motivation ones—group designs detect this more efficiently).
  • Questions about non-manipulable independent variables (number of siblings, extraversion, culture) that cannot be addressed with single-subject approaches.

🎓 Research traditions and integration

Different traditions shape method choice:

  • Researchers in experimental analysis of behavior and applied behavior analysis learn to conceptualize questions fitting the single-subject approach.
  • Researchers in most other psychology areas learn to conceptualize questions fitting the group approach.
  • This training factor is probably the most important influence on which approach a researcher uses.

Integration is possible and productive:

Example from the excerpt: Research on innate "number sense"—the awareness of how many objects or events have been experienced without counting.

  • Single-subject research with rats and birds + group research with human infants have shown strikingly similar discrimination abilities.
  • This number sense likely evolved before humans and may be the foundation of advanced mathematical abilities.
  • Demonstrates successful integration of findings from both traditions.
33

American Psychological Association (APA) Style

American Psychological Association (APA) Style

🧭 Overview

🧠 One-sentence thesis

APA style is a standardized set of writing guidelines that facilitates scientific communication in psychology by promoting clarity, consistency, and objectivity in how research is organized, expressed, and cited.

📌 Key points (3–5)

  • What APA style is: a genre of writing appropriate for presenting psychological research in academic and professional contexts, not synonymous with "good writing" in general.
  • Three levels of structure: overall organization (fixed sections), high-level style (formal and straightforward expression), and low-level style (specific formatting rules).
  • Core values reflected: APA style embodies scientific values like objectivity, collaboration, tentativeness of conclusions, and focus on phenomena rather than personalities.
  • Common confusion: APA style vs. other styles—APA is for psychological research; MLA is for literary analysis; AP is for journalism; each genre requires its own appropriate style.
  • References and citations matter: extensive rules for formatting references reflect that science is a large-scale collaboration requiring proper attribution.

📚 What APA style is and why it exists

📖 Definition and purpose

APA style: a set of guidelines for writing in psychology and related fields, set down in the Publication Manual of the American Psychological Association.

  • Originated in 1929 as a short journal article providing basic manuscript standards.
  • Now in its seventh edition and nearly 300 pages long.
  • Primary purpose: facilitate scientific communication by promoting clarity and standardizing organization and content.
  • Makes writing easier (you know what to present and in what order) and reading easier (information appears in familiar, expected ways).

🎭 APA style as a genre

  • APA style is a genre appropriate for presenting psychological research results, especially in academic and professional contexts.
  • It is not synonymous with "good writing" in general.
  • Different writing tasks require different styles:
    • Literary analysis → MLA style
    • Newspaper article → AP style
    • Empirical research report → APA style
  • Part of being a good writer is adopting a style appropriate to the task at hand.

🏗️ The three levels of APA style

📋 Level 1: Overall organization

The first level concerns the structure of an article. Empirical research reports have several distinct sections that always appear in the same order:

| Section | Purpose |
| --- | --- |
| Title page | Presents the article title, author names, and affiliations |
| Abstract | Summarizes the research |
| Introduction | Describes previous research and the rationale for the current study |
| Method | Describes how the study was conducted |
| Results | Describes the results of the study |
| Discussion | Summarizes the study and discusses its implications |
| References | Lists the references cited throughout the article |

✍️ Level 2: High-level style (clear expression)

High-level style includes guidelines for clear expression of ideas, with two important themes:

Theme 1: Formal rather than informal

  • Adopts a tone appropriate for communicating with professional colleagues (researchers and practitioners).
  • These colleagues share interest in the topic but are not necessarily similar to the writer or each other.
  • Example scenario: a graduate student in British Columbia might write for a young psychotherapist in Toronto and a respected professor in Tokyo.
  • Avoids: slang, contractions, pop culture references, humor, and other elements acceptable in informal writing.

Theme 2: Straightforward communication

  • Communicates ideas as simply and clearly as possible.
  • Puts focus on the ideas themselves, not on how they are communicated.
  • Minimizes literary devices (metaphor, imagery, irony, suspense).
  • Uses short, direct sentences.
  • Technical terms are used to improve communication, not to sound more "scientific."
  • Example: write "participants immersed their hands in a bucket of ice water" rather than "were subjected to a pain-inducement apparatus."
  • Don't confuse: straightforward ≠ avoiding technical terms; use "between-subjects design" when that's the clearest way to communicate.

🔧 Level 3: Low-level style (specific rules)

  • Includes all specific guidelines for spelling, grammar, references, citations, numbers, statistics, figures, tables, etc.
  • So many guidelines that even experienced professionals consult the Publication Manual regularly.

🔬 How APA style reflects scientific values

🧪 Style features and their underlying values

The excerpt argues that APA style promotes psychologists' scientific values and assumptions. Many features that seem arbitrary actually make good sense:

| APA style feature | Scientific value or assumption reflected |
| --- | --- |
| Very few direct quotations of other researchers | Phenomena and theories are objective and don't depend on the specific words a particular researcher used |
| Criticisms directed at work, not personally at researchers | The focus is on drawing general conclusions about the world, not on the personalities of particular researchers |
| Many references and citations | Scientific research is a large-scale collaboration among many researchers |
| Empirical reports organized with specific sections in a fixed order | There is an ideal approach to conducting empirical research (even if not always achieved) |
| Researchers "hedge" conclusions (e.g., "results suggest that...") | Scientific knowledge is tentative and always subject to revision based on new empirical results |

🌍 Avoiding biased language

Another important element is avoiding language biased against particular groups.

Why it matters:

  • Not only to avoid offending people interested in your work
  • For the sake of scientific objectivity and accuracy
  • Example: use "sexual orientation" instead of "sexual preference" because people don't generally experience orientation as a "preference," nor is it as easily changeable as that term suggests

General principles:

  1. Be sensitive to labels: avoid offensive terms or those with negative connotations; avoid terms that identify people with a disorder (e.g., "patients with schizophrenia" not "schizophrenics")
  2. Use specific rather than general terms: e.g., "Chinese Canadians" is better than "Asian Canadians" if everyone in the group is Chinese Canadian
  3. Avoid objectifying participants: acknowledge their active contribution (e.g., "students completed the questionnaire" not "subjects were administered the questionnaire")

Examples of avoiding biased language:

| Instead of... | Use... |
| --- | --- |
| man, men | men and women, people |
| firemen | firefighters |
| homosexuals, gays, bisexuals | lesbians, gay men, bisexual men, bisexual women |
| minority | specific group label (e.g., African American) |
| neurotics | people scoring high in neuroticism |
| special children | children with learning disabilities |

👥 Note on "subjects" vs. "participants"

  • Previous edition strongly discouraged "subjects" (except for nonhumans) and encouraged "participants"
  • Current edition acknowledges "subjects" can still be appropriate in areas where traditionally used (e.g., basic memory research)
  • Encourages more specific terms when possible: university students, children, respondents, etc.

📝 References and citations

📚 The reference list

At the end of an APA-style article is a list containing references to all works cited in the text (and only works cited in the text).

Format basics:

  • Begins on its own page with heading "References" centered in upper and lower case
  • References listed alphabetically by last name of first author
  • Everything is double-spaced
  • Uses a hanging indent (first line not indented, all subsequent lines are)

📄 Formatting journal articles

Generic format:

Author, A. A., Author, B. B., & Author, C. C. (year). Title of article. Title of Journal, xx(yy), pp–pp. doi:xx.xxxxxxxxxx

Concrete example:

Adair, J. G., & Vohra, N. (2003). The explosion of knowledge, references, and citations: Psychology's unique response to a crisis. American Psychologist, 58(1), 15–23. doi:10.1037/0003-066X.58.1.15

Key features:

  • Authors' names in same order as on article (reflects relative contributions)
  • Only last names and initials
  • Names separated by commas with ampersand (&) between last two (even with only two authors)
  • Only first word of article title capitalized (exceptions: proper nouns/adjectives, first word of subtitle)
  • Journal title: all important words capitalized
  • Journal title and volume number italicized; issue number (in parentheses) not italicized
  • DOI (digital object identifier) at end provides permanent link; include if available
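
A minimal sketch (a hypothetical helper, not an official APA tool) that assembles a journal-article reference from its parts following the generic format above; italics are necessarily omitted in plain text:

```python
def journal_reference(authors, year, title, journal, volume, issue, pages, doi=None):
    # authors: "Last, I. I." strings in the same order as on the article.
    names = (authors[0] if len(authors) == 1
             else ", ".join(authors[:-1]) + ", & " + authors[-1])
    ref = f"{names} ({year}). {title}. {journal}, {volume}({issue}), {pages}."
    return f"{ref} doi:{doi}" if doi else ref

print(journal_reference(
    ["Adair, J. G.", "Vohra, N."], 2003,
    "The explosion of knowledge, references, and citations: "
    "Psychology's unique response to a crisis",
    "American Psychologist", 58, 1, "15–23", "10.1037/0003-066X.58.1.15"))
```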

📖 Formatting books

Generic format:

Author, A. A. (year). Title of book. Location: Publisher.

Concrete example:

Kashdan, T., & Biswas-Diener, R. (2014). The upside of your dark side. New York, NY: Hudson Street Press.

📑 Formatting book chapters

Generic format:

Author, A. A., Author, B. B., & Author, C. C. (year). Title of chapter. In A. A. Editor, B. B. Editor, & C. C. Editor (Eds.), Title of book (pp. xxx–xxx). Location: Publisher.

Concrete example:

Lilienfeld, S. O., & Lynn, S. J. (2003). Dissociative identity disorder: Multiple personalities, multiple controversies. In S. O. Lilienfeld, S. J. Lynn, & J. M. Lohr (Eds.), Science and pseudoscience in clinical psychology (pp. 109–142). New York, NY: Guilford Press.

Key differences from journal articles:

  • For edited books: editors' names appear as first initial, middle initial, last name (not reversed) with "Eds." or "Ed." in parentheses
  • Only first word of book title capitalized (with noted exceptions); entire title italicized
  • For chapters: page numbers in parentheses after book title with "pp."
  • Both formats end with location and publisher separated by colon

🔗 Reference citations in text

Reference citation: when you refer to another researcher's idea, you must include a citation in the text to the work where that idea originally appeared and a full reference in the reference list.

What must be cited:

  • Phenomena discovered by other researchers
  • Theories they have developed
  • Hypotheses they have derived
  • Specific methods they have used (e.g., questionnaires, stimulus materials)
  • Factual information that is not common knowledge (so others can check it)

What does NOT need citation:

  • Widely shared methodological and statistical concepts (e.g., between-subjects design, t test)
  • Statements so broad they would be difficult to argue with (e.g., "Working memory plays a role in many daily activities")
  • Warning: "common knowledge" about human behavior is often incorrect; when in doubt, cite or remove the assertion

✏️ Two ways to cite in text

Both include only last names of authors and year of publication.

Method 1: Authors' names in the sentence

  • Use authors' last names (no first names or initials) followed immediately by year in parentheses
  • Examples:
    • "Burger (2008) conducted a replication of Milgram's (1963) original obedience study."
    • "Although many people believe that women are more talkative than men, Mehl, Vazire, Ramirez-Esparza, Slatcher, and Pennebaker (2007) found essentially no difference..."

Key features:

  • Authors' names treated grammatically as names of people, not things (better: "a replication of Milgram's (1963) study" not "a replication of Milgram (1963)")
  • Two authors: no comma between names
  • Three or more authors: separated by commas
  • Word "and" (not ampersand) joins authors' names
  • Year follows immediately after final author's name
  • Year only needed first time a work is cited in same paragraph

Method 2: Parenthetical citation

  • Include authors' last names and year in parentheses following the idea being credited
  • Examples:
    • "People can be surprisingly obedient to authority figures (Burger, 2008; Milgram, 1963)."
    • "Recent evidence suggests that men and women are similarly talkative (Mehl, Vazire, Ramirez-Esparza, Slatcher, & Pennebaker, 2007)."

Key features:

  • Often placed at end of sentence to minimize disruption
  • Always includes year, even when citation given multiple times in same paragraph
  • Multiple citations in same parentheses: organized alphabetically by first author's name, separated by semicolons

When to use each method:

  • No strict rules; most articles contain a mixture
  • Method 1 works well when emphasizing the person who conducted research (e.g., comparing theories of two prominent researchers) or describing a particular study in detail
  • Method 2 works well when discussing a general idea, especially when including multiple citations for the same idea

🔤 Using "et al."

Et al.: abbreviation for Latin term "et alia," meaning "and others"

Rule: If an article or book chapter has more than two authors, include all names when first citing that work. After that, use first author's name followed by "et al."

Examples:

  • "Recall that Mehl et al. (2007) found that women and men spoke about the same number of words per day on average."
  • "There is a strong positive correlation between the number of daily hassles and the number of symptoms people experience (Kanner et al., 1981)."

Formatting notes:

  • No comma between first author's name and "et al."
  • No period after "et" (it's a complete word)
  • Period after "al." (it's an abbreviation for "alia")
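
A minimal sketch (a hypothetical helper) applying the parenthetical-citation rules above, including the "et al." rule as stated in the excerpt:

```python
def parenthetical_citation(last_names, year, first_citation=True):
    if len(last_names) > 2 and not first_citation:
        joined = f"{last_names[0]} et al."             # no comma before "et al."
    elif len(last_names) == 1:
        joined = last_names[0]
    elif len(last_names) == 2:
        joined = f"{last_names[0]} & {last_names[1]}"
    else:                                              # three or more, first citation
        joined = ", ".join(last_names[:-1]) + ", & " + last_names[-1]
    return f"({joined}, {year})"

authors = ["Mehl", "Vazire", "Ramirez-Esparza", "Slatcher", "Pennebaker"]
print(parenthetical_citation(authors, 2007))
# (Mehl, Vazire, Ramirez-Esparza, Slatcher, & Pennebaker, 2007)
print(parenthetical_citation(authors, 2007, first_citation=False))
# (Mehl et al., 2007)
```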

⚠️ Common APA style errors

The excerpt provides the top 10 most common errors based on analysis of manuscripts submitted to one professional journal over 6 years:

| Rank | Error type | Example |
| --- | --- | --- |
| 1 | Use of numbers | Failing to use numerals for 10 and above |
| 2 | Hyphenation | Failing to hyphenate compound adjectives before a noun (e.g., "role playing technique" should be "role-playing technique") |
| 3 | Use of et al. | Failing to use it after a reference is cited for the first time |
| 4 | Headings | Not capitalizing headings correctly |
| 5 | Use of "since" | Using "since" to mean "because" |
| 6 | Tables and figures | Not formatting them in APA style; repeating information already in text |
| 7 | Use of commas | Failing to use a comma before "and" or "or" in a series of three or more elements |
| 8 | Use of abbreviations | Failing to spell out a term completely before introducing an abbreviation |
| 9 | Spacing | Not consistently double-spacing between lines |
| 10 | Use of "&" in references | Using "&" in text or "and" in parentheses |
34

Writing a Research Report in American Psychological Association (APA) Style

Writing a Research Report in American Psychological Association (APA) Style

🧭 Overview

🧠 One-sentence thesis

An APA-style empirical research report follows a standardized structure—title page, abstract, introduction, method, results, discussion, and references—designed to present new research findings in a clear, replicable, and logically organized manner.

📌 Key points (3–5)

  • What an empirical research report is: an article presenting results of one or more new studies, following APA conventions for structure and formatting.
  • Standard sections and their roles: title page/abstract (overview), introduction (argument for the research question), method (replication recipe), results (statistical findings), discussion (interpretation and implications), references (cited sources).
  • Common confusion—design vs. procedure: design describes the overall structure (independent/dependent variables, manipulation type, operational definitions); procedure describes what participants actually did step-by-step.
  • Key principle for the method section: must be detailed enough that other researchers could replicate the study by following your description.
  • Literature review is an argument: not a list of past studies, but a structured case for why the research question is worth addressing.

📄 Front matter: title page and abstract

📄 Title page elements

  • Title: centered in upper half, important words capitalized, clearly communicates primary variables and research questions in ~12 words or fewer.
  • Sometimes requires a main title + subtitle separated by a colon for clarity.
  • Authors and affiliation: names listed in order reflecting contribution (or alphabetically/randomly if equal); institutional affiliation on the next line.
  • Author note (for submissions): includes full affiliations, acknowledgments, funding, contact information; generally not needed for student papers or theses.

📄 Abstract structure

The abstract is a summary of the study, usually limited to about 200 words.

  • Appears on page 2, headed with "Abstract," first line not indented.
  • Must include: research question, summary of method, basic results, most important conclusions.
  • Challenge: convey all essential information within the strict word limit.

🎯 Introduction: building the argument

🎯 The opening (1–2 paragraphs)

Purpose: introduce the research question and explain why it is interesting.

How to capture attention (Bem's recommendations):

  • Start with general observations about the topic in ordinary language (not technical jargon).
  • Focus on people and their behavior, not researchers or research.
  • Use concrete examples.

Poor opening example:

"Festinger's theory of cognitive dissonance received a great deal of attention during the latter part of the 20th century."

Better opening example:

"The individual who holds two beliefs that are inconsistent with one another may feel uncomfortable. For example, the person who knows that he or she enjoys smoking but believes it to be unhealthy may experience discomfort..."

After capturing attention:

  • Introduce the research question.
  • Explain why it matters: Does it fill a gap? Test a theory? Have practical implications?
  • Motivate readers to continue and help them make sense of the literature review.

🎯 The literature review (several paragraphs to pages)

Purpose: describe relevant previous research and construct an argument for why the research question is worth addressing.

Not a list: the literature review is structured like an argument, not a chronological catalog of studies.

Possible structures:

  • Describe a phenomenon + studies demonstrating it → competing theories → hypothesis to test theories.
  • Describe one phenomenon → describe seemingly inconsistent phenomenon → propose reconciling theory → hypothesis to test theory.
  • (Applied research) Describe phenomenon/theory → how it applies to real-world situation → suggest test of that application.

How to emphasize structure:

  • Start with an outline of main points in the order you want to make them.
  • Begin the literature review by summarizing your argument upfront: "In this article, I will describe two apparently contradictory phenomena, present a new theory..."
  • Open each paragraph with a sentence that summarizes the main point and links to preceding points (these are your transitions).

Transition examples (instead of just "Williams (2004) found that..."):

  • "Another example of this phenomenon comes from the work of Williams (2004)."
  • "Williams (2004) offers one explanation of this phenomenon."
  • "An alternative perspective has been provided by Williams (2004)."

Balance is essential:

  • Your goal is to argue why the research question is interesting, not why your favorite answer is correct.
  • Discuss studies that support a phenomenon, but also those that fail to demonstrate it.
  • Discuss findings consistent with your theory, but also inconsistent findings.
  • It is acceptable to argue that the balance supports a phenomenon or theory, but not to ignore contradictory evidence.
  • Uncertainty about the answer is part of what makes a research question interesting.

🎯 The closing (final 1–2 paragraphs)

Two key elements:

  1. Clear statement of the main research question or hypothesis: more formal and precise than in the opening, often expressed in terms of operational definitions.
  2. Brief overview of the method and comment on its appropriateness: explain how the method addresses the research question.

Example (Darley & Latané, 1968):

"These considerations lead to the hypothesis that the more bystanders to an emergency, the less likely, or the more slowly, any one bystander will intervene... The experiment reported below attempted to fulfill these conditions."

The closing leads smoothly into the method section.

🔬 Method: the replication recipe

🔬 Core principle

The method section should be clear and detailed enough that other researchers could replicate the study by following your "recipe."

What to include: all important elements—participant demographics, recruitment, random assignment, variable manipulation/measurement, counterbalancing, etc.

What to avoid: irrelevant details (e.g., specific classroom number, that questionnaires were double-sided and completed with pencils).

🔬 Participants subsection

  • Heading: "Participants" (left justified, italics), immediately after "Method" (centered).
  • Content: number of participants, gender breakdown, age indication, other relevant demographics, recruitment method, incentives.

🔬 Three common organizational approaches

| Approach | Subsections | When to use |
| --- | --- | --- |
| Simple | Participants → Design and Procedure | Methods that are relatively simple, describable in a few paragraphs |
| Typical | Participants → Design → Procedure | Both design and procedure are complicated, each requiring multiple paragraphs |
| Complex | Participants → Materials → Design → Procedure | Complicated materials to describe (multiple questionnaires, vignettes, perceptual stimuli, etc.) |

🔬 Design vs. Procedure (common confusion)

Design = overall structure:

  • What were the independent and dependent variables?
  • Was the independent variable manipulated? Between or within subjects?
  • How were variables operationally defined?

Procedure = how the study was carried out:

  • Often works well to describe in terms of what participants did (not what researchers did).
  • Example: "participants gave informed consent, read instructions, completed 4 practice trials, completed 20 test trials, completed two questionnaires, were debriefed and excused."

🔬 Materials subsection (when needed)

  • Use when there are complicated materials: multiple questionnaires, written vignettes, perceptual stimuli, etc.
  • Heading can be modified to reflect content: "Questionnaires," "Stimuli," etc.

📊 Results: presenting findings

📊 What to include (and exclude)

The results section presents the main results of the study, including statistical analyses.

  • Does not include raw data (individual participants' responses or scores).
  • Researchers should save raw data and make it available to others who request it.
  • Several journals now encourage open sharing of raw data online.

📊 Preliminary issues (typically addressed first)

  1. Exclusions: Were any participants or responses excluded? Why? Describe the rationale clearly so others can judge appropriateness.
  2. Data combination: How were multiple responses combined to produce primary variables? (e.g., mean attractiveness rating across 20 stimulus people; number correctly recalled vs. percentage vs. correct minus incorrect).
  3. Reliability: Test-retest correlations, Cronbach's α, or other statistics showing measures are consistent across time and items.
  4. Manipulation checks: Was the manipulation successful?

📊 Answering research questions

Organization: tackle primary research questions one at a time, with clear structure (e.g., most general to specific, or main question first then secondary).

Bem's five-step structure for each result:

  1. Remind the reader of the research question.
  2. Give the answer in words.
  3. Present the relevant statistics.
  4. Qualify the answer if necessary.
  5. Summarize the result.

Key point: Only step 3 involves numbers; the rest present the question and answer in words. Basic results should be clear even to a reader who skips the numbers.

💬 Discussion: interpretation and implications

💬 Typical elements

  • Summary of the research
  • Theoretical implications
  • Practical implications
  • Limitations
  • Suggestions for future research

💬 Summary of the study

  • Provides a clear answer to the research question.
  • In a short report with one study: might require only a sentence.
  • In a longer report with multiple studies: might require a paragraph or two.

💬 Theoretical implications

  • Do the results support any existing theories?
  • If not, how can they be explained?
  • You don't need a definitive explanation or detailed theory, but outline one or more possible explanations.

💬 Practical implications

  • Common in applied research, often in basic research too.
  • How can the results be used, and by whom, to accomplish real-world goals?

💬 Limitations

What to discuss: problems with internal/external validity, ineffective manipulation, unreliable measures, evidence of participant misunderstanding or suspicion.

Don't overdo it:

  • All studies have limitations; readers understand different samples/measures might produce different results.
  • Unless there's good reason to think they would have influenced the results, don't mention routine issues.
  • Pick 2–3 limitations that seem like they could have influenced results, explain how, and suggest ways to deal with them.

💬 Suggestions for future research

Not just a list: a discussion of 2–3 of the most important unresolved issues.

For each issue:

  • Identify and clarify the question.
  • Suggest alternative answers.
  • Suggest ways the question could be studied.

💬 Ending the discussion

  • Some researchers end with a sweeping or thought-provoking conclusion.
  • Example (Darley & Latané, 1968): "If people understand the situational forces that can make them hesitate to intervene, they may better overcome them."
  • Caution: this can be difficult to pull off; may sound overreaching or banal and detract from impact.
  • Often better to simply end when you've made your final point (but avoid ending on a limitation).

📚 References, appendices, tables, and figures

📚 References section

  • Begins on a new page with "References" centered at the top.
  • All sources cited in the text are listed, formatted in the APA reference style presented earlier.
  • Alphabetical order: by last name of first author.
  • Same first author: alphabetized by last name of second author.
  • All authors the same: listed chronologically by year of publication.
  • Everything is double-spaced within and between references.

📚 Appendices

An appendix is appropriate for supplemental material that would interrupt the flow of the research report if presented within any major section.

Possible uses: stimulus word lists, questionnaire items, detailed descriptions of special equipment or unusual statistical analyses, references to studies in a meta-analysis.

Formatting:

  • Each appendix begins on a new page.
  • If only one: heading is "Appendix," centered.
  • If more than one: "Appendix A," "Appendix B," etc., in the order first mentioned in the text.

📚 Tables and figures

Both used to present results; figures can also illustrate theories (flowcharts), display stimuli, outline procedures, etc.

Formatting:

  • Each appears on its own page, after any appendices.
  • Tables come before figures.
  • Numbered in order of first mention: "Table 1," "Table 2," "Figure 1," "Figure 2," etc.
  • Tables: brief explanatory title above, important words capitalized.
  • Figures: brief explanatory caption below, only first word of each sentence capitalized (aside from proper nouns).

Other Presentation Formats

🧭 Overview

🧠 One-sentence thesis

Psychological research can be shared through multiple formats beyond traditional journal articles—including review articles, final manuscripts, and conference talks and posters—each adapted to different audiences and purposes.

📌 Key points (3–5)

  • Multiple presentation formats: Research can be presented as review/theoretical articles, final manuscripts (dissertations, theses), or conference presentations (talks and posters).
  • Copy vs final manuscripts: Copy manuscripts are formatted for journal submission with features that aid editing; final manuscripts are reader-friendly versions prepared in their final form.
  • Conference presentations serve interaction: Talks and posters at professional conferences allow researchers to share work and facilitate direct discussion with peers.
  • Common confusion: Copy manuscripts (for submission) vs final manuscripts (for reading)—they differ in spacing, placement of figures, and formatting details.
  • Format follows function: Each presentation type has specific structural requirements and style conventions suited to its purpose and audience.

📝 Written formats beyond empirical reports

📚 Review and theoretical articles

Review articles: summarize research on a particular topic without presenting new empirical results.

Theoretical articles: review articles that present a new theory.

Structure similarities with empirical reports:

  • Include title page, abstract, references, appendixes, tables, and figures
  • Written in the same high-level and low-level APA style
  • Organized logically with clear sections

Key differences:

  • No method or results section (because no new empirical data)
  • Body includes: opening (identifies topic and importance), literature review (organizes previous research, identifies relationships or gaps), closing (summarizes conclusions and suggests future directions)
  • In theoretical articles, much of the body presents the new theory
  • Sections and headings vary by article (unlike the fixed structure of empirical reports)

Don't confuse: Review articles don't just list studies—they should identify important relationships among concepts or gaps in the literature, building an argument.

📄 Copy manuscripts vs final manuscripts

| Aspect | Copy manuscripts | Final manuscripts |
| --- | --- | --- |
| Purpose | Submitted for journal publication | Prepared in final form, not for submission elsewhere |
| Formatting goal | Easier to edit and typeset | Easier to read |
| Examples | Journal submissions | Dissertations, theses, student papers |
| Table/figure placement | At the end | Close to where discussed |
| Spacing | Consistent double-spacing | Varies for readability (single for titles, triple between sections) |
| Additional elements | Running head, specific formatting | May include longer abstract, acknowledgments page, table of contents |

Practical note: For student papers in research methods courses, papers are usually required to be written as copy manuscripts (as though being submitted for publication).

🎤 Conference presentations

🏛️ Professional conferences overview

What they are:

  • Events where researchers share research with each other
  • Range from small-scale (dozen researchers, one afternoon) to large-scale (thousands of researchers, several days)
  • Focus on research (not clinical practice)

Two formal presentation types:

  1. Oral presentations ("talks")
  2. Posters

Submission process:

  • Usually requires submitting an abstract in advance
  • Abstract must be accepted for presentation
  • Peer review is typically less rigorous than for journal manuscripts

🗣️ Oral presentations (talks)

Basic format:

  • Presenter stands before audience and describes research
  • Usually accompanied by slide show
  • Duration: 10–20 minutes, with last few minutes for audience questions
  • At larger conferences, talks are grouped into sessions of an hour or two on the same general topic

Key principles for preparation:

  1. Slide count rule: No more than about one slide per minute of talk

  2. Structure follows APA report:

    • Title and authors slide
    • Few slides for background
    • Few slides for method
    • Few slides for results
    • Few slides for conclusions
  3. Presentation style:

    • Look at audience members
    • Speak in conversational tone
    • Less formal than APA writing, more formal than casual conversation
    • Slides are visual aids, not the focus
    • Present main points in bulleted lists or simple tables and figures

📊 Posters

What poster sessions are:

  • One- to two-hour sessions in a large room
  • Presenters set up posters on bulletin boards and stand near them
  • Other researchers circulate, read posters, and talk to presenters
  • Described as "a grown-up version of the school science fair"

Why posters are popular:

  • Encourage meaningful interaction among researchers
  • Increasingly common (example: nearly 2,000 posters across 16 sessions at a recent APA conference)
  • Facilitate direct discussion between presenter and interested researchers

🎨 Poster construction

Physical format:

  • Can be several sheets attached separately to bulletin board
  • More commonly: single large sheet of paper

Content sections (organized into distinct areas):

  • Title
  • Author names and affiliations
  • Introduction
  • Method section
  • Results section
  • Discussion or conclusions section
  • References
  • Acknowledgments
  • Abstract may not be necessary (poster itself is already a brief summary)

Design principles for crowded, social environments:

| Design element | Recommendation | Reason |
| --- | --- | --- |
| Font size | 72 points for title/authors, 28 points for main text | Readable from a distance |
| Text format | Blocked into sentences or bulleted points, not paragraphs | Easier to scan quickly |
| Organization | Columns flowing top to bottom (not rows across) | Multiple people can read simultaneously without bumping |
| Sections | Clear headings | Helps visitors navigate |
| Visual elements | Colorful figures, photos of apparatus, stimulus copies, simulations | Adds visual interest |
| Decorative elements | Use sparingly | Don't overdo |

Presenter responsibilities:

  • Stand by poster
  • Greet visitors
  • Offer to describe research (many presenters do this immediately, using poster as visual aid)
  • Answer questions
  • Be prepared for critical comments
  • Have detailed write-up available or offer to send one
  • Provide contact information for follow-up

Example: A visitor approaches and the presenter offers a brief overview, pointing to relevant sections of the poster while explaining the research question, method, and findings.

Why posters facilitate interaction: Unlike reading a journal article alone, posters allow immediate face-to-face discussion, clarification of methods, and exploration of implications with the researcher who conducted the work.

🎯 Key distinctions summary

Format selection depends on:

  • Audience: Journal readers vs conference attendees vs thesis committee
  • Purpose: Permanent scholarly record vs facilitating discussion vs meeting degree requirements
  • Interaction level: One-way communication (articles) vs two-way dialogue (posters)
  • Detail level: Comprehensive (journal articles) vs summarized (talks and posters)

Don't confuse: All formats follow APA style principles, but talks and posters are "considerably less detailed than APA-style research reports"—their function is to present new research and facilitate interaction, not to provide exhaustive documentation.

Describing Single Variables

🧭 Overview

🧠 One-sentence thesis

Describing a single variable involves displaying its distribution through tables and graphs, summarizing its central tendency (mean, median, mode) and variability (range, standard deviation), and locating individual scores within the distribution using percentile ranks or z scores.

📌 Key points (3–5)

  • What distributions show: how scores are spread across the levels of a variable, revealed through frequency tables and histograms.
  • Central tendency measures: mean (average), median (middle score), and mode (most frequent score) each describe the center of a distribution in different ways.
  • Variability measures: range and standard deviation quantify how spread out scores are around the center.
  • Common confusion: mean vs. median—the mean can be pulled far from the center in skewed distributions, making the median a better measure of central tendency in those cases.
  • Locating individual scores: percentile ranks show what percentage of scores fall below a given score; z scores express how many standard deviations a score is from the mean.

📊 Displaying distributions

📋 Frequency tables

A frequency table lists the values of a variable in one column and the frequency (count) of each value in another column.

  • Values typically run from highest to lowest.
  • The table only includes the range of scores actually present in the data.
  • Quickly reveals the range, most/least common scores, and any extreme outliers.

When to use grouped frequency tables:

  • When there are many different scores across a wide range.
  • The first column lists ranges of values (all equal width); the second lists frequencies in each range.
  • Typically use 5–15 ranges.
  • Example: reaction times grouped into 20 ms intervals (141–160, 161–180, etc.).

For categorical variables:

  • Levels are category labels rather than numbers.
  • Order is somewhat arbitrary but often arranged from most to least frequent.
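
A frequency table takes only a few lines of code. Here is a minimal Python sketch, assuming a small hypothetical list of scores (not data from the book):

```python
from collections import Counter

scores = [24, 23, 23, 22, 22, 22, 21, 20, 20, 17, 15]  # hypothetical scores

# Count the frequency of each value, then list values from highest to lowest,
# as a frequency table would.
freq = Counter(scores)
for value in sorted(freq, reverse=True):
    print(value, freq[value])
```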

📈 Histograms

A histogram is a graphical display of a distribution, showing the same information as a frequency table but in visual form.

  • The x-axis represents the variable; the y-axis represents frequency.
  • Vertical bars above each level show the count of individuals with that score.
  • For quantitative variables: no gaps between bars (unless a value has zero frequency).
  • For categorical variables: small gaps between bars.
  • Advantage: even quicker and easier to grasp than a frequency table.

🔍 Distribution shapes

Peaks (modality):

  • Unimodal: one distinct peak near the middle with tails tapering in both directions.
  • Bimodal: two distinct peaks (e.g., a depression inventory might show one peak for non-depressed and another for depressed individuals).
  • Distributions with more than two peaks are relatively rare in psychological research.

Symmetry vs. skew:

| Shape | Description | Peak location | Tail direction |
| --- | --- | --- | --- |
| Symmetrical | Left and right halves are mirror images | Center | Equal on both sides |
| Negatively skewed | Peak shifted toward upper end | High end of range | Long tail extends toward lower scores |
| Positively skewed | Peak shifted toward lower end | Low end of range | Long tail extends toward higher scores |

⚠️ Outliers

An outlier is an extreme score that is much higher or lower than the rest of the scores in the distribution.

  • May represent truly extreme values on the variable of interest (e.g., a clinically depressed person in an otherwise happy sample).
  • May also represent errors, misunderstandings, equipment malfunctions, or similar problems.
  • Don't confuse: a legitimate extreme value vs. a data collection error—both appear as outliers but require different responses.

🎯 Central tendency

📐 The mean

The mean (symbolized M) is the sum of the scores divided by the number of scores.

  • Formula in words: add up all the scores, then divide by how many scores there are.
  • The summation sign (Greek letter sigma) means "sum across all values."
  • N represents the number of scores.

Why the mean is most common:

  • Usually provides a good indication of central tendency.
  • Easily understood by most people.
  • Has statistical properties that make it especially useful for inferential statistics.

Limitation in skewed distributions:

  • The mean can be pulled far in the direction of the skew (toward the longer tail).
  • Example: reaction times of 200, 250, 280, 250 ms have a mean of 245 ms; adding one score of 5,000 ms (an inattentive participant) raises the mean to 1,196 ms, which is greater than 80% of the scores and not representative of anyone's typical behavior.

🎚️ The median

The median is the middle score—half the scores in the distribution are less than it and half are greater.

How to find it:

  • Organize scores from lowest to highest.
  • Locate the score in the middle.
  • Example: for scores 2, 3, 3, 4, 8, 12, 14—the median is 4 (three scores below, three above).
  • With an even number of scores: take the value halfway between the two middle scores.

When to prefer the median:

  • For highly skewed distributions, where the mean is pulled too far from the center.
  • The median is more resistant to the influence of outliers.

🔝 The mode

The mode is the most frequent score in a distribution.

  • Example: in a self-esteem distribution, if more students scored 22 than any other value, the mode is 22.
  • The only measure of central tendency that can be used for categorical variables.

🔄 Comparing the three measures

In unimodal, symmetrical distributions:

  • Mean, median, and mode are very close to each other at the peak.

In bimodal distributions:

  • Mean and median tend to fall between the two peaks.
  • The mode is at the tallest peak.

In skewed distributions:

  • The mean differs from the median in the direction of the skew (the longer tail).
  • The median stays closer to the bulk of the data.

Key insight:

  • You are not required to choose just one measure—each provides slightly different information, and all can be useful.

📏 Variability

🎢 What variability measures

Variability is the extent to which scores vary around their central tendency.

  • Two distributions can have the same mean, median, and mode but differ in how spread out the scores are.
  • Low variability: scores cluster tightly around the center.
  • High variability: scores spread across a much greater range.

📐 The range

The range is the difference between the highest and lowest scores in the distribution.

  • Formula: highest score minus lowest score.
  • Example: self-esteem scores from 15 to 24 have a range of 24 − 15 = 9.

Limitation:

  • Misleading when outliers are present.
  • Example: exam scores between 90 and 100 have a range of 10; one student scoring 20 increases the range to 80, giving a false impression of high variability.

📊 The standard deviation

The standard deviation is, roughly speaking, the average distance between the scores and the mean.

What it tells you:

  • How much scores differ from the mean on average.
  • Example: a standard deviation of 1.69 means scores differ from the mean by about 1.69 units on average; a standard deviation of 4.30 means they differ by about 4.30 units on average.

How it's computed (in words):

  1. Find the difference between each score and the mean.
  2. Square each difference (making all values positive).
  3. Find the mean of these squared differences (this is called the variance).
  4. Take the square root of that mean (this is the standard deviation).

Key property:

  • The standard deviation is always positive (because differences are squared).
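
The four steps translate almost line for line into code. A minimal Python sketch, using a small hypothetical list of scores:

```python
scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

mean = sum(scores) / len(scores)
squared_diffs = [(x - mean) ** 2 for x in scores]  # steps 1-2: square each difference
variance = sum(squared_diffs) / len(scores)        # step 3: mean of squared differences
sd = variance ** 0.5                               # step 4: square root

print(mean, variance, sd)  # 5.0 4.0 2.0
```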

🔢 Variance

The variance (symbolized SD²) is the mean of the squared differences from the mean.

  • It is itself a measure of variability.
  • Plays a larger role in inferential statistics than in descriptive statistics.
  • The standard deviation is the square root of the variance.

⚙️ N or N − 1 adjustment

Dividing by N:

  • Appropriate when your goal is simply to describe variability in a sample.
  • Emphasizes that variance is the mean of squared differences.

Dividing by N − 1:

  • Most calculators and software use this.
  • Corrects for the tendency of a sample's standard deviation to be slightly lower than the population's.
  • Results in a better estimate of the population standard deviation.
  • Makes sense because researchers typically view their data as a sample from a larger population and want to draw conclusions about that population.
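
Python's standard library exposes both versions, which makes the distinction easy to see; the scores below are hypothetical:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

print(statistics.pstdev(scores))  # divides by N: 2.0 (describes this sample)
print(statistics.stdev(scores))   # divides by N - 1: ~2.14 (estimates the population)
```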

📍 Locating individual scores

📊 Percentile ranks

The percentile rank of a score is the percentage of scores in the distribution that are lower than that score.

How to find it:

  • Count the number of scores lower than the target score.
  • Convert that count to a percentage of the total number of scores.
  • Example: if 32 of 40 scores (80%) are lower than 23, then a score of 23 has a percentile rank of 80 (or "at the 80th percentile").

Common use:

  • Often used to report results of standardized tests of ability or achievement.
  • Example: a percentile rank of 40 on a verbal ability test means you scored higher than 40% of people who took the test.
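
Because a percentile rank is just a count converted to a percentage, it is easy to compute. A minimal sketch; percentile_rank is a hypothetical helper name, and the data reproduce the 32-of-40 example above:

```python
def percentile_rank(score, scores):
    """Percentage of scores in the distribution that are lower than `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# 32 of 40 scores are lower than 23, so 23 is at the 80th percentile.
print(percentile_rank(23, [20] * 32 + [24] * 8))  # 80.0
```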

📐 Z scores

A z score is the difference between an individual's score and the mean of the distribution, divided by the standard deviation.

Formula in words:

  • Subtract the mean from the individual's score.
  • Divide the result by the standard deviation.

What it tells you:

  • How far above or below the mean a score is, expressed in standard deviations.
  • Example: in an IQ distribution with mean 100 and standard deviation 15, a score of 110 has a z score of (110 − 100) ÷ 15 = +0.67 (about two-thirds of a standard deviation above the mean).
  • A score of 85 has a z score of (85 − 100) ÷ 15 = −1.00 (one standard deviation below the mean).
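
The formula is a single division. A minimal sketch reproducing the IQ examples; z_score is a hypothetical helper name:

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(110, 100, 15))  # ~+0.67 (about two-thirds of an SD above the mean)
print(z_score(85, 100, 15))   # -1.00 (one SD below the mean)
```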

Why z scores are important:

  1. Describe where an individual's score is located within a distribution.
  2. Sometimes used to report standardized test results.
  3. Provide one way of defining outliers (e.g., z scores less than −3.00 or greater than +3.00, meaning more than three standard deviations from the mean).
  4. Play an important role in understanding and computing other statistics.

🔑 Key takeaways

Distributions:

  • Every variable has a distribution showing how scores are spread across levels.
  • Can be described with frequency tables, histograms, and in words (shape, modality, symmetry/skew).

Central tendency:

  • Mean: sum of scores divided by number of scores.
  • Median: middle score.
  • Mode: most common score.
  • Each provides slightly different information; all can be useful.

Variability:

  • Range: difference between highest and lowest scores.
  • Standard deviation: roughly the average distance of scores from the mean.
  • Standard deviation is more informative than range, especially when outliers are present.

Individual score location:

  • Percentile rank: percentage of scores below a given score.
  • Z score: how many standard deviations a score is from the mean.
  • Both provide ways to interpret where an individual stands within a distribution.

Describing Statistical Relationships

🧭 Overview

🧠 One-sentence thesis

Statistical relationships—whether differences between groups or correlations between quantitative variables—can be described and quantified using standardized measures like Cohen's d and Pearson's r, which allow researchers to communicate the strength of relationships across different studies and measures.

📌 Key points (3–5)

  • Two basic forms: Statistical relationships appear as differences between groups/conditions (described by means, standard deviations, and Cohen's d) or as correlations between quantitative variables (described by Pearson's r).
  • Effect size measures: Cohen's d quantifies the strength of group differences in standard deviation units; Pearson's r quantifies the strength of correlations on a scale from −1.00 to +1.00.
  • Standardized interpretation: Both measures have conventional benchmarks (small/medium/large) that apply regardless of the specific variable or scale used.
  • Common confusion: "Effect size" does not imply causation—a correlational study can report an effect size, but that does not make the relationship causal.
  • Pitfalls to watch: Pearson's r can be misleading when relationships are nonlinear or when the range of one variable is restricted in the sample.

📊 Describing group differences

📊 Means and standard deviations

  • Group or condition differences are usually described by reporting the mean and standard deviation of each group.
  • Example: In a phobia treatment study, the exposure condition had a mean fear rating of 3.47 (SD = 1.77), the education condition had a mean of 4.83 (SD = 1.52), and the control condition had a mean of 5.56 (SD = 1.21).
  • These descriptive statistics show that both treatments reduced fear compared to the control, and exposure worked better than education.

📏 Cohen's d as effect size

Cohen's d: the difference between two group means divided by the standard deviation, expressing the difference in standard deviation units.

  • Formula in words: subtract one mean from the other, then divide by the standard deviation (often a pooled standard deviation).
  • Conceptually similar to a z-score: it standardizes the difference so it can be compared across studies and measures.
  • Example: A Cohen's d of 0.50 means the two groups differ by half a standard deviation; a d of 1.20 means they differ by 1.20 standard deviations.
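
In code, Cohen's d is one subtraction and one division. A minimal sketch; the pooled standard deviation of 1.66 is an assumed value, chosen so the result matches the d = 0.82 reported for the phobia study below:

```python
def cohens_d(m1, m2, sd_pooled):
    """Difference between two means, expressed in standard deviation units."""
    return (m1 - m2) / sd_pooled

# Education condition (M = 4.83) vs. exposure condition (M = 3.47).
print(cohens_d(4.83, 3.47, 1.66))  # ~0.82, a large effect
```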

📐 Interpreting Cohen's d

| Strength | Cohen's d value |
| --- | --- |
| Small | ±0.20 |
| Medium | ±0.50 |
| Large | ±0.80 |

  • In the phobia study, the difference between exposure and education conditions was d = 0.82, which is a large effect.
  • The sign (positive or negative) depends on which mean is subtracted from which; it does not affect the strength.

🔍 Why Cohen's d is useful

  • Scale-independent: A d of 0.20 has the same meaning whether you are measuring self-esteem scores, reaction time in milliseconds, number of siblings, or blood pressure.
  • Cross-study comparison: Researchers can combine and compare results from different studies that used different measures.
  • Communication: Makes it easier to talk about the magnitude of a finding in a common language.

⚠️ "Effect size" does not mean causation

  • The term "effect size" can be misleading because it suggests a causal relationship.
  • Example: If exercisers are happier than nonexercisers with d = 0.35, that is a small-to-medium-sized difference—but it is only a causal effect if the study was an experiment with random assignment.
  • In a correlational study, the same d = 0.35 simply describes the size of the difference; it does not prove that exercising caused the happiness difference.
  • Don't confuse: calling a difference an "effect size" does not make the relationship causal.

📈 Describing correlations between quantitative variables

📈 Visualizing correlations

  • Correlations between quantitative variables are often shown using line graphs or scatterplots.
  • Line graphs: used when the x-axis variable has a small number of distinct values (e.g., quartiles of last names).
    • Example: A study found that people with last names later in the alphabet responded faster to consumer offers; each point on the line graph represents the mean response time for one quartile.
  • Scatterplots: used when the x-axis variable has many values (e.g., individual self-esteem scores).
    • Example: A scatterplot of self-esteem scores at two time points shows that higher scores at Time 1 are associated with higher scores at Time 2.

🔗 Positive and negative relationships

  • Positive relationship: higher scores on one variable are associated with higher scores on the other (points go from lower left to upper right).
    • Example: The self-esteem scatterplot shows a positive relationship.
  • Negative relationship: higher scores on one variable are associated with lower scores on the other (points go from upper left to lower right).
    • Example: The last-name study shows a negative relationship—later alphabetical position is associated with faster response time.

🔗 Linear vs nonlinear relationships

  • Linear relationships: points are reasonably well fit by a single straight line.
  • Nonlinear relationships: points are better fit by a curved line.
    • Example: A hypothetical relationship between hours of sleep and depression forms an upside-down U—people who get about eight hours are least depressed, while those who get too little or too much are more depressed.
  • Nonlinear relationships are not uncommon in psychology but require different analysis methods.

📏 Pearson's r as effect size

Pearson's r: a measure of the strength of a correlation between quantitative variables, ranging from −1.00 (strongest negative) through 0 (no relationship) to +1.00 (strongest positive).

  • The sign indicates direction (positive or negative), but the absolute value indicates strength.
    • Example: r = +0.30 and r = −0.30 are equally strong; one is a moderate positive relationship, the other a moderate negative relationship.
  • Like Cohen's d, Pearson's r is called an "effect size" even when the relationship is not causal.

📐 Interpreting Pearson's r

| Strength | Pearson's r value |
| --- | --- |
| Small | ±0.10 |
| Medium | ±0.30 |
| Large | ±0.50 |

  • These benchmarks help researchers communicate about the magnitude of correlations across different studies and measures.

🧮 How Pearson's r is computed

  • Conceptually, Pearson's r is the "mean cross-product of z-scores."
  • Steps:
    1. Convert all X scores to z-scores (subtract the mean of X, divide by the standard deviation of X).
    2. Convert all Y scores to z-scores (subtract the mean of Y, divide by the standard deviation of Y).
    3. For each individual, multiply their X z-score by their Y z-score to get a cross-product.
    4. Take the mean of all the cross-products—that is Pearson's r.
  • This approach clarifies what Pearson's r represents: how much the two variables co-vary in standardized units.
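
Those four steps translate directly into code. A minimal sketch with hypothetical data; pearson_r is a hypothetical helper name, and the z-scores use the divide-by-N standard deviation:

```python
def pearson_r(xs, ys):
    """Pearson's r as the mean cross-product of z-scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sdx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sdy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    # Steps 3-4: cross-products of z-scores, then their mean.
    return sum(((x - mx) / sdx) * ((y - my) / sdy) for x, y in zip(xs, ys)) / n

print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))  # ~0.96 (hypothetical data)
```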

⚠️ Pitfalls and limitations

⚠️ Nonlinear relationships

  • Pearson's r can be misleading when the relationship is nonlinear.
  • Example: The sleep-and-depression scatterplot shows a fairly strong relationship (a U-shape), but Pearson's r would be close to zero because the points are not well fit by a straight line.
  • Best practice: Make a scatterplot first to confirm the relationship is approximately linear before relying on Pearson's r.
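
A tiny demonstration of the pitfall, using hypothetical U-shaped data (statistics.correlation requires Python 3.10+):

```python
import statistics

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # a perfect U-shape: y is fully determined by x

print(statistics.correlation(xs, ys))  # 0.0 -- r detects no *linear* relationship
```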

⚠️ Restriction of range

Restriction of range: when one or both variables have a limited range in the sample relative to the population, Pearson's r can underestimate the true strength of the relationship.

  • Example: Suppose there is a strong negative correlation (r = −0.77) between age and enjoyment of hip-hop music in the full population. If you collect data only from 18- to 24-year-olds, the correlation in that restricted age range might be close to zero (r = 0.00), making the relationship seem weak.
  • Best practice: Design studies to avoid restriction of range (e.g., sample a wide range of ages if age is a key variable). Examine your data for possible restriction and interpret Pearson's r accordingly.
  • Don't confuse: A weak correlation in a restricted sample does not mean the relationship is weak in the full population.

🧪 Real-world example: Sex differences

🧪 Hyde's research on sex differences

  • Researcher Janet Shibley Hyde examined numerous studies on psychological sex differences and expressed results as Cohen's d.
  • She always treated the male mean as M₁ and the female mean as M₂, so positive values mean men score higher and negative values mean women score higher.

🧪 Sample findings

| Variable | Cohen's d |
| --- | --- |
| Mathematical problem solving | +0.08 |
| Reading comprehension | −0.09 |
| Smiling | −0.40 |
| Aggression | +0.50 |
| Attitudes toward casual sex | +0.81 |
| Leadership effectiveness | −0.02 |

  • Men and women differ by a large amount on some variables (e.g., attitudes toward casual sex, d = +0.81).
  • On the vast majority of variables, the difference is small—often d < 0.10, which Hyde terms "trivial."
  • Example: The difference in talkativeness mentioned elsewhere in the book was d = 0.06, also trivial.

🧪 The gender similarities hypothesis

  • Although researchers and the public often emphasize sex differences, Hyde argues it makes at least as much sense to think of men and women as fundamentally similar.
  • This perspective is called the "gender similarities hypothesis."
  • Don't confuse: Highlighting a few large differences can obscure the fact that most psychological variables show trivial or small sex differences.

Expressing Your Results

🧭 Overview

🧠 One-sentence thesis

Descriptive statistical results must be presented clearly in writing, graphs, or tables following APA style guidelines so that readers can understand findings without referring back to the text.

📌 Key points (3–5)

  • Writing results: Use numerals rounded to two decimal places; write out terms like "mean" in narrative text but use symbols (M, SD) in parentheses.
  • Graphs add information: Graphs should present new information (not repeat text/tables), be as simple as possible, and be interpretable on their own with descriptive captions.
  • Three graph types: Bar graphs for group means, line graphs for correlations with few levels, scatterplots for many levels of quantitative variables.
  • Common confusion: Bar vs line graphs—use bar graphs when the x-axis variable is categorical, line graphs when it is quantitative.
  • Tables for complex data: Use tables to present multiple means/standard deviations or correlation matrices; they must be interpretable independently with clear titles.

✍️ Writing descriptive statistics

✍️ Format rules

  • Always present statistical results as numerals, not words (e.g., "2.00" not "two").
  • Round to two decimal places consistently.
  • Results can appear in narrative text or parenthetically (like citations).
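
Rounding to two decimals is easy to get consistent with string formatting. A minimal Python sketch with hypothetical values:

```python
m, sd = 23.4, 9.333  # hypothetical mean and standard deviation

# Two decimal places, parenthetical APA style.
print(f"The treatment group (M = {m:.2f}, SD = {sd:.2f})...")
# The treatment group (M = 23.40, SD = 9.33)...
```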

📝 Narrative vs parenthetical presentation

| Location | Term format | Symbol format | Example |
| --- | --- | --- | --- |
| Narrative text | Write out "mean" and "standard deviation" | Not used | "The mean age was 22.43 years with a standard deviation of 2.34." |
| Parenthetical | Not used | Use M and SD | "The treatment group (M = 23.40, SD = 9.33)..." |

🔄 Parallel construction

  • Express similar or comparable results in similar ways.
  • Example: "The treatment group had a mean of 23.40 (SD = 9.33), while the control group had a mean of 20.87 (SD = 8.45)."
  • Don't confuse: Avoid mixing narrative and parenthetical styles inconsistently within the same sentence structure.

📊 Presenting results in graphs

📊 General graph principles

Three core requirements for APA-style graphs:

  • Add information: Never repeat what's already in text or tables; if a graph is clearer, eliminate the redundant text.
  • Keep it simple: Avoid unnecessary color or decoration.
  • Self-contained: A reader should understand the basic result from the graph and caption alone, without consulting the text.

📐 Technical layout guidelines

Layout requirements:

  • Graph should be slightly wider than tall.
  • Independent variable on x-axis, dependent variable on y-axis.
  • Values increase left-to-right (x-axis) and bottom-to-top (y-axis).

Labels and legends:

  • Axis labels must be clear, concise, and include measurement units (if not in caption).
  • Labels should be parallel to the axis.
  • Legends appear within graph boundaries.
  • Use the same simple font throughout; vary by no more than four points.

Captions:

  • Briefly describe the figure and explain abbreviations.
  • Include units of measurement if not in axis labels.

📊 Bar graphs for group comparisons

Bar graphs: used to present and compare mean scores for two or more groups or conditions.

  • Error bars extend upward and downward from each bar top.
  • Error bars typically represent one standard error in each direction (not standard deviation).
  • The standard error = standard deviation ÷ square root of sample size.
  • Why standard error matters: A difference greater than two standard errors is typically statistically significant, so readers can "see" significance from the graph.

Example: Comparing treatment vs control groups on a severity rating—each bar shows the group mean, error bars show variability.
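
The standard error follows directly from the formula above. A minimal sketch with hypothetical values:

```python
import math

sd, n = 9.33, 50        # hypothetical group SD and sample size
se = sd / math.sqrt(n)  # standard error of the mean
print(round(se, 2))     # 1.32 -- the length of one error bar
```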

📈 Line graphs for correlations

Line graphs: used to present correlations between quantitative variables when the independent variable has relatively few distinct levels.

  • Each point represents the mean score on the dependent variable at one level of the independent variable.
  • Include error bars (standard errors).
  • Don't confuse with bar graphs: Line graphs and bar graphs show fundamentally similar relationships (differences in average scores across levels), but convention dictates using line graphs when the x-axis variable is quantitative and bar graphs when it is categorical.

🔵 Scatterplots for many levels

Scatterplots: used for relationships between quantitative variables when the x-axis variable has a large number of levels.

  • Each point represents an individual (not a group mean).
  • No lines connect the points.
  • When x and y variables are conceptually similar and on the same scale, make axes the same length.
  • When multiple individuals fall at the same point: offset points slightly, show the count in parentheses, or make the point larger/darker.
  • The regression line (straight line that best fits the points) can be included.

📋 Presenting results in tables

📋 General table principles

Tables follow the same core rules as graphs:

  • Add important information (don't repeat).
  • Be as simple as possible.
  • Be interpretable on their own.

📊 Tables for means and standard deviations

Most common use: present several means and standard deviations for complex designs with multiple independent and dependent variables.

Formatting requirements:

  • Horizontal lines span the entire table at top, bottom, and just beneath column headings.
  • Every column has a heading (including the leftmost).
  • Use spanning headings across multiple columns to organize information efficiently.
  • Number tables consecutively (Table 1, Table 2, etc.).
  • Provide a brief, clear, descriptive title.

Example: A table showing intentions and attitudes toward unprotected sex as a function of mood (negative/positive) and self-esteem (high/low).

🔗 Correlation matrices

Correlation matrix: a table presenting correlations (usually Pearson's r) among several variables.

  • Only half the table is filled because the other half would contain identical values (correlation of A with B = correlation of B with A).
  • The diagonal (correlation of a variable with itself) is always 1.00, so these are replaced with dashes for readability.
  • Precise values in tables don't need to be repeated in text; instead, note major trends and alert readers to particularly interesting correlations.

Example: A correlation matrix showing relationships between working memory, executive function, processing speed, vocabulary, episodic memory, and age.
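
A correlation matrix can be produced in one call with pandas (an assumed tool choice; the variables and scores below are hypothetical):

```python
import pandas as pd

# Hypothetical scores, one row per participant.
df = pd.DataFrame({
    "working_memory":   [12, 15, 9, 14, 11],
    "processing_speed": [30, 35, 22, 33, 28],
    "vocabulary":       [40, 48, 35, 46, 41],
})

print(df.corr())  # Pearson's r for every pair of variables
```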

🔑 Integration with text

🔑 Avoiding redundancy

  • Precise statistical results appearing in a table or graph should not be repeated in the text.
  • Instead, the writer should:
    • Note major trends.
    • Alert readers to specific details of particular interest.
    • Provide interpretation and context.

🔑 Understanding your descriptive statistics first

Before moving to inferential statistics, thoroughly understand what happened at the descriptive level:

  • Descriptive statistics tell "what happened" in your study.
  • Example: If a treatment group (M = 34.32, SD = 10.45) and control group (M = 21.45, SD = 9.22) have Cohen's d = 1.31, it should be clear from descriptives alone that the treatment worked.
  • Example: If a scatterplot shows an indistinct cloud and r = −.02, it should be clear the variables are essentially unrelated.
  • Don't confuse: Inferential statistics are required for formal reports, but descriptive statistics provide the fundamental understanding of your results.

Conducting Your Analyses

🧭 Overview

🧠 One-sentence thesis

Preparing and analyzing raw data requires systematic steps—organizing files securely, checking for errors and outliers, conducting preliminary reliability checks, and understanding descriptive patterns before moving to inferential tests.

📌 Key points (3–5)

  • Data preparation first: raw data must be organized, checked for completeness and accuracy, and formatted in spreadsheets before any analysis begins.
  • Preliminary checks matter: assess internal consistency of measures, examine distributions of each variable, compute descriptive statistics, and identify outliers before answering research questions.
  • Outlier decisions require judgment: outliers may reflect errors (exclude with documented criteria) or genuine extreme responses (consider analyzing both with and without them).
  • Common confusion: don't skip straight to inferential statistics—descriptive statistics alone often reveal "what happened" in the study and must be understood first.
  • Exploratory analysis is valuable but risky: examining data from multiple angles can reveal interesting patterns, but chance patterns require replication before being treated as real findings.

📂 Preparing raw data

🔒 Security and backup

  • Remove any information that could identify individual participants.
  • Store data in a secure location (locked room or password-protected computer).
  • Store consent forms separately in another secure location.
  • Make photocopies or backup files and store them in yet another secure location until the project is complete.
  • Professional researchers keep raw data and consent forms for several years in case questions arise later.

✅ Checking for completeness and accuracy

Raw data checking: examining data to ensure they are complete and appear to have been accurately recorded.

  • Look for illegible or missing responses.
  • Identify obvious misunderstandings (e.g., a response of "12" on a 1-to-10 scale).
  • Decide whether problems are severe enough to exclude a participant's data.
  • If main independent or dependent variable information is missing, or if several responses are missing or suspicious, consider exclusion.
  • Important: never throw away or delete excluded data—set them aside and keep notes about why you excluded them, because you will need to report this information.

📊 Formatting in spreadsheets

Standard format:

  • Each row = one participant
  • Each column = one variable (with variable name at top)
  • First column typically contains participant identification numbers
  • Followed by demographics, independent variables, dependent variables

Categorical variables: can be entered as labels (e.g., "M" and "F") or numbers (e.g., "0" and "1"); some programs allow both.

Multiple-response measures: enter each response as a separate variable in the spreadsheet rather than combining by hand—this approach is more accurate, allows error detection, enables internal consistency assessment, and permits individual response analysis later.

Example: For a self-esteem measure with four items, create four separate columns (SE1, SE2, SE3, SE4) and use software functions to compute the total.
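
A minimal sketch of the same idea using pandas (an assumed choice of software; any spreadsheet's SUM function works the same way, and the responses below are hypothetical):

```python
import pandas as pd

# One row per participant, one column per item.
df = pd.DataFrame({
    "SE1": [4, 3, 5],
    "SE2": [4, 4, 5],
    "SE3": [3, 3, 4],
    "SE4": [5, 4, 5],
})

# Let the software compute the total rather than combining responses by hand.
df["SE_total"] = df[["SE1", "SE2", "SE3", "SE4"]].sum(axis=1)
print(df)
```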

🔍 Preliminary analyses

🧪 Assessing internal consistency

  • For multiple-response measures, check the reliability of the measure.
  • Statistical programs can compute Cronbach's alpha or Cohen's kappa.
  • If those are beyond your comfort level, compute and evaluate a split-half correlation.
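
A minimal sketch of a split-half correlation on hypothetical item responses (statistics.correlation requires Python 3.10+):

```python
import statistics

# Hypothetical responses: one row per participant, one column per item.
items = [
    [4, 4, 3, 5],
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 3, 2, 3],
]

# Sum the odd- and even-numbered items separately, then correlate the halves.
odd = [row[0] + row[2] for row in items]
even = [row[1] + row[3] for row in items]
print(statistics.correlation(odd, even))  # ~0.98 -- high, so internally consistent
```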

📈 Analyzing individual variables

Not necessary for manipulated independent variables (because the researcher determined the distribution).

For each important variable:

  • Make histograms
  • Note the shapes of distributions
  • Compute common measures of central tendency and variability
  • Understand what the statistics mean in terms of your actual variables

Example: A distribution of happiness ratings on a 1-to-10 scale might show: unimodal, negatively skewed, mean = 8.25, SD = 1.14. This means most participants rated themselves fairly high on happiness, with a small number rating themselves noticeably lower.

🎯 Identifying and handling outliers

🔎 Examine outliers closely

Possible causes:

  1. Data entry error: response entered incorrectly in the data file → simply correct it and move on
  2. Participant error or misunderstanding: e.g., a reaction time of 3 minutes when most took a few seconds → likely didn't understand the task
  3. Genuine extreme response: e.g., reporting 60–70 sexual partners when most report fewer than 15 → could be honest and accurate

⚖️ Decision strategies

| Situation | Strategy |
| --- | --- |
| Clear error or misunderstanding with large impact on mean/SD | Can justify exclusion; keep notes on criteria and apply consistently |
| Possibly genuine extreme response | Use median and other robust statistics, or analyze both with and without outliers |
| Results same either way | Leave outliers in |
| Results differ | Report both analyses and discuss the differences |

Critical rules:

  • Keep notes on which responses or participants you excluded and why
  • Apply the same criteria consistently to every response and every participant
  • Report how many you excluded and the specific criteria used
  • Never literally throw away or delete excluded data—set them aside for possible later review

🎯 Answering research questions

📊 Primary analyses

For group/condition differences:

  • Compute relevant group or condition means and standard deviations
  • Make a bar graph to display results
  • Compute Cohen's d

For correlations between quantitative variables:

  • Make a line graph or scatterplot
  • Check for nonlinearity and restriction of range
  • Compute Pearson's r

🎣 Exploratory analysis

The excerpt quotes advice to examine data from every angle:

  • Analyze subgroups separately (e.g., sexes)
  • Create new composite indexes
  • Look for additional evidence for new hypotheses suggested by the data
  • Reorganize data to bring dim patterns into bolder relief

Caution: Complex data sets are likely to include "patterns" that occurred entirely by chance. Results discovered while "fishing" should be replicated in at least one new study before being presented as new phenomena.

💡 Understanding your results

📖 Descriptive statistics tell the story

Descriptive statistics really tell "what happened" in the study.

Beginning researchers sometimes forget this and jump straight to inferential statistics.

Example showing descriptive clarity:

  • Treatment group: mean = 34.32, SD = 10.45 (n=50)
  • Control group: mean = 21.45, SD = 9.22 (n=50)
  • Cohen's d = 1.31 (extremely strong)
  • Even before any inferential test, it should be clear from descriptives alone that the treatment worked.

Another example:

  • Scatterplot shows an indistinct "cloud" of points
  • Pearson's r = −.02 (trivial)
  • Clear from descriptives alone that variables are essentially unrelated

🔄 Proper sequence

  1. First: thoroughly understand your results at a descriptive level
  2. Then: move on to inferential statistics (which will be covered in the next chapter)
  3. Both are required for formal reports, but descriptive understanding must come first

Don't confuse: Inferential statistics are important for deciding whether sample results apply to the population, but they don't replace the need to understand what actually happened in your sample.

Understanding Null Hypothesis Testing

🧭 Overview

🧠 One-sentence thesis

Null hypothesis testing is a formal method that helps researchers decide whether a statistical relationship found in a sample reflects a real relationship in the population or merely occurred by chance due to sampling error.

📌 Key points (3–5)

  • Purpose: to distinguish between two interpretations of sample statistics—either there is a real relationship in the population, or the sample relationship is just sampling error.
  • Core logic: assume the null hypothesis (no relationship in population) is true, calculate how likely the sample result would be under that assumption (the p value), then reject or retain the null hypothesis based on that probability.
  • What determines significance: both relationship strength and sample size matter—stronger relationships and larger samples make it more likely to reject the null hypothesis.
  • Common confusion: statistical significance ≠ practical significance; even very weak relationships can be statistically significant with large enough samples.
  • The p value misunderstanding: p is NOT the probability the null hypothesis is true; it is the probability of obtaining the sample result IF the null hypothesis were true.

🎯 The purpose and problem

🎯 Why we need null hypothesis testing

  • Researchers measure variables in a sample but want to draw conclusions about the population.
  • Sample statistics (like means or correlations) are not perfect estimates of population parameters.
  • Random variability exists from sample to sample, called sampling error.

Sampling error: random variability in a statistic from sample to sample (the term "error" does not imply a mistake).

🤔 The interpretation problem

Every statistical relationship in a sample can be interpreted two ways:

  • There IS a relationship in the population, and the sample reflects it.
  • There is NO relationship in the population, and the sample relationship is only sampling error (occurred "by chance").

Example: A correlation of r = −.29 in a sample might mean a negative relationship exists in the population, OR it might mean no relationship exists and the sample value is just random variation.

🧮 The logic and process

🧮 Two competing hypotheses

| Hypothesis | Symbol | Meaning |
| --- | --- | --- |
| Null hypothesis | H₀ | No relationship in the population; sample relationship is only sampling error |
| Alternative hypothesis | H₁ | There IS a relationship in the population; sample reflects this real relationship |

🔄 The decision steps

  1. Assume the null hypothesis is true (no relationship in population).
  2. Determine how likely the sample relationship would be if H₀ were true.
  3. Decide:
    • If the sample result would be extremely unlikely under H₀ → reject the null hypothesis in favor of H₁.
    • If the sample result would NOT be extremely unlikely under H₀ → retain the null hypothesis (never say "accept").

📊 The p value and alpha criterion

p value: the probability of obtaining the sample result (or more extreme) IF the null hypothesis were true.

Alpha (α): the criterion probability, almost always set to .05 (5%).

  • Low p value (less than .05) → sample result unlikely under H₀ → reject H₀ → result is statistically significant.
  • High p value (greater than .05) → sample result not unlikely under H₀ → retain H₀.

Don't confuse: The p value is NOT "the probability the null hypothesis is true" or "the probability the result occurred by chance." It is the probability of the sample result assuming H₀ is true.
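
One way to make this logic concrete is a small randomization simulation: assume H₀ by shuffling the group labels, and count how often chance alone produces a difference as large as the observed one. This is a permutation-style illustration of the logic, not the formal tests described here; all data are hypothetical:

```python
import random
import statistics

treatment = [8, 9, 7, 9, 8]  # hypothetical scores
control = [6, 7, 6, 8, 7]
obs_diff = statistics.mean(treatment) - statistics.mean(control)

pooled = treatment + control
trials, extreme = 10_000, 0
for _ in range(trials):
    random.shuffle(pooled)  # under H0, the group labels are arbitrary
    diff = statistics.mean(pooled[:5]) - statistics.mean(pooled[5:])
    if abs(diff) >= abs(obs_diff):
        extreme += 1

print(extreme / trials)  # approximate p value: P(result this extreme | H0 true)
```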

⚖️ What determines statistical significance

⚖️ Two key factors

The p value (and thus the decision) depends on exactly two things:

  1. Relationship strength in the sample (e.g., Cohen's d or Pearson's r).
  2. Sample size (N).

The rule: Stronger relationships and larger samples → lower p values → more likely to reject H₀.

📏 Intuitive guidelines

The excerpt provides a rough table showing how strength and size combine:

| Sample size | Weak relationship | Medium relationship | Strong relationship |
| --- | --- | --- | --- |
| Small (N = 20) | No | No | Maybe/Yes |
| Medium (N = 50) | No | Yes | Yes |
| Large (N = 100) | Maybe/Yes | Yes | Yes |
| Extra large (N = 500) | Yes | Yes | Yes |

Key insights:

  • Weak relationships in small or medium samples are never significant.
  • Strong relationships in medium or larger samples are always significant.
  • With very large samples, even weak relationships become significant.

Example: A study with 500 women and 500 men showing Cohen's d = 0.50 would be highly unlikely if there were truly no difference in the population. But a study with 3 women and 3 men showing d = 0.10 would be quite likely even with no population difference.
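One way to see the table's pattern in code is to hold a weak relationship fixed and vary only the sample size. A sketch assuming SciPy is available (it uses Pearson's r rather than Cohen's d, via the standard r-to-t conversion covered under Testing Pearson's r below):

```python
# Hold a weak sample correlation fixed and vary only N; watch p fall.
import math
from scipy import stats

r = 0.20                                   # a weak relationship
for N in (20, 50, 100, 500):
    t = r * math.sqrt((N - 2) / (1 - r**2))
    p = 2 * stats.t.sf(abs(t), N - 2)      # two-tailed p value
    print(f"N = {N:>3}: p = {p:.3f}")
# p crosses .05 near N = 100, echoing the "Maybe/Yes" cell for
# weak relationships in large samples.
```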

🧠 Developing intuition

Understanding these two factors lets you:

  • Predict whether a result will be significant before running formal tests.
  • Detect errors in your analyses when results don't match expectations.
  • Demonstrate you understand the logic, not just the computations.

⚠️ Statistical vs. practical significance

⚠️ A critical distinction

Practical significance: the importance or usefulness of a result in a real-world context (also called "clinical significance" in clinical practice).

The problem: Statistical significance does NOT mean the result is strong or important.

  • A very weak relationship can be statistically significant if the sample is large enough.
  • The word "significant" misleads people into thinking the effect is large or important.

🔍 Real-world implications

Example from the excerpt: Sex differences in mathematical problem-solving and leadership ability are statistically significant, which might lead people to think these differences are large and important enough to influence college course choices or voting decisions. However, these differences are actually quite weak—even "trivial"—despite being statistically significant.

Don't confuse:

  • Statistical significance = the result is unlikely to be due to chance alone.
  • Practical significance = the result is large enough to matter in real applications.

A new treatment for social phobia might produce a statistically significant positive effect, but if the effect is small and other easier/cheaper treatments work almost as well, the result lacks practical significance.

41

Some Basic Null Hypothesis Tests

Some Basic Null Hypothesis Tests

🧭 Overview

🧠 One-sentence thesis

The t test and ANOVA are the most common null hypothesis tests for comparing means, while Pearson's r test evaluates whether a correlation exists in the population.

📌 Key points (3–5)

  • Three types of t tests: one-sample (sample mean vs hypothetical population mean), dependent-samples (same participants, two conditions), and independent-samples (two separate groups).
  • ANOVA for multiple groups: used when comparing more than two means; produces an F ratio that compares between-group variance to within-group variance.
  • Critical values and p values: either use software to get p values directly, or compare computed test statistics to critical values in tables to decide whether to reject the null hypothesis.
  • Common confusion—one-tailed vs two-tailed: one-tailed tests require predicting the direction before data collection and only reject in that direction; two-tailed tests reject if the result is extreme in either direction.
  • Pearson's r test: evaluates whether a correlation in the sample reflects a real relationship in the population (null hypothesis: ρ = 0).

📊 The t Test family

📊 What all t tests share

  • All t tests compare means and produce a t statistic that follows a known distribution when the null hypothesis is true.
  • The distribution of t is unimodal, symmetrical, centered at zero, and its exact shape depends on degrees of freedom (df).
  • Decision rule: if p < .05, reject the null hypothesis; if p ≥ .05, retain it.
  • Software (online tools, Excel, SPSS) computes both t and p; alternatively, compare the computed t to critical values in a table.

🔬 One-sample t test

One-sample t test: compares a sample mean (M) with a hypothetical population mean (μ₀) that provides an interesting standard of comparison.

  • Null hypothesis: the population mean equals the hypothetical mean (μ = μ₀).
  • Alternative hypothesis: the population mean differs from the hypothetical mean (μ ≠ μ₀).
  • Formula in words: t = (sample mean minus hypothetical mean) divided by (sample standard deviation divided by square root of sample size).
  • Degrees of freedom: N − 1.
  • Example: A health psychologist tests whether students accurately estimate calories in a cookie (actual = 250). Sample of 10 students has mean 212 and SD 39.17. The computed t is extreme enough (p = .013) to reject the null hypothesis—students underestimate.
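The arithmetic of this example can be checked from the summary statistics alone. A minimal sketch, assuming SciPy is available (M = 212, SD = 39.17, N = 10, and μ₀ = 250 all come from the example above):

```python
# One-sample t test recomputed from the example's summary statistics.
import math
from scipy import stats

M, SD, N, mu0 = 212.0, 39.17, 10, 250.0
t = (M - mu0) / (SD / math.sqrt(N))        # the formula in words, in symbols
df = N - 1
p = 2 * stats.t.sf(abs(t), df)             # two-tailed p value
print(f"t({df}) = {t:.2f}, p = {p:.3f}")   # t(9) = -3.07, p = .013 -> reject H0
```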

🔄 Dependent-samples t test

Dependent-samples t test (also called paired-samples t test): compares two means for the same sample tested at two different times or under two different conditions.

  • Appropriate for pretest-posttest designs or within-subjects experiments.
  • Key step: reduce each participant's two scores to a single difference score (subtract one from the other).
  • The test then becomes a one-sample t test on the difference scores, with hypothetical population mean μ₀ = 0.
  • Null hypothesis: mean difference in the population is zero.
  • Example: Testing a training program to improve calorie estimates. Pretest and posttest for 10 participants yield difference scores with mean 8.50 and SD 27.27. The t score is not extreme enough (p = .148), so the null hypothesis is retained—no evidence the program works.
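To illustrate the "reduce to difference scores" step, here is a sketch with made-up pretest/posttest scores (the text reports only summary statistics, so these raw numbers are invented; NumPy and SciPy assumed):

```python
# Paired design reduced to difference scores; the raw scores below are
# invented for illustration only.
import numpy as np
from scipy import stats

pretest  = np.array([230., 210., 250., 190., 220., 240., 200., 215., 235., 225.])
posttest = np.array([240., 205., 265., 200., 210., 255., 220., 210., 250., 230.])
diff = posttest - pretest                  # one difference score per person
t, p = stats.ttest_1samp(diff, 0.0)        # one-sample test against mu0 = 0
print(f"t({len(diff) - 1}) = {t:.2f}, p = {p:.3f}")
# stats.ttest_rel(posttest, pretest) gives the identical result.
```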

⚖️ Independent-samples t test

Independent-samples t test: compares the means of two separate samples (M₁ and M₂).

  • Used for between-subjects experiments (different conditions) or correlational designs (preexisting groups like men vs women).
  • Null hypothesis: the two population means are equal (μ₁ = μ₂).
  • The formula is more complex because it accounts for two sample means, two standard deviations, and two sample sizes.
  • Degrees of freedom: N − 2 (total sample size minus 2).
  • Example: Comparing calorie estimates of junk food eaters (n=8, M=168.12) vs non-junk food eaters (n=7, M=220.71). The computed t is extreme (p = .015), so reject the null hypothesis—the two groups differ.
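A sketch of the same kind of test on two invented groups (the text reports the group means but not the raw scores, so the data below are hypothetical; NumPy and SciPy assumed):

```python
# Independent-samples t test on two invented groups.
import numpy as np
from scipy import stats

junk    = np.array([130., 195., 160., 180., 140., 170., 175., 195.])  # n = 8
no_junk = np.array([225., 200., 240., 210., 230., 215., 225.])        # n = 7
t, p = stats.ttest_ind(junk, no_junk)      # df = N - 2 = 13
print(f"t = {t:.2f}, p = {p:.3f}")
```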

🔀 One-tailed vs two-tailed tests

  • Two-tailed test: reject the null if the sample result is extreme in either direction; use when you have no strong expectation about direction.
  • One-tailed test: reject only if the result is extreme in one pre-specified direction; use when you have good reason to expect a specific direction.
  • Trade-off: one-tailed tests have less extreme critical values (easier to reject if the result goes the expected way), but you cannot reject if the result goes the opposite way, no matter how extreme.
  • Don't confuse: the decision between one-tailed and two-tailed must be made before collecting data, based on theoretical expectations.

🧮 Analysis of Variance (ANOVA)

🧮 When and why ANOVA

  • Used when comparing more than two group means.
  • The one-way ANOVA is for between-subjects designs with a single independent variable.
  • Null hypothesis: all population means are equal (μ₁ = μ₂ = … = μ_G).
  • Alternative hypothesis: not all population means are equal (at least one differs).

📐 The F statistic

F statistic: the ratio of two estimates of population variance—mean squares between groups (MS_B) divided by mean squares within groups (MS_W).

  • MS_B: based on differences among the sample means (how much groups differ from each other).
  • MS_W: based on differences within each group (variability of individual scores around their group mean).
  • Formula in words: F = MS_B ÷ MS_W.
  • The F distribution is unimodal, positively skewed, and clusters around 1 when the null hypothesis is true.
  • Degrees of freedom: between-groups df = G − 1 (number of groups minus one); within-groups df = N − G (total sample size minus number of groups).
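The F ratio's definition can be computed directly and checked against a library routine. A sketch with made-up data for three groups (NumPy and SciPy assumed):

```python
# F computed from its definition (MS_B / MS_W), then checked against
# scipy's one-way ANOVA.
import numpy as np
from scipy import stats

groups = [np.array([4., 5., 6., 5.]),      # made-up scores for three groups
          np.array([7., 8., 6., 7.]),
          np.array([9., 8., 10., 9.])]
N, G = sum(len(g) for g in groups), len(groups)
grand = np.concatenate(groups).mean()

ms_b = sum(len(g) * (g.mean() - grand) ** 2 for g in groups) / (G - 1)
ms_w = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - G)
F = ms_b / ms_w                            # F = MS_B / MS_W
print(f"F = {F:.2f} vs scipy: {stats.f_oneway(*groups).statistic:.2f}")
```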

📋 ANOVA table and decision

  • Software outputs an ANOVA table showing sums of squares (SS), degrees of freedom (df), mean squares (MS), F ratio, p value, and critical F.
  • Decision: if p < .05, reject the null hypothesis (conclude the group means are not all the same); if p ≥ .05, retain it.
  • Example: Comparing calorie estimates of psychology majors (M=187.50), nutrition majors (M=195.00), and dieticians (M=238.13). F = 9.92, p = .0009. Reject the null hypothesis—the three groups differ.

🔍 Post hoc comparisons

  • A significant ANOVA tells you "not all means are equal" but not which specific pairs differ.
  • Post hoc comparisons: follow-up tests comparing selected pairs of means.
  • Problem with multiple t tests: conducting many t tests inflates the risk of Type I error (mistakenly rejecting a true null hypothesis).
  • Solution: use modified t test procedures (Bonferroni, Fisher's LSD, Tukey's HSD) that keep the overall Type I error rate near 5%.
  • Don't confuse: post hoc tests are done after finding a significant ANOVA result, not instead of the ANOVA.
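As a sketch of the simplest such adjustment, Bonferroni compares each pairwise p value to α divided by the number of comparisons (the group data below are invented; Fisher's LSD and Tukey's HSD would need dedicated routines such as statsmodels' pairwise_tukeyhsd):

```python
# Bonferroni: compare each pairwise p to alpha / (number of comparisons).
from itertools import combinations
import numpy as np
from scipy import stats

groups = {"psych":     np.array([150., 180., 200., 170.]),   # invented data
          "nutrition": np.array([190., 210., 180., 200.]),
          "dietician": np.array([230., 250., 240., 235.])}
pairs = list(combinations(groups, 2))
alpha_adj = 0.05 / len(pairs)              # keeps overall Type I rate near 5%
for g1, g2 in pairs:
    p = stats.ttest_ind(groups[g1], groups[g2]).pvalue
    print(f"{g1} vs {g2}: p = {p:.3f}, significant: {p < alpha_adj}")
```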

🔁 Repeated-measures ANOVA

  • Used for within-subjects designs where the same participants are tested under different conditions or at different times.
  • Main advantage: can measure and subtract stable individual differences from MS_W, making the test more sensitive (higher F, easier to detect real effects).
  • Example: Some participants are naturally faster or slower in reaction time; in a within-subjects design, these stable differences can be removed from the error term.

🏗️ Factorial ANOVA

  • Used for factorial designs with more than one independent variable.
  • Produces separate F ratios and p values for each main effect and each interaction.
  • Example: Testing participant major (psychology vs nutrition) and food type (cookie vs hamburger) would yield three F ratios—one for the main effect of major, one for the main effect of food type, and one for the interaction.
  • Modifications depend on whether the design is between-subjects, within-subjects, or mixed.

🔗 Testing Pearson's r

🔗 Null hypothesis test for correlations

Test of Pearson's r: evaluates whether a correlation observed in a sample reflects a real relationship in the population.

  • Null hypothesis: no relationship in the population (ρ = 0, where ρ is the Greek letter rho representing the population correlation).
  • Alternative hypothesis: there is a relationship in the population (ρ ≠ 0).
  • Can be one-tailed (if you expect a specific direction) or two-tailed (if you have no directional expectation).

📊 How the test works

  • Pearson's r from the sample can be converted to a t score with N − 2 degrees of freedom, or treated as its own test statistic.
  • Software computes Pearson's r and provides the associated p value.
  • Decision: if p < .05, reject the null hypothesis (conclude there is a relationship); if p ≥ .05, retain it.
  • Alternatively, compare the sample r to critical values of r in a table (organized by sample size and whether the test is one-tailed or two-tailed).

🧪 Example

  • A health psychologist examines the correlation between calorie estimates and weight in 22 students.
  • He conducts a two-tailed test (no directional expectation).
  • Pearson's r = −.21, p = .348.
  • Because p > .05, he retains the null hypothesis—no evidence of a relationship between calorie estimates and weight.
  • If computing by hand: critical value for df = 20 (N − 2) is .444 for a two-tailed test; since |−.21| < .444, the result is not significant.
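The hand computation in the last bullet can also go through the r-to-t conversion mentioned above. A sketch assuming SciPy, using the r = −.21 and N = 22 from the example:

```python
# The Pearson's r test via the r-to-t conversion.
import math
from scipy import stats

r, N = -0.21, 22
df = N - 2
t = r * math.sqrt(df / (1 - r**2))         # convert the sample r to a t score
p = 2 * stats.t.sf(abs(t), df)             # two-tailed p value
print(f"t({df}) = {t:.2f}, p = {p:.3f}")   # p = .348 > .05 -> retain H0
```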

🛠️ Practical workflow

🛠️ Using software vs tables

  • Modern practice: enter data into online tools, Excel, or SPSS; software computes the test statistic and p value automatically.
  • Manual approach: compute the test statistic by hand, then look up the critical value in a table; if the computed statistic is more extreme than the critical value, p < .05.
  • Tables provided in the excerpt:
    • Table of critical t values (for various degrees of freedom, one-tailed and two-tailed).
    • Table of critical F values (for various between-groups and within-groups df).
    • Table of critical r values (for various sample sizes, one-tailed and two-tailed).

🎯 Key decision points

| Decision | What it means | When to do it |
| --- | --- | --- |
| One-tailed vs two-tailed | Predict direction or not | Before data collection, based on theory |
| Which t test | One-sample, dependent, or independent | Depends on design: one group vs a hypothetical mean, the same participants twice, or two separate groups |
| ANOVA vs t test | More than two means or just two | Use ANOVA for three or more groups; a t test for two |
| Post hoc tests | Which specific pairs differ | After a significant ANOVA result |

⚠️ Don't confuse

  • Dependent-samples vs independent-samples: dependent means the same participants measured twice; independent means two separate groups.
  • Rejecting vs retaining: rejecting the null means concluding there is an effect; retaining means there is not enough evidence to conclude there is an effect (not the same as proving no effect exists).
  • Critical value direction: for one-tailed tests, use only the critical value in the expected direction; for two-tailed tests, use both positive and negative critical values.
42

Additional Considerations in Null Hypothesis Testing

Additional Considerations

🧭 Overview

🧠 One-sentence thesis

Null hypothesis testing, while dominant in psychology, carries inherent risks of Type I and Type II errors and faces significant criticisms that researchers address through effect sizes, confidence intervals, and attention to statistical power.

📌 Key points (3–5)

  • Two types of errors: Type I (rejecting a true null hypothesis) and Type II (retaining a false null hypothesis) both occur due to sampling variability and design limitations.
  • Statistical power matters: the probability of correctly rejecting a false null hypothesis depends on sample size and expected relationship strength; adequate power (≥.80) should be ensured before data collection.
  • The file drawer problem: statistically significant results are more likely to be published than nonsignificant ones, distorting the published literature toward overestimating effect strengths.
  • Common confusion: a p-value of .05 does NOT mean 95% confidence the result will replicate; power and effect size determine replication likelihood.
  • Solutions to criticisms: researchers should report effect sizes, use confidence intervals, avoid rigid p-value cutoffs, and share nonsignificant results.

⚠️ Understanding errors in hypothesis testing

⚠️ Type I errors (false positives)

Type I error: rejecting the null hypothesis when it is actually true in the population.

  • Means concluding there is a relationship when none exists.
  • Occurs because sampling error alone can occasionally produce extreme results even when the null hypothesis is true.
  • When α = .05 and the null hypothesis is true, researchers will mistakenly reject it 5% of the time.
  • Why α is called the "Type I error rate": it directly sets the probability of this mistake.

Example: A researcher concludes a therapy works better than placebo when in reality both are equally effective—the observed difference was due to chance sampling variation.
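A quick simulation shows why α is the Type I error rate. In the sketch below (NumPy and SciPy assumed; the sample sizes and repetition count are arbitrary), both "groups" are drawn from the same population, so the null hypothesis is true by construction:

```python
# H0 is true for every simulated study; about 5% reject it anyway.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
rejections = sum(
    stats.ttest_ind(rng.normal(size=25), rng.normal(size=25)).pvalue < 0.05
    for _ in range(10_000))
print(f"Type I error rate: {rejections / 10_000:.3f}")   # ~ 0.05
```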

⚠️ Type II errors (false negatives)

Type II error: retaining (failing to reject) the null hypothesis when it is actually false in the population.

  • Means concluding there is no relationship when one actually exists.
  • Occurs primarily because the research design lacks adequate statistical power (often due to small sample size).
  • The probability of Type II error = 1 − statistical power.

Example: A researcher concludes a therapy has no effect when it actually does help—the study simply didn't have enough participants to detect the real effect.

⚖️ The tradeoff between error types

| Action | Effect on Type I error | Effect on Type II error |
| --- | --- | --- |
| Lower α (e.g., from .05 to .01) | Decreases (harder to reject true nulls) | Increases (harder to reject false nulls too) |
| Raise α (e.g., from .05 to .10) | Increases (easier to reject true nulls) | Decreases (easier to reject false nulls) |

  • The convention of α = .05 represents an agreed-upon balance keeping both error rates at acceptable levels.
  • Don't confuse: making one type of error less likely automatically makes the other more likely when only adjusting α.

📁 The file drawer problem

📁 What gets published vs what gets filed away

File drawer problem: the tendency for statistically significant results to be submitted and published while nonsignificant results are not submitted or not accepted, ending up "filed away."

  • When researchers obtain significant results → they submit for publication → editors/reviewers tend to accept.
  • When researchers obtain nonsignificant results → they often don't submit → or if submitted, editors/reviewers tend to reject.
  • These nonsignificant results end up in a file drawer (or computer folder).

📁 How this distorts the literature

  • The published literature contains a higher proportion of Type I errors than statistical theory alone would predict.
  • Even when a real relationship exists, the published literature overstates its strength.

Example: Suppose the true population correlation is weak and positive (ρ = +.10). Multiple studies will produce results ranging from weak negative (r = −.10) to moderately strong positive (r = +.40) due to sampling error. Only the moderate-to-strong positive results get published, making the effect appear stronger than it really is.
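This distortion is easy to simulate. A sketch (NumPy and SciPy assumed; N = 50 per study and 10,000 studies are illustrative choices) that generates studies of a true ρ = +.10 and averages only the ones that reach significance:

```python
# Simulate studies of a weak true correlation and "publish" only the
# statistically significant ones.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
rho, N = 0.10, 50
published = []
for _ in range(10_000):
    x = rng.normal(size=N)
    y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=N)
    r, p = stats.pearsonr(x, y)
    if p < 0.05:
        published.append(r)
print(f"true rho = {rho}, mean published r = {np.mean(published):.2f}")
# The published average lands well above .10, overstating the effect.
```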

📁 Potential solutions

  • Blind evaluation: journal editors and reviewers evaluate research without knowing the results—judging only whether the question is interesting and the method sound.
  • Share nonsignificant results: researchers should keep and widely share their nonsignificant findings (e.g., at conferences).
  • Specialized journals: some disciplines now have journals devoted to publishing nonsignificant results (e.g., Journal of Articles in Support of the Null Hypothesis).

📁 P-hacking and the p-curve

  • In 2014, researchers accused the field of creating too many Type I errors by "p-hacking"—using sophisticated statistical techniques to chase a significant p-value.
  • P-curve: a proposed tool to determine whether a dataset with a certain p-value is credible or reflects multiple attempts to find significance.
  • This contributed to major conversations about publishing standards and result reliability.

💪 Statistical power

💪 What power measures

Statistical power: the probability of rejecting the null hypothesis given the sample size and expected relationship strength in the population.

  • Power is the complement of Type II error probability: Power = 1 − P(Type II error).
  • Example: with 50 participants and an expected r = +.30, power = .59 (59% chance of correctly rejecting a false null hypothesis; 41% chance of Type II error).

💪 Why power matters before data collection

  • Researchers should ensure adequate power before collecting data to avoid Type II errors.
  • Common guideline: power of .80 is adequate (80% chance of rejecting the null hypothesis for the expected relationship strength).

💪 Sample sizes needed for adequate power

The excerpt provides this table for achieving power = .80:

| Relationship strength | Independent-samples t test | Test of Pearson's r |
| --- | --- | --- |
| Strong (d = .80, r = .50) | 52 participants | 28 participants |
| Medium (d = .50, r = .30) | 128 participants | 84 participants |
| Weak (d = .20, r = .10) | 788 participants | 782 participants |

  • Key insight: weak relationships require very large samples for adequate power.

💪 What to do with inadequate power

Example scenario: 20 participants per condition, expecting medium difference (d = .50) → power = only .34 (one in three chance of rejecting null; two in three chance of Type II error).

Two strategies to increase power:

  1. Increase relationship strength: use stronger manipulation or control extraneous variables better (e.g., within-subjects instead of between-subjects design).
  2. Increase sample size: the usual strategy; for any expected relationship strength, some sample size will achieve adequate power.

Computing power: online tools and programs (like G*Power) allow researchers to compute power by entering sample size, expected relationship strength, and α level.
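A sketch of such a power computation using statsmodels (a different tool than the G*Power program the text mentions, but it answers the same question; statsmodels is assumed to be installed):

```python
# Power for an independent-samples t test, computed with statsmodels.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
power = analysis.power(effect_size=0.50, nobs1=64, alpha=0.05)  # 64 per group
print(f"power = {power:.2f}")         # ~ .80, matching the table's 128 total

n = analysis.solve_power(effect_size=0.50, power=0.80, alpha=0.05)
print(f"n per group for .80 power = {n:.0f}")   # ~ 64
```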

🔍 Criticisms of null hypothesis testing

🔍 Misunderstandings by researchers

  • P-value misinterpretation: many believe p is the probability the null hypothesis is true (it's actually the probability of the sample result if the null were true).
  • Replication probability error: many believe 1 − p is the probability of replicating a significant result.
    • In one study, 60% of professional researchers thought p = .01 meant 99% replication chance.
    • Reality: even with a large population difference, the example would require 26 participants per sample for .80 power and 59 per sample for .99 power.

🔍 Problems with the .05 convention

  • The rigid dividing line (reject if p < .05, retain if p ≥ .05) makes little sense to many critics.
  • Example: two similar studies, one with p = .04 (considered significant and publishable) and one with p = .06 (considered not significant) have produced essentially the same result but receive very different treatment.
  • This convention prevents good research from being published and contributes to the file drawer problem.

🔍 Limited informativeness

  • Rejecting the null hypothesis only says there is some nonzero relationship in the population—not very informative.
  • Analogy: imagine if chemistry could only tell us there is some relationship between gas temperature and volume, rather than providing a precise equation.
  • Extreme criticism: some argue the null hypothesis (relationship = precisely 0) is never literally true if carried to enough decimal places, so rejecting it tells us nothing new.

🔍 Defense of null hypothesis testing

  • Some researchers (like Robert Abelson) argue that when correctly understood and carried out, null hypothesis testing serves an important purpose.
  • Especially with new phenomena, it gives researchers a principled way to convince others that results are not mere chance occurrences.

🔍 The end of p-values?

  • In 2015, editors of Basic and Applied Social Psychology banned null hypothesis testing and related procedures.
  • Authors can submit papers with p-values, but editors remove them before publication.
  • The editors emphasized descriptive statistics and effect sizes instead, continuing the conversation about what psychology actually knows.

✅ Recommended solutions

✅ Report effect sizes

  • Each null hypothesis test should be accompanied by an effect size measure (e.g., Cohen's d or Pearson's r).
  • Provides an estimate of how strong the relationship is in the population, not just whether one exists.
  • Don't confuse: p-value cannot substitute for relationship strength because it also depends on sample size—even very weak results can be significant with large samples.

✅ Use confidence intervals

Confidence interval: a range of values computed so that some percentage of the time (usually 95%) the population parameter will lie within that range.

Example: 20 students estimate a cookie has 200 calories on average, with 95% confidence interval of 160 to 240. There is a very good chance the true population mean lies between 160 and 240.

Advantages:

  • Much easier to interpret than null hypothesis tests.
  • Provide information needed to do null hypothesis tests: the sample mean is significantly different (at .05 level) from any hypothetical population mean outside the confidence interval.
  • In the example, 200 is significantly different from a hypothetical mean of 250.
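A sketch of the interval computation from summary statistics (M = 200 and N = 20 come from the example; the SD of 85 is an invented value chosen so the interval lands near the 160–240 range described; SciPy assumed):

```python
# 95% CI for a mean from summary statistics; the SD is invented
# for illustration.
import math
from scipy import stats

M, SD, N = 200.0, 85.0, 20
se = SD / math.sqrt(N)
t_crit = stats.t.ppf(0.975, df=N - 1)      # two-tailed 95% critical t
lo, hi = M - t_crit * se, M + t_crit * se
print(f"95% CI: [{lo:.0f}, {hi:.0f}]")     # roughly [160, 240]
```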

✅ Alternative approaches

  • Bayesian statistics: an approach where the researcher specifies probabilities that the null and alternative hypotheses are true before the study, conducts the study, then updates probabilities based on data.
  • Too early to say whether this will become common in psychology.
  • Current status: null hypothesis testing—supported by effect size measures and confidence intervals—remains the dominant approach.

✅ Practical implications for interpreting research

  • Be cautious about interpreting any individual study because it might reflect Type I or Type II error.
  • Why replication matters: each time a study is replicated with similar results, confidence grows that the result represents a real phenomenon, not just an error.
43

From the "Replicability Crisis" to Open Science Practices

From the "Replicability Crisis" to Open Science Practices

🧭 Overview

🧠 One-sentence thesis

Psychology's recent replicability crisis—where many published findings fail to replicate—has spurred the adoption of open science practices that increase transparency and scientific rigor.

📌 Key points (3–5)

  • The replicability crisis: Many psychology studies cannot be replicated; the Reproducibility Project found only 36 of 100 studies replicated successfully.
  • Questionable research practices: Selective deletion of outliers, cherry-picking results, HARKing (hypothesizing after results are known), and p-hacking undermine research integrity.
  • Common confusion: A failed replication doesn't automatically discredit the original—differences in power, populations, procedures, or moderating variables may explain different results.
  • Open science response: Pre-registration of hypotheses, sharing raw data and materials, and digital badges promote transparency.
  • Why it matters: These practices enhance scientific rigor, counteract publication bias, and restore trust in psychological research.

🔬 The Replicability Crisis Explained

📉 What the evidence shows

The excerpt describes two major replication efforts that revealed widespread problems:

  • The Many Labs Replication Project: Failed to replicate the original finding that washing hands leads people to view moral transgressions as less wrong.
  • The Reproducibility Project: Coordinated effort by over 270 psychologists worldwide to test 100 previously published experiments.

The Replicability Crisis: the inability of researchers to replicate earlier research findings.

📊 The scale of the problem

Key findings from the Reproducibility Project:

| Original studies | Replications that succeeded | Effect sizes in replications |
| --- | --- | --- |
| 97 of 100 had statistically significant effects | Only 36 replicated successfully | On average, half the size of the original effects |

⚠️ Important nuance

The excerpt emphasizes that replication failure alone doesn't necessarily discredit original work. Differences may stem from:

  • Statistical power variations
  • Different populations sampled
  • Procedural differences
  • Effects of moderating variables

Don't confuse: A non-replication with definitive proof the original was wrong—multiple factors can explain divergent results.

🚫 Questionable Research Practices

🎯 Five problematic behaviors

The excerpt identifies specific practices that damage research integrity:

  1. Selective deletion of outliers: Removing data points to artificially inflate statistical relationships among measured variables.

  2. Selective reporting (cherry-picking): Reporting only findings that support one's hypotheses while hiding contradictory results.

  3. HARKing: "Hypothesizing After the Results are Known"—mining data without a priori hypotheses, then claiming a statistically significant result had been originally predicted.

  4. P-hacking: running inferential statistics to check whether a result is significant before deciding whether to recruit additional participants and collect more data; because the probability of obtaining a significant result depends partly on the number of participants, this kind of data-peeking manipulates the outcome.

  5. Data fabrication: Outright fraud (the excerpt mentions Diederik Stapel from Chapter 3), though this goes beyond "questionable" into criminal territory.

🔍 Why these practices matter

  • They wreak damage to the integrity and reputation of the discipline.
  • They contribute to low replicability rates.
  • The excerpt mentions the "Replication Index," a statistical "doping test" developed by Ulrich Schimmack in 2014 for estimating the replicability of studies, journals, and specific researchers.

💡 The underlying problem

The excerpt suggests systematic issues with conventional scholarship, including:

  • Publication bias: Favors discovery and publication of counter-intuitive but statistically significant findings.
  • Neglect of replication: The "duller but incredibly vital" process of replicating previous findings to test robustness is undervalued.

✅ Enhancing Scientific Rigor

🔧 Four key improvements

The excerpt outlines what researchers should do:

  1. Design studies with sufficient statistical power: Increases the reliability of findings by ensuring adequate sample sizes and effect detection capability.

  2. Publish both null and significant findings: Counteracts publication bias and reduces the "file drawer problem" (where non-significant results remain unpublished).

  3. Describe research designs in sufficient detail: Enables other researchers to replicate your study using identical or very similar procedures.

  4. Conduct and publish high-quality replications: Makes replication a valued scientific contribution rather than "dull" work.

🌐 Open Science Practices

🏅 Digital badges and incentives

The excerpt describes how journals like Psychological Science (flagship journal of the Association for Psychological Science) now issue digital badges to researchers who:

  • Pre-registered their hypotheses and data analysis plans
  • Openly shared their research materials with other researchers (enabling replication attempts)
  • Made their raw data available to other researchers

These badges come from the Center for Open Science.

📋 Transparency and Openness Promotion (TOP) Guidelines

The excerpt includes a detailed table showing four levels (0–3) of transparency across nine criteria:

| Criterion | What it covers |
| --- | --- |
| Citation Standards | Whether and how data, code, and materials are cited |
| Data Transparency | Whether data are available and where to access them |
| Analytic Methods (Code) Transparency | Whether analysis code is available and accessible |
| Research Materials Transparency | Whether materials are available and accessible |
| Design and Analysis Transparency | Standards for transparent research design |
| Preregistration of studies | Whether studies are registered before data collection |
| Preregistration of analysis plans | Whether analysis plans are registered in advance |
| Replication | Journal policy on replication study submissions |

Level progression: From Level 0 (journal says nothing or only encourages) to Level 3 (journal requires and enforces, with independent reproduction of analyses).

Example: For data transparency, Level 0 means the journal says nothing, Level 1 requires stating whether data are available, Level 2 requires posting to a trusted repository, and Level 3 requires posting and independent reproduction of analyses before publication.

🌍 Widespread adoption

The excerpt reports impressive uptake:

  • More than 500 journals have formally adopted TOP guidelines
  • More than 50 organizations have adopted them
  • The list grows each week

💰 Funding agency requirements

Federal funding agencies now mandate openness:

  • Canada (Tri-Council): Requires publication of publicly-funded research in open access journals
  • United States (National Science Foundation): Similar open access requirements

The excerpt concludes that "the future of science and psychology will be one that embraces greater 'openness.'"

🎯 What open science accomplishes

Open science practices increase transparency and openness of the scientific enterprise by:

  • Making research processes visible from hypothesis to data
  • Enabling verification and replication by other researchers
  • Reducing opportunities for questionable research practices
  • Building public trust through accountability

Don't confuse: Open science with simply publishing results—it encompasses the entire research lifecycle, from pre-registration through data sharing and materials availability.