🧭 Overview
🧠 One-sentence thesis
The chi-square distribution provides the mathematical foundation for testing whether observed categorical data patterns differ from what we would expect by chance alone, using a test statistic that follows a known distribution when the null hypothesis is true.
📌 Key points (3–5)
- What the chi-square distribution is: A right-skewed distribution with one parameter (degrees of freedom) used to characterize always-positive statistics
- How shape changes with df: As degrees of freedom increase, the distribution becomes more symmetric, the center moves right, and variability increases
- When to use it: If the null hypothesis is true in goodness-of-fit tests, the X² statistic follows a chi-square distribution with k−1 degrees of freedom (k = number of categories)
- Sample size requirement: Each expected count must be at least 5 to safely apply the chi-square distribution
- Common confusion: Larger chi-square values provide stronger evidence against the null hypothesis, so we always use the upper tail for p-values
📊 Understanding the chi-square distribution
📊 What it is and its single parameter
Chi-square distribution: A distribution sometimes used to characterize data sets and statistics that are always positive and typically right skewed, with just one parameter called degrees of freedom (df).
- Unlike the normal distribution (which has mean and standard deviation), the chi-square has only degrees of freedom
- The df parameter influences the shape, center, and spread of the distribution
- This distribution is specifically designed for statistics that cannot be negative
📈 How the distribution changes with degrees of freedom
Three general properties emerge as df increases:
| Property | What happens as df increases |
|---|---|
| Shape | Becomes more symmetric (less skewed) |
| Center | Moves to the right (mean equals df) |
| Variability | Increases (spread becomes larger) |
Example: With df = 2, the distribution is very strongly skewed. With df = 4 or df = 9, distributions become more symmetric.
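These three properties can be checked numerically; a minimal sketch assuming SciPy is available, with the df values 2, 4, and 9 mirroring the example above:

```python
# Sketch: how the chi-square distribution's center, spread, and
# symmetry change with degrees of freedom.
from scipy.stats import chi2

for df in (2, 4, 9):
    mean, var, skew = chi2.stats(df, moments="mvs")
    # mean = df (center moves right), variance = 2*df (spread grows),
    # and skewness shrinks toward 0 (shape becomes more symmetric)
    print(f"df={df}: mean={mean:.0f}, variance={var:.0f}, skewness={skew:.2f}")
```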
🔍 Finding areas under the chi-square curve
🔍 Methods for calculating tail areas
Three common approaches:
- Using computer software
- Using a graphing calculator
- Using a chi-square table (Appendix C.3)
Don't confuse with: Normal distribution lookups—chi-square tables work differently because the distribution shape changes with each df value.
🎯 Working through examples
The excerpt provides several worked examples:
Example with df = 3, cutoff at 6.25: The upper tail area is 0.1001 (about 10%)
Example with df = 2, cutoff at 4.3: The tail area is 0.1165 (between 0.1 and 0.2 if using a table)
Example with df = 5, cutoff at 5.1: The tail area is 0.4038 (larger than 0.3 if using a table)
Key pattern: We always look at the upper tail because larger chi-square values indicate stronger evidence against the null hypothesis.
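These upper-tail areas can be reproduced in software; a minimal sketch using SciPy's survival function, which returns the area above a cutoff:

```python
from scipy.stats import chi2

# Upper-tail area = P(X > cutoff) for a chi-square variable.
print(chi2.sf(6.25, df=3))  # ≈ 0.1001
print(chi2.sf(4.30, df=2))  # ≈ 0.1165
print(chi2.sf(5.10, df=5))  # ≈ 0.4038
```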
🧪 Applying chi-square to hypothesis testing
🧪 The test statistic X²
When testing goodness of fit, the test statistic is:
X² = (O₁ − E₁)² / E₁ + (O₂ − E₂)² / E₂ + ... + (Oₖ − Eₖ)² / Eₖ
Where:
- Oᵢ is the observed count in category i
- Eᵢ is the expected count in category i under the null hypothesis
- k is the number of categories
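To see the formula in action, here is a minimal sketch using SciPy's chisquare helper; the observed and expected counts are hypothetical numbers invented purely for illustration:

```python
from scipy.stats import chisquare

# Hypothetical counts for k = 3 categories (illustration only).
observed = [45, 35, 20]
expected = [40, 40, 20]   # each expected count must be at least 5

# chisquare computes sum((O - E)^2 / E) and its p-value with df = k - 1.
stat, pvalue = chisquare(f_obs=observed, f_exp=expected)
print(stat)    # (45-40)^2/40 + (35-40)^2/40 + (20-20)^2/20 = 1.25
print(pvalue)  # upper-tail area with df = 2
```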
🔑 Degrees of freedom for goodness-of-fit tests
Degrees of freedom for chi-square test: When testing k categories, df = k − 1
Example: In the juror example with 4 racial categories (White, Black, Hispanic, other), df = 4 − 1 = 3.
Why k−1 and not k: The counts must sum to the total sample size, so once k−1 counts are known the last one is determined; only k−1 counts are free to vary.
✅ Conditions that must be checked
Two essential conditions before performing a chi-square test:
- Independence: Each case contributing a count must be independent of all other cases
- Sample size/distribution: Each cell must have at least 5 expected cases
Important exception: When examining a table with just two bins, use one-proportion methods instead (from Section 6.1).
📉 Interpreting p-values from chi-square tests
The p-value represents the upper tail area of the chi-square distribution.
Example: In the juror case with X² = 5.89 and df = 3, the p-value is 0.1171. Since this is larger than typical significance levels (like 0.05), we do not reject the null hypothesis—the data don't provide convincing evidence of racial bias in juror selection.
Why upper tail only: Larger X² values correspond to greater differences between observed and expected counts, providing stronger evidence against the null hypothesis.
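A quick software check of this p-value, assuming SciPy is available:

```python
from scipy.stats import chi2

# Upper-tail area beyond the observed statistic: P(X² > 5.89) with df = 3.
p_value = chi2.sf(5.89, df=3)
print(round(p_value, 4))  # ≈ 0.1171
```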
🔬 Real-world application: Stock market independence
🔬 Testing if trading days are independent
The excerpt examines whether daily stock returns from the S&P 500 show independence using waiting times until "Up" days.
Setup:
- Label each day as Up or Down
- Count days until each Up day occurs
- If days are independent, waiting times should follow a geometric distribution
Null hypothesis: Stock market being up or down on a given day is independent from all other days (waiting times follow geometric distribution)
Alternative hypothesis: Days are not independent
📊 Expected vs observed counts
For 1,362 waiting time observations:
| Days waited | Observed | Expected (Geometric) |
|---|---|---|
| 1 | 717 | 743 |
| 2 | 369 | 338 |
| 3 | 155 | 154 |
| … | … | … |
| 7+ | 10 | 12 |
The calculated chi-square statistic is X² = 4.61 with df = 6, giving a p-value of 0.5951.
Conclusion: Cannot reject the notion that trading days are independent—no strong evidence that the market is "due" for a correction after down days.
Practical implication: The analysis suggests any dependence between days is very weak, contradicting the common belief that markets are "due" for reversals.
🎲 Two-way tables vs one-way tables
🎲 Key distinction
One-way table: Describes counts for each outcome in a single variable
Two-way table: Describes counts for combinations of outcomes for two variables
When analyzing two-way tables, the central question becomes: Are these variables related (dependent) or unrelated (independent)?
Don't confuse: The mechanics are similar, but two-way tables test relationships between variables, while one-way tables test whether a single variable follows a specified distribution.
📏 Chi-square distribution and finding areas
🧭 Overview
🧠 One-sentence thesis
The chi-square distribution enables us to test whether observed patterns in categorical data differ meaningfully from expected patterns by providing a probability model for the test statistic when the null hypothesis is true.
📌 Key points (3–5)
- What it measures: The chi-square distribution characterizes always-positive, typically right-skewed statistics using a single parameter (degrees of freedom)
- How it changes: As degrees of freedom increase, the distribution becomes more symmetric, shifts right, and becomes more variable
- When to apply it: Use chi-square when testing goodness of fit with k categories (df = k−1) or independence in two-way tables (df = (rows−1)×(columns−1))
- Critical condition: Each expected count must be at least 5 to safely use the chi-square distribution
- Common confusion: We always examine the upper tail for p-values because larger X² values indicate stronger evidence against the null hypothesis, not both tails like some other tests
📐 The chi-square distribution fundamentals
📐 Definition and single parameter
Chi-square distribution: A distribution used to characterize data sets and statistics that are always positive and typically right skewed, having just one parameter called degrees of freedom (df).
- Contrasts with normal distribution, which requires two parameters (mean and standard deviation)
- The df parameter alone determines the distribution's shape, center, and spread
- Used primarily for calculating p-values in categorical data analysis
📈 How distribution properties change with df
Three systematic changes occur as degrees of freedom increase:
Shape: The distribution starts very strongly skewed (df = 2) and becomes progressively more symmetric with larger df values (df = 4, df = 9, and beyond)
Center: The mean of each chi-square distribution equals its degrees of freedom—so the center moves rightward as df increases
Variability: The spread (variability) inflates as degrees of freedom increase
Example: A chi-square distribution with df = 2 is extremely right-skewed, while one with df = 9 appears much more bell-shaped and symmetric.
🔢 Computing tail areas
🔢 Three methods available
The excerpt describes three approaches for finding areas:
- Statistical software (most precise)
- Graphing calculator
- Chi-square table (Appendix C.3, gives ranges rather than exact values)
Practical note: With tables, you can only identify a range (e.g., "between 0.1 and 0.2"), while software provides exact values.
🎯 Worked examples of area calculations
Example 1: Chi-square with df = 3, upper tail starting at 6.25
- Result: Shaded area = 0.1001 (about 10%)
Example 2: Chi-square with df = 2, upper tail bound at 4.3
- Result: Tail area = 0.1165
- Using table: between 0.1 and 0.2
Example 3: Chi-square with df = 5, cutoff at 5.1
- Result: Tail area = 0.4038
- Using table: larger than 0.3
Example 4: Chi-square with df = 7, cutoff at 11.7
- Result: Area = 0.1109
- Using table: between 0.1 and 0.2
Example 5: Chi-square with df = 4, cutoff at 10
- Result: Precise value = 0.0404
- Using table: between 0.02 and 0.05
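All five lookups can be verified at once in software; a small sketch, assuming SciPy:

```python
from scipy.stats import chi2

# (df, cutoff) pairs from the five worked examples above.
cases = [(3, 6.25), (2, 4.3), (5, 5.1), (7, 11.7), (4, 10.0)]
for df, cutoff in cases:
    print(f"df={df}, cutoff={cutoff}: upper tail = {chi2.sf(cutoff, df):.4f}")
```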
🧮 The chi-square test for goodness of fit
🧮 The test statistic formula
When evaluating whether observed counts O₁, O₂, ..., Oₖ in k categories differ unusually from expected counts E₁, E₂, ..., Eₖ:
X² = (O₁ − E₁)² / E₁ + (O₂ − E₂)² / E₂ + ... + (Oₖ − Eₖ)² / Eₖ
When this works: If each expected count is at least 5 and the null hypothesis is true, this statistic follows a chi-square distribution with k−1 degrees of freedom.
Why we square differences: Squaring ensures all contributions are positive and gives extra weight to larger deviations (an error of 4 contributes four times as much as an error of 2, since 16 vs 4).
🎲 Degrees of freedom calculation
Degrees of freedom for one-way table: df = k − 1, where k is the number of categories or bins
Example: The juror example examined 4 racial categories (White, Black, Hispanic, other), so df = 4 − 1 = 3.
Don't confuse: This is different from two-way tables, where df = (number of rows − 1) × (number of columns − 1).
✅ Required conditions
Two conditions must be verified:
Independence: Each case contributing a count to the table must be independent of all other cases in the table
Sample size/distribution: Each particular cell count must have at least 5 expected cases
Special case: When examining a table with just two bins, use the one-proportion methods from Section 6.1 instead.
Consequence of violating conditions: Failing to check may affect the test's error rates (Type I and Type II errors).
📊 Finding and interpreting p-values
The p-value comes from the upper tail of the chi-square distribution.
Why upper tail: Larger chi-square values indicate greater discrepancies between observed and expected counts, providing stronger evidence against the null hypothesis.
Example: In the juror analysis:
- X² = 5.89 with df = 3
- P-value = 0.1171 (the area in the upper tail beyond 5.89)
- Interpretation: If there truly were no racial bias, the probability of observing a test statistic this large or larger is about 11.71%
- Conclusion: Since p-value > 0.05, we do not reject the null hypothesis—insufficient evidence of racial bias
📈 Real application: Testing stock market independence
📈 The research question
Can we use 10 years of daily S&P 500 data to determine whether trading days are independent?
Method: Examine waiting times until positive trading days
- Each "Up" day = success
- Each "Down" day = failure
- If days are independent, waiting times should follow a geometric distribution
🔬 Setting up the test
Null hypothesis (H₀): The stock market being up or down on a given day is independent from all other days; waiting times follow a geometric distribution
Alternative hypothesis (Hₐ): Days are not independent; waiting times do not follow a geometric distribution
Why this matters: If past days predict future days, traders could gain an advantage.
📉 Computing expected counts
For a geometric distribution with success probability 0.545:
Method:
- Identify the probability of waiting D days: P(D) = (1 − 0.545)^(D−1) × 0.545
- Multiply by total number of observations (1,362)
Example: Waiting 3 days occurs about 0.455² × 0.545 = 11.28% of the time, corresponding to 0.1128 × 1,362 = 154 expected occurrences.
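A sketch of this computation in Python, where 0.545 is the excerpt's estimated probability of an Up day; small differences from the table's 743 come from rounding:

```python
# Expected counts under the geometric model: P(D) = (1 - p)^(D - 1) * p,
# scaled by the 1,362 observed waiting times.
p, n = 0.545, 1362
for d in range(1, 5):
    prob = (1 - p) ** (d - 1) * p
    print(f"wait {d} day(s): expected ≈ {n * prob:.0f}")
# Prints ≈ 742, 338, 154, 70, matching the table up to rounding of p.
```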
🧪 Test results
Observed vs expected counts for 1,362 waiting periods:
| Days | Observed | Expected |
|---|---|---|
| 1 | 717 | 743 |
| 2 | 369 | 338 |
| 3 | 155 | 154 |
| 4 | 69 | 70 |
| … | … | … |
| 7+ | 10 | 12 |
Calculation: X² = (717−743)²/743 + (369−338)²/338 + ... + (10−12)²/12 = 4.61
Degrees of freedom: k = 7 groups, so df = 7 − 1 = 6
P-value: 0.5951 (from chi-square distribution with df = 6)
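And the p-value itself, as a one-line check (the fourth decimal differs slightly because X² = 4.61 is itself rounded):

```python
from scipy.stats import chi2

print(chi2.sf(4.61, df=6))  # ≈ 0.595, the upper-tail area with df = 6
```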
💡 Conclusion and practical meaning
Since p-value (0.5951) > 0.05, we do not reject H₀.
What this means: Cannot reject the notion that trading days are independent during the last 10 years of data.
Practical implication: The market is not "due" for an Up day after several Down days. Any dependence between days is very weak. This analysis suggests that patterns traders think they see may just be chance.
Important caveat: Not rejecting H₀ doesn't prove independence—it just means we lack strong evidence of dependence.
🔀 Two-way tables: Testing independence between variables
🔀 One-way vs two-way distinction
One-way table: Describes counts for each outcome in a single variable
Two-way table: Describes counts for combinations of outcomes for two variables
Key question for two-way tables: Are the variables related (dependent) or unrelated (independent)?
🧮 Computing expected counts in two-way tables
Formula for expected count: Expected Count(row i, col j) = (row i total) × (column j total) / table total
Example: For the iPod disclosure study with 219 participants:
- Row 1 total (Disclose): 61
- Column 1 total (General question): 73
- Table total: 219
- Expected count = (61 × 73) / 219 = 20.33
Logic: If variables are independent, we'd expect the same proportion in each column to fall in each row.
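A sketch of the all-cells-at-once version of this formula, assuming NumPy; the second row total, 158, is the remaining 219 − 61 sellers who hid the problem:

```python
import numpy as np

# Expected counts under independence: (row total * col total) / grand total,
# computed for every cell at once with an outer product.
row_totals = np.array([61, 158])      # Disclose, Hide (158 = 219 - 61)
col_totals = np.array([73, 73, 73])   # the three question types
expected = np.outer(row_totals, col_totals) / 219
print(expected[0, 0])  # ≈ 20.33, matching the hand calculation above
```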
🎲 Degrees of freedom for two-way tables
Formula: df = (R − 1) × (C − 1), where R = number of rows and C = number of columns
Example: The iPod study had 2 rows (Disclose/Hide) and 3 columns (three question types), so df = (2−1) × (3−1) = 2
Don't confuse: This is different from one-way tables where df = k − 1. The multiplication accounts for testing independence between two dimensions.
Special guideline: When analyzing 2-by-2 contingency tables, use the two-proportion methods from Section 6.2 instead.
📊 The iPod disclosure experiment
Context: Researchers wanted to know which questions get sellers to disclose product problems.
Setup: 219 participants sold an iPod known to have frozen twice. Three scripted questions:
- General: "What can you tell me about it?"
- Positive Assumption: "It doesn't have any problems, does it?"
- Negative Assumption: "What problems does it have?"
Results:
| Question type | Disclosed | Hid problem | Total |
|---|---|---|---|
| General | 2 | 71 | 73 |
| Positive Assumption | 23 | 50 | 73 |
| Negative Assumption | 36 | 37 | 73 |
| Total | 61 | 158 | 219 |
Test statistic: X² = 40.13 with df = 2
P-value: Extremely small (about 2 × 10⁻⁹)
Conclusion: Strong evidence that the question asked affected a seller's likelihood of disclosing the freezing problem. The "What problems does it have?" question was most effective.
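As a sketch, the whole test can be reproduced from the table with SciPy's chi2_contingency, which computes the expected counts, X², df, and p-value in one call:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: question type; columns: Disclosed, Hid problem.
observed = np.array([[ 2, 71],
                     [23, 50],
                     [36, 37]])

stat, pvalue, dof, expected = chi2_contingency(observed)
print(stat, dof)  # ≈ 40.13 with df = (3-1)*(2-1) = 2
print(pvalue)     # ≈ 2e-9
```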
🏥 Two-way table application: Diabetes treatments
🏥 The study design
Experiment compared three treatments for Type 2 Diabetes in patients aged 10-17:
- Continued metformin (met)
- Metformin + rosiglitazone (rosi)
- Lifestyle intervention program
Outcome: Whether patient lacked glycemic control (failure) or maintained control (success)
🧪 Hypotheses
H₀: There is no difference in effectiveness of the three treatments
Hₐ: There is some difference in effectiveness between treatments (e.g., perhaps rosi performed better than lifestyle)
📊 Computing expected counts
Total results for 699 patients:
| Treatment | Failure | Success | Total |
|---|---|---|---|
| lifestyle | 109 | 125 | 234 |
| met | 120 | 112 | 232 |
| rosi | 90 | 143 | 233 |
| Total | 319 | 380 | 699 |
Example calculation: Expected count for row 1, column 1:
(234 × 319) / 699 = 106.8
All expected counts:
- Row 1: 106.8, 127.2
- Row 2: 105.9, 126.1
- Row 3: 106.3, 126.7
All exceed 5, so the sample size condition is met.
🔍 Test results
Test statistic: X² = 8.16 with df = (3−1) × (2−1) = 2
P-value: 0.017
Conclusion: Since the p-value < 0.05, reject H₀. There is evidence that the three treatments do not perform equally well at maintaining glycemic control in Type 2 Diabetes.
What we can't conclude: The test doesn't tell us which specific treatments differ—only that not all are equally effective.
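The same one-call check works here; a sketch assuming SciPy:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: lifestyle, met, rosi; columns: Failure, Success.
observed = np.array([[109, 125],
                     [120, 112],
                     [ 90, 143]])

stat, pvalue, dof, expected = chi2_contingency(observed)
print(round(stat, 2), dof)  # ≈ 8.16 with df = 2
print(round(pvalue, 3))     # ≈ 0.017
```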