Testing theory: an introduction

Introduction to Testing Theory in Geodetic Adjustment

Introduction

🧭 Overview

🧠 One-sentence thesis

This book develops methods for detecting and identifying errors in the functional model of geodetic measurements by using hypothesis testing on redundant observations, enabling users to assess model validity and design reliable measurement setups before data collection.

📌 Key points (3–5)

  • Why redundant measurements matter: they allow both increased accuracy and the ability to check for mistakes or errors in the functional model.
  • What the book addresses: detecting and identifying errors in the functional model (not the stochastic model), because functional errors are more common and have more serious consequences in geodetic applications.
  • How errors are traced: through a four-step process—formulate a null hypothesis, detect problems using residuals, identify the most likely alternative hypothesis, then adapt the data or model.
  • Common confusion: not all model imperfections are "errors"—a modelling error only matters when discrepancies between data and model cannot be explained by normal measurement uncertainty.
  • Reliability concept: internal and external reliability measures let designers predict in advance the size of minimal detectable biases and their impact on estimated parameters.

📚 Background and scope

📚 Relationship to adjustment theory

  • This book is a follow-up to Adjustment theory (TU Delft Open Publishing, 2024).
  • Adjustment theory covers the optimal combination of redundant measurements and estimation of unknown parameters.
  • The present book focuses on the second reason for redundant measurements: checking for mistakes or errors (the first reason being accuracy improvement).

🎯 What is covered and what is not

  • Covered: methods for detecting and identifying errors in the functional model (the set of functional relations the observables are assumed to obey).
  • Not covered: errors in the stochastic model (the model of measurement uncertainty).
  • Justification: from past experience, modelling errors in geodesy usually occur in the functional model, not the stochastic model; functional errors have more serious consequences; and practitioners are usually capable of making justifiable stochastic model choices.
  • Assumption throughout: the stochastic model is specified correctly.

🧩 Mathematical models in adjustment

🧩 Two parts of a model

A mathematical model for adjustment consists of:

| Part | What it contains | Example |
| --- | --- | --- |
| Functional model | The set of functional relations the observables are assumed to obey | Three angles of a triangle should sum to π (planar Euclidean geometry) |
| Stochastic model | The measurement uncertainty, captured through random variables | Observations are independent samples from a normal distribution |

🧩 Why models matter for least-squares

  • Least-squares estimators have two important properties: unbiasedness (they coincide with their target value on average) and minimum variance (smallest possible sum of squares of variations about the target).
  • Critical caveat: these properties only hold when the mathematical model is correct.
  • If the model is misspecified:
    • Errors in the functional model → biased estimators (off target).
    • Errors in the stochastic model → less precise estimators (larger variations).

🔍 Understanding modelling errors

🔍 What counts as a modelling error

A modelling error exists when the discrepancies between the observations and the model cannot be explained by, or attributed to, the unavoidable measurement uncertainty.

  • Important nuance: every model is a caricature of reality, so every model has shortcomings. Strictly speaking, every model is "in error" to begin with.
  • The notion of modelling error must be considered with care: it is felt only in the confrontation between data and model.
  • Don't confuse: a model imperfection vs. a detectable modelling error—only the latter produces discrepancies larger than expected from measurement uncertainty alone.

🔍 Types of errors in the functional model

The excerpt distinguishes two categories:

| Type | Cause | What it affects | Examples |
| --- | --- | --- | --- |
| Blunders / gross errors | Mistakes by the observer or defective instruments | Individual observations | Reading the leveling rod incorrectly; aiming the theodolite at the wrong point |
| Systematic errors | Common cause affecting whole sets of observations | Whole sets of observations | Defective instruments; mistakes in formulating functional relations between observables |

🛠️ The four-step process for error detection and identification

🛠️ Step (i): Formulate the null hypothesis

  • Start with a model believed to give an adequate enough description of reality.
  • Usually the simplest model possible that has proven itself in similar situations based on past experience.
  • Ordinarily assume measurements and modelling are done with utmost care, so no allowances for mistakes or errors are made at this stage.
  • This first model is called the null hypothesis.

🛠️ Step (ii): Detect untrustworthy models

  • One can never be sure about the absence of mistakes, so always check the validity of the null hypothesis.
  • How detection works: adjust the redundant measurements and compute (least-squares) residuals.
  • Residuals measure how well the measurements fit the model:
    • Large residuals → poor fit (often indicative of problems).
    • Smaller residuals → better fit.
  • Residuals are used as input for deciding whether to accept or reject the null hypothesis.

🛠️ Step (iii): Consider alternative hypotheses

  • If the null hypothesis is rejected, the measurements do not support the assumption that the model is adequate.
  • Must look for an alternative hypothesis (alternative model).
  • Challenge: one seldom knows beforehand which alternative to consider; many different errors could have led to rejection.
  • In practice, various alternatives must be considered, depending on the particular situation.

🛠️ Step (iv): Identify and adapt

  • Identification: search for the alternative hypothesis that best fits the measurements.
  • Since each alternative describes a particular mistake or modelling error, the most likely mistake corresponds with the most likely hypothesis.
  • Adaptation: once confident that errors have been identified, either:
    • Re-measure the erroneous data, or
    • Include additional parameters in the model to account for the modelling errors.

⚖️ Decision uncertainty and reliability

⚖️ Two kinds of wrong decisions

Because decisions are based on uncertain measurements, their outcomes are uncertain. Two types of wrong decisions can occur:

| Type | What happens | Consequence |
| --- | --- | --- |
| Wrong decision of the 1st kind | Reject the null hypothesis when it is actually true | Wrongly believe a mistake occurred; may lead to unnecessary re-measurement |
| Wrong decision of the 2nd kind | Accept the null hypothesis when it is actually false | Wrongly believe mistakes are absent; obtain biased adjustment results |

⚖️ Factors affecting traceability

Not all errors can be traced equally well. How well errors can be traced depends on:

  • The model used (the null hypothesis).
  • The type and size of the error (the alternative hypothesis).
  • The decision procedure used for accepting or rejecting the null hypothesis.

⚖️ Reliability measures

  • Internal reliability: measures related to how well errors can be detected within the measurement setup.
  • External reliability: measures related to the impact of undetected errors on the estimated parameters of interest.
  • Key advantage: these measures enable a user to determine in advance (at the designing stage, before actual measurements are collected):
    • The size of the minimal detectable biases.
    • The size of their potential impact on the estimated parameters.
  • Mastering these concepts enables formulation of guidelines for the reliable design of measurement setups.

📖 Example: Testing collinearity of three points

📖 The hypothesis

  • Postulated theory: three points (1, 2, and 3) lie on one straight line.
  • Notation: H : (assertion specifying the hypothesis).

📖 The experiment design

  • Measure three distances: l₁₂, l₂₃, and l₁₃.
  • If the hypothesis is correct, the distances should satisfy the relation:
    • l₁₂ + l₂₃ − l₁₃ = 0 (under the assumption that the hypothesis is correct).
  • Compute l₁₂ + l₂₃ − l₁₃ and verify whether this computed value agrees or disagrees with the theoretically predicted value (zero).
  • If it agrees → inclined to accept the hypothesis.
  • If it disagrees → inclined to reject hypothesis H.
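
A minimal Python sketch of this naive check; the distance values and the tolerance are hypothetical, and the ad-hoc tolerance is precisely what the statistical treatment below replaces.

```python
# Naive collinearity check with hypothetical distances (units: metres).
l12, l23, l13 = 40.02, 59.97, 100.05

misclosure = l12 + l23 - l13   # should be 0 if points 1, 2, 3 are collinear
tolerance = 0.10               # ad-hoc threshold, not yet statistically justified

if abs(misclosure) <= tolerance:
    print(f"misclosure {misclosure:+.2f} m: inclined to accept H")
else:
    print(f"misclosure {misclosure:+.2f} m: inclined to reject H")
```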

📖 Complication: measurement uncertainty

  • In practice, experimental outcomes (especially measurements) are not exact; they are affected by uncertainty due to measurement errors.
  • To handle this, the book models uncertainty using random variables from probability theory.
  • Statistical hypothesis: an assertion or conjecture about the probability distribution of one or more random variables, for which a random sample (mostly through measurements) is available.
  • The structure of a statistical hypothesis H is:
    • The observable random variable has a probability density function given by a known form, except for an unknown parameter.
    • By specifying (fully or partially) the parameter, one specifies the hypothesis.

Statistical hypotheses

1.1 Statistical hypotheses

🧭 Overview

🧠 One-sentence thesis

Statistical hypothesis testing provides a formal framework for deciding whether to reject a postulated theory by comparing experimental outcomes (modeled as random variables) against theoretically predicted values, while accounting for measurement uncertainty.

📌 Key points (3–5)

  • What a statistical hypothesis is: an assertion about the probability distribution of random variables, usually specifying the functional form and parameters (except unknown ones).
  • Two types of hypotheses: the null hypothesis (H₀, the hypothesis being tested) and the alternative hypothesis (Hₐ, what is true if H₀ is false).
  • Simple vs composite: a simple hypothesis completely specifies the distribution (form and all parameter values); a composite hypothesis leaves some parameters unspecified.
  • Common confusion: random variables cannot equal constants—hypotheses about relationships (e.g., three distances summing to zero) must be stated about expected values, not the random variables themselves.
  • Why it matters: hypothesis testing enables reliable design of measurement setups and prevents biased adjustment results by distinguishing real modeling errors from measurement noise.

🎯 The testing problem

🎯 What hypothesis testing addresses

Many problems in science and engineering boil down to asking: is a particular theory true or false?

  • The classical approach: design an experiment whose outcome can be predicted by the theory, then compare the experimental result with the prediction.
  • If they disagree → reject the theory.
  • If they agree → no evidence yet to reject the theory (note: this does not prove the theory is true).

⚠️ Why uncertainty complicates testing

Experiments involving measurements do not produce exact outcomes.

  • Measurement errors introduce uncertainty.
  • Because outcomes are uncertain, two kinds of wrong decisions can occur:
    • Wrong decision of the 1st kind: reject the null hypothesis when it is actually true (false alarm; might lead to unnecessary re-measurement).
    • Wrong decision of the 2nd kind: accept the null hypothesis when it is actually false (miss; leads to biased adjustment results because mistakes or modeling errors go undetected).

🔧 The solution: model uncertainty with random variables

To handle measurement uncertainty, the excerpt models experimental outcomes as random variables with probability distributions.

  • A statistical hypothesis is an assertion about the probability distribution of one or more random variables, for which a random sample (usually measurements) is available.
  • The structure: "According to H, the observable random variable y has probability density function p(y; θ)."
  • The parameter θ indicates that the distribution is known except for the unknown parameter; specifying θ (fully or partially) makes an assertion about the density.

📐 Example: three points on a line

📐 The geometric hypothesis

Postulated theory: three points (1, 2, 3) lie on one straight line.

  • Mathematically, if true, the three distances l₁₂, l₂₃, and l₁₃ should satisfy:
    l₁₂ + l₂₃ − l₁₃ = 0.
  • The experiment: measure the three distances, compute l₁₂ + l₂₃ − l₁₃, and check if it agrees with zero.

🎲 Modeling the measurements as random variables

The excerpt models the three distances as normally distributed random variables (based on experience that geodetic measurement uncertainty is often well-modeled by the normal distribution).

  • Assume the three distances are uncorrelated and have the same known variance σ².
  • The simultaneous probability density function becomes:
    p(l₁₂, l₂₃, l₁₃; E(l₁₂), E(l₂₃), E(l₁₃)) = normal with unknown means and known variance matrix σ²I.

This statement (call it statement 3) is already a statistical hypothesis—it asserts the observables are normally distributed with unknown mean but known variance.

🚫 Why you cannot write hypotheses about random variables directly

Common mistake: trying to write l₁₂ + l₂₃ − l₁₃ = 0 for the random variables themselves.

  • Random variables cannot equal a constant; a statement like "l₁₂ + l₂₃ − l₁₃ = 0" is nonsensical.
  • Correct approach: state the relation for the expected values:
    E(l₁₂) + E(l₂₃) − E(l₁₃) = 0.
  • Interpretation: if the measurement experiment were repeated many times, on average the measurements would satisfy the relation.

📝 The final statistical hypothesis

Combining the distribution assumption and the expected-value relation, the statistical hypothesis (call it H) becomes:

H: the observables l₁₂, l₂₃, l₁₃ are normally distributed with known variance σ²I and unknown means satisfying E(l₁₂) + E(l₂₃) − E(l₁₃) = 0.

The three means play the role of the parameter θ in the general structure.

🔀 Null and alternative hypotheses

🔀 Two hypotheses in testing problems

Most hypothesis-testing problems discuss two hypotheses:

| Hypothesis | Name | Notation | Role |
| --- | --- | --- | --- |
| First | Null hypothesis | H₀ | The hypothesis being tested |
| Second | Alternative hypothesis | Hₐ | What is true if H₀ is false |

  • The thinking: if H₀ is false, then Hₐ is true, and vice versa.
  • We say "H₀ is tested against (or versus) Hₐ."

🔀 Example: null and alternative for the three-points problem

Null hypothesis H₀ (the hypothesis to be tested):

H₀: the observables are normally distributed with known variance σ²I and E(l₁₂) + E(l₂₃) − E(l₁₃) = 0.

Alternative hypothesis Hₐ:

  • We want to find out whether E(l₁₂) + E(l₂₃) − E(l₁₃) = 0 or not.
  • Naively, the alternative might be E(l₁₂) + E(l₂₃) − E(l₁₃) ≠ 0.
  • However, from the geometry of the problem, the left-hand side can never be negative.
  • Therefore, the alternative should read: E(l₁₂) + E(l₂₃) − E(l₁₃) > 0.

Hₐ: the observables are normally distributed with known variance σ²I and E(l₁₂) + E(l₂₃) − E(l₁₃) > 0.

Note: the type of distribution and the variance matrix are not in question—they are assumed known and identical under both H₀ and Hₐ.

🏷️ Simple vs composite hypotheses

🏷️ Definitions

Simple hypothesis: a hypothesis that completely specifies the distribution—both its functional form and the values of all its parameters.

Composite hypothesis: a hypothesis that does not completely specify the distribution (leaves some parameters unspecified).

🏷️ Example classification

In the three-points example:

  • Both H₀ and Hₐ are composite hypotheses because the individual expectations of the observables are not fully specified (only a relation among them is given).
  • H₀ would become a simple hypothesis if the individual expectations E(l₁₂), E(l₂₃), E(l₁₃) were assumed known.

🧪 Test of statistical hypotheses

🧪 What a test is

Test of a statistical hypothesis: a rule or procedure in which a random sample of y is used for deciding whether to reject or not reject H₀.

A test is completely specified by the critical region (denoted K).

Critical region K: the set of sample values of y for which H₀ is to be rejected.

  • Thus, H₀ is rejected if y belongs to K (written y ∈ K).

🧪 Choosing a critical region

The excerpt notes that we want to choose a critical region to obtain a test with desirable properties—a test that is "best" in a certain sense.

  • Criteria for comparing tests and the theory for obtaining "best" tests will be developed in later sections.
  • The excerpt provides a simple example (Example 2) where an acceptable critical region can be found on intuitive grounds.

🧪 Example: testing a scalar measurement

Setup:

  • A geodesist measures a scalar variable modeled as a random variable y.
  • Assumption (not being tested, because the geodesist is certain): y has a normal distribution with unit variance, i.e., p(y; E(y)) = normal with variance 1.
  • Uncertainty: the value of the expectation E(y).
  • Geodesist's assumption: E(y) = x₀ (this is the hypothesis to be tested).

Hypotheses:

  • Null hypothesis H₀: y is normally distributed with variance 1 and E(y) = x₀ (a simple hypothesis).
  • Alternative hypothesis Hₐ: y is normally distributed with variance 1 and E(y) ≠ x₀ (a composite hypothesis).

Test design:

  • To test H₀, a single observation on y is made.
  • (The excerpt notes that in real-life problems one usually takes several observations, but avoids complicating the example.)

Don't confuse: the assumption about the normal distribution with unit variance is itself a statistical hypothesis, but it is not being tested here—the geodesist is certain of its validity. Only the value of the expectation is in question.

Test of statistical hypotheses

1.2 Test of statistical hypotheses

🧭 Overview

🧠 One-sentence thesis

A statistical hypothesis test uses a critical region to decide whether to reject the null hypothesis based on sample data, and this decision process involves two types of errors with quantifiable probabilities.

📌 Key points (3–5)

  • What a test is: a rule that uses a random sample to decide whether to reject the null hypothesis, completely specified by a critical region.
  • Two competing hypotheses: the null hypothesis (H₀, being tested) versus the alternative hypothesis (Hₐ, what is true if H₀ is false).
  • Simple vs composite hypotheses: simple hypotheses completely specify the distribution (form and parameters); composite hypotheses do not.
  • Two types of errors: Type I error (rejecting H₀ when it is true) and Type II error (accepting H₀ when it is false); their probabilities are denoted α and β respectively.
  • Common confusion: the critical region is not arbitrary—it should contain sample values that are unlikely under H₀ but more likely under Hₐ.

🧩 Hypothesis structure and types

🧩 Null and alternative hypotheses

Null hypothesis (H₀): the hypothesis being tested.

Alternative hypothesis (Hₐ): the hypothesis that is true if the null hypothesis is false.

  • The thinking is that if H₀ is false, then Hₐ is true, and vice versa.
  • We say that H₀ is tested "against" or "versus" Hₐ.
  • Example: In the geometry problem, H₀ states that the combination of mean distances E(l₁₂) + E(l₂₃) − E(l₁₃) equals zero; Hₐ states that it is greater than zero (since geometry shows it cannot be negative).

🔍 Simple vs composite hypotheses

Simple hypothesis: a hypothesis that completely specifies the distribution—both its functional form and the values of its parameters.

Composite hypothesis: a hypothesis that does not completely specify the distribution.

| Type | What it specifies | Example from excerpt |
| --- | --- | --- |
| Simple | Distribution form + all parameter values | If individual expectations of observables were assumed known |
| Composite | Distribution form or some parameters left unspecified | H₀ in equation (7) and Hₐ in equation (8) |

  • In the geometry example, both H₀ and Hₐ are composite because they assume the distribution type and variance matrix are known and identical, but do not specify all parameters.
  • Don't confuse: a hypothesis can specify the distribution type without being simple—it must specify all parameters to be simple.

🎯 Test structure and critical region

🎯 What a test is

Test of a statistical hypothesis: a rule or procedure in which a random sample of y is used for deciding whether to reject or not reject H₀.

  • A test is completely specified by its critical region K.
  • The decision rule: reject H₀ if the sample value y falls in K (i.e., y ∈ K).

📍 Critical region

Critical region K: the set of sample values of y for which H₀ is to be rejected.

  • Thus, H₀ is rejected if y ∈ K.
  • The goal is to choose K to obtain a test with desirable properties—a test that is "best" in a certain sense.
  • Example: In the geodesist problem (Example 2), the critical region should contain sample values remote enough from the expected value under H₀.

🧭 Intuitive construction of critical region

The excerpt illustrates how to construct K for a simple case:

  • Setup: A geodesist measures a scalar variable y with normal distribution (unit variance), testing whether the expectation equals x₀.
  • Reasoning: If H₀ is true, the probability of y falling far from x₀ is small; if Hₐ is true, this probability may be large.
  • Conclusion: K should contain sample values remote enough from the expected value under H₀.
  • Symmetry: Since the alternative can be on either side of x₀ and the distribution is symmetric, K should have portions in both the left and right tails, symmetric about x₀.
  • Don't confuse: "remote enough" is not arbitrary—it depends on the desired error probabilities (discussed in the next section).

⚠️ Two types of errors

⚠️ Type I error

Type I error: rejection of H₀ when in fact H₀ is true.

  • Size of Type I error (α): the probability that a sample value of y falls in the critical region when H₀ is true.
  • Also called the size of the test or level of significance.
  • Formula: α = P(type I error) = P(rejection of H₀ when H₀ true) = P(y ∈ K | H₀ true).
  • Can be computed once K and the probability density function of y under H₀ are known.

⚠️ Type II error

Type II error: acceptance of H₀ when in fact H₀ is false.

  • Size of Type II error (β): the probability that a sample value of y falls outside the critical region when H₀ is false.
  • Formula: β = P(type II error) = P(acceptance of H₀ when H₀ is false) = P(y ∉ K | H₀ false).
  • Can be computed once K and the probability density function of y under Hₐ are known.

📊 Decision table

| True state | Reject H₀ (y ∈ K) | Accept H₀ (y ∉ K) |
| --- | --- | --- |
| H₀ true | Wrong (Type I error) | Correct |
| H₀ false | Correct | Wrong (Type II error) |

  • Don't confuse: α is computed under H₀, while β is computed under Hₐ—they use different probability distributions.
  • The excerpt notes that criteria for comparing tests and theory for obtaining "best" tests will be developed in later sections, suggesting that balancing these error probabilities is a key goal.

Two types of errors

1.3 Two types of errors

🧭 Overview

🧠 One-sentence thesis

When testing a statistical hypothesis, two types of errors can occur—rejecting a true null hypothesis (Type I) or accepting a false null hypothesis (Type II)—and the Neyman-Pearson principle addresses this trade-off by fixing the Type I error size and minimizing the Type II error size.

📌 Key points (3–5)

  • Type I error: rejecting the null hypothesis H₀ when it is actually true; its size is denoted by α (the significance level).
  • Type II error: accepting the null hypothesis H₀ when it is actually false; its size is denoted by β.
  • The trade-off: decreasing α tends to increase β, and vice versa—you cannot minimize both simultaneously.
  • Common confusion: the two error types are not symmetric in practice; the Neyman-Pearson principle treats Type I error as more serious and fixes it at a small value (often 0.05 or 0.01), then minimizes Type II error.
  • Power of a test: the probability (1 − β) of correctly rejecting H₀ when the alternative Hₐ is true; higher power means lower Type II error.

⚠️ The two error types

⚠️ Type I error (α)

Type I error: Rejection of H₀ when in fact H₀ is true.

  • What it means: you conclude that the null hypothesis is false when it is actually true.
  • Size of Type I error (α): the probability that the sample value y falls in the critical region K when H₀ is true.
  • Also called the size of the test or level of significance.
  • Formula: α = P(rejection of H₀ when H₀ is true) = P(y ∈ K when H₀ is true).
  • How to compute α: once you know the critical region K and the probability density function of y under H₀, you can calculate α as the area under the H₀ distribution curve over the interval K.

⚠️ Type II error (β)

Type II error: Acceptance of H₀ when in fact H₀ is false.

  • What it means: you conclude that the null hypothesis is true when it is actually false (i.e., the alternative hypothesis Hₐ is true).
  • Size of Type II error (β): the probability that the sample value y falls outside the critical region K when H₀ is false (i.e., when Hₐ is true).
  • Formula: β = P(acceptance of H₀ when H₀ is false) = P(y ∉ K when Hₐ is true).
  • How to compute β: once you know the critical region K and the probability density function of y under Hₐ, you can calculate β as the area under the Hₐ distribution curve over the interval outside K.

📊 Decision table

The excerpt provides a decision table summarizing the four possible outcomes:

| True state | Decision: Reject H₀ (y ∈ K) | Decision: Accept H₀ (y ∉ K) |
| --- | --- | --- |
| H₀ is true | Wrong (Type I error) | Correct |
| H₀ is false | Correct | Wrong (Type II error) |

  • When H₀ is true and you reject it → Type I error.
  • When H₀ is false and you accept it → Type II error.
  • The other two cells represent correct decisions.

🔄 The trade-off between α and β

🔄 Why you cannot minimize both

  • The excerpt states: "As we decrease α, we tend to increase β, and vice versa."
  • Why this happens: the critical region K determines both error sizes. Making K smaller (to reduce α, the probability of falling in K when H₀ is true) increases the probability of falling outside K when Hₐ is true (increasing β).
  • Example: if you make the critical region very small to avoid rejecting a true H₀, you also make it harder to reject H₀ even when it is false, so you accept H₀ more often when you shouldn't.
  • Don't confuse: α and β are not independent; they are linked through the choice of the critical region K.

🎯 Power of the test (1 − β)

  • Power: the probability of correctly rejecting H₀ when the alternative Hₐ is true.
  • Formula: power = 1 − β = P(rejection of H₀ when Hₐ is true) = P(y ∈ K when Hₐ is true).
  • The excerpt shows (in Figure 1.7) that power depends on the true mean under Hₐ: as the alternative mean moves further from the null mean, power increases.
  • Example: in the excerpt's example, when the power is required to be at least 0.80, the unknown mean under Hₐ must be at least 7.34 (given x₀ = 1, σ = 2, and α = 0.01).
  • Higher power means lower Type II error, which is desirable.

🧪 The Neyman-Pearson principle

🧪 The principle

Neyman-Pearson principle: Among all tests or critical regions possessing the same size Type I error α, choose one for which the size of the Type II error β is as small as possible.

  • What it does: fixes α (usually at a small value like 0.05 or 0.01) and then minimizes β.
  • Why fix α: the excerpt explains that in many testing situations, one type of error is more serious than the other. The hypotheses are formulated so that Type I error is the more serious, so you want to ensure it is small.
  • Practical implication: you first choose the significance level α based on how much Type I error you are willing to accept (e.g., 1 out of 100 experiments rejecting a true H₀ if α = 0.01), then design the test to minimize β.

🧪 Why this principle is useful

  • The excerpt mentions that other principles could be suggested (e.g., minimizing α + β), but the Neyman-Pearson principle has proved very useful in practice.
  • It provides a workable solution to the problem that you cannot minimize both α and β simultaneously.
  • The excerpt states that the book will base its method of finding tests on this principle.

📐 Example: computing α and β

📐 Setup of the example

  • The excerpt provides a detailed example (Example 3) to illustrate how to compute α and β.
  • Assumptions: y is normally distributed with known variance σ². The null hypothesis is H₀: E(y) = x₀, and the alternative is Hₐ: E(y) = xₐ > x₀ (a simple one-sided alternative).
  • Critical region: since Hₐ is located to the right of H₀, a right-sided critical region K is chosen: K = {y: y ≥ kₐ}, where kₐ is the critical value.

📐 Computing α (size of Type I error)

  • α is the probability that y falls in K when H₀ is true.
  • Formula: α = P(y ≥ kₐ when H₀ is true).
  • Since y is normally distributed under H₀ with mean x₀ and variance σ², you can standardize: z = (y − x₀)/σ is standard normal under H₀.
  • Then α = P(z ≥ (kₐ − x₀)/σ), which can be computed from standard normal tables.
  • Example values from the excerpt (for x₀ = 1 and σ = 2):
    • If α = 0.1, then kₐ = 3.56.
    • If α = 0.05, then kₐ = 4.29.
    • If α = 0.01, then kₐ = 5.65.
    • If α = 0.001, then kₐ = 7.18.
  • Interpretation: choosing α = 0.01 means you are willing to accept that 1 out of 100 experiments will lead to rejecting H₀ when it is actually true.
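
The critical values above can be reproduced in a few lines; a sketch using scipy's standard normal quantile, with x₀ = 1 and σ = 2 as in the example.

```python
from scipy.stats import norm

x0, sigma = 1.0, 2.0

# Right-sided critical region K = {y : y >= k_a}. Under H0, y ~ N(x0, sigma^2),
# so alpha = P(y >= k_a | H0) gives k_a = x0 + sigma * z_(1-alpha).
for alpha in (0.1, 0.05, 0.01, 0.001):
    k_a = x0 + sigma * norm.ppf(1 - alpha)
    print(f"alpha = {alpha:<6} k_a = {k_a:.2f}")   # 3.56, 4.29, 5.65, 7.18
```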

📐 Computing β (size of Type II error)

  • β is the probability that y falls outside K when Hₐ is true.
  • Formula: β = P(y < kₐ when Hₐ is true).
  • Equivalently, 1 − β = P(y ≥ kₐ when Hₐ is true), which is the power of the test.
  • Since y is normally distributed under Hₐ with mean xₐ and variance σ², you can standardize and compute 1 − β from standard normal tables.
  • The excerpt shows (Figure 1.7) that power (1 − β) increases as xₐ moves further from x₀.
  • Example: for α = 0.01, x₀ = 1, σ = 2, if you require power ≥ 0.80, then xₐ must be at least 7.34.
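
Continuing the sketch with the same assumed values, the power for a given alternative mean and the minimal mean needed for power ≥ 0.80 follow from the same quantities.

```python
from scipy.stats import norm

x0, sigma, alpha = 1.0, 2.0, 0.01
k_a = x0 + sigma * norm.ppf(1 - alpha)        # 5.65

def power(x_a):
    # 1 - beta = P(y >= k_a | Ha), with y ~ N(x_a, sigma^2) under Ha
    return norm.sf((k_a - x_a) / sigma)

print(f"power at x_a = 7.34: {power(7.34):.2f}")   # ~0.80

# Smallest x_a with power >= 0.80: solve P(y >= k_a | Ha) = 0.80 for the mean.
x_min = k_a - sigma * norm.ppf(1 - 0.80)
print(f"minimal x_a: {x_min:.2f}")                 # ~7.34
```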

📐 Key insight from the example

  • The location of the critical region K is determined by the critical value kₐ, which in turn is determined by the chosen significance level α.
  • Once α is fixed, the critical region is determined, and then β (or power 1 − β) can be computed for any specific alternative xₐ.
  • Don't confuse: α is computed under H₀, while β is computed under Hₐ; they use different probability distributions.

A Testing Principle

1.4 A testing principle

🧭 Overview

🧠 One-sentence thesis

The Neyman-Pearson principle solves the trade-off between type I and type II errors by fixing the size of the type I error (α) and then minimizing the size of the type II error (β).

📌 Key points (3–5)

  • Two types of errors: rejecting H₀ when it is true (type I, size α) and accepting H₀ when it is false (type II, size β).
  • The fundamental trade-off: decreasing α tends to increase β, and vice versa—you cannot minimize both simultaneously.
  • Neyman-Pearson principle: fix the type I error size α (often 0.05 or 0.01) and choose the critical region K that minimizes β.
  • Common confusion: why fix α instead of minimizing α + β? The principle assumes one error (type I) is more serious, so it must be controlled first.
  • Practical implication: the principle provides a systematic method for choosing the critical region K in hypothesis testing.

⚖️ The two types of errors

❌ Type I error

Type I error: rejecting the null hypothesis H₀ when in fact H₀ is true.

  • Size of this error is denoted α.
  • This is the probability of falsely rejecting the null hypothesis.
  • Example: concluding that a treatment has an effect when it actually does not.

❌ Type II error

Type II error: accepting the null hypothesis H₀ when in fact H₀ is false (i.e., when the alternative hypothesis Hₐ is true).

  • Size of this error is denoted β.
  • This is the probability of failing to detect a true effect.
  • Example: concluding that a treatment has no effect when it actually does.

🔄 The trade-off problem

  • Ideally, both α and β should be 0.
  • In practice, this is impossible: as we decrease α, we tend to increase β, and vice versa.
  • We cannot define a critical region K that simultaneously minimizes both errors.
  • This fundamental trade-off requires a decision rule.

🎯 The Neyman-Pearson principle

📜 The principle statement

Neyman-Pearson principle: Among all tests or critical regions possessing the same size type I error α, choose one for which the size of the type II error β is as small as possible.

  • First, fix the size of the type I error α.
  • Then, among all tests with that fixed α, select the one with the smallest β.
  • This gives a systematic method for choosing the critical region K.

🤔 Why fix α rather than minimize something else?

  • The justification comes from testing situations where one type of error is more serious than the other.
  • Hypotheses are stated so that the type I error is the more serious error.
  • By fixing α to be small (usually 0.05 or 0.01), we ensure protection against the more serious error.
  • Don't confuse: other principles could be suggested (e.g., minimizing α + β), but the Neyman-Pearson principle has proved very useful in practice and is the basis for methods in this book.

🔧 How the principle works

  1. Specify the desired size α for the type I error (e.g., 0.01 or 0.05).
  2. Among all possible critical regions K that give this same α, find the one that minimizes β.
  3. This automatically determines the best critical region K.

📐 Example application: comparing critical regions

🔍 The setup

  • The excerpt presents a probability density function for y: p(y|x) = x e^(-yx), x > 0, y ≥ 0.
  • Two simple hypotheses: H₀ (with parameter x₀ = 2) and Hₐ (with parameter xₐ = 1).
  • Question: should the critical region K be right-sided or left-sided?

➡️ Right-sided critical region

  • Define K as y ≥ c for some constant c.
  • Compute the size of type I error α using the integral under H₀.
  • Compute the size of type II error β using the integral under Hₐ.

⬅️ Left-sided critical region

  • Define K as y ≤ c′ for some constant c′.
  • Compute α and β for this alternative critical region.

🏆 Comparing the two tests

  • According to the Neyman-Pearson principle, both tests should have the same size α.
  • Set the two α values equal: this determines the relationship between c and c′.
  • Compare the corresponding β values: the excerpt shows that β (right-sided) < β (left-sided).
  • Conclusion: the right-sided critical region K is the best test in the sense of the Neyman-Pearson principle, because it has smaller β for the same α.
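
A sketch of this comparison; for p(y|x) = x e^(−yx) the tail integrals have closed forms, and α = 0.05 is an assumed illustrative size.

```python
import math

x0, xa, alpha = 2.0, 1.0, 0.05   # H0: x = 2 versus Ha: x = 1

# Right-sided K = {y >= c}: alpha = exp(-x0*c)       =>  c  = -ln(alpha)/x0
c = -math.log(alpha) / x0
beta_right = 1 - math.exp(-xa * c)       # P(y < c | Ha)

# Left-sided K = {y <= c'}: alpha = 1 - exp(-x0*c')  =>  c' = -ln(1-alpha)/x0
c_prime = -math.log(1 - alpha) / x0
beta_left = math.exp(-xa * c_prime)      # P(y > c' | Ha)

print(f"beta right-sided: {beta_right:.2f}")   # ~0.78
print(f"beta left-sided:  {beta_left:.2f}")    # ~0.97 -> right-sided wins
```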

📋 General steps in hypothesis testing

🗂️ Step-by-step procedure

The excerpt summarizes the main steps for testing hypotheses:

| Step | Action | Purpose |
| --- | --- | --- |
| (a) | Identify H₀ and Hₐ from experimental data and assertions | Define what you are testing |
| (b) | Choose the form of critical region K | Use the Neyman-Pearson principle to select the best form |
| (c) | Specify the size α of the type I error | Usually 0.05 or 0.01; use tables to determine the location of K |
| (d) | Compute the size β of the type II error | Ensure reasonable protection against type II errors |
| (e) | Determine whether observation y falls in K | Reject H₀ if y ∈ K; accept H₀ if y ∉ K |

⚠️ Important caution

  • Never claim that the hypotheses have been proved false or true by the testing.
  • Hypothesis testing provides evidence for or against hypotheses, but does not constitute proof.
  • Don't confuse: "rejecting H₀" means the data are inconsistent with H₀ at the chosen α level, not that H₀ is definitively false.

General steps in testing hypotheses

1.5 General steps in testing hypotheses

🧭 Overview

🧠 One-sentence thesis

The general hypothesis-testing procedure follows five systematic steps—from formulating hypotheses through choosing a critical region to making a decision—while recognizing that no test can definitively prove a hypothesis true or false.

📌 Key points (3–5)

  • The five-step framework: identify hypotheses, choose critical region form, specify type I error size, compute type II error, and make a decision based on whether the observation falls in the critical region.
  • Neyman-Pearson principle guides choice: use this principle to select the form of the critical region that gives the best test.
  • Two error types must be balanced: specify the acceptable type I error (α) and verify that type II error (β) provides reasonable protection.
  • Common confusion: rejecting or accepting a hypothesis does not prove it false or true—testing only provides a decision rule under uncertainty.
  • Critical region determines the decision: reject the null hypothesis if the observation falls in K, accept if it does not.

📋 The five-step testing procedure

📋 Step (a): Identify hypotheses

  • From the experimental data and the assertions to be examined, identify:
    • The appropriate null hypothesis (H₀)
    • The alternative hypothesis (Hₐ)
  • This step depends on the nature of the data and what you want to test.

📋 Step (b): Choose the critical region form

  • Choose the form of the critical region K that is likely to give the best test.
  • Use the Neyman-Pearson principle to make this choice.
  • The excerpt's earlier comparison showed that a right-sided critical region can outperform a left-sided one under this principle.
  • Example: For the same type I error size, compare type II errors to determine which critical region form is superior.

📋 Step (c): Specify type I error size

  • Specify the size of the type I error, α, that you wish to assign to the testing process.
  • Use tables to determine the location of the critical region K from this specified α.
  • This step sets the threshold for how much false-positive risk you will tolerate.

📋 Step (d): Compute type II error size

  • Compute the size of the type II error (β).
  • Purpose: to ensure that there exists reasonable protection against type II errors.
  • Don't confuse: both error types matter—specifying α alone is not enough; you must verify that β is also acceptable.

📋 Step (e): Make the decision

After the test has been explicitly formulated:

  • Determine whether the sample or observation y falls in the critical region K or not.
  • Decision rule:
    • Reject H₀ if y ∈ K
    • Accept H₀ if y ∉ K
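
A minimal end-to-end sketch of steps (a) through (e), reusing the scalar mean test from the earlier examples; all numerical values (x₀, xₐ, σ, and the single observation) are illustrative.

```python
from scipy.stats import norm

x0, xa, sigma = 1.0, 7.0, 2.0            # (a) H0: E(y) = x0 vs Ha: E(y) = xa > x0

alpha = 0.05                             # (c) chosen size of the type I error
k_a = x0 + sigma * norm.ppf(1 - alpha)   # (b) right-sided K = {y : y >= k_a}

beta = norm.cdf((k_a - xa) / sigma)      # (d) P(y < k_a | Ha)
print(f"k_a = {k_a:.2f}, beta = {beta:.3f}")

y_obs = 5.1                              # the single observation
if y_obs >= k_a:                         # (e) does y fall in K?
    print("reject H0 (not: 'H0 has been proved false')")
else:
    print("accept H0 (not: 'H0 has been proved true')")
```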

⚠️ Critical limitation of hypothesis testing

⚠️ Tests do not prove hypotheses

Never claim that the hypotheses have been proved false or true by the testing.

  • Testing provides a decision procedure, not proof.
  • Even when you reject H₀, you have not proven it false; you have only decided that the evidence is inconsistent with it under your chosen error rates.
  • Even when you accept H₀, you have not proven it true; you have only decided that the evidence does not warrant rejection.
  • This is a fundamental limitation of the hypothesis-testing framework.

🔗 Connection to simple likelihood ratio testing

🔗 Simple hypotheses context

The excerpt transitions to testing simple hypotheses:

  • A simple null hypothesis H₀ specifies a single parameter value.
  • A simple alternative hypothesis Hₐ also specifies a single parameter value.
  • Example context: deciding whether an observation y came from distribution p(y|x₀) or from distribution p(y|xₐ).

🔗 Likelihood ratio principle

The simple likelihood ratio test compares two likelihood values:

  • Reject H₀ if the ratio p(y|x₀) / p(y|xₐ) is small (less than a positive constant a).
  • This means: reject H₀ if it is more likely that the observation came from Hₐ than from H₀.
  • The constant a defines a family of tests; different values of a yield different tests.
  • This approach extends the maximum likelihood principle from estimation theory to hypothesis testing.

🔗 How it works

  • In estimation theory: maximize p(y|x) over all possible x to find the most likely parameter value.
  • In hypothesis testing: only compare the two specific likelihood values p(y|x₀) and p(y|xₐ).
  • Decide that y came from H₀ if p(y|x₀) > p(y|xₐ); decide y came from Hₐ if p(y|x₀) < p(y|xₐ).
  • This method produces a family of tests that contain "some good tests" according to the excerpt.

The simple likelihood ratio test

2.1 The simple likelihood ratio test

🧭 Overview

🧠 One-sentence thesis

The simple likelihood ratio test provides a systematic method for choosing between two simple hypotheses by comparing their likelihood values, and the Neyman-Pearson theorem proves that this test is most powerful among all tests of the same size.

📌 Key points (3–5)

  • What the test does: compares the likelihood of an observation under the null hypothesis versus the alternative hypothesis, rejecting the null when the ratio is small.
  • How it works: reject H₀ if p(y|x₀) / p(y|xₐ) < a, where a is a positive constant that defines different tests.
  • Why it matters: the Neyman-Pearson theorem guarantees that the simple likelihood ratio test is a most powerful test—it minimizes type II error (β) for any fixed type I error (α).
  • Common confusion: the test is not about proving hypotheses true or false; it is about deciding which distribution the observation more likely came from.
  • Power vs type II error: power (γ = 1 - β) is the probability of correctly rejecting H₀ when Hₐ is true; higher power means better protection against type II errors.

🎯 The testing problem setup

🎯 Simple hypotheses

A simple hypothesis specifies a single, known value for the parameter.

  • The excerpt considers testing a simple null hypothesis H₀ against a simple alternative hypothesis Hₐ.
  • Both hypotheses specify exact parameter values: H₀: x = x₀ versus Hₐ: x = xₐ.
  • The m×1 random vector y is distributed according to one of these two parameter values.

🔍 The decision question

  • Given an observation y, determine from which distribution the observation came: from p(y|x₀) or from p(y|xₐ)?
  • This is not estimation (finding an unknown parameter) but decision-making (choosing between two known possibilities).
  • The method is closely related to the maximum likelihood principle from estimation theory.

🧩 Likelihood principle applied

  • In estimation theory: maximize p(y|x) over all possible x to find the most likely parameter value.
  • In hypothesis testing: only compare two specific likelihood values, p(y|x₀) and p(y|xₐ).
  • Intuition: decide that y came from H₀ if p(y|x₀) > p(y|xₐ), and from Hₐ if p(y|x₀) < p(y|xₐ).
  • The higher the probability (likelihood) of the observed y under a given parameter, the more attracted we are to that explanation.

📐 The simple likelihood ratio test definition

📐 Test structure

The simple likelihood ratio test is defined by: reject H₀ if p(y|x₀) / p(y|xₐ) < a, where a is a positive constant.

  • For each different value of a, we have a different test.
  • The test says to reject H₀ if the ratio of likelihoods is small—that is, if it is more likely that the observation came from p(y|xₐ) than from p(y|x₀).
  • This expands the simple comparison into a family of tests.

🔧 How to execute the test

The excerpt outlines a five-step procedure (referenced from earlier material):

  1. Choose hypotheses: formulate the null H₀ and alternative Hₐ.
  2. Choose critical region form K: use the Neyman-Pearson principle to select the best test form.
  3. Specify type I error size α: fix the acceptable probability of rejecting H₀ when it is true.
  4. Compute type II error size β: ensure reasonable protection against accepting H₀ when Hₐ is true.
  5. Make the decision: reject H₀ if y ∈ K, accept if y ∉ K; never claim the hypotheses have been proved true or false.

⚠️ Important warning

  • The excerpt emphasizes: "Never claim however that the hypotheses have been proved false or true by the testing."
  • Testing is about decision-making under uncertainty, not about proof.

🧪 Example 1: Testing variance (normal distribution)

🧪 Problem setup

  • The m×1 random vector y is normally distributed.
  • H₀: y ~ N(0, σ₀² I_m) versus Hₐ: y ~ N(0, σₐ² I_m), with σₐ² > σ₀².
  • Both hypotheses have zero mean; only the variance differs.

🧮 Deriving the test

Starting with the likelihood ratio:

  • p(y|x₀) = (2πσ₀²)^(−m/2) exp(−y'y / (2σ₀²)).
  • p(y|xₐ) = (2πσₐ²)^(−m/2) exp(−y'y / (2σₐ²)).
  • The ratio simplifies to: (σₐ²/σ₀²)^(m/2) exp(−(y'y/2)(1/σ₀² − 1/σₐ²)).

Taking logarithms and simplifying (since σₐ² > σ₀², the coefficient −(1/σ₀² − 1/σₐ²)/2 of y'y is negative, so dividing by it reverses the inequality):

  • The test becomes: reject H₀ if y'y > k_a, where k_a is a critical value determined by a.

🎲 Intuitive appeal

  • For m = 1, it seems intuitively appealing to reject H₀ if the observation y is remote from the zero mean, symmetric about 0.
  • For m > 1, reject H₀ if y'y (the squared length of the observation vector) is large.
  • The simple likelihood ratio test produces exactly this critical region, matching intuition.

📊 Computing critical values and errors

  • Under H₀, y'y / σ₀² is distributed as a central chi-squared distribution with m degrees of freedom.
  • Use chi-squared tables to find k_a from the chosen α.
  • Example values (excerpt provides tables):
    • For m = 1, σ₀² = 2: α = 0.05 gives k_a = 7.68.
    • For m = 4, σ₀² = 2: α = 0.05 gives k_a = 18.98.
  • Under Hₐ, y'y / σₐ² is also chi-squared with m degrees of freedom; use this to compute β.
  • Example: for m = 1, σₐ² = 4, α = 0.05, the power 1 − β ≈ 0.17, i.e., β ≈ 0.83.
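
A sketch reproducing these chi-squared values with scipy, using σ₀² = 2, σₐ² = 4, and α = 0.05 from the example.

```python
from scipy.stats import chi2

sigma0_sq, sigma_a_sq, alpha = 2.0, 4.0, 0.05

for m in (1, 4):
    # Under H0, y'y / sigma0^2 ~ chi2(m), so k_a = sigma0^2 * chi2_(1-alpha, m).
    k_a = sigma0_sq * chi2.ppf(1 - alpha, df=m)
    # Under Ha, y'y / sigma_a^2 ~ chi2(m); power = P(y'y >= k_a | Ha).
    gamma = chi2.sf(k_a / sigma_a_sq, df=m)
    print(f"m = {m}: k_a = {k_a:.2f}, power = {gamma:.2f}, beta = {1 - gamma:.2f}")
# m = 1: k_a = 7.68, power ~ 0.17;  m = 4: k_a = 18.98, power ~ 0.31
```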

📈 Effect of sample size

  • Comparing m = 1 and m = 4 at the same α: β is smaller for m = 4 than for m = 1.
  • Increasing the number of observations increases the probability of correctly rejecting H₀ when Hₐ is true (higher power).
  • Don't confuse: if the alternative variance is closer to the null variance (e.g., σₐ² = 3 instead of 4), β increases—harder to distinguish the two hypotheses.

🧪 Example 2: Testing mean (normal distribution, known variance)

🧪 Problem setup

  • y is distributed as N(x, σ²) with known variance σ².
  • H₀: E(y) = x₀ versus Hₐ: E(y) = xₐ, with xₐ > x₀.

🧮 Deriving the test

  • The likelihood ratio involves exp(- (y - x₀)² / (2σ²)) versus exp(- (y - xₐ)² / (2σ²)).
  • Taking logarithms and simplifying: the test reduces to reject H₀ if y > k_a.
  • This is identical to the critical region chosen on intuitive grounds in an earlier example.

🔄 Standardized form

  • Transform y to the standard normal variable: (y - x₀) / σ.
  • The test becomes: reject H₀ if (y − x₀)/σ > (k_a − x₀)/σ.
  • This standardized form is useful for computing α and β using standard normal tables.

🖼️ Critical region visualization

  • The critical region K is the set of y values greater than k_a.
  • Figure 2.4 in the excerpt shows this one-sided critical region.

🧪 Example 3: Testing mean (normal distribution, another case)

🧪 Problem setup

  • y is distributed as N(x, σ²).
  • H₀: E(y) = x₀ versus Hₐ: E(y) = xₐ (no restriction on whether xₐ > x₀ or xₐ < x₀ in this excerpt snippet).

🧮 Deriving the test

  • The likelihood ratio simplifies through logarithms and algebraic manipulation.
  • The final test form depends on the relationship between xₐ and x₀.
  • The excerpt references comparison with Example 4 of the previous chapter, suggesting consistency with earlier intuitive choices.

🏆 Most powerful tests and the Neyman-Pearson theorem

🏆 Power of a test

The power of a test (γ) is the probability of correctly rejecting H₀ when Hₐ is true.

  • Power γ = 1 - β, where β is the type II error.
  • Power is calculated as: γ = integral over K of p(y|xₐ) dy.
  • Higher power means better ability to detect when Hₐ is true.

🥇 Definition of most powerful test

A test with critical region K and size α is a most powerful test of size α if and only if: (i) the probability that y falls in K under H₀ equals α, and (ii) for any other test with critical region K' and the same size α, the power of K is at least as large as the power of K'.

  • Among all tests with the same type I error α, the most powerful test minimizes type II error β (or equivalently, maximizes power γ).
  • This is the Neyman-Pearson testing principle rephrased in terms of power.

🎖️ Neyman-Pearson theorem

Let y be a sample from p(y|x), where x is one of two known values x₀ or xₐ, and let 0 < α < 1 be fixed. Let a be a positive constant and K be a subset of the sample space satisfying:

  • (i) the integral over K of p(y|x₀) dy = α, and
  • (ii) p(y|x₀) / p(y|xₐ) < a for y in K, and p(y|x₀) / p(y|xₐ) ≥ a for y not in K.

Then the test corresponding to the critical region K (the simple likelihood ratio test) is a most powerful test of size α for testing H₀: x = x₀ versus Hₐ: x = xₐ.

🔍 Why this matters

  • The theorem guarantees optimality: the simple likelihood ratio test is not just intuitively appealing, it is provably the best test in the Neyman-Pearson sense.
  • For any fixed α, no other test can achieve lower β (or higher power).
  • This provides a systematic, principled method for constructing optimal tests.

📝 Proof sketch

The excerpt begins the proof:

  • Let K' be any other critical region of size α.
  • Both K and K' satisfy: integral over K of p(y|x₀) dy = integral over K' of p(y|x₀) dy = α.
  • The common part of K and K' cancels out, reducing the comparison to the non-overlapping parts.
  • The power of K is: integral over K of p(y|xₐ) dy.
  • The power of K' is: integral over K' of p(y|xₐ) dy.
  • The proof (not completed in the excerpt) shows that the power of K is at least as large as the power of K', establishing optimality.

🔄 Relationship to estimation theory

🔄 Shared principle

  • Both estimation and hypothesis testing ask: which parameter value x most likely produced the observed y?
  • In estimation: maximize p(y|x) over all possible x (no constraints).
  • In hypothesis testing: compare p(y|x) for only two specific values, x₀ and xₐ.

🔄 Maximum likelihood connection

  • The simple likelihood ratio test is closely related to the maximum likelihood principle.
  • Instead of finding the maximum, we compare two specific likelihood values.
  • This connection explains why the method is systematic and theoretically grounded.

Most Powerful Tests

2.2 Most powerful tests

🧭 Overview

🧠 One-sentence thesis

The Neyman-Pearson theorem proves that the simple likelihood ratio test is the most powerful test of a given size, meaning it maximizes the probability of correctly rejecting the null hypothesis when the alternative is true.

📌 Key points (3–5)

  • Power definition: the probability of correctly rejecting the null hypothesis when the alternative is true (denoted 1 - β or g).
  • Most powerful test criterion: among all tests with the same size α, choose the one with the smallest type II error β (equivalently, the largest power g).
  • Neyman-Pearson theorem: the simple likelihood ratio test is a most powerful test—it achieves the highest power for a given size α.
  • Common confusion: power vs size—size α is the probability of rejecting when the null is true (type I error), while power g is the probability of rejecting when the alternative is true (correct decision).
  • Practical implication: power increases when the hypotheses are farther apart or when measurement precision improves (smaller standard deviation).

🎯 Understanding power and the Neyman-Pearson principle

🎯 What power means

The power of a test is the probability of correctly rejecting the null hypothesis H₀.

  • Power is calculated as: the probability that the sample falls in the critical region K when the alternative hypothesis H_A is true.
  • It is denoted by g or equivalently 1 - β, where β is the type II error probability.
  • Why it matters: higher power means the test is better at detecting when the alternative hypothesis is actually true.
  • Example: if power is 0.80, the test correctly rejects the null 80% of the time when the alternative is true.

🔄 Rephrasing the Neyman-Pearson principle

  • The original principle: among all tests with the same size α, choose the one with the smallest β.
  • Rephrased in terms of power: among all tests with the same size α, choose the one with the largest power g.
  • These are equivalent because g = 1 - β, so maximizing g is the same as minimizing β.

📏 Definition of a most powerful test

A test of H₀: x = x₀ versus H_A: x = x_A with critical region K and size α is most powerful if and only if:

  • (i) The size condition holds: the probability that the sample falls in K when H₀ is true equals α.
  • (ii) For any other test with critical region K′ and the same size α, the power of K is at least as large as the power of K′.

Don't confuse: "most powerful" does not mean "largest critical region"—it means highest probability of correct rejection when the alternative is true, among all tests with the same false-positive rate.

🏆 The Neyman-Pearson theorem

🏆 Statement of the theorem

The Neyman-Pearson theorem establishes that the simple likelihood ratio test is optimal.

Theorem: Let y be a sample from p_y(y|x) where x is one of two known values x₀ or x_A, and let 0 < α < 1 be fixed. Let a be a positive constant and K be a subset of the sample space satisfying:

  • (i) The probability that y falls in K when H₀ is true equals α.
  • (ii) For every point y in K, the likelihood ratio satisfies: p_y(y|x₀) / p_y(y|x_A) ≤ a; and for every point y outside K, the likelihood ratio satisfies: p_y(y|x₀) / p_y(y|x_A) ≥ a.

Then the test corresponding to critical region K (the simple likelihood ratio test) is a most powerful test of size α for testing H₀: x = x₀ versus H_A: x = x_A.

🔍 Key insight from the theorem

  • The theorem does not explicitly tell you how to find the constant a and the region K.
  • Implicitly, it does: the form of the critical region K is given by condition (ii).
  • In practice, you manipulate the inequality p_y(y|x₀) / p_y(y|x_A) ≤ a into an equivalent, easier form and express the test in terms of the new inequality.
  • Example: the excerpt shows transforming the likelihood ratio inequality into a standard normal test statistic.

📐 Proof sketch

The proof compares the simple likelihood ratio test's critical region K with any other critical region K′ of the same size α:

  1. Both K and K′ have size α, so the probability under H₀ is the same for both.
  2. The regions overlap in some common part; the difference in power comes from the non-overlapping parts.
  3. Using condition (ii), every point in K but not in K′ has a favorable likelihood ratio, and every point in K′ but not in K has an unfavorable likelihood ratio.
  4. This guarantees that the power of K is at least as large as the power of K′.
  5. Since K′ was arbitrary, K is most powerful.

Don't confuse: the proof does not require computing the constant a explicitly—it uses the structure of the likelihood ratio inequality to show optimality.

📊 Multi-dimensional example and power properties

📊 Multi-dimensional generalization (Example 4)

The excerpt extends the one-dimensional case to an m-dimensional random vector y, normally distributed with unknown mean and known variance matrix σ²Iₘ.

Hypotheses:

  • H₀: E(y) = x₀
  • H_A: E(y) = x_A

Simple likelihood ratio test: After manipulating the likelihood ratio inequality, the test reduces to:

  • Reject H₀ if: cᵀy > k_α
  • where c = (x_A − x₀)/‖x_A − x₀‖ is the unit vector in the direction of (x_A − x₀), and k_α is the critical value.

Test statistic: the standardized scalar z = cᵀ(y − x₀)/σ has a standard normal distribution under H₀.

📈 How power depends on problem parameters

The power g can be calculated and depends on:

  • Distance between hypotheses: the length ‖x_A − x₀‖ of the vector (x_A − x₀).
  • Measurement precision: the standard deviation σ.

| Factor | Effect on power | Intuition |
| --- | --- | --- |
| Larger distance between H₀ and H_A | Power increases | Hypotheses are easier to distinguish |
| Smaller standard deviation σ | Power increases | Better precision makes detection easier |

Formula: Under H_A, the standardized test statistic is distributed as N(‖x_A − x₀‖/σ, 1). Power is a monotone increasing function of ‖x_A − x₀‖/σ.

Example: If the alternative hypothesis is farther from the null (larger distance), the test has higher power for the same size α. If observations are more precise (smaller σ), the test also has higher power.
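
A sketch of this power computation; the vectors x₀ and x_A and the values of σ and α are assumed for illustration.

```python
import numpy as np
from scipy.stats import norm

x0 = np.array([1.0, 2.0, 0.5])      # mean under H0
xA = np.array([2.0, 3.5, 1.0])      # mean under HA
sigma, alpha = 2.0, 0.05

dist = np.linalg.norm(xA - x0)      # ||xA - x0||
k = norm.ppf(1 - alpha)             # critical value for the standardized statistic

# Under HA, z = c'(y - x0)/sigma ~ N(dist/sigma, 1), so
# power = P(z >= k | HA) = 1 - Phi(k - dist/sigma): monotone in dist/sigma.
gamma = norm.sf(k - dist / sigma)
print(f"distance = {dist:.2f}, power = {gamma:.2f}")
```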

⚠️ Common confusion: size vs power

  • Size α: probability of rejecting H₀ when H₀ is true (type I error, false positive).
  • Power g: probability of rejecting H₀ when H_A is true (correct decision, true positive).
  • These are computed under different hypotheses: size under H₀, power under H_A.
  • The Neyman-Pearson approach fixes α and maximizes g, not the other way around.

The w-test statistic

2.3 The w-test statistic

🧭 Overview

🧠 One-sentence thesis

The w-test statistic provides a simple likelihood ratio test for detecting errors in observations by transforming a composite hypothesis about observation means into a simple hypothesis about misclosures, enabling practical hypothesis testing in geodetic applications.

📌 Key points (3–5)

  • The transformation strategy: composite hypotheses about observations (H₀ and Hₐ) are transformed into simple hypotheses about misclosures t = By, making the likelihood ratio test applicable.
  • What w measures: the w-teststatistic is the orthogonal projection of the misclosure vector t onto the direction of the expected error, standardized by its variance.
  • Common confusion: accepting H₀′ (the simple hypothesis about misclosures) does not necessarily mean accepting H₀ (the composite hypothesis about observations), because H₀ can be false while H₀′ is true; however, rejecting H₀′ does imply rejecting H₀.
  • Power dependencies: the test's ability to detect errors increases with larger error size ∇, better observation precision (smaller σ), and better network design (more condition equations involving the observation).
  • Practical computation: w can be computed directly from least-squares adjustment results using residuals and their variance matrix.

🔄 From composite to simple hypotheses

🔄 The starting problem

The linear model of observation equations is:

  • y is an m×1 random vector, normally distributed
  • Mean: E(y) = Ax
  • Variance matrix: Dᵧ = Qᵧ
  • The null hypothesis H₀: E(y) = Ax is composite because the n×1 parameter vector x is unspecified.

The theory developed in previous sections applies only to simple hypotheses, so a transformation is needed.

🔄 Transformation to condition equations

The equivalent linear model of condition equations is:

  • B E(y) = 0
  • The matrices A and B satisfy: BA = 0
  • This formulation is completely equivalent to the observation equation model.

The null hypothesis becomes H₀: B E(y) = 0, which is still composite because only b linearly independent functions of E(y) are specified, leaving m - b = n functions unspecified.

🔄 The misclosure vector transformation

Define the misclosure vector:

t = By (b×1 vector)

Under H₀, t is normally distributed with:

  • Mean: E(t) = 0
  • Variance matrix: Qₜ = BQᵧB^T

This gives the simple hypothesis H₀′: E(t) = 0.

Important limitation: H₀′ follows from H₀, but H₀ does not follow from H₀′ because matrix B is not invertible. Therefore:

  • If H₀′ is rejected → H₀ must be rejected
  • If H₀′ is accepted → be very careful about accepting H₀

Example scenario: If the true hypothesis has E(y) = Ax + Δy with Δy ≠ 0 or ΔQᵧ ≠ 0, but the vector Δy and columns of ΔQᵧ lie in the nullspace of B (i.e., BΔy = 0 and BΔQᵧB^T = 0), then the distribution of t under the true hypothesis becomes identical to H₀, even though the true hypothesis differs from H₀.

🎯 Alternative hypothesis and error modeling

🎯 Modeling errors in observations

The alternative hypothesis specifies that y has a different mean:

  • Hₐ: E(y) = Ax + cᵧ∇
  • cᵧ is a known m×1 vector modeling the error structure
  • ∇ is a known positive scalar representing the error magnitude

🎯 Types of errors modeled

Single observation error: To model an error in the i-th observation:

  • cᵧ has 1 in the i-th position, 0 elsewhere
  • ∇ represents the blunder in that observation

Systematic error: If all observations contain a systematic error ∇:

  • cᵧ is a vector of all ones
  • ∇ represents the common systematic error

🎯 Effect on misclosures

Under Hₐ, the distribution of t = By becomes:

  • Mean: E(t) = Bcᵧ∇ = cₜ∇
  • Variance matrix: Qₜ = BQᵧB^T (unchanged)

This gives the simple alternative hypothesis Hₐ′: E(t) = cₜ∇, where cₜ = Bcᵧ.

To make Hₐ′ a simple hypothesis, both cₜ and ∇ must be known (though this assumption will be relaxed in later chapters).

📐 The w-teststatistic definition and geometry

📐 Deriving the test

The probability density functions under H₀′ and Hₐ′ are:

  • Under H₀′: normal with mean 0, variance Qₜ
  • Under Hₐ′: normal with mean cₜ∇, variance Qₜ

The simple likelihood ratio test inequality simplifies to:

  • cₜ^T Qₜ⁻¹ t > k_α

Standardizing by dividing by the square root of (cₜ^T Qₜ⁻¹ cₜ) gives the test in standard normal form.

📐 Definition of w

w-teststatistic: w = (cₜ^T Qₜ⁻¹ t) / sqrt(cₜ^T Qₜ⁻¹ cₜ)

The simple likelihood ratio test becomes:

  • Reject H₀′ (and hence H₀) if w > k_α

📐 Distribution of w

Under H₀′: w is distributed as N(0, 1) (standard normal).
Under Hₐ′: w is distributed as N(∇·sqrt(cₜ^T Qₜ⁻¹ cₜ), 1).

📐 Geometric interpretation in misclosure space

Define an inner product in the b-dimensional space with metric Qₜ⁻¹:

  • Norm: ||t|| = sqrt(t^T Qₜ⁻¹ t)
  • Inner product: ⟨t₁, t₂⟩ = t₁^T Qₜ⁻¹ t₂

Then w can be written as:

  • w = ⟨t, cₜ⟩ / ||cₜ||

Interpretation: w is the orthogonal projection of the misclosure vector t onto the line with direction vector cₜ, in the metric defined by Qₜ⁻¹.

The critical region K consists of all t such that this projection exceeds k_α.

🔧 Expression in terms of observations and residuals

🔧 Using observation quantities

Substituting t = By, Qₜ = BQᵧB^T, and cₜ = Bcᵧ into the w-teststatistic gives:

  • w = (cᵧ^T B^T (BQᵧB^T)⁻¹ By) / sqrt(cᵧ^T B^T (BQᵧB^T)⁻¹ Bcᵧ)

🔧 Using least-squares residuals

From adjustment theory, the least-squares residual vector and its variance matrix are:

  • ê = Qᵧ B^T (BQᵧB^T)⁻¹ By
  • Q_ê = Qᵧ - Qᵧ B^T (BQᵧB^T)⁻¹ BQᵧ

Substituting these into the expression for w gives:

  • w = (cᵧ^T Qᵧ⁻¹ ê) / sqrt(cᵧ^T Qᵧ⁻¹ Q_ê Qᵧ⁻¹ cᵧ)

Practical advantage: w can be computed directly from the results of least-squares adjustment of either the observation equation model or the condition equation model.
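
The following is a self-contained sketch (Python with numpy; the small model, variance matrix, and error vector are invented for illustration) that computes w both from the misclosures and from the least-squares residuals, confirming that the two expressions agree:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy model (assumed for illustration): m = 4 observations, n = 2 parameters
A = np.array([[1., 0.], [0., 1.], [1., 1.], [1., -1.]])
Qy = np.diag([1.0, 1.0, 2.0, 2.0])           # observation variance matrix
Wy = np.linalg.inv(Qy)
y = rng.normal(size=4)

# B spans the left null space of A (so B A = 0), giving misclosures t = B y
U, _, _ = np.linalg.svd(A)
B = U[:, 2:].T                               # b x m with b = m - n = 2
cy = np.array([1., 0., 0., 0.])              # error in the 1st observation

# misclosure form: w = c_t' Qt^-1 t / sqrt(c_t' Qt^-1 c_t)
t, ct = B @ y, B @ cy
Wt = np.linalg.inv(B @ Qy @ B.T)
w_t = (ct @ Wt @ t) / np.sqrt(ct @ Wt @ ct)

# residual form: w = c_y' Qy^-1 e / sqrt(c_y' Qy^-1 Qe Qy^-1 c_y)
Qx = np.linalg.inv(A.T @ Wy @ A)
e = y - A @ Qx @ A.T @ Wy @ y                # least-squares residuals under H0
Qe = Qy - A @ Qx @ A.T                       # variance matrix of the residuals
w_e = (cy @ Wy @ e) / np.sqrt(cy @ Wy @ Qe @ Wy @ cy)

print(w_t, w_e)                              # identical up to rounding
```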

🔧 Geometric interpretation in sample space

Using the projection operator P_A^⊥ (orthogonal projection onto the orthogonal complement of the range space of A):

  • P_A^⊥ y = ê (the residual vector)
  • P_A^⊥ cᵧ models the projected error direction

Define an inner product in the m-dimensional sample space with metric Qᵧ⁻¹:

  • w = ⟨P_A^⊥ y, P_A^⊥ cᵧ⟩ / ||P_A^⊥ cᵧ||

Interpretation: w is the orthogonal projection of P_A^⊥ y onto the line with direction P_A^⊥ cᵧ, where both are projections onto the orthogonal complement of R(A).

📊 Power and design considerations

📊 Computing the power

The power γ (probability of correctly rejecting H₀ when Hₐ is true) is:

  • γ = P(w > k_α | Hₐ′)

Since w is distributed as N(∇·sqrt(cₜ^T Qₜ⁻¹ cₜ), 1) under Hₐ′, this becomes:

  • γ = Φ(∇·sqrt(cₜ^T Qₜ⁻¹ cₜ) - k_α)

where Φ is the standard normal cumulative distribution function.

📊 Three factors affecting power

| Factor | Effect on power | Why |
| --- | --- | --- |
| Error size ∇ | Larger ∇ → higher γ | Larger errors are easier to detect |
| Observation precision σ | Smaller σ → higher γ | Better precision makes errors more distinguishable |
| Network design | Better design → higher γ | Reflected in the scalar sqrt(cₜ^T Qₜ⁻¹ cₜ) |

📊 Network design impact

The design/structure of the network is contained in:

  • sqrt(cₜ^T Qₜ⁻¹ cₜ) = sqrt(cᵧ^T B^T (BQᵧB^T)⁻¹ Bcᵧ)

This depends on:

  • Matrix B (the structure of condition equations)
  • Matrix Qᵧ (the precision of observations)

Example from levelling networks: An observation that appears in two independent condition equations (two loops) has higher detection power than one appearing in only one condition equation (one loop). Specifically, if y₂ appears in two loops, the power is higher than if it appears in only one loop, for the same error size ∇ and precision σ.

Design principle: When designing geodetic networks, ensure that observations occur in enough condition equations to maintain adequate error-detection power.
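
As a sketch of this design effect (Python with numpy/scipy; the loop structure, σ, ∇, and α are illustrative assumptions), compare sqrt(cₜ^T Qₜ⁻¹ cₜ), and the resulting power, when y₂ closes one loop versus two:

```python
import numpy as np
from scipy.stats import norm

sigma, nabla, alpha = 1.0, 3.0, 0.05         # illustrative values
k_alpha = norm.ppf(1 - alpha)

def detectability(B, cy, sigma):
    """sqrt(c_t' Qt^-1 c_t) for Qy = sigma^2 I."""
    Qt = sigma**2 * (B @ B.T)
    ct = B @ cy
    return np.sqrt(ct @ np.linalg.solve(Qt, ct))

# one loop: y1 + y2 + y3 = 0; test an error in y2
B1 = np.array([[1., 1., 1.]])
d1 = detectability(B1, np.array([0., 1., 0.]), sigma)

# two loops: y1 + y2 + y3 = 0 and y2 - y4 = 0 (y4 remeasures y2)
B2 = np.array([[1., 1., 1., 0.],
               [0., 1., 0., -1.]])
d2 = detectability(B2, np.array([0., 1., 0., 0.]), sigma)

for d in (d1, d2):
    power = norm.sf(k_alpha - d * nabla)     # gamma = Phi(d*nabla - k_alpha)
    print(f"sqrt(ct' Qt^-1 ct) = {d:.3f}, power = {power:.3f}")
```

The two-loop design gives the larger scalar and hence the larger power, in line with the design principle above.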

📊 General power formula

For the general test, the power can be written as:

  • γ = Φ(sqrt(cₜ^T Qₜ⁻¹ cₜ) · ∇ - k_α)

This shows that γ decreases if:

  • ∇ decreases (smaller errors)
  • sqrt(cₜ^T Qₜ⁻¹ cₜ) decreases (worse design or precision)

📋 Summary of derivation steps

The table in the excerpt summarizes the transformation process:

  1. Start: Composite hypotheses H₀ and Hₐ about y (normally distributed, m×1, with variance Qᵧ)

    • H₀: E(y) = Ax or B E(y) = 0
    • Hₐ: E(y) = Ax + cᵧ∇ or B E(y) = cₜ∇
  2. Transformation: Define t = By with E(t) = B E(y) and Qₜ = BQᵧB^T

  3. Simple hypotheses:

    • H₀′: E(t) = 0
    • Hₐ′: E(t) = cₜ∇, where cₜ = Bcᵧ
  4. Test: Simple likelihood ratio test of size α

    • Reject H₀′ (and hence H₀) if w > k_α
    • w = (cₜ^T Qₜ⁻¹ t) / sqrt(cₜ^T Qₜ⁻¹ cₜ)
    • Equivalently: w = (cᵧ^T Qᵧ⁻¹ ê) / sqrt(cᵧ^T Qᵧ⁻¹ Q_ê Qᵧ⁻¹ cᵧ)

This w-teststatistic plays a very important role in hypothesis testing for geodetic applications.


The w-Teststatistic and v-Teststatistic

2.4 The v-Teststatistic

🧭 Overview

🧠 One-sentence thesis

The w-teststatistic and v-teststatistic are mathematically equivalent simple likelihood ratio tests that differ in their application context: w tests whether observations match a known structure, while v tests whether parameters are significantly different from zero.

📌 Key points (3–5)

  • What w tests: whether the expected value of transformed observations equals zero (null hypothesis) or a known vector (alternative hypothesis).
  • What v tests: whether a linear function of parameters equals zero (null hypothesis) or a known nonzero scalar (alternative hypothesis).
  • Mathematical equivalence: both are simple likelihood ratio tests with the same structure, but v is expressed in terms of parameter estimates rather than observation residuals.
  • Common confusion: w and v test different things (observations vs parameters) but use the same underlying likelihood ratio framework; rejecting the transformed hypothesis H₀' implies rejecting the original H₀, but accepting H₀' does not necessarily mean accepting H₀.
  • Practical use: v is particularly useful for testing the significance of parameters in mixed models, with an intuitive interpretation when testing individual parameter components.

📐 The w-teststatistic framework

📐 What w measures

The w-teststatistic: a simple likelihood ratio test for testing whether the expected value of transformed observations equals zero versus a known nonzero vector.

  • The test operates on transformed observations t = By, where B is a transformation matrix and y is the original observation vector.
  • The null hypothesis H₀ states that the expected value E(t) = 0.
  • The alternative hypothesis Hₐ states that E(t) = cₜ, where cₜ is a known vector.
  • Both hypotheses are simple (not composite), meaning they specify exact parameter values.

🔢 The w formula

The w-teststatistic is computed as:

w = (cₜ' Qₜ⁻¹ t) / sqrt(cₜ' Qₜ⁻¹ cₜ)

where:

  • t is the transformed observation vector
  • cₜ is the known vector under the alternative hypothesis
  • Qₜ is the variance matrix of t

Alternative expression in sample space:

w = (c_y' Q_y⁻¹ ê) / sqrt(c_y' Q_y⁻¹ Q_ê Q_y⁻¹ c_y)

where:

  • ê is the least-squares residual vector
  • Q_ê is the variance matrix of the residuals
  • c_y is the known vector in the original observation space

⚖️ The decision rule

The simple likelihood ratio test of size α rejects H₀ if:

w > k_α

where k_α is a threshold determined by the significance level α.

  • The test compares the observed w value to a critical value.
  • Larger w values provide stronger evidence against the null hypothesis.
  • The network structure is reflected in matrix B, and observation precision is reflected in Q_y.

🎯 The v-teststatistic framework

🎯 What v measures

The v-teststatistic: a simple likelihood ratio test for testing whether a linear function of parameters equals zero versus a known nonzero scalar.

  • This test addresses the problem of testing parameter significance.
  • The null hypothesis H₀ states that b'x = 0, where b is a known vector and x is the parameter vector.
  • The alternative hypothesis Hₐ states that b'x = ω, where ω is a known nonzero scalar.
  • Mathematically equivalent to the w-test but expressed in terms of parameters rather than observations.

🔄 Transformation to equivalent form

To apply the likelihood ratio framework, the hypotheses are rewritten using:

General solution of b'x = ω:

x = b(b'b)⁻¹ω + b̂λ

where:

  • b(b'b)⁻¹ω is a particular solution
  • b̂ is an n×(n-1) matrix whose columns are orthogonal to b (so b'b̂ = 0)
  • λ is a vector of free parameters

This transforms the hypotheses into:

  • H₀: E(y) = Ab̂λ (equivalent to b'x = 0)
  • Hₐ: E(y) = Ab(b'b)⁻¹ω + Ab̂λ (equivalent to b'x = ω)

The matrix Ab̂ plays the role of A in the w-test framework, and Ab(b'b)⁻¹ plays the role of c_y.

🔢 The v formula

The v-teststatistic simplifies to:

v = (b'x̂_A) / sqrt(b'Q_x̂_A b)

where:

  • x̂_A is the least-squares estimate of parameters under the alternative hypothesis
  • Q_x̂_A is the variance matrix of x̂_A

Under Hₐ, its expected value is:

E(v | Hₐ) = ω / sqrt(b'Q_x̂_A b)

This shows that v measures how many standard deviations the hypothesized value ω lies from zero.

⚖️ The decision rule for v

The simple likelihood ratio test of size α rejects H₀ if:

v > k_α

  • Same structure as the w-test but applied to parameters.
  • The test is intuitively appealing: if b = (0...1...0) with 1 in the i-th position, then v = x̂_A,i / σ_x̂_A,i, the i-th parameter estimate divided by its standard deviation, testing whether the i-th parameter is significantly different from zero.

📊 Mixed model context

📊 Least-squares solution

For the mixed model (the hypotheses constrain b'x, but the estimate used in v comes from the unconstrained adjustment):

E(y) = Ax

The least-squares solution is:

x̂_A = (A'Q_y⁻¹A)⁻¹A'Q_y⁻¹y

Q_x̂_A = (A'Q_y⁻¹A)⁻¹

The residual vector and its variance matrix are:

ê = y - Ax̂_A

Q_ê = Q_y - AQ_x̂_A A'

🎲 Distribution under alternative hypothesis

Under Hₐ, the parameter estimate is distributed as:

x̂_A ~ N(b̂λ + b(b'b)⁻¹ω, Q_x̂_A)

Therefore, the v-teststatistic is distributed under Hₐ as:

v ~ N(ω / sqrt(b'Q_x̂_A b), 1)

This is a normal distribution with mean ω / sqrt(b'Q_x̂_A b) and variance 1.

📈 Power of the test

The power of the test (probability of correctly rejecting H₀ when Hₐ is true) is:

γ = P(v > k_α | Hₐ is true)

This depends on:

  • The significance level α (which determines k_α)
  • The true value ω
  • The variance b'Q_x̂_A b

Larger ω or smaller variance leads to higher power.

🔬 Example: levelling network

🔬 Problem setup

A levelling network measures height differences between points 1 and 2.

Assumptions:

  • Observations are normally distributed
  • Observations are uncorrelated
  • All observations have equal variance σ²

Hypotheses:

  • H₀: the height difference between points 1 and 2 equals zero
  • Hₐ: the height difference equals ω (a known nonzero value)

🧮 Computing the teststatistic

To compute v, we need:

  • Vector b = (1, -1)' (representing the height difference)
  • Parameter estimate x̂_A from least-squares adjustment
  • Variance matrix Q_x̂_A

The model is:

E(y) = Ax, b'x = 0 (under H₀) or b'x = ω (under Hₐ)

After least-squares adjustment, substitute the results into (a numeric sketch follows):

v = (b'x̂_A) / sqrt(b'Q_x̂_A b)
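
A numeric sketch (Python with numpy/scipy; the three-point network, datum choice, precision, and observation values are all invented for illustration):

```python
import numpy as np
from scipy.stats import norm

# toy levelling net (assumed): point 3 is the datum; parameters x = (x1, x2)
# y1: 3->1, y2: 3->2, y3: 1->2, all uncorrelated with equal variance sigma^2
A = np.array([[1., 0.], [0., 1.], [-1., 1.]])
sigma = 0.01                                  # illustrative precision [m]
Qy = sigma**2 * np.eye(3)

y = np.array([1.2503, 1.2491, -0.0009])       # made-up observations [m]
b = np.array([1., -1.])                       # tests the difference x1 - x2

Wy = np.linalg.inv(Qy)
Qx = np.linalg.inv(A.T @ Wy @ A)              # variance matrix of x-hat
xhat = Qx @ A.T @ Wy @ y                      # least-squares estimate

v = (b @ xhat) / np.sqrt(b @ Qx @ b)          # the v-teststatistic
k_alpha = norm.ppf(1 - 0.05)                  # one-sided test of size 0.05
print(f"v = {v:.2f}, reject H0: {v > k_alpha}")
```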

✅ Interpretation

  • If v > k_α, reject H₀ and conclude the height difference is significantly different from zero.
  • The test quantifies whether the observed height difference is consistent with the hypothesized value ω.
  • Don't confuse: the test does not tell us the true height difference; it only tests whether a specific hypothesized value (zero vs ω) is more consistent with the data.

🔗 Relationship between w and v

🔗 Structural equivalence

| Aspect | w-teststatistic | v-teststatistic |
| --- | --- | --- |
| Tests | Transformed observations | Parameters |
| Null hypothesis | E(t) = 0 | b'x = 0 |
| Alternative hypothesis | E(t) = cₜ | b'x = ω |
| Formula structure | Ratio involving residuals | Ratio involving parameter estimates |
| Mathematical framework | Same likelihood ratio test | Same likelihood ratio test |

⚠️ Important distinction

  • Both use the simple likelihood ratio test framework.
  • w is expressed in terms of observation residuals and their variances.
  • v is expressed in terms of parameter estimates and their variances.
  • The transformation t = By connects the observation space to the hypothesis space for w.
  • The decomposition x = b(b'b)⁻¹ω + b̂λ connects the parameter space to the hypothesis space for v.

🔄 Acceptance vs rejection

  • Rejecting H₀' (the transformed hypothesis) implies rejecting H₀ (the original hypothesis).
  • Accepting H₀' does not necessarily imply accepting H₀.
  • This asymmetry is fundamental to hypothesis testing: we can only reject or fail to reject, not definitively accept.

The Generalized Likelihood Ratio Test

3.1 The generalized likelihood ratio test

🧭 Overview

🧠 One-sentence thesis

The generalized likelihood ratio test provides a general method for testing composite hypotheses by comparing the maximum likelihood under the null hypothesis to the maximum likelihood over all parameter values, and it tends to reject when this ratio is small.

📌 Key points (3–5)

  • What it tests: composite hypotheses (where parameters can take multiple values) rather than just simple hypotheses.
  • How the test works: computes a ratio of maximum likelihoods—numerator maximizes over the null hypothesis parameter set, denominator maximizes over all possible parameter values.
  • The ratio's range: always lies between 0 and 1 because the denominator maximizes over a larger set than the numerator.
  • Common confusion: this test resembles the simple likelihood ratio test but does not reduce to it; the simple likelihood ratio is not restricted to the interval [0,1].
  • When to reject: reject the null hypothesis when the ratio is smaller than a threshold constant a (where 0 < a < 1), because a small ratio indicates the null hypothesis is unlikely.

🔍 What the test measures

🔍 Composite vs simple hypotheses

Composite hypothesis: the parameter vector can take values from a subset of possible values, not just a single value.

  • The excerpt contrasts this with simple hypotheses tested in the previous chapter.
  • The null hypothesis is denoted H₀ and corresponds to parameter values in subset F₀.
  • The alternative hypothesis is H_A and corresponds to parameter values in the complement of F₀.
  • Example: testing whether a mean equals a specific value (simple) vs whether a mean is greater than a specific value (composite).

📐 The likelihood ratio formula

The generalized likelihood ratio test is defined by:

  • Numerator: maximum of the probability density function over parameter values in F₀ (the null hypothesis set).
  • Denominator: maximum of the probability density function over all parameter values in F (the full parameter space).
  • Decision rule: reject H₀ if this ratio is less than constant a.

The ratio measures how well the null hypothesis explains the data compared to the best possible explanation.

🎯 Properties of the ratio

🎯 Why the ratio lies in [0,1]

  • Lower bound (≥ 0): both numerator and denominator are nonnegative quantities (probabilities).
  • Upper bound (≤ 1): the denominator maximizes over a larger set of parameter values than the numerator, so it cannot be smaller than the numerator.
  • Don't confuse: the simple likelihood ratio from Chapter 2 is not restricted to [0,1].

🎯 Choosing the threshold constant a

  • The constant a must lie in the open interval (0,1).
  • Why exclude a = 0: with a = 0 the test would never reject, yet a ratio of zero is decisive evidence against H₀ and should lead to rejection.
  • Why exclude a = 1: with a = 1 the test would reject for almost any outcome, yet a ratio of one means the null hypothesis explains the data as well as any alternative and should lead to acceptance.

🧠 Intuition and practical considerations

🧠 Why the test makes sense

The ratio tends to be small when H₀ is not true, because:

  • If the null hypothesis is false, the true parameter lies outside F₀.
  • The denominator can find a much better fit by maximizing over all parameters.
  • The numerator is constrained to the null hypothesis set and produces a smaller maximum.
  • Result: the ratio becomes small, triggering rejection.

⚠️ Potential drawbacks

The excerpt identifies two difficulties:

| Challenge | Description |
| --- | --- |
| Finding the maximum | It can be difficult to compute the maximum of the probability density function over the parameter set |
| Finding the distribution | It can be difficult to determine the probability distribution of the ratio, which is needed to evaluate the test's size and power |

  • Size α: the probability of incorrectly rejecting H₀ when it is true.
  • Power γ: the probability of correctly rejecting H₀ when it is false.

🔧 General performance

  • In general (but not always), a generalized likelihood ratio test will be a good test.
  • The excerpt does not guarantee optimality in all cases.

📊 Worked examples

📊 Example with exponential distribution

The excerpt presents an example where:

  • The random variable has an exponential-like probability density function.
  • H₀ is a simple hypothesis (parameter equals a specific value x₀).
  • H_A is a composite hypothesis (parameter is less than x₀).
  • The test reduces to: reject H₀ if y/x₀ > k, where k is a constant greater than 1.

📊 Example with normal distribution (known variance)

Another example considers:

  • A normally distributed random variable with known variance σ².
  • Testing whether the mean equals x₀ (null) versus the mean is greater than x₀ (alternative).
  • The test reduces to: reject H₀ if (y - x₀)/σ > k_α, where k_α > 0.
  • Under H₀, the test statistic (y - x₀)/σ has a standard normal distribution (see the sketch below).
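
A short sketch (Python with scipy; x₀, σ, and the sampled y values are illustrative) showing the reduction for this normal case: the generalized likelihood ratio for H₀: x = x₀ versus Hₐ: x > x₀ depends on y only through (y − x₀)/σ:

```python
from scipy.stats import norm

x0, sigma = 0.0, 1.0                      # assumed null value and precision

def glr(y):
    """Generalized likelihood ratio for H0: x = x0 vs HA: x >= x0."""
    num = norm.pdf(y, loc=x0, scale=sigma)          # H0 is simple
    den = norm.pdf(y, loc=max(y, x0), scale=sigma)  # MLE over x >= x0 is max(y, x0)
    return num / den

for y in [-1.0, 0.0, 0.5, 1.5, 3.0]:
    # the ratio is 1 for y <= x0 and decreases monotonically in (y - x0)/sigma,
    # so "ratio < a" is equivalent to "(y - x0)/sigma > k_alpha"
    print(f"y = {y:4.1f}  ratio = {glr(y):.4f}  z = {(y - x0) / sigma:4.1f}")
```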

📊 Example with normal distribution (unknown variance)

A more complex example involves:

  • A vector of normally distributed observations with unknown variance.
  • Testing whether the variance equals a specific value.
  • The test statistic involves a chi-squared distribution under H₀.
  • The generalized likelihood ratio test reduces to comparing a sum of squared deviations to a threshold.

🎓 Connection to uniformly most powerful tests

🎓 The power function concept

Power function γ(x): the function of parameter x that gives the probability the sample will fall in the critical region when x is the true parameter value.

  • For composite hypotheses, power depends on which specific alternative parameter value is true.
  • To compare tests, we must compare power across all possible alternative values, not just one.

🎓 When the generalized likelihood ratio test is optimal

Uniformly most powerful test: a test with critical region K is uniformly most powerful of size α if it has maximum power for all alternative parameter values among all tests of size α.

  • A uniformly most powerful test does not exist for all testing problems.
  • When one exists, the generalized likelihood ratio test may coincide with it.
  • The Neyman-Pearson theorem can help identify uniformly most powerful tests when H₀ is simple and H_A is composite.
  • If the same test emerges for all alternative parameter values, it is uniformly most powerful; otherwise, no uniformly most powerful test exists.

Uniformly most powerful tests

3.2 Uniformly most powerful tests

🧭 Overview

🧠 One-sentence thesis

A uniformly most powerful test maximizes the probability of correctly rejecting the null hypothesis across all alternative parameter values, though such tests do not exist for every testing problem and often require restricting the class of critical regions through principles like invariance.

📌 Key points (3–5)

  • What the power function measures: the probability that a sample falls in the critical region as a function of the true parameter value, allowing comparison of tests across all alternatives.
  • What makes a test uniformly most powerful: it has the greatest chance of rejecting the null hypothesis whenever it should, uniformly across all alternative parameter values, among all tests of the same size.
  • When uniformly most powerful tests exist: sometimes derivable using the Neyman-Pearson theorem when the null is simple and the alternative composite, by showing the same test works for all alternative parameter values.
  • Common confusion: many testing problems have no uniformly most powerful test because the class of critical regions is too large; restricting the class (e.g., through invariance) can yield a uniformly most powerful invariant test.
  • Key result: generalized likelihood ratio tests in linear models turn out to be uniformly most powerful invariant tests.

📊 Power function and optimality

📊 What the power function is

Power function γ(x): the function of the parameter x that gives the probability that the sample or observation will fall in the critical region of the test when x is the true value of the parameter.

  • Recall that power is the probability of correctly rejecting the null hypothesis.
  • For simple alternatives, power can be calculated for one specific alternative value.
  • For composite alternatives (a class of alternative parameter values), power depends on which particular alternative is true.
  • The power function allows comparing tests across all possible alternative values, not just one.
  • Calculation: γ(x) is the probability that the observation falls in the critical region K when x is the true parameter.

🎯 Definition of uniformly most powerful test

A test of H₀: x in F₀ versus Hₐ: x in F \ F₀ with critical region K is uniformly most powerful of size α if and only if:

  • (i) The maximum of γ(x) over x in F₀ is at most α (the size condition).
  • (ii) γ(x) ≥ γ*(x) for all x in F \ F₀, where γ* is the power function of any competing test with critical region K* and size α.

Key insight: "uniformly" refers to all alternative x values—the test has the greatest chance of rejecting H₀ whenever it should, across the entire alternative space.

🔍 Why uniformly most powerful tests are desirable

  • Among all tests of the same size α, a uniformly most powerful test maximizes power for every possible alternative parameter value.
  • This is "quite a nice test" because it performs optimally no matter which alternative is true.
  • However, such tests do not exist for all testing problems.

🔧 Finding uniformly most powerful tests

🔧 Using the Neyman-Pearson theorem

When H₀ is simple and Hₐ is composite, the Neyman-Pearson theorem can sometimes help:

  1. Choose a particular parameter value x₁ from the alternative space F \ {x₀}.
  2. Apply the Neyman-Pearson theorem to construct the most powerful test for the simple vs. simple problem H₀: x = x₀ versus Hₐ: x = x₁.
  3. If the same test (same critical region) results when x₁ is replaced by any other arbitrary parameter from F \ {x₀}, then this test is uniformly most powerful.
  4. If different tests result for different choices of x₁, then no uniformly most powerful test exists.

Example scenario (from Example 6):

  • Testing H₀: x = x₀ versus Hₐ: x < x₀ yields one test.
  • Testing H₀: x = x₀ versus Hₐ: x > x₀ yields a different test.
  • Since the two tests are not identical, no uniformly most powerful test exists for the two-sided alternative H₀: x = x₀ versus Hₐ: x ≠ x₀.

📉 When no uniformly most powerful test exists

  • Many hypothesis-testing problems have no uniformly most powerful test.
  • The excerpt states: "this is the case for all testing problems that will be considered in the remaining part of these lecture notes."
  • Reason: the class of critical regions being considered is too large.
  • Don't confuse: absence of a uniformly most powerful test does not mean no good test exists—it means no single test is best for all alternatives simultaneously.

🔄 Invariance and restricted classes

🔄 The principle of invariance

When no uniformly most powerful test exists in the full class of critical regions, restrict the class and search for a uniformly most powerful test within that restricted class.

Invariance principle: if a testing problem is invariant under a transformation, the critical region should also be invariant under that transformation.

🔄 Example of invariance (Example 9)

Consider testing:

  • H₀: E(y) = 0 versus Hₐ: E(y) ≠ 0, where y is an m×1 random vector with known covariance.
  • Apply an invertible linear transformation v = Ry.
  • If R is orthogonal (RRᵀ = I_m), the testing problem in terms of v is equivalent to the original problem.
  • Because of equivalence, we want the same test for both problems: if y is in K, then v should be in K, and vice versa.
  • This implies K must be invariant under orthogonal transformations.
  • From the transformation structure, the critical region must have a (hyper)spherical shape centered at 0.
  • Two possibilities: reject if y is too large (outside a sphere) or too small (inside a sphere).

🏆 Uniformly most powerful invariant test

Uniformly most powerful invariant test: a test that is uniformly most powerful within the restricted class of invariant critical regions.

  • In Example 9, the critical region that rejects when the norm of y exceeds a threshold gives the most power.
  • The generalized likelihood ratio test for that problem turns out to be the uniformly most powerful invariant test.
  • Important conclusion (stated without proof): all generalized likelihood ratio tests in the following chapters are in fact uniformly most powerful invariant tests.

🧪 Worked examples

🧪 Example with chi-squared distribution (Example 7)

  • Random variable y has a chi-squared distribution with m degrees of freedom and noncentrality parameter λ.
  • Hypotheses: H₀: λ = 0 versus Hₐ: λ > 0.
  • First consider simple hypotheses: H₀: λ = 0 versus Hₐ: λ = λₐ for a specific λₐ > 0.
  • The simple likelihood ratio is a monotone decreasing function of y.
  • Most powerful test: reject H₀ if y > k_α for some positive constant k_α.
  • Since the inequality y > k_α is independent of the specific value λₐ > 0, the same test works for all λₐ > 0.
  • Conclusion: this test is uniformly most powerful for H₀: λ = 0 versus Hₐ: λ > 0 (the sketch below evaluates its power function).
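
A brief sketch (Python with scipy; m, α, and the λ grid are illustrative) of the power function γ(λ) for this uniformly most powerful chi-squared test; the same critical value k_α serves every λ > 0, and the power increases monotonically in λ:

```python
from scipy.stats import chi2, ncx2

m, alpha = 4, 0.05                      # degrees of freedom and size (assumed)
k_alpha = chi2.ppf(1 - alpha, df=m)     # critical value under H0 (lambda = 0)

# power function: gamma(lambda) = P(y > k_alpha) under a noncentral chi-square
for lam in [0.0, 2.0, 5.0, 10.0, 18.0]:
    gamma = ncx2.sf(k_alpha, df=m, nc=lam) if lam > 0 else alpha
    print(f"lambda = {lam:5.1f}  power = {gamma:.4f}")
```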

🧪 Example with F-distribution (Example 8)

  • Random variable y has an F-distribution with m and n degrees of freedom and noncentrality parameter λ.
  • Hypotheses: H₀: λ = 0 versus Hₐ: λ > 0.
  • Similar reasoning: the simple likelihood ratio is a monotone decreasing function of y.
  • Most powerful test for simple alternatives: reject if y > k_α.
  • Since y > k_α is independent of the specific λₐ > 0, the test is uniformly most powerful for the composite alternative.

🧪 Comparison table: when uniformly most powerful tests exist

| Testing problem | One-sided or two-sided | Uniformly most powerful test exists? | Reason |
| --- | --- | --- | --- |
| H₀: x = x₀ vs. Hₐ: x < x₀ | One-sided | Yes | Same test for all x < x₀ |
| H₀: x = x₀ vs. Hₐ: x > x₀ | One-sided | Yes | Same test for all x > x₀ |
| H₀: x = x₀ vs. Hₐ: x ≠ x₀ | Two-sided | No | Different tests for x < x₀ and x > x₀ |
| Chi-squared, H₀: λ=0 vs. Hₐ: λ>0 | One-sided | Yes | Monotone likelihood ratio |
| F-distribution, H₀: λ=0 vs. Hₐ: λ>0 | One-sided | Yes | Monotone likelihood ratio |

Don't confuse: the generalized likelihood ratio test for a two-sided alternative (Example 6) cannot be uniformly most powerful because no such test exists, but it may still be a good test (and in fact is uniformly most powerful invariant under appropriate restrictions).


The models of condition and observation equations

4.1 The models of condition and observation equations

🧭 Overview

🧠 One-sentence thesis

The generalized likelihood ratio test for linear models can be formulated in multiple equivalent ways—using residuals, fitted values, or misclosures—and is a uniformly most powerful invariant test that compares a restricted null hypothesis against a more relaxed alternative hypothesis with additional explanatory variables.

📌 Key points (3–5)

  • Two equivalent formulations: hypotheses can be stated either as condition equations (B times E(y) equals zero) or as observation equations (E(y) equals A times x), and both lead to the same test.
  • Multiple test expressions: the generalized likelihood ratio test has five equivalent forms (equations 16, 22, 32, 34, 36), but only the null-hypothesis computation is needed—no explicit computation under the alternative hypothesis is required.
  • What the test detects: the test checks whether additional explanatory variables (modeled by the unknown vector in the alternative hypothesis) should be included, such as blunders, refraction effects, or departures from model assumptions.
  • Common confusion: the test result is only indicative, never proof—every model is an approximation, and both type I and type II errors are possible.
  • Distribution and power: the test statistic T_q follows a chi-squared distribution under both hypotheses (with different non-centrality parameters), and the generalized likelihood ratio test is uniformly most powerful invariant.

📐 Formulating hypotheses in linear models

📐 Condition equations vs observation equations

The excerpt presents two ways to express the same hypotheses:

Condition equations (equation 2):

  • Null hypothesis H₀: B E(y) = 0
  • Alternative hypothesis Hₐ: B E(y) = C_t ∇, where ∇ is an unknown q×1 vector
  • B is b-by-m of full rank b; C_t is b-by-q of full rank q

Observation equations (equations 6–7):

  • Null hypothesis H₀: E(y) = Ax
  • Alternative hypothesis Hₐ: E(y) = Ax + C_y ∇
  • A is m-by-n of full rank n; C_y is m-by-q of full rank q

The transformation between them uses the fact that the solution to an inhomogeneous system equals a particular solution plus the homogeneous solution. The matrices satisfy BA = 0 (the rows of B annihilate the columns of A), and C_y satisfies B C_y = C_t.

🎯 How hypotheses are built in practice

  • Practitioners usually start with a model (either condition or observation equations) that becomes the null hypothesis H₀.
  • While formulating H₀, assumptions are made: data are free from blunders, refraction is negligible, points lie in a plane, etc.
  • The alternative hypothesis Hₐ is more relaxed and introduces additional explanatory variables (the unknown vector) to model effects assumed absent in H₀.
  • Example: the unknown vector might model the presence of one or more blunders, refraction effects, or departures from a two-dimensional plane.
  • The test of H₀ versus Hₐ informs whether the additional variables should be taken into account.

Important caveat:

The result of a test is only indicative and never a proof of the correctness of one model over another, because every model is only an approximation and both type I and type II errors are possible.

🧮 Deriving the generalized likelihood ratio test

🧮 Probability densities under H₀ and Hₐ

The excerpt assumes the m-by-1 vector of observables y is normally distributed with known variance matrix Q_y.

  • Under H₀ (equation 8): the density depends on x (n-dimensional parameter).
  • Under Hₐ (equation 9): the density depends on both x and the unknown q-dimensional vector.

The generalized likelihood ratio test compares:

  • Numerator: maximum of the density under H₀ over x.
  • Denominator: maximum of the density under Hₐ over both x and the unknown vector.

🔍 Maximum likelihood and least-squares estimates

  • The value that maximizes the density under H₀ is denoted x-hat₀; under Hₐ, the maximizers are x-hat_A and the estimate of the unknown vector.
  • For normal distributions, maximum likelihood estimates are identical to least-squares estimates.
  • The least-squares residual vector under H₀ is e-hat₀ = y minus y-hat₀; under Hₐ it is e-hat_A = y minus y-hat_A.

From equations 12 and 15, the generalized likelihood ratio becomes:

  • The ratio of the numerator (equation 12) to the denominator (equation 15) involves the quadratic forms of the residuals weighted by the inverse variance matrix.

📊 Five equivalent test expressions

| Expression | Equation | Form | What it uses |
| --- | --- | --- | --- |
| First | (16) | Ratio of residual norms | ê₀ and ê_A |
| Second | (22) | Difference of fitted values | ŷ₀ and ŷ_A |
| Third | (32) | Estimated unknown vector | the estimate of ∇ and its variance |
| Fourth | (34) | Only null-hypothesis residual | ê₀ and Q_ê₀ |
| Fifth | (36) | Misclosures (condition form) | t = By and Q_t |

Key insight from equation 34: The test can be performed using only the least-squares computation under H₀. The residuals, fitted values, and unknown-vector estimate under Hₐ are not explicitly needed.

Don't confuse: although the test compares H₀ and Hₐ, you do not need to solve the least-squares problem under Hₐ to compute the test statistic.

📈 Distribution and test decision

📈 The test statistic T_q

The test statistic (equation 37) is defined as the left-hand side of the inequalities in the five expressions. Using the expression from equation 36 (the orthogonal projection of t onto R(C_t) in the Q_t⁻¹ metric):

T_q = t^T Q_t⁻¹ C_t (C_t^T Q_t⁻¹ C_t)⁻¹ C_t^T Q_t⁻¹ t

📊 Distribution under H₀ and Hₐ

From equations 40–42, the test statistic T_q is distributed as:

  • A chi-squared distribution with q degrees of freedom.
  • Under H₀: non-centrality parameter equals zero (central chi-squared).
  • Under Hₐ: non-centrality parameter is non-zero (non-central chi-squared).

The generalized likelihood ratio test (equation 42) rejects H₀ if:

  • T_q is greater than the critical value k_α, where k_α is determined by the size α of the test (the probability of type I error); see the sketch below.
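
A minimal decision-rule sketch (Python with scipy; the values of q, α, and T_q are illustrative):

```python
from scipy.stats import chi2

q, alpha = 2, 0.05                       # illustrative dimensions and size
k_alpha = chi2.ppf(1 - alpha, df=q)      # critical value of the central chi-square

T_q = 7.31                               # made-up value of the test statistic
print(f"k_alpha = {k_alpha:.2f}, reject H0: {T_q > k_alpha}")
```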

✅ Intuition behind the test

  • Equation 22: reject H₀ if y-hat₀ differs considerably from y-hat_A (measured in the Q-inverse metric).
  • Equation 32: reject H₀ if the estimate of the unknown vector (which is supposed to be zero under H₀) is large.
  • The test makes sense because a large discrepancy or a large estimated unknown vector suggests the additional explanatory variables are needed.

🔗 Geometric interpretation

🔗 Range spaces and projections

The excerpt introduces a geometric view (section 4.2 preview):

  • The columns of matrix A span an n-dimensional subspace R(A) of m-dimensional space.
  • The columns of the augmented matrix (A, C_y) span an (n + q)-dimensional subspace R(A, C_y).
  • Since the columns of A can be written as linear combinations of the columns of (A, C_y), R(A) is a linear subspace of R(A, C_y).

Hypotheses in geometric terms (equation 45):

  • Under H₀: the mean E(y) lies in R(A).
  • Under Hₐ: the mean E(y) lies in the larger space R(A, C_y).

🧭 Orthogonal projection and least-squares

  • The least-squares estimate y-hat₀ is the orthogonal projection of y onto R(A).
  • The least-squares estimate y-hat_A is the orthogonal projection of y onto R(A, C_y).
  • Orthogonality is measured with respect to the Q-inverse metric: the inner product of vectors u and v is (transpose of u) times (inverse of Q_y) times v.
  • The residual (y minus y-hat₀) is orthogonal to y-hat₀ in this metric (equation 46).

Example: imagine y is a point in m-dimensional space; y-hat₀ is its "shadow" on the smaller subspace R(A), and y-hat_A is its shadow on the larger subspace R(A, C_y). The test compares how much closer y-hat_A is to y than y-hat₀ is.

Don't confuse: "orthogonal" here does not mean the usual Euclidean perpendicular; orthogonality is measured in the Q_y⁻¹ metric, i.e. weighted by the inverse of the variance matrix Q_y.

🏆 Uniformly most powerful invariant property

🏆 What the excerpt states

From the introductory paragraph (equations 76–78), the excerpt notes:

  • The generalized likelihood ratio test for the hypotheses in this chapter is in fact a uniformly most powerful invariant test.
  • This means: within the class of tests that are invariant under certain transformations, no other test has greater power for all parameter values.
  • The proof is referenced to Arnold (1981); the excerpt states this result without proof for all generalized likelihood ratio tests in the following chapters.

📌 Summary table

The excerpt concludes section 4.1 with Table 4.1, which provides an overview of:

  • The hypotheses (H₀ and Hₐ in both condition and observation forms).
  • The test statistic T_q (five equivalent expressions).
  • The distribution of T_q (chi-squared with q degrees of freedom, with non-centrality parameter zero under H₀ and non-zero under Hₐ).
  • The generalized likelihood ratio test decision rule: reject H₀ if T_q is greater than the critical value k_α.

A geometric interpretation of T_q

4.2 A geometric interpretation of T_q

🧭 Overview

🧠 One-sentence thesis

The test statistic T_q can be understood geometrically as measuring distances and orthogonal projections in vector spaces, which reveals why five algebraically different expressions are actually equivalent.

📌 Key points (3–5)

  • Core idea: The test statistic T_q has five algebraic expressions that are all equal, and this equality can be shown through geometric reasoning using orthogonal projections.
  • Geometric setup: Under H₀, the expected value of y lies in the range space R(A); under H_A, it lies in the larger range space R([A C_y]).
  • Orthogonality relations: Least-squares estimates correspond to orthogonal projections, and the vectors y, ŷ₀, and ŷ_A form a right-angled triangle.
  • Common confusion: The first four expressions involve vectors in m-dimensional space, but the fifth expression (involving misclosures t) lives in b-dimensional space with a different inner product.
  • Why it matters: The geometric view clarifies which expressions are computationally convenient and why they all measure the same underlying quantity.

📐 The geometric setup

📐 Range spaces and hypotheses

The excerpt establishes that:

  • Matrix A has rank n, so dim R(A) = n
  • Matrix [A C_y] has rank n + q, so dim R([A C_y]) = n + q
  • Both matrices have m rows, so their column vectors are elements of m-dimensional space

The hypotheses can be translated geometrically as: under H₀, E(y|H₀) ∈ R(A); under H_A, E(y|H_A) ∈ R([A C_y]).

  • Since the columns of A can be written as linear combinations of the columns of [A C_y], we have R(A) ⊆ R([A C_y])
  • This means R(A) is a linear subspace of R([A C_y])

🎯 Orthogonal projections

The least-squares method is interpreted as orthogonal projection:

  • ŷ₀ follows from the orthogonal projection of y onto R(A)
  • ŷ_A follows from the orthogonal projection of y onto R([A C_y])

Important: Orthogonality is measured with respect to the Q_y⁻¹-metric, meaning:

  • Inner product: (u, v) = u^T Q_y⁻¹ v
  • Norm: ||u|| = sqrt(u^T Q_y⁻¹ u)

🔺 The right-angled triangle

🔺 Four orthogonality relations

The excerpt derives four key orthogonality relations:

  1. (y - ŷ₀, ŷ₀) = 0 (because ŷ₀ is the orthogonal projection of y onto R(A))
  2. (y - ŷ_A, ŷ_A) = 0 (because ŷ_A is the orthogonal projection of y onto R([A C_y]))
  3. (y - ŷ_A, ŷ₀) = 0 (because y - ŷ_A is orthogonal to R([A C_y]), which contains R(A), which contains ŷ₀)
  4. (ŷ_A - ŷ₀, ŷ₀) = 0 (obtained by subtracting relation 3 from relation 1)

📏 Pythagorean theorem application

The vectors y, ŷ₀, and ŷ_A form a right-angled triangle.

By the Pythagorean theorem:

  • ||y - ŷ₀||² = ||y - ŷ_A||² + ||ŷ_A - ŷ₀||²

In matrix form:

  • (y - ŷ₀)^T Q_y⁻¹ (y - ŷ₀) = (y - ŷ_A)^T Q_y⁻¹ (y - ŷ_A) + (ŷ_A - ŷ₀)^T Q_y⁻¹ (ŷ_A - ŷ₀)

This corresponds to the first two expressions in the five algebraic forms of T_q.

🧩 Decompositions and projections

🧩 Decomposition of ŷ_A

The estimate ŷ_A can be written as:

  • ŷ_A = A x̂_A + C_y λ̂

The vector C_y λ̂ can be further decomposed into:

  • A part in R(A): P_A C_y λ̂
  • A part in the orthogonal complement R(A)^⊥: P_A^⊥ C_y λ̂

This gives:

  • ŷ_A = A x̂_A + P_A C_y λ̂ + P_A^⊥ C_y λ̂

🔄 Relating ŷ₀ and ŷ_A

Since ŷ_A - ŷ₀ ∈ R([A C_y]) and ŷ₀ ∈ R(A), it follows that:

  • P_A^⊥ (ŷ_A - ŷ₀) = P_A^⊥ ŷ_A

Substituting the decomposition:

  • ŷ_A - ŷ₀ = P_A^⊥ C_y λ̂

Taking the norm:

  • ||ŷ_A - ŷ₀||² = ||P_A^⊥ C_y λ̂||²

In matrix form:

  • (ŷ_A - ŷ₀)^T Q_y⁻¹ (ŷ_A - ŷ₀) = λ̂^T C_y^T (P_A^⊥)^T Q_y⁻¹ P_A^⊥ C_y λ̂

Using the result from Adjustment theory that C_y^T (P_A^⊥)^T Q_y⁻¹ P_A^⊥ C_y = Q_λ̂⁻¹:

  • This equals λ̂^T Q_λ̂⁻¹ λ̂

This corresponds to the second and third expressions in the five algebraic forms.

🔀 Residual vector relationships

From ŷ_A - ŷ₀ = P_A^⊥ C_y λ̂ and the definitions ê₀ = y - ŷ₀ and ê_A = y - ŷ_A, it follows that:

  • ê₀ = ê_A + P_A^⊥ C_y λ̂

Since ê_A ∈ R([A C_y])^⊥ and R(P_A^⊥ C_y) ⊆ R([A C_y]), the two terms on the right are orthogonal. Projecting ê₀ onto R(P_A^⊥ C_y) therefore annihilates the ê_A part:

  • P_{P_A^⊥ C_y} ê₀ = P_A^⊥ C_y λ̂

Taking the norm:

  • ||ŷ_A - ŷ₀||² = ||P_{P_A^⊥ C_y} ê₀||²

This corresponds to the fourth expression in the five algebraic forms: T_q can be computed from ê₀ alone, without performing the adjustment under H_A.

🌐 The misclosure vector case

🌐 Different geometry for the fifth expression

Don't confuse: The first four expressions involve vectors in m-dimensional space, but the fifth expression is fundamentally different.

The vectors in the first four expressions:

  • ê₀ ∈ ℝ^m, ê_A ∈ ℝ^m, ŷ₀ ∈ ℝ^m, ŷ_A ∈ ℝ^m, C_y λ̂ ∈ ℝ^m

The misclosure vector:

  • t ∈ ℝ^b (where b is the number of condition equations)
  • t ∉ ℝ^m

🔢 The fifth quadratic form

If we consider the space ℝ^b to have an inner product defined by the Q_t⁻¹-matrix, the fifth expression becomes the squared norm of the orthogonal projection of t onto R(C_t):

  • T_q = ||P_{C_t} t||² = t^T Q_t⁻¹ C_t (C_t^T Q_t⁻¹ C_t)⁻¹ C_t^T Q_t⁻¹ t

This follows from the relationships between t, y, and the matrices B and C_y, but the geometric interpretation is different because the misclosure vector lives in ℝ^b rather than ℝ^m.

📊 Summary table

The excerpt provides a summary showing the equivalence:

| Expression | Form | Space |
| --- | --- | --- |
| 1 | (y - ŷ₀)^T Q_y⁻¹ (y - ŷ₀) - (y - ŷ_A)^T Q_y⁻¹ (y - ŷ_A) | ℝ^m |
| 2 | (ŷ_A - ŷ₀)^T Q_y⁻¹ (ŷ_A - ŷ₀) | ℝ^m |
| 3 | λ̂^T Q_λ̂⁻¹ λ̂ | Parameter space |
| 4 | ‖P_{P_A^⊥ C_y} ê₀‖² (a quadratic form in ê₀ alone) | ℝ^m |
| 5 | t^T Q_t⁻¹ C_t (C_t^T Q_t⁻¹ C_t)⁻¹ C_t^T Q_t⁻¹ t | ℝ^b |

All five expressions equal T_q, but expressions 1–4 share a common geometric framework in m-dimensional space, while expression 5 requires a different geometric interpretation in b-dimensional space.
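
The equivalence can also be checked numerically. Below is a self-contained sketch (Python with numpy; dimensions, design matrices, and data are randomly generated toy values, not from the text) that evaluates all five expressions for the same y and prints five identical numbers:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, q = 7, 2, 2                         # assumed toy dimensions
A = rng.normal(size=(m, n))
Cy = rng.normal(size=(m, q))
Qy = np.diag(rng.uniform(0.5, 2.0, size=m))
W = np.linalg.inv(Qy)                     # the Qy^-1 metric
y = rng.normal(size=m)

# least-squares under H0 and under H_A (augmented design [A Cy])
Qx0 = np.linalg.inv(A.T @ W @ A)
yhat0 = A @ (Qx0 @ A.T @ W @ y)
M = np.hstack([A, Cy])
QxA = np.linalg.inv(M.T @ W @ M)
zhat = QxA @ M.T @ W @ y
yhatA = M @ zhat
lam, Qlam = zhat[n:], QxA[n:, n:]         # extra-parameter estimate and variance
e0, eA = y - yhat0, y - yhatA

T1 = e0 @ W @ e0 - eA @ W @ eA            # 1: difference of residual norms
T2 = (yhatA - yhat0) @ W @ (yhatA - yhat0)  # 2: difference of fitted values
T3 = lam @ np.linalg.solve(Qlam, lam)     # 3: estimated extra parameters

Qe0 = Qy - A @ Qx0 @ A.T                  # variance of the H0 residuals
T4 = (e0 @ W @ Cy) @ np.linalg.solve(     # 4: quadratic form in e0 alone
    Cy.T @ W @ Qe0 @ W @ Cy, Cy.T @ W @ e0)

U, _, _ = np.linalg.svd(A)                # B spans the left null space of A
B = U[:, n:].T
t, Ct = B @ y, B @ Cy
Wt = np.linalg.inv(B @ Qy @ B.T)
T5 = (t @ Wt @ Ct) @ np.linalg.solve(     # 5: misclosure form
    Ct.T @ Wt @ Ct, Ct.T @ Wt @ t)

print(np.round([T1, T2, T3, T4, T5], 8))  # five identical numbers
```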


The case q = 1: the w-Teststatistic

4.3 The case q = 1: the w-Teststatistic

🧭 Overview

🧠 One-sentence thesis

When the number of additional parameters q equals 1, the generalized likelihood ratio test reduces to the w-teststatistic (or its square, the T-teststatistic), which is widely used in geodetic practice for detecting blunders in individual observations through a procedure called datasnooping.

📌 Key points (3–5)

  • Range of q: q must lie between 1 and m - n (the number of rows minus the number of columns in the design matrix), because q = 0 makes the hypotheses identical and q > m - n is impossible.
  • Three equivalent expressions: the 1-dimensional T-teststatistic can be written in three forms, with the first two being most useful because they do not require explicit computation under the alternative hypothesis.
  • Practical application: the w-teststatistic is used for blunder detection in observations by testing one observation at a time (datasnooping), where the observation with the largest absolute test value is suspected of containing a gross error.
  • Common confusion: the test involves q = 1 (one additional parameter at a time), not testing all observations simultaneously; this convention assumes only one blunder is present at a time.

📏 The special case q = 1

📏 Why q = 1 matters

  • The excerpt focuses on the case where q = 1, meaning only one additional explanatory variable is added to the null hypothesis model.
  • When q = 1, the matrices C_t (b × q) and C_y (m × q) reduce to vectors (b × 1 and m × 1), denoted by lowercase c_t and c_y to emphasize this reduction.
  • This simplification makes the test more tractable and directly applicable to practical problems.

🔢 Range constraints on q

The range of q is given by: 1 ≤ q ≤ m - n.

  • Why q cannot be zero: if q = 0, the matrix C_t would not exist and the null hypothesis H₀ and alternative hypothesis H_A would be identical.
  • Why q cannot exceed m - n: the rank of the combined matrix (A C_y) cannot be larger than m (the number of rows), so q ≤ m - n.
  • Example: if you have m = 10 observations and n = 4 parameters in the null model, then q can be at most 6.

🧮 Three expressions for the T-teststatistic

🧮 First expression (condition equations)

  • For q = 1, the first expression becomes: T₁ = (w)², where w is the w-teststatistic from Section 2.3.
  • The test can be written as: reject H₀ if T₁ > k_α, or equivalently, reject H₀ if |w| > square root of k_α.
  • When to use: this form is useful when hypotheses are formulated in terms of condition equations.

🧮 Second expression (most common)

  • The second expression involves the least-squares residual vector under H₀.
  • When to use: this is the most commonly used expression in practice because it does not need explicit results of least-squares computation under H_A.
  • Both the first and second expressions are more useful than the third because they avoid computing the alternative model explicitly.

🧮 Third expression (variance-based)

  • For q = 1 the third expression reduces to: T₁ = ∇̂² / σ_∇̂², the squared estimate of the model error divided by its variance.
  • The denominator equals the variance of the estimator ∇̂, linking the test to the estimated model error.
  • This form is less commonly used because it requires the least-squares results under H_A.

🔍 Blunder detection and datasnooping

🔍 The blunder detection problem

  • In geodetic applications, misspecifications in the null hypothesis H₀ are very often caused by blunders or gross errors in observations.
  • The challenge: one never knows whether blunders are present, how many are present, or in which observations they are present.
  • The convention: to test for blunders, assume only one blunder is present at a time, requiring only one additional explanatory variable in the alternative hypothesis.

🔍 Testing for a blunder in the i-th observation

  • To test for a blunder in the i-th observation, the hypotheses take the form shown in equation (80).
  • The corresponding test reads: reject H₀ if the test statistic exceeds the critical value k_α.
  • The test statistic is given by equation (83): it depends on the i-th residual and the variance-covariance matrix Q_y.
  • If the test rejects H₀: a blunder or gross error in the i-th observation is suspected, and checking or remeasurement is necessary.

🔍 Datasnooping procedure

Datasnooping: the procedure of screening the entire observation vector for observational blunders by taking i successively as 1, ..., m.

  • By testing each observation one at a time, the whole observation vector can be screened.
  • Decision rule: generally, the observation with the largest value of the test statistic (in absolute sense) should be rejected.
  • Simplification: in many applications, the variance matrix Q_y is diagonal, which simplifies the test statistic to equation (84).
  • Don't confuse: datasnooping tests one observation at a time, not all observations simultaneously; this is why q = 1 is appropriate (see the sketch below).
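
A sketch of the procedure (Python with numpy; the straight-line model, precision, and injected blunder are invented for illustration), computing the w-test for each observation in turn and flagging the largest |w|:

```python
import numpy as np

def datasnooping(A, Qy, y):
    """w-test for each observation (cy = i-th unit vector); returns all w values."""
    W = np.linalg.inv(Qy)
    Qx = np.linalg.inv(A.T @ W @ A)
    e = y - A @ (Qx @ A.T @ W @ y)           # residuals under H0
    Qe = Qy - A @ Qx @ A.T                   # their variance matrix
    m = len(y)
    w = np.empty(m)
    for i in range(m):
        cy = np.zeros(m); cy[i] = 1.0
        w[i] = (cy @ W @ e) / np.sqrt(cy @ W @ Qe @ W @ cy)
    return w

# toy example (assumed): straight-line fit with a blunder added to observation 3
rng = np.random.default_rng(3)
x_axis = np.arange(6.0)
A = np.column_stack([np.ones(6), x_axis])
Qy = 0.01 * np.eye(6)
y = 1.0 + 0.5 * x_axis + rng.normal(scale=0.1, size=6)
y[3] += 1.0                                  # injected gross error

w = datasnooping(A, Qy, y)
print(np.round(w, 2), "-> most suspicious:", np.argmax(np.abs(w)))
```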

📊 Summary of the w-teststatistic

📊 Key components

| Component | Description |
| --- | --- |
| Hypotheses | H₀ (null) vs H_A (alternative with one additional parameter) |
| Test statistic | w-teststatistic (or its square, T₁) |
| Distribution | Under H₀, w is standard normal, so T₁ = w² is chi-squared with 1 degree of freedom |
| Decision rule | Reject H₀ if the test statistic exceeds the critical value k_α |
| Application | Datasnooping for blunder detection in observations |

📊 Practical use

  • The w-teststatistic is the 1-dimensional case of the generalized likelihood ratio test.
  • It provides a systematic way to test individual observations for gross errors.
  • The test can be applied repeatedly (datasnooping) to screen all observations, with the largest absolute test value indicating the most suspicious observation.

The case q = m − n: the ŝ²-Teststatistic

4.4 The case q = m − n: the ŝ²-Teststatistic

🧭 Overview

🧠 One-sentence thesis

When the number of additional parameters q equals the total redundancy m − n, the generalized likelihood ratio test simplifies to an overall model test that requires no specification of the alternative hypothesis and serves as an important safeguard for detecting any misspecification in the null hypothesis.

📌 Key points (3–5)

  • What q = m − n means: the alternative hypothesis H_A imposes no restrictions on the mean of y, so redundancy under H_A equals zero.
  • Why this case is special: the test statistic simplifies because matrix C_t becomes square and invertible, eliminating the need to specify C_y or C_t.
  • The ŝ² notation: the test statistic divided by (m − n) is denoted ŝ² because it is an unbiased estimator of the variance factor of unit weight σ².
  • Common confusion: unlike tests with q < m − n, this overall model test does not require you to specify what kind of misspecifications to expect—it detects any deviation from H_0.
  • Practical importance: this test acts as a safeguard when you cannot fully specify the class of alternative hypotheses, which is always the case in practice.

🔧 Simplification when q = m − n

🔧 What happens to the model under H_A

When q = m − n, the matrix C_y is chosen so that (A C_y) is square and of full rank, meaning no restrictions are placed on the mean of y under H_A.

  • In other words: E{y | H_A} belongs to the full m-dimensional space.
  • This implies the redundancy of the linear model under H_A equals zero.
  • Consequence: the least-squares residual vector under H_A becomes zero (ê_A = 0).

🔧 Simplification of the test statistic

The generalized likelihood ratio test statistic T_q has two expressions (from equation 88). When q = m − n:

First expression simplification:

  • Since ê_A = 0, the first expression becomes: T_{m−n} = êᵀ Q_y⁻¹ ê
  • Here ê is the least-squares residual vector under H₀ (the index "0" is dropped for clarity).

Second expression simplification:

  • The matrix C_t has b rows and (m − n) columns, but b = m − n, so C_t is square and of full rank.
  • Therefore C_t is invertible and cancels out of the second expression.
  • Result: T_{m−n} = tᵀ Q_t⁻¹ t, which equals êᵀ Q_y⁻¹ ê (the same as the first expression).

📐 The ŝ² test statistic

📐 Definition and notation

The test statistic ŝ² is defined as: ŝ² = T_{m−n} / (m − n) = (êᵀ Q_y⁻¹ ê) / (m − n).

  • The test using ŝ² is completely identical to the test using T_{m−n}.
  • The notation ŝ² is used because this quantity has a special interpretation.

📐 Why ŝ² is an unbiased estimator

From Adjustment theory (Section 2.4):

  • If the variance matrix of y is D_y = σ² Q_y, where σ² is the variance factor of unit weight,
  • Then: E{êᵀ D_y⁻¹ ê} = (m − n)
  • This implies: E{ŝ²} = σ²

Interpretation:

  • ŝ² is an unbiased estimator of the variance factor of unit weight σ².
  • This is the reason for the notation ŝ².

📐 The test formulation

The generalized likelihood ratio test reads:

  • Compute T_{m−n} = êᵀ Q_y⁻¹ ê
  • Or equivalently: compute ŝ² = T_{m−n} / (m − n)
  • Reject H₀ if T_{m−n} > the critical value of the chi-squared distribution with (m − n) degrees of freedom
  • Or equivalently: reject H₀ if ŝ² > (critical value) / (m − n)

Distribution:

  • Under H_0: T_{m−n} follows a central chi-squared distribution with (m − n) degrees of freedom
  • Under H_A: T_{m−n} follows a non-central chi-squared distribution with (m − n) degrees of freedom and non-centrality parameter λ (a computational sketch follows).
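
A short sketch (Python with numpy/scipy; the repeated-measurement model and data are invented for illustration) of the overall model test:

```python
import numpy as np
from scipy.stats import chi2

def overall_model_test(A, Qy, y, alpha=0.05):
    """T_{m-n} = e' Qy^-1 e and the s^2-hat estimate of the variance factor."""
    m, n = A.shape
    W = np.linalg.inv(Qy)
    e = y - A @ np.linalg.solve(A.T @ W @ A, A.T @ W @ y)  # residuals under H0
    T = e @ W @ e
    s2 = T / (m - n)                         # unbiased estimate of sigma^2
    reject = T > chi2.ppf(1 - alpha, df=m - n)
    return T, s2, reject

# toy example (assumed): repeated measurement of a single quantity
rng = np.random.default_rng(4)
A = np.ones((5, 1))
Qy = np.eye(5)
y = 10.0 + rng.normal(size=5)

T, s2, reject = overall_model_test(A, Qy, y)
print(f"T = {T:.2f}, s2-hat = {s2:.2f}, reject H0: {reject}")
```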

🛡️ Practical importance as an overall model test

🛡️ No need to specify the alternative hypothesis

Key advantage:

  • For q = m − n, no matrix C_y or C_t needs to be specified.
  • This contrasts with all cases where q < m − n, which require you to specify C_y or C_t.

Why this matters:

  • To specify C_y or C_t, you need some idea of what kind of misspecifications to expect in H_0.
  • In some cases this is possible (e.g., conventional alternative hypotheses in datasnooping for geodetic networks).
  • But you can never completely specify the class of alternative hypotheses for a particular problem, because you never know beforehand what misspecification has occurred in H_0.

🛡️ Role as a safeguard

The test for q = m − n gives an indication of the validity of H_0 without the need to specify the alternative hypothesis through C_y or C_t. As such it can be considered an overall model test.

Example scenario:

  • Suppose you have a geodetic network and suspect certain types of errors (e.g., blunders in specific observations).
  • You can test for those specific errors using q < m − n tests.
  • But there may be other misspecifications you haven't thought of.
  • The overall model test (q = m − n) acts as a catch-all: if H_0 is invalid for any reason, this test has a chance of detecting it.

Don't confuse:

  • The overall model test (q = m − n) vs. specific alternative tests (q < m − n):
    • Overall test: detects any deviation from H_0, but doesn't tell you what the problem is.
    • Specific test: targets a particular type of misspecification, more powerful for that specific case but blind to others.
  • Appendix C elaborates on the relation between the overall model test and the w-test (from the previous section).

🛡️ Summary table

The excerpt provides Table 4.4 summarizing:

| Aspect | Description |
| --- | --- |
| Hypotheses | H₀ vs H_A (overall model test) |
| Test statistic | ŝ² or T_{m−n} |
| Distribution | Chi-squared with (m − n) degrees of freedom (central under H₀, non-central under H_A) |
| Generalized likelihood ratio test | Reject H₀ if the test statistic exceeds the critical value |

Internal reliability

4.5 Internal reliability

🧭 Overview

🧠 One-sentence thesis

Internal reliability measures the size of model error (misspecification) that a hypothesis test can detect with a chosen probability, and it improves when the test's power is increased through better measurement precision, more observations, or optimal network design.

📌 Key points (3–5)

  • What internal reliability measures: the model error vector that can be detected with a reference probability γ₀ (e.g., 80%) by the generalized likelihood ratio test.
  • How to compute it: fix the detection probability γ₀, compute the corresponding non-centrality parameter λ₀, then solve for the model error vector that produces that λ₀.
  • What affects the power (and thus internal reliability): test size α, degrees of freedom q, and non-centrality parameter λ—which depends on precision (Qᵧ), design matrix (A), and the separation between hypotheses.
  • Common confusion: power γ vs internal reliability—power is the probability of rejecting H₀ when Hₐ is true; internal reliability is the size of the model error that can be detected with that probability.
  • Practical implication: internal reliability can be improved by increasing measurement precision, adding more observations, or optimizing the network structure.

📊 Power of the generalized likelihood ratio test

📊 What the power depends on

The generalized likelihood ratio test (equations 102 or 106) has power:

γ = P(reject H₀ | Hₐ is true)

This power depends on three factors:

  1. Test size α: the probability of type I error (rejecting H₀ when it is true).
  2. Degrees of freedom q: the number of constraints or additional parameters in Hₐ.
  3. Non-centrality parameter λ: a measure of the separation between H₀ and Hₐ.

🔼 How power changes with α, q, and λ

| Factor | Direction of change | Effect on power γ | Why |
| --- | --- | --- | --- |
| α | Increase | Power increases | Larger α → smaller critical value → easier to reject H₀ |
| q | Increase | Power decreases | More parameters in Hₐ → less "information" → harder to detect Hₐ |
| λ | Increase | Power increases | Larger λ → greater separation between H₀ and Hₐ → easier to distinguish |

  • Don't confuse: increasing α improves power but also increases the risk of false rejection (type I error); in practice, α is usually fixed at a standard value.
  • Example: Table 4.7 shows that for q = 1, λ = 18, α = 0.05, the power is 0.9888; but for q = 7 with the same λ and α, the power drops to 0.8946 (both values are reproduced in the sketch below).
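
The sketch below computes this power with SciPy's central and non-central chi-squared distributions and reproduces the two Table 4.7 values quoted above:

```python
# Power of the chi-squared test as a function of (alpha, q, lambda).
from scipy import stats

def power(alpha: float, q: int, lam: float) -> float:
    """P(reject H0 | HA) for a chi-squared test with q degrees of freedom
    and non-centrality parameter lam under HA."""
    crit = stats.chi2.ppf(1 - alpha, df=q)         # critical value under H0
    return 1 - stats.ncx2.cdf(crit, df=q, nc=lam)  # exceedance probability under HA

print(f"{power(0.05, 1, 18):.4f}")  # 0.9888 (q = 1)
print(f"{power(0.05, 7, 18):.4f}")  # 0.8946 (q = 7): same alpha and lambda, lower power
```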

🧮 The non-centrality parameter λ

The non-centrality parameter is defined as:

λ = (Cᵧ∇)ᵀ Qᵧ⁻¹ Q_ê Qᵧ⁻¹ (Cᵧ∇) = ||Pₐ⊥ Cᵧ∇||²

where:

  • Cᵧ∇: the difference between the expected value of y under Hₐ and under H₀ (the "separation").

  • Qᵧ: the variance matrix of the observables.

  • Q_ê = Qᵧ − A(Aᵀ Qᵧ⁻¹ A)⁻¹Aᵀ: the variance matrix of the least-squares residuals; this is how the design matrix A enters the formula (Pₐ⊥ is the orthogonal projector onto the complement of the range space R(A)).

  • The non-centrality parameter λ measures the squared separation between H₀ and Hₐ, weighted by the precision of the observations; only the part of the separation outside R(A) contributes.

  • Larger λ → greater separation → easier to detect the alternative hypothesis.

🛠️ How to improve internal reliability

🔬 Increase measurement precision (Qᵧ)

  • If the variance matrix is replaced by μQᵧ (where μ is a positive scalar), the non-centrality parameter becomes λ/μ.
  • Smaller μ → higher precision → larger λ → better power.
  • Practical implication: choosing more precise measurement equipment improves the test's ability to detect model errors.

🏗️ Change the network structure (A)

  • In geodetic network applications, the design matrix A depends on the network structure.
  • Changing the network structure changes A and therefore changes λ.
  • Key result: one can design a network that is optimal in the sense that it gives a test with sufficient power.

➕ Add more observations

  • Increasing the number of observations increases the non-centrality parameter.
  • The excerpt proves this by comparing two models: one with m observations and one with m+1 observations.
  • The proof shows that λ_new ≥ λ_old (equation 126), meaning the power always improves (or stays the same) when an observation is added.
  • Example: comparing a two-loop leveling network (equations 156–158) with a one-loop network (equations 159–160) shows that the two-loop network detects a blunder in the second observation better (non-centrality λ = 2 vs λ = 1 for the same blunder).

📏 Increase the separation Cᵧ∇

  • The non-centrality parameter λ increases if the separation between E{y|Hₐ} and E{y|H₀} increases.
  • Note: only the component of Cᵧ∇ that lies outside the range space R(A) affects λ; the component inside R(A) has no effect.
  • In practice, Cᵧ∇ is unknown, but one can choose representative values (e.g., expected blunder sizes) to compute what the power would be.
  • Don't confuse: the actual separation vs a hypothetical separation—since Cᵧ∇ is unknown, one computes power for assumed separations to assess detectability.

🎯 From power to internal reliability

🔄 Reversing the logic

  • In geodetic practice, one is not primarily interested in the power γ for a given model error.
  • Instead, one wants to know: what size of model error can be detected with a chosen probability γ₀?
  • The approach (see the sketch below for step 2):
    1. Fix a reference detection probability γ₀ (e.g., 50%, 60%, 70%, or typically 80%).
    2. From α, q, and γ₀, compute the corresponding non-centrality parameter λ₀.
    3. Solve the equation λ₀ = ∇ᵀ Cᵧᵀ Qᵧ⁻¹ Q_ê Qᵧ⁻¹ Cᵧ ∇ for the vector ∇.
    4. The model error is then ∇ᵧ = Cᵧ∇.

Internal reliability: the vector ∇ᵧ that describes the model error in H₀ (with respect to Hₐ) that can be detected with probability γ₀.

  • Don't confuse the geodetic usage of "betrouwbaarheid" (reliability) with its usage in mathematical statistics—they are different concepts.
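
Step 2 has no closed form: λ₀ is the value at which the power curve reaches γ₀, so it can be found by root-finding. A minimal sketch, assuming SciPy; the bracketing interval is an arbitrary assumption:

```python
# Reference non-centrality parameter lambda_0 such that the chi-squared test
# with size alpha and q degrees of freedom has power gamma_0.
from scipy import stats, optimize

def lambda0(alpha: float, q: int, gamma0: float) -> float:
    crit = stats.chi2.ppf(1 - alpha, df=q)
    # power is monotonically increasing in the non-centrality, so bracketing works
    f = lambda lam: (1 - stats.ncx2.cdf(crit, df=q, nc=lam)) - gamma0
    return optimize.brentq(f, 1e-6, 1e3)

# Classical geodetic reference choice: alpha = 0.001, q = 1, gamma_0 = 0.80
print(f"{lambda0(0.001, 1, 0.80):.2f}")  # approx. 17.07
```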

🔢 Case 1: q = 1 (single constraint)

When q=1, the matrix Cᵧ reduces to a vector cᵧ, and ∇ reduces to a scalar ∇.

The solution for ∇ is:

∇ = ± √(λ₀ / (cᵧᵀ Qᵧ⁻¹ cᵧ - cᵧᵀ Qᵧ⁻¹ A(Aᵀ Qᵧ⁻¹ A)⁻¹ Aᵀ Qᵧ⁻¹ cᵧ))

  • This is called the minimal detectable bias (grenswaarde).
  • One can only determine the size of ∇, not its sign.
  • Geometric interpretation: the denominator equals (cᵧᵀ Qᵧ⁻¹ cᵧ) cos²θ, where θ is the angle between cᵧ and the orthogonal complement R(A)⊥ of the range space.
  • Key insight: ∇ is large (poor internal reliability) when θ is close to 90°, i.e., when cᵧ lies nearly inside R(A).
  • If cᵧ lies entirely in R(A), then θ = 90° and ∇ → ∞, meaning the model error can never be detected. The sketch below evaluates the formula numerically.
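
The sketch below evaluates the q = 1 formula directly; the model (three repeated measurements of a single parameter) and λ₀ = 17.07 are illustrative assumptions:

```python
# Minimal detectable bias (MDB) for q = 1, straight from the formula above.
import numpy as np

def mdb(A, Q_y, c_y, lam0):
    """sqrt(lam0 / (c'Wc - c'WA (A'WA)^{-1} A'Wc)) with W = Q_y^{-1}."""
    W = np.linalg.inv(Q_y)
    N_inv = np.linalg.inv(A.T @ W @ A)
    denom = c_y @ W @ c_y - c_y @ W @ A @ N_inv @ A.T @ W @ c_y
    return np.sqrt(lam0 / denom)

A = np.array([[1.0], [1.0], [1.0]])   # three repeated measurements of one parameter
Q_y = np.eye(3)
c_y = np.array([1.0, 0.0, 0.0])       # hypothesized blunder in the first observation
print(f"MDB = {mdb(A, Q_y, c_y, lam0=17.07):.2f}")  # sqrt(17.07 / (2/3)), approx. 5.06
```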

📐 Case 2: 1 < q ≤ m-n (multiple constraints)

For q ≥ 2, the equation λ₀ = ∇ᵀ Cᵧᵀ Qᵧ⁻¹ Q_ê Qᵧ⁻¹ Cᵧ ∇ describes an ellipsoid (or hyperellipsoid) in the space of ∇.

The vector ∇ can be parametrized as:

∇ = √λ₀ · (Cᵧᵀ Qᵧ⁻¹ Q_ê Qᵧ⁻¹ Cᵧ)^{−1/2} d

where d is a unit vector in q-dimensional space.

  • As d scans the unit sphere, ∇ scans the ellipsoid.
  • The principal axes of the ellipsoid correspond to the eigenvectors of the matrix Cᵧᵀ Qᵧ⁻¹ Q_ê Qᵧ⁻¹ Cᵧ.

🔍 Local redundancy and minimal detectable bias

🔢 Local redundancy number

In the special case of datasnooping (testing one observation at a time, so cᵧ = cᵢ, the i-th unit vector) with a diagonal variance matrix Qᵧ, the minimal detectable bias for the i-th observation is:

∇ᵢ = σᵧᵢ · √(λ₀ / rᵢ)

The dimensionless ratio

rᵢ = σ²êᵢ / σ²ᵧᵢ

(the variance of the i-th least-squares residual over the variance of the i-th observation) is called the i-th local redundancy number.

  • Since 0 ≤ σ²êᵢ ≤ σ²ᵧᵢ, the local redundancy number always lies in the interval [0, 1].
  • Why "redundancy"? The sum of all local redundancy numbers equals the total redundancy (m − n), where m is the number of observations and n is the number of parameters.
  • The average redundancy is r̄ = (m − n) / m.
  • A rough approximation for the minimal detectable bias replaces rᵢ by r̄ (see the sketch below):

∇ᵢ ≈ √(λ₀ / r̄) · σᵧᵢ
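
A small sketch that computes the local redundancy numbers from the diagonal of Q_ê for a toy model (matrices assumed) and checks that they sum to m − n:

```python
# Local redundancy numbers r_i = sigma^2_ehat_i / sigma^2_y_i and datasnooping MDBs.
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
Q_y = np.eye(4)
W = np.linalg.inv(Q_y)
m, n = A.shape

Q_yhat = A @ np.linalg.inv(A.T @ W @ A) @ A.T  # variance matrix of adjusted observations
Q_ehat = Q_y - Q_yhat                          # variance matrix of the residuals

r = np.diag(Q_ehat) / np.diag(Q_y)             # local redundancy numbers
print(r, r.sum())                              # r.sum() equals m - n = 2

lam0 = 17.07                                   # assumed reference non-centrality
mdb = np.sqrt(np.diag(Q_y)) * np.sqrt(lam0 / r)  # nabla_i = sigma_i sqrt(lam0 / r_i)
print(mdb)
```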

📊 Bias-to-noise ratio

Instead of evaluating the full vector ∇ᵧ for every alternative hypothesis (which can be cumbersome), one can use a scalar measure:

Bias-to-noise ratio: λᵧ = ∇ᵧᵀ Qᵧ⁻¹ ∇ᵧ

  • Large λᵧ → the model error ∇ᵧ is significant.
  • Small λᵧ → the model error is insignificant.
  • Note that λᵧ = ||Cᵧ∇||² (norm taken in the Qᵧ⁻¹ metric), the squared separation between E{y|H₀} and E{y|Hₐ}.

For datasnooping with diagonal Qᵧ, evaluated at the minimal detectable bias ∇ᵢ = σᵧᵢ√(λ₀/rᵢ):

λᵧᵢ = ∇ᵢ² / σ²ᵧᵢ = λ₀ / rᵢ

The maximum value λₘₐₓ equals the largest eigenvalue of the associated generalized eigenvalue problem and provides an upper bound: λᵧ ≤ λₘₐₓ.

🌉 Example: leveling networks

🔁 Two-loop vs one-loop network

Two-loop network (Figure 4.15):

  • Model: 5 observations, 2 condition equations.
  • For the second observation y₂, the variance of the residual is σ²ê₂ = ½σ²ᵧ₂.
  • Local redundancy: r₂ = ½.
  • Minimal detectable bias: ∇₂ = √(2λ₀) · σᵧ₂.
  • Bias-to-noise ratio: λᵧ₂ = 2λ₀.

One-loop network (Figure 4.16):

  • Model: 4 observations, 1 condition equation.
  • For the second observation y₂, the variance of the residual is σ²ê₂ = ¼σ²ᵧ₂.
  • Local redundancy: r₂ = ¼.
  • Minimal detectable bias: ∇₂ = 2√λ₀ · σᵧ₂.
  • Bias-to-noise ratio: λᵧ₂ = 4λ₀.

Comparison: the two-loop network has better internal reliability for y₂ (smaller minimal detectable bias, and a smaller bias-to-noise ratio at that bias: 2λ₀ vs 4λ₀) because it has more redundancy.

  • Key takeaway: adding observations (more loops) improves the ability to detect blunders. The sketch below reproduces both cases.
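
The condition matrices below are plausible reconstructions of the two networks (Figures 4.15 and 4.16 are not reproduced in the excerpt); they yield r₂ = ½ and r₂ = ¼ and the two minimal detectable biases quoted above:

```python
# Two-loop vs one-loop leveling network in condition form B y = 0.
import numpy as np

def redundancy(B, sigma2=1.0):
    """Local redundancy numbers for conditions B y = 0 with Q_y = sigma2 * I."""
    Q_ehat = sigma2 * B.T @ np.linalg.inv(B @ B.T) @ B  # residual variance matrix
    return np.diag(Q_ehat) / sigma2

# Two loops sharing observation y2: 5 observations, 2 condition equations (assumed)
B_two = np.array([[1.0, 1.0, 1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 1.0, 1.0]])
# One loop: 4 observations, 1 condition equation (assumed)
B_one = np.array([[1.0, 1.0, 1.0, 1.0]])

r2_two = redundancy(B_two)[1]
r2_one = redundancy(B_one)[1]
print(r2_two, r2_one)                           # 0.5 and 0.25

lam0 = 17.07                                    # assumed reference value
print(np.sqrt(lam0 / r2_two), np.sqrt(lam0 / r2_one))  # MDB/sigma: sqrt(2*lam0) vs 2*sqrt(lam0)
```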

4.6 External reliability

🧭 Overview

🧠 One-sentence thesis

External reliability quantifies how undetected model errors in observations propagate into the final adjusted parameters and derived quantities, providing bounds on the bias-to-noise ratio for different levels of results.

📌 Key points (3–5)

  • What external reliability measures: the influence of possibly non-detected model errors (bias vector ∇ᵧ) on final results such as the adjusted parameters x̂, not just on the adjusted observations.
  • Three levels of influence: (i) on the full parameter vector x̂, (ii) on a subset x̂₁, and (iii) on a scalar linear function q̂ = aᵀx̂.
  • Key relationship: the bias-to-noise ratio of the parameters, λ_x̂, relates to the observation-level ratio λᵧ and the test's non-centrality parameter λ via λ_x̂ = λᵧ − λ, which expresses an orthogonal decomposition of the model error.
  • Common confusion: don't confuse internal reliability (detectability of errors) with external reliability (impact of undetected errors on results); external reliability assumes the error was not detected.
  • Practical use: external reliability provides upper bounds on how much individual parameters or functions can be biased by undetected errors, with sharper bounds available for subsets (λ_x̂₁ ≤ λ_x̂).

📐 Definition and motivation

📐 What external reliability is

External reliability (uitwendige betrouwbaarheid): the influence of the model error ∇ᵧ on the final results of a geodetic computation or adjustment.

  • Internal reliability (from Section 4.5) describes the model error ∇ᵧ that can be detected with probability γ₀ using the generalized likelihood ratio test.
  • External reliability addresses what happens if that error is not detected: how does it affect the final parameters?
  • The excerpt emphasizes that final results are usually not the adjusted observations ŷ, but derived quantities such as coordinates (the parameters x̂).

🎯 Why it matters

  • Geodetic computations produce parameters (e.g., coordinates) from observations.
  • Even if an observation error passes the detection test, it still biases the parameters.
  • External reliability quantifies this bias relative to the parameter noise (variance), giving a "bias-to-noise" ratio.

🧩 Influence on the full parameter vector

🧩 The bias vector ∇x̂

  • Under the null hypothesis H₀, the least-squares estimator is x̂, with expectation E{x̂|H₀} = x.

  • Under the alternative Hₐ (model error ∇ᵧ present), the expectation shifts: E{x̂|Hₐ} = E{x̂|H₀} + ∇x̂.

  • Substituting E{y|Hₐ} = Ax + Cᵧ∇ gives:

    ∇x̂ = (Aᵀ Qᵧ⁻¹ A)⁻¹ Aᵀ Qᵧ⁻¹ Cᵧ∇

  • Using the abbreviations ∇ᵧ = Cᵧ∇ and ∇x̂ = E{x̂|Hₐ} − E{x̂|H₀}, this becomes:

    (162) ∇x̂ = (Aᵀ Qᵧ⁻¹ A)⁻¹ Aᵀ Qᵧ⁻¹ ∇ᵧ

🔺 Orthogonal decomposition

  • The model error ∇ᵧ decomposes orthogonally into two components:

    • A∇x̂ = Pₐ∇ᵧ: the part that affects the parameters (lies in the range space of A).
    • Pₐ⊥∇ᵧ: the part orthogonal to R(A) (does not affect the parameters).
  • Formula (163):

    ∇ᵧ = A∇x̂ + Pₐ⊥∇ᵧ

    where Pₐ⊥ is the orthogonal projector onto the orthogonal complement of the range space of A.

  • This is illustrated in Figure 4.17: ∇ᵧ splits into a component in R(A) and a component in R(A)⊥.

📊 Bias-to-noise ratio λ_x̂

  • Define the squared bias-to-noise ratio for x̂ as:

    (164) λ_x̂ = ∇x̂ᵀ Q_x̂⁻¹ ∇x̂

    where Q_x̂ is the variance matrix of x̂.

  • Large λ_x̂ → the model error significantly biases the parameters.

  • Small λ_x̂ → the bias is small relative to the parameter noise.

  • Since Q_x̂⁻¹ = Aᵀ Qᵧ⁻¹ A, formula (165) shows:

    λ_x̂ = ∇ᵧᵀ Qᵧ⁻¹ Pₐ ∇ᵧ = ||Pₐ∇ᵧ||²

    (the squared norm of the projection of ∇ᵧ onto R(A)).

🔗 Relationship to internal reliability

  • From the Pythagoras theorem applied to (163):

    ||∇ᵧ||² = ||Pₐ∇ᵧ||² + ||Pₐ⊥∇ᵧ||²

  • Recognizing λᵧ = ||∇ᵧ||² (the bias-to-noise ratio from (157)), λ_x̂ = ||Pₐ∇ᵧ||², and the non-centrality parameter λ = ||Pₐ⊥∇ᵧ||², we get:

    (167) λ_x̂ = λᵧ − λ

  • This provides two ways to compute λ_x̂ (verified numerically in the sketch below):

    • Directly via (164) using ∇x̂ and Q_x̂.
    • Indirectly via (167) using λᵧ and λ (easier when Qᵧ is diagonal).
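
A toy numeric check of (162), (164), and (167); all matrices and the assumed blunder are illustrative:

```python
# Verify lambda_xhat = lambda_y - lambda: the parameter-level bias-to-noise ratio
# plus the test's non-centrality equals the observation-level ratio.
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
Q_y = np.eye(4)
W = np.linalg.inv(Q_y)
nabla_y = np.array([0.0, 1.0, 0.0, 0.0])    # assumed undetected blunder in y_2

N_inv = np.linalg.inv(A.T @ W @ A)
nabla_x = N_inv @ A.T @ W @ nabla_y         # (162): bias of the parameter estimates

lam_y = nabla_y @ W @ nabla_y               # observation-level ratio ||nabla_y||^2
lam_x = nabla_x @ (A.T @ W @ A) @ nabla_x   # (164): parameter-level ratio
lam = lam_y - lam_x                         # (167): should equal ||P_A_perp nabla_y||^2

P_A = A @ N_inv @ A.T @ W                   # projector onto R(A)
resid = nabla_y - P_A @ nabla_y
print(lam_x, lam, resid @ W @ resid)        # the last two values agree
```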

📏 Upper bound for individual parameters

  • For the i-th element ∇x̂ᵢ of ∇x̂, the excerpt derives:

    (170) |∇x̂ᵢ| / σ_x̂ᵢ ≤ √λ_x̂

    where σ_x̂ᵢ = √(cᵢᵀ (Aᵀ Qᵧ⁻¹ A)⁻¹ cᵢ) is the standard deviation of x̂ᵢ and cᵢ is the i-th unit vector.

  • This bound follows from the cosine rule applied to the inner-product representation of ∇x̂ᵢ.

  • Example: if λ_x̂ = 4, then any individual parameter's bias-to-noise ratio is at most 2.

🔄 Data snooping case

  • For data snooping (testing one observation at a time, q = 1) with diagonal Qᵧ, formula (171) simplifies. At the minimal detectable bias ∇ᵢ = σᵧᵢ√(λ₀/rᵢ), combining λᵧ = λ₀/rᵢ with (167) gives:

    λ_x̂ = λ₀ (1 − rᵢ) / rᵢ

    where rᵢ is the local redundancy number (from (172)).

  • This shows λ_x̂ depends on the reference non-centrality λ₀ and on the redundancy of the tested observation: the lower rᵢ, the more an undetected blunder can bias the parameters.

🧩 Influence on a subset x̂₁

🧩 Partitioned parameters

  • Partition x̂ = (x̂₁ᵀ, x̂₂ᵀ)ᵀ, where x̂₁ is the subset of interest (e.g., the coordinates of a specific point).

  • The linear model partitions as y = A₁x₁ + A₂x₂ + e.

  • The bias vector for x̂₁ is:

    (180) ∇x̂₁ = (Ā₁ᵀ Qᵧ⁻¹ Ā₁)⁻¹ Ā₁ᵀ Qᵧ⁻¹ ∇ᵧ, with the reduced design matrix Ā₁ = Pₐ₂⊥ A₁

    (compare with (162) for the full x̂; Ā₁ accounts for the presence of x₂).

🔺 Refined orthogonal decomposition

  • The projection Pₐ∇ᵧ (which affects all parameters) further decomposes:

    (184) Pₐ∇ᵧ = Ā₁∇x̂₁ + Pₐ₂∇ᵧ

    where Pₐ₂ projects onto R(A₂) (the part affecting only x̂₂, not x̂₁).

  • This is shown in Figure 4.18: Pₐ∇ᵧ splits into a component in R(Ā₁) and a component in R(A₂).

📊 Bias-to-noise ratio λ_x̂₁

  • Define analogously to (164):

    (185) λ_x̂₁ = ∇x̂₁ᵀ Q_x̂₁⁻¹ ∇x̂₁

  • From (186):

    λ_x̂₁ = ||P_Ā₁∇ᵧ||²

  • Applying Pythagoras to (184):

    λ_x̂ = λ_x̂₁ + ||Pₐ₂∇ᵧ||²

  • Therefore:

    (188) λ_x̂₁ = λ_x̂ − ||Pₐ₂∇ᵧ||² ≤ λ_x̂

  • Substituting (167):

    (189) λ_x̂₁ = λᵧ − λ − ||Pₐ₂∇ᵧ||²

📏 Sharper bound for subset elements

  • For the i-th element x̂₁ᵢ of x̂₁:

    (190) |∇x̂₁ᵢ| / σ_x̂₁ᵢ ≤ √λ_x̂₁

  • Since λ_x̂₁ ≤ λ_x̂, this bound is sharper (tighter) than (170).

  • Don't confuse: the bound (190) applies only to elements of x̂₁, not to all of x̂.

🧩 Influence on a scalar function q̂ = aᵀx̂

🧩 Linear function bias

  • Consider an arbitrary scalar linear function q̂ = aᵀx̂ (e.g., a specific coordinate difference or distance).

  • The bias is:

    (193) ∇q̂ = aᵀ∇x̂

  • Using the cosine rule:

    (194) ∇q̂ = σ_q̂ · √λ_x̂ · cos θ

    where σ_q̂ = √(aᵀ Q_x̂ a) is the standard deviation of q̂ and θ is the angle between a and ∇x̂ (measured in the metric induced by Q_x̂).

📏 Universal upper bound

  • Since |cos θ| ≤ 1, the bound is:

    (195) |∇q̂| / σ_q̂ ≤ √λ_x̂

  • This shows √λ_x̂ provides an upper bound for the bias-to-noise ratio of every linear function of x̂.

  • Example: if you compute any derived quantity (distance, angle, area) as a linear function of x̂, its bias-to-noise ratio cannot exceed √λ_x̂; the sketch below checks the bound numerically.
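
A quick check of the bound (195) for randomly chosen coefficient vectors a, using the same kind of toy model as above (all numbers assumed):

```python
# |nabla_qhat| / sigma_qhat <= sqrt(lambda_xhat) for any linear function a' x_hat.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
W = np.linalg.inv(np.eye(4))                  # Q_y = I assumed
nabla_y = np.array([0.0, 1.0, 0.0, 0.0])      # assumed undetected blunder

Q_x = np.linalg.inv(A.T @ W @ A)              # variance matrix of x_hat
nabla_x = Q_x @ A.T @ W @ nabla_y             # (162)
lam_x = nabla_x @ np.linalg.inv(Q_x) @ nabla_x

for _ in range(5):
    a = rng.standard_normal(2)                # arbitrary linear function a' x_hat
    nabla_q = a @ nabla_x                     # (193)
    sigma_q = np.sqrt(a @ Q_x @ a)
    assert abs(nabla_q) / sigma_q <= np.sqrt(lam_x) + 1e-12
print("bound (195) holds; sqrt(lambda_xhat) =", np.sqrt(lam_x))
```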

📋 Summary table

The excerpt provides Table 4.9 summarizing the three cases:

| Case | Bias vector | Bias-to-noise ratio | Relationship | Individual bound |
| --- | --- | --- | --- | --- |
| (i) Full x̂ | ∇x̂ = (AᵀQᵧ⁻¹A)⁻¹AᵀQᵧ⁻¹∇ᵧ | λ_x̂ = ∇x̂ᵀQ_x̂⁻¹∇x̂ | λ_x̂ = λᵧ − λ | \|∇x̂ᵢ\|/σ_x̂ᵢ ≤ √λ_x̂ |
| (ii) Subset x̂₁ | ∇x̂₁ = (Ā₁ᵀQᵧ⁻¹Ā₁)⁻¹Ā₁ᵀQᵧ⁻¹∇ᵧ | λ_x̂₁ = ∇x̂₁ᵀQ_x̂₁⁻¹∇x̂₁ | λ_x̂₁ = λᵧ − λ − ‖Pₐ₂∇ᵧ‖² | \|∇x̂₁ᵢ\|/σ_x̂₁ᵢ ≤ √λ_x̂₁ |
| (iii) Function q̂ = aᵀx̂ | ∇q̂ = aᵀ∇x̂ | \|∇q̂\|/σ_q̂ | uses the full λ_x̂ | \|∇q̂\|/σ_q̂ ≤ √λ_x̂ |

🔍 Data snooping special case

When testing one observation at a time with diagonal Qᵧ, and the blunder equal to the minimal detectable bias:

λ_x̂ = λ₀ (1 − rᵢ) / rᵢ

where rᵢ is the local redundancy number.

🧷 Key distinctions

🧷 Internal vs external reliability

  • Internal reliability: describes the size of the model error ∇ᵧ that can be detected with probability γ₀.
  • External reliability: describes how an undetected model error ∇ᵧ biases the final parameters x̂.
  • They are complementary: internal reliability is about detection power; external reliability is about impact if detection fails.

🧷 Full vs subset vs function

  • Full parameter vector: λ_x̂ measures the overall parameter bias and provides a universal bound for all derived quantities.
  • Subset: λ_x̂₁ ≤ λ_x̂ gives a sharper bound for a specific subset of parameters (e.g., the coordinates of one point).
  • Scalar function: |∇q̂|/σ_q̂ ≤ √λ_x̂ applies to any linear combination of parameters; the bound uses the full λ_x̂, not a function-specific value.
  • Don't confuse: the bound (190) for subset elements is tighter than (170), but it only applies to elements of x̂₁.

🧷 Orthogonal decompositions

  • First decomposition (Figure 4.17): ∇ᵧ = A∇x̂ + Pₐ⊥∇ᵧ splits the observation error into a part that affects the parameters and a part that does not.
  • Second decomposition (Figure 4.18): Pₐ∇ᵧ = Ā₁∇x̂₁ + Pₐ₂∇ᵧ further splits the parameter-affecting part into what affects x̂₁ vs x̂₂.
  • These decompositions are orthogonal (the Pythagoras theorem applies), enabling the additive relationships λ_x̂ = λᵧ − λ and λ_x̂ = λ_x̂₁ + ||Pₐ₂∇ᵧ||².

4.7 Reliability: an example

🧭 Overview

🧠 One-sentence thesis

The example of fitting a straight line to observations demonstrates how the detectability of blunders and their influence on parameter estimates depend on the geometric position of data points relative to the cluster center.

📌 Key points (3–5)

  • The model: a straight-line fit (intercept x₁ and slope x₂) to observations, where least-squares minimizes the sum of squared vertical distances.
  • Minimal detectable bias: blunders in observations near the cluster center (coordinate aᵢ close to the mean coordinate a̅) are easier to detect than blunders at the edges.
  • Bias-to-noise ratio for intercept: the effect of an undetected blunder on the intercept estimator is less significant for points with large coordinates aᵢ.
  • Bias-to-noise ratio for slope: the effect on the slope estimator is insignificant when aᵢ is close to the mean a̅, but increases as aᵢ differs more from a̅.
  • Common confusion: don't assume all observations contribute equally to reliability—position in the coordinate space matters for both detectability and influence.

📐 The straight-line model

📐 Model structure

The observation equations are:

  • E(yᵢ) = x₁ + aᵢ·x₂
  • where x₁ is the intercept and x₂ is the slope of the line.
  • The observables yᵢ are assumed normally distributed.
  • The variance matrix Qᵧ is diagonal (observations are uncorrelated).

🎯 Least-squares estimation

Least-squares estimates x̂₁ and x̂₂ follow from minimizing the sum of the squares of the vertical distances from the points (aᵢ, yᵢ) to the straight line.

  • The vertical distance from point (aᵢ, yᵢ) to the line is (yᵢ − x₁ − aᵢ·x₂).
  • The minimization problem is: min over x₁, x₂ of (1/σ²) Σ(yᵢ − x₁ − aᵢ·x₂)².

📊 Variance matrix properties

The variance matrix of the estimators reveals three geometric insights:

| Property | Condition | Interpretation |
| --- | --- | --- |
| Uncorrelated estimators | a̅ = 0 | x̂₁ and x̂₂ are uncorrelated if the mean of the coordinates aᵢ is zero |
| Negative covariance | a̅ > 0 | Points in the first/fourth quadrant: an increase in x̂₁ must be compensated by a decrease in x̂₂ for an optimal fit |
| Large variances | aⱼ close to a̅ for all j | The closer all coordinates are to their mean, the harder it is to estimate x₁ and x₂ (the columns of the design matrix become nearly dependent) |

Don't confuse: the mean coordinate a̅ is not the origin; it is the average of the aᵢ values in the dataset.

🔍 Minimal detectable bias

🔍 Detectability depends on position

The minimal detectable bias for the i-th observable is given by formula (210):

  • ∇ᵢ is smaller (the blunder is more detectable) when the coordinate aᵢ is closer to the mean a̅.
  • ∇ᵢ is larger (the blunder is less detectable) when aᵢ is near the left or right edge of the cluster.

Example: If a cluster of points has coordinates ranging from a₁ = 1 to aₘ = 10 with mean a̅ = 5.5, a blunder in an observation with aᵢ ≈ 5.5 is easier to detect than a blunder in an observation with aᵢ ≈ 1 or aᵢ ≈ 10.

🧮 Why center points are more reliable

  • The formula shows that detectability improves as (aᵢ − a̅)² decreases.
  • Points at the center contribute more to the overall fit constraint, making deviations more noticeable.
  • Edge points have more "freedom" to deviate without strongly affecting the fit.

📉 Bias-to-noise ratios

📉 Effect on the intercept estimator (x̂₁)

Formula (213) gives the bias-to-noise ratio λ_x̂₁ for the intercept:

  • λ_x̂₁ is small (the effect is less significant) when:
    • aᵢ is large (far from zero), and/or
    • aᵢ is close to a̅.
  • Interpretation: an undetected blunder in an observation with a large coordinate aᵢ has less impact on the intercept estimate.

Example: For a line fit, a blunder in an observation at aᵢ = 20 affects the intercept less than a blunder at aᵢ = 2, because the intercept is the value of the line at a = 0, and points farther from zero have less leverage on that value.

📈 Effect on the slope estimator (x̂₂)

Formula (215) gives the bias-to-noise ratio λ_x̂₂ for the slope:

  • λ_x̂₂ is zero (no effect) when aᵢ = a̅.
  • λ_x̂₂ is small (the effect is insignificant) when aᵢ is close to a̅.
  • λ_x̂₂ increases as aᵢ differs more from a̅.

Interpretation: an undetected blunder affects the slope most when the observation is far from the cluster center.

Example: If the mean coordinate is a̅ = 5, a blunder in an observation at aᵢ = 15 will have a much larger effect on the estimated slope than a blunder at aᵢ = 5. The sketch below illustrates both the detectability and the slope effect.
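
A sketch that reproduces both effects for an assumed set of coordinates aᵢ (formulas (210) and (215) are not reproduced in the excerpt; the quantities are computed from first principles):

```python
# Straight-line fit E{y_i} = x1 + a_i * x2: MDB per point and the slope bias
# caused by an undetected blunder. Coordinates, sigma, and lam0 are assumptions.
import numpy as np

a = np.arange(1.0, 11.0)                      # coordinates a_1..a_10
m = a.size
A = np.column_stack([np.ones(m), a])          # design matrix [1, a_i]
sigma, lam0 = 1.0, 17.07

# Local redundancy r_i = 1 - leverage; the MDB is smallest at the cluster center
H = A @ np.linalg.inv(A.T @ A) @ A.T
r = 1.0 - np.diag(H)
mdb = sigma * np.sqrt(lam0 / r)
print(np.round(mdb, 2))                       # largest at the edges a = 1 and a = 10

# Slope bias from a unit blunder in observation i: (a_i - a_mean) / sum (a_j - a_mean)^2
a_mean = a.mean()
S = np.sum((a - a_mean) ** 2)
slope_bias = (a - a_mean) / S
print(np.round(slope_bias, 3))                # zero at a_i = a_mean, grows toward the edges
```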

🔄 Comparing intercept and slope sensitivity

  • Intercept: more sensitive to blunders in observations with small aᵢ (near the y-axis).
  • Slope: more sensitive to blunders in observations far from the mean a̅ (at the edges of the cluster).
  • Don't confuse: "large aᵢ" (distance from origin) vs. "aᵢ far from a̅" (distance from cluster center)—these affect intercept and slope differently.

🧩 Summary of reliability insights

🧩 Geometric interpretation

The example demonstrates that reliability is not uniform across observations:

  • Detectability (internal reliability): best at the cluster center.
  • Influence on intercept (external reliability): smallest for large aᵢ and near a̅.
  • Influence on slope (external reliability): smallest near a̅, largest at cluster edges.

🧩 Practical implications

  • Observations at the center of the coordinate range are more reliable for detecting blunders.
  • Observations at the edges have greater leverage on the slope estimate, so undetected blunders there are more damaging.
  • The rough approximation mentioned (formula 147 from Section 4.5) simplifies the minimal detectable bias calculation but may not capture these geometric effects.