Introduction to Statistics
1.1 Introduction to Statistics
🧭 Overview
🧠 One-sentence thesis
Statistics translates data into knowledge by collecting, analyzing, and interpreting information to help people make educated decisions in everyday life and professional contexts.
📌 Key points
- What statistics does: organizes, summarizes, and draws conclusions from data through descriptive and inferential methods.
- Two main branches: descriptive statistics (organizing and summarizing) vs. inferential statistics (drawing conclusions using probability).
- Population vs. sample: studying an entire population is often impractical, so we select samples and use statistics to estimate parameters.
- Common confusion: statistic vs. parameter—a statistic describes a sample; a parameter describes the whole population.
- Why it matters: statistical methods help evaluate claims, make informed decisions, and determine confidence in conclusions.
📊 What statistics is and why we need it
📊 The core purpose
"Statistics' ultimate goal is translating data into knowledge." – Alan Agresti & Christine Franklin
- Statistics appears everywhere: news reports, weather forecasts, education, crime data, sports, real estate, and more.
- When you encounter sample information in media, statistical methods help you evaluate whether claims are correct.
- Example: deciding whether to buy a house, manage a budget, or trust a news report all involve analyzing statistical information.
🎓 Practical applications
- Many professions require statistical knowledge: economics, business, psychology, education, biology, law, computer science, police science, and early childhood development.
- The goal is not to perform endless calculations but to interpret data to gain understanding.
- Calculations can be done by calculators or computers; the understanding must come from you.
🔍 The two branches of statistics
🔍 Descriptive statistics
Descriptive statistics: organizing, summarizing, and presenting data.
- This is the foundation—learning how to organize and summarize data first.
- Data can be summarized with graphs or with numbers (e.g., finding an average).
- Example: calculating the average grade in one class is descriptive.
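As a minimal sketch of descriptive statistics, the snippet below summarizes one class's grades with a few numbers. The grades and the `summarize` helper are hypothetical, made up for illustration; only Python's standard `statistics` module is used.

```python
from statistics import mean, median

# Hypothetical grades for one class (made up for illustration).
grades = [72, 85, 90, 66, 78, 88, 95, 70, 82, 74]

def summarize(values):
    """Return a few descriptive summaries of a list of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }

summary = summarize(grades)
print(summary)
```

Nothing here generalizes beyond the class itself: the numbers only describe the data in hand.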
🔬 Inferential statistics
Inferential statistics: formal methods for drawing useful conclusions from data while filtering out noise.
- These formal methods build on probability and probability distributions, which are studied first.
- Effective inference depends on good data collection procedures and thoughtful examination.
- Statistical inference uses probability to determine how confident you can be that your conclusions are correct.
- Example: using a sample average to test whether a claim about the entire population is valid.
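One possible way to picture the last example is a rough 95% interval around the sample average, built with a normal approximation. This is a sketch under stated assumptions, not a method given in these notes: the sample values, the `rough_interval` and `is_plausible` helpers, and the multiplier 1.96 are all illustrative, and for a sample this small a t-based interval would be wider.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of 10 grades (made up for illustration).
sample = [72, 85, 90, 66, 78, 88, 95, 70, 82, 74]

def rough_interval(values, z=1.96):
    """Rough 95% interval for the population mean (normal approximation)."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))  # standard error of the mean
    return m - z * se, m + z * se

low, high = rough_interval(sample)

def is_plausible(claimed_mean):
    """True if the claimed population mean falls inside the rough interval."""
    return low <= claimed_mean <= high

print(f"sample mean = {mean(sample)}, interval = ({low:.1f}, {high:.1f})")
print("claim of 75 plausible:", is_plausible(75))
print("claim of 90 plausible:", is_plausible(90))
```

A claimed population average that falls far outside the interval is hard to reconcile with the sample; one inside it is not contradicted by the data.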
🎲 Probability and randomness
🎲 What probability measures
Probability: a mathematical tool used to study randomness; it deals with the chance (likelihood) of an event occurring.
- Individual outcomes are uncertain, but a regular pattern emerges with many repetitions.
- Example: tossing a fair coin four times may not yield two heads and two tails, but tossing it 4,000 times will very likely produce results close to half heads and half tails.
- The expected theoretical probability of heads in one toss is one-half or 0.5.
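The pattern described above can be simulated. The sketch below assumes a fair coin modeled with Python's pseudo-random generator; the `heads_fraction` helper and the fixed seed are illustrative choices, and exact fractions will differ with other seeds.

```python
import random

def heads_fraction(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# Small runs can stray far from 0.5; large runs tend to settle near it.
for n in (4, 40, 400, 4000):
    print(n, heads_fraction(n))
```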
🎯 Real-world uses
- Predictions take the form of probabilities: likelihood of an earthquake, rain, getting an A in a course.
- Doctors use probability to assess medical test accuracy.
- Stockbrokers use it to estimate the likely rate of return on investments.
- You might use it to decide whether to buy a lottery ticket.
📖 Historical note
- Probability theory began with studying games of chance like poker.
- Example: Karl Pearson tossed a coin 24,000 times and got 12,012 heads (12,012/24,000 = 0.5005); another researcher tossed a coin 2,000 times and got 996 heads (996/2,000 = 0.498). Both fractions are very close to 0.5.
🔑 Key terminology
🔑 Population and sample
Population: a collection of people or things under study.
Sample: a portion (or subset) of the larger population selected for study.
- Examining an entire population takes great resources (time, money, manpower), so we often study only a sample.
- Example: to compute overall GPA at a school, select a sample of students rather than surveying every single student.
- Example: presidential opinion polls sample 1,000–2,000 people to represent the entire country's population.
- To be representative, a sample must reflect the characteristics of the population it is drawn from.
🔑 Parameter and statistic
Parameter: a number that describes a characteristic of the population.
Statistic: a number that represents a property of the sample.
- A statistic is an estimate of a population parameter.
- Example: the average points earned by students in one math class (sample) is a statistic; the average across all math classes (population) is a parameter.
- Don't confuse: statistic = sample property; parameter = population property.
| Term | What it describes | Example |
|---|---|---|
| Parameter | Characteristic of the whole population | Average points across all math classes |
| Statistic | Characteristic of the sample | Average points in one math class |
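The distinction in the table can be sketched in code. Everything below is hypothetical: a made-up population of 200 point totals stands in for "all math classes", and a simple random sample of 25 stands in for the group actually studied.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: points earned by 200 students (made up).
population = [rng.randint(50, 100) for _ in range(200)]

parameter = mean(population)         # describes the whole population
sample = rng.sample(population, 25)  # simple random sample of 25 students
statistic = mean(sample)             # describes only the sample

print(f"parameter = {parameter:.2f}, statistic = {statistic:.2f}")
```

In practice the parameter is usually unknown; it can be computed here only because the whole population was generated in code.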
🔑 Individuals, variables, and data
Individuals: the units about which we are collecting information (could be a person, animal, thing, or place).
Variable: a specific characteristic or measurement that can be determined for each individual (usually represented by capital letters like X or Y).
Values: the possible observations of the variable.
Data: the actual values of the variables of interest (may be numbers or words).
- If multiple variables are collected on an individual, the entire set may be called a case or observational unit.
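One way to picture these terms is as a small data structure: each individual becomes one case, each variable becomes a field, and the data are the values stored in those fields. The `Case` class, its fields, and the three records are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Case:
    """One individual together with the value of each variable."""
    student_id: str  # identifies the individual
    age: int         # numeric variable
    major: str       # word-valued variable

# Hypothetical data: three individuals, two variables each.
cases = [
    Case("S1", 18, "Biology"),
    Case("S2", 19, "Economics"),
    Case("S3", 18, "Psychology"),
]

# The data for one variable are its values across all individuals.
ages = [c.age for c in cases]
majors = [c.major for c in cases]
print(ages)
print(majors)
```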
📝 Example walkthrough
Study: We want to know the average amount of money first-year students at ABC College spend on school supplies (excluding books). We randomly survey 100 first-year students. Three of those students spent $150, $200, and $225, respectively.
- Population: all first-year students at ABC College.
- Sample: the 100 students surveyed.
- Variable: amount of money spent on school supplies.
- Data: the actual values—$150, $200, $225, etc.
- Statistic: the average calculated from the 100 students.
- Parameter: the true average for all first-year students (unknown, estimated by the statistic).
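As a small arithmetic sketch, the average of the three reported amounts can be computed directly. This is not the study's statistic, which would use all 100 responses; the other 97 values are not given, so the variable names below are only illustrative.

```python
from statistics import mean

# The three amounts reported in the example (out of 100 surveyed students).
reported = [150, 200, 225]

# Average of these three values only, not of the full sample of 100.
partial_mean = mean(reported)
print(f"${partial_mean:.2f}")  # prints $191.67
```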
🧮 The data analysis process
🧮 Four phases
The data analysis process consists of four phases:
1. Identify the research objective
   - What questions are to be answered?
   - What group should be studied?
   - Have attempts been made to answer it before?
2. Collect the information needed
   - Is data already available?
   - Can you access the entire population?
   - How can you collect a good sample?
3. Organize and summarize the information
   - What visual descriptive techniques are appropriate?
   - What numerical descriptive techniques are appropriate?
   - What aspects of the data stick out?
4. Draw conclusions from the information
   - What inferential techniques are appropriate?
   - What conclusions can be drawn?
🎯 The main concern
- One of the main concerns in statistics is how accurately a statistic estimates a parameter.
- Accuracy depends on how well the sample represents the population.
- Inferential statistics involves both: the sample statistic is what we compute, and the population parameter is what we want to learn about.
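To illustrate how representativeness affects accuracy, the sketch below compares two samples of the same size drawn from a hypothetical population (the integers 1 to 100). The sampling schemes and variable names are assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical population: the integers 1..100, so the parameter is 50.5.
population = list(range(1, 101))
parameter = mean(population)

# Sample spread across the whole range: 5, 15, ..., 95.
spread_sample = population[4::10]
# Sample taken only from the top of the range: 91, 92, ..., 100.
top_sample = population[-10:]

spread_error = abs(mean(spread_sample) - parameter)
top_error = abs(mean(top_sample) - parameter)

print(f"spread sample error = {spread_error}")
print(f"top-only sample error = {top_error}")
```

Both samples contain ten values, yet the one drawn from across the population estimates the parameter far more closely than the one drawn from a single end.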