Saral Shiksha Yojna

Behavioral Research: Statistical Methods

CG3.402
Vinoo Alluri · Monsoon 2025-26 · 4 credits

Centre, Spread, Standardisation

Unit 5 — Descriptive Statistics

Maya's Descriptive Summary

Maya has plotted her sore-throat data and convinced herself the distributions are reasonable. Now she needs *numbers* — the small set of summaries that compress the data while preserving what matters. This unit is the numerical companion to Unit 4's visualisation.

The descriptive statistics split into two families: measures of central tendency (the "typical" value) and measures of dispersion (how spread out the values are).

Central tendency

Mean

The arithmetic average, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. The most sensitive and precise measure when data are roughly symmetric. Physically, the centre of mass — the balance point of the data on a number line.

Sensitive to extreme values. A single billionaire can drag up the mean income of a small town. This is why news articles say "Bill Gates walks into a bar and the average wealth becomes a billion".

Median

The middle value when the data are sorted. The 50th percentile. Robust to outliers — one billionaire moves the median by zero. Useful for skewed distributions like income, reaction times, or housing prices.

Mode

The most frequently occurring value. The only meaningful central tendency for nominal data — "average eye colour" is nonsense; mode = most common eye colour. For bimodal distributions, reporting both modes is informative.
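A quick sketch of all three measures with Python's stdlib `statistics` module. The numbers are made up for illustration (not Maya's actual data); the 30 plays the billionaire's role.

```python
import statistics as st

# Made-up recovery times in days; the single 30 is the "billionaire in the bar"
days = [3, 4, 4, 5, 5, 5, 6, 7, 30]

mean_days = st.mean(days)      # dragged upward by the one extreme value
median_days = st.median(days)  # middle of the sorted data: unmoved
mode_days = st.mode(days)      # most frequent value

print(mean_days, median_days, mode_days)  # mean ≈ 7.67, median 5, mode 5
```

One extreme observation lifts the mean well above the median and mode, which is exactly the mean-sensitivity point from the text.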

Which measure to use, by variable type

The exam table you may need to fill in:

| Scale | Mean | Median | Mode | | --- | --- | --- | --- | | Nominal | ❌ | ❌ | ✅ | | Ordinal | (practice: ✅, strictly: ❌) | ✅ | ✅ | | Interval | ✅ | ✅ | ✅ | | Ratio | ✅ + geometric mean | ✅ | ✅ |

Advantages and disadvantages

Mean. *Advantage:* most sensitive and exact measure. Basis of significance testing and ANOVA. Lets us estimate population parameters from sample data. *Disadvantage:* a single extreme value can seriously distort it.

Median. *Advantage:* not susceptible to extreme values. *Disadvantage:* can be unrepresentative if the dataset is small (small samples have flickery medians).

Mode. *Advantage:* indicates the most typical value. Unaffected by extreme scores. Sometimes more informative than the mean. *Disadvantage:* not useful when several values occur equally frequently in a small dataset.

Dispersion / spread

Range

max − min. The simplest. Extremely sensitive to outliers — one typo of 999 turns your range into nonsense.

Interquartile range (IQR)

$\mathrm{IQR} = Q_3 - Q_1$. The width of the middle 50% of the data. Robust to outliers. Used in boxplots and in outlier detection (anything > 1.5 × IQR beyond the quartiles is flagged).
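The 1.5 × IQR fence rule can be sketched with stdlib `statistics.quantiles` (sample values again made up):

```python
import statistics as st

data = [3, 4, 4, 5, 5, 5, 6, 7, 30]           # illustrative sample
q1, q2, q3 = st.quantiles(data, n=4)           # quartile cut points (exclusive method)
iqr = q3 - q1                                  # width of the middle 50%

# Tukey fences: flag anything more than 1.5 * IQR beyond the quartiles
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lo or x > hi]
print(outliers)  # [30]
```

Note that `statistics.quantiles` defaults to the "exclusive" interpolation method; other software (and other `method=` settings) can give slightly different quartiles on small samples.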

Variance

The mean squared deviation from the mean: population $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$; sample $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ (see Bessel's correction below).

Why squared, not absolute? Two reasons. First, squaring keeps positive and negative deviations from cancelling. Second, the squaring gives the variance nice algebraic properties — additivity for independent variables, smooth derivatives — that make it the basis of significance testing and ANOVA. *Downside:* variance is in *squared units* (e.g., "minutes squared"), which makes it hard to interpret directly.

Standard deviation (SD)

$s = \sqrt{s^2}$, the square root of the variance. Back in the original units of the variable. The most common measure of spread.

Advantage: fundamental to significance testing and ANOVA. Disadvantage: distorted by extreme values (squared deviations amplify outliers). No information about distribution shape — same SD can come from a unimodal, bimodal, or skewed distribution.

MAD — Median Absolute Deviation

$\mathrm{MAD} = \operatorname{median}_i\,\lvert x_i - \operatorname{median}(x)\rvert$. Robust analog of SD. For Normal data, $\sigma \approx 1.4826 \times \mathrm{MAD}$.
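A sketch contrasting SD and MAD on the same made-up sample, using the conventional 1.4826 Normal-consistency factor:

```python
import statistics as st

data = [3, 4, 4, 5, 5, 5, 6, 7, 30]
sd = st.stdev(data)                             # sample SD: inflated by the outlier

med = st.median(data)
mad = st.median(abs(x - med) for x in data)     # median absolute deviation
robust_sd = 1.4826 * mad                        # SD-consistent rescaling under Normality

print(sd, mad, robust_sd)  # sd ≈ 8.46, mad = 1, robust_sd ≈ 1.48
```

The single extreme value blows the SD up to several times the robust estimate: squared deviations amplify outliers, absolute deviations around the median do not.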

Bessel's correction — why (n − 1)?

Sample variance divides by $n - 1$, not $n$. Why?

Dividing by $n$ would produce a *biased* estimator because $\bar{x}$ is itself computed from the data and fits the sample more tightly than the true population mean would. The residuals $x_i - \bar{x}$ are systematically smaller than $x_i - \mu$, so dividing by $n$ underestimates $\sigma^2$.

The correction — called Bessel's correction or the degrees-of-freedom adjustment — exactly compensates, making $\mathbb{E}[s^2] = \sigma^2$.

The intuition: one degree of freedom is "spent" estimating the mean, leaving $n - 1$ free pieces of information for estimating spread. This is the same DoF logic that appears throughout ANOVA, t-tests, and regression.
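The two divisors correspond to two stdlib functions: `statistics.pvariance` divides by n (population), `statistics.variance` divides by n − 1 (sample, Bessel-corrected). A tiny worked example:

```python
import statistics as st

sample = [2.0, 4.0, 6.0, 8.0]                 # mean = 5, squared deviations sum to 20
n = len(sample)

pop_style = st.pvariance(sample)              # divides by n:     20 / 4 = 5.0
bessel    = st.variance(sample)               # divides by n - 1: 20 / 3 ≈ 6.67

# The correction is exactly the factor n / (n - 1)
print(pop_style, bessel, pop_style * n / (n - 1))
```

The gap matters most for small n (here a third of the estimate) and vanishes as n grows, which is why the distinction is easy to forget and easy to examine.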

When measures fail to be representative

Worth memorising — direct exam fodder:

  • Highly skewed distributions — the mean is pulled toward the tail, no longer representative. Use the median.
  • Bimodal distributions — no single value is typical; both peaks are interesting. Report both modes and visualise.
  • Distributions with outliers — outliers dominate the mean and SD. Use median and IQR, or trim / winsorise the outliers.
  • Small samples — every measure becomes unstable.

Mean–median–mode relationship and skew

For a roughly symmetric, unimodal distribution: mean ≈ median ≈ mode.

  • Right-skewed (positive skew): long tail to the right. Mean > median > mode. Classic: reaction times, income, house prices.
  • Left-skewed (negative skew): long tail to the left. Mean < median < mode. Classic: test accuracy with a ceiling effect, lifespan.
  • Bimodal: mean and median fall *between* the modes — unrepresentative of any actual observation. Report both peaks.

This is the diagnostic for skew if you can't plot: compare mean and median.
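The no-plot diagnostic in two lines of stdlib Python, on made-up right-skewed reaction times:

```python
import statistics as st

# Illustrative reaction times (ms): a long right tail is typical of RT data
rt = [220, 240, 250, 260, 270, 300, 350, 480, 900]

# mean pulled toward the long right tail => mean > median signals right skew
skewed_right = st.mean(rt) > st.median(rt)
print(st.mean(rt), st.median(rt), skewed_right)
```

For a left-skewed variable the inequality flips; for a roughly symmetric one, mean and median nearly coincide.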

z-scores — comparing across scales

The standardised value, $z = \dfrac{x - \mu}{\sigma}$: how many SDs above or below the mean. Unit-less; preserves shape; allows comparison across scales / units.

*Example:* SAT 1400, $\mu = 1050$, $\sigma = 200$ → $z = (1400 - 1050)/200 = 1.75$ → ~96th percentile under Normal.

z-scores let Maya combine variables on different units (IQ + GPA + age) by first z-scoring each. Under Normal, $|z| > 2$ is outlier-ish; $|z| > 3$ is very extreme.
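A minimal z-scoring helper (the function name and sample values are mine, for illustration):

```python
import statistics as st

def zscores(xs):
    """Standardise a sample: subtract the mean, divide by the sample SD."""
    m, s = st.mean(xs), st.stdev(xs)
    return [(x - m) / s for x in xs]

sat = [1100, 1200, 1300, 1400, 1500]   # made-up scores
z = zscores(sat)

# Whatever the original units, z-scored data has mean ~0 and SD ~1,
# so z-scored variables from different scales become directly comparable.
print(z)
```

This is exactly the move that lets heterogeneous variables (IQ, GPA, age) be combined: each is mapped onto the same unit-less scale first.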

Coefficient of variation

$\mathrm{CV} = s / \bar{x}$ — unitless relative dispersion. Useful when comparing variables with different units. Weight ($\bar{x} = 70$ kg, $s = 10$ kg): CV = 0.143. Height ($\bar{x} = 170$ cm, $s = 7$ cm): CV = 0.041. Weight is relatively more variable than height.
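The weight-vs-height comparison as code. The means and SDs below (70 kg / 10 kg, 170 cm / 7 cm) are back-computed to match the CVs quoted in the text, so treat them as illustrative:

```python
# CV = SD / mean: dividing out the units makes cross-unit comparison legal
weight_mean, weight_sd = 70, 10     # kg (illustrative)
height_mean, height_sd = 170, 7     # cm (illustrative)

cv_weight = weight_sd / weight_mean   # ≈ 0.143
cv_height = height_sd / height_mean   # ≈ 0.041

print(round(cv_weight, 3), round(cv_height, 3))
```

Comparing the raw SDs (10 vs 7) would be meaningless since kg and cm are different units; the CVs show weight is roughly 3.5× as variable *relative to its mean*.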

Data transformations

When data are heavily skewed or you need normality, transformations can rescue you:

  • Log — most common for right-skewed positive data. Pulls in the right tail; useful for incomes, reaction times.
  • Square root — milder than log.
  • Reciprocal (1/x) — dramatic for very right-skewed positive data.
  • Box-Cox — general family; finds the optimal transformation parameter $\lambda$ automatically.

A transformation:

1. Increases the applicability of statistical techniques based on the normality assumption.
2. Is *not* a guarantee of normality — check the transformed distribution.
3. Only works (for log and reciprocal) if all the data are strictly positive. For zeros / negatives, add a constant first.

Caveat (exam-relevant): once you transform, you can only interpret in the *transformed* variable, not the original. If you log-transformed reaction times and then ran ANOVA, report "log reaction times differed across groups" — not "reaction times differed by seconds".
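A sketch of the log transform's effect on made-up right-skewed reaction times, using the mean-minus-median gap (the skew diagnostic from earlier) before and after:

```python
import math
import statistics as st

# Right-skewed positive data (illustrative reaction times, ms)
rt = [220, 240, 250, 260, 270, 300, 350, 480, 900]
log_rt = [math.log(x) for x in rt]      # requires strictly positive values

# Mean - median gap, scaled by the median so the two scales are comparable
raw_gap = (st.mean(rt) - st.median(rt)) / st.median(rt)
log_gap = (st.mean(log_rt) - st.median(log_rt)) / st.median(log_rt)

print(raw_gap, log_gap)   # the log pulls mean and median much closer together
```

Consistent with the caveat above, any downstream test on `log_rt` licenses conclusions about *log* reaction times only.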

The Normal distribution — revisited as a descriptive shape

As a descriptive shape:

1. Bell-shaped, symmetric around the mean.
2. Mean = median = mode — they all fall on the midpoint.
3. 68 / 95 / 99.7 rule: $\mu \pm 1\sigma$ → ~68%, $\mu \pm 2\sigma$ → ~95%, $\mu \pm 3\sigma$ → ~99.7%.
4. Almost all values within 3 SDs of the mean.

Many parametric tests (t, ANOVA, regression) assume the data — or more precisely, the *residuals* or *sampling distributions* — are approximately Normal. When they're not, you either rely on the CLT (large samples), transform the data, or use non-parametric methods.
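The 68 / 95 / 99.7 rule can be checked empirically by simulation with stdlib `random.gauss` (a sketch, not a proof):

```python
import random

random.seed(42)
xs = [random.gauss(0, 1) for _ in range(100_000)]   # standard Normal draws

def frac_within(k):
    """Fraction of draws within k SDs of the mean (here mean 0, SD 1)."""
    return sum(abs(x) < k for x in xs) / len(xs)

# Should land near 0.68, 0.95, 0.997 up to sampling noise
print(frac_within(1), frac_within(2), frac_within(3))
```

With 100,000 draws the empirical fractions match the rule to about two decimal places, which is also a small demonstration of why large samples make the CLT-based shortcuts trustworthy.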

Putting it together — Maya's descriptive table

Before any test:

| Statistic | Value | | --- | --- | | n | … | | Mean | … | | Median | … | | SD | … | | IQR | … | | Min / Max | … | | Skew | … | | Missing | … |

This is what Maya assembles before running any inferential test. It tells her whether to use the mean / SD or the median / IQR, whether parametric or nonparametric, whether to transform.

What you carry into the exam

  • Mean / median / mode — which to use by scale (NOIR) and by shape (skew, outliers, bimodality).
  • Range / IQR / variance / SD / MAD — when each is appropriate. Robust trio: median + IQR + MAD.
  • Bessel's correction — divide by $n - 1$ to unbias sample variance. One DoF spent.
  • Squared vs absolute deviations — variance is L2 (mean centres it); MAD is L1 (median centres it).
  • z-score: $z = (x - \mu)/\sigma$, unit-less, preserves shape.
  • Coefficient of variation: $\mathrm{CV} = s/\bar{x}$, for cross-unit comparison.
  • Skew diagnosis: mean > median > mode ⇒ right skew.
  • 68 / 95 / 99.7 rule for Normal.
  • Data transformations (log, sqrt, 1/x, Box-Cox) and their interpretation caveats.

When you're ready, send "next" and we'll move into correlation and reliability quantified — Pearson r, Spearman ρ, partial correlations, Cohen's κ, Cronbach's α, and the rules for when each applies.