Categorical & Rank-Based Tests
When Your Data Refuses to Behave
Maya has been working under a set of assumptions that quietly underlies every test she's used so far: the data are approximately normal, the variance is reasonable, and the variable is measured on an interval or ratio scale. Sometimes none of those things is true. Her data are skewed beyond rescue. Her sample is too small for the CLT to save her. Or her outcome is categorical: yes/no, red/blue/green. Different problems, same need: tests that don't make those assumptions.
This is the world of non-parametric tests. The exam will test you heavily on (a) when to use them, (b) which one to use in a given scenario, and (c) the basic mechanics of each. Memorise the decision tree at the start of this unit and the rest follows.
The decision tree — which test goes with which design
Every exam will give you scenarios and ask you to pick a test. Internalise this 2×3 grid first:
| Test type | Between-subjects | Within-subjects |
| --- | --- | --- |
| Parametric (interval/ratio, Normal) | Independent t-test | Paired t-test |
| Non-parametric for ordinal data | Mann-Whitney U | Wilcoxon signed-rank |
| Non-parametric for categorical data | Chi-square test | Binomial sign test |
That table is half the answer to half the exam. Three more pieces extend it:
- More than two groups, independent → One-way ANOVA (parametric) or Kruskal-Wallis (non-parametric).
- More than two groups, repeated measures → Repeated-measures ANOVA (parametric) or Friedman (non-parametric).
- Two or more categorical variables → Chi-square for independence.
Selecting a statistical test — what determines the choice
Three questions decide:
1. Level of measurement of the DV — interval/ratio, ordinal, or categorical/nominal.
2. Number of groups or conditions — two vs more than two.
3. Design — between-subjects vs within-subjects.
And one filter for parametric vs non-parametric within the interval/ratio row:
4. Are the parametric assumptions met? Normality, homogeneity of variance, no severe outliers, large enough n.
If interval/ratio and assumptions hold → parametric (more power). Otherwise → non-parametric.
Chi-square test — the workhorse for categorical data
When your data are counts in categories, you use χ². Two main applications.
Chi-square goodness-of-fit
Tests whether observed frequencies match an expected distribution.
- H₀: observed distribution matches expected.
- H₁: observed differs.
Classic: a bag of M&Ms vs the manufacturer's colour claim. Count observed colours, compute expected frequencies from the claim, then:

χ² = Σ (O − E)² / E

df = k − 1 (one DoF lost because the total count is fixed). Compare to the χ² critical value at the chosen α and df.
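A quick SciPy sketch of the goodness-of-fit arithmetic — the counts here are made up, but `scipy.stats.chisquare` is the real function:

```python
from scipy import stats

# Hypothetical M&M colour counts vs a claimed uniform distribution
observed = [25, 30, 20, 15, 10]
n = sum(observed)                # 100 sweets in the bag
expected = [0.2 * n] * 5         # manufacturer claims 20% per colour

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
df = len(observed) - 1           # k - 1 = 4
```

Here Σ(O − E)²/E works out to 12.5 on 4 df, which is significant at α = 0.05.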
Chi-square test for independence
Tests whether two categorical variables are related.
- H₀: independent.
- H₁: associated.
Data: a contingency table. Expected count for each cell under independence:

E = (row total × column total) / N
df = (r − 1)(c − 1). For 2×2: df = 1. For 2×3: df = 2.
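In SciPy the whole independence test is one call — the 2×3 table below is invented, but note how `chi2_contingency` hands back the expected counts and df for you:

```python
from scipy import stats

# Hypothetical 2x3 contingency table: condition (rows) x response (cols)
table = [[30, 10, 20],
         [20, 25, 15]]

chi2, p, df, expected = stats.chi2_contingency(table)
# df = (2 - 1) * (3 - 1) = 2
# expected[0][0] = (row total 60 * column total 50) / 120 = 25.0
```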
Effect size for χ²
- Phi (φ) for 2×2: φ = √(χ² / n).
- Cramér's V for larger tables: V = √(χ² / (n · (min(r, c) − 1))).
Both range 0 to 1. Standard report: *"χ²(2, n = 150) = 8.23, p < 0.05, V = 0.23 (moderate)."*
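A short sketch (invented 2×2 counts) showing both effect sizes from the same χ² — for a 2×2 table, min(r, c) − 1 = 1, so Cramér's V collapses to φ:

```python
import math
from scipy import stats

table = [[30, 10],
         [20, 40]]                       # hypothetical 2x2 counts
chi2, p, df, _ = stats.chi2_contingency(table, correction=False)
n = sum(sum(row) for row in table)       # total count = 100

phi = math.sqrt(chi2 / n)                # phi coefficient (2x2 only)
k = min(len(table), len(table[0]))       # min(rows, cols) = 2
v = math.sqrt(chi2 / (n * (k - 1)))      # Cramér's V; equals phi for 2x2
```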
Limitations of chi-square — exam staples
1. Each observation must fall in exactly one cell. Between-subjects only; for paired designs use McNemar.
2. Only frequencies — not means, percentages, or ratios.
3. Each cell should have an expected count ≥ 5. Below that, use Fisher's exact test (for 2×2) or combine categories.
4. A significant χ² indicates an association but not its strength — report an effect size (φ or Cramér's V).
The non-parametric pairings
Mann-Whitney U — non-parametric independent t
Rank all observations across both groups; sum the ranks per group; then:

U₁ = n₁n₂ + n₁(n₁ + 1)/2 − R₁

(compute U₂ the same way and report the smaller of the two as U).
Tests whether one group tends to have stochastically larger values. Use when comparing two independent groups on ordinal data, non-Normal continuous data, or small samples with outliers.
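The ranking and the U arithmetic are handled by `scipy.stats.mannwhitneyu`; the ratings below are made up:

```python
from scipy import stats

# Hypothetical ordinal ratings from two independent groups
group_a = [3, 5, 4, 6, 7, 5]
group_b = [2, 3, 1, 4, 2, 3]

u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
# U is bounded by n1 * n2 = 36; group A's ratings are stochastically larger
```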
Wilcoxon signed-rank — non-parametric paired t
For each pair, compute the difference; rank the *absolute* differences; sum the ranks of positive differences → W. Tests whether differences are symmetrically distributed around zero. Use for paired or matched samples with non-Normal differences.
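Again one SciPy call does the differencing and ranking — the before/after scores here are invented:

```python
from scipy import stats

# Hypothetical paired scores: same subjects before and after training
before = [10, 12, 9, 14, 11, 13, 10, 12]
after  = [12, 15, 10, 17, 12, 16, 13, 14]

w, p = stats.wilcoxon(before, after)
# Every subject improved, so the less frequent sign has rank sum W = 0
```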
Kruskal-Wallis — non-parametric one-way ANOVA
Rank all observations across all groups; compute H from rank sums. Tests whether at least one group has stochastically larger/smaller values. Follow with Dunn's post-hoc with multiple-comparison correction.
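A minimal Kruskal-Wallis sketch with invented data — `scipy.stats.kruskal` takes one argument per group, like its ANOVA counterpart `f_oneway`:

```python
from scipy import stats

# Hypothetical scores from three independent groups
g1 = [1, 2, 3, 4, 5]
g2 = [2, 3, 4, 5, 6]
g3 = [7, 8, 9, 10, 11]      # clearly shifted upward

h, p = stats.kruskal(g1, g2, g3)
# H is compared to chi-square with k - 1 = 2 degrees of freedom
```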
Friedman's test — non-parametric RM-ANOVA
Rank within each subject across the k conditions; test whether rank sums differ across conditions.
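In SciPy this is `friedmanchisquare`, one argument per condition (rows are matched by position; the scores below are invented):

```python
from scipy import stats

# Hypothetical: six subjects, each measured under three conditions
cond1 = [4, 5, 3, 4, 5, 4]
cond2 = [6, 7, 5, 6, 7, 6]
cond3 = [8, 9, 7, 8, 9, 8]   # every subject ranks cond1 < cond2 < cond3

stat, p = stats.friedmanchisquare(cond1, cond2, cond3)
# Perfect within-subject ordering gives the maximum statistic for n=6, k=3
```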
McNemar's test — paired binary 2×2
Same subjects measured on two binary outcomes. Compares only the discordant cells:

χ² = (b − c)² / (b + c)

where b and c are the off-diagonal counts. df = 1.
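The statistic is simple enough to compute by hand; here is a sketch with invented discordant counts, using SciPy only for the χ² tail probability:

```python
from scipy import stats

# Hypothetical paired binary outcomes; b and c are the discordant cells
b, c = 5, 15                      # subjects who changed in opposite directions

chi2 = (b - c) ** 2 / (b + c)     # McNemar statistic = 100 / 20 = 5.0
p = stats.chi2.sf(chi2, df=1)     # upper tail of chi-square with df = 1
```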
Fisher's exact test
For 2×2 contingency with small expected counts (< 5). Uses the hypergeometric distribution to compute exact p-values. No asymptotic approximation. The right tool when χ² assumptions fail.
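SciPy exposes this as `fisher_exact`; the small-count table below is made up:

```python
from scipy import stats

# Hypothetical 2x2 table with expected counts too small for chi-square
table = [[8, 2],
         [1, 9]]

odds_ratio, p = stats.fisher_exact(table)
# p is exact (hypergeometric), no large-sample approximation involved
```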
Binomial sign test
Simplest paired test: count signs of differences; test against H₀: P(+) = 0.5. Useful when differences aren't even rank-comparable.
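Because the sign test is just a binomial test on the signs, `scipy.stats.binomtest` covers it (the 9-positive-out-of-10 split is invented):

```python
from scipy import stats

# Hypothetical: 10 paired differences, 9 positive and 1 negative
result = stats.binomtest(k=9, n=10, p=0.5)
# Two-sided test of H0: P(+) = 0.5
```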
Spearman ρ
Pearson r on ranks. Covered in Unit 6. Non-parametric correlation; robust to outliers; captures monotone associations.
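A tiny sketch (invented data) of the outlier robustness: an extreme x value cannot disturb the ranks, so Spearman stays at 1 while Pearson drops:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 100]    # extreme outlier in x
y = [2, 4, 6, 8, 10, 12]    # perfectly monotone in x

rho, _ = stats.spearmanr(x, y)        # ranks are identical: rho = 1.0
pearson_r, _ = stats.pearsonr(x, y)   # the outlier breaks linearity: r < 1
```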
When to drop to non-parametric
- Severely non-Normal residuals + small n.
- Ordinal DV.
- Unfixable outliers.
- Rank-based research hypothesis ("does one group tend to rank higher?").
- Likert data with few scale points and unequal spacing.
Trade-off: less power when parametric assumptions hold. Don't drop unnecessarily.
Reporting non-parametric results
State the test, the test statistic (U, W, H, χ², τ, ρ), the p-value, and the effect size.
*"Mann-Whitney U = 40, p = 0.03, r = Z/√N ≈ 0.45 (moderate effect)."*
For rank tests, the effect size is typically r = Z/√N, where Z is the standardised test statistic and N the total sample size.
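A sketch of that conversion for a Mann-Whitney U (made-up numbers; the normal approximation below assumes no ties):

```python
import math

# Hypothetical: convert a Mann-Whitney U to r = Z / sqrt(N)
n1, n2, u = 8, 8, 10
mu = n1 * n2 / 2                                    # E[U] under H0 = 32
sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)     # SD of U (no ties)
z = (u - mu) / sigma                                # standardised statistic
r = abs(z) / math.sqrt(n1 + n2)                     # effect size, 0 to 1
```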
What you carry into the exam
- Decision tree mapping IV/DV scale × design to test.
- χ² goodness-of-fit (df = k − 1) vs χ² independence (df = (r−1)(c−1)).
- Phi / Cramér's V for χ² effect sizes.
- Mann-Whitney = independent t non-parametric. Wilcoxon signed-rank = paired t non-parametric.
- Kruskal-Wallis = one-way ANOVA non-parametric. Friedman = RM-ANOVA non-parametric.
- McNemar = paired χ² 2×2; Fisher's exact for small counts.
- Limitations of χ² — independence, frequencies only, E ≥ 5, no strength info.
- Power trade-off — non-parametric less powerful when parametric assumptions hold.
When you're ready, send "next" and we'll move into multicollinearity, PCA, and factor analysis — when predictors correlate, how to detect it, and how to reduce dimensions.