Categorical & Rank-Based Tests
When Your Data Refuses to Behave
Maya has been working under a set of assumptions that quietly underlies every test she's used so far: the data are approximately normal, the variance is reasonable, and the variable is measured on an interval or ratio scale. Sometimes none of those things is true. Her data are skewed beyond rescue. Her sample is too small for the CLT to save her. Or her outcome is categorical: yes/no, red/blue/green. Different problems, same need: tests that don't make those assumptions.
This is the world of non-parametric tests. The exam will test you heavily on (a) when to use them, (b) which one to use in a given scenario, and (c) the basic mechanics of each. Memorise the decision tree at the start of this unit and the rest follows.
The decision tree — which test goes with which design
Every exam will give you scenarios and ask you to pick a test. Internalise this 2×3 grid first:
| Test type | Between-subjects | Within-subjects |
| --- | --- | --- |
| Parametric (interval/ratio, Normal) | Independent t-test | Paired t-test |
| Non-parametric for ordinal data | Mann-Whitney U | Wilcoxon signed-rank |
| Non-parametric for categorical data | Chi-square test | Binomial sign test |
That table is half the answer to half the exam. Three more pieces extend it:
- More than two groups, independent → One-way ANOVA (parametric) or Kruskal-Wallis (non-parametric).
- More than two groups, repeated measures → Repeated-measures ANOVA (parametric) or Friedman (non-parametric).
- Two or more categorical variables → Chi-square for independence.
Selecting a statistical test — what determines the choice
Three questions decide:
1. Level of measurement of the DV — interval/ratio, ordinal, or categorical/nominal.
2. Number of groups or conditions — two vs more than two.
3. Design — between-subjects vs within-subjects.
And one filter for parametric vs non-parametric within the interval/ratio row:
4. Are the parametric assumptions met? Normality, homogeneity of variance, no severe outliers, large enough n.
If interval/ratio and assumptions hold → parametric (more power). Otherwise → non-parametric.
Chi-square test — the workhorse for categorical data
When your data are counts in categories, you use χ². Two main applications.
Chi-square goodness-of-fit
Tests whether observed frequencies match an expected distribution.
- H₀: observed distribution matches expected.
- H₁: observed differs.
Classic: a bag of M&Ms vs the manufacturer's colour claim. Count observed colours, compute expected frequencies from the claim, then:

χ² = Σ (O − E)² / E

df = k − 1 (one DoF lost because the total count is fixed). Compare to the χ² critical value at the chosen α and df.
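A quick SciPy sketch of the goodness-of-fit arithmetic — the counts here are made up, but `scipy.stats.chisquare` is the real function:

```python
from scipy import stats

# Hypothetical M&M colour counts vs a claimed uniform distribution
observed = [25, 30, 20, 15, 10]
n = sum(observed)                # 100 sweets in the bag
expected = [0.2 * n] * 5         # manufacturer claims 20% per colour

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
df = len(observed) - 1           # k - 1 = 4
```

Here Σ(O − E)²/E works out to 12.5 on 4 df, which is significant at α = 0.05.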
Chi-square test for independence
Tests whether two categorical variables are related.
- H₀: independent.
- H₁: associated.
Data: a contingency table. Expected count for each cell under independence:

E = (row total × column total) / N
df = (r − 1)(c − 1). For 2×2: df = 1. For 2×3: df = 2.
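In SciPy the whole independence test is one call — the 2×3 table below is invented, but note how `chi2_contingency` hands back the expected counts and df for you:

```python
from scipy import stats

# Hypothetical 2x3 contingency table: condition (rows) x response (cols)
table = [[30, 10, 20],
         [20, 25, 15]]

chi2, p, df, expected = stats.chi2_contingency(table)
# df = (2 - 1) * (3 - 1) = 2
# expected[0][0] = (row total 60 * column total 50) / 120 = 25.0
```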
Effect size for χ²
- Phi (φ) for 2×2: φ = √(χ² / n).
- Cramér's V for larger tables: V = √(χ² / (n · (min(r, c) − 1))).
Both range 0 to 1. Standard report: *"χ²(2, n = 150) = 8.23, p < 0.05, V = 0.23 (moderate)."*
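A short sketch (invented 2×2 counts) showing both effect sizes from the same χ² — for a 2×2 table, min(r, c) − 1 = 1, so Cramér's V collapses to φ:

```python
import math
from scipy import stats

table = [[30, 10],
         [20, 40]]                       # hypothetical 2x2 counts
chi2, p, df, _ = stats.chi2_contingency(table, correction=False)
n = sum(sum(row) for row in table)       # total count = 100

phi = math.sqrt(chi2 / n)                # phi coefficient (2x2 only)
k = min(len(table), len(table[0]))       # min(rows, cols) = 2
v = math.sqrt(chi2 / (n * (k - 1)))      # Cramér's V; equals phi for 2x2
```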
Limitations of chi-square — exam staples
1. Each observation must fall in exactly one cell. Between-subjects only; for paired designs use McNemar.
2. Only frequencies — not means, percentages, or ratios.
3. Each cell should have an expected count ≥ 5. Below that, use Fisher's exact test (for 2×2) or combine categories.
4. A significant χ² indicates an association but not its strength — report an effect size (φ or Cramér's V).
The non-parametric pairings
Mann-Whitney U — non-parametric independent t
Rank all observations across both groups; sum the ranks per group; then:

U₁ = n₁n₂ + n₁(n₁ + 1)/2 − R₁

(compute U₂ the same way and report the smaller of the two as U).
Tests whether one group tends to have stochastically larger values. Use when comparing two independent groups on ordinal data, non-Normal continuous data, or small samples with outliers.
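The ranking and the U arithmetic are handled by `scipy.stats.mannwhitneyu`; the ratings below are made up:

```python
from scipy import stats

# Hypothetical ordinal ratings from two independent groups
group_a = [3, 5, 4, 6, 7, 5]
group_b = [2, 3, 1, 4, 2, 3]

u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
# U is bounded by n1 * n2 = 36; group A's ratings are stochastically larger
```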
Wilcoxon signed-rank — non-parametric paired t
For each pair, compute the difference; rank the *absolute* differences; sum the ranks of positive differences → W. Tests whether differences are symmetrically distributed around zero. Use for paired or matched samples with non-Normal differences.
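Again one SciPy call does the differencing and ranking — the before/after scores here are invented:

```python
from scipy import stats

# Hypothetical paired scores: same subjects before and after training
before = [10, 12, 9, 14, 11, 13, 10, 12]
after  = [12, 15, 10, 17, 12, 16, 13, 14]

w, p = stats.wilcoxon(before, after)
# Every subject improved, so the less frequent sign has rank sum W = 0
```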
Kruskal-Wallis — non-parametric one-way ANOVA
Rank all observations across all groups; compute H from rank sums. Tests whether at least one group has stochastically larger/smaller values. Follow with Dunn's post-hoc with multiple-comparison correction.
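A minimal Kruskal-Wallis sketch with invented data — `scipy.stats.kruskal` takes one argument per group, like its ANOVA counterpart `f_oneway`:

```python
from scipy import stats

# Hypothetical scores from three independent groups
g1 = [1, 2, 3, 4, 5]
g2 = [2, 3, 4, 5, 6]
g3 = [7, 8, 9, 10, 11]      # clearly shifted upward

h, p = stats.kruskal(g1, g2, g3)
# H is compared to chi-square with k - 1 = 2 degrees of freedom
```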
Friedman's test — non-parametric RM-ANOVA
Rank within each subject across the k conditions; test whether rank sums differ across conditions.
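In SciPy this is `friedmanchisquare`, one argument per condition (rows are matched by position; the scores below are invented):

```python
from scipy import stats

# Hypothetical: six subjects, each measured under three conditions
cond1 = [4, 5, 3, 4, 5, 4]
cond2 = [6, 7, 5, 6, 7, 6]
cond3 = [8, 9, 7, 8, 9, 8]   # every subject ranks cond1 < cond2 < cond3

stat, p = stats.friedmanchisquare(cond1, cond2, cond3)
# Perfect within-subject ordering gives the maximum statistic for n=6, k=3
```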
McNemar's test — paired binary 2×2
Same subjects measured on two binary outcomes. Compares only the discordant cells:

χ² = (b − c)² / (b + c)

where b and c are the off-diagonal counts. df = 1.
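The statistic is simple enough to compute by hand; here is a sketch with invented discordant counts, using SciPy only for the χ² tail probability:

```python
from scipy import stats

# Hypothetical paired binary outcomes; b and c are the discordant cells
b, c = 5, 15                      # subjects who changed in opposite directions

chi2 = (b - c) ** 2 / (b + c)     # McNemar statistic = 100 / 20 = 5.0
p = stats.chi2.sf(chi2, df=1)     # upper tail of chi-square with df = 1
```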
Fisher's exact test
For 2×2 contingency with small expected counts (< 5). Uses the hypergeometric distribution to compute exact p-values. No asymptotic approximation. The right tool when χ² assumptions fail.
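SciPy exposes this as `fisher_exact`; the small-count table below is made up:

```python
from scipy import stats

# Hypothetical 2x2 table with expected counts too small for chi-square
table = [[8, 2],
         [1, 9]]

odds_ratio, p = stats.fisher_exact(table)
# p is exact (hypergeometric), no large-sample approximation involved
```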
Binomial sign test
Simplest paired test: count signs of differences; test against H₀: P(+) = 0.5. Useful when differences aren't even rank-comparable.
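Because the sign test is just a binomial test on the signs, `scipy.stats.binomtest` covers it (the 9-positive-out-of-10 split is invented):

```python
from scipy import stats

# Hypothetical: 10 paired differences, 9 positive and 1 negative
result = stats.binomtest(k=9, n=10, p=0.5)
# Two-sided test of H0: P(+) = 0.5
```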
Spearman ρ
Pearson r on ranks. Covered in Unit 6. Non-parametric correlation; robust to outliers; captures monotone associations.
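A tiny sketch (invented data) of the outlier robustness: an extreme x value cannot disturb the ranks, so Spearman stays at 1 while Pearson drops:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 100]    # extreme outlier in x
y = [2, 4, 6, 8, 10, 12]    # perfectly monotone in x

rho, _ = stats.spearmanr(x, y)        # ranks are identical: rho = 1.0
pearson_r, _ = stats.pearsonr(x, y)   # the outlier breaks linearity: r < 1
```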
When to drop to non-parametric
- Severely non-Normal residuals + small n.
- Ordinal DV.
- Unfixable outliers.
- Rank-based research hypothesis ("does one group tend to rank higher?").
- Likert data with few scale points and unequal spacing.
Trade-off: less power when parametric assumptions hold. Don't drop unnecessarily.
Reporting non-parametric results
State the test, the test statistic (U, W, H, χ², τ, ρ), the p-value, and the effect size.
*"Mann-Whitney U = 40, p = 0.03, r = Z/√N ≈ 0.45 (moderate effect)."*
For rank tests, the effect size is typically r = Z/√N, where Z is the standardised test statistic and N the total sample size.
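A sketch of that conversion for a Mann-Whitney U (made-up numbers; the normal approximation below assumes no ties):

```python
import math

# Hypothetical: convert a Mann-Whitney U to r = Z / sqrt(N)
n1, n2, u = 8, 8, 10
mu = n1 * n2 / 2                                    # E[U] under H0 = 32
sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)     # SD of U (no ties)
z = (u - mu) / sigma                                # standardised statistic
r = abs(z) / math.sqrt(n1 + n2)                     # effect size, 0 to 1
```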
What you carry into the exam
- Decision tree mapping IV/DV scale × design to test.
- χ² goodness-of-fit (df = k − 1) vs χ² independence (df = (r−1)(c−1)).
- Phi / Cramér's V for χ² effect sizes.
- Mann-Whitney = independent t non-parametric. Wilcoxon signed-rank = paired t non-parametric.
- Kruskal-Wallis = one-way ANOVA non-parametric. Friedman = RM-ANOVA non-parametric.
- McNemar = paired χ² 2×2; Fisher's exact for small counts.
- Limitations of χ² — independence, frequencies only, E ≥ 5, no strength info.
- Power trade-off — non-parametric less powerful when parametric assumptions hold.
When you're ready, send "next" and we'll move into multicollinearity, PCA, and factor analysis — when predictors correlate, how to detect it, and how to reduce dimensions.