Partition, F-test, Sphericity, Post-hoc
Maya, Three Therapies, and the F-test that Saved Her Mental Health
It's 11pm on a Tuesday. Maya has three columns of numbers staring at her in a spreadsheet — anxiety scores after 8 weeks of treatment. Group C got counselling. Group M got medication. Group B got both. The means are 26, 24.4, 22.6 (lower = less anxious). Her advisor wants the analysis by Friday.
Her first instinct, the one every undergrad has, is to run three t-tests. C vs M. C vs B. M vs B. Pick the smallest p, declare victory. She types up the first comparison, then stops cold. She remembers Session 7 on multiple comparisons.
*"Three tests at α = 0.05 each. What's the probability of at least one false positive under all-true-nulls?"* she mutters.
She does the arithmetic on a Post-it: 1 − (1 − 0.05)³ ≈ 0.143, so 14.3%. Almost triple the nominal α. And four groups would be 6 tests, 1 − 0.95⁶ ≈ 26.5%. Five groups would be 10 tests, 1 − 0.95¹⁰ ≈ 40.1%. The error rate explodes.
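The Post-it arithmetic, scripted (a throwaway sketch; nothing here beyond Maya's α and the group counts):

```python
# Probability of at least one false positive across m independent tests,
# each run at alpha = 0.05
alpha = 0.05

for k_groups in (3, 4, 5):
    m = k_groups * (k_groups - 1) // 2    # pairwise t-tests among k groups
    fwer = 1 - (1 - alpha) ** m
    print(f"{k_groups} groups -> {m} tests, FWER = {fwer:.1%}")
```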
She remembers Bonferroni — divide α by m, compare each p to α/3 = 0.0167. It works, but it's blunt. There has to be a way to do all the comparisons at once with one α.
Of course there is. Her notes from yesterday's lecture have the answer underlined in red: One-way ANOVA.
---
The Trick — Stop Comparing Means, Start Comparing Variances
The name confused Maya at first. "Analysis of *Variance*"? But she's trying to compare *means*. How does variance help?
She walks through the logic with the cold focus of someone who's about to make sense of something fundamental.
*"If the treatments work, the three group means scatter widely around the grand mean. If they don't, they cluster tightly around it."*
So she can detect mean differences by looking at between-group variance — how much group means scatter around the grand mean. That's the signal.
The catch: there's always noise. Within each group, individuals vary. People in the counselling-only group don't all end up at exactly 26. Some end at 22, some at 30, naturally. That's within-group variance — the noise floor.
The brilliant move is to express the question as a ratio:

F = MSB / MSW  (between-group variance over within-group variance)

If treatments do nothing, between-group variance and within-group variance are both just estimates of the same underlying error variance, so F ≈ 1. If treatments work, MSB grows while MSW stays roughly the same — F > 1. The signal-to-noise ratio. That's all F is.
*"It's a t-test that scales,"* she writes in her margin. *"t compares 2 means using their difference / SE. F compares k means using their scatter / within-group scatter."*
---
The Partition That Holds Everything Together
The next page of her notes makes her smile. It has one equation circled three times:
SS_Total = SS_Between + SS_Within

Total variability splits *exactly* — no remainder — into between-group and within-group pieces. She tries to derive it herself and gets stuck at the cross term until she remembers that deviations from each group's own mean sum to zero within that group by construction. The cross-product collapses. The partition is exact.
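The identity is easy to check numerically; a minimal sketch on simulated groups (hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(1)
groups = [rng.normal(mu, 4, 30) for mu in (26, 24.4, 22.6)]

scores = np.concatenate(groups)
grand_mean = scores.mean()

ss_total = ((scores - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# The cross term vanishes because deviations from each group's own mean
# sum to zero inside that group, so the split is exact.
assert np.isclose(ss_total, ss_between + ss_within)
print(ss_total, ss_between + ss_within)
```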
She computes the ANOVA table for her data:
| Source | SS | df | MS |
|---|---|---|---|
| Between (treatment) | 173 | 2 | 86.5 |
| Within (error) | 1631 | 87 | 18.75 |
| Total | 1804 | 89 | — |
She looks up the critical F for df = (2, 87) at α = 0.05: about 3.10. Her F = 4.61 > 3.10. p = 0.013.
Reject H₀. The treatments are not all equal.
She types into her notebook: "F(2, 87) = 4.61, p = .013."
Then stops. Adds, "η² = 173/1804 = 0.096." That's the proportion of total variance explained by treatment. About 10%. Medium effect by Cohen's bands (.06–.14). She remembers: *always report effect size with F*. Half her cohort will forget. She won't.
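Turning the SS table into F, p, and η² takes only a few lines; here's a sketch that leans on scipy for the p-value:

```python
from scipy import stats

ss_between, df_between = 173, 2
ss_within, df_within = 1631, 87

ms_between = ss_between / df_between            # 86.5
ms_within = ss_within / df_within               # ~18.75
f_stat = ms_between / ms_within                 # ~4.61

p_value = stats.f.sf(f_stat, df_between, df_within)    # upper tail of F(2, 87)
eta_sq = ss_between / (ss_between + ss_within)          # ~0.096

print(f"F({df_between}, {df_within}) = {f_stat:.2f}, "
      f"p = {p_value:.3f}, eta^2 = {eta_sq:.3f}")
```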
---
Which Pair Differs? Enter the Post-hocs
The F-test told her that *somewhere* the means differ. It didn't tell her where. That's by design — F is omnibus.
She runs Tukey HSD, the standard post-hoc for equal-n one-way ANOVA. It controls the family-wise error rate across all pairwise comparisons simultaneously, using the studentized range distribution:

HSD = q(α, k, df_W) × √(MSW / n)

For α = 0.05, k = 3, df_W = 87, the q from a table is about 3.37. So:

HSD = 3.37 × √(18.75 / 30) ≈ 2.66

Any pairwise mean difference exceeding 2.66 is significant.
| Pair | Diff | Sig? |
|---|---|---|
| Counselling − Both | 26 − 22.6 = 3.4 | ✓ p = .008 |
| Counselling − Meds | 26 − 24.4 = 1.6 | ✗ p = .12 |
| Meds − Both | 24.4 − 22.6 = 1.8 | barely ✓ p = .047 |
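The table lookup for q can also be scripted; recent SciPy versions ship the studentized range distribution (a sketch, and the exact call is an assumption worth verifying against your SciPy version):

```python
import numpy as np
from scipy import stats

k, df_w, n_per_group, ms_within = 3, 87, 30, 18.75

# Critical value of the studentized range, q(0.05; k, df_w), ~3.37 here
q_crit = stats.studentized_range.ppf(0.95, k, df_w)

hsd = q_crit * np.sqrt(ms_within / n_per_group)   # ~2.66
print(q_crit, hsd)
```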
*"Combination beats counselling alone. Meds alone aren't different from counselling. Combination edges out meds, barely."*
Useful, clean, defensible. She writes the conclusion:
Using a one-way ANOVA, we observed a significant effect of treatment on anxiety scores at week 8, F(2, 87) = 4.61, p = .013, η² = .10. Tukey HSD post-hoc tests revealed that the combination treatment resulted in lower anxiety than counselling alone (p = .008) and, marginally, than medication alone (p = .047). Medication and counselling alone did not differ significantly.
---
Maya Asks: What if the Same Person Tried All Three?
Then a question hits her — what if instead of three groups, she had measured the *same 30 people* under all three treatments (with a long washout)? Same n, but a different design.
That's repeated-measures ANOVA. And it would be more powerful.
Why? Because between-subjects variance — the fact that some people are just generally more anxious than others, regardless of treatment — can now be *pulled out* of the error term. The partition gets finer:
SS_Total = SS_Treatment + SS_Subjects + SS_Error

Then F = MS_Treatment / MS_Error — with MS_Error being a *smaller* number than MSW would be in a between-subjects design (because subject variance has been subtracted out). Same effect, smaller denominator, bigger F, more power.
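A minimal sketch of that finer partition, assuming hypothetical wide-format data with one row per subject and one column per condition:

```python
import numpy as np

rng = np.random.default_rng(2)

# 30 subjects (rows) x 3 conditions (columns): each subject has their own
# baseline anxiety plus a real condition effect and some noise
baseline = rng.normal(25, 3, size=(30, 1))
scores = baseline + rng.normal(0, 2, size=(30, 3)) + np.array([1.5, 0.0, -1.5])

n, k = scores.shape
grand = scores.mean()

ss_conditions = n * ((scores.mean(axis=0) - grand) ** 2).sum()
ss_subjects = k * ((scores.mean(axis=1) - grand) ** 2).sum()   # pulled out of the error
ss_error = ((scores - grand) ** 2).sum() - ss_conditions - ss_subjects

f_rm = (ss_conditions / (k - 1)) / (ss_error / ((k - 1) * (n - 1)))
print(f"F({k - 1}, {(k - 1) * (n - 1)}) = {f_rm:.2f}")
```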
But there's a price: a new assumption called sphericity. The variances of *pairwise differences* between conditions must be equal across all pairs. Variance(C − M) ≈ Variance(C − B) ≈ Variance(M − B). Maya squints at this — it's like homogeneity of variance, but for paired data.
Mauchly's test of sphericity — H₀: sphericity holds. p < 0.05 means it doesn't.
If sphericity is violated, you don't switch tests. You apply a correction:
- Greenhouse-Geisser (more conservative; ε < 0.75)
- Huynh-Feldt (less conservative; ε > 0.75)
Both adjust the df by multiplying them by ε (a number between 1/(k−1) and 1). F stays the same; with fewer df the reference distribution has heavier tails and a larger critical value, so p grows a little.
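Applying the correction is just a df adjustment before the p-value lookup; a sketch with a made-up ε:

```python
from scipy import stats

f_stat, k, n = 4.61, 3, 30      # illustrative numbers, not a real sphericity check
epsilon = 0.82                  # e.g. a Greenhouse-Geisser estimate

p_plain = stats.f.sf(f_stat, k - 1, (k - 1) * (n - 1))
p_corrected = stats.f.sf(f_stat, epsilon * (k - 1), epsilon * (k - 1) * (n - 1))

print(p_plain, p_corrected)     # same F, smaller df, slightly larger p
```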
*"So sphericity violations don't kill RM-ANOVA. They just make it more conservative. Good to know."*
---
A Tangle of Cousins
She finishes her main analysis but reads ahead. The session has *eight more ANOVA flavors* she'll need by Friday. She makes a table:
| Design | DVs | IVs | Between/Within | Test |
|---|---|---|---|---|
| 3+ groups | 1 | 1 | Between | One-way ANOVA |
| 3+ groups | 1 | 1 | Within | Repeated-measures ANOVA |
| Non-normal | 1 | 1 | Between | Kruskal-Wallis |
| Non-normal | 1 | 1 | Within | Friedman |
| Unequal variances | 1 | 1 | Between | Welch ANOVA |
| With covariate | 1 | 1 (+ covariate) | Between | ANCOVA |
| 2+ IVs | 1 | 2+ | Between | Factorial ANOVA |
| 2+ DVs | 2+ | 1+ | Between | MANOVA |
| Both kinds of IVs | 1 | 2+ | Mixed | Mixed ANOVA |
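Several of these have one-line equivalents in scipy.stats; a quick sketch with hypothetical group arrays:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
c, m, b = (rng.normal(mu, 4, 30) for mu in (26, 24.4, 22.6))

print(stats.f_oneway(c, m, b))            # one-way ANOVA, between-subjects
print(stats.kruskal(c, m, b))             # Kruskal-Wallis, non-normal, between
print(stats.friedmanchisquare(c, m, b))   # Friedman, non-normal, within (repeated measures)
```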
Three things she circles in red because they're the most likely exam questions:
(1) Factorial ANOVA interactions. Two IVs → main effect A, main effect B, *interaction A × B*. The interaction asks: *does the effect of A depend on B?* Non-parallel lines in the interaction plot = interaction. She remembers the example from class: caffeine helps in the morning, hurts in the afternoon (disrupts sleep). Crossing lines. The interaction is usually the most interesting part of the analysis — and it qualifies any main effect interpretation.
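A sketch of what that factorial model might look like with statsmodels' formula interface; the data frame and column names here are hypothetical stand-ins for treatment and patient age group:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)

# Hypothetical long-format data: one row per patient
df = pd.DataFrame({
    "treatment": np.repeat(["C", "M", "B"], 30),
    "age_group": np.tile(np.repeat(["young", "older"], 15), 3),
    "anxiety": rng.normal(24, 4, 90),
})

# Main effects of treatment and age_group, plus the treatment x age_group interaction
model = smf.ols("anxiety ~ C(treatment) * C(age_group)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```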
(2) MANOVA vs RM-ANOVA — the distinction the exam loves. RM = *same* DV measured many times on the same people (sphericity matters). MANOVA = *different* DVs, each measured once, at a single time point (homogeneity of covariance matrices matters). Maya scribbles a mnemonic: "RM = same thing many times. MANOVA = different things once." She underlines it twice.
(3) ANCOVA — the power-booster. ANOVA + a continuous covariate that confounds the IV-DV link. Maya thinks of it like this: imagine testing time of day on reaction time, but the AM group happened to sleep more last night. The IV effect is contaminated by sleep. ANCOVA partials sleep out *first*, then runs ANOVA on the residualised RT. Result: the IV effect *net of* the covariate. Bonus: the error term shrinks (because some variance has been explained by the covariate), so power goes up.
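A matching ANCOVA sketch in the same formula style, with a hypothetical sleep covariate added so the time-of-day test comes out net of sleep:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)

df = pd.DataFrame({
    "time_of_day": np.repeat(["AM", "PM"], 40),
    "sleep_hours": rng.normal(7, 1, 80),
})
# Reaction time driven mostly by sleep; any AM/PM effect is confounded with it
df["reaction_time"] = 300 - 10 * df["sleep_hours"] + rng.normal(0, 15, 80)

# The covariate enters the model, so the time_of_day test is net of sleep
model = smf.ols("reaction_time ~ sleep_hours + C(time_of_day)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```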
*"ANCOVA gives you a cleaner question and more power. Two birds, one model."*
---
What Maya Got Out of This
By 1am she has a full draft. The methods section reads like she actually knows what she's doing — because she does now. F-ratio. SS partition. Effect size. Post-hocs with FWER control. Sphericity check. She has a hierarchy of ANOVA flavors mapped to designs, and she knows which one she'd reach for if the data shifted.
She thinks about how F is really just the t-test grown up. With two groups, F = t². Same idea: signal over noise. ANOVA is the language for when 'signal' is more than one comparison and 'noise' is everything within groups.
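That F = t² identity is easy to verify on any two-group sample (a throwaway sketch with simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
a, b = rng.normal(26, 4, 30), rng.normal(23, 4, 30)

t = stats.ttest_ind(a, b).statistic    # pooled-variance two-sample t
f = stats.f_oneway(a, b).statistic     # one-way ANOVA with just two groups

print(t ** 2, f)                       # identical up to float error
```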
She closes the laptop. Tomorrow she'll do the factorial — they want to see whether the effect of treatment depends on patient age group. That's another full evening. But she has a hunch about what she'll find: an interaction. And now she knows how to read it.
*"F just answers the omnibus. Tukey tells you where. Interactions tell you whether the story even makes sense. And η² tells you whether to care."*
She turns off the desk lamp. The therapies *do* work, mostly. Especially in combination. And so does ANOVA.