Partition, F-test, Sphericity, Post-hoc
Maya, Three Therapies, and the F-test that Saved Her Mental Health
It's 11pm on a Tuesday. Maya has three columns of numbers staring at her in a spreadsheet — anxiety scores after 8 weeks of treatment. Group C got counselling. Group M got medication. Group B got both. The means are 26, 24.4, 22.6 (lower = less anxious). Her advisor wants the analysis by Friday.
Her first instinct, the one every undergrad has, is to run three t-tests. C vs M. C vs B. M vs B. Pick the smallest p, declare victory. She types up the first comparison, then stops cold. She remembers Session 7 on multiple comparisons.
*"Three tests at α = 0.05 each. What's the probability of at least one false positive under all-true-nulls?"* she mutters.
She does the arithmetic on a Post-it: 1 − (1 − 0.05)³ ≈ 0.143, so 14.3%. Almost triple the nominal α. And four groups would be 6 tests, 1 − 0.95⁶ ≈ 26.5%. Five groups would be 10 tests, 1 − 0.95¹⁰ ≈ 40.1%. The error rate explodes.
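The Post-it arithmetic, scripted (a throwaway sketch; nothing here beyond Maya's α and the group counts):

```python
# Probability of at least one false positive across m independent tests,
# each run at alpha = 0.05
alpha = 0.05

for k_groups in (3, 4, 5):
    m = k_groups * (k_groups - 1) // 2    # pairwise t-tests among k groups
    fwer = 1 - (1 - alpha) ** m
    print(f"{k_groups} groups -> {m} tests, FWER = {fwer:.1%}")
```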
She remembers Bonferroni — divide α by m, compare each p to α/3 = 0.0167. It works, but it's blunt. There has to be a way to do all the comparisons at once with one α.
Of course there is. Her notes from yesterday's lecture have the answer underlined in red: One-way ANOVA.
---
The Trick — Stop Comparing Means, Start Comparing Variances
The name confused Maya at first. "Analysis of *Variance*"? But she's trying to compare *means*. How does variance help?
She walks through the logic with the cold focus of someone who's about to make sense of something fundamental.
*"If the treatments work, the three group means scatter widely around the grand mean. If they don't, they cluster tightly around it."*
So she can detect mean differences by looking at between-group variance — how much group means scatter around the grand mean. That's the signal.
The catch: there's always noise. Within each group, individuals vary. People in the counselling-only group don't all end up at exactly 26. Some end at 22, some at 30, naturally. That's within-group variance — the noise floor.
The brilliant move is to express the question as a ratio:

F = MSB / MSW  (between-group variance over within-group variance)

If treatments do nothing, between-group variance and within-group variance are both just estimates of the same underlying error variance, so F ≈ 1. If treatments work, MSB grows while MSW stays roughly the same — F > 1. The signal-to-noise ratio. That's all F is.
*"It's a t-test that scales,"* she writes in her margin. *"t compares 2 means using their difference / SE. F compares k means using their scatter / within-group scatter."*
---
The Partition That Holds Everything Together
The next page of her notes makes her smile. It has one equation circled three times:
SS_Total = SS_Between + SS_Within

Total variability splits *exactly* — no remainder — into between-group and within-group pieces. She tries to derive it herself and gets stuck at the cross term until she remembers that deviations from each group's own mean sum to zero within that group by construction. The cross-product collapses. The partition is exact.
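The identity is easy to check numerically; a minimal sketch on simulated groups (hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(1)
groups = [rng.normal(mu, 4, 30) for mu in (26, 24.4, 22.6)]

scores = np.concatenate(groups)
grand_mean = scores.mean()

ss_total = ((scores - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# The cross term vanishes because deviations from each group's own mean
# sum to zero inside that group, so the split is exact.
assert np.isclose(ss_total, ss_between + ss_within)
print(ss_total, ss_between + ss_within)
```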
She computes the ANOVA table for her data:
| Source | SS | df | MS |
|---|---|---|---|
| Between (treatment) | 173 | 2 | 86.5 |
| Within (error) | 1631 | 87 | 18.75 |
| Total | 1804 | 89 | — |
She looks up the critical F for df = (2, 87) at α = 0.05: about 3.10. Her F = 4.61 > 3.10. p = 0.013.
Reject H₀. The treatments are not all equal.
She types into her notebook: "F(2, 87) = 4.61, p = .013."
Then stops. Adds, "η² = 173/1804 = 0.096." That's the proportion of total variance explained by treatment. About 10%. Medium effect by Cohen's bands (.06–.14). She remembers: *always report effect size with F*. Half her cohort will forget. She won't.
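Turning the SS table into F, p, and η² takes only a few lines; here's a sketch that leans on scipy for the p-value:

```python
from scipy import stats

ss_between, df_between = 173, 2
ss_within, df_within = 1631, 87

ms_between = ss_between / df_between            # 86.5
ms_within = ss_within / df_within               # ~18.75
f_stat = ms_between / ms_within                 # ~4.61

p_value = stats.f.sf(f_stat, df_between, df_within)    # upper tail of F(2, 87)
eta_sq = ss_between / (ss_between + ss_within)          # ~0.096

print(f"F({df_between}, {df_within}) = {f_stat:.2f}, "
      f"p = {p_value:.3f}, eta^2 = {eta_sq:.3f}")
```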
---
Which Pair Differs? Enter the Post-hocs
The F-test told her that *somewhere* the means differ. It didn't tell her where. That's by design — F is omnibus.
She runs Tukey HSD, the standard post-hoc for equal-n one-way ANOVA. It controls the family-wise error rate across all pairwise comparisons simultaneously, using the studentized range distribution:

HSD = q(α, k, df_W) × √(MSW / n)

For α = 0.05, k = 3, df_W = 87, the q from a table is about 3.37. So:

HSD = 3.37 × √(18.75 / 30) ≈ 2.66

Any pairwise mean difference exceeding 2.66 is significant.
| Pair | Diff | Sig? |
|---|---|---|
| Counselling − Both | 26 − 22.6 = 3.4 | ✓ p = .008 |
| Counselling − Meds | 26 − 24.4 = 1.6 | ✗ p = .12 |
| Meds − Both | 24.4 − 22.6 = 1.8 | barely ✓ p = .047 |
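The table lookup for q can also be scripted; recent SciPy versions ship the studentized range distribution (a sketch, and the exact call is an assumption worth verifying against your SciPy version):

```python
import numpy as np
from scipy import stats

k, df_w, n_per_group, ms_within = 3, 87, 30, 18.75

# Critical value of the studentized range, q(0.05; k, df_w), ~3.37 here
q_crit = stats.studentized_range.ppf(0.95, k, df_w)

hsd = q_crit * np.sqrt(ms_within / n_per_group)   # ~2.66
print(q_crit, hsd)
```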
*"Combination beats counselling alone. Meds alone aren't different from counselling. Combination edges out meds, barely."*
Useful, clean, defensible. She writes the conclusion:
Using a one-way ANOVA, we observed a significant effect of treatment on anxiety scores at week 8, F(2, 87) = 4.61, p = .013, η² = .10. Tukey HSD post-hoc tests revealed that the combination treatment resulted in lower anxiety than counselling alone (p = .008) and, marginally, than medication alone (p = .047). Medication and counselling alone did not differ significantly.
---
Maya Asks: What if the Same Person Tried All Three?
Then a question hits her — what if instead of three groups, she had measured the *same 30 people* under all three treatments (with a long washout)? Same n, but a different design.
That's repeated-measures ANOVA. And it would be more powerful.
Why? Because between-subjects variance — the fact that some people are just generally more anxious than others, regardless of treatment — can now be *pulled out* of the error term. The partition gets finer:
SS_Total = SS_Treatment + SS_Subjects + SS_Error

Then F = MS_Treatment / MS_Error — with MS_Error being a *smaller* number than MSW would be in a between-subjects design (because subject variance has been subtracted out). Same effect, smaller denominator, bigger F, more power.
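A minimal sketch of that finer partition, assuming hypothetical wide-format data with one row per subject and one column per condition:

```python
import numpy as np

rng = np.random.default_rng(2)

# 30 subjects (rows) x 3 conditions (columns): each subject has their own
# baseline anxiety plus a real condition effect and some noise
baseline = rng.normal(25, 3, size=(30, 1))
scores = baseline + rng.normal(0, 2, size=(30, 3)) + np.array([1.5, 0.0, -1.5])

n, k = scores.shape
grand = scores.mean()

ss_conditions = n * ((scores.mean(axis=0) - grand) ** 2).sum()
ss_subjects = k * ((scores.mean(axis=1) - grand) ** 2).sum()   # pulled out of the error
ss_error = ((scores - grand) ** 2).sum() - ss_conditions - ss_subjects

f_rm = (ss_conditions / (k - 1)) / (ss_error / ((k - 1) * (n - 1)))
print(f"F({k - 1}, {(k - 1) * (n - 1)}) = {f_rm:.2f}")
```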
But there's a price: a new assumption called sphericity. The variances of *pairwise differences* between conditions must be equal across all pairs. Variance(C − M) ≈ Variance(C − B) ≈ Variance(M − B). Maya squints at this — it's like homogeneity of variance, but for paired data.
Mauchly's test of sphericity — H₀: sphericity holds. p < 0.05 means it doesn't.
If sphericity is violated, you don't switch tests. You apply a correction:
- Greenhouse-Geisser (more conservative; ε < 0.75)
- Huynh-Feldt (less conservative; ε > 0.75)
Both adjust the df by multiplying them by ε (a number between 1/(k−1) and 1). F stays the same; with fewer df the reference distribution has heavier tails and a larger critical value, so p grows a little.
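Applying the correction is just a df adjustment before the p-value lookup; a sketch with a made-up ε:

```python
from scipy import stats

f_stat, k, n = 4.61, 3, 30      # illustrative numbers, not a real sphericity check
epsilon = 0.82                  # e.g. a Greenhouse-Geisser estimate

p_plain = stats.f.sf(f_stat, k - 1, (k - 1) * (n - 1))
p_corrected = stats.f.sf(f_stat, epsilon * (k - 1), epsilon * (k - 1) * (n - 1))

print(p_plain, p_corrected)     # same F, smaller df, slightly larger p
```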
*"So sphericity violations don't kill RM-ANOVA. They just make it more conservative. Good to know."*
---
A Tangle of Cousins
She finishes her main analysis but reads ahead. The session has *eight more ANOVA flavors* she'll need by Friday. She makes a table:
| Design | DVs | IVs | Between/Within | Test |
|---|---|---|---|---|
| 3+ groups | 1 | 1 | Between | One-way ANOVA |
| 3+ groups | 1 | 1 | Within | Repeated-measures ANOVA |
| Non-normal | 1 | 1 | Between | Kruskal-Wallis |
| Non-normal | 1 | 1 | Within | Friedman |
| Unequal variances | 1 | 1 | Between | Welch ANOVA |
| With covariate | 1 | 1 (+ covariate) | Between | ANCOVA |
| 2+ IVs | 1 | 2+ | Between | Factorial ANOVA |
| 2+ DVs | 2+ | 1+ | Between | MANOVA |
| Both kinds of IVs | 1 | 2+ | Mixed | Mixed ANOVA |
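Several of these have one-line equivalents in scipy.stats; a quick sketch with hypothetical group arrays:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
c, m, b = (rng.normal(mu, 4, 30) for mu in (26, 24.4, 22.6))

print(stats.f_oneway(c, m, b))            # one-way ANOVA, between-subjects
print(stats.kruskal(c, m, b))             # Kruskal-Wallis, non-normal, between
print(stats.friedmanchisquare(c, m, b))   # Friedman, non-normal, within (repeated measures)
```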
Three things she circles in red because they're the most likely exam questions:
(1) Factorial ANOVA interactions. Two IVs → main effect A, main effect B, *interaction A × B*. The interaction asks: *does the effect of A depend on B?* Non-parallel lines in the interaction plot = interaction. She remembers the example from class: caffeine helps in the morning, hurts in the afternoon (disrupts sleep). Crossing lines. The interaction is usually the most interesting part of the analysis — and it qualifies any main effect interpretation.
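A sketch of what that factorial model might look like with statsmodels' formula interface; the data frame and column names here are hypothetical stand-ins for treatment and patient age group:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)

# Hypothetical long-format data: one row per patient
df = pd.DataFrame({
    "treatment": np.repeat(["C", "M", "B"], 30),
    "age_group": np.tile(np.repeat(["young", "older"], 15), 3),
    "anxiety": rng.normal(24, 4, 90),
})

# Main effects of treatment and age_group, plus the treatment x age_group interaction
model = smf.ols("anxiety ~ C(treatment) * C(age_group)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```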
(2) MANOVA vs RM-ANOVA — the distinction the exam loves. RM = *same* DV measured many times on the same people (sphericity matters). MANOVA = *different* DVs, each measured once, at a single time point (homogeneity of covariance matrices matters). Maya scribbles a mnemonic: "RM = same thing many times. MANOVA = different things once." She underlines it twice.
(3) ANCOVA — the power-booster. ANOVA + a continuous covariate that confounds the IV-DV link. Maya thinks of it like this: imagine testing time of day on reaction time, but the AM group happened to sleep more last night. The IV effect is contaminated by sleep. ANCOVA partials sleep out *first*, then runs ANOVA on the residualised RT. Result: the IV effect *net of* the covariate. Bonus: the error term shrinks (because some variance has been explained by the covariate), so power goes up.
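A matching ANCOVA sketch in the same formula style, with a hypothetical sleep covariate added so the time-of-day test comes out net of sleep:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)

df = pd.DataFrame({
    "time_of_day": np.repeat(["AM", "PM"], 40),
    "sleep_hours": rng.normal(7, 1, 80),
})
# Reaction time driven mostly by sleep; any AM/PM effect is confounded with it
df["reaction_time"] = 300 - 10 * df["sleep_hours"] + rng.normal(0, 15, 80)

# The covariate enters the model, so the time_of_day test is net of sleep
model = smf.ols("reaction_time ~ sleep_hours + C(time_of_day)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```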
*"ANCOVA gives you a cleaner question and more power. Two birds, one model."*
---
What Maya Got Out of This
By 1am she has a full draft. The methods section reads like she actually knows what she's doing — because she does now. F-ratio. SS partition. Effect size. Post-hocs with FWER control. Sphericity check. She has a hierarchy of ANOVA flavors mapped to designs, and she knows which one she'd reach for if the data shifted.
She thinks about how F is really just the t-test grown up. With two groups, F = t². Same idea: signal over noise. ANOVA is the language for when 'signal' is more than one comparison and 'noise' is everything within groups.
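That F = t² identity is easy to verify on any two-group sample (a throwaway sketch with simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
a, b = rng.normal(26, 4, 30), rng.normal(23, 4, 30)

t = stats.ttest_ind(a, b).statistic    # pooled-variance two-sample t
f = stats.f_oneway(a, b).statistic     # one-way ANOVA with just two groups

print(t ** 2, f)                       # identical up to float error
```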
She closes the laptop. Tomorrow she'll do the factorial — they want to see whether the effect of treatment depends on patient age group. That's another full evening. But she has a hunch about what she'll find: an interaction. And now she knows how to read it.
*"F just answers the omnibus. Tukey tells you where. Interactions tell you whether the story even makes sense. And η² tells you whether to care."*
She turns off the desk lamp. The therapies *do* work, mostly. Especially in combination. And so does ANOVA.