
Behavioral Research: Statistical Methods

CG3.402
Vinoo Alluri · Monsoon 2025-26 · 4 credits

Decision Tree, Confusions, Report Checklist

Unit 15 — Rapid Revision & Exam Strategy

Maya's Final Walk-Through — One Page That Holds the Whole Course

Five days before the exam. Maya is on her bed surrounded by 14 sessions of notes, three half-eaten samosas, and her cat, who is sitting on Session 11 because cats know which session matters most.

She doesn't need to relearn the material. She needs to retrieve it. The exam will give her one-line scenarios and ask the same question over and over: *which test, why, what are your assumptions, what would the result look like?* That's the whole game.

She closes her notebook and writes one sentence on a fresh page:

The exam is not about formulas. It's about whether I can run a decision tree in 30 seconds.

That sentence is the entire revision strategy. Everything she does for the next five days is in service of it.

---

The Master Decision Tree

She draws it on a single A4 page and pins it above her desk:

Question 1: How many DVs?

- One → ANOVA family.
- Multiple → MANOVA.

Question 2: What scale is the DV?

- Categorical → χ² family or logistic regression.
- Ordinal / non-normal continuous → rank-based (Mann-Whitney, Wilcoxon, Kruskal-Wallis, Friedman, Spearman).
- Continuous, parametric → t / ANOVA / regression.

Question 3: How many groups or conditions?

- 1 vs 2 vs 3+.

Question 4: Between-subjects or within-subjects?

- Independent samples vs paired / repeated.

Four questions. They funnel any scenario into one of about a dozen specific tests. Maya promises herself she'll run this tree on every exam question, even the ones that look obvious. *Especially* the ones that look obvious — those are where overconfidence kills marks.
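The four-question funnel is mechanical enough to write down as code. A minimal sketch in Python (the function name and string labels are illustrative, not course material):

```python
def choose_test(n_dvs, dv_scale, n_groups, design):
    """Toy version of the four-question decision tree.

    dv_scale: 'categorical', 'ordinal' (or non-normal continuous), 'continuous'
    design:   'between' or 'within'
    """
    if n_dvs > 1:
        return "MANOVA"
    if dv_scale == "categorical":
        return "chi-square family / logistic regression"
    if dv_scale == "ordinal":
        if n_groups == 2:
            return "Mann-Whitney U" if design == "between" else "Wilcoxon signed-rank"
        return "Kruskal-Wallis" if design == "between" else "Friedman"
    # continuous, parametric
    if n_groups == 2:
        return "Independent t" if design == "between" else "Paired t"
    return "One-way ANOVA" if design == "between" else "RM-ANOVA"
```

Running a scenario through it, e.g. `choose_test(1, "continuous", 2, "between")`, returns `"Independent t"` — the same answer the A4 page gives.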

---

The 3×4 Lookup Grid

Underneath the tree, she draws the table everyone in BRSM eventually memorises:

| | 2 indep | 2 paired | 3+ indep | 3+ repeated |
|---|---|---|---|---|
| Parametric | Indep t | Paired t | One-way ANOVA | RM-ANOVA |
| Ordinal / non-normal | Mann-Whitney U | Wilcoxon signed-rank | Kruskal-Wallis | Friedman's |
| Categorical | χ² | McNemar | χ² extended | (rare) |

Twelve cells. Twelve tests. If she can fill in any cell from memory, she can pass the test-selection section blind.

Then she adds extensions in the margin:

  • One IV + covariate → ANCOVA.
  • Two+ IVs all between → factorial ANOVA.
  • Mixed between/within → mixed ANOVA.
  • 2+ DVs → MANOVA.
  • Continuous predictor on continuous DV → Pearson r + simple regression.
  • Continuous predictors on continuous DV → multiple regression.
  • Continuous/categorical predictors on binary DV → logistic regression.
  • Continuous/categorical predictors on count DV → Poisson regression.

She remembers Session 14: *everything is GLM*. The grid is a GLM in disguise — the rows are the *distribution of Y* (Normal → parametric, ranks → non-parametric, Bernoulli → categorical/logistic). The grid isn't a list of unrelated tools; it's a structured menu of one family.
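The "everything is GLM" claim can be checked numerically: an equal-variance independent t-test is identical to a regression with a dummy-coded group predictor. A quick sketch with simulated data (the numbers below are made up for the demonstration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group = np.repeat([0, 1], 30)                  # dummy-coded condition (0/1)
y = 5 + 1.2 * group + rng.normal(0, 2, 60)    # simulated continuous DV

t_res = stats.ttest_ind(y[group == 1], y[group == 0])  # classic equal-variance t-test
r_res = stats.linregress(group, y)                      # regression on the dummy predictor

# Same model, same df (n - 2 = 58): the p-values agree to floating point.
assert np.isclose(t_res.pvalue, r_res.pvalue)
```

The regression slope is exactly the difference in group means, which is why the two tests can never disagree.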

---

The Interpretation Traps That Lose Easy Marks

Maya turns to her second page. This one is for *traps*. The exam will not just test 'pick the test.' It will test 'don't say the wrong thing about p, CI, and effect size.'

She writes wrong-vs-right pairs:

p-value. *Wrong:* "p = 0.03 means there's a 3% chance H₀ is true." *Right:* "p = 0.03 means that, assuming H₀ is true, there's a 3% chance of data this extreme or more extreme."
Confidence interval. *Wrong:* "There's a 95% chance the parameter is in this CI." *Right:* "95% of intervals constructed this way would contain the parameter across many repeated studies. We don't know if our specific interval is one of them."
Non-significant p. *Wrong:* "Failing to reject H₀ means H₀ is true." *Right:* "Failing to reject means insufficient evidence to reject. The null might still be false; we just couldn't detect the effect."
Correlation. *Wrong:* "A correlates with B, so A causes B." *Right:* "Correlation reveals association. Causation requires experiment, randomisation, or careful confound control."
Statistical vs practical significance. *Wrong:* "p < .001 means the effect is huge." *Right:* "Statistical significance and effect size are separate. Always report both. Huge n makes trivial effects significant."
Normality. *Wrong:* "My sample is normally distributed." *Right:* "The *sampling distribution of the mean* is approximately normal (by CLT). The raw data may or may not be."
One-tailed test post-hoc. *Wrong:* "I switched to one-tailed because my two-tailed wasn't significant." *Right:* "Switching directionality post-hoc to attain significance is p-hacking. One-tailed tests must be pre-specified."

She circles all of these in red. They appear on every BRSM paper. They are the cheap marks.
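The CI trap is the easiest one to convince yourself of by simulation: the "95%" is a property of the *procedure*, not of any single interval. A short sketch (parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mu, sigma, n, reps = 100.0, 15.0, 25, 10_000
covered = 0
for _ in range(reps):
    sample = rng.normal(true_mu, sigma, n)
    se = sample.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.975, df=n - 1)          # two-sided 95% critical value
    lo, hi = sample.mean() - t_crit * se, sample.mean() + t_crit * se
    covered += lo <= true_mu <= hi                  # does this interval catch mu?

print(covered / reps)  # close to 0.95 across many repeated studies
```

Each individual interval either contains μ or it doesn't; only the long-run proportion is 95%.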

---

The Assumption Pairings

Page three. For every parametric test, the assumptions it makes and the diagnostic that checks each. She memorises the column-pairings:

| Test | Assumption | Diagnostic |
|---|---|---|
| t / ANOVA / regression | Normality of residuals | Shapiro-Wilk, Q-Q plot |
| t / ANOVA | Homogeneity of variance | Levene's |
| RM-ANOVA | Sphericity | Mauchly's W |
| Regression | Linearity | Residual vs fitted plot |
| Regression | Homoscedasticity | Residual plot, ncvTest |
| Regression | Multicollinearity | VIF (> 5–10 is severe) |
| Regression | Outlier influence | Cook's distance (> 1) |
| MANOVA | Homogeneity of covariance | Box's M |
| χ² | Adequate expected counts | min E ≥ 5 per cell |

*"For every test I propose on the exam, I must name its assumptions and how I'd check them. The slides literally said that."*

---

The 10-Point Framework for Open-Ended Questions

The 10-mark questions are where the exam separates careful students from panicked ones. They'll give a paragraph-long scenario and ask for a complete analysis plan. Maya practises the framework on a fresh prompt:

*"60 elderly patients randomised to memory drug or placebo, with age and education recorded as control variables. Recommend a statistical analysis."*

She answers in 10 numbered points:

1. Question: Does the drug improve recall, after controlling for age and education?
2. H₀ / H₁: H₀ — no effect on recall, controlling for covariates. H₁ — drug increases recall.
3. Variables: IV — drug condition (nominal, binary, between). DV — words recalled (ratio, continuous). Covariates — age, education (interval/ratio).
4. Design: Between-subjects with two covariates.
5. Test: ANCOVA.
6. Assumptions: Normality of DV per group; homogeneity of variance; independence; linearity of the covariate-DV relationship; homogeneity of regression slopes; covariates measured pre-IV; no severe multicollinearity.
7. Diagnostics: Shapiro-Wilk per group; Levene's; covariate-DV scatterplots; check for an interaction between covariates and IV; VIF.
8. Fallback: Non-normal → Quade test or robust ANCOVA. Heterogeneous variance → Welch-style correction. Non-linear covariate → polynomial / transformation. Heterogeneous slopes → don't use ANCOVA; report stratified analyses.
9. Effect size: Partial η² for the drug effect (with 95% CI).
10. Reporting: "After adjusting for age and education, the drug group recalled significantly more words than placebo, *F*(1, 56) = …, *p* = …, partial η² = …, 95% CI [.02, .27]."

Ten points. Even if she gets the *number* wrong, she earns marks for every step. That's the system.
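The ANCOVA in step 5 is, under the GLM view, just a linear model with the group dummy and the two covariates as predictors. A bare-numpy sketch on simulated data (all coefficients below are invented for illustration; in practice you'd use a stats package):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 60
drug = np.repeat([0, 1], n // 2)                   # 0 = placebo, 1 = drug
age = rng.integers(65, 85, n).astype(float)
edu = rng.integers(8, 18, n).astype(float)
# Simulated recall with a true adjusted drug effect of +3 words
recall = 20 - 0.2 * age + 0.5 * edu + 3 * drug + rng.normal(0, 3, n)

# ANCOVA as a linear model: recall ~ intercept + drug + age + education
X = np.column_stack([np.ones(n), drug, age, edu])
beta, *_ = np.linalg.lstsq(X, recall, rcond=None)
resid = recall - X @ beta
df_resid = n - X.shape[1]                          # 60 - 4 = 56, matching F(1, 56)
mse = resid @ resid / df_resid
se = np.sqrt(mse * np.linalg.inv(X.T @ X).diagonal())
t_drug = beta[1] / se[1]                           # adjusted drug effect, t-scaled
print(beta[1], t_drug, df_resid)
```

Note where the error df in step 10 comes from: 60 participants minus one intercept, one group dummy, and two covariates leaves 56.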

---

Quick-Fire Pattern Recognition

She writes a list of 15 one-sentence scenarios with their answers and times herself. 30 seconds per question.

  • *"Are men taller than women?"* → 2 independent, continuous → indep t (or Mann-Whitney if skewed).
  • *"Same patients pre/post intervention?"* → paired t (Wilcoxon if non-normal).
  • *"Exam scores across three schools?"* → one-way ANOVA (Kruskal-Wallis).
  • *"Reaction time at 3 caffeine doses, same people?"* → RM-ANOVA, check sphericity.
  • *"Gender × political affiliation associated?"* → χ² independence.
  • *"M&M colours match advertised distribution?"* → χ² goodness-of-fit.
  • *"Education predicts income?"* → Pearson + simple regression.
  • *"Education predicts income controlling for IQ?"* → multiple regression.
  • *"Click 'buy' or not from experimental condition?"* → logistic regression.
  • *"Therapy reduces depression accounting for baseline?"* → ANCOVA.
  • *"Treatment effect depends on age?"* → factorial ANOVA (main effects + interaction).
  • *"Therapy improves *both* depression *and* anxiety?"* → MANOVA.
  • *"Session type (between) × week (within)?"* → mixed ANOVA.
  • *"Memory at age 20, 30, 40, 50 (skewed)?"* → Friedman's, then post-hoc.
  • *"Smoking and lung cancer (yes/no × yes/no)?"* → χ² 2×2.

She does 15 of them in 7 minutes. She'll do 15 more tomorrow, and 15 more the day after. By the exam she'll do them in under 4 minutes.

---

Effect Sizes That Must Always Appear

A small card she'll tape inside her exam pad cover:

| Test | Effect size | Small / Medium / Large |
|---|---|---|
| t-test | Cohen's d | 0.2 / 0.5 / 0.8 |
| ANOVA | η², partial η² | .01 / .06 / .14 |
| Correlation / Regression | r, R² | 0.1 / 0.3 / 0.5 |
| χ² (2×2) | φ | 0.1 / 0.3 / 0.5 |
| χ² (larger) | Cramér's V | depends on df |
| Logistic | Odds ratio | ~1.5 / 2.5 / 4 |
| Bayesian | Bayes Factor | 3–10 / 10–30 / >30 |

She stares at this card a lot. *Without effect size, p-value is half the story.* It's the single most repeated message of the entire course, and it's where the exam routinely tests vigilance.
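The two effect sizes she'll compute by hand on the exam reduce to one-line formulas. A minimal sketch (function names are illustrative):

```python
def cohens_d(m1, m2, sd_pooled):
    """Standardised mean difference between two groups."""
    return (m1 - m2) / sd_pooled

def eta_squared(ss_effect, ss_total):
    """Proportion of total variance explained by an effect."""
    return ss_effect / ss_total

# The self-test item: M1 = 75, M2 = 70, SD = 10
print(cohens_d(75, 70, 10))   # 0.5 — exactly Cohen's 'medium' benchmark
```

This is why the card's benchmarks matter: d = 0.5 sounds unremarkable until you can place it on the 0.2 / 0.5 / 0.8 scale.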

---

Common Confusions to Drill

The next page is a list of pairs the exam loves to test as "explain the difference between…":

  • PCA vs FA. PCA: variance compression, no error term, eigendecomposition. FA: latent constructs, communality + uniqueness, modelled error.
  • FWER vs FDR. FWER controls P(any false positive); FDR controls expected *proportion* of false positives. FWER conservative; FDR powerful.
  • Reliability vs validity. Reliability = consistency; validity = accuracy. You can be reliably wrong (a scale always 5 kg high). You can't be unreliably valid.
  • Type I vs Type II. α = false alarm (reject true null); β = miss (fail to reject false null). Power = 1 − β.
  • Confidence interval vs credible interval. CI is a frequentist procedure; credible interval is a Bayesian posterior probability range.
  • SD vs SEM. SD describes data spread. SEM describes the *mean's* sampling variability: SEM = SD / √n.
  • One-tailed vs two-tailed. One-tailed requires pre-specified direction (and gives twice the power *if* the direction is right). Two-tailed is the default.
  • Independent vs paired t. Different participants vs same participants.
  • Statistical vs practical significance. Detectability vs magnitude. Always report both.
  • Correlation vs causation. The line that should be tattooed on every researcher's wrist.

---

Maya's Exam-Day Plan

She writes it out, in order, as a sticky note for her pencil case:

1. Skim the whole paper first. Identify easy questions vs hard.
2. Easy descriptive first. Bank quick marks.
3. For open-ended: 10-point framework. Even with shaky numbers, structure earns partial credit.
4. Show work on calculations. Steps win points.
5. Define terms in your own words. Examiners want understanding, not memorised wording (unless it's the canonical phrasing for p/CI traps — those, recite verbatim).
6. State assumptions for every test. Non-negotiable.
7. If stuck, run the decision tree. Always.
8. Watch the clock. 1 mark ≈ 1 minute.
9. Recognise the four pitfalls (p, CI, statistical-vs-practical, correlation-vs-causation) in MCQs.
10. Read every question twice before writing.

---

The Self-Test Before Bed

Two nights before the exam, she runs through twenty questions without notes:

  • The four scales of measurement, with examples.
  • Reliability vs validity, one without the other.
  • FWER for 5 independent tests at α = .05.
  • Is r = 0.7, n = 10 significant? Strong?
  • CLT in one sentence.
  • Type I vs Type II.
  • FWER vs FDR.
  • Cohen's d for M₁ = 75, M₂ = 70, SD = 10.
  • 2×2 χ² calculation and df.
  • Assumptions of multiple linear regression.
  • Difference between paired and independent t.
  • When to use Mann-Whitney U vs indep t.
  • Cook's distance — what it measures, threshold.
  • R² vs adjusted R².
  • Bayes' rule — prior / likelihood / posterior / evidence.
  • Why optional stopping kills frequentist inference but not Bayesian.
  • Logistic regression's exponentiated coefficient of 2.5 — meaning?
  • Simpson's paradox with an example.
  • Three independent skewed groups — which test?
  • Power and conventional minimum.

She gets 18 of 20 right without notes. The two she missed (sphericity correction and the specific Bonferroni threshold) she fixes on the spot.
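A few of the numeric self-test items can be checked in a handful of lines. A quick Python sketch using the standard formulas (FWER for k independent tests, and the t-statistic for a correlation):

```python
import numpy as np
from scipy import stats

# FWER for 5 independent tests at alpha = .05: 1 - (1 - alpha)^k
fwer = 1 - (1 - 0.05) ** 5
print(round(fwer, 3))             # 0.226 — far above the nominal .05

# Is r = 0.7 with n = 10 significant?  t = r * sqrt(n-2) / sqrt(1 - r^2), df = n - 2
r, n = 0.7, 10
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p = 2 * stats.t.sf(abs(t), df=n - 2)
print(round(t, 2), p < 0.05)      # t ≈ 2.77, significant at .05 (but n is tiny)

# Cohen's d for M1 = 75, M2 = 70, SD = 10
print((75 - 70) / 10)             # 0.5
```

Significant, yes — but r = 0.7 at n = 10 is exactly the "statistical vs practical" trap in reverse: strong, yet estimated with huge uncertainty.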

*"Walk in calm. Read every question twice. State assumptions. Show work. Trust the framework."*

---

The Quiet Closing

Two weeks ago Maya was drowning in fragmented tools — t-tests on one page, ANOVA on another, regression a continent away. Now they fit on a single page above her desk. They aren't a kit anymore. They're a *language*.

She thinks of the umbrella, of the goalkeepers, of the three anxiety treatments, of the airline survey factor analysis. The course had a story, and she was the protagonist. Every session was a tool she'd actually use — and tomorrow she'll use them under timed pressure.

She closes the laptop. The cat is now on her pillow. The exam paper, she knows, will be a sequence of carefully chosen scenarios, and her job is to respond with disciplined thinking: decision tree first, assumptions next, effect size always, traps avoided.

*"BRSM isn't a course about p-values. It's a course about being honest with data, careful with claims, and humble about uncertainty. The exam is testing whether I learned that."*

She turns out the light. The page above the desk stays pinned. Decision tree. Lookup grid. Traps. Assumptions. 10-point framework.

Whole course. One page.