Behavioral Research: Statistical Methods

CG3.402

Vinoo Alluri • Monsoon 2025-26 • 4 credits

Formulas & Diagrams

High-ROI section — formulas improve marks, diagrams improve recall.

Formulas

z-score

—

z = \frac{X - μ}{σ}

Standardise a value: how many SDs above/below the population mean.
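
A minimal Python sketch of the standardisation above. The function name and the example numbers (a score of 130 with μ = 100, σ = 15) are illustrative assumptions, not course data.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of population SDs that x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical example: score 130 on a scale with mu = 100, sigma = 15.
z = z_score(130, 100, 15)
print(z)  # 2.0
```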

One-sample t

—

t = \frac{\bar{x} - μ_{0}}{s / \sqrt{n}}

Test sample mean against a hypothesised value when σ is unknown.
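
As a rough sketch, the statistic can be computed by hand with the standard library. The helper name and the four scores are made up for illustration; the denominator is the estimated standard error s/√n.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df, sem) for testing the sample mean against mu0."""
    n = len(sample)
    s = statistics.stdev(sample)   # sample SD, n - 1 in the denominator
    sem = s / math.sqrt(n)         # estimated standard error of the mean
    t = (statistics.mean(sample) - mu0) / sem
    return t, n - 1, sem

# Hypothetical data: four scores tested against mu0 = 6.
t_stat, df, sem = one_sample_t([5, 7, 9, 11], mu0=6)
print(round(t_stat, 3), df, round(sem, 3))  # 1.549 3 1.291
```

The p-value then comes from a t distribution with df = n − 1, which this sketch leaves to a table or a statistics package.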

Standard error of the mean

—

SEM = \frac{σ}{\sqrt{n}} (estimated by s / \sqrt{n})

How precisely the sample mean estimates μ. Shrinks in proportion to 1/√n, so quadrupling n halves the SEM.

Cohen's d

—

d = \frac{M _{1} - M _{2}}{SD _{pooled}}

Standardised mean difference. 0.2 / 0.5 / 0.8 = small / medium / large.
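
The card does not spell out SD_pooled. The sketch below assumes the usual definition for two independent groups, the square root of the df-weighted average of the two sample variances. Data are hypothetical.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Standardised mean difference using the pooled sample SD."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Assumed pooling rule: weight each variance by its degrees of freedom.
    sd_pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / sd_pooled

d = cohens_d([2, 4, 6], [1, 3, 5])
print(d)  # 0.5
```

On the benchmarks above, d = 0.5 would count as a medium effect.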

Pearson χ²

—

χ^{2} = \sum \frac{( O - E ) ^{2}}{E}

Goodness-of-fit / independence on categorical data. df=(r−1)(c−1) for independence.
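
For the independence case, expected counts are (row total × column total) / N. A small sketch on a made-up 2×3 table:

```python
def chi_square_independence(table):
    """Pearson chi-square and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical observed counts (2 rows x 3 columns).
chi2, df = chi_square_independence([[10, 20, 30], [20, 20, 20]])
print(round(chi2, 3), df)  # 5.333 2
```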

F (ANOVA)

—

F = \frac{MS _{between}}{MS _{within}}

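Ratio of between-group to within-group variance. F near 1 is what H₀ predicts (no group effect); F well above 1 suggests an effect, judged against the F distribution with (df_between, df_within).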

ANOVA SS partition

—

SS_{total} = SS_{between} + SS_{within}

Total variability splits into group differences + within-group residual.

Eta-squared (η²)

—

η^{2} = \frac{SS _{between}}{SS _{total}}

Effect size for ANOVA. 0.01 / 0.06 / 0.14 = small / medium / large.
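
The three ANOVA cards above (F, SS partition, η²) fit together in one short computation. This is a sketch for a one-way, between-subjects design on invented data; the function and key names are illustrative.

```python
import statistics

def one_way_anova(groups):
    """SS partition, F ratio and eta-squared for independent groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "F": f_ratio,
        "eta_sq": ss_between / (ss_between + ss_within),
    }

result = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(result["ss_between"], result["ss_within"], result["F"], result["eta_sq"])
# 42 6 21.0 0.875
```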

R²

—

R^{2} = 1 - \frac{SS _{res}}{SS _{tot}}

Proportion of variance in Y explained by the model. Never decreases as predictors are added, so compare models with adjusted R².
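
A sketch for simple (one-predictor) least-squares regression on made-up data. The adjusted R² formula used here, 1 − (1 − R²)(n − 1)/(n − p − 1), is the standard one and is not given on the card.

```python
def r_squared_simple(x, y):
    """R^2 and adjusted R^2 for a one-predictor least-squares line."""
    n, p = len(x), 1
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2

r2, adj_r2 = r_squared_simple([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 3), round(adj_r2, 3))  # 0.6 0.467
```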

Pearson r

—

r = \frac{\sum (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum (x_{i} - \bar{x})^{2} \sum (y_{i} - \bar{y})^{2}}}

Strength of linear association in [−1, 1]. Sensitive to outliers; assumes linearity.
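
To make the outlier sensitivity concrete, the sketch below computes r for five points on a perfect line, then again after adding one hypothetical outlier at (10, 0).

```python
import math

def pearson_r(x, y):
    """Pearson correlation from the sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r_clean = pearson_r([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
r_outlier = pearson_r([1, 2, 3, 4, 5, 10], [1, 2, 3, 4, 5, 0])
print(round(r_clean, 3), round(r_outlier, 3))  # 1.0 -0.251
```

A single point is enough to flip the sign here, which is why the scatter plot should be checked before r is reported.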

Variance Inflation Factor

—

VIF_{j} = \frac{1}{1 - R _{j}^{2}}

Severity of multicollinearity for predictor j. VIF > 5–10 is problematic.
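
R_j² is the R² from regressing predictor j on the other predictors. The sketch takes that value as given (the inputs are illustrative) and shows which R_j² values correspond to the 5 and 10 cut-offs.

```python
def vif(r2_j: float) -> float:
    """Variance inflation factor from R_j^2 (predictor j regressed on the rest)."""
    if not 0 <= r2_j < 1:
        raise ValueError("R_j^2 must lie in [0, 1)")
    return 1 / (1 - r2_j)

for r2_j in (0.0, 0.5, 0.8, 0.9):
    print(r2_j, round(vif(r2_j), 2))
# 0.0 1.0
# 0.5 2.0
# 0.8 5.0
# 0.9 10.0
```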

Bayes' rule

—

P (H ∣ D) = \frac{P ( D ∣ H ) \cdot P ( H )}{P ( D )}

Updates prior P(H) to posterior P(H|D) using likelihood and evidence.
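
A sketch with a binary hypothesis, where P(D) expands to P(D|H)P(H) + P(D|¬H)P(¬H). The numbers (1% prior, 90% hit rate, 5% false-positive rate) are invented, and the second update assumes the two observations are conditionally independent.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """P(H | D) for a binary hypothesis via Bayes' rule."""
    evidence = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    return p_data_given_h * prior / evidence

# First positive result, then a second one using the posterior as the new prior.
after_one = posterior(0.01, 0.90, 0.05)
after_two = posterior(after_one, 0.90, 0.05)
print(round(after_one, 3), round(after_two, 3))  # 0.154 0.766
```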

Bayes Factor

—

BF_{10} = \frac{P ( D ∣ H _{1} )}{P ( D ∣ H _{0} )}

Continuous evidence ratio. 3–10 moderate · 10–30 strong · >30 very strong evidence for H₁.
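
A deliberately simplified sketch: both hypotheses are point values of a coin's success probability (H₀: p = 0.5, H₁: p = 0.7), so each marginal likelihood is just a binomial likelihood and the binomial coefficient cancels. With a prior spread over H₁, the likelihood would be averaged over that prior instead. Data (7 successes in 10 trials) are hypothetical.

```python
def likelihood(p, k, n):
    """Binomial likelihood of k successes in n trials, without the constant."""
    return p ** k * (1 - p) ** (n - k)

k, n = 7, 10
bf10 = likelihood(0.7, k, n) / likelihood(0.5, k, n)
print(round(bf10, 2))  # 2.28
```

A BF₁₀ of about 2.3 falls below the 3–10 "moderate" band, so these data barely discriminate between the two values.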

PSNR

—

PSNR = 10 \log_{10} \frac{R^{2}}{MSE}

(Not BRSM-core; shared with image-quality contexts. Here R is the maximum possible signal value, e.g. 255 for 8-bit images, not the regression R².)

Binomial PMF

—

P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}

Probability of k successes in n independent Bernoulli(p) trials.
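
A short sketch using `math.comb` for the binomial coefficient. The example values (n = 10, p = 0.5) are arbitrary.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

p_seven = binomial_pmf(7, 10, 0.5)
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print(p_seven)  # 0.1171875
```

Summing the PMF over k = 0..n should give 1, which is a quick sanity check on any hand calculation.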

Logistic regression (logit)

—

\log \frac{p}{1 - p} = β_{0} + β_{1} x_{1} + \dots + β_{k} x_{k}

Logit link: the model is linear in the log-odds, which keeps predicted p inside (0, 1).

Odds ratio

—

OR = e^{β}

Multiplicative change in odds per unit increase in predictor. Exam essential.
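
The link between the logit model and OR = e^β can be checked numerically. The coefficients below (β₀ = −1, β₁ = 0.7) are hypothetical, with a single predictor.

```python
import math

def predicted_probability(x, b0, b1):
    """Inverse logit: p = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

def odds(p):
    return p / (1 - p)

B0, B1 = -1.0, 0.7  # assumed coefficients, for illustration only
p_at_2 = predicted_probability(2, B0, B1)
p_at_3 = predicted_probability(3, B0, B1)
odds_ratio = odds(p_at_3) / odds(p_at_2)
print(round(odds_ratio, 4), round(math.exp(B1), 4))  # 2.0138 2.0138
```

The ratio is the same for any starting x, whereas the change in p itself depends on where on the S-curve x sits.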

Bonferroni per-test α

—

α_{per-test} = α_{FW} / m

Strict FWER control. With m=20, α=.05 → per-test α = 0.0025.

Benjamini-Hochberg threshold

—

p_{(i)} \leq \frac{i}{m} Q

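Sort the m p-values in ascending order, find the largest rank i whose p-value falls on or below the BH line (i/m)·Q, and reject every hypothesis up to that rank. Q is the target FDR (e.g., 0.05).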
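
A sketch of both corrections side by side. The ten p-values are invented; both functions return one True/False decision per hypothesis in the original order.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject when p <= alpha / m (family-wise error control)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, q=0.05):
    """Step-up BH: reject the k smallest p-values, k = largest rank under the line."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
n_bonf = sum(bonferroni(pvals))
n_bh = sum(benjamini_hochberg(pvals))
print(n_bonf, n_bh)  # 1 2
```

With m = 10 the Bonferroni cut-off is 0.005, so only the smallest p-value survives; BH also keeps 0.008 because its line at rank 2 is 0.010.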

95% CI for mean

—

\bar{x} \pm t_{α/2, n - 1} \cdot \frac{s}{\sqrt{n}}

Sample mean ± t-critical times SEM. Frequentist: procedure has 95% long-run coverage.
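
A worked sketch for n = 10. The standard library has no t distribution, so the critical value 2.262 (two-sided 95%, df = 9) is taken from a t table and hard-coded; the sample is made up.

```python
import math
import statistics

T_CRIT_DF9 = 2.262  # two-sided 95% critical value for df = 9, from a t table

def mean_ci(sample, t_crit):
    """Confidence interval for the mean: x-bar +/- t_crit * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    margin = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return mean - margin, mean + margin

ci_low, ci_high = mean_ci([4, 5, 6, 7, 8, 4, 5, 6, 7, 8], T_CRIT_DF9)
print(round(ci_low, 2), round(ci_high, 2))  # 4.93 7.07
```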

χ² independence df

—

df = (r - 1) (c - 1)

Contingency table degrees of freedom. 2×3 table → df = 2.

χ² goodness-of-fit df

—

df = k - 1

Goodness-of-fit on k categories.

Phi coefficient (2×2)

—

φ = \sqrt{χ^{2} / n}

Effect size for 2×2 χ². Larger tables use Cramér's V.
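
The card names Cramér's V without a formula. The sketch assumes the standard definition V = √(χ² / (n · min(r − 1, c − 1))), which reduces to φ for a 2×2 table. The χ² and n values are hypothetical.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V; equals phi when the table is 2x2."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

# Hypothetical 2x2 result: chi-square = 20/3 on n = 60 observations.
phi = cramers_v(20 / 3, 60, 2, 2)
print(round(phi, 3))  # 0.333
```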

Sphericity violation correction

—

df_{adj} = ϵ \cdot df

Greenhouse-Geisser (ε estimated) or Huynh-Feldt adjust df when sphericity is violated in RM-ANOVA.

Wilcoxon signed-rank W

—

W = \sum_{i : d_{i} > 0} rank(|d_{i}|)

Nonparametric paired test. Ranks the absolute differences then sums ranks of positive diffs.
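
A sketch of the ranking step. Two conventions are assumed here and not stated on the card: zero differences are dropped, and tied absolute differences share the average of their ranks. Differences are taken as after − before; the paired scores are invented.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def wilcoxon_w_plus(before, after):
    """Sum of the ranks of |d| over pairs with a positive difference."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zeros
    ranks = average_ranks([abs(d) for d in diffs])
    return sum(r for d, r in zip(diffs, ranks) if d > 0)

before = [10, 12, 9, 14, 11, 8]
after = [13, 11, 13, 16, 9, 13]
w_plus = wilcoxon_w_plus(before, after)
print(w_plus)  # 17.5
```

The positive and negative rank sums always add up to n(n + 1)/2 (21 here), which is a handy check.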

Diagrams

Which test do I use?

Decision flow on IV scale × DV scale × #groups × independent/paired. Categorical DV → χ²; 2 groups continuous → t; 3+ groups continuous → ANOVA; continuous-continuous → r/regression.

Sketch on paper while reading

NOIR scales of measurement

Four-row table: Nominal / Ordinal / Interval / Ratio with order, equal intervals, true zero, examples, allowable statistics.

Sketch on paper while reading

Type I / Type II error 2×2

Rows: reject vs fail-to-reject. Cols: H₀ true vs false. Cells, row by row: Type I error (α) / correct rejection (power = 1 − β) / correct retention (1 − α) / Type II error (β).

Sketch on paper while reading

FWER vs FDR

Side-by-side: FWER controls P(any false positive) — conservative — Bonferroni / Holm. FDR controls expected proportion of FPs among rejections — Benjamini-Hochberg.

Sketch on paper while reading

ANOVA SS partition

Total variability split into between-group + within-group. F = MS_between / MS_within.

Sketch on paper while reading

95% CI long-run coverage

Many simulated samples, each producing a CI. Roughly 95% of intervals contain μ. Coverage is a property of the procedure.

Sketch on paper while reading
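
The long-run coverage idea can be simulated rather than drawn. This sketch assumes a Normal population (μ = 50, σ = 10), samples of n = 10, and the table value t = 2.262 for df = 9; all of these are illustrative choices.

```python
import math
import random
import statistics

T_CRIT_DF9 = 2.262  # two-sided 95% critical value for df = 9, from a t table

def t_interval(sample, t_crit):
    n = len(sample)
    mean = statistics.mean(sample)
    margin = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return mean - margin, mean + margin

def coverage(mu=50.0, sigma=10.0, n=10, reps=2000, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = t_interval(sample, T_CRIT_DF9)
        hits += low <= mu <= high
    return hits / reps

coverage_rate = coverage()
print(coverage_rate)  # should land close to 0.95
```

Any single interval either contains μ or it does not; the 95% describes the procedure across repetitions.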

Prior → Posterior update

Likelihood × Prior / Evidence → Posterior. Sequential updating across studies.

Sketch on paper while reading

Regression diagnostic plots

Residuals vs fitted (linearity, heteroscedasticity), Q-Q (normality), scale-location, residuals vs leverage.

Sketch on paper while reading

Scree plot + parallel analysis

Eigenvalues vs factor #. Retain factors before the elbow; parallel analysis adds a random-data baseline, and only factors whose eigenvalues exceed it are kept.

Sketch on paper while reading

Two-way ANOVA interaction plot

Cell means with one IV on x-axis, other as colored lines. Non-parallel lines = interaction.

Sketch on paper while reading

Anscombe's quartet

Four datasets sharing mean / SD / r / regression line — wildly different scatter plots. Lesson: always plot your data.

Sketch on paper while reading

Boxplot + 1.5×IQR outlier rule

Box from Q1 to Q3, median line, whiskers to the most extreme points within 1.5×IQR of the box edges; points beyond are flagged as outliers.

Sketch on paper while reading
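
A sketch of the numbers behind the boxplot. Quartile conventions differ between textbooks and software; this one assumes the "inclusive" method of `statistics.quantiles`. The data are made up.

```python
import statistics

def boxplot_summary(data, k=1.5):
    """Quartiles, whisker ends and outliers under the k x IQR rule."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary["whiskers"], summary["outliers"])  # (1, 9) [30]
```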

Logistic S-curve

p = 1/(1 + e^(−η)) where η = β₀ + β·x. Saturates to 0/1 at extremes.

Sketch on paper while reading

Sampling distribution of the mean

Repeated samples from population → distribution of sample means. CLT → approximately Normal for large n, with mean μ and SD σ/√n.

Sketch on paper while reading
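
A simulation sketch of the same idea, using a non-Normal population to show the CLT at work. The population (Uniform(0, 1), so μ = 0.5 and σ = √(1/12)), the sample size and the number of repetitions are all assumptions made for illustration.

```python
import math
import random
import statistics

def simulate_sample_means(n=30, reps=4000, seed=0):
    """Means of `reps` samples of size n drawn from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(reps)]

means = simulate_sample_means()
observed_sd = statistics.stdev(means)
predicted_sd = math.sqrt(1 / 12) / math.sqrt(30)  # sigma / sqrt(n)
print(round(statistics.fmean(means), 3), round(observed_sd, 4), round(predicted_sd, 4))
```

The observed SD of the sample means should sit close to the predicted σ/√n, and a histogram of `means` should look roughly bell-shaped even though the population is flat.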