Saral Shiksha Yojna

Behavioral Research: Statistical Methods

CG3.402
Vinoo Alluri · Monsoon 2025-26 · 4 credits

200-mark mock paper (Set 3) · Paper FOUR

Duration: 180 min • Max marks: 200

Section A — 0.5 mark MCQs (20 × 0.5 = 10 marks)

10 marks
  1. A standard normal distribution has mean and SD: (a) 0 and 1 (b) 1 and 0 (c) 100 and 15 (d) 0.5 and 0.5 [0.5 m]
  2. P(A and B) = P(A) × P(B) implies A and B are: (a) Mutually exclusive (b) Independent (c) Conditional (d) Identical [0.5 m]
  3. A regression has residuals showing a **clear upward trend** when plotted against fitted values. This indicates: (a) Normality (b) Likely model misspecification or omitted variable (c) Homoscedasticity (d) No autocorrelation [0.5 m]
  4. Sumiran has variables IQ (continuous) and income (continuous). The best summary statistic to describe their relationship is: (a) Mean of IQ (b) Pearson correlation r (c) Median income (d) Chi-square [0.5 m]
  5. A **histogram** is best for displaying: (a) Two categorical variables (b) The distribution of one continuous variable (c) Time series (d) Multivariate clusters [0.5 m]
  6. A study with **only 6 participants** is best described as: (a) A typical randomized trial (b) A pilot/feasibility study; severely underpowered for confirmatory inference (c) Always conclusive (d) Standardly used in meta-analyses [0.5 m]
  7. A **balanced design** in ANOVA means: (a) Equal cell sizes across factor levels (b) Equal means (c) Equal variances (d) Equal sample sizes total [0.5 m]
  8. A confounding variable is one that: (a) Has no effect on the DV (b) Affects both the IV and the DV, creating a non-causal association (c) Is randomly assigned (d) Only matters in correlational studies [0.5 m]
  9. Karan reports "ANOVA F(3, 56) = 4.2, p = 0.009, η² = 0.18." η² indicates: (a) The model fits poorly (b) The factor explains 18% of variance, a large effect (c) The p-value (d) The sample size [0.5 m]
  10. A boxplot shows **two outliers** above Q3 + 1.5·IQR. These should be: (a) Always removed (b) Investigated for data-entry errors or genuine extremes before deciding (c) Imputed (d) Ignored [0.5 m]
  11. A **stratified random sample** with equal allocation across strata maximizes: (a) Sample variance (b) Representativeness of minority groups (c) Speed of data collection (d) Confidence interval width [0.5 m]
  12. If Cronbach's α drops dramatically when one item is removed, that item is: (a) Likely a poor fit with the scale (b) An excellent item contributing greatly (c) Random noise (d) Reverse-coded [0.5 m]
  13. A **scree plot** in factor analysis is used to: (a) Test sphericity (b) Identify the number of factors via the "elbow" in eigenvalues (c) Test normality (d) Compute factor scores [0.5 m]
  14. **Likert items** are typically considered to be on which scale? (a) Nominal (b) Ordinal (technically) but often treated as interval for parametric tests (c) Interval (strict) (d) Ratio [0.5 m]
  15. A **two-tailed p = 0.07** with α = 0.05 means: (a) The result is significant (b) Reject H₀ (c) Fail to reject H₀; result is "marginal" but not significant (d) p-hacking [0.5 m]
  16. The **central limit theorem** is most reliable for: (a) Small samples (n < 5) (b) Larger samples (typically n ≥ 30 unless the distribution is wildly non-normal) (c) Discrete data only (d) Categorical data [0.5 m]
  17. A **causal mediation analysis** can establish causation under: (a) Any conditions (b) Random assignment of treatment AND no unmeasured confounders of the mediator–outcome relationship (c) Cross-sectional data alone (d) Bootstrap CIs alone [0.5 m]
  18. **Cohen's f²** = 0.35 in regression indicates: (a) Small effect (b) Medium effect (c) Large effect (d) No effect [0.5 m]
  19. **Bias** of an estimator is: (a) Difference between the estimator's expected value and the true parameter (b) Sampling variance (c) Standard error (d) Random fluctuation [0.5 m]
  20. A test's power approaches 1 as: (a) α decreases (b) Effect size increases without bound, holding all else fixed (c) Variance increases (d) Sample size decreases [0.5 m]
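
The η² value quoted in question 9 can be checked against the reported F statistic and its degrees of freedom. The sketch below is only an illustration: the function name `eta_squared_from_f` is hypothetical, and the identity it uses holds for a one-way ANOVA (in factorial designs the same formula gives partial η²).

```python
def eta_squared_from_f(f_value: float, df_between: int, df_within: int) -> float:
    """Recover eta squared from a one-way ANOVA's F statistic and its df.

    Uses eta^2 = SS_between / SS_total = (F * df_b) / (F * df_b + df_w).
    """
    numerator = f_value * df_between
    return numerator / (numerator + df_within)


# Values reported in question 9: F(3, 56) = 4.2
eta_sq = eta_squared_from_f(4.2, df_between=3, df_within=56)
print(f"eta squared = {eta_sq:.2f}")  # prints "eta squared = 0.18"
```

By Cohen's commonly cited benchmarks for η² (roughly 0.01 small, 0.06 medium, 0.14 large), 0.18 falls in the large range.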

Section B — 1 mark MCQs (20 × 1 = 20 marks)

20 marks
  1. Mansi compares **air-pollution attitudes** across 4 Indian cities (each city sampled separately, n = 80 each). The DV is a continuous score. Best test: (a) Paired t-test (b) One-way ANOVA (c) Mixed ANOVA (d) Chi-square [1 m]
  2. A logistic regression of `voted (1/0)` on `age` and `education_years` gives β_age = 0.04 (SE = 0.01). The OR per year of age is: (a) 0.04 (b) e^0.04 ≈ 1.041 (c) 0.96 (d) 4% [1 m]
  3. Tushar runs **a paired t-test** but finds that the paired differences are non-normal with outliers. Best alternative: (a) Independent t-test (b) Wilcoxon signed-rank (c) Mann-Whitney U (d) Chi-square [1 m]
  4. A fitted linear regression is `Income = 25,000 + 5,000·Years_Experience + 8,000·Has_Degree (Y/N)`. For someone with a degree and 5 years of experience, the predicted income is: (a) ₹38,000 (b) ₹50,000 (c) ₹58,000 (d) ₹63,000 [1 m]
  5. Varun fits OLS with VIF values 12.3 (Income) and 11.8 (Education). This signals: (a) Excellent fit (b) Severe multicollinearity; drop one or combine them (c) Heteroscedasticity (d) Outliers [1 m]
  6. A regression `Y ~ X` has autocorrelated residuals. The correct response: (a) Ignore it; OLS still works (b) Use ARIMA, GLS, or HAC-adjusted standard errors (c) Compute Cohen's d (d) Apply Bonferroni [1 m]
  7. A research design measuring 80 students across **3 conditions × 3 time points** (each student in all 3 conditions × time points) is: (a) Between-subjects (b) Within-subjects, factorial (c) Mixed (d) Cross-sectional [1 m]
  8. Kanika has 4 conditions × 4 time points in a fully within-subjects design. **Friedman's test** would be appropriate when: (a) Data are continuous and normal (b) Data are ordinal or non-normal, single factor with > 2 levels (c) Multiple between-subjects factors (d) Comparing variances [1 m]
  9. Misha models `survival_time` (days until event) with possible censoring (some participants didn't have the event by study end). Best model: (a) OLS (b) Logistic regression (c) Cox proportional hazards (d) Poisson [1 m]
  10. A logistic regression's discrimination is evaluated via: (a) R² (b) Cohen's d (c) AUC (area under ROC curve) (d) Cronbach's α [1 m]
  11. **Confirmatory FA** with CFI = 0.97, TLI = 0.96, RMSEA = 0.04 indicates: (a) Poor fit (b) Acceptable fit (c) Good-to-excellent fit (d) Cannot tell [1 m]
  12. A two-way ANOVA: Factor A (3 levels) × Factor B (2 levels), n = 20 per cell. Total N and df_total: (a) 60, 59 (b) 120, 119 (c) 90, 89 (d) 24, 23 [1 m]
  13. A **between-subjects** design has 30 participants per group, 4 groups. df_within for one-way ANOVA: (a) 116 (b) 119 (c) 120 (d) 3 [1 m]
  14. A **between-subjects** design comparing 5 groups via one-way ANOVA is significant. Best post-hoc: (a) Tukey HSD (b) Bonferroni (c) Either, depending on number of pairs and goals (d) Both work but report only the most significant [1 m]
  15. A **Bayesian credible interval** of [0.3, 0.7] for a proportion means: (a) The procedure has 90% coverage (b) Given the data and prior, the probability the true proportion is in [0.3, 0.7] is 95% (or whatever level was used) (c) Frequentist coverage (d) The point estimate is 0.5 [1 m]
  16. **BF₁₀ = 0.07** indicates: (a) Anecdotal evidence for H₁ (b) Strong evidence for H₀ (BF₀₁ = 1/0.07 ≈ 14) (c) Inconclusive (d) Reject H₀ [1 m]
  17. The rule that a regression model must **include the main effects whenever their interaction is in the model** is called: (a) Marginality / hierarchy principle (b) Multicollinearity (c) Robust regression (d) Bias [1 m]
  18. **Adjusted R²** decreases when: (a) An added predictor explains substantial variance (b) An added predictor doesn't add enough explanatory power to offset the penalty for added parameters (c) Sample size grows (d) p-value drops [1 m]
  19. **Mahalanobis distance** in regression diagnostics measures: (a) Distance of an observation from the centroid of predictor space, accounting for covariance (b) Residual size (c) Cook's distance (d) Multicollinearity [1 m]
  20. **Multilevel modeling** correctly handles: (a) Single-level data (b) Hierarchically nested data (students in schools, patients in hospitals) (c) Categorical outcomes only (d) Small samples [1 m]
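
Several Section B items (questions 2, 4, 12, 13 and 16) reduce to short arithmetic. A minimal sketch of those calculations follows; the variable and function names are illustrative, and it assumes the degree indicator in question 4 is coded 1 for yes and 0 for no, which the question does not state explicitly.

```python
import math

# Q2: in logistic regression, the odds ratio per unit of a predictor is exp(beta).
odds_ratio_age = math.exp(0.04)


# Q4: plug values into the fitted equation (Has_Degree assumed coded 1 = yes, 0 = no).
def predicted_income(years_experience: float, has_degree: bool) -> float:
    return 25_000 + 5_000 * years_experience + 8_000 * int(has_degree)


# Q12: a 3 x 2 design with 20 per cell; df_total = N - 1.
n_total_two_way = 3 * 2 * 20
df_total_two_way = n_total_two_way - 1

# Q13: one-way ANOVA with k = 4 groups of 30; df_within = N - k.
df_within_one_way = 4 * 30 - 4

# Q16: BF01 is the reciprocal of BF10.
bf01 = 1 / 0.07

print(f"OR per year of age: {odds_ratio_age:.3f}")
print(f"Predicted income: {predicted_income(5, True):,.0f}")
print(f"Two-way N, df_total: {n_total_two_way}, {df_total_two_way}")
print(f"One-way df_within: {df_within_one_way}")
print(f"BF01: {bf01:.1f}")
```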

Section C — 2 mark short answers (15 × 2 = 30 marks)

30 marks
  1. State three differences between **observational** and **experimental** studies. [2 m]
  2. Define **Type S error** and **Type M error** (Gelman & Tuerlinckx). [2 m]
  3. Why is **random sampling** important for external validity? [2 m]
  4. Distinguish a **moderator** from a **covariate**. [2 m]
  5. Define **causal mediator** and **statistical mediator**. [2 m]
  6. Define **convergent** and **divergent** thinking and discuss how researchers operationalize each. [2 m]
  7. Define **measurement error** and distinguish **random** from **systematic** error. [2 m]
  8. State two reasons why **R² should not be the primary metric** for evaluating a regression. [2 m]
  9. Define **principal stratification** in causal inference. [2 m]
  10. Explain why the **distribution of p-values under H₀** is **uniform on [0, 1]**. [2 m]
  11. State the **assumptions** of Pearson correlation. [2 m]
  12. Why might **adjusted R²** decrease when adding a new predictor? [2 m]
  13. State the levels of Pearl's **Ladder of Causation**. [2 m]
  14. Briefly explain why **standardization (z-scoring) of variables before regression** doesn't affect significance but changes interpretation. [2 m]
  15. Define **floor effect** and **ceiling effect** in measurement. [2 m]
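
The claim in question 10 can be illustrated by simulation. The sketch below is one possible setup, not a prescribed method: it assumes a two-sided one-sample z-test with known σ = 1 and a true null (μ = 0), and the function name `simulate_null_p_values` and its defaults are hypothetical. If the p-values are uniform, the share falling at or below any threshold should be close to that threshold.

```python
import random
from statistics import NormalDist, fmean


def simulate_null_p_values(n_sims: int = 5000, n: int = 25, seed: int = 0) -> list[float]:
    """Two-sided z-test p-values for samples drawn with H0 (mu = 0, sigma = 1) true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # (mean - 0) / (sigma / sqrt(n)) with sigma = 1
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values


p_values = simulate_null_p_values()
for threshold in (0.05, 0.25, 0.50):
    share = sum(p <= threshold for p in p_values) / len(p_values)
    print(f"share of p-values <= {threshold:.2f}: {share:.3f}")
```

The uniformity is exact only for continuous test statistics under a simple null; discrete tests give p-values that are not exactly uniform.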

Section D — 5 mark questions (12 × 5 = 60 marks)

60 marks

Section E — 10 mark long descriptive (8 × 10 = 80 marks)

80 marks
