Behavioral Research: Statistical Methods
CG3.402 • Vinoo Alluri • Monsoon 2025-26 • 4 credits
PYQ-style paper · Paper A
Duration: 120 min • Max marks: 50
Section 1 — Objective Questions (15 marks)
- 1. Priya studies food preferences at IIIT-H by surveying friends in her hostel mess who then recommend their friends, who then recommend more friends. The biggest threat to her sample is: (a) Equal gender representation ensures validity (b) Snowball sampling undermines randomness; sample reflects social network, not population (c) Hidden populations cannot be quantitatively studied (d) Stratification corrects referral bias
- 2. Ranjit is measuring "level of agreement" on a 7-point scale from "Strongly Disagree" to "Strongly Agree". Which scale is this? (a) Nominal (b) Ordinal (c) Interval (d) Ratio
- 3. A clinical psychologist measures depression at the start of therapy and again 8 weeks later. Patients who scored extremely high initially show large reductions even before therapy begins. This is likely: (a) Maturation effect (b) Testing effect (c) Regression to the mean (d) History effect
- 4. Kavya posts a survey about reading habits on a literature-focused Instagram page. Which bias most likely affects her results? (a) Non-response bias (b) Selection bias (c) Experimenter bias (d) Belief bias
- 5. The sample standard deviation `s = √[Σ(xᵢ − x̄)² / (n − 1)]` is: (a) An unbiased estimator of the population SD σ (b) A biased estimator of σ, though it uses Bessel's correction (c) Always smaller than σ (d) Always larger than σ
- 6. The R command `pnorm(2)` returns approximately: (a) 0.05 (b) 0.95 (c) 0.977 (d) 1.96
- 7. Standard error of the mean (SEM) is: (a) σ × √n (b) σ / √n (c) σ² / n (d) The same as SD of the population
- 8. A weather forecaster combines past rainfall data with personal judgment from current cloud patterns to estimate tomorrow's rain probability. This best fits: (a) Frequentist probability (b) Bayesian probability (c) Empirical probability (d) Classical probability
- 9. A 95% CI for the mean exam score is [62, 71]. Which is the correct interpretation? (a) 95% of students score between 62 and 71 (b) The true mean is between 62 and 71 with probability 0.95 (c) If we repeated this procedure, 95% of such intervals would contain the true mean (d) The sample mean is exactly 66.5
- 10. Reaction-time data from a perceptual task are typically: (a) Symmetric (b) Right-skewed (c) Left-skewed (d) Uniform
- 11. Aisha runs a between-subjects study comparing focus levels (1=very low, 5=very high) between meditation (n=30) and control (n=30) groups. Data are heavily non-normal with severe outliers. The best test is: (a) Independent t-test (b) Mann-Whitney U test (c) Paired t-test (d) ANOVA
- 12. A pharma company tests three drug doses (low, medium, high) on the same 30 patients across three sessions one week apart. Which test is appropriate? (a) One-way ANOVA (b) Repeated-measures ANOVA (c) Mixed ANOVA (d) Independent t-tests
- 13. A regression model predicts `salary` from `years_of_experience`, `years_of_education`, and `age`. VIFs are 9.2, 8.7, 11.3. Most likely problem: (a) Heteroscedasticity (b) Multicollinearity (c) Non-linearity (d) Outliers
- 14. A high Cook's distance for an observation indicates: (a) Perfect collinearity (b) An influential point that disproportionately affects coefficients (c) Normal residuals (d) Heteroscedasticity
- 15. Score = 20 + 6×(Hours_Studied) + 1.5×(Hours_Slept). The best interpretation of the coefficient on `Hours_Studied`: (a) Studying more always increases score (b) Each extra hour of study raises predicted score by 6 points, holding sleep constant (c) Score increases by 7.5 per hour studied (d) Sleep has no effect
- 16. Rajat fits a linear regression of `anxiety_score` on `daily_phone_use_hours` and finds R² = 0.36. The correct interpretation: (a) 36% of users have anxiety (b) 36% of the variance in anxiety is explained by phone use (c) The correlation is 0.36 (d) 64% of predictions are wrong
- 17. A researcher tests 8 separate hypotheses at α = .05 each. Using Bonferroni, the per-test threshold becomes: (a) 0.05 (b) 0.00625 (c) 0.4 (d) 0.025
- 18. Saumya fits a logistic regression of `attended_class (yes/no)` on `commute_distance` and gets a negative coefficient for distance. The interpretation: (a) Distance doesn't matter (b) Greater commute distance is associated with lower probability of attending, holding other variables constant (c) Distance reduces actual attendance hours (d) The model is misspecified
- 19. A nutritionist wants to model `number_of_sick_days_per_year` as a function of `BMI` and `stress_score`. The count outcome is best modelled with: (a) Linear OLS regression (b) Logistic regression (c) Poisson regression / GLM with log link (d) Chi-square
- 20. AIC values for four candidate models are 421.5, 408.2, 415.8, 430.1. The preferred model has AIC: (a) 421.5 (b) 408.2 (c) 415.8 (d) 430.1
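
The formula-based items above (6, 7, 15, 17 and 20) can be checked numerically once an attempt is complete. What follows is a minimal sketch, not part of the paper: it uses Python's standard library rather than R, so `NormalDist().cdf(2.0)` stands in for `pnorm(2)`. The function names and the example inputs σ = 10, n = 25 are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Item 6: standard normal CDF at z = 2 (Python stand-in for R's pnorm(2)).
p_below_2 = NormalDist(mu=0.0, sigma=1.0).cdf(2.0)


def standard_error(sigma: float, n: int) -> float:
    """Item 7: standard error of the mean, sigma / sqrt(n)."""
    return sigma / sqrt(n)


def predicted_score(hours_studied: float, hours_slept: float) -> float:
    """Item 15: the regression equation exactly as stated in the question."""
    return 20 + 6 * hours_studied + 1.5 * hours_slept


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Item 17: per-test significance threshold, alpha / number of tests."""
    return alpha / n_tests


# Item 20: among candidate models, the lowest AIC is preferred.
aic_values = [421.5, 408.2, 415.8, 430.1]
preferred_aic = min(aic_values)

if __name__ == "__main__":
    print("P(Z <= 2):", round(p_below_2, 4))
    # sigma = 10 and n = 25 are made-up inputs for illustration only.
    print("SEM for sigma=10, n=25:", standard_error(10, 25))
    # One extra hour of study with sleep held fixed at 7 hours.
    print("Score change per study hour:",
          predicted_score(3, 7) - predicted_score(2, 7))
    print("Bonferroni threshold:", bonferroni_threshold(0.05, 8))
    print("Preferred AIC:", preferred_aic)
```

Holding `hours_slept` fixed while raising `hours_studied` by one is what "holding sleep constant" means in item 15: the difference between the two predictions is the coefficient itself.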
Section 2 — Short Descriptive (15 marks)
Section 3 — Long Descriptive (20 marks)