Behavioral Research: Statistical Methods

CG3.402

Vinoo Alluri • Monsoon 2025-26 • 4 credits

PYQ-style paper · Paper A

Duration: 120 min • Max marks: 50

Section 1 — Objective Questions (15 marks)

1. Priya studies food preferences at IIIT-H by surveying friends in her hostel mess who then recommend their friends, who then recommend more friends. The biggest threat to her sample is: (a) Equal gender representation ensures validity (b) Snowball sampling undermines randomness; sample reflects social network, not population (c) Hidden populations cannot be quantitatively studied (d) Stratification corrects referral bias
2. Ranjit is measuring "level of agreement" on a 7-point scale from "Strongly Disagree" to "Strongly Agree". Which scale is this? (a) Nominal (b) Ordinal (c) Interval (d) Ratio
3. A clinical psychologist measures depression at the start of therapy and again 8 weeks later. Patients who scored extremely high initially show large reductions even before therapy begins. This is likely: (a) Maturation effect (b) Testing effect (c) Regression to the mean (d) History effect
4. Kavya posts a survey about reading habits on a literature-focused Instagram page. Which bias most likely affects her results? (a) Non-response bias (b) Selection bias (c) Experimenter bias (d) Belief bias
5. The sample standard deviation s = √[Σ(xᵢ − x̄)² / (n − 1)] is: (a) An unbiased estimator of the population SD σ (b) A biased estimator of σ, though it uses Bessel's correction (c) Always smaller than σ (d) Always larger than σ
6. The R command pnorm(2) returns approximately: (a) 0.05 (b) 0.95 (c) 0.975 (d) 1.96
7. Standard error of the mean (SEM) is: (a) σ × √n (b) σ / √n (c) σ² / n (d) The same as SD of the population
8. A weather forecaster combines past rainfall data with personal judgment from current cloud patterns to estimate tomorrow's rain probability. This best fits: (a) Frequentist probability (b) Bayesian probability (c) Empirical probability (d) Classical probability
9. A 95% CI for the mean exam score is [62, 71]. Which is the correct interpretation? (a) 95% of students score between 62 and 71 (b) The true mean is between 62 and 71 with probability 0.95 (c) If we repeated this procedure, 95% of such intervals would contain the true mean (d) The sample mean is exactly 66.5
10. Reaction-time data from a perceptual task are typically: (a) Symmetric (b) Right-skewed (c) Left-skewed (d) Uniform
11. Aisha runs a between-subjects study comparing focus levels (1 = very low, 5 = very high) between meditation (n = 30) and control (n = 30) groups. Data are heavily non-normal with severe outliers. The best test is: (a) Independent t-test (b) Mann-Whitney U test (c) Paired t-test (d) ANOVA
12. A pharma company tests three drug doses (low, medium, high) on the same 30 patients across three sessions one week apart. Which test is appropriate? (a) One-way ANOVA (b) Repeated-measures ANOVA (c) Mixed ANOVA (d) Independent t-tests
13. A regression model predicts salary from years_of_experience, years_of_education, and age. VIFs are 9.2, 8.7, 11.3. Most likely problem: (a) Heteroscedasticity (b) Multicollinearity (c) Non-linearity (d) Outliers
14. A high Cook's distance for an observation indicates: (a) Perfect collinearity (b) An influential point that disproportionately affects coefficients (c) Normal residuals (d) Heteroscedasticity
15. Score = 20 + 6×(Hours_Studied) + 1.5×(Hours_Slept). The best interpretation of the coefficient on Hours_Studied: (a) Studying more always increases score (b) Each extra hour of study raises predicted score by 6 points, holding sleep constant (c) Score increases by 7.5 per hour studied (d) Sleep has no effect
16. Rajat fits a linear regression of anxiety_score on daily_phone_use_hours and finds R² = 0.36. The correct interpretation: (a) 36% of users have anxiety (b) 36% of the variance in anxiety is explained by phone use (c) The correlation is 0.36 (d) 64% of predictions are wrong
17. A researcher tests 8 separate hypotheses at α = .05 each. Using Bonferroni, the per-test threshold becomes: (a) 0.05 (b) 0.00625 (c) 0.4 (d) 0.025
18. Saumya fits a logistic regression of attended_class (yes/no) on commute_distance and gets a negative coefficient for distance. The interpretation: (a) Distance doesn't matter (b) Greater commute distance is associated with lower probability of attending, holding other variables constant (c) Distance reduces actual attendance hours (d) The model is misspecified
19. A nutritionist wants to model number_of_sick_days_per_year as a function of BMI and stress_score. The count outcome is best modelled with: (a) Linear OLS regression (b) Logistic regression (c) Poisson regression / GLM with log link (d) Chi-square
20. AIC values for four candidate models are 421.5, 408.2, 415.8, 430.1. The preferred model has AIC: (a) 421.5 (b) 408.2 (c) 415.8 (d) 430.1
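
For self-checking after an attempt, a few of the numeric quantities in this section (the normal CDF in question 6, the sample SD and standard error in questions 5 and 7, the Bonferroni threshold in question 17, and the AIC comparison in question 20) can be reproduced with a short script. This is only a sketch: it uses Python's standard library rather than R, with statistics.NormalDist standing in for pnorm, and the scores list is hypothetical data invented for illustration, not taken from the paper. The standard error is computed from the sample SD s as an estimate of σ.

```python
from math import sqrt
from statistics import NormalDist, stdev

# Standard normal CDF at z = 2 (the quantity R's pnorm(2) returns).
p_below_2 = NormalDist().cdf(2)

# Hypothetical exam scores, invented for illustration only.
scores = [62, 65, 66, 68, 71]

# Sample SD with the (n - 1) denominator, as in question 5.
s = stdev(scores)

# Standard error of the mean, using s as an estimate of sigma.
sem = s / sqrt(len(scores))

# Bonferroni: divide the overall alpha by the number of tests.
alpha, n_tests = 0.05, 8
per_test_alpha = alpha / n_tests

# AIC comparison: among candidate models, the lowest AIC is preferred.
aics = [421.5, 408.2, 415.8, 430.1]
best_aic = min(aics)

print(f"P(Z <= 2)            = {p_below_2:.4f}")
print(f"sample SD s          = {s:.3f}")
print(f"SEM (s / sqrt(n))    = {sem:.3f}")
print(f"Bonferroni threshold = {per_test_alpha}")
print(f"lowest AIC           = {best_aic}")
```

The standard error shrinks as n grows because the sample SD is divided by √n, which is why it is always smaller than the SD itself for n > 1.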

Section 2 — Short Descriptive (15 marks)

Section 3 — Long Descriptive (20 marks)
