Exercise 13: Resampling methods
This homework assignment is designed to give you practice with bootstrapping and permutation tests.
You will need to download the unrestricted_trimmed_1_7_2020_10_50_44.csv file from the Homework/hcp_data folder in the class GitHub repository.
These data are a portion of the Human Connectome Project database. They provide cognitive task measures and brain morphology measurements from 1206 participants. The full description of each variable is provided in the HCP_S1200_DataDictionary_April_20_2018.csv file in the Homework/hcp_data folder in the class GitHub repository.
1. Loading & Visualizing the Data (1 point)
Use the setwd and read.csv functions to load data from the unrestricted_trimmed_1_7_2020_10_50_44.csv file.
(a) Using the tidyverse tools, make a new dataframe d1 that only includes the subject ID (Subject), gender (Gender, self-reported at time of data collection), Flanker Task performance (Flanker_Unadj), total intracranial volume (FS_IntraCranial_Vol), total white matter volume (FS_Tot_WM_Vol), and total grey matter volume (FS_Total_GM_Vol) variables, and remove all rows with NA values.
Use the head function to look at the first few rows of each data frame.
# WRITE YOUR CODE HERE
(b) Plot grey matter volume (x axis) against intracranial volume (y axis) and Gender (point color).
# WRITE YOUR CODE HERE
What patterns do you observe in the scatter plot?
Write your response here
2. Logistic classifier (2 points)
We want to try predicting gender using the neural data you have loaded.
(a) Run a logistic regression model to predict gender from total white matter volume, total grey matter volume, and intracranial volume.
# WRITE YOUR CODE HERE
Which factors are significantly associated with gender?
Write your response here
(b) Estimate the prediction accuracy of your model (Note: this is the training set accuracy). Set your prediction threshold to 0.5.
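As a rough illustration of what "training set accuracy at a 0.5 threshold" means, here is a minimal sketch on simulated data rather than the HCP data. The names toy, group, x1, and x2 are hypothetical stand-ins and are not variables from the assignment file.

```r
# Simulated stand-in data (hypothetical names; NOT the HCP variables)
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$group <- factor(ifelse(toy$x1 + 0.5 * toy$x2 + rnorm(n) > 0, "B", "A"))

# Logistic regression; with a factor response, glm models the probability
# of the second factor level (here "B")
fit <- glm(group ~ x1 + x2, data = toy, family = binomial)

# Predicted probabilities for the same rows the model was fit on
p_hat <- predict(fit, type = "response")

# Apply the 0.5 threshold to turn probabilities into class labels
pred <- ifelse(p_hat > 0.5, levels(toy$group)[2], levels(toy$group)[1])

# Training accuracy: proportion of rows where the label is predicted correctly
train_acc <- mean(pred == toy$group)
print(train_acc)
```

Because the model is scored on the data it was trained on, this number tends to be optimistic relative to accuracy on new data.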
# WRITE YOUR CODE HERE
What is the prediction accuracy for gender from the full model?
Write your response here
3. Bootstrapped accuracy (3 points)
Use bootstrapping to estimate the confidence interval of the prediction accuracy of your model (i.e., the uncertainty in the agreement between \(\hat{y}\) and \(y\)). Plot the histogram of the bootstrapped prediction accuracies and estimate the confidence interval from the standard deviation of the bootstrap distribution.
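The general shape of a bootstrap for an accuracy estimate can be sketched on simulated data as follows. This is only one reasonable way to set it up: it assumes rows are resampled with replacement, the model is refit on each resample, and the interval is formed as the observed accuracy plus or minus 1.96 bootstrap standard deviations. The names toy, accuracy_of, and n_boot are hypothetical.

```r
# Simulated stand-in data (hypothetical names; NOT the HCP variables)
set.seed(2)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$group <- factor(ifelse(toy$x1 + 0.5 * toy$x2 + rnorm(n) > 0, "B", "A"))

# Hypothetical helper: fit the model on `dat`, return its accuracy at a 0.5 threshold
accuracy_of <- function(dat) {
  fit <- glm(group ~ x1 + x2, data = dat, family = binomial)
  pred <- ifelse(predict(fit, type = "response") > 0.5, "B", "A")
  mean(pred == dat$group)
}

# Resample rows WITH replacement, refit, and store the accuracy each time
n_boot <- 500
boot_acc <- numeric(n_boot)
for (i in seq_len(n_boot)) {
  idx <- sample(n, replace = TRUE)
  boot_acc[i] <- accuracy_of(toy[idx, ])
}

# Normal-approximation interval built from the bootstrap standard deviation
obs_acc <- accuracy_of(toy)
boot_se <- sd(boot_acc)
ci <- obs_acc + c(-1.96, 1.96) * boot_se

hist(boot_acc, main = "Bootstrapped accuracy", xlab = "Accuracy")
print(ci)
```

A percentile interval, for example quantile(boot_acc, c(0.025, 0.975)), is a common alternative when the bootstrap distribution is visibly skewed.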
# WRITE YOUR CODE HERE
How robust is the prediction accuracy of the full model?
Write your response here
4. Permutation test for grey matter effects (3 points)
Now run a permutation test, with 1000 iterations, to evaluate how much grey matter volume contributes to the prediction accuracy. Using a histogram, compare the prediction accuracy of the full (unpermuted) model with the distribution of accuracies you get with a randomized grey matter volume term (Hint: use the abline function to show the original accuracy on the histogram).
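The permutation logic can be sketched on simulated data as below. This version assumes that the column of interest is shuffled and the model is refit on every iteration; scoring the original fitted model on shuffled data is another possible reading. The names toy, accuracy_of, and n_perm are hypothetical, and x1 simply plays the role of the predictor being tested.

```r
# Simulated stand-in data (hypothetical names; NOT the HCP variables)
set.seed(3)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$group <- factor(ifelse(toy$x1 + 0.5 * toy$x2 + rnorm(n) > 0, "B", "A"))

# Hypothetical helper: fit the model on `dat`, return its accuracy at a 0.5 threshold
accuracy_of <- function(dat) {
  fit <- glm(group ~ x1 + x2, data = dat, family = binomial)
  pred <- ifelse(predict(fit, type = "response") > 0.5, "B", "A")
  mean(pred == dat$group)
}

# Accuracy of the full, unpermuted model
obs_acc <- accuracy_of(toy)

# Shuffle ONLY the predictor of interest (sampling WITHOUT replacement),
# which breaks its link to the outcome while leaving everything else intact
n_perm <- 1000
perm_acc <- numeric(n_perm)
for (i in seq_len(n_perm)) {
  shuffled <- toy
  shuffled$x1 <- sample(shuffled$x1)
  perm_acc[i] <- accuracy_of(shuffled)
}

# Histogram of permuted accuracies, with the original accuracy as a vertical line
hist(perm_acc, xlim = range(c(perm_acc, obs_acc)),
     main = "Accuracy with x1 permuted", xlab = "Accuracy")
abline(v = obs_acc, col = "red", lwd = 2)

# Proportion of permuted accuracies at least as large as the observed one
p_value <- (sum(perm_acc >= obs_acc) + 1) / (n_perm + 1)
print(p_value)
```

The gap between the red line and the bulk of the histogram gives a sense of how much accuracy depends on the shuffled predictor.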
# WRITE YOUR CODE HERE
How much does the grey matter volume influence the prediction accuracy of the model?
Write your response here
5. Reflection (1 point)
Differentiate the bootstrap from a permutation test. Describe each and explain when it is appropriate to use each.
Write your response here
DUE: 5pm EST, March 27, 2024
IMPORTANT Did you collaborate with anyone on this assignment? If so, list their names here.
Someone’s Name