William H. Knapp III


This homework was due on Friday, December 21 at 06:00 a.m. Turkish time. Late submissions receive half credit.

By checking the box below, you certify that the answers you will submit here represent your own work.

1. Imagine you're interested in studying the relationship between socioeconomic status (SES) and intrinsic motivation at a very expensive school where there are no students with a low SES. What problem does your study suffer from?
Homoscedasticity
Partial coefficient deflation
Partial coefficient inflation
Regression towards the Mean
Restriction of Range

2. Imagine you're interested in studying the relationship between socioeconomic status (SES) and intrinsic motivation at a very elite school where all of the students are highly motivated. What problem does your study suffer from?
Homoscedasticity
Partial coefficient deflation
Partial coefficient inflation
Regression towards the Mean
Restriction of Range

3. Imagine you have developed a new treatment for anorexia. To test your treatment, you ask for volunteers from new patients entering your clinic who have been diagnosed with anorexia. You measure their body weight before and after your treatment and find that there was a statistically significant increase in body weight. What problem does your study suffer from?
Homoscedasticity
Partial coefficient deflation
Partial coefficient inflation
Regression towards the Mean
Restriction of Range

4. Problems with restriction of range can occur when you restrict the range of your independent variable.
True
False

5. Problems with restriction of range can occur when you restrict the range of your dependent variable.
True
False

6. Regression towards the mean will occur in which of the following situations? (Choose all that apply.)
A perfect negative correlation
A perfect positive correlation
A strong negative correlation
A strong positive correlation
A weak negative correlation
A weak positive correlation

7. In which situation will regression towards the mean occur most strongly?
A perfect correlation
A strong correlation
A weak correlation

8. In which situation will regression towards the mean occur most strongly?
With mildly extreme scores
With very extreme scores
With non-extreme scores

9. Regression towards the mean does not occur in the real world. It is only an artifact of fitting a linear model to real world data.
True
False

10. According to regression to the mean, one individual with an extreme score on one variable will necessarily have less extreme scores on another variable.
True
False

11. According to regression to the mean, on average, individuals with extreme scores on one variable will typically have less extreme scores on another variable.
True
False

12. This is the fifth homework in which you'll use data about the class's performance on homework and exams. The data contain a lot of different variables. I encourage you to use str to take a look at the data. stud contains a number linking students to their various scores. hw1-hw26 contain the scores students got on the homework assignments up to this point. t1 and t2 are the test scores for the first and second exams. hwavg contains the mean score for the 26 assignments. submissions contains the number of times students have submitted homework. Finally, hwcompleted contains the number of homework assignments that students completed (i.e. homework that they submitted regardless of the score they received).
For the next few questions, we'll be concerned with predicting test 2 scores from the test 1 scores.
In previous homework we have seen that the correlation between test 1 and test 2 scores equals 0.6701013 and that the corresponding p-value for a hypothesis of no population correlation is 8.874e-07. Let's see how restricting the range of values can affect what we find. To do this, we need to get some of the data. For the first two questions, let's restrict the data to test 1 scores between 60 and 80. If you loaded the data set into a variable called data, you can use the following code to get the restricted data set.
rdata=data[data$t1>60 & data$t1<80,]
If you did everything correctly and type in str(rdata) you should see that you now have 13 observations of 32 variables.
What's the correlation between test 1 and test 2 in this restricted data set?
If you used cor.test to find the correlation, you should see that the p-value is now 0.5841.
NOTICE: This correlation is weaker and less significant than the correlation for the entire population of this semester's test 1 and test 2 scores.
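If you'd like to see the same effect outside of the class data, here is a minimal sketch using simulated scores. Everything in it (the data frame sim, the sample size, the means, and the slope of 0.7) is made up for illustration and has nothing to do with your actual test scores, so it won't give you the answer to this question.

```r
# Simulated scores for illustration only; this is NOT the class data set.
set.seed(1)
n <- 1000
sim <- data.frame(t1 = rnorm(n, mean = 70, sd = 10))      # hypothetical test 1 scores
sim$t2 <- 70 + 0.7 * (sim$t1 - 70) + rnorm(n, sd = 7)     # hypothetical test 2 scores

# Correlation in the full simulated data
full <- cor.test(sim$t1, sim$t2)

# Same kind of restriction as in the question: keep rows with 60 < t1 < 80
rsim <- sim[sim$t1 > 60 & sim$t1 < 80, ]
restricted <- cor.test(rsim$t1, rsim$t2)

# Compare the two correlation estimates and the number of rows kept
print(c(full = unname(full$estimate), restricted = unname(restricted$estimate)))
print(c(rows_full = nrow(sim), rows_restricted = nrow(rsim)))
```

The restricted correlation should come out smaller than the full one, because cutting off the low and high test 1 scores removes much of the variability that the correlation depends on.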

13. What is the correlation if we restrict our observations to individuals who got more than a 70 on the first exam?
HINT: Instead of using something like rdata=data[data$t1>60 & data$t1<80,], you should use something like rdata=data[data$t1>70,]. If you do everything correctly and type in str(rdata), you should see that you now have 31 observations of 32 variables. If you used cor.test to find the correlation, you should see that the p-value is now 0.001028.
NOTICE: This correlation is weaker and less significant than the correlation for the entire population of this semester's test 1 and test 2 scores.

14. What is the correlation if we restrict our observations to individuals who got less than a 70 on the first exam?
HINT: If you do everything correctly and use str() for your new data frame, you should see that you now have 12 observations of 32 variables. If you used cor.test to find the correlation, you should see that the p-value is now 0.06402.
NOTICE: This correlation is weaker and less significant than the correlation for the entire population of this semester's test 1 and test 2 scores.

15. Let's see how restriction of range on a dependent variable affects our correlations. What is the correlation if we restrict our observations to individuals who got less than a 70 on the SECOND exam?
HINT: If you do everything correctly and use str() for your new data frame, you should see that you now have 21 observations of 32 variables. If you used cor.test to find the correlation, you should see that the p-value is now 0.003318.
NOTICE: This correlation is weaker and less significant than the correlation for the entire population of this semester's test 1 and test 2 scores.

16. Let's see how restriction of range on a dependent variable affects our correlations. What is the correlation if we restrict our observations to individuals who got more than a 70 on the SECOND exam?
HINT: If you do everything correctly and use str() for your new data frame, you should see that you now have 22 observations of 32 variables. If you used cor.test to find the correlation, you should see that the p-value is now 0.07124.
NOTICE: This correlation is weaker and less significant than the correlation for the entire population of this semester's test 1 and test 2 scores.

17. For the last few questions, let's take a look at regression to the mean. To make this easier, we're first going to need to convert our test 1 and test 2 scores to z-scores. Although I prefer using n rather than n-1 in the denominator for z-scores, using either method will give us comparable results (one in terms of sample standard deviations and the other in terms of estimated population standard deviations). If you have the original data attached and available, you can use the following code to get your z-scores (we're using the population estimate, which is what sd computes, to simplify the expression in R).
zt1=(t1-mean(t1))/sd(t1)
zt2=(t2-mean(t2))/sd(t2)
NOTE: When I refer to scores throughout the next set of questions, I mean the z-transformed scores that we just computed.
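To see why the two denominators give comparable results, here is a small sketch with made-up numbers. The vector scores and the helper name sd_n are hypothetical and only for illustration. R's sd divides by n-1, so the version that divides by n has to be written out by hand.

```r
# Hypothetical scores, for illustration only
scores <- c(55, 62, 70, 74, 81, 90)
n <- length(scores)

# sd() divides by n - 1 (estimated population standard deviation)
z_est <- (scores - mean(scores)) / sd(scores)

# Dividing by n instead gives the sample standard deviation
sd_n <- sqrt(sum((scores - mean(scores))^2) / n)
z_n <- (scores - mean(scores)) / sd_n

# The two sets of z-scores differ only by a constant factor, sqrt(n / (n - 1))
print(z_n / z_est)
print(sqrt(n / (n - 1)))
```

Because every z-score is rescaled by the same factor, which observations count as more or less extreme, and which of two means is larger, does not depend on the choice.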
To see regression to the mean, we'll compare the means of our scores for observations that are 'extreme' on one variable. If regression to the mean occurs for our samples, we should see that the mean scores on the 'extreme' variable are absolutely greater (i.e. more extreme) than the mean scores on the paired variable. To help you get started, I'll do an example myself.
The mean score for t1 scores that are more than .5 standard deviations below the mean is:
mean(zt1[zt1<(-.5)])
-1.348251
The mean score for t2 scores that are associated with t1 scores that are more than .5 standard deviations below the mean is:
mean(zt2[zt1<(-.5)])
-0.7820995
Here we see that the mean of the 'extreme' scores was more extreme than the mean of the scores of the other variable. Thus, on average, extreme scores on one variable were associated with less extreme scores on another variable.
NOTE: Inside the brackets, both formulas refer to the variable we use to define extremity. Make sure you use the appropriate variables and indices of extremity in the following questions.
WARNING: The parentheses (i.e. '(-.5)') are important for negative values here because '<-' means something special in R.
What is the mean score for t2 scores that are more than .5 standard deviations below the mean?
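The warning about '<-' above is easy to check for yourself. This sketch uses a short made-up vector (x and y are hypothetical names, not part of the class data) to show the difference between the comparison and the accidental assignment.

```r
x <- c(-2, -1, 0, 1, 2)   # hypothetical z-scores

# With parentheses, this is a comparison that returns one TRUE/FALSE per element
below <- x < (-.5)
print(below)

# Without parentheses or spaces, R reads '<-' as the assignment operator
y <- x
y<-.5                     # y is overwritten with the single number 0.5
print(y)
```

Putting spaces around the operator (x < -.5) also works, but the parentheses make the intent hard to misread.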

18. What is the mean score for t1 scores that are associated with t2 scores that are more than .5 standard deviations below the mean?
HINT: If you did this question and the previous one correctly, you should find that the answer to this question is absolutely smaller (i.e. less extreme) than the mean for the 'extreme' t2 scores.

19. What is the mean score for t1 scores that are more than .5 standard deviations above the mean?

20. What is the mean score for t2 scores that are associated with t1 scores that are more than .5 standard deviations above the mean?
HINT: If you did this question and the previous one correctly, you should find that the answer to this question is absolutely smaller (i.e. less extreme) than the mean for the 'extreme' t1 scores.
OPTIONAL: Play around with different amounts of extremity. On average you should see regression towards the mean to have a greater effect for more extreme values. You might also find that for some sets of 'extreme' scores, we don't see the pattern predicted by regression to the mean.
OPTIONAL: For example compare:
OPTIONAL: mean(zt1[zt1>.25 & zt1<.5])
OPTIONAL: and
OPTIONAL: mean(zt2[zt1>.25 & zt1<.5])
OPTIONAL: Why do you think this is?
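OPTIONAL: If you want to play with different amounts of extremity without being limited by the size of the class, here is a rough sketch on simulated z-scores. The sample size, the assumed correlation of 0.67, and the names z1, z2, and cutoffs are all made up for illustration; this is not the class data.

```r
# Simulated z-scores for illustration only; NOT the class data.
set.seed(42)
n <- 10000
r <- 0.67                                  # assumed correlation, chosen for illustration
z1 <- rnorm(n)
z2 <- r * z1 + sqrt(1 - r^2) * rnorm(n)    # correlated with z1 at about r

# For several cutoffs, compare the mean of the 'extreme' variable (z1)
# with the mean of the paired variable (z2) for the same observations
cutoffs <- c(0.5, 1, 1.5)
for (cut in cutoffs) {
  sel <- z1 > cut
  cat("cutoff:", cut,
      " n:", sum(sel),
      " mean z1:", round(mean(z1[sel]), 2),
      " mean z2:", round(mean(z2[sel]), 2), "\n")
}
```

With this many simulated observations the pattern should be very stable. Try lowering n to something close to the size of our class and rerunning it a few times with different seeds.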