William H. Knapp III

This homework was due on Monday, December 10 at 06:00 a.m. Turkish time. Late submissions receive half credit.

1. For this homework, you're going to analyze data about the class's performance on homework and exams. The data contain a lot of different variables, so I encourage you to use str to take a look at them.
stud contains a number linking students to their various scores.
hw1 through hw26 contain the scores students got on the homework assignments up to this point.
t1 and t2 are the test scores for the first and second exams.
hwavg contains the mean score for the 26 assignments.
submissions contains the number of times students have submitted homework.
hwcompleted contains the number of homework assignments that students completed (i.e., homework that they submitted, regardless of the score they received).
What's the covariance for the first and second tests?
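
As a minimal sketch of the steps above, the code below inspects a data frame with str and computes a covariance with cov. The data frame toy and its six rows are made up for illustration and are NOT the class data; only the column names stud, t1, and t2 come from the description above. The manual calculation is included to show what cov computes: the sum of the products of the deviations from each mean, divided by n - 1.

```r
# Hypothetical toy data (made-up values, not the class data).
toy <- data.frame(
  stud = 1:6,
  t1   = c(55, 62, 70, 48, 90, 81),
  t2   = c(60, 58, 75, 50, 88, 85)
)

# Look at the structure: variable names, types, and the first few values.
str(toy)

# Sample covariance of the two test scores.
cov_t1_t2 <- cov(toy$t1, toy$t2)

# The same quantity by hand: sum of cross-products of deviations over (n - 1).
manual_cov <- sum((toy$t1 - mean(toy$t1)) * (toy$t2 - mean(toy$t2))) /
  (nrow(toy) - 1)

# Should print TRUE: cov() and the manual formula agree.
print(all.equal(cov_t1_t2, manual_cov))
```

With the real data, the same cov call applies after substituting your own data frame and variable names.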

2. What's the correlation for the first and second tests?
When doing correlations, I encourage you to use scatterplots to visualize the data. Visualizing the data will help you to understand the relationship better and allow you to develop an intuitive sense for what different correlations look like. To visualize the data, use plot(your_independent_variable, your_dependent_variable)
We'll see how to test the correlations later, but the correlation you'll find if you do it correctly is highly significant.
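
The sketch below shows one way to draw the scatterplot and compute the correlation. The values are made up for illustration (they are NOT the class data), and the data frame name toy is an assumption. It also shows how the correlation relates to the covariance: it is the covariance divided by the product of the two standard deviations, which is why it always falls between -1 and 1.

```r
# Hypothetical toy data (made-up values, not the class data).
toy <- data.frame(
  t1 = c(55, 62, 70, 48, 90, 81),
  t2 = c(60, 58, 75, 50, 88, 85)
)

# Scatterplot: first argument on the x-axis, second on the y-axis.
plot(toy$t1, toy$t2, xlab = "Test 1", ylab = "Test 2")

# Pearson correlation (the default method for cor).
r <- cor(toy$t1, toy$t2)

# The same value from the covariance and the standard deviations.
r_manual <- cov(toy$t1, toy$t2) / (sd(toy$t1) * sd(toy$t2))

# Should print TRUE: the two calculations agree.
print(all.equal(r, r_manual))
```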

3. What's the covariance of the mean homework scores and the second test?

4. What's the correlation between the mean homework scores and the second test?
This correlation is also highly significant.

5. What's the covariance of the number of homework submissions and the second test?

6. What's the correlation between the number of homework submissions and the second test?
This correlation is also highly significant.

7. What's the covariance of the number of completed homework assignments and the second test?

8. What's the correlation between the number of completed homework assignments and the second test?
This correlation is also highly significant.
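
Questions 2 through 8 repeat the same calculation with a different variable paired with the second test each time. As an optional sketch, the correlations can be computed in one step with sapply. The data below are made up for illustration (NOT the class data); only the column names come from the description in question 1, and the names toy and predictors are assumptions.

```r
# Hypothetical toy data (made-up values, not the class data).
toy <- data.frame(
  t1          = c(55, 62, 70, 48, 90, 81),
  t2          = c(60, 58, 75, 50, 88, 85),
  hwavg       = c(0.60, 0.72, 0.80, 0.45, 0.95, 0.88),
  submissions = c(20, 31, 40, 15, 52, 47),
  hwcompleted = c(14, 20, 23, 10, 26, 25)
)

# Variables to correlate with the second test.
predictors <- c("t1", "hwavg", "submissions", "hwcompleted")

# Apply cor to each column in turn, always pairing it with t2.
cors <- sapply(toy[predictors], cor, y = toy$t2)
print(cors)

# The covariances can be collected the same way.
covs <- sapply(toy[predictors], cov, y = toy$t2)
print(covs)
```

Each element of cors matches what a separate call such as cor(toy$hwavg, toy$t2) would return.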

9. When analyzing data, it is common to exclude highly atypical scores that might exert undue influence on the results. Let's make a box plot to see if there are any outliers. To create a box plot, type in boxplot(whatever_your_test_2_variable_name_is), using the name of your test 2 variable in place of whatever_your_test_2_variable_name_is.
If you attached the data, you can just use the following:
boxplot(t2)
How many outliers are identified by the box plot?
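
Counting points on a plot by eye can be error prone, so here is a small sketch of reading the outliers from the value that boxplot returns. The scores are made up for illustration (NOT the class data), and the names toy_t2, bp, and n_out are assumptions. By default, boxplot flags points that lie more than 1.5 times the box length beyond either end of the box.

```r
# Hypothetical test scores (made-up values, not the class data).
toy_t2 <- c(10, 60, 62, 65, 68, 70, 72, 75, 78, 80)

# boxplot draws the plot and invisibly returns a list of its statistics.
bp <- boxplot(toy_t2)

# The 'out' element holds the values drawn as individual outlier points.
print(bp$out)

# Number of outliers identified by the box plot.
n_out <- length(bp$out)
print(n_out)
```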

10. We're going to use a different definition of atypical here. We're going to remove the data for any students who scored more than 2 standard deviations away from the mean on the second exam. Before we can do this, we need to identify these students. To do so, we can use the following code if we make the appropriate substitutions for the variable names.
your_student_variable[your_test_2_variable>(mean(your_test_2_variable)+2*sd(your_test_2_variable)) | your_test_2_variable<(mean(your_test_2_variable)-2*sd(your_test_2_variable))]
'|' means 'or'
Thus, the code says get the students whose scores are more than 2 standard deviations above the mean OR more than 2 standard deviations below the mean.
If you attached the data, you can just use the following:
stud[t2<(mean(t2)-2*sd(t2)) | t2>(mean(t2)+2*sd(t2))]
How many students meet these criteria?
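
The sketch below walks through the same logic on made-up data (NOT the class data) without attaching anything. The names toy, lower, upper, and flagged are assumptions chosen for illustration. Storing the two cutoffs first makes the condition easier to read, and the version with abs shows an equivalent way to write "more than 2 standard deviations from the mean in either direction."

```r
# Hypothetical toy data (made-up values, not the class data).
toy <- data.frame(
  stud = 1:10,
  t2   = c(10, 60, 62, 65, 68, 70, 72, 75, 78, 80)
)

# Cutoffs: 2 standard deviations below and above the mean.
lower <- mean(toy$t2) - 2 * sd(toy$t2)
upper <- mean(toy$t2) + 2 * sd(toy$t2)

# Students whose score is below the lower cutoff OR above the upper cutoff.
flagged <- toy$stud[toy$t2 < lower | toy$t2 > upper]
print(flagged)

# Equivalent condition: distance from the mean exceeds 2 standard deviations.
flagged_abs <- toy$stud[abs(toy$t2 - mean(toy$t2)) > 2 * sd(toy$t2)]

# How many students meet the criteria.
print(length(flagged))
```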

11. Ok, let's get rid of those students. If you named the data frame data and attached it, you can just use the code below. If not, you'll need to adjust the code.
data2=data[t2>(mean(t2)-2*sd(t2)) & t2<(mean(t2)+2*sd(t2)),]
Inside the brackets is something very similar to what we had before. But instead of selecting
scores BELOW the lower cutoff (the mean minus 2 standard deviations) OR ABOVE the upper cutoff (the mean plus 2 standard deviations), we are now selecting
scores ABOVE the lower cutoff AND BELOW the upper cutoff.
The final comma, with nothing after it, means that we want every column of the data frame for the rows that meet our criteria. Thus this code will get us all the data associated with test scores that are within 2 standard deviations of the mean. If you did this properly and use str(data2), you should see that you have "'data.frame': 40 obs. of 32 variables:". If you don't see that, check your code and try again.
For the rest of this homework, when I say analyze some variables, you should analyze the variables from this new data frame.
Note for people attaching data frames: if you attach the new data frame, you may get a message that "objects are masked." This means that attaching the data frame has hidden the earlier versions of the named variables. So if you submit the homework, get something wrong on the first part, and want to try again, just make sure you attach the right data frame so you're working with the right information.
What's the covariance for the first and second tests?
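
As a sketch of the filtering step on made-up data (NOT the class data), the code below builds the trimmed data frame and then analyzes it without attaching it. The names toy, toy2, and keep are assumptions for illustration. Using with, or the $ operator, is one way to sidestep the masking issue described in the note, because each call names the data frame it reads from.

```r
# Hypothetical toy data (made-up values, not the class data).
toy <- data.frame(
  stud = 1:10,
  t1   = c(20, 55, 60, 58, 70, 66, 75, 71, 80, 79),
  t2   = c(10, 60, 62, 65, 68, 70, 72, 75, 78, 80)
)

# TRUE for rows whose t2 score lies within 2 standard deviations of the mean.
keep <- toy$t2 > (mean(toy$t2) - 2 * sd(toy$t2)) &
  toy$t2 < (mean(toy$t2) + 2 * sd(toy$t2))

# Row condition before the comma; nothing after it, so all columns are kept.
toy2 <- toy[keep, ]

# Check the number of rows and variables that remain.
str(toy2)

# Analyze the trimmed data without attach(): with() looks up t1 and t2 in toy2.
print(with(toy2, cov(t1, t2)))
print(with(toy2, cor(t1, t2)))
```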

12. What's the correlation for the first and second tests?
This correlation is also highly significant.

13. What can you conclude from this last correlation? Remember it was highly statistically significant.
Scoring higher on test 1 causes higher test 2 scores.
Scoring higher on test 1 is related to higher test 2 scores.
None of the above.

14. What's the covariance of the mean homework scores and the second test?

15. What's the correlation between the mean homework scores and the second test?
This correlation is also highly significant.

16. What's the covariance of the number of homework submissions and the second test?

17. What's the correlation between the number of homework submissions and the second test?
This correlation is significant.

18. What's the covariance of the number of completed homework assignments and the second test?

19. What's the correlation between the number of completed homework assignments and the second test?
This correlation is also highly significant.

20. What can you conclude from this last correlation? Remember it was highly statistically significant.
Completing more homework causes increases in test 2 scores.
Completing more homework is related to increases in test 2 scores.
None of the above.