# Optional Projects for an Introductory Statistics Course

**Nicholas Myers** and **Angus Phimister**, University of Edinburgh

Nick.Myers@ed.ac.uk and angusdav@outlook.com

Published July 2016

Economics students at the University of Edinburgh are required to take an introductory statistics course in the first semester of their second year. As with many introductory statistics courses, significant heterogeneity in students' prerequisite knowledge and engagement complicates course design. It is not uncommon for lecturers to see contradictory complaints on course questionnaires, with some students saying that a course is too easy and others that it is too hard.

Balancing the needs of more and less able (or motivated) students can be exceptionally difficult. Whilst it is important to challenge students at every stage of their academic careers, there are negative longer-term consequences if a significant proportion of students do not have a sound understanding of the concepts upon which later courses are based.

To address this problem, we have introduced a number of extra-credit projects that students complete on an optional basis, targeted at those students who do not feel sufficiently challenged by the core course material. These projects allow the most motivated students to engage further with the subject, without over-complicating the core course material.

We offer a total of three optional projects, which are due at various points in the semester. These projects explore the topics covered in lectures and tutorials in more detail, generally by asking students to apply the concepts they have been learning in class to data (either real or generated). Students are awarded a one-mark bonus to their overall grade for each project that is completed to a sufficient standard. If all three projects are completed, an additional one-mark bonus is given, for a total of four possible bonus marks. Of nearly 260 students in 2015, approximately 13% completed all three projects, 4% completed two projects, and 5% completed just one project. As intended, these projects were completed by students who performed above average on other components of the course.

Each project can be completed using the Analysis ToolPak in Excel. Excel was recommended here because most students enter the course with a basic understanding of the software and thus minimal support is needed. Statistical software is formally introduced in subsequent econometrics courses. Parallel instructions were provided to students who were using StatPlus for Mac. Instructional documentation (consisting primarily of links to other sources) was provided alongside each project. A summary version of each assignment is provided below.

On the whole these projects were well received by students, and are a part of the course structure that we are likely to keep moving forward. Other than a small number of cases where students misunderstood the instructions, we did not encounter many problems.

#### Project 1 - Data Presentation

- Using data from the World Development Indicators, choose at least 15 countries and one series/year of interest. For example, you might select countries that use the euro and “internet users per 100 people” in 2005. List the countries, series, and the year that you have chosen.
- Report the mean, median, the first quartile, the third quartile, and the standard deviation for the series you have chosen.
- Include a histogram for the series you have chosen. Describe the shape of the distribution (e.g. is it right skewed?). Note that you will need to play around to find a bin size that works best with your data.
- Include one scatterplot where the chosen series is on the vertical axis and GDP per capita in natural logs is on the horizontal axis. Describe the relationship. If there is a discernible relationship, briefly speculate why it exists. If there is no discernible relationship, state whether you think there should be one.
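
The summary statistics and histogram bins asked for in Project 1 can also be sketched outside Excel. The following is a minimal illustration in Python, not the method used on the course. The values in `series` are made up for illustration and are not World Development Indicators data, and the helper names `summarize` and `histogram_counts` are hypothetical. The `"inclusive"` quantile method interpolates between the sorted values, which we believe corresponds to Excel's `QUARTILE.INC`, but that correspondence is an assumption worth checking against your own output.

```python
import statistics

# Hypothetical values for illustration only (NOT real World Development Indicators data).
series = [12.0, 18.5, 22.1, 25.0, 27.3, 30.8, 34.2, 35.9,
          41.0, 44.6, 47.5, 52.3, 58.8, 63.1, 79.4]


def summarize(values):
    """Return the mean, median, quartiles and sample standard deviation."""
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "std_dev": statistics.stdev(values),  # sample (n - 1) standard deviation
    }


def histogram_counts(values, bin_width, start=0.0):
    """Count observations in bins [start + k*width, start + (k+1)*width)."""
    counts = {}
    for v in values:
        lower = start + bin_width * int((v - start) // bin_width)
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


summary = summarize(series)
counts = histogram_counts(series, bin_width=20)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
for lower, count in counts.items():
    print(f"[{lower:.0f}, {lower + 20:.0f}): {'*' * count}")
```

Changing `bin_width` shows how the apparent shape of the distribution depends on the bin size. A mean above the median, as in this made-up series, is one informal hint of right skew.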

#### Project 2 - The Central Limit Theorem

- Generate 200 random samples each with n observations. Generate this data from a uniform probability distribution with an upper bound of 1 and a lower bound of 0. Calculate the mean for each of the 200 samples. Use a histogram to plot the distribution of sample means for n=1, n=5, and n=30. That is, you will have three distributions of sample means.
- Calculate the mean and standard deviation for each of the three distributions.
- Briefly comment on your findings.
- For no extra marks, you can repeat the three steps above using a different parent distribution.
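
For readers who want to check their Excel results, the simulation in Project 2 can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the assignment: the function name `sample_means` and the fixed seed are assumptions made here for reproducibility. The theoretical standard deviation of the sample mean, $\sqrt{1/12}/\sqrt{n}$, follows from the variance of a uniform distribution on 0 to 1 being $1/12$.

```python
import math
import random
import statistics

NUM_SAMPLES = 200  # number of samples, as specified in the project


def sample_means(n, num_samples=NUM_SAMPLES, seed=0):
    """Return the means of num_samples samples, each of n Uniform(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are reproducible
    return [
        statistics.mean([rng.random() for _ in range(n)])
        for _ in range(num_samples)
    ]


results = {}
for n in (1, 5, 30):
    means = sample_means(n)
    results[n] = (statistics.mean(means), statistics.stdev(means))
    theory_sd = math.sqrt(1 / 12) / math.sqrt(n)
    print(
        f"n={n:>2}: mean of sample means={results[n][0]:.3f}, "
        f"sd={results[n][1]:.3f}, theoretical sd={theory_sd:.3f}"
    )
```

The mean of the sample means should stay close to 0.5 for every n, while their standard deviation shrinks roughly in proportion to $1/\sqrt{n}$. Plotting each list of means as a histogram shows the distribution becoming more bell-shaped as n grows.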

#### Project 3 - Regression

The original datasets for this assignment were downloaded from the Journal of Statistics Education data archive. Students have a choice between the “Woodard” dataset (predicting house prices) and the “Kuiper” dataset (predicting used car prices). Before releasing the datasets to students, some predictor variables were deleted to make the project more straightforward.

- Choose one of the two datasets available on the course website. Run a regression with all possible predictor variables included in the model. Interpret one of the parameter estimates in the context of the data and test the hypothesis that the associated population parameter is equal to zero. Provide regression output in your answer.
- Calculate the residual for one observation. Interpret the residual in the context of the data.
- Eliminate one of the predictor variables and provide the new regression output. Do the parameter estimates change? Does the R-squared or the adjusted R-squared change? Briefly describe (in general) why these changes might occur.
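
The mechanics behind the last two steps (residuals, and how R-squared and adjusted R-squared respond when a predictor is dropped) can be sketched as follows. This is a minimal least-squares illustration in plain Python, not the Excel workflow used on the course. The data are made up and are not taken from the Woodard or Kuiper datasets, the variable names (`size`, `age`, `price`) and the helpers `solve` and `ols` are hypothetical, and the hypothesis test on a coefficient is not covered here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(predictors, y):
    """Fit y on the predictors (an intercept is added) via the normal equations."""
    x = [[1.0] + list(row) for row in predictors]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]  # observed minus fitted
    y_bar = sum(y) / len(y)
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    n, p = len(y), k - 1  # p = number of predictors, excluding the intercept
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return beta, residuals, r2, adj_r2


# Hypothetical data for illustration only: (size, age) and price.
rows = [(14, 30), (16, 25), (18, 20), (20, 28), (22, 10), (24, 15), (26, 5), (28, 12)]
price = [195, 228, 250, 262, 310, 318, 355, 360]

beta_full, resid_full, r2_full, adj_full = ols(rows, price)
beta_red, resid_red, r2_red, adj_red = ols([(size,) for size, _ in rows], price)

print("full model coefficients:", [round(b, 3) for b in beta_full])
print("residual for first observation:", round(resid_full[0], 3))
print(f"full:    R2={r2_full:.4f}, adjusted R2={adj_full:.4f}")
print(f"reduced: R2={r2_red:.4f}, adjusted R2={adj_red:.4f}")
```

R-squared can never rise when a predictor is removed from a model with an intercept, whereas adjusted R-squared can move in either direction because it penalizes the number of predictors. The remaining coefficient estimates generally change when the dropped predictor is correlated with those that stay in the model.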

### References

Kuiper, Shonda. "Introduction to Multiple Regression: How Much Is Your Car Worth?" *Journal of Statistics Education* 16.3 (2008).

Woodard, Roger, and Jason Leone. "A Random Sample of Wake County, North Carolina Residential Real Estate Plots." *Journal of Statistics Education* 16.3 (2008).

World Development Indicators, The World Bank
