Use of statistical and econometric software

This is one of three topics in Theme 2: Teaching with data online of the 2020 virtual symposium.

Students and employers increasingly see value in practical data-handling and coding skills. The material below lists popular software packages. In the short run you will most likely want to stick to the software you have already been using. There are also likely to be unresolved local issues around access to licensed software that would normally be available on campus.

The material below outlines some of the generic challenges we are facing when delivering coding training and a few tips on the use of R and Excel.

One major issue we are facing is how to replace face-to-face computer labs. Go to the Discussion Board on Piazza (Access Code: C19 if you are logging in for the first time) and let us know how you are thinking of replacing computer labs if we cannot run them on campus.

Which software?

Excel

Advantages:
- familiar interface
- everyone needs working knowledge
- typically free for students
- data are always visible

Disadvantages:
- only simple stats and econometrics is possible
- reproducibility
- lack of documentation of the work process

EViews

Advantages:
- menu-driven, hence sort of familiar interface
- cheap student version
- pre-programmed common procedures

Disadvantages:
- reproducibility
- lack of documentation of the work process
- free version with limitations

Stata

Advantages:
- common in academic and government use
- powerful in terms of types of analysis and size of datasets
- reproducibility
- large online support
- menu interface is available

Disadvantages:
- learning curve
- Student version (£80 annual)

R

Advantages:
- increasingly common in government and business use
- powerful in terms of types of analysis and size of datasets
- reproducibility
- large online support
- free
- R can do much more than econometrics, e.g. visualisations

Disadvantages:
- steep learning curve
- setup needs support
- have to deal with potentially different versions unless using RStudio Cloud

Do classes have to be live?

In traditional, face-to-face teaching of econometrics, computer lab classes are a very good opportunity for students to get hands-on experience of data analysis. In an in-person class, instructors can monitor students by looking at what appears on their screens and offer help if needed, and a student who thinks they have made an error or has got stuck can easily show the instructor their screen.

In an online environment this is harder to achieve, but there are still some opportunities. In live classes on platforms like Zoom or Collaborate, students can share their screen with the rest of the class, or you could use break-out rooms in which small numbers of students share their screens with each other.

Students can already make use of online resources: there is an online R community, and there are lots of online Stata resources (such as those at UCLA or Princeton). These can help students to analyse data, particularly when they are less time-constrained than they would be in a live class. Similarly, if you make use of a message board, students can collaborate to find solutions to data analysis problems.

There is no simple answer as to whether lab classes need to be live or could be held asynchronously. If you have any thoughts, or any questions that you want to address, you can post them on the Piazza message boards (Access Code: C19).

Designing computer lab classes

Running computer labs online will be a serious challenge for any econometrics course with a practical component. We will probably have to give students more detailed instructions on how to achieve what we want them to achieve with econometric software, because we are likely to rely more on students working through problems by themselves. The quick look over the shoulder in a computer lab is a tool we may no longer have available.

This also means that we will have to improve the coding resilience of our students: we will have to help them develop and practise the skills and tools required to overcome difficulties when coding. This is true whether you are asking your students to learn menu-driven econometric software (like EViews or SPSS) or code-based software (like Stata, R, Python or MATLAB).

The generic skills which are likely to help your students are:

  • Using the help function
  • Searching on the internet for help
  • Understanding error messages
  • Finding errors or debugging

It is therefore important not only to present your students with pristine, fully working code or instructions, but also to expose them to the difficulties and frustrations they are likely to encounter, and to the strategies above, which will eventually allow them to overcome these difficulties.
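As a rough illustration of what a skill-based exercise might look like, the sketch below plants a deliberate error and walks through reading the error message and fixing it. It is written in Python purely for illustration (the same idea carries over to R or Stata), and the data and the names `data`, `mean` and `average_mobility` are invented for this example rather than taken from the lab materials above.

```python
# Hypothetical lab data: the values and variable names are invented for illustration.
data = {
    "cases": [120, 135, 160, 158],
    "mobility": [-40, -35, -20, -22],
}


def mean(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Step 1: a deliberately broken call. The key is misspelt ("mobilty").
try:
    average_mobility = mean(data["mobilty"])
except KeyError as err:
    # Step 2: read the error message. It names the key that could not be found.
    error_message = f"KeyError: {err}"
    print(error_message)
    # Step 3: compare it with the keys that actually exist in the data.
    print("Available keys:", sorted(data))

# Step 4: the corrected call, once the typo has been spotted.
average_mobility = mean(data["mobility"])
print("Average mobility change:", average_mobility)
```

In a worksheet you would normally leave out the `try`/`except` wrapper and let students hit the error themselves; it is used here only so that the whole sketch runs from top to bottom.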

In this online video (YouTube, 11.29 min) Ralf Becker (University of Manchester) discusses how to include such elements in a computer lab. He uses R as an example, but the same principles apply to any other software.

The following material is used in this clip:

Datafiles: Mobility Data, Covid-19 policy and case data

Basic Computer Lab: Worksheet, Rmd code for worksheet

Skill-based Computer Lab: Worksheet, Rmd code for worksheet

You could also think of providing your students with a basic cheat sheet which has a section on the generic coding skills.

RStudio Cloud

In case you are using R as your econometric software, you may be wondering how to run computer labs online.

As R and its most commonly used front-end, RStudio, are free software, all students can download and install them on their own computers. But there is an issue. If you go down this route, you should expect to have to help many of your students through that process. And while installation is normally quite straightforward, you and your students still have to spend time on it before you have even added 2 + 2 in your software. In face-to-face classes, I am typically happy if, by the end of the first hour, all students have a datafile loaded into the software.

There is a solution to this problem. You and your students can use R and RStudio in the cloud. All your students need is a login from https://rstudio.cloud/, which is free. You can then all use R on the web.

Importantly, it means that you can ensure that all your students have access to exactly the same computing environment. You don’t have to worry about whether they have downloaded all required files: if you have made them available, they will be there. The same goes for packages: if you have made them available, they will be there. All of this means that you can start doing cool stuff right from the start.

There are only two downsides:

  1. For the time being the service is free, and by default you can only have 10 people in a space (something like a class). You can ask RStudio to give you more space, but in the medium run they may start charging for that service.
  2. Students will still have to learn all of the annoying stuff you avoid (installing R, downloading packages, etc.) for when they work by themselves. But if you use RStudio Cloud you can delay this pain until your students have understood the value of the coding skill. At that point it will be easier to get students to engage with that process.

So here are two places for you to start.

Introduction to RStudio Cloud by Mel Gregory from RStudio (YouTube, 24.05 min)

A cheat sheet for Teachers using RStudio Cloud.

Excel

Most students arrive at university having gained some experience of spreadsheet packages, such as Excel (although they may not have used them for some time).

Because of this familiarity, it can be useful to introduce OLS regressions to students through Excel, not least because Excel has built-in functionality for estimating OLS regressions and for carrying out simple statistical tests (such as paired t-tests) through the “Data Analysis” toolkit (the Analysis ToolPak add-in).

By default, the data analysis toolkit is deactivated in Excel and you need to activate it. In Windows versions of Excel, go to File > Options > Add-ins, choose to manage Excel Add-ins, and tick Analysis ToolPak; details are also available online. (Mac instructions are also available online.)

There are some significant advantages to using Excel for introductory econometrics:

  • Students can see the data, and get a real feel for what the variables they are using are.
  • For simple regressions with a single explanatory variable, you can ask students to use simple functions within Excel (such as =SUMPRODUCT) to construct the OLS estimates manually, and then instantly compare them with the regression results.
  • You can perform transformations to the variables, and students can instantly see the impact of the transformation on the variables, and then see the impact on the regression results.
  • Unlike Stata or R, students don’t need to learn any new commands to use the software; if they have any familiarity with Excel, it is usually quite easy for them to produce regression results (even if they make a few errors along the way).
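The manual construction of the OLS estimates mentioned above can be sketched outside Excel as well. The short Python example below mirrors what a student would build with spreadsheet formulas: a sum of cross-products of deviations from the mean (the job =SUMPRODUCT does in Excel), divided by the sum of squared deviations of x. The data are made up for illustration, and the variable names are assumptions of this sketch.

```python
# Made-up data for illustration: one explanatory variable x and one outcome y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n  # in Excel: =AVERAGE(x range)
y_bar = sum(y) / n  # in Excel: =AVERAGE(y range)

# Sum of cross-products of deviations, the role played by =SUMPRODUCT in Excel.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
# Sum of squared deviations of x.
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# OLS estimates for the model y = intercept + slope * x + error.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# With an intercept in the model, the OLS residuals sum to zero.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("slope:", round(slope, 4))
print("intercept:", round(intercept, 4))
print("sum of residuals:", round(sum(residuals), 10))
```

Students can then check these two numbers against the coefficients reported by the regression tool, which is the comparison the exercise is designed around.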

Excel can carry out simple OLS with multiple explanatory variables, but as a package for more advanced courses it is more limited:

  • You are largely limited to OLS, although that does mean that you can use Excel to estimate relationships with simple identification strategies, such as difference-in-differences and regression discontinuity designs. If you wish to use other estimation techniques (because, for instance, you have a limited dependent variable), then Excel is of much more limited use.
  • The interface, whilst familiar to students, is a little more limited; because data are not necessarily linked to a variable name, it is not easy (as it is in Stata, for instance) to call up regressions simply by naming the variables.
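To see why a difference-in-differences design needs nothing beyond OLS, consider the simplest case with two groups and two periods. The sketch below, in Python with invented numbers, builds the coefficients of a regression of the outcome on a treatment dummy, a post-period dummy and their interaction directly from the four group means, and then checks that they satisfy the OLS normal equations. The data layout and names are assumptions made for this illustration.

```python
from statistics import mean

# Invented outcomes, keyed by (treated, post) with 0/1 dummy values.
outcomes = {
    (0, 0): [10.0, 12.0, 11.0],  # control group, before
    (0, 1): [13.0, 14.0, 12.0],  # control group, after
    (1, 0): [9.0, 11.0, 10.0],   # treated group, before
    (1, 1): [15.0, 16.0, 14.0],  # treated group, after
}
group_mean = {group: mean(values) for group, values in outcomes.items()}

# Coefficients of y = intercept + b_treated*T + b_post*P + did*(T*P) + error,
# written in terms of the four group means.
intercept = group_mean[(0, 0)]
b_treated = group_mean[(1, 0)] - group_mean[(0, 0)]
b_post = group_mean[(0, 1)] - group_mean[(0, 0)]
did = (group_mean[(1, 1)] - group_mean[(1, 0)]) - (
    group_mean[(0, 1)] - group_mean[(0, 0)]
)

# Check the OLS normal equations: residuals must be orthogonal to each regressor.
rows = [(t, p, y) for (t, p), ys in outcomes.items() for y in ys]
residuals = [
    y - (intercept + b_treated * t + b_post * p + did * t * p) for t, p, y in rows
]
normal_equations = [
    sum(residuals),                                               # constant
    sum(e * t for e, (t, p, _) in zip(residuals, rows)),          # treated dummy
    sum(e * p for e, (t, p, _) in zip(residuals, rows)),          # post dummy
    sum(e * t * p for e, (t, p, _) in zip(residuals, rows)),      # interaction
]

print("difference-in-differences estimate:", did)
print("normal equations:", normal_equations)
```

In Excel the same estimate comes from adding the two dummy columns and their product as explanatory variables in the regression tool; the coefficient on the product column is the difference-in-differences estimate.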

Because of students’ familiarity with Excel, I find that this provides a very good introduction to estimation using OLS, and allows students to get hands-on experience of estimation, without large amounts of up-front investment in time.

EVIEWS

 
