Economics is an empirical science, and results of empirical investigations are frequently cited and discussed in undergraduate-level courses in many fields. Labour economics is among the subjects where econometric research plays an especially important role - any modern labour text contains references to dozens if not hundreds of papers reporting the results of empirical studies. However, rarely (if ever) are students involved in doing empirical research on their own during their undergraduate studies by, e.g., replicating published investigations using the same or other, more recent data.
Often this lack of experience with "doing empirics" leads to a lack of "feeling" for the merits and limits of empirical research, hindering a deeper understanding of what is going on in the subject, and leaving students with little motivation to conduct their own econometric research during graduate studies. Many teachers, especially those involved in empirical research in their day-to-day work, agree with this, but point to the difficulties facing any course that has students perform their own statistical investigations: lack of experience with econometric theory and statistical software, lack of suitable data, lack of computer facilities, etc.
This paper reports on a one-semester course in PC-aided empirical labour market research using "real life" micro data to show that, thanks to the availability of low-cost PCs, easy-to-handle econometric software packages, and large-scale public use micro data, all of these obstacles can quite easily be overcome.
Students have the usual background in micro- and macroeconomics, maths and statistics. All students had an introductory course in computing, teaching them to use a PC (or Mac) for, e.g., word processing or working with a spreadsheet program; the degree of experience with computers, however, varies greatly among students. Furthermore, only a few students take an introductory course in econometrics.
The aim of the course is to familiarize students with doing empirical research on a PC using large scale micro data sets and econometric methods. Given the background described in the last section and the time constraint of 15 two-hour sessions, I proceed as follows:
In the first session students are introduced to the use of the PCs in the computer room - there are eight computers with an i486 processor, so three to four students have to share a PC. Furthermore, those who are not familiar with the DOS editor are introduced to TED, a small and easy-to-learn editor that comes with SHAZAM, the econometrics package used in the course. During this session, students create a file with a small sample data set consisting of eight variables (income, sex, age, education, etc.) for 30 people.
The second session introduces students to SHAZAM (WHITE 1993), a widely used econometrics package that is available for a number of platforms, and that can be used not only to perform descriptive statistics and econometric analyses, but also to solve linear systems of equations (e.g. input-output models) and microeconomic general equilibrium models (cf. WAGNER 1994). The University of Lueneburg bought a campus license for SHAZAM, so students may legally copy the program and use it on their own PCs at home.
To introduce students to SHAZAM I prepared and distributed an example job (with extensive comment lines in it) that uses the sample data set created in session one to illustrate the following facilities: creating an output file; reading data from a file and printing data on the screen and to a file; generating new variables; doing conditional transformations (if - then - else); skipping observations; computing descriptive statistics.
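The SHAZAM job itself is not reproduced here, but the facilities it exercises can be sketched in a rough modern equivalent. The following Python fragment - with purely hypothetical variable names and values, not the course's actual sample file - mirrors the same steps: generating a new variable, an if-then-else transformation, skipping observations, and descriptive statistics.

```python
import math
import statistics

# Hypothetical records in the spirit of the session-one sample file
# (the values are illustrative, not taken from the course data).
people = [
    {"income": 2800, "sex": 0, "age": 34, "educ": 12},
    {"income": 2100, "sex": 1, "age": 29, "educ": 10},
    {"income": 4200, "sex": 0, "age": 51, "educ": 16},
    {"income": 1500, "sex": 1, "age": 23, "educ": 9},
]

# Generate a new variable (cf. SHAZAM's GENR facility).
for p in people:
    p["log_income"] = math.log(p["income"])

# Conditional transformation (if - then - else): dummy for older workers.
for p in people:
    p["older"] = 1 if p["age"] >= 40 else 0

# Skip observations: keep only respondents with at least 10 years of education.
sample = [p for p in people if p["educ"] >= 10]

# Descriptive statistics on the remaining observations.
incomes = [p["income"] for p in sample]
print("n =", len(sample))
print("mean income =", statistics.mean(incomes))
print("std. dev. =", statistics.stdev(incomes))
```

In SHAZAM the same steps are single commands operating on whole variables; the point of the sketch is only the logical sequence of transformations that students walk through.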
Students are expected to have a copy of the SHAZAM handbook available, and they are asked to solve some problems using the sample data set, both during the session and afterwards.
The third session introduces the micro data set. I use data from the 1990 wave of the ALLBUS, a household survey performed every second year since 1980 on a representative sample of the adult German population. These data are public use data available at low cost, and they are widely used in social science research. Moreover, the data may be distributed to students for teaching purposes (unlike, e.g., the strictly confidential data from the German household panel). The data come as an SPSS-PC export file with a voluminous printed codebook of several hundred pages. I prepared a raw data file covering all 3051 persons and a selection of 154 variables. These data sets were given to the students together with a copy of the relevant parts of the codebook.
Students are asked to write a SHAZAM job that reads the data and performs descriptive statistics on a number of variables. Afterwards, the results of the computations are compared to the statistics printed in the codebook. In doing so, students become familiar with the data documentation and the various codes used for 'do not know', 'refused to answer', 'question not relevant for person', etc.
During the next five sessions students perform descriptive empirical analyses with the micro data on two topics: (1) How does labour supply vary by sex, age and family background (e.g. married or not, age of youngest child in the household)? (2) How does income vary by sex and schooling? These questions are directly related to parts of the course in "Labour Market Theory and Policy", and the empirical results generated with the micro data are compared with what is expected from theory, and with what is reported in published descriptive statistics. Moreover, students become increasingly familiar with the data set, and with using SHAZAM to transform the given raw data into the variables needed for the statistical analysis.
At the end of this part of the course it should be obvious to students that descriptive analysis can only be part of empirical research, because we need a way that allows us to test theoretical hypotheses, and to perform analysis ceteris paribus. This motivates an introduction to multiple linear regression analysis. Given that all students had courses in statistics (including some material on the simple linear regression model) but only a few took an introductory econometrics course, I allocate two sessions to some kind of "child's guide to the multiple linear regression model" based on chapters 5 - 10 of GRIFFITHS, HILL and JUDGE (1993). Topics include: the simple and multiple linear regression model; estimation of parameters and significance testing; goodness of fit; functional form; multicollinearity and the dummy variable trap; consequences of the exclusion of relevant variables vs. inclusion of irrelevant variables.
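As a bridge from the statistics course to these sessions, the mechanics of OLS in the simple model can be shown in a few lines of computation. The sketch below (in Python rather than the course software, with made-up toy numbers) computes slope, intercept and R-squared from the textbook formulas for a regression of log earnings on years of schooling.

```python
# Toy data, purely illustrative: years of schooling (x), log earnings (y).
x = [9, 10, 12, 12, 16, 18]
y = [7.2, 7.3, 7.5, 7.6, 7.9, 8.1]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# OLS estimates for y = a + b*x + e from the textbook formulas:
# slope b = S_xy / S_xx, intercept a = ybar - b * xbar.
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
s_xx = sum((xi - xbar) ** 2 for xi in x)
b = s_xy / s_xx
a = ybar - b * xbar

# Goodness of fit: R-squared = 1 - SSR / SST.
fitted = [a + b * xi for xi in x]
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

print(f"slope {b:.3f}, intercept {a:.3f}, R2 {r2:.3f}")
```

Working through such a hand computation once makes the regression output of the software package much less of a black box; the multiple regression case is then presented via the package rather than by hand.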
Students are now in a position to estimate earnings functions using the ALLBUS data and SHAZAM. During the next four sessions we start with a simple schooling function (explaining differentials in the log of earnings by different years of schooling) familiar to students from the labour market course. The model is estimated using data for full-time blue and white collar workers. We interpret the results produced by SHAZAM and compare them to the published results.
Next the model is augmented by including an experience variable plus its squared value (i.e., a MINCER function), and we discuss the consequences of including another relevant variable, and the inverted u-shaped experience-earnings profile familiar from the textbook. Afterwards, several variants of a "MINCER-plus type" model can be estimated that include further variables, e.g., hours worked, family status, number of children in the household, working in the public sector, a dummy for white collar workers, information on past unemployment periods, etc.
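In textbook notation, the MINCER function estimated at this stage has the standard form (with S years of schooling and X years of labour market experience):

```latex
\ln W_i = \beta_0 + \beta_1 S_i + \beta_2 X_i + \beta_3 X_i^2 + \varepsilon_i ,
\qquad \beta_2 > 0 ,\ \beta_3 < 0 .
```

The signs of the experience terms generate the inverted u-shaped experience-earnings profile, which peaks where $\partial \ln W / \partial X = \beta_2 + 2\beta_3 X = 0$, i.e. at $X^{*} = -\beta_2 / (2\beta_3)$.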
Results may be compared, and related to theory and to results from other studies. We end up with a "preferred specification" and use the results from this model to compute expected incomes for various types of workers, to see how much it pays (or costs) to have one more year of schooling, to get married, to work in the public sector, etc.
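The last step - computing expected incomes from the fitted model - is simple arithmetic on the estimated coefficients. The sketch below uses purely hypothetical coefficient values (not estimates from the ALLBUS data) to show the mechanics for a log-linear specification.

```python
import math

# Hypothetical coefficients of a fitted "MINCER-plus type" log-earnings
# equation; the values are for illustration only.
b = {"const": 6.5, "school": 0.07, "exper": 0.03, "exper2": -0.0005,
     "married": 0.08, "public": 0.05}

def expected_income(school, exper, married, public):
    """Expected earnings implied by the fitted log-linear model."""
    log_w = (b["const"] + b["school"] * school + b["exper"] * exper
             + b["exper2"] * exper ** 2 + b["married"] * married
             + b["public"] * public)
    return math.exp(log_w)

base = expected_income(school=12, exper=10, married=0, public=0)
plus_year = expected_income(school=13, exper=10, married=0, public=0)

# In a log-linear model, one more year of schooling multiplies expected
# earnings by exp(b_school), i.e. raises them by roughly 7 percent here.
print(plus_year / base)  # = exp(0.07), about 1.07
```

Comparing such ratios for married vs. unmarried workers, or public vs. private sector, gives students a direct monetary interpretation of each coefficient.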
The last session is a look back and a look ahead, i.e., a summary of what has been done in the course, and a sketch of what else could be done in a longer course - industry and regional wage differentials, compensating wage differentials for unpleasant and hazardous jobs, male/female wage differentials, problems of testing for the correct model specification, problems related to extreme observations and robust estimators, etc. Here the central aim is to warn students against starting empirical work on their own right away without first taking an econometrics course, and to motivate them to take one.
My experience with this course is quite positive: many students get some kind of "feeling" for how empirical research is done, how fragile the results often are, and for why published results should be interpreted carefully unless they have been replicated with various data sets and models. Some of them, hopefully, are even motivated to eventually do their own empirical research as part of their diploma thesis. Therefore, I recommend that colleagues try a similar course: given today's cheap and powerful PCs, easily available real-world public use micro data sets, and easy-to-handle econometrics software, empirical economics should no longer consist solely of reading what others did - it should be learning by doing, too.