3.6 Data issues

The other task that must be tackled at an early stage is data hunting. Students embarking on empirical work – probably for the first time – almost always have over-optimistic views of the data that are likely to be available. Perhaps a student has been to a course in development economics that has stressed the importance of human capital formation in stimulating improvements in agricultural productivity. An interesting project might be to examine the effect of primary schooling on agricultural productivity in rural Zanzibar. Or to examine the effect of overseas assistance on the provision of health care in Papua New Guinea. Panic then sets in when it transpires that, with only a few weeks remaining, there are no data to be found.

Again, this is partly a question of managing student expectations – and of getting students to hunt for their data as early as possible.

Of course, there is a time inconsistency problem here. We tell the students that they must look for data as soon as possible… but we also tell them that they should think about the underlying economics of their topic first, in order that they know what data they will require. Without this proviso, the danger of data-mining is high. Students told to look for data early may well see what they can find, run a few regressions and then see if they can find a theory that will match their results.

Plenty of data are readily available on the internet. This brings good and bad news. The good news is that students can readily obtain data on a wide range of economic topics. This expands the range of topics on which they can undertake empirical work, and they are aided and abetted in this by software that enables them to produce lots of results. The bad news is that the scope for doing foolish things and getting nonsense results is also much expanded. The ease of use of today’s software makes it very easy to produce results that go way beyond the competence and understanding of the students. Indeed, a key part of the supervisor’s role may be to rein in the over-enthusiastic student, to ensure that the work undertaken is appropriate for the topic being investigated and matches the reasonable ambition of the student, given their knowledge and understanding of statistical and/or econometric methodology. This reining in has to be done in a sensitive way, so as not to discourage or dishearten. It is a fine line to tread.

Top Tip

Provide web links to the most relevant data sources.

Providing web links to key recommended data sources is wise. This can be accomplished through a dedicated dissertation webpage or VLE. The links can then be tailored to the needs of a particular cohort of students. There is also a helpful section on the Economics Network website that provides links to freely available data.

One obvious situation in which over-ambition can be an issue is where a student has received no training in econometrics, but has heard of ‘regression’ and perceives that no dissertation is complete without it. There may be some bright students out there who can teach themselves regression along the way and produce sensible results. But for every one such student, there are likely to be countless others who will be unable to produce coherent results. For the econometrically untrained, more modest objectives need to be set for the analysis of empirical data. However, the collection of data, and the marshalling of evidence in support (or not) of an hypothesis, is a central part of research in economics. In some cases, students may sign up for an optional course in econometrics for which they are ill-prepared. This has a doubly damaging impact, as they may fail the module as well as finding themselves no better off for the research.

Another pitfall is where a student with some econometric training collects data and runs some regressions, but is unable to produce results that are consistent with any known economic theory. Panic then sets in. Can economic theory really be so wrong? It takes confidence for a novice researcher to look at a set of seemingly meaningless results with equanimity. It may then be for the supervisor to reassure, and to point out how many possible explanations there are for seemingly contradictory results. Perhaps the data do not measure what the model demands. Perhaps a more sophisticated econometric methodology is required. Perhaps there are omitted variables. And so on. The student researcher may then need to be persuaded that it is perfectly OK to present weak results, so long as some awareness is shown that the analysis has limitations, and that there are many possible reasons for the seeming contradictions.

It is important to remind students of the key objective of the dissertation – namely, to showcase what they have assimilated during their degree programme. If they can show competence in applying economic analysis and (perhaps) econometric techniques in a topic area of their choice, then they are on their way to a reasonable mark. They will not be submitting their dissertation to Econometrica.

‘The secret of happiness lay in limiting the aspirations.’ Thomas Hardy in The Woodlanders.