2.2 Summary of the structural issues
In a widely influential paper, Walstad (2001) summarises several major factors that affect undergraduate assessment in the discipline. These factors are adopted here to structure the general issues raised by assessment. The following concerns are particularly pivotal: test selection; written versus oral assignments; grade evaluation; opportunities for self-assessment and feedback; testing for higher-order thinking; and the psychology of the economics student.
2.2.1 Test selection
A cursory survey of standard assessment methods would find the following forms: essays; short answers; numerical problems; multiple-choice; and true-false. Numerous criteria can then be used to determine the preferred option. Although individual assessors will weight these criteria differently, the issues presented in Figure 1 should be considered carefully when designing the most appropriate assessment methods. This is particularly important because specific criteria may conflict: for example, ease of marking may pull against the depth of understanding an assessment can test.
Figure 1: Choosing a type of assessment
To introduce these key issues, we compare the advantages and disadvantages of two common assessment strategies: the multiple-choice examination and the essay-based examination. This choice does not reflect a pedagogical preference for multiple-choice examinations over alternatives such as short problem-solving questions, which, as Walstad (2001) ably discusses, can provide an excellent means of assessing a student’s understanding of economic analysis. Instead, it reflects our Level 1 Principles of Economics case study, in which we assess how shifting from a common framework combining multiple-choice and essay assessment to a greater focus on continuous assessment affects student achievement.
a. Multiple-choice assessment
Many lecturers will arguably be particularly motivated by the ease of constructing this form of assessment. The economy of scoring merely reinforces the convenience of multiple-choice testing, especially as ‘Principles of Economics’ modules tend to be relatively large. In addition to these two persuasive advantages, multiple-choice assessment removes the possibility of subjective grading, so students are not left questioning the rationale behind their final mark. This simple means of reducing student grievance is a benefit that should not be underestimated; however, it is not a justification for overuse. Unfortunately, the cumulative expediency this form of assessment offers has led to a staple and unquestioning reliance on multiple-choice testing in situations where other forms of evaluation may be equally, if not more, appropriate.
Because judging when its use is appropriate is far from straightforward, two issues associated with multiple-choice assessment are considered in more detail below: the frequency of assessment and the use of negative marking.
i. Frequency of assessment
Misuse (overuse and inappropriate use) has exposed multiple-choice assessment to criticism in a variety of forms, the most common being that it is a crude instrument. This suggests that it should be neither the sole nor the primary means of evaluating student knowledge. Moreover, when it is used, its educational value should be maximised through meticulous design. A traditional annual multiple-choice exam gives the instructor flexibility to cover a broad range of lecture material but provides limited direct feedback. It also encourages the kind of memory-driven learning that can hinder a more flexible and creative response to the material. However, an innovation by Kelley (1973) offers a system of frequent multiple-choice testing designed to give detailed feedback in a large lecture setting. This approach has reinvigorated multiple-choice testing, which has become particularly attractive as advances in technology, notably clickers, create an interactive environment in which student responses can be viewed immediately. Many leading voices consider instantaneous feedback, and the chance for students to respond to and learn from their mistakes, crucial to how students perceive their own learning experience. Light (1992), for example, reports on the types of courses students appreciate (in his terminology, ‘respect’) and from which they feel they learn most. Because each clicker carries a unique number, this revived assessment method can also serve more practical departmental needs such as attendance monitoring.
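The two uses of clicker data mentioned above, immediate feedback and attendance monitoring, can be illustrated with a minimal Python sketch. It assumes a hypothetical response log of (clicker ID, question, chosen option) records and a hypothetical answer key. It does not use any clicker vendor’s software or export format, and it assumes that each clicker is registered to one student.

```python
from collections import defaultdict

# Hypothetical log from one lecture: (clicker_id, question, chosen option).
responses = [
    ("C101", "Q1", "B"), ("C102", "Q1", "B"), ("C103", "Q1", "D"),
    ("C101", "Q2", "A"), ("C102", "Q2", "C"), ("C103", "Q2", "B"),
]
answer_key = {"Q1": "B", "Q2": "C"}  # hypothetical correct options


def share_correct(responses, answer_key):
    """Return, for each question, the fraction of responses matching the key."""
    attempts = defaultdict(int)
    correct = defaultdict(int)
    for _clicker, question, option in responses:
        attempts[question] += 1
        if option == answer_key[question]:
            correct[question] += 1
    return {q: correct[q] / attempts[q] for q in attempts}


def attendance(responses):
    """Return the set of clicker IDs that answered at least one question."""
    return {clicker for clicker, _question, _option in responses}


for question, share in share_correct(responses, answer_key).items():
    print(f"{question}: {share:.0%} correct")
print("Present:", sorted(attendance(responses)))
```

In practice the per-question shares would be shown during the lecture, so that a question answered correctly by only a third of the class can be revisited straight away.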
ii. Negative marking
Multiple-choice testing is, of course, notoriously exposed to guessing. Because even a blind guess has a positive chance of being correct (one in four on a four-option question), students can earn marks despite possessing weak knowledge of the subject material. ‘Negative marking’, a system that deducts marks for incorrect answers in order to deter guessing, is one response to this inherent problem. However, such apparent solutions create problems of their own. Consider, for example, this comment obtained by the authors in a survey of student opinion:
‘I do not consider negative marking to be fair, with regard to essays that are positively marked… students may not give an answer to a question they are reasonably sure of the answer because they are afraid of getting it wrong and losing marks they have already gained…All the students I have spoken to do not like negative marking as it means the time is a greater constraint as it puts extra pressure on each answer… and mistakes are a greater threat.’
This psychological effect of negative marking, which can impinge upon and inhibit the instinctive thought processes of certain student groups, is more difficult to counteract. For example, groups who are on average more risk averse, and therefore less likely to answer questions they consider difficult, may receive systematically (and often unfairly) lower marks, which raises potential issues of discrimination.
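The arithmetic behind guessing and negative marking can be made concrete with a short Python sketch. It assumes a penalty of 1/(k-1) marks for a wrong answer on a k-option item (a common ‘formula scoring’ convention, not necessarily the scheme used in our modules), +1 for a correct answer and 0 for an omission. The ‘half sure’ probability and the ten-item paper are hypothetical. Under these assumptions, a blind guess is worth nothing on average, but a partially informed answer still has a positive expected value, so a student who omits such items out of risk aversion forgoes marks.

```python
from fractions import Fraction


def expected_item_score(p_correct: Fraction, penalty: Fraction) -> Fraction:
    """Expected mark from attempting one item: +1 if right, -penalty if wrong.

    Omitting an item always scores 0, so attempting it pays on average
    whenever this value is positive.
    """
    return p_correct - (1 - p_correct) * penalty


N_OPTIONS = 4
# Assumed 'formula scoring' penalty of 1/(k-1) for k options, which makes
# a completely blind guess worth zero on average.
PENALTY = Fraction(1, N_OPTIONS - 1)

blind_guess = Fraction(1, N_OPTIONS)  # no knowledge: one chance in four
half_sure = Fraction(1, 2)            # hypothetical: two options ruled out

print(expected_item_score(blind_guess, Fraction(0)))  # no negative marking: 1/4
print(expected_item_score(blind_guess, PENALTY))      # negative marking: 0
print(expected_item_score(half_sure, PENALTY))        # partial knowledge: 1/3

# Hypothetical ten-item paper on which two students are both 'half sure'.
ITEMS = 10
risk_neutral = ITEMS * expected_item_score(half_sure, PENALTY)  # attempts every item
risk_averse = Fraction(0)                                        # omits every item
print(risk_neutral, risk_averse)  # 10/3 0
```

Exact fractions are used so that the zero expected value of a blind guess is not obscured by floating-point rounding.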
b. Essay-based assessment
The opportunities for guessing discussed above also arise in a different form. More confident students may be tempted to ‘bluff’ knowledge, a strategy that is particularly effective in short answers. Unsurprisingly, therefore, lecturers tend to counteract this strategy by setting more involved essay questions that separate the genuinely studious from the opportunist. Whilst short answers and multiple-choice questions can be carefully designed to test both comprehension and analytical skills, essays are widely viewed as a more versatile and accurate method of measuring higher levels of cognitive learning. In terms of Bloom et al.’s six levels of learning, the multiple-choice exam focuses on knowledge and comprehension, although it can also be carefully designed to test application, analysis and evaluation in part. In contrast, Walstad (2006) summarises how essay questions can successfully cover all levels of learning:
‘An essay question challenges students to select, organize, and integrate economics material to construct a response—all features of synthesis. An essay question is also better for testing complex achievement related to the application of concepts, analysis of problems, or evaluation of decisions. This demonstration of complex achievement and synthesis is said to be of such importance as a learning objective that it is used to justify the extra time and energy required by the instructor for grading essay tests.’
There are, however, numerous pitfalls that should be considered before blindly accepting the essay as the ultimate testing method. Other than the additional pressures on staff time, these include:
i. Unreliability of grading
Essay questions do not necessarily enable students to demonstrate their genuine level of achievement or to express what they know. Consider, for example, the question ‘How does the monopoly union model compare with the other models of union activity?’ This question is ambiguous in two respects. First, it is unclear how many models the student is expected to consider. Second, the wording does not convey the economic criteria that should be used in any comparison. The belief that a good student will accurately interpret an imprecise or ambiguous question is a fallacy, as excellent students are as prone as any to fall into the (mis)interpretation trap. In fact, it is fairer to assume that all students, regardless of ability, are disadvantaged by the gap between what the examiner expects yet so often fails to articulate and what students select as relevant under pressure.
The response to ambiguity, however, should not necessarily be to make the vocabulary of examination questions overly precise. A question such as ‘Does the monopoly union model or the XXX model better describe the UK car industry in the 1980s?’ avoids ambiguity but arguably reduces the task to rote learning, leaving insufficient scope to test higher-order skills or independent reading. Instead, pre-examination guidance becomes crucial. Students should appreciate that there is no unique way to rank the relevance of particular economic models. Those who have shown more initiative in their independent reading will then have more to discuss, and therefore greater means to demonstrate in-depth knowledge and their ability to meet the module’s learning outcomes.
Providing detailed feedback on essays is inevitably time-consuming. There is no escape from this fact, and perhaps there should not be. Indeed, most available solutions to the perennial problem of time are inadequate. For instance, team-based marking, introduced to reduce the time demands on individual markers, can substantially undermine the reliability of the resulting scores.
iii. Coverage of content
As noted earlier, a multiple-choice test can cover a much wider range of taught material. In contrast, essay-based assessment necessarily encourages uneven coverage of content. Because essays are generally set in examination periods, this can inadvertently reward a different kind of success by chance, in which fortunate students correctly guess which subset of module content to revise. This is particularly likely when students are under pressure, facing several examinations in a short period of time. In these circumstances, time constraints can turn anticipating examination content into a game of chance, and inaccurate prediction of revision topics can sharply reduce module marks. The structure of degree schemes clearly matters here, and schemes that cluster assessment across a range of modules cause particular problems.
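The element of chance in topic spotting can be quantified with a hypergeometric calculation. The Python sketch below assumes, purely for illustration, a module of ten topics, an exam paper with four questions on four distinct topics chosen at random, and a requirement to answer two. These figures are hypothetical rather than drawn from our case study.

```python
from fractions import Fraction
from math import comb


def prob_enough_revised(n_topics, n_revised, n_on_paper, n_required):
    """Probability that at least n_required of the paper's topics were revised.

    Assumes the n_on_paper topics are distinct and chosen uniformly at random
    from n_topics, so the number of revised topics that appear on the paper
    follows a hypergeometric distribution.
    """
    total = comb(n_topics, n_on_paper)
    favourable = sum(
        comb(n_revised, k) * comb(n_topics - n_revised, n_on_paper - k)
        for k in range(n_required, min(n_revised, n_on_paper) + 1)
    )
    return Fraction(favourable, total)


# Hypothetical module: 10 topics, 4 questions on distinct topics, answer any 2.
for revised in (2, 4, 6):
    p = prob_enough_revised(n_topics=10, n_revised=revised, n_on_paper=4, n_required=2)
    print(f"revise {revised} topics: P(at least 2 answerable) = {float(p):.3f}")
```

Under these assumptions, a student who revises four of the ten topics has roughly a 55% chance of finding two answerable questions, while one who revises only two has about a 13% chance. Outcomes therefore depend heavily on luck as well as on effort.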
The basic conclusion from this brief description is that no single assessment method is ideal in every respect in all circumstances. The available research does not show that any one assessment method is superior in the teaching of economics. Each has advantages and disadvantages, and a combination of assessment techniques is therefore recommended to ensure a system that at least approaches fairness. The issue of assessment thus turns on variation at programme level. Whilst the extensive use of multiple-choice testing in principles of economics courses is understandable, other modules must be flexible and explore the appropriateness of alternative methods. Only through such variation in assessment practices will students be able to maximise their performance and fully embrace their learning experience. Aspiring to such a system is imperative because, as has been alluded to previously, poor assessment performance directly contributes to correspondingly poor evaluations of the instructor, institution and discipline.
2.2.2 Written versus oral assessment
For certain students, essays can hamper rather than assist self-expression. In response, lecturers should consider other methods that give students more flexibility in how they express their views and critique economic orthodoxy. For example, it is now standard for economics programmes to offer dissertation modules. Such modules allow students to write at greater depth, with critique as the core objective and enough time to develop a sophisticated written response to the material. Of arguably equal significance is how assessment interacts with learning support material. The ‘problem set’, for example, is an alternative assessment method regularly employed in economics teaching. Its main advantage is the clear direction it gives the student, since such assessment is typically tied to a textbook. The dissertation module, by contrast, lets the student leave this comfortable environment and strike out alone into research: the textbook is rightly sidelined, and the student can embrace a more eclectic and wide-ranging set of economic sources.
To demonstrate a skill set in economics, writing proficiency is clearly vital. However, mastery of economics also encompasses proficiency in speaking: ‘speaking ability may be more useful for students because they are more likely to have to speak about economic issues than write about them’ (Walstad, 2001). A similar argument appears in Smith’s (1998) discussion of ‘learning by speaking’ in the teaching of statistics. A wide range of approaches is available to meet this need. Provided class size is not an issue, case studies can be used to promote the oral exploration of economic ideas. Group presentations are an alternative, although they are often unpopular with students because of fears of free-riding. They become more straightforward in a business school environment, where economics students can draw on a multidisciplinary skill set and be ably supported by business students focused on more specific disciplines such as accountancy, marketing and entrepreneurship. Group presentations also reduce the lecturer’s marking burden. However, careful thought is advised. Siegfried (1998), for example, in a review of US provision in the 1990s, concluded that ‘the amount and type of student writing assignments and oral presentations in many programs not only fail to prepare students for the demands they will encounter after graduation, but they also limit the ability of students to demonstrate their mastery of economics while still in college’ (p. 67).
It could be argued that the potential of oral assessment has been reinvigorated by the introduction of interactive engagement. Examples include more spontaneous activities such as ‘think-pair-share’, in which the instructor poses a discussion point that students first consider individually, then discuss in pairs and finally share with the class, prompting immediate discussion and critical reasoning. Interactive engagement is arguably more effective than passive instruction at generating fundamental conceptual understanding. It also develops valuable quick-thinking skills for life beyond academia.
2.2.3 Grade evaluation
Once the type of assessment has been determined, perhaps the most time-consuming task for the economics instructor is setting the marking criteria. Nor should this be treated as a one-way decision: the criteria and the assessment format should follow from the same learning objective. If the aim is to reward the ability to describe complex ideas in a short period of time, the marking criteria will reflect this and an in-class test is the natural vehicle. In contrast, if the objective is to enable a more flexible approach to a wider range of topics, then reports or presentations may be more apposite.
Perhaps of greater importance is how grading can capture the full range of a student’s economic proficiencies, an issue particularly pertinent to maximising employability. One response, championed by Walstad (2001), is portfolio assessment. This involves a representative collection of work that more comprehensively displays a student’s progress in learning and achievement. Such a compilation naturally lends itself to greater variation in assessment methods. For example, reflective learning exercises can be employed regularly, as described in the ‘Topics in Contemporary Economics’ case study below. The timing of assessment should also be considered: any evidence of exit velocity, that is, improvement between beginning-of-course and end-of-course marks, provides a further means of demonstrating growth in the student’s ability.
2.2.4 Self-assessment and feedback
It is common within the higher education sector to maintain a rigid separation between the formative and summative elements of assessment. Given this chapter’s focus on summative assessment, a clear purpose of assessment is the grading of students and therefore the justification of degree classifications at graduation. However, assessment practices should not be limited to this core aim:
a. Frequency (again)
Frequent assessment gives students greater opportunity to assess their own progress. As mentioned earlier, this is offered by the seminar systems typically adopted in principles of economics courses. Regular short multiple-choice tests, rather than one large exam in the main examination period, allow students to monitor their development and adapt accordingly. This is neatly demonstrated by the evolution of undergraduate provision at Swansea University, as summarised in Case Study 1.
Frequent assessment also provides invaluable feedback to instructors, who can respond immediately to the needs it exposes by adapting their lecture schedule to improve the effectiveness of their teaching. There are, of course, more involved ways to provide these opportunities. Walstad (2001), for example, refers to students being asked to keep a journal on current economic events. This can open a continuous dialogue between staff and students that also provides fertile ground for more flexible assessment. ‘Reflective diaries’ can further enhance this dialogue and offer more innovative ways to assess student progress.
b. Scoring key
Assessment must be seen as more than testing acquired knowledge and reducing it to a numerical grade. The instructor should therefore ensure that feedback mechanisms are integral to assessment and carefully constructed. A feedback sheet that combines a scoring key with detailed comments makes students more likely to find the feedback valuable, while also showing them exactly how their mark was derived. It is also imperative that the feedback sheet be discussed before any assessment deadline, so that students are less likely to fall foul of common pitfalls. Appreciating what is expected is itself part of the learning process, since riddling assessment with snares does not necessarily identify the most able. Discussing the feedback sheet in advance also helps instructors avoid grading bias, alerts them to possible ambiguities in, or misinterpretations of, the question, and provides a simple and direct means of justifying differences between grade bands.
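A feedback sheet that combines a scoring key with comments can be represented very simply, and doing so makes the derivation of the mark explicit. The Python sketch below is illustrative only. The criteria, their weights, the five-band scale and the descriptors are hypothetical, and in practice they would be the ones discussed with students before the deadline.

```python
from dataclasses import dataclass

MAX_BAND = 4  # assumed five-band scale: 0 (absent) to 4 (excellent)
DESCRIPTORS = ("absent", "weak", "adequate", "good", "excellent")  # hypothetical


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: int  # marks available out of 100 (hypothetical)


@dataclass(frozen=True)
class Judgement:
    criterion: Criterion
    band: int     # 0..MAX_BAND, taken from the scoring key
    comment: str  # marker's specific comment on this script


def build_feedback(judgements):
    """Return (mark out of 100, feedback text) with each mark traced to the key."""
    if sum(j.criterion.weight for j in judgements) != 100:
        raise ValueError("criterion weights must sum to 100")
    total = 0.0
    lines = []
    for j in judgements:
        if not 0 <= j.band <= MAX_BAND:
            raise ValueError(f"band out of range for {j.criterion.name}")
        marks = j.criterion.weight * j.band / MAX_BAND
        total += marks
        lines.append(
            f"{j.criterion.name}: {DESCRIPTORS[j.band]} (band {j.band}), "
            f"{marks:g}/{j.criterion.weight}. {j.comment}"
        )
    lines.append(f"Total: {total:g}/100")
    return total, "\n".join(lines)


theory = Criterion("Use of economic theory", 40)
evidence = Criterion("Application to evidence", 40)
structure = Criterion("Structure and argument", 20)

mark, sheet = build_feedback([
    Judgement(theory, 3, "Model set out clearly; assumptions could be stated."),
    Judgement(evidence, 2, "Data cited but not linked back to the model."),
    Judgement(structure, 4, "Well organised, with a clear conclusion."),
])
print(sheet)
```

Because every line of the sheet pairs a band from the shared key with a comment, the student can see exactly how each part of the total was earned.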
2.2.5 Testing higher-order thinking: retention
A potentially puzzling result from previous research is the doubt it raises over the long-term impact of economics instruction. Stigler (1963), an early critic of teaching in principles courses, posited that if an essay test on current economic problems were given to graduates five years after attending university, there would be no difference in performance between alumni who had taken a ‘conventional’ one-year course in economics and those who had never taken a course in economics. Tests of the ‘Stigler hypothesis’ have produced mixed results. For instance, Walstad and Allgood (1999) find that those with a background in economics outperform those without, but that the overall difference in test scores is relatively small.
Given the potential problem posed by knowledge decay, testing for higher-order thinking becomes a vital element of any assessment strategy. Progression is crucial, and this requires early assessment that measures whether basic concepts have been mastered, followed by later assessment focused on whether students have begun ‘thinking like an economist’. This approach echoes the earlier argument of Hamlin and Janssen (1987) that assessment should be constructed to encourage, if not ensure, ‘active learners’. The philosophy rests on the premise that students who are asked to write while reading lecture materials are more likely to develop a deeper understanding of the concepts and of the connections between theory and economic outcomes. Crowe and Youga (1986), for example, advocate short writing assignments (typically of up to 10 minutes, written in the lecture room).
2.2.6 Psychology of the economics student
Behavioural economics provides a means to appreciate the theoretical limitations of the assumption of ‘rational economic man’. Applied to student behaviour, it also offers considerable potential for improving the learning experience. One example can be found in Rabin (1998), who explores how concepts from behavioural economics can illuminate weaknesses in the student outlook (such as low attendance and poor test performance resulting from inadequate preparation). Allgood (2001) goes beyond the discursive and constructs a utility-maximising model based on achieving target grades, in which effort falls once the grade threshold is reached. This can help explain low attendance, and why course innovations do not necessarily lead to improvements in results. There are, however, further lessons to be learnt for assessment practices.
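The threshold logic of a target-grade model can be illustrated with a stylised Python sketch. It is not Allgood’s (2001) specification: the functional forms and all parameters (the benefit of reaching the target, the cost per study hour, the 50% exam weight and the marks produced by each hour) are hypothetical. The student’s utility depends only on whether the target is reached, so any effort beyond the minimum needed is pure cost.

```python
# Stylised target-grade sketch; every number here is a hypothetical assumption.
BENEFIT = 100.0  # utility from reaching the target grade
COST = 1.0       # utility cost of each hour of study


def exam_score(hours, base_score, productivity):
    """Exam mark (capped at 100) as a function of study hours."""
    return min(100.0, base_score + productivity * hours)


def final_grade(banked, hours, exam_weight, base_score, productivity):
    """Weighted average of the banked coursework mark and the exam mark."""
    return (1 - exam_weight) * banked + exam_weight * exam_score(hours, base_score, productivity)


def optimal_hours(banked, target, exam_weight=0.5, base_score=30.0, productivity=5.0, max_hours=20):
    """Hours that maximise utility when only reaching the target matters."""
    def utility(hours):
        reached = final_grade(banked, hours, exam_weight, base_score, productivity) >= target
        return BENEFIT * reached - COST * hours

    # max() returns the first maximiser, i.e. the fewest hours among ties.
    return max(range(max_hours + 1), key=utility)


for banked in (50, 80):
    hours = optimal_hours(banked, target=60)
    print(banked, hours, final_grade(banked, hours, 0.5, 30.0, 5.0))

# A course innovation that doubles the marks produced per hour of study.
hours = optimal_hours(50, target=60, productivity=10.0)
print(50, hours, final_grade(50, hours, 0.5, 30.0, 10.0))
```

In this sketch, the student with the higher banked coursework mark chooses less exam effort. An innovation that makes each hour of study more productive lowers effort rather than raising the final grade, which is one way of reading why course innovations need not improve results.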
One arguably underused assessment method is experimental economics, which takes advantage of the classroom as a forum for competitive economic games. This recent innovation has been encouraged by the development of key resources such as Bergstrom and Miller (1997). Its underuse can be partly explained by the belief that these games are motivational and pedagogical exercises rather than assessment opportunities. A first problem is that these mechanisms arguably rely on technology, and without that technology finalising grades can be prohibitively difficult. A second is that generating marks through competitive games raises inherent equity issues. Third, experimental economists have stressed the need for cash payments to give student behaviour a clear motivation. Whilst the higher fees that students will pay may offer further opportunities to introduce such activities, the expense involved is unlikely to appeal to heads of department.
Given the limited duration of the knowledge instilled by economics teaching, careful assessment design also offers opportunities to celebrate and reaffirm the skills that the discipline develops. Walstad (2001), for example, notes how the psychology of investors in the stock market can be used to encourage the perception that economics teaching is an investment rather than a consumption good.
Using US data from 1995 to 2005, Schaur et al. (2012) examine the factors that determine the choice of assessment methods at US universities. As would be expected, variables such as class size and staff teaching loads are significant determinants of the use of essays and longer written forms of assessment. Despite these practical constraints, the primary goal of any testing should still be to motivate students to think, and therefore write, like economists. Whilst an economics programme should use a wide range of assessment methods, problems also arise if the balance tips towards the other end of the assessment spectrum. Complete reliance on highly structured tests will not adequately develop the tools required to think and write like an economist either. Such short tests ultimately fail to challenge students adequately and so restrict module performance to basic recall and ‘brain-training’. An alternative to both extremes that unites only their positive aspects is clearly needed, and the case study in the next section may offer one such hybrid solution. It describes how data analysis can be combined with a literature review in a way that places the development of the required cognitive skills at the centre of the assessment experience.