An alternative to problem sets coursework: using hand-written annotations, students become markers
At the end of the 2023 calendar year, as module leader for a large (about 400 students) intermediate-level Microeconomics module at the University of Birmingham, I faced several challenges when setting the in-term assessment (25% coursework) for my students.
The advent of Generative AI ("GenAI") technology (in addition to the existing strategies of essay mills, collusion, and other unethical behaviour) made conventional assessments, such as essays and problem sets, even more vulnerable to integrity concerns. One possible answer, as described by Aricò (2021), is a viva voce (oral examination). However, this option was not open to me, given the institutional quality assurance deadlines for changing the type of assessment on a core module; equally, I was not convinced I would be able to conduct the required number of vivas. Perhaps more importantly, students were not prepared for oral examinations, having little to no experience of this type of assessment, which raised concerns about equity, fairness, and the impact on mental health.
In the short term, I had to implement the assessments as stated in the module description (coursework 25%, test 25%, final examination 50%). In private conversations, many students reported that, because the COVID-19 pandemic lockdowns had deprived them of previous experience, they were not comfortable with closed-book examinations. In addition to pedagogical concerns about the benefits of this type of assessment (see the review and evidence in French et al., 2023, or Williams, 2006), the lack of practice meant an even lower than usual (that is, relative to five or more years ago) understanding of how to answer a problem-set-based exam: how to manage time, build resilience, judge the level of detail required in explanations, and so on. While we provide practice and formative feedback in seminars and mock papers, these concerns add to an already significant list of pre-existing ones. International students, coming from different educational traditions, may not be fully aware of the importance of providing explanations and steps (rather than just the final answer); and unfamiliar requirements are more stressful for home and overseas students with mental health concerns (which have also been exacerbated by the lockdowns and uncertainty).
There are arguments in favour of subjecting students to tests with a controlled degree of uncertainty, at least at the lower levels of Bloom’s taxonomy (including to test resilience and practical, rather than discipline-specific problem-solving, skills), as summarised in the reviews by Bengtsson (2019) and Roediger et al. (2011). However, this was not my aim for a second-year Microeconomics module with a very heterogeneous student body and students anxious to perform well on their first assessment counting towards the final degree classification. Thus, I decided to reflect on which skills (beyond the knowledge of microeconomics) would be most useful for my students to practice in a smaller-stakes coursework.
Given that GenAI makes written text, diagrams, and code vulnerable to knowingly or unwittingly unethical practices, I adopted a more innovative approach to (part of) the coursework assessment, asking students to step into the shoes of a marker. Aside from integrity concerns, in an environment where content is ever more easily generated, criticality towards any information is becoming a crucial transferable skill. GenAI tools do, and will, produce content with increasing ease; verifying information with precision, rigour, and attention to detail should be a valuable capability for any future-ready graduate.
Thus, I asked students to mark an answer, prepared by me, of a hypothetical student studying at the same level as them. Students had to use hand-written annotations to identify errors, areas of improvement, and areas of good practice; where necessary, they had to provide corrections; finally, they had to propose a revision plan for the hypothetical student they were marking. This approach gave students a better understanding of how an exam is marked, building their confidence.
The task: students review and mark an answer to a problem set using hand-written annotations. The answer, presented as that of a hypothetical student, is prepared by the module leader so that it includes carefully curated errors and areas of improvement for students to identify and correct.
Benefits: (i) at the time of writing, the majority of easily accessible GenAI tools were not able to provide hand-written annotations on a hand-written/drawn picture; (ii) the task gave students better insight into how to view their own exam answers and thus improved their understanding of the marking criteria; and (iii) it trained students in a more critically rigorous approach to written information – a skill crucial in an environment of GenAI and information overload.
As this was a novel form of assessment, I scheduled additional support and Q&A sessions: a one-hour session in which I worked through a separate example, annotating it and providing corrections, and took questions at the end; and a shorter 20-30 minute Q&A as part of a lecture.
I prepared a problem set comparable in style and level of difficulty to examination (and seminar) questions. It contained three sub-questions on the optimisation of insurance-buying for a defined risk: the first required no calculations; the second required a simple explanation and calculation; and the third was the optimisation question. I also prepared the answer of the hypothetical student, hand-written with a stylus on a tablet; the series of pictures with this answer was included in the assessment remit. The answer contained required and superfluous diagrams, partially labelled; a relevant but poorly explained real-life example; and correct and incorrect calculations, with and without explanations. The marking criteria were set to match the three tasks, broadly: (i) identification of all errors and omissions; (ii) provision of the necessary corrections; and (iii) an appropriate revision plan targeting the weaker areas of the hypothetical student. The feedback from internal and external moderators was cautiously positive.
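The questions themselves are not reproduced here, but the standard textbook set-up for an insurance-optimisation task of this kind (purely an illustrative sketch of the genre, not the actual assessment content) looks like the following:

```latex
% Illustrative sketch only: a standard expected-utility insurance problem,
% not the actual coursework question.
% An agent with initial wealth $w$ faces a loss $L$ with probability $p$
% and buys coverage $q$ at a premium rate $\gamma$ per unit of coverage:
\[
  \max_{q} \; p\,u(w - L + q - \gamma q) + (1-p)\,u(w - \gamma q)
\]
% The first-order condition is
\[
  p(1-\gamma)\,u'(w - L + q - \gamma q) = (1-p)\,\gamma\,u'(w - \gamma q),
\]
% so with an actuarially fair premium ($\gamma = p$) the condition reduces
% to equal marginal utilities across states, and a strictly risk-averse
% agent insures fully: $q = L$.
```

A hypothetical student's answer to such a question leaves ample room for the curated errors described above, from an unlabelled state-contingent diagram to a correct final value of $q$ reached with missing or faulty intermediate steps.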
To meet the first marking criterion, students had to provide hand-written annotations, either as a picture or scan of a printed and annotated paper, or using a stylus with a tablet. To meet the second criterion, students had to include the appropriate corrections for errors, omissions, and unclear or underdeveloped answers. To meet the third criterion, students had to propose a detailed revision plan, which had to include both suggestions for content revision (references to our VLE or textbook were accepted, as well as any other material) and suggestions concerning the skills (time management, drawing of diagrams, accuracy of calculations) that the hypothetical student should pay particular attention to. Aside from the institutional cover sheet, which is the same for all assessment submissions, the only formatting or presentation requirement was to submit a PDF file.
When I presented this task to students in detail, with the use of an example, in the first assessment support session, there were a few practical questions: how to save in PDF format, what the required number of references was, whether annotations could be typed or written in different colours, and so on. Three students (out of about 400) said that they were not able to read my hand-writing; I asked them to behave as a marker would in a similar situation. Overall, no major misunderstandings occurred and students approached the task correctly. In the future, I will provide some more technical guidance (how to scan/save, more examples of annotations, etc.).
Marking and feedback for students
I found the marking quicker and easier than marking a usual problem set. The more visual representation (annotations) made it easier for me to identify the key aspects of each answer; I also felt that students’ answers were more personal on many occasions – some gave the hypothetical student a name or addressed them directly, and the tone was sometimes quite harsh or, on the contrary, supportive, rather than bland. Of course, not all students fully “bought in” to this assessment type; equally, this approach cannot be generalised to all assessments. However, it can provide some variety.
As I was marking all the assessments (about 370 scripts, excluding late submissions, etc.) within the required 15 days, my individual feedback was mostly limited to the rubric and the marking criteria. However, I also provided about an hour of video recordings going through the answer and highlighting the common areas of improvement. One such common area was that students forgot to request explanations: on one of the sub-parts, the hypothetical student’s answer had no explanations and very few steps (including an error), yet the final numerical answer was correct. I underlined in my feedback that such an answer would not receive full marks – this made my feedback clearly forward-looking, as it led to advice concerning students’ approach to the final exam.
I have also encouraged students to contact me directly to ask for additional feedback, and a few did. My impression is that there were fewer questions and misunderstandings concerning the feedback than on other occasions (but this may be a cohort or time-specific effect).
The key to successful – that is, efficient – testing lies in the types of errors introduced into the answer of the hypothetical student. To examine both in-depth understanding and the rigour of critical analysis, different types are useful:
- Superficial errors, such as mistakes in simple mathematical calculations, typos, lack of clear labelling in the diagrams, etc.
- Errors relating to explanations, from missing explanations where they should be provided, to confusing ones, to incorrect explanations or inappropriate examples.
- Errors relating to methodology, from the wrong choice of method to wrong application of the method.
- Errors relating specifically to diagrams, from incorrect or confusing labelling to wrong illustrations.
Notably, the areas of improvement for the hypothetical student can include direct errors, but also omissions and, perhaps even more interestingly, irrelevant or less relevant answers. Identifying the latter should help students consider the importance of being concise and of managing time efficiently, as well as show the importance of addressing the question.
Final reflections and ad hoc feedback from students
As expected, students’ first reaction to a different type of assessment was a mix of confusion and concern. They reported being afraid of not doing well on an assessment they had not attempted before; however, some students found the novelty interesting. This initial scepticism eventually gave way to a better understanding of my goal of giving them exam practice. This was visible in the questions and discussions I had with students during the revision period before the exam: there was much less confusion about the level of detail expected in answers, and a better overall understanding of how marking would be done. While this is only my individual perception of a limited number of interactions (I estimate about 30 individual students), I consider it promising at this stage.
I believe that providing a detailed example and a dedicated assessment support session was essential in mitigating students’ feelings of uncertainty. Students were given a better chance to understand the marking process; while such an exercise could be set as a formative assessment, I consider it better implemented as a low-stakes summative one, to give everyone, especially the less engaged students, a chance to practice. Finally, I believe that this assessment helps students practice their critical skills in a very direct and visible way, by identifying errors and omissions, and providing corrections, in a seemingly coherent answer.
Aricò, F. R. (2021). Evaluative conversations: unlocking the power of viva voce assessment for undergraduate students. Assessment and Feedback in a Post-Pandemic Era: A Time for Learning and Inclusion, pp. 47–56.
Bengtsson, L. (2019). Take-home exams in higher education: A systematic review. Education Sciences, 9(4), 267. https://doi.org/10.3390/educsci9040267
French, S., Dickerson, A., & Mulder, R. A. (2023). A review of the benefits and drawbacks of high-stakes final examinations in higher education. Higher Education, 1–26. https://doi.org/10.1007/s10734-023-01148-z
Roediger III, H. L., Putnam, A. L., & Smith, M. A. (2011). Ten benefits of testing and their applications to educational practice. Psychology of Learning and Motivation, 55, 1–36. https://doi.org/10.1016/B978-0-12-387691-1.00001-6
Williams, J. B. (2006). The place of the closed book, invigilated final examination in a knowledge economy. Educational Media International, 43(2), 107–119. https://doi.org/10.1080/09523980500237864